Abstract
Acquiring Web spatial data with a single-machine web crawler is limited in both crawl coverage and crawl efficiency, which makes it difficult to guarantee that the acquired data are timely and comprehensive. To address this problem, this paper studies a method for acquiring Web spatial data based on distributed web crawlers, designs and implements a prototype Web spatial data acquisition system, and validates the effectiveness of the proposed method through tests of the prototype system.
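The abstract does not say how crawl work is divided among the nodes of a distributed crawler. One widely used scheme is to partition the URL frontier by hashing each URL's host name. Each node then owns a disjoint set of sites and can filter duplicate URLs locally. The sketch below illustrates that idea only and is not the authors' design. The names `assign_node`, `NodeFrontier`, and `distribute` are hypothetical. To stay self-contained, it keeps every node's queue in one process. A real deployment would presumably replace these in-memory queues with a shared store or message queue.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit


def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to a crawler node index by hashing its host name.

    Hashing the host (not the full URL) keeps all pages of one site on the
    same node, so per-site request limits can be enforced without coordination.
    """
    host = urlsplit(url).netloc.lower()
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


class NodeFrontier:
    """Hypothetical per-node URL queue with local duplicate filtering."""

    def __init__(self) -> None:
        self.queue = deque()  # URLs waiting to be fetched by this node
        self.seen = set()     # every URL this node has ever accepted

    def push(self, url: str) -> bool:
        """Enqueue url unless it was seen before; return True if enqueued."""
        if url in self.seen:
            return False
        self.seen.add(url)
        self.queue.append(url)
        return True


def distribute(urls, num_nodes):
    """Route each URL to the frontier of the node that owns its host."""
    frontiers = [NodeFrontier() for _ in range(num_nodes)]
    for url in urls:
        frontiers[assign_node(url, num_nodes)].push(url)
    return frontiers


if __name__ == "__main__":
    # Illustrative seed URLs only; no network access is performed.
    seeds = [
        "http://example.com/wms?service=WMS",
        "http://example.com/wfs?service=WFS",
        "http://example.org/map",
        "http://example.com/wms?service=WMS",  # duplicate, filtered out
    ]
    for i, frontier in enumerate(distribute(seeds, num_nodes=3)):
        print(i, list(frontier.queue))
```

Hashing by host rather than by full URL trades load balance for locality. A very large site lands entirely on one node, but each node can apply politeness rules to its own sites independently.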