Research on Web Spatial Data Acquisition Based on Distributed Web Crawler (基于分布式网络爬虫的Web空间数据获取方法研究)


English title: Research on Web Spatial Data Acquisition Based on Distributed Web Crawler
Authors: 冯玲; 黄亮; 曾李阳; 朱齐华
Authors (English): FENG Ling; HUANG Liang; ZENG Liyang; ZHU Qihua
Affiliations: College of Land Resources Engineering, Kunming University of Science and Technology; National Geographic Information Bureau Sichuan Basic Geographic Information Center
Keywords: Web spatial data; distributed web crawler; prototype system
Journal code: GZDI
Journal: Journal of Guizhou University (Natural Sciences) (贵州大学学报(自然科学版))
Publication date: 2019-02-15
Year: 2019
Volume: 36
Funding: Sichuan Science and Technology Support Program (J2015ZC05); Open Fund of the Key Laboratory of Digital Mapping and Land Information Application Engineering, National Administration of Surveying, Mapping and Geoinformation (DM2014SC04)
Language: Chinese
Record ID: GZDI201901007
Page count: 4
Issue: 01
CN: 52-5002/N
Pages: 38-41

Abstract

Single-machine web crawlers are limited in both crawl coverage and crawl efficiency when acquiring Web spatial data, which makes it difficult to guarantee that the collected data are timely and comprehensive. To address this problem, this paper studies a method for acquiring Web spatial data based on distributed web crawlers, designs and implements a prototype acquisition system built on that method, and verifies the effectiveness of the proposed method by testing the prototype system.

