Design and implementation of geographic matching engine for Guangxi agricultural information
  • Authors: ZHU Ming (朱明); HE Yong-ning (何永宁); WU Bo (吴博)
  • Keywords: agricultural information; geographic matching engine; place name segmentation; place name search; place name matching algorithm; Guangxi
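  • Journal (Chinese title code): GXNY
  • Journal (English title): Journal of Southern Agriculture
  • Affiliations: School of Earth Science and Resources, China University of Geosciences (Beijing); Geographic Information Center of Guangxi
  • Publication date: 2019-02-03 07:05
  • Publisher: Journal of Southern Agriculture (南方农业学报)
  • Year: 2019
  • Issue: v.50; No.400
  • Funding: Guangxi Innovation-Driven Development Special Project (桂科AA18118048)
  • Language: Chinese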
  • Record ID: GXNY201901030
  • Page count: 7
  • Issue (within year): 01
  • CN: 45-1381/S
  • Pages: 207-213
Abstract
【Objective】To develop a geographic matching engine for agricultural information that handles high concurrency and heavy traffic, improve its algorithms, and resolve the matching of Zhuang-language place names in Guangxi, so that agricultural information can be matched and spatially located automatically and the geographic matching demands of an agricultural big data platform can be met.【Method】The open-source Solr full-text search engine was adapted to the characteristics of minority-language place names in Guangxi: the place name dictionary was expanded, a data organization scheme and a reverse word segmentation algorithm were designed, and the TF-IDF algorithm was improved.【Result】A geographic matching engine for agricultural information was designed and implemented on the basis of these improvements. In a third-party test with 15484 records, the engine segmented Zhuang place names accurately, kept a good response speed at a concurrency level of 500, and reached a matching accuracy of 98.43%. The engine has been deployed in the Sugarcane Industry Development Big Data Platform with good results.【Suggestion】To address the problems found in testing, future work should expand and refine the lexicon, strengthen semantic reasoning, and study positioning algorithms based on spatial semantics, in order to improve the positioning accuracy of Guangxi agricultural information.
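The abstract names reverse word segmentation and TF-IDF but does not give the algorithms themselves. As a rough illustration only, the sketch below shows textbook backward maximum matching against a place name dictionary, followed by standard smoothed TF-IDF with cosine ranking. It is not the authors' improved method and does not use Solr. The dictionary entries and the names `PLACE_DICT`, `backward_max_match`, and `rank_candidates` are hypothetical and chosen for this example.

```python
import math
from collections import Counter

# Hypothetical toy gazetteer; the paper's actual dictionary is not given here.
PLACE_DICT = {"广西", "南宁市", "武鸣区", "那马镇", "那坡县", "百色市"}
MAX_LEN = max(len(w) for w in PLACE_DICT)


def backward_max_match(text, dictionary=PLACE_DICT, max_len=MAX_LEN):
    """Segment text right-to-left, preferring the longest dictionary word."""
    tokens = []
    end = len(text)
    while end > 0:
        # Try the longest window first and shrink until a word matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                end -= size
                break
    tokens.reverse()
    return tokens


def rank_candidates(query, candidates):
    """Rank candidate place names by TF-IDF cosine similarity to the query."""
    docs = [backward_max_match(c) for c in candidates]
    n = len(docs)
    # Document frequency: in how many candidates each token appears.
    df = Counter(t for d in docs for t in set(d))

    def idf(term):
        # Smoothed IDF so unseen or ubiquitous terms keep a finite weight.
        return math.log((1 + n) / (1 + df[term])) + 1.0

    def vec(tokens):
        return {t: c * idf(t) for t, c in Counter(tokens).items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(backward_max_match(query))
    scored = [(cosine(q, vec(d)), c) for c, d in zip(candidates, docs)]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    print(backward_max_match("广西南宁市武鸣区"))
    for score, name in rank_candidates("南宁市武鸣区",
                                       ["广西百色市那坡县", "广西南宁市武鸣区"]):
        print(f"{score:.3f}  {name}")
```

Scanning from the right suits Chinese addresses because the generic suffix (市, 区, 县, 镇) sits at the end of each name, so the longest match anchored at the end tends to recover whole place names. A production engine would also need the paper's Zhuang-specific dictionary entries and scoring changes, which are not reproduced here.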
