Research on a Fuzzy Chinese Address Matching Method for Economic Census Projects
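Abstract
Geocoding (address coding) is a fundamental GIS technique. It takes address information described in natural language and, through a series of processing steps including address standardization, address segmentation, database matching, and spatial positioning, assigns it spatial location information and finally locates it on an electronic map. As GIS technology has developed and spread, demand for geocoding has appeared in more and more application domains, such as public health, crime analysis, political science, disaster management, and traffic forecasting. Geocoding technology abroad has matured and is gradually becoming market-oriented and industrialized. Because national conditions differ, however, existing foreign techniques cannot be applied directly to China's geocoding needs. Geocoding for Chinese addresses therefore still requires further research and refinement.
     This thesis carried out address matching research on part of Beijing's economic census statistical data and developed a geocoding software tool for the economic census. The research explored and improved the following aspects of geocoding:
     (1) Current address matching has a low success rate and low accuracy for two kinds of fuzzy addresses: incomplete addresses and ambiguous addresses. The thesis therefore proposes a rule-based address segmentation and matching method. By improving the algorithm with mechanisms such as a rule tree and ambiguity storage, it raises the matching success rate for both kinds of fuzzy address.
     (2) In traditional address matching, address segmentation and database matching are relatively independent steps, which leads to too many database accesses and low system efficiency. The proposed rule-based method merges the two processes into one, matching while segmenting, so that the matching result is available as soon as segmentation finishes. This improves the efficiency of address matching.
     (3) The thesis partially improves existing address models. Because an address record contains both an administrative division part and a street information part, the two parts are processed and stored separately, which speeds up the matching of address data.
     (4) To reduce the cost and workload of address data collection and address standardization, the thesis makes effective use of data already available in the economic census project: through data mining it builds a standard address database and completes the geocoding task.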
Geocoding is a basic GIS technology. It assigns spatial location information to an address described in natural language and locates it on a map through a series of processing operations: address standardization, word segmentation, database matching, and spatial positioning. With the development of GIS, more and more fields demand geocoding, such as public health, crime analysis, political science, disaster management, and traffic forecasting. Geocoding technology abroad has matured and is gradually moving toward commercialization and industrialization. However, because national circumstances differ, existing foreign technologies cannot be applied directly in China. Further research on Chinese geocoding technology is therefore needed.
     This paper ran address matching experiments on the economic census data of Beijing and developed a geocoding tool. The study focused on four aspects:
     (1) Because fuzzy addresses are geocoded with a low match rate and low accuracy, this paper presents a rule-based Chinese address geocoding method, which improves the match success rate for fuzzy addresses by adding a rule tree and an ambiguity storage mechanism to the algorithm.
     (2) In the traditional geocoding process, address segmentation and database matching are two independent steps, which results in excessive database access and low system efficiency. The rule-based method proposed in this paper therefore combines the two processes into one, so that database matching is completed at the same time as address segmentation, which improves the speed of geocoding.
     (3) The paper revises the existing address model. Based on the difference between the administrative division part and the detailed street address part, it separates an address into these two parts and handles them separately, which improves the matching speed.
     (4) To reduce the workload of data collection and address standardization, the paper makes effective use of existing economic census data: through data mining, it establishes the standard address dataset and completes the geocoding task in the project.
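The abstract does not give the details of the rule tree or the ambiguity store, so the following is only a minimal sketch of the general idea behind points (1) and (2): walking a tree of standard address elements while consuming the input string, so that segmentation and matching happen in a single pass. The names `Node`, `match`, and `max_skip`, the skip rule for missing levels, and the sample addresses are all assumptions made for illustration, not the thesis's actual design or data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of a standard address tree (hypothetical structure)."""
    name: str
    children: dict = field(default_factory=dict)

    def add_path(self, parts):
        """Insert one standard address, given as a list of elements."""
        node = self
        for part in parts:
            node = node.children.setdefault(part, Node(part))


def match(root, text, max_skip=1):
    """Segment `text` and match it against the tree in one pass.

    Returns every candidate path that consumes the whole input; ambiguous
    candidates are all kept rather than resolved. `max_skip` (an assumed
    parameter) bounds how many missing address levels are tolerated.
    """
    results = []

    def walk(node, rest, path, skips):
        if not rest:
            results.append(path)
            return
        for child in node.children.values():
            if rest.startswith(child.name):
                # Element present in the input: consume it and descend.
                walk(child, rest[len(child.name):], path + [child.name], skips)
            elif skips < max_skip and child.children:
                # Element missing from the input: descend without consuming.
                walk(child, rest, path + [child.name], skips + 1)

    walk(root, text, [], 0)
    return results


# Invented sample data: the same street name appears in two districts.
root = Node("")
root.add_path(["海淀区", "建设路", "8号"])
root.add_path(["朝阳区", "建设路", "8号"])
root.add_path(["海淀区", "中关村大街", "1号"])

print(match(root, "海淀区建设路8号"))  # complete address
print(match(root, "建设路8号"))        # district missing
print(match(root, "海淀区8号"))        # street missing
```

With this sample tree, the complete address yields a single candidate, the address without a district yields one candidate per district (the ambiguity is stored, not guessed away), and the address without a street is completed from the tree. A real system would presumably read the tree from the standard address database and rank the stored candidates with further rules.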
