Research on the Application of Named Entity Recognition to Content Mining of Local Chronicles
Abstract
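Chinese local chronicles (fangzhi) originated early and were compiled over a long span of time. They cover every genre and survive in great numbers. The Union Catalog of Chinese Local Chronicles (《中国地方志联合目录》) counts 8,264 surviving chronicles from the Song dynasty through the Republican period alone, totaling more than 110,000 juan (volumes). That is roughly one-tenth of all Chinese ancient books. Organizing and using chronicle materials is a long-standing tradition in Chinese scholarship.

Local Chronicle: Produce (《方志物产》) is a thematic compilation made in the 1950s. Wan Guoding, a noted agricultural historian and one of the principal founders of the discipline of Chinese agricultural history, led several dozen people for six years in copying relevant passages from local chronicles by hand. The compilation records the names, properties, uses and distribution of local products in detail. It is therefore highly valuable as source material for the history of agricultural science, technology and the economy.

With information technology advancing rapidly, a pressing practical question has emerged: how can modern information technology be used to organize chronicle materials and make them easier to exploit? Taking Local Chronicle: Produce as its basis, this thesis explores new methods for organizing chronicle-type ancient books.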
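     The thesis proceeds in three steps.

     First, it discusses the organization of local chronicles in terms of its main content, basic methods and existing results. It describes the origin of Local Chronicle: Produce and how the compilation was organized, first manually and then digitally. It then analyzes the problems in current chronicle organization, from which the aims and significance of this study follow.

     Second, it sets out the basic linguistic background of named entity recognition (NER). This covers the concept and role of NER, the recognition task, and the characteristics and difficulties of Chinese NER. The focus is on NER methods, together with a review of related work in China and abroad.

     Third, it formulates a place-name recognition method for Local Chronicle: Produce. The method draws on the characteristics of chronicle-type ancient books and of the place names in Local Chronicle: Produce.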
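     The case study is the set of Local Chronicle: Produce volumes for Guangdong, Fujian and Taiwan. For these volumes the thesis builds a place-name recognition system. It then mines their content through statistical analysis of the recognition results. The main research contents are as follows: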
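     (1) Design and construction of the place-name recognition system for Local Chronicle: Produce. The system consists of two functional modules: a full-text database and a place-name recognition subsystem.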
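     Full-text database. The design starts from the characteristic ways the three provinces' volumes describe products. Drawing on the uniform textual format that earlier researchers identified and extracted, the texts were normalized, and the database structure was designed on that basis. The database supports full-text retrieval, keyword retrieval, clustered retrieval and statistics.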
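The following Python sketch gives a rough idea of how normalized entries might be stored and queried. It models one record per product passage. The field names (province, chronicle, period, chronicle_type, product, text) and the two search functions are assumptions for illustration, not the thesis's actual schema. Full-text retrieval is reduced to substring matching, and keyword retrieval to exact matching on fields.

```python
from dataclasses import dataclass

@dataclass
class ProduceRecord:
    # Hypothetical schema: one normalized product passage per record.
    province: str        # e.g. "广东"
    chronicle: str       # chronicle title
    period: str          # e.g. "明", "清", "民国"
    chronicle_type: str  # "通志", "府志" or "县志"
    product: str         # product name
    text: str            # passage describing the product

def full_text_search(records, query):
    """Full-text retrieval reduced to substring matching on the passage."""
    return [r for r in records if query in r.text]

def keyword_search(records, **fields):
    """Keyword retrieval reduced to exact matching on structured fields."""
    return [r for r in records
            if all(getattr(r, name) == value for name, value in fields.items())]

# Invented sample records; titles and passages are placeholders.
records = [
    ProduceRecord("广东", "甲县志", "清", "县志", "荔枝", "荔枝出甲县者佳"),
    ProduceRecord("福建", "乙府志", "明", "府志", "番薯", "番薯自吕宋来"),
]
print([r.product for r in full_text_search(records, "吕宋")])
print([r.chronicle for r in keyword_search(records, period="清", chronicle_type="县志")])
```

A production system would replace the linear scans with an inverted index, but the division between free-text and field-based queries stays the same.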
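     Product place-name recognition subsystem. This subsystem combines rule-based and statistical NER methods, adapted to the characteristics of chronicle-type ancient books. It automatically recognizes the place names associated with products. It provides four functions: rule management, place-name recognition, correction of the place-name database, and statistics. Testing showed that the system meets researchers' needs for retrieval and knowledge discovery in chronicle-type ancient books. Its recognition performance can be improved incrementally by refining the rules.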
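The rule-plus-statistics idea can be sketched in Python as follows. The two regular-expression rules, the suffix list and the frequency threshold min_freq are hypothetical stand-ins. The abstract does not state the actual rules or which statistic the subsystem uses. In this sketch, a candidate captured by a rule is accepted if any of three conditions holds:

- it appears in a known place-name list;
- it ends in a typical place-name suffix;
- it recurs often enough across the corpus.

```python
import re
from collections import Counter

# Hypothetical rules: each captures a 2-4 character candidate place name.
RULES = {
    "R_origin": re.compile(r"出([\u4e00-\u9fff]{2,4}?)者"),  # "出X者": comes from X
    "R_intro": re.compile(r"自([\u4e00-\u9fff]{2,4}?)来"),   # "自X来": came from X
}
# Hypothetical place-name suffixes used as an extra rule-based check.
PLACE_SUFFIXES = ("县", "州", "府", "山", "岛")

def extract_candidates(text):
    """Rule stage: return (rule_id, candidate) pairs found in one passage."""
    return [(rule_id, m.group(1))
            for rule_id, pattern in RULES.items()
            for m in pattern.finditer(text)]

def recognize(texts, gazetteer, min_freq=2):
    """Keep candidates that are known, look like place names, or recur often."""
    candidates = [c for text in texts for c in extract_candidates(text)]
    freq = Counter(name for _, name in candidates)  # simple corpus statistic
    return [(rule_id, name) for rule_id, name in candidates
            if name in gazetteer
            or name.endswith(PLACE_SUFFIXES)
            or freq[name] >= min_freq]
```

For example, consider the invented passages ["荔枝出甲县者佳", "龙眼出甲县者良", "番薯自吕宋来", "某物出其地者"] with gazetteer {"吕宋"}. The sketch keeps 甲县 twice and 吕宋 once. It rejects 其地 ("that place"), which matches a rule but fails every acceptance test.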
     (2) Research on the products in Local Chronicle: Produce
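     All products recorded in the Guangdong, Fujian and Taiwan volumes were counted and analyzed along three dimensions.

     By historical period, the average number of products recorded per chronicle rises from the Ming through the Qing to the Republican period.

     By chronicle type, the average falls from provincial gazetteers (tongzhi) to prefectural gazetteers (fuzhi) to county gazetteers (xianzhi).

     By geographic location, the three provinces' volumes record not only the products of those provinces but also those of all of Hainan Province and parts of Guangxi.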
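The per-chronicle averages described above can be computed with a grouping like the one below. This is a minimal sketch under two assumptions: each record is a (chronicle title, period, chronicle type, product) tuple, and a product counts once per chronicle. The sample records are invented and do not reproduce the thesis's figures.

```python
from collections import defaultdict

def avg_products_per_chronicle(records, key):
    """records: (title, period, chronicle_type, product) tuples.
    key: "period" or "type". Each product counts once per chronicle."""
    index = {"period": 1, "type": 2}[key]
    products = defaultdict(set)  # chronicle title -> distinct products
    group_of = {}                # chronicle title -> group value
    for rec in records:
        title, product = rec[0], rec[3]
        products[title].add(product)
        group_of[title] = rec[index]
    totals = defaultdict(lambda: [0, 0])  # group -> [products, chronicles]
    for title, prods in products.items():
        totals[group_of[title]][0] += len(prods)
        totals[group_of[title]][1] += 1
    return {group: n / count for group, (n, count) in totals.items()}

# Invented toy records, for illustration only.
sample = [
    ("志A", "明", "县志", "荔枝"), ("志A", "明", "县志", "龙眼"),
    ("志B", "清", "府志", "荔枝"), ("志B", "清", "府志", "龙眼"),
    ("志C", "清", "县志", "番薯"), ("志C", "清", "县志", "荔枝"),
    ("志C", "清", "县志", "龙眼"),
]
print(avg_products_per_chronicle(sample, "period"))
print(avg_products_per_chronicle(sample, "type"))
```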
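     (3) Content mining of Local Chronicle: Produce based on product place names. This comprises four studies: a statistical analysis of all correctly recognized place names, product distribution in each province, product propagation, and the introduction of products from elsewhere.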
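     The statistical analysis of all correctly recognized place names is based on 7,179 valid place-name recognition records. For each province, the recognized place names were classified and counted in four groups: in-province, out-of-province, foreign, and broad (non-specific) place names. Compared with the other two provinces, Taiwan's exchange and propagation of products with the outside world was relatively more extensive.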
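The four-way classification of recognized place names could be implemented as a lookup against reference lists, as sketched below. The tables province_of, foreign and broad are hypothetical inputs that the user would have to supply. Names matching none of them are reported as unclassified rather than guessed.

```python
from collections import Counter

def classify_place(name, home_province, province_of, foreign, broad):
    """province_of maps a Chinese place name to its province (assumed input)."""
    if name in broad:
        return "broad"
    if name in foreign:
        return "foreign"
    prov = province_of.get(name)
    if prov is None:
        return "unclassified"
    return "in-province" if prov == home_province else "out-of-province"

def tally(records, province_of, foreign, broad):
    """records: (home_province, place_name) pairs -> {province: Counter}."""
    result = {}
    for home, name in records:
        cls = classify_place(name, home, province_of, foreign, broad)
        result.setdefault(home, Counter())[cls] += 1
    return result

# Hypothetical reference tables and records.
province_of = {"泉州": "福建", "广州": "广东"}
foreign = {"吕宋", "暹罗"}
broad = {"南方"}
records = [("福建", "泉州"), ("福建", "广州"), ("福建", "吕宋"),
           ("台湾", "暹罗"), ("台湾", "南方"), ("台湾", "某处")]
print(tally(records, province_of, foreign, broad))
```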
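     The study of product distribution uses the relevant statistics to analyze in detail where products were found within Guangdong, Fujian and Taiwan. ArcGIS was used to draw thematic maps of product distribution that present the results comprehensively and visually. Two main factors determine a region's product diversity:
     - natural factors, including geographic location, natural environment and climate;
     - human factors, including the development and use of natural resources and the introduction and propagation of products from elsewhere.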
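The thesis drew its maps in ArcGIS. As a tool-neutral sketch of the data-preparation step, the code below counts the distinct products per place and writes a GeoJSON FeatureCollection, a format GIS software can generally import. The coordinate table is an assumed input, and places without coordinates are skipped.

```python
import json
from collections import defaultdict

def to_geojson(place_products, coords):
    """place_products: (place, product) pairs; coords: place -> (lon, lat).
    Returns a GeoJSON FeatureCollection string with one point per place."""
    products = defaultdict(set)
    for place, product in place_products:
        products[place].add(product)
    features = []
    for place in sorted(products):
        if place not in coords:
            continue  # places without coordinates cannot be mapped
        lon, lat = coords[place]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},  # GeoJSON order: lon, lat
            "properties": {"place": place, "product_count": len(products[place])},
        })
    return json.dumps({"type": "FeatureCollection", "features": features},
                      ensure_ascii=False)
```

Each feature carries a product_count property, which a GIS tool can then symbolize, for example as graduated point sizes.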
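     The study of product propagation likewise uses the relevant statistics to analyze how the products of Guangdong, Fujian and Taiwan spread, again shown on ArcGIS thematic maps. The breadth of inter-regional product exchange and propagation declines as the distance between regions grows: the farther apart two regions are, the less exchange and propagation there is between them.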
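One way to tabulate the distance effect takes two steps. First, compute the great-circle distance between the place that records a product and the place the product came from. Then count exchange records per distance band. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. The 500 km band width and the (recording place, source place) record format are assumptions, and the test coordinates are arbitrary.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(p, q):
    """Great-circle distance between (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def exchanges_by_distance(links, coords, band_km=500):
    """links: (recording_place, source_place) pairs -> {band index: count}."""
    bands = Counter()
    for a, b in links:
        if a in coords and b in coords:
            d = haversine_km(coords[a], coords[b])
            bands[int(d // band_km)] += 1
    return dict(sorted(bands.items()))
```

Band 0 covers 0 to 500 km, band 1 covers 500 to 1000 km, and so on. A distance-decay pattern shows up as counts falling with the band index.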
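     The study of introduced products uses the relevant statistics to analyze and compare how products from elsewhere were introduced into Guangdong, Fujian and Taiwan. It identifies two drivers of product introduction and propagation: trade between regions, and colonial aggression and war.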
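     (4) Content mining of Local Chronicle: Produce based on recognition rules. This comprises three studies: a statistical analysis of all recognition rules, a comparative study of product distribution, and a study of the routes by which products were introduced and propagated.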
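     The statistical analysis of all recognition rules is likewise based on the 7,179 valid place-name recognition records. According to what each rule expresses, the recognition rules were divided into two classes, which were counted separately:
     - rules that recognize place names where products were distributed;
     - rules that recognize place names involved in product introduction and propagation.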
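Counting records by rule class amounts to mapping each rule to its class and tallying, as in this sketch. The rule identifiers and their class assignments are hypothetical. In the thesis the two classes are distribution rules and introduction/propagation rules.

```python
from collections import Counter

# Hypothetical rule registry: rule id -> rule class.
RULE_CLASS = {
    "R_origin": "distribution",
    "R_best": "distribution",
    "R_intro": "introduction",
}

def count_by_rule_class(records):
    """records: (rule_id, place) pairs produced by the recognizer.
    Returns per-class and per-rule counts; unknown rules are 'unassigned'."""
    by_class = Counter(RULE_CLASS.get(rule, "unassigned") for rule, _ in records)
    by_rule = Counter(rule for rule, _ in records)
    return by_class, by_rule
```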
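     The comparative study of product distribution uses the rule-based statistics to extract what the chronicles say about products' places of origin and distribution. It also extracts which places produced a given product better or worse, and in greater or smaller quantity. From these descriptions it identifies the places of origin, places of superior production, and high-yield places of some products.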
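The step from rule hits to statements such as "the place of origin of product X is Y" can be sketched as a vote over mentions. The sketch makes two assumptions, neither of which is the thesis's stated procedure:

- each record has already been labeled origin, superior or high_yield by the rule that produced it;
- the place mentioned most often is taken as representative.

```python
from collections import Counter, defaultdict

def summarize(records):
    """records: (product, place, label), label in {'origin', 'superior', 'high_yield'}.
    Returns {product: {label: most frequently mentioned place}}."""
    table = defaultdict(lambda: defaultdict(Counter))
    for product, place, label in records:
        table[product][label][place] += 1
    # For each product and label, keep the place mentioned most often.
    return {product: {label: places.most_common(1)[0][0]
                      for label, places in labels.items()}
            for product, labels in table.items()}

# Invented labeled records, for illustration only.
sample = [("荔枝", "甲县", "superior"), ("荔枝", "甲县", "superior"),
          ("荔枝", "乙州", "superior"), ("番薯", "吕宋", "origin")]
print(summarize(sample))
```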
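     The study of introduction and propagation routes uses the rule-class statistics. It identifies three main routes by which products from elsewhere arrived and spread in the Ming and Qing periods: foreign trade, tribute, and transmission by imperial envoys or monks.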
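     In sum, this thesis takes the agricultural-history compilation Local Chronicle: Produce as its corpus. It applies the theory and methods of information organization, through NER, to recognize the place names in that corpus. A bibliometric analysis of the recognition results then serves to mine the content of Local Chronicle: Produce. The aim is a new, content-based method of organizing ancient books. The main contributions are:
     (1) It applies NER theory and methods to chronicle-type ancient books in order to recognize and mine the place names in chronicle texts.
     (2) It uses bibliometric methods to analyze the product names, product place names and recognition rules in the recognition results. This yields knowledge about product distribution, introduction and propagation, and so achieves content-based digital organization of ancient books.
     (3) It uses GIS thematic maps to show the distribution, introduction and propagation of products in Local Chronicle: Produce visually. This goes beyond conventional textual presentation and fully reveals the spatio-temporal character of chronicles as a historical and cultural resource.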
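     Named entities include person names, place names, organization names and others. This thesis focuses on recognizing place names in the Guangdong, Fujian and Taiwan volumes of Local Chronicle: Produce. Other named entities, such as chronicle titles, dates of compilation and product names, were roughly segmented with machine assistance during document processing.

     Future work can modify, re-enter or reorganize the rules to extend the approach in two directions: to chronicle materials from other provinces or other types of ancient books, and to entities beyond place names, such as person names, official titles and organization names. Ancient books could then be mined and used from multiple perspectives, providing historical evidence for modern industrial and agricultural production and for scientific research.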
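Because the rules are data rather than code, extending recognition to a new entity type can amount to registering new patterns, as the sketch below suggests. The rule table is an illustrative assumption, not rules taken from the thesis. It holds two patterns: one for place names, and one for the official titles 知府, 知县 and 知州 (prefect, county magistrate, department magistrate).

```python
import re

# Hypothetical rule table: entity type -> regex patterns with one capture group.
RULES = {
    "place": [r"出([\u4e00-\u9fff]{2,4}?)者"],
}

def add_rule(entity_type, pattern):
    """Register a new pattern; new entity types need no code changes."""
    RULES.setdefault(entity_type, []).append(pattern)

def recognize_all(text):
    """Return (entity_type, surface form, start offset), sorted by position."""
    hits = []
    for entity_type, patterns in RULES.items():
        for pattern in patterns:
            for m in re.finditer(pattern, text):
                hits.append((entity_type, m.group(1), m.start(1)))
    return sorted(hits, key=lambda hit: hit[2])

text = "知县某修志,言荔枝出甲县者佳"  # invented sentence
print(recognize_all(text))
add_rule("official_title", r"(知[府县州])")
print(recognize_all(text))
```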