Research on Key Technology of Free Text Oriented Fine-grained Relation Extraction
(Chinese title: 面向自由文本的细粒度关系抽取的关键技术研究)

Author: Zhu Qian (朱倩)
Degree level: Doctoral
Discipline: Computer Application Technology
Keywords: entity relation; fine-grained relation; information extraction; description logics; HNC theory; natural language processing; semi-supervised learning
Degree year: 2011
Supervisor: Cheng Xianyi (程显毅)
Discipline code: 081203
Degree-granting institution: Jiangsu University
Thesis submission date: 2011-11-01

Abstract

Information extraction (IE) has become an important research direction in information processing, following information retrieval and machine translation. The goal of IE is to extract specified events, facts, and other information and fill them into a database for users to query. The database can be filled correctly only if the relations between entities are identified correctly. Relation extraction is therefore a key technology that determines the quality of IE systems, and it has a wide range of applications. With the rapid growth of the Internet and of online information, and with the maturing of natural language processing and machine learning techniques, it has become possible to extract useful structured information from free text.
     Relation extraction research has already produced many results and increasingly reaches everyday use. Examples include the Powerset semantic search engine (acquired by Microsoft in 2008) and the Apache Software Foundation's Lucene full-text search framework. However, such systems rely on shallow text features and on training text drawn from a few specific domains. Their performance is therefore often unsatisfactory, and relation extraction still faces many difficulties.
     This thesis studies Entity-Attribute-Value (EAV) triples <entity, attribute, value>, referred to here as fine-grained relations or EAV relations. It builds on the theory of the Hierarchical Network of Concepts (HNC), on description logics, and on semi-supervised learning. On that basis it studies key techniques for extracting fine-grained relations at the semantic level, namely Entity-Attribute, Entity-Value, Attribute-Attribute, and Attribute-Value relations. The main contributions are as follows:
     1. A description logic, ALCIQ(EAV), is constructed to describe a fine-grained relation ontology (Section 3.5). ALCIQ is ALC extended with inverse roles (I) and qualified number restrictions (Q). Under traditional knowledge management, information resources lack a uniform semantic description, so users find it hard to fuse related resources semantically. Ontology technology is an important means of overcoming this difficulty. People and heterogeneous systems often need to exchange or share information. For them, an ontology helps remove divergences in concepts and terminology and builds a shared understanding of a domain's concepts. It thus serves as the semantic basis for mutual understanding between people and machines and between machines. Building on ontology technology, the thesis presents ALCIQ(EAV) for EAV modeling. It then uses an ALCIQ(EAV) reasoning algorithm to formalize EAV ontology dependency, EAV role dependency, EAV external dependency, and EAV integrity. This effectively delimits the scope of fine-grained relations.
     2. An HNC-based method for computing the semantic association degree of words is proposed (Section 4.3.4). In fine-grained relation extraction, association degree computation can uncover inherent links and implicit relationships between words. It can also connect an isolated word to its related words (similar, opposite, collocating, and co-occurring words). In this sense it extends word semantic similarity and word semantic relatedness. HNC views the world as a universally connected organic whole, so words are assumed to be interconnected and to form an undirected weighted graph. Each edge connects two associated words, and the edge weight is their association degree. The inherent links and implicit relationships between two words are then computed by searching for paths between them in the HNC concept network. Word association is realized through HNC's association mechanism, by computing the middle-level expressions of HNC symbols. The method computes word association degree at the semantic level and extends the concepts of semantic similarity and semantic relatedness. It provides the basis for extracting entities, attributes, and attribute values. Experimental results show that attributes and attribute values extracted via semantic association degree reflect actual fine-grained semantic relations more objectively.
     3. A semi-supervised algorithm is proposed for extracting fine-grained relations whose types are not defined in advance (Section 5.3). Extracting such undefined relation types is the core problem of fine-grained relation extraction. Approaches restricted to predefined relation types have limited applicability. To address this, the thesis presents a semi-supervised clustering algorithm for undefined relation types with three parts: a learning algorithm based on positive examples and unlabeled data, a relation pattern generalization algorithm, and a relation pattern confidence computation algorithm. A fine-grained relation extraction experiment on Wikipedia shows that the results remain acceptable even with relatively little training data.
     4. An application of fine-grained relation extraction to Chinese technical term analysis is presented (Section 6.2). Analyzing Chinese technical terms helps determine their connotation and classification and helps identify and assess new terms. It also helps grasp the development priorities and directions of the fields to which the terms belong. To validate the fine-grained relation extraction method, it is applied to Chinese technical term analysis in three steps. First, the terms are modeled with ALCIQ(EAV) to delimit the scope of the technical term texts. Second, the term-attribute-value association degree is computed to extract the attributes of the terms and their values. Finally, the semi-supervised algorithm for undefined relation types is used to cluster the terms.
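The research object described above can be made concrete with a small data structure. The Python sketch below stores EAV triples and derives the four pair types named in the abstract: Entity-Attribute, Entity-Value, Attribute-Value, and Attribute-Attribute. It is an illustration under stated assumptions, not the thesis's representation. The `EAV` tuple and the function name `fine_grained_relations` are hypothetical. Reading an Attribute-Attribute relation as "two attributes that describe the same entity" is also an assumption.

```python
from collections import namedtuple
from itertools import combinations

# One fine-grained (EAV) relation: <entity, attribute, value>
EAV = namedtuple("EAV", ["entity", "attribute", "value"])


def fine_grained_relations(triples):
    """Group the pairwise relations implied by a collection of EAV triples.

    Assumption (for illustration only): an Attribute-Attribute pair means
    two distinct attributes that describe the same entity.
    """
    rels = {"E-A": set(), "E-V": set(), "A-V": set(), "A-A": set()}
    attrs_by_entity = {}
    for t in triples:
        rels["E-A"].add((t.entity, t.attribute))
        rels["E-V"].add((t.entity, t.value))
        rels["A-V"].add((t.attribute, t.value))
        attrs_by_entity.setdefault(t.entity, set()).add(t.attribute)
    for attrs in attrs_by_entity.values():
        # sorted() gives each unordered attribute pair one canonical order
        for a, b in combinations(sorted(attrs), 2):
            rels["A-A"].add((a, b))
    return rels


triples = [
    EAV("Lucene", "developer", "Apache Software Foundation"),
    EAV("Lucene", "language", "Java"),
]
print(sorted(fine_grained_relations(triples)["A-A"]))  # [('developer', 'language')]
```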
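Contribution 1 relies on ALCIQ, that is, ALC with inverse roles and qualified number restrictions. The following Python sketch only illustrates what those constructors mean. It evaluates ALCIQ-style concepts over a finite set of EAV facts and reports which inclusion axioms the facts violate. This is model checking on one given interpretation. It is not a tableau reasoner and not the ALCIQ(EAV) reasoning algorithm of the thesis. The concept names `Entity`, `Attribute`, `Value`, the roles `hasAttribute` and `hasValue`, and the three axioms are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Inv:
    """Inverse role R^- (the 'I' in ALCIQ)."""
    role: str

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Not:
    c: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class AtLeast:
    """Qualified number restriction >= n R.C (the 'Q' in ALCIQ)."""
    n: int
    role: Union[str, Inv]
    c: object

@dataclass(frozen=True)
class AtMost:
    """Qualified number restriction <= n R.C."""
    n: int
    role: Union[str, Inv]
    c: object

@dataclass
class Interpretation:
    """A finite interpretation: a domain plus concept and role extensions."""
    domain: set
    concepts: dict  # concept name -> set of individuals
    roles: dict     # role name -> set of (subject, object) pairs

    def successors(self, role, x):
        if isinstance(role, Inv):  # follow the role edges backwards
            return {a for (a, b) in self.roles.get(role.role, set()) if b == x}
        return {b for (a, b) in self.roles.get(role, set()) if a == x}

    def ext(self, c):
        """Return the extension of concept c in this interpretation."""
        if isinstance(c, Atom):
            return set(self.concepts.get(c.name, set()))
        if isinstance(c, Not):
            return self.domain - self.ext(c.c)
        if isinstance(c, And):
            return self.ext(c.left) & self.ext(c.right)
        if isinstance(c, (AtLeast, AtMost)):
            filler = self.ext(c.c)
            count = {x: len(self.successors(c.role, x) & filler) for x in self.domain}
            if isinstance(c, AtLeast):
                return {x for x, k in count.items() if k >= c.n}
            return {x for x, k in count.items() if k <= c.n}
        raise TypeError(f"unsupported concept: {c!r}")

def violations(interp, tbox):
    """Return the axioms (C, D), read as C ⊑ D, whose extension of C is not inside D's."""
    return [(c, d) for (c, d) in tbox if not interp.ext(c) <= interp.ext(d)]

# Hypothetical EAV axioms; the names are illustrative, not taken from the thesis.
E, A, V = Atom("Entity"), Atom("Attribute"), Atom("Value")
TBOX = [
    (E, AtLeast(1, "hasAttribute", A)),                             # an entity has an attribute
    (A, And(AtLeast(1, "hasValue", V), AtMost(1, "hasValue", V))),  # exactly one value
    (A, AtLeast(1, Inv("hasAttribute"), E)),                        # owned by some entity
]

facts = Interpretation(
    domain={"relation_extraction", "re.field", "NLP"},
    concepts={"Entity": {"relation_extraction"}, "Attribute": {"re.field"}, "Value": {"NLP"}},
    roles={"hasAttribute": {("relation_extraction", "re.field")},
           "hasValue": {("re.field", "NLP")}},
)
print(violations(facts, TBOX))  # []
```

Existential and universal restrictions need no separate constructors here: ∃R.C is `AtLeast(1, R, C)` and ∀R.C is `AtMost(0, R, Not(C))`.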
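Contribution 2 treats words as nodes of an undirected weighted graph whose edge weights are association degrees. The association between two words is then obtained by searching for a path between them. The abstract does not give the path-scoring rule, so the Python sketch below assumes one common choice. Weights lie in (0, 1], and the association of two words is the maximum product of edge weights along any connecting path. This is computed with Dijkstra's algorithm on negative log weights. The graph contents, weights, and function names are hypothetical, and no HNC symbol processing is modeled.

```python
import heapq
import math


def add_edge(graph, u, v, weight):
    """Add an undirected edge whose weight is an association degree in (0, 1]."""
    if not 0.0 < weight <= 1.0:
        raise ValueError("association weights are assumed to lie in (0, 1]")
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


def association_degree(graph, src, dst):
    """Maximum product of edge weights over all paths from src to dst.

    Every weight is in (0, 1], so cost = -log(weight) is non-negative and
    Dijkstra's algorithm on these costs finds the max-product path.
    Returns 0.0 when the two words are not connected.
    """
    if src == dst:
        return 1.0
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return math.exp(-cost)
        if cost > best[node]:
            continue  # stale heap entry
        for nbr, w in graph.get(node, {}).items():
            new_cost = cost - math.log(w)
            if new_cost < best.get(nbr, math.inf):
                best[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return 0.0


g = {}
add_edge(g, "doctor", "hospital", 0.8)
add_edge(g, "hospital", "nurse", 0.7)
add_edge(g, "doctor", "nurse", 0.5)
# The indirect path wins: 0.8 * 0.7 = 0.56 > 0.5
print(round(association_degree(g, "doctor", "nurse"), 2))  # 0.56
print(association_degree(g, "doctor", "piano"))             # 0.0 (not connected)
```

The max-product rule makes association decay along longer chains. It also lets a strong indirect link outrank a weak direct one, which matches the abstract's idea of finding implicit relationships through paths.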
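Contribution 3 names a relation pattern confidence computation, but the abstract does not define it. As a stand-in, the Python sketch below implements the pattern and tuple confidence scores of Snowball (Agichtein and Gravano, 2000), a well-known semi-supervised relation extraction system. A pattern's confidence is positive / (positive + negative), counted over extractions whose (entity, attribute) key appears in the seed set. A tuple's confidence combines the confidences of the patterns that extracted it with a noisy-or. Snowball also weights each pattern by a match score; that factor is fixed at 1 here for brevity. The seed facts and function names are illustrative.

```python
def pattern_confidence(extractions, seeds):
    """Snowball-style pattern confidence: positive / (positive + negative).

    extractions: (entity, attribute, value) triples produced by one pattern.
    seeds: dict mapping (entity, attribute) to the known correct value.
    Extractions whose (entity, attribute) key is not a seed are unlabeled
    and count as neither positive nor negative.
    """
    positive = negative = 0
    for entity, attribute, value in extractions:
        known = seeds.get((entity, attribute))
        if known is None:
            continue
        if known == value:
            positive += 1
        else:
            negative += 1
    total = positive + negative
    return positive / total if total else 0.0


def tuple_confidence(pattern_confidences):
    """Noisy-or over the patterns that extracted a tuple (match score fixed at 1)."""
    all_wrong = 1.0
    for conf in pattern_confidences:
        all_wrong *= 1.0 - conf
    return 1.0 - all_wrong


seeds = {("Beijing", "country"): "China", ("Paris", "country"): "France"}
extractions = [
    ("Beijing", "country", "China"),    # agrees with a seed: positive
    ("Paris", "country", "Germany"),    # contradicts a seed: negative
    ("Tokyo", "country", "Japan"),      # not a seed: unlabeled
]
print(pattern_confidence(extractions, seeds))        # 0.5
print(round(tuple_confidence([0.5, 0.8]), 2))        # 0.9
```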
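The last step of contribution 4 clusters technical terms. The thesis uses its semi-supervised algorithm for undefined relation types for this step. The Python sketch below substitutes a much simpler, different technique only to show the shape of the step. Each term is represented by the set of attributes extracted for it. Single-link clustering with a Jaccard-similarity threshold then groups the terms using union-find. The terms, attributes, and threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two attribute sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(term_attrs, threshold):
    """Single-link clustering: any two terms whose attribute sets reach the
    Jaccard threshold are merged, transitively, via union-find."""
    parent = {t: t for t in term_attrs}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(term_attrs, 2):
        if jaccard(term_attrs[a], term_attrs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in term_attrs:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in clusters.values())


terms = {
    "support vector machine": {"kernel", "margin", "training algorithm"},
    "perceptron": {"margin", "training algorithm", "weight"},
    "TCP": {"port", "handshake", "protocol layer"},
    "UDP": {"port", "datagram", "protocol layer"},
}
print(cluster_terms(terms, threshold=0.5))
# [['TCP', 'UDP'], ['perceptron', 'support vector machine']]
```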

