Research on Unsupervised Chinese Semantic Relation Extraction Based on Tree Kernels
Abstract
Semantic relation extraction between named entities is one of the main tasks in information extraction. For Chinese, supervised learning methods dominate, and no study has yet applied unsupervised learning to the task. Motivated by the success that tree kernels have had in English semantic relation extraction, this paper proposes a convolution tree kernel-based approach to unsupervised Chinese semantic relation extraction.
We cast unsupervised relation extraction as the problem of clustering relation instances represented as parse trees. First, named entity pairs that stand in a semantic relation within a Chinese sentence are extracted as relation instances, each represented structurally by the shortest path-enclosed tree taken from the sentence's parse tree. Second, the structural similarity between two parse trees is computed with a convolution tree kernel. Finally, a bottom-up hierarchical clustering algorithm, using complete linkage and average linkage as the cluster similarity measures, groups the relation instances into clusters, which yields the unsupervised extraction of Chinese semantic relations.
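The kernel and clustering steps can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes the standard Collins and Duffy formulation of the convolution tree kernel, a nested-tuple tree encoding, a decay factor `lam`, and hypothetical function names. The four toy trees use English words for readability and merely stand in for shortest path-enclosed trees, whose extraction from a full parse is not shown.

```python
import math

# Assumed encoding: a tree is (label, child, ...); a word leaf is a plain string.
def nodes(tree):
    """Yield every internal node of the tree."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)

def production(node):
    """Grammar production at a node; words and labels are kept distinct."""
    kids = tuple((isinstance(c, tuple), c[0] if isinstance(c, tuple) else c)
                 for c in node[1:])
    return (node[0], kids)

def delta(n1, n2, lam):
    """Weighted count of common subtree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    result = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple):  # equal productions: c2 is a tuple too
            result *= 1.0 + delta(c1, c2, lam)
    return result  # for a pre-terminal this is just lam

def tree_kernel(t1, t2, lam=0.5):
    """Convolution tree kernel: sum of delta over all node pairs."""
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))

def similarity(t1, t2, lam=0.5):
    """Kernel normalised to [0, 1] so that tree size does not dominate."""
    denom = math.sqrt(tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam))
    return tree_kernel(t1, t2, lam) / denom

def cluster(sim, k, linkage="average"):
    """Bottom-up clustering on a similarity matrix until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]

    def cluster_sim(a, b):
        vals = [sim[i][j] for i in a for j in b]
        # complete linkage: least similar pair; average linkage: mean over pairs
        return min(vals) if linkage == "complete" else sum(vals) / len(vals)

    while len(clusters) > k:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        a, b = max(pairs, key=lambda p: cluster_sim(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

# Toy relation instances (illustrative only, not ACE data).
trees = [
    ("NP", ("NR", "Beijing"), ("NN", "university")),
    ("NP", ("NR", "Shanghai"), ("NN", "university")),
    ("VP", ("VV", "joined"), ("NP", ("NN", "company"))),
    ("VP", ("VV", "left"), ("NP", ("NN", "company"))),
]
sim = [[similarity(a, b) for b in trees] for a in trees]
print(cluster(sim, k=2, linkage="complete"))  # [[0, 1], [2, 3]]
```

Because the clustering works on similarities rather than distances, complete linkage here takes the minimum pairwise similarity between two clusters, which corresponds to the maximum pairwise distance in the usual formulation.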
Evaluation on the ACE RDC 2005 Chinese benchmark corpus shows that the approach achieves F-measures of 60.1 and 44.6 for relation major-type extraction and relation subtype extraction, respectively. This suggests that the tree kernel-based unsupervised method is effective and feasible for Chinese semantic relation extraction.
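The abstract does not say how the clusters are scored against the ACE relation labels. One common convention, assumed here rather than taken from the thesis, is to label each cluster with the majority gold type of its members and then compute precision, recall and F-measure per type. The function names and the five-instance data set below are hypothetical.

```python
from collections import Counter

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def score_clusters(clusters, gold):
    """clusters: lists of instance ids; gold: dict mapping id to relation type."""
    predicted = {}
    for members in clusters:
        # Assumption: a cluster takes the majority gold type of its members.
        majority = Counter(gold[i] for i in members).most_common(1)[0][0]
        for i in members:
            predicted[i] = majority
    scores = {}
    for label in set(gold.values()):
        tp = sum(1 for i in predicted if predicted[i] == label and gold[i] == label)
        n_pred = sum(1 for i in predicted if predicted[i] == label)
        n_gold = sum(1 for i in gold if gold[i] == label)
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_gold
        scores[label] = (p, r, f_measure(p, r))
    return scores

# Toy data: five instances, two relation types (illustrative only).
gold = {0: "PART-WHOLE", 1: "PART-WHOLE", 2: "ORG-AFF", 3: "ORG-AFF", 4: "ORG-AFF"}
scores = score_clusters([[0, 1, 2], [3, 4]], gold)
```

In this toy case instance 2 falls into the wrong cluster, so PART-WHOLE loses precision and ORG-AFF loses recall, and both types end up with the same F-measure.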
