Research and Implementation of Topic-Oriented Event Information Fusion
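Abstract
Event information extraction (Event IE) is currently an important area of information extraction (IE). This thesis proposes a cross-document event information fusion method. The method builds on Event IE, introduces multi-source information fusion theory, and combines it with other information extraction techniques such as named entity recognition and coreference resolution, in order to fuse information about same-topic events drawn from multiple sources and multiple documents. The work has two parts, meta-event fusion and topic event fusion, detailed as follows:
     1. In meta-event fusion, because natural-language expression is highly varied, the event elements in each event description are normalized. Time expressions, named entities, and numeric expressions each have their own surface characteristics, so a different normalization method is used for each.
     2. In clustering coreferent meta-events, event descriptions often omit event elements. To improve the recall of coreferent meta-event clustering, the thesis introduces the concept of a key element set. It also proposes a similarity algorithm suited to event information, using the semantic and pragmatic information in events.
     3. In event element fusion, each element starts from a base confidence. Given the expression characteristics of each element type, this confidence is adjusted according to the precision and accuracy of the element, so that more precise element values are more likely to be selected. Element selection then applies a voting strategy on top of the computed confidence, which increases the confidence of the final result.
     4. In topic event fusion, to represent topic-type events better, the thesis defines an Event-based Topic Description Model (ETDM), a topic event representation built on meta-events. The model represents a topic event in a structured, hierarchical form that is close to human cognitive patterns, and the information can be customized to different needs. Finally, a fusion method for topic events is given.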
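The element fusion step in item 3 can be pictured with a small sketch. This is only an illustration under stated assumptions, not the thesis's algorithm: the function name `fuse_element`, the base confidence values, and the precision bonus are all hypothetical, and the abstract gives no concrete numbers.

```python
from collections import defaultdict


def fuse_element(candidates):
    """Pick one value for an event element from several normalized candidates.

    candidates: list of (value, base_confidence, precision_bonus) tuples.
    All three fields are hypothetical stand-ins for the quantities the
    abstract describes in words.
    """
    scores = defaultdict(float)
    for value, base_confidence, precision_bonus in candidates:
        # Each mention votes for its value, weighted by its adjusted confidence.
        scores[value] += base_confidence + precision_bonus
    # The value with the highest accumulated score wins the vote.
    return max(scores.items(), key=lambda item: item[1])


# Made-up candidates for the "time" element of one event.
candidates = [
    ("2008-05-12", 0.5, 0.25),  # full date: assumed higher precision
    ("2008-05", 0.5, 0.0),      # month only: assumed lower precision
    ("2008-05-12", 0.5, 0.25),
]
best_value, best_score = fuse_element(candidates)
```

Summing adjusted confidences lets repeated mentions and precise mentions both push a value ahead, which is one simple way to combine confidence with voting.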
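     Experiments show that the meta-event fusion merges event information effectively, greatly reduces redundancy in the information system, and completes the information about individual events. By fusing the redundant and complementary parts of multi-source information, it increases the dimensionality of the target feature vector, reduces the uncertainty of the information, and improves its confidence. Topic event fusion not only links related events effectively but also represents the whole topic in a hierarchical, structured form.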
Event Information Extraction (Event IE) is an important area of Information Extraction (IE). This dissertation presents a method for cross-document event information fusion. The method builds on Event IE and combines basic information fusion theory with other information extraction technologies, such as Named Entity Recognition and Co-reference Resolution. The dissertation has two main parts: basic event information fusion and topic event information fusion.
     1. Because natural language expresses the same content in diverse ways, the event roles, such as time mentions and named entity mentions, must be standardized before event fusion. Each type of role is standardized into a single format according to its own characteristics.
     2. Event mentions often omit some event roles, so we define a key role set to improve the recall of co-referent basic event clustering. Based on the characteristics of events, the dissertation also proposes an approach that computes the similarity of two events from the semantic and pragmatic information in the event annotations.
     3. In event role fusion, the dissertation introduces a trustworthiness score to improve performance. The score is adjusted by the precision of each candidate, which raises the probability that high-precision role values are selected. A frequency voting method is then used to select the event roles, which further increases the trustworthiness of the result.
     4. In topic event fusion, we define an Event-based Topic Description Model (ETDM), which represents a topic in a hierarchical, structured form similar to the human cognitive model. The dissertation also provides an approach for fusing topic events.
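The idea in item 2, comparing events whose roles may be missing, can be sketched as follows. This is a guess at the general shape only: the abstract does not define the key role set or the similarity formula, so `KEY_ROLES`, the function `similarity`, the scoring rule, and the sample events are all assumptions made for illustration.

```python
KEY_ROLES = {"time", "place"}  # hypothetical key role set


def similarity(event_a, event_b, key_roles=KEY_ROLES):
    """Score two events (dicts of role -> normalized value) in [0, 1].

    Roles missing from either event are skipped rather than counted as
    mismatches, so incomplete mentions can still be clustered together.
    A conflict on a shared key role rules out co-reference (score 0.0).
    """
    shared = [role for role in event_a if role in event_b]
    if not shared:
        return 0.0
    for role in shared:
        if role in key_roles and event_a[role] != event_b[role]:
            return 0.0
    matches = sum(1 for role in shared if event_a[role] == event_b[role])
    return matches / len(shared)


# Made-up event mentions; event_b omits "place", event_c conflicts on "time".
event_a = {"type": "fire", "time": "2008-03-01", "place": "City A", "injured": "3"}
event_b = {"type": "fire", "time": "2008-03-01", "injured": "5"}
event_c = {"type": "fire", "time": "2008-03-02", "place": "City A"}

score_ab = similarity(event_a, event_b)  # 2 of 3 shared roles agree
score_ac = similarity(event_a, event_c)  # key role "time" conflicts
```

Ignoring absent roles favors recall, while the key-role check keeps clearly different events apart. A real system would likely replace exact equality with the semantic and pragmatic comparison the dissertation describes.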
     The experimental results show that the event fusion method fuses event mentions and organizes related events effectively. It sharply reduces information redundancy and completes the event information. The topic fusion method also links related events and organizes topics in a hierarchical, structured form.
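A hierarchical, structured topic representation of the kind ETDM aims for could look like the sketch below. The class names `BasicEvent` and `TopicNode`, their fields, and the `select` filter are hypothetical; the abstract does not specify the model's actual structure, so this shows only one plausible tree layout with a simple form of customization.

```python
from dataclasses import dataclass, field


@dataclass
class BasicEvent:
    event_type: str
    roles: dict  # role name -> fused value


@dataclass
class TopicNode:
    title: str
    events: list = field(default_factory=list)    # events attached to this node
    children: list = field(default_factory=list)  # sub-topics

    def all_events(self):
        """Collect events from this node and all sub-topics, depth-first."""
        collected = list(self.events)
        for child in self.children:
            collected.extend(child.all_events())
        return collected

    def select(self, event_type):
        """Customize the view: keep only events of one type."""
        return [e for e in self.all_events() if e.event_type == event_type]


# Made-up topic with one sub-topic.
topic = TopicNode(
    title="Factory fire",
    events=[BasicEvent("fire", {"time": "2008-03-01", "place": "City A"})],
    children=[
        TopicNode(
            title="Rescue",
            events=[
                BasicEvent("rescue", {"time": "2008-03-01"}),
                BasicEvent("fire", {"time": "2008-03-02"}),
            ],
        )
    ],
)
fire_events = topic.select("fire")
```

Keeping events at the leaves and sub-topics as child nodes gives the layered view, and filtering over the tree is one way a reader could tailor the output to a particular need.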
