ESP: An Entity-Based Event Detection Method for Multi-Source Text
  • English title: ESP: An Event Detection Algorithm for Multi-source Text
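  • Authors: 秦宇君; 史存会; 刘悦; 俞晓明; 程学旗
  • Authors (English): QIN Yujun; SHI Cunhui; LIU Yue; YU Xiaoming; CHENG Xueqi; Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences
  • Keywords: event detection; event core entity; multi-source text; text clustering
  • Keywords (English): event detection; core entity; multi-source text; text clustering
  • Chinese journal title: SXDR
  • English journal title: Journal of Shanxi University (Natural Science Edition)
  • Institutions: University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences
  • Publication date: 2019-01-29 16:44
  • Publisher: Journal of Shanxi University (Natural Science Edition)
  • Year: 2019
  • Issue: v.42; No.163
  • Funding: National Key Research and Development Program of China (2017YFB0803302)
  • Language: Chinese
  • Page: SXDR201901005
  • Number of pages: 10
  • CN: 01
  • ISSN: 14-1105/N
  • Classification number: 46-55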
Abstract
Online public opinion has a growing influence on people's lives. Detecting events from multiple data sources captures public-opinion events more reliably and improves the effectiveness of public-opinion monitoring systems. To address how data from multiple channels such as news, Weibo, and WeChat can be fused in a multi-source text setting, this paper starts from the definition of an event, proposes the concept of an event core entity, and designs a method for identifying event core entities. The core entities are then applied to the event detection process, yielding the entity-based event detection method ESP (Entity Single-Pass). By introducing entity information, the method enriches the representation of each document in the multi-source text and thereby improves multi-source event detection. Experiments on Weibo, news, and other data show that, compared with the K-means and Single-Pass methods, ESP improves NMI and RI by 0.2 and 0.3, respectively, which demonstrates the effectiveness of the algorithm.
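The abstract names the building blocks (Single-Pass clustering, entity information, and the RI metric) but does not give ESP's similarity formula, entity recognizer, or thresholds. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation: each document is assumed to arrive as a pre-tokenized word list plus a list of already-extracted entities, similarity is assumed to be a weighted mix of term cosine similarity and entity Jaccard overlap, and `threshold` and `alpha` are hypothetical parameters.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap between two entity sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entity_single_pass(docs, threshold=0.5, alpha=0.5):
    """Cluster (tokens, entities) documents in one pass over the stream.

    Assumption: similarity = alpha * term cosine + (1 - alpha) * entity Jaccard.
    threshold and alpha are illustrative values, not taken from the paper.
    """
    clusters = []  # each cluster keeps aggregated term counts and entities
    labels = []
    for tokens, entities in docs:
        terms, ents = Counter(tokens), set(entities)
        best, best_sim = None, 0.0
        for idx, c in enumerate(clusters):
            sim = alpha * cosine(terms, c["terms"]) + (1 - alpha) * jaccard(ents, c["entities"])
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= threshold:
            # Similar enough: merge the document into the closest cluster.
            clusters[best]["terms"].update(terms)
            clusters[best]["entities"] |= ents
            labels.append(best)
        else:
            # Otherwise the document starts a new event cluster.
            clusters.append({"terms": terms, "entities": ents})
            labels.append(len(clusters) - 1)
    return labels


def rand_index(pred, gold):
    """Rand index: fraction of document pairs on which two labelings agree."""
    pairs = list(combinations(range(len(gold)), 2))
    agree = sum((pred[i] == pred[j]) == (gold[i] == gold[j]) for i, j in pairs)
    return agree / len(pairs) if pairs else 1.0


# Toy documents (invented for illustration): (tokens, entities)
docs = [
    (["earthquake", "sichuan", "rescue"], ["Sichuan"]),
    (["sichuan", "earthquake", "magnitude"], ["Sichuan"]),
    (["football", "final", "goal"], ["FIFA"]),
]
labels = entity_single_pass(docs)
print(labels, rand_index(labels, [0, 0, 1]))
```

On the toy input the two earthquake documents share both vocabulary and the entity "Sichuan", so they fall into one cluster, while the football document opens a second one. The entity term lets two short texts from different channels match even when their wording overlaps only partly, which is the intuition the abstract describes.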
