Multi-strategy candidate construction and entity linking (多策略候选集构建与实体链接)
  • Title (English): Multi-strategy candidate construction and entity linking
  • Authors: 杨紫怡; 盛晨; 孔芳; 周国栋
  • Authors (English): YANG Zi-yi; SHENG Chen; KONG Fang; ZHOU Guo-dong (School of Computer Science and Technology, Soochow University)
  • Keywords: mention expansion; candidate construction; entity linking
  • Journal code: JSJK
  • Journal (English): Computer Engineering & Science
  • Affiliation: School of Computer Science and Technology, Soochow University
  • Publication date: 2018-12-15
  • Publisher: Computer Engineering & Science (计算机工程与科学)
  • Year: 2018
  • Volume and issue: v.40; No.288
  • Funding: National Natural Science Foundation of China (61472264, 61333018, 61673290)
  • Language: Chinese
  • Record ID: JSJK201812019
  • Page count: 10
  • Issue number: 12
  • CN: 43-1258/TP
  • Pages: 132-141
Abstract
To address candidate set construction in entity linking, this paper proposes a candidate set construction algorithm that combines multiple strategies. The strategies are used together to extract the complete mentions from the context, which reduces the number of candidate entities while raising the recall of the correct entity, yielding a high-quality candidate set. Experiments and analysis on the TAC2014 English entity linking corpus identify the optimal construction strategy and show that the method improves both the recall and the precision of the candidate sets. Further experiments confirm that candidate set quality has a marked effect on the performance of the complete entity linking system: compared with the baseline algorithm, the candidate set extracted with the optimal strategy improves overall system performance by 3.7%.
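The abstract does not say which strategies are combined, so the following is only a minimal sketch of the general idea of mention expansion followed by candidate lookup, not the authors' algorithm. It assumes one simple strategy (replace a short mention with the longest mention in the same document that contains it) and a toy alias table; the names `expand_mention`, `build_candidates`, `candidate_recall`, and `ALIAS_TABLE` are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Set

# Toy alias table (an assumption for illustration): surface form -> candidate entity IDs.
ALIAS_TABLE: Dict[str, Set[str]] = {
    "obama": {"Barack_Obama", "Michelle_Obama", "Obama,_Fukui"},
    "barack obama": {"Barack_Obama"},
}


def expand_mention(mention: str, context_mentions: List[str]) -> str:
    """Return the longest context mention containing `mention` as a token sequence."""
    target = mention.lower().split()
    if not target:
        return mention
    best = mention
    n = len(target)
    for other in context_mentions:
        tokens = other.lower().split()
        contains = any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))
        if contains and len(tokens) > len(best.split()):
            best = other
    return best


def build_candidates(mention: str, context_mentions: List[str],
                     alias_table: Dict[str, Set[str]]) -> Set[str]:
    """Look up candidates for the expanded mention, falling back to the surface form."""
    expanded = expand_mention(mention, context_mentions)
    found = alias_table.get(expanded.lower()) or alias_table.get(mention.lower(), set())
    return set(found)


def candidate_recall(candidate_sets: List[Set[str]], gold: List[str]) -> float:
    """Fraction of mentions whose gold entity appears in its candidate set."""
    if not gold:
        return 0.0
    hits = sum(1 for cands, g in zip(candidate_sets, gold) if g in cands)
    return hits / len(gold)
```

In this toy setting, expanding "Obama" to "Barack Obama" shrinks the candidate set from three entities to one while keeping the correct entity, which is the trade-off between candidate count and recall that the abstract describes.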