基于共指消解的实体搜索模型研究 (Research on a Coreference Resolution Based Entity Search Model)
  • English title: A Coreference Resolution Based Entity Search Model
  • Authors: 熊玲; 徐增壮; 王潇斌; 洪宇; 朱巧明
  • English authors: XIONG Ling; XU Zengzhuang; WANG Xiaobin; HONG Yu; ZHU Qiaoming (School of Computer Science and Technology, Soochow University)
  • Keywords: 共指消解 (coreference resolution); 伪相关反馈 (pseudo-relevance feedback); 实体搜索 (entity search)
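  • Journal: 中文信息学报 (Journal of Chinese Information Processing)
  • Journal code: MESS
  • Institution: School of Computer Science and Technology, Soochow University
  • Publication date: 2018-05-15
  • Year: 2018
  • Volume: 32
  • Issue: 05
  • Pages: 94-101
  • Page count: 8
  • Funding: National Natural Science Foundation of China (61373097, 61672367, 61672368)
  • Language: Chinese
  • Article ID: MESS201805011
  • CN: 11-2325/N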
Abstract
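Slot filling (SF) aims to mine specific attribute information about a given entity (called the query) from a large-scale document collection. Entity search is an important component of SF: it retrieves the documents that contain the given query (called relevant documents), from which downstream modules extract attribute information. Research on entity search within SF is currently scarce. The Boolean retrieval models in use ignore the characteristics of entity queries and rely only on the query's surface form, so query ambiguity keeps retrieval precision low. To address this problem, this paper proposes an entity search model based on cross-document coreference resolution (CDCR). The method applies CDCR to a candidate result set with high recall but low precision and filters out documents that contain no entity coreferent with the given entity, which improves the precision of the retrieval results. To reduce the recall loss caused by this filtering, the paper uses pseudo-relevance feedback to expand the descriptive information of the query entity. Experimental results show that, compared with the baseline system, the method effectively improves the retrieval results, raising precision and F1 by 5.63% and 2.56%, respectively.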
The goal of slot filling (SF) is to extract certain attribute values of a given entity (the query) from a large-scale corpus. Entity search, an important component of SF, retrieves documents referring to the given entity so that other components can extract attribute values from them. In contrast to existing entity search based on Boolean logic, we propose an entity search model based on cross-document coreference resolution (CDCR). CDCR improves the precision of the retrieval results by filtering out documents that do not contain mentions referring to the given entity. To minimize the loss of recall in the filtering process, we introduce pseudo-relevance feedback to augment the description of the given entity. Experimental results show that our model outperforms the baseline, increasing precision and F1 score by 5.63% and 2.56%, respectively.
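The pipeline the abstract describes (high-recall Boolean retrieval, a coreference-based filter, then pseudo-relevance feedback to win back recall) can be illustrated with a toy Python sketch. This is not the authors' implementation: the CDCR step is replaced by a hypothetical stand-in that keeps a document only if the bag-of-words context of the name mention is cosine-similar to an entity profile, and `STOP`, `threshold=0.3`, `k` and `n_terms` are illustrative assumptions. Precision, recall and F1 use the standard set-based definitions.

```python
from collections import Counter
from math import sqrt

# Hypothetical stop-word list, only for this toy example.
STOP = {"the", "a", "of", "for", "with", "is", "at", "in"}


def tokenize(text):
    return text.lower().split()


def boolean_retrieve(query, docs):
    """High-recall, low-precision step: AND over the query's surface tokens."""
    q = set(tokenize(query))
    return [i for i, d in enumerate(docs) if q <= set(tokenize(d))]


def context_vector(doc, query):
    """Bag of words of the document, minus the name tokens and stop words."""
    q = set(tokenize(query))
    return Counter(t for t in tokenize(doc) if t not in q and t not in STOP)


def cosine(a, b):
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def filter_coreferent(candidates, docs, query, profile, threshold=0.3):
    """Stand-in for CDCR (an assumption): keep a document only if its
    mention context looks like the target entity's profile."""
    return [i for i in candidates
            if cosine(profile, context_vector(docs[i], query)) >= threshold]


def expand_profile(profile, candidates, docs, query, k=1, n_terms=2):
    """Pseudo-relevance feedback: treat the top-k candidates as relevant and
    add up to n_terms of their context words that the profile lacks."""
    ranked = sorted(candidates,
                    key=lambda i: cosine(profile, context_vector(docs[i], query)),
                    reverse=True)
    feedback = Counter()
    for i in ranked[:k]:
        feedback.update(context_vector(docs[i], query))
    expanded = Counter(profile)
    new_terms = [t for t, _ in feedback.most_common() if t not in profile]
    for t in new_terms[:n_terms]:
        expanded[t] += 1
    return expanded


def prf1(retrieved, relevant):
    """Set-based precision, recall and F1 of a retrieved document list."""
    tp = len(set(retrieved) & set(relevant))
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy data: documents 0 and 1 refer to the basketball player; 2 is a namesake.
docs = [
    "michael jordan played basketball for the chicago bulls in the nba",
    "michael jordan won six nba titles with the bulls",
    "michael jordan is a professor of machine learning at berkeley",
    "the bulls won the nba title",
]
query = "Michael Jordan"
relevant = {0, 1}
profile = Counter(tokenize("basketball player chicago bulls"))

candidates = boolean_retrieve(query, docs)
filtered = filter_coreferent(candidates, docs, query, profile)
expanded = expand_profile(profile, candidates, docs, query)
filtered_prf = filter_coreferent(candidates, docs, query, expanded)

print("boolean     ", candidates, prf1(candidates, relevant))
print("filter      ", filtered, prf1(filtered, relevant))
print("filter + PRF", filtered_prf, prf1(filtered_prf, relevant))
```

On this toy data, Boolean retrieval also returns the unrelated professor document; the filter alone removes it but drops document 1 as well, and the feedback terms taken from the top-ranked candidate ("played", "nba") bring document 1 back above the threshold.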