Coreference Resolution Research with the Semantic Background Knowledge of Wikipedia (维基百科语义背景知识的共指消解研究)
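  • English title: Coreference Resolution Research with the Semantic Background Knowledge of Wikipedia
  • Author: Zhang Guijun (张贵军)
  • Author (English): Zhang Guijun; School of Economics and Information, Shanxi Province Finance and Taxation College
  • Keywords: emergency events; coreference resolution; Wikipedia; semantic features; maximum entropy model
  • Keywords (English, as published): Paroxysmal event; coreference resolution; semantic features; Wikipedia; maximum entropy model
  • Journal title (Chinese): HBYD
  • Journal title (English): Information & Communications
  • Institution: School of Economics and Information, Shanxi Province Finance and Taxation College (山西省财政税务专科学校)
  • Publication date: 2018-01-15
  • Publisher: Information & Communications (信息通信)
  • Year: 2018
  • Issue: No.181
  • Language: Chinese
  • Page: HBYD201801002
  • Number of pages: 5
  • CN: 01
  • ISSN: 42-1739/TN
  • Classification number: 9-13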
Abstract
This paper takes a corpus of emergency-event news as its research setting and mines Wikipedia in depth as a source of background semantic knowledge for coreference resolution. It derives four categories of Wikipedia-based features: features from the content of Wikipedia explanatory pages, synonym features, link-graph features, and category-graph features. Training and testing were carried out on an annotated corpus of 200,000 Chinese characters. The experiments show that introducing Wikipedia into coreference resolution for emergency events is effective: the system reaches an F-value of 66.7%, and the features based on the Wikipedia link graph contribute the most. Feature selection with SBS, a hill-climbing algorithm, removed seven features and raised the system's F-value by 3.58%.
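The feature-selection step mentioned in the abstract can be illustrated with a minimal sketch of sequential backward selection (SBS) as a greedy hill climb. This is not the paper's implementation: the feature names and the `toy_score` function are hypothetical stand-ins, and in the paper the score would presumably be the F-value of the maximum entropy system evaluated on held-out data.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def sequential_backward_selection(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy hill climbing: repeatedly drop the single feature whose removal
    raises the score the most; stop when no single removal improves it."""
    current = frozenset(features)
    best = score(current)
    while len(current) > 1:
        # Score every subset obtained by removing exactly one feature.
        candidates = [(score(current - {f}), f) for f in sorted(current)]
        cand_score, cand_feature = max(candidates)
        if cand_score <= best:
            break  # local optimum: no single removal helps
        current = current - {cand_feature}
        best = cand_score
    return current, best


def toy_score(subset: FrozenSet[str]) -> int:
    """Hypothetical scorer (integer points): useful features add to the score,
    each noisy feature costs 2 points. Stands in for a real evaluation run."""
    useful = {"link_graph": 30, "category_graph": 20, "synonym": 10}
    noisy = {"noise_a", "noise_b"}
    return sum(useful.get(f, 0) for f in subset) - 2 * len(subset & noisy)


selected, value = sequential_backward_selection(
    ["link_graph", "category_graph", "synonym", "noise_a", "noise_b"],
    toy_score,
)
print(sorted(selected), value)
```

With this toy scorer the search discards the two noisy features and then stops, because removing any remaining feature would lower the score. Being greedy, SBS can stop at a local optimum, so the selected subset depends on the scoring function and the starting feature set.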
