Abstract
Taking a corpus of emergency-event news as the research setting, this paper mines Wikipedia in depth as a source of background semantic knowledge for coreference resolution and derives four categories of Wikipedia-based features: features from the explanatory page content, synonym features, hyperlink graph features, and category graph features. The system was trained and tested on an annotated corpus of 200,000 Chinese characters. Experiments show that introducing Wikipedia into coreference resolution for emergency-event news is effective: the system achieves an F-value of 66.7%, with the hyperlink graph features contributing the most. Feature selection with the SBS algorithm, a hill-climbing method, removed seven features and raised the system's F-value by 3.58%.
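The feature-selection step can be illustrated with a minimal sketch. It assumes that SBS stands for sequential backward selection, the usual expansion of the acronym, which the abstract does not spell out. The feature names, the weights in `TOY_WEIGHTS`, and the scoring function `toy_f_score` are hypothetical stand-ins for training the resolver and measuring its F-value; they are not the paper's features or results.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def sequential_backward_selection(
    features: Iterable[str],
    score_fn: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy hill climbing: drop one feature at a time while the score improves."""
    current = frozenset(features)
    best_score = score_fn(current)
    while len(current) > 1:
        # Score every subset obtained by removing exactly one feature.
        candidates = [(score_fn(current - {f}), f) for f in sorted(current)]
        cand_score, removed = max(candidates)
        if cand_score <= best_score:
            break  # local optimum: no single removal improves the score
        current, best_score = current - {removed}, cand_score
    return current, best_score


# Hypothetical weights standing in for each feature's effect on the F-value;
# negative weights model features that hurt performance.
TOY_WEIGHTS = {
    "link_graph": 0.30,
    "category_graph": 0.20,
    "page_text": 0.10,
    "synonym": 0.05,
    "noisy_a": -0.02,
    "noisy_b": -0.01,
}


def toy_f_score(subset: FrozenSet[str]) -> float:
    """Toy replacement for 'train on this feature subset and evaluate F'."""
    return sum(TOY_WEIGHTS[f] for f in subset)


if __name__ == "__main__":
    selected, score = sequential_backward_selection(TOY_WEIGHTS, toy_f_score)
    print(sorted(selected), round(score, 2))
```

In a real system `score_fn` would retrain and evaluate the resolver on held-out data for each candidate subset, so the search costs one evaluation per remaining feature at every step. Because it stops at the first step where no single removal helps, it can settle on a local optimum.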