Abstract
Word sense disambiguation (WSD) has important applications in many fields. The Lesk algorithm and its improved variants are typical representatives of unsupervised WSD research. However, most existing algorithms rely on word overlap between the context and the sense glosses, and they usually ignore the effect of the distance between each context word and the ambiguous word. To address this, this paper proposes a WSD method based on word vectors: it represents both the context and the glosses as vectors, and disambiguates by combining the semantic similarity between context and gloss with the frequency distribution of the senses. Tests on the Senseval-3 dataset show that the method performs word sense disambiguation effectively.
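The scoring idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formulas, so the 1/distance context weighting, the linear interpolation parameter `alpha`, the toy three-dimensional vectors, and the sense frequencies below are all assumptions chosen for illustration. In practice the vectors would come from a trained embedding model and the glosses and frequencies from a sense inventory.

```python
import math

# Toy 3-dimensional embeddings; hypothetical values chosen only for illustration.
TOY_VECTORS = {
    "money": [0.9, 0.1, 0.0],
    "deposit": [0.8, 0.2, 0.1],
    "loan": [0.85, 0.15, 0.05],
    "river": [0.1, 0.9, 0.1],
    "shore": [0.1, 0.85, 0.2],
    "water": [0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_mean(words, weights, vectors):
    """Weighted average of the vectors of known words, or None if none are known."""
    total, acc = 0.0, None
    for word, weight in zip(words, weights):
        vec = vectors.get(word)
        if vec is None:
            continue  # skip out-of-vocabulary words
        if acc is None:
            acc = [0.0] * len(vec)
        acc = [a + weight * x for a, x in zip(acc, vec)]
        total += weight
    if acc is None or total == 0.0:
        return None
    return [a / total for a in acc]


def disambiguate(tokens, target_index, senses, vectors=TOY_VECTORS, alpha=0.8):
    """Return the sense name with the highest combined score.

    senses maps a sense name to (gloss_tokens, relative_frequency).
    """
    # Context excludes the target word; weight 1/distance is an assumed scheme.
    context = [(w, 1.0 / abs(i - target_index))
               for i, w in enumerate(tokens) if i != target_index]
    ctx_vec = weighted_mean([w for w, _ in context],
                            [wt for _, wt in context], vectors)

    best_sense, best_score = None, float("-inf")
    for name, (gloss, freq) in senses.items():
        gloss_vec = weighted_mean(gloss, [1.0] * len(gloss), vectors)
        sim = cosine(ctx_vec, gloss_vec) if ctx_vec and gloss_vec else 0.0
        # Assumed combination: linear interpolation of similarity and frequency.
        score = alpha * sim + (1.0 - alpha) * freq
        if score > best_score:
            best_sense, best_score = name, score
    return best_sense


# Hypothetical sense inventory for "bank": gloss tokens and relative frequency.
SENSES = {
    "bank_finance": (["money", "deposit", "loan"], 0.6),
    "bank_river": (["river", "shore", "water"], 0.4),
}
TOKENS = "he sat by the river bank and watched the water".split()
print(disambiguate(TOKENS, TOKENS.index("bank"), SENSES))  # prints bank_river
```

With `alpha=0.0` the score reduces to sense frequency alone and the more frequent `bank_finance` is chosen, which shows why the similarity term is needed to override the most-frequent-sense default when the context points elsewhere.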