Abstract
Word sense disambiguation is a fundamental task in natural language processing. Building on word vector representations, we compute vector representations of the context windows in which polysemous words occur in the original text corpus. Using statistically obtained lists of polysemous words and their numbers of senses, we cluster the context-window vectors of each polysemous word with the K-means algorithm. We then label every occurrence of the word in the original corpus with its cluster. On the labeled corpus, we train a vector representation for each sense of a polysemous word with a neural network language model. Based on these sense vectors, we present a word sense disambiguation method for polysemous words in sentences. Experimental results show that the method is effective and feasible.
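The abstract does not say how a context window is aggregated or how a sense is selected, so the Python sketch below fills those gaps with stated assumptions. A context-window vector is the mean of the word vectors within ±`window` positions of the target. The number of senses is supplied by hand. K-means uses a deterministic farthest-point initialization. The `word#k` tag format is hypothetical, and the sense is chosen by cosine similarity. The toy vectors and corpus are made up for illustration. Where the method trains sense vectors with a neural network language model, this sketch substitutes the K-means centroids as stand-ins.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Toy 2-d word vectors (hypothetical values, for illustration only).
EMB: Dict[str, Vector] = {
    "money": [1.0, 0.0], "loan": [0.9, 0.1], "deposit": [0.95, 0.05],
    "river": [0.0, 1.0], "water": [0.1, 0.9], "shore": [0.05, 0.95],
}
# Toy corpus in which "bank" has two senses (finance vs. riverside).
CORPUS: List[List[str]] = [
    ["deposit", "money", "bank", "loan"],
    ["river", "bank", "water"],
    ["money", "bank", "deposit"],
    ["water", "bank", "shore"],
]


def context_vector(tokens: Sequence[str], idx: int, emb: Dict[str, Vector],
                   window: int = 2) -> Vector:
    """Mean of the vectors of words within `window` positions of tokens[idx],
    skipping the target itself and out-of-vocabulary words (assumed scheme)."""
    dim = len(next(iter(emb.values())))
    acc, n = [0.0] * dim, 0
    for j in range(max(0, idx - window), min(len(tokens), idx + window + 1)):
        if j != idx and tokens[j] in emb:
            acc = [a + b for a, b in zip(acc, emb[tokens[j]])]
            n += 1
    return [a / n for a in acc] if n else acc


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int,
           iters: int = 20) -> Tuple[List[int], List[Vector]]:
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def label_corpus(corpus: List[List[str]], word: str, n_senses: int,
                 emb: Dict[str, Vector], window: int = 2
                 ) -> Tuple[List[List[str]], List[Vector]]:
    """Cluster every context window of `word` into n_senses groups and rewrite
    each occurrence as 'word#k' (hypothetical tag format)."""
    positions = [(s, i) for s, sent in enumerate(corpus)
                 for i, tok in enumerate(sent) if tok == word]
    points = [context_vector(corpus[s], i, emb, window) for s, i in positions]
    labels, centroids = kmeans(points, n_senses)
    labeled = [list(sent) for sent in corpus]
    for (s, i), lab in zip(positions, labels):
        labeled[s][i] = f"{word}#{lab}"
    return labeled, centroids


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def disambiguate(tokens: Sequence[str], idx: int, sense_vecs: List[Vector],
                 emb: Dict[str, Vector], window: int = 2) -> int:
    """Return the index of the sense vector most cosine-similar to the context."""
    ctx = context_vector(tokens, idx, emb, window)
    return max(range(len(sense_vecs)), key=lambda k: cosine(ctx, sense_vecs[k]))


# Stand-in: cluster centroids play the role of the sense vectors that the
# method trains with a neural network language model on the labeled corpus.
labeled, sense_vecs = label_corpus(CORPUS, "bank", n_senses=2, emb=EMB)
print(labeled[0])  # ['deposit', 'money', 'bank#0', 'loan']
print(disambiguate(["loan", "bank", "money"], 1, sense_vecs, EMB))  # 0
```

Replacing the centroids with sense vectors trained on the labeled corpus, for example with a word2vec-style model that treats `bank#0` and `bank#1` as separate tokens, would bring the sketch closer to the described method. The disambiguation step would not need to change.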