Unsupervised Word Sense Disambiguation Method Based on Word Embeddings
  • English title: Unsupervised Word Disambiguation Method Based on Word Embeddings
  • Authors: 吕晓伟 (LV Xiao-wei); 章露露 (ZHANG Lu-lu)
  • Author affiliation: Faculty of Information Engineering and Automation, Kunming University of Science and Technology
  • Keywords: word sense disambiguation; word embedding; natural language processing; machine translation; Word2vec
  • Journal (Chinese): RJDK
  • Journal (English): Software Guide
  • Institution: Faculty of Information Engineering and Automation, Kunming University of Science and Technology
  • Publication date: 2018-07-17 13:35
  • Publisher: 软件导刊 (Software Guide)
  • Year: 2018
  • Issue: v.17; No.191
  • Language: Chinese
  • Pages: RJDK201809047
  • Page count: 3
  • CN: 09
  • ISSN: 42-1671/TP
  • Classification number: 197-199
Abstract
Word sense disambiguation has important applications in many fields. The Lesk algorithm and its improved variants are typical representatives of unsupervised word sense disambiguation. However, most existing algorithms rely on word overlap between the context and the sense glosses, and they usually do not consider how the distance between a context word and the ambiguous word affects the result. To address this, the paper proposes a word sense disambiguation method based on word vectors: it represents both the context and the glosses as vectors, and it combines the semantic similarity between context and gloss with the distribution frequency of the senses to perform disambiguation. Tests on the Senseval-3 dataset show that the method can disambiguate word senses effectively.
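The abstract names the ingredients of the method (vector representations of context and gloss, a distance effect for context words, and sense distribution frequency) but gives no formulas. The following Python sketch shows one way these ingredients could fit together; it is not the authors' implementation. The toy embedding table `EMB`, the 1/distance weighting, the linear interpolation weight `alpha`, the sense inventory `SENSES`, and its frequency counts are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional embeddings (assumed values); a real system would load
# pretrained Word2vec vectors instead.
EMB = {
    "money": [0.9, 0.1, 0.0], "deposit": [0.8, 0.2, 0.1],
    "financial": [1.0, 0.0, 0.0], "institution": [0.7, 0.1, 0.2],
    "river": [0.1, 0.9, 0.0], "water": [0.0, 0.8, 0.2],
    "sloping": [0.0, 0.9, 0.1], "land": [0.1, 0.7, 0.3],
}
DIM = 3


def weighted_mean(words, weights):
    """Weighted average of the embeddings of known words, or None if none is known."""
    total, mass = [0.0] * DIM, 0.0
    for word, weight in zip(words, weights):
        vec = EMB.get(word)
        if vec is None:
            continue  # skip out-of-vocabulary words
        for i, x in enumerate(vec):
            total[i] += weight * x
        mass += weight
    return None if mass == 0 else [x / mass for x in total]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def disambiguate(tokens, target_index, senses, alpha=0.7):
    """Pick a sense for tokens[target_index].

    senses: list of (sense_id, gloss_tokens, frequency).
    Score = alpha * cos(context, gloss) + (1 - alpha) * relative frequency.
    Both the 1/distance weights and the interpolation are assumed choices.
    """
    ctx_words, ctx_weights = [], []
    for i, tok in enumerate(tokens):
        if i != target_index:
            ctx_words.append(tok)
            # Closer context words contribute more to the context vector.
            ctx_weights.append(1.0 / abs(i - target_index))
    ctx_vec = weighted_mean(ctx_words, ctx_weights)

    total_freq = sum(freq for _, _, freq in senses)
    scores = {}
    for sense_id, gloss, freq in senses:
        gloss_vec = weighted_mean(gloss, [1.0] * len(gloss))
        sim = 0.0
        if ctx_vec is not None and gloss_vec is not None:
            sim = cosine(ctx_vec, gloss_vec)
        prior = freq / total_freq if total_freq else 0.0
        scores[sense_id] = alpha * sim + (1 - alpha) * prior
    return max(scores, key=scores.get), scores


# Hypothetical sense inventory for "bank" with made-up frequency counts.
SENSES = [
    ("bank.finance", ["financial", "institution", "money", "deposit"], 20),
    ("bank.river", ["sloping", "land", "river", "water"], 5),
]

if __name__ == "__main__":
    sentence = "he sat on the river bank near the water".split()
    best, scores = disambiguate(sentence, sentence.index("bank"), SENSES)
    print(best, scores)
```

In this sketch the frequency term acts as a fallback: when no context word has an embedding, the similarity is zero for every sense and the most frequent sense is chosen. A real system would take glosses and sense frequencies from a lexical resource such as WordNet rather than from a hand-written list.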
