A Word Sense Disambiguation Method Based on Polysemous Word Vector Representations
  • English title: A word sense disambiguation method based on polysemy vector representation
  • Authors: LI Guojia (李国佳); ZHAO Yingdi (赵莹地); GUO Hongqi (郭鸿奇)
  • Authors and affiliations (English): LI Guojia; ZHAO Yingdi; GUO Hongqi; School of Information Engineering, North China University of Water Resources and Electric Power; School of Electric Power, North China University of Water Resources and Electric Power
  • Keywords: polysemous word vector representation; K-means; word sense disambiguation
  • English keywords: polysemy vector representation; K-means; word sense disambiguation
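  • Journal title (Chinese): DLXZ
  • Journal title (English): Intelligent Computer and Applications
  • Affiliations: School of Information Engineering, North China University of Water Resources and Electric Power; School of Electric Power, North China University of Water Resources and Electric Power
  • Publication date: 2018-08-28
  • Publisher: Intelligent Computer and Applications (智能计算机与应用)
  • Year: 2018
  • Issue: v.8
  • Funding: North China University of Water Resources and Electric Power 2017 Innovation and Entrepreneurship Program project (2017XB136)
  • Language: Chinese
  • Pages: DLXZ201804010
  • Page count: 5
  • CN: 04
  • ISSN: 23-1573/TN
  • Classification number: 57-61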
Abstract
Word sense disambiguation is a fundamental task in natural language processing. Starting from word vector representations, a vector representation is computed for the context window of each polysemous word. Using a compiled inventory of polysemous words and their numbers of senses, the context-window representations of polysemous words in the text corpus are clustered with K-means, and each polysemous word in the original corpus is marked with its cluster label. On the marked corpus, a vector representation of each sense of each polysemous word is trained with a neural network language model. For polysemous words in a sentence, a word sense disambiguation method based on these polysemous word vector representations is presented. The experimental results show that the method is effective and feasible.
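The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. Several parts are assumptions and do not come from the paper: the 2-D word vectors and the toy corpus are invented, the context-window vector is taken to be the mean of the neighbouring word vectors, K-means uses a deterministic farthest-point initialisation, the cluster centroids stand in for the sense vectors that the paper trains with a neural network language model, and the sense is chosen by cosine similarity. All function names are hypothetical.

```python
import math

def context_vector(tokens, index, word_vectors, window=2):
    """Mean of the known word vectors within `window` tokens of tokens[index]."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    neighbours = [word_vectors[tokens[i]] for i in range(lo, hi)
                  if i != index and tokens[i] in word_vectors]
    if not neighbours:
        raise ValueError("no known context words around the target")
    return [sum(col) / len(neighbours) for col in zip(*neighbours)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def kmeans(points, k, iterations=20):
    """Plain K-means; farthest-point initialisation is an assumption of this sketch."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centroids) for p in points], centroids

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def disambiguate(tokens, index, word_vectors, sense_vectors, window=2):
    """Pick the sense whose vector is most similar to the context-window vector."""
    ctx = context_vector(tokens, index, word_vectors, window)
    return max(range(len(sense_vectors)), key=lambda s: cosine(ctx, sense_vectors[s]))

# Invented toy data: 2-D word vectors and four sentences containing "bank".
word_vectors = {
    "river": [1.0, 0.0], "water": [0.9, 0.1], "fish": [0.8, 0.2],
    "money": [0.0, 1.0], "loan": [0.1, 0.9], "cash": [0.2, 0.8],
}
corpus = [
    ["river", "bank", "water"],
    ["fish", "bank", "river"],
    ["money", "bank", "loan"],
    ["cash", "bank", "money"],
]
target, n_senses = "bank", 2  # number of senses comes from a sense inventory

# 1) Context-window vectors for every occurrence of the polysemous word.
occurrences = [(s, i) for s, sent in enumerate(corpus)
               for i, tok in enumerate(sent) if tok == target]
contexts = [context_vector(corpus[s], i, word_vectors) for s, i in occurrences]

# 2) Cluster the contexts and mark each occurrence with its cluster label.
labels, centroids = kmeans(contexts, n_senses)
tagged = [list(sent) for sent in corpus]
for (s, i), label in zip(occurrences, labels):
    tagged[s][i] = f"{target}_{label}"

# 3) Disambiguate a new sentence (centroids stand in for trained sense vectors).
sense = disambiguate(["loan", "bank", "cash"], 1, word_vectors, centroids)
print(tagged)
print(sense)
```

In a fuller implementation, the marked corpus (`tagged` here) would be passed to a word-embedding trainer so that each marked form such as `bank_0` and `bank_1` receives its own vector, and those trained vectors would replace `centroids` in the final step.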
