Lexical Semantic Similarity Algorithm Based on How-net

Original title (Chinese): 基于How-net的词语语义相似度算法
Authors: MA Yongqi (马永起); HAN Depei (韩德培); MENG Lirong (蒙立荣); YU Jie (余杰); CHENG Zheng (程铮)
Keywords: similarity; path length; concept similarity; sememe distance; feature structure
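Chinese journal title: JSJC
English journal title: Computer Engineering
Institutions: Institute of Computer Application, Chinese Academy of Engineering Physics; Eastern Communications Co., Ltd.; School of Computer, National University of Defense Technology
Publication date: 2018-06-15
Publisher: Computer Engineering (计算机工程)
Year: 2018
Volume and serial issue: v.44; No.488
Language: Chinese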
File ID: JSJC201806027
Page count: 5
Issue: 06
CN: 31-1289/TP
Pages: 157-161

Abstract

This paper studies word similarity, sememe similarity, and concept similarity and, drawing on the How-net sememe tree, proposes an algorithm for computing sememe similarity. The algorithm takes into account the depth of each sememe node, the distance between sememe nodes, and the number of sibling nodes each sememe node has. A word-level semantic similarity algorithm is then built on top of this sememe similarity. Experimental results show that, compared with the review tendency algorithm and the semantic similarity algorithm from the literature, the proposed algorithm improves the accuracy of word semantic similarity without increasing algorithmic complexity.
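The abstract names three tree features (node depth, node distance, and sibling count) but does not give the formula that combines them. The sketch below is therefore only an illustration of how those three quantities can be read off a sememe tree, not the authors' algorithm. The tree `PARENT` is a small hypothetical example rather than real How-net data, and `baseline_similarity` uses the commonly cited distance-only form alpha / (d + alpha) associated with reference [7], with alpha = 1.6 assumed as the default.

```python
from typing import Dict, List, Optional

# Hypothetical toy sememe tree (child -> parent); NOT taken from How-net.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "inanimate": "thing",
    "human": "animate",
    "animal": "animate",
    "plant": "animate",
    "artifact": "inanimate",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    parent = PARENT[node]
    while parent is not None:
        path.append(parent)
        parent = PARENT[parent]
    return path


def depth(node: str) -> int:
    """Depth of a node, with the root at depth 0."""
    return len(path_to_root(node)) - 1


def lowest_common_ancestor(a: str, b: str) -> str:
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


def distance(a: str, b: str) -> int:
    """Number of edges on the tree path between a and b."""
    lca = lowest_common_ancestor(a, b)
    return depth(a) + depth(b) - 2 * depth(lca)


def sibling_count(node: str) -> int:
    """Number of other nodes that share this node's parent."""
    parent = PARENT[node]
    if parent is None:
        return 0
    return sum(1 for child, p in PARENT.items() if p == parent and child != node)


def baseline_similarity(a: str, b: str, alpha: float = 1.6) -> float:
    """Distance-only baseline: alpha / (d + alpha). alpha is an assumed default."""
    return alpha / (distance(a, b) + alpha)
```

In this toy tree, `human` and `animal` are siblings at distance 2, so the baseline gives 1.6 / 3.6, about 0.44, while `human` and `artifact` are at distance 4 and score lower. A depth-aware or sibling-aware variant would additionally weight the result by `depth` and `sibling_count`; the exact weighting used in the paper is not stated in the abstract.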

References

[1]DONG Zhendong,DONG Qiang,HAO Changling.HowNet and its computation of meaning[C]//Proceedings of International Conference on Computational Linguistics:Demonstrations.Washington D.C.,USA:Association for Computational Linguistics,2010:53-56.
    [2]李峰,李芳.中文词语语义相似度计算——基于《知网》2000[J].中文信息学报,2007,21(3):99-105.
    [3]蒋溢,丁优,熊安萍,等.一种基于知网的词汇语义相似度改进计算方法[J].重庆邮电大学学报(自然科学版),2009,21(4):533-537.
    [4]丁建立,慈祥,黄剑雄.网络评论倾向性分析[J].计算机应用,2010,30(11):2937-2940.
    [5]张沪寅,刘道波,温春艳.基于《知网》的词语语义相似度改进算法研究[J].计算机工程,2015,41(2):151-156.
    [6]魏弹,向阳.基于2008版《知网》的词语相似度计算方法[J].计算机工程,2015,41(9):215-219.
    [7]刘群,李素建.基于《知网》的词汇语义相似度计算[J].中文计算语言学,2002,7(2):59-76.
    [8]张亮,尹存燕,陈家骏.基于语义树的中文词语相似度计算与分析[J].中文信息学报,2010,24(6):23-30.
    [9]游春晖.基于语义情感倾向的文本相似度计算[D].成都:电子科技大学,2008.
    [10]AGIRRE E,RIGAU G.A proposal for word sense disambiguation using conceptual distance[EB/OL].[2016-11-21].https://www.researchgate.net/publication/1782688_A_Proposal_for_Word_Sense_Disambiguation_using_Conceptual_Distance?ev=prf_cit.
    [11]王小林,王东,杨思春,等.基于《知网》的词语语义相似度算法[J].计算机工程,2014,40(12):177-181.
    [12]LIN D.An information-theoretic definition of similarity[C]//Proceedings of the 15th International Conference on Machine Learning.[S.l.]:Morgan Kaufmann Publishers Inc.,1998:296-304.
    [13]郭增新.基于语义的文本聚类算法研究[D].西安:西安电子科技大学,2012.
    [14]黄世维.互联网信息情感倾向性的研究与实现[D].西安:西安电子科技大学,2012.
    [15]江敏,肖诗斌,王弘蔚,等.一种改进的基于《知网》的词语语义相似度计算[J].中文信息学报,2008,22(5):84-89.
    [16]徐琳宏,林鸿飞,杨志豪.基于语义理解的文本倾向性识别机制[J].中文信息学报,2007,21(1):96-100.
