Building a Movie Review Sentiment Lexicon Based on Word Distance and Pointwise Mutual Information
  • English title: Building of sentimental word library about movie comments based on the word spacing and Mutual Information
  • Authors: 王侨云; 朱广丽; 张顺香
  • Authors (English): WANG Qiaoyun; ZHU Guangli; ZHANG Shunxiang; School of Computer Science and Engineering, Anhui University of Science & Technology
  • Keywords: sentiment analysis; movie reviews; sentiment lexicon; pointwise mutual information semantic similarity
  • Keywords (English): sentiment analysis; film comments; sentimental word library; mutual information semantic similarity
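  • Journal name (Chinese): FYSZ
  • Journal name (English): Journal of Fuyang Normal University(Natural Science)
  • Institution: School of Computer Science and Engineering, Anhui University of Science & Technology
  • Publication date: 2019-06-13
  • Publisher: Journal of Fuyang Normal University (Natural Science)
  • Year: 2019
  • Issue: v.36; No.120
  • Funding: Anhui Provincial Natural Science Foundation General Program (1908085MF189); Anhui Universities Top Talent Cultivation Program (gxbjZD15)
  • Language: Chinese
  • Page: FYSZ201902010
  • Page count: 7
  • CN: 02
  • ISSN: 34-1069/N
  • Classification number: 44-50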
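Abstract
Sentiment words in online movie reviews directly express viewers' impressions of a film and have become a focus of sentiment analysis research. A pressing problem in movie review sentiment analysis is how to build a domain-specific sentiment lexicon from massive, heterogeneous review data so as to improve accuracy. This paper proposes a new method for building a Chinese movie review sentiment lexicon based on word distance and pointwise mutual information. The method first combines a movie review corpus with a base dictionary and uses K-means++ clustering to select positive and negative seed word sets with a clear sentiment orientation. It then uses the Distance of Word Point-wise Mutual Information (DW-PMI) algorithm to compute the semantic similarity between movie-domain words and the seed words, yielding a movie-domain sentiment word list. Finally, the movie-domain sentiment word list is added to the base sentiment dictionary to construct the Chinese movie review sentiment lexicon. Experimental results show that the constructed lexicon can significantly improve the accuracy of sentiment analysis of Chinese movie reviews.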
        Sentiment words in online comments directly express the audience's viewing experience and have become one of the hot topics in sentiment analysis. How to build a domain-specific lexicon from massive and complicated film comment data, and thereby improve the accuracy of film sentiment analysis, is an urgent problem for current sentiment analysis. This paper proposes a method for building a Chinese movie comment sentimental dictionary based on word spacing and point-wise mutual information. The method first combines the comment corpus with a basic dictionary and uses K-means++ clustering to select positive and negative seed words with a clear sentiment tendency. It then uses the Distance of Word Point-wise Mutual Information (DW-PMI) algorithm to calculate the semantic similarity between film-domain words and the seed words, obtaining a sentimental vocabulary for the comment domain. Finally, this vocabulary is added to the basic sentimental lexicon to build the Chinese film comment sentimental word library. The experimental results show that the constructed library can significantly improve the accuracy of Chinese film sentiment analysis.
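
The abstract does not give the DW-PMI formula, so the following is only a minimal sketch of the general idea of scoring a candidate word against seed words with a distance-aware PMI. It assumes that each co-occurrence inside a fixed window contributes a weight of 1/distance, and that a word pair that never co-occurs scores 0. The names `build_dw_pmi`, `orientation`, the `window` parameter, and the toy corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def build_dw_pmi(sentences, window=2):
    """Return a pmi(w1, w2) function built from tokenized sentences.

    Assumption (not from the paper): a co-occurrence at token distance d
    within the window adds a weight of 1/d to the pair's count.
    """
    word_count = Counter()
    pair_weight = Counter()
    total = 0
    for tokens in sentences:
        total += len(tokens)
        word_count.update(tokens)
        for i, w1 in enumerate(tokens):
            last = min(i + 1 + window, len(tokens))
            for j in range(i + 1, last):
                w2 = tokens[j]
                if w1 != w2:
                    # Unordered pair, so (a, b) and (b, a) share one entry.
                    pair_weight[frozenset((w1, w2))] += 1.0 / (j - i)

    def pmi(w1, w2):
        co = pair_weight.get(frozenset((w1, w2)), 0.0)
        if co == 0.0:
            return 0.0  # simplification: unseen pairs are treated as neutral
        p12 = co / total
        p1 = word_count[w1] / total
        p2 = word_count[w2] / total
        return math.log2(p12 / (p1 * p2))

    return pmi


def orientation(word, pos_seeds, neg_seeds, pmi):
    """Positive score: closer to positive seeds; negative: to negative seeds."""
    pos = sum(pmi(word, s) for s in pos_seeds)
    neg = sum(pmi(word, s) for s in neg_seeds)
    return pos - neg


# Toy, already-segmented review snippets (illustrative data only).
corpus = [
    ["剧情", "精彩", "好看"],
    ["演技", "精彩", "好看"],
    ["剧情", "烂", "难看"],
    ["特效", "尴尬", "难看"],
]
pmi = build_dw_pmi(corpus, window=2)
pos_seeds = {"好看"}
neg_seeds = {"难看"}
for candidate in ["精彩", "尴尬"]:
    print(candidate, orientation(candidate, pos_seeds, neg_seeds, pmi))
```

In this sketch a candidate word would join the domain word list as positive or negative depending on the sign of its score. In the method described above the seed sets would come from the K-means++ selection step rather than being written by hand.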