Microblog sentiment analysis method based on a double attention model

Chinese title: 基于双重注意力模型的微博情感分析方法
Authors: ZHANG Yangsen (张仰森); ZHENG Jia (郑佳); HUANG Gaijuan (黄改娟); JIANG Yuru (蒋玉茹)
Affiliation: Institute of Intelligent Information Processing, Beijing Information Science and Technology University (北京信息科技大学智能信息处理研究所)
Keywords: sentiment analysis; double attention model; microblog; semantic representation; emotion symbol
Journal: 清华大学学报(自然科学版) / Journal of Tsinghua University (Science and Technology)
Journal code: QHXB
Publication date: 2018-02-15
Year: 2018
Volume: 58
Issue: 02
Pages: 12-20
Number of pages: 9
Article ID: QHXB201802002
CN: 11-2223/N
Funding: National Natural Science Foundation of China (61370139, 61772081, 61602044)
Language: Chinese

Abstract

Microblog sentiment analysis is the basis for obtaining the opinions of microblog users. Most existing sentiment analysis methods keep deep learning models separate from emotion symbols. To address this, this paper proposes a microblog sentiment analysis method based on a double attention model. The method first uses existing sentiment knowledge bases to build a microblog emotion symbol library containing emotion words, degree adverbs, negation words, microblog emoticons and common Internet slang. A bidirectional long short-term memory (BiLSTM) network and a fully connected network then encode the microblog text and the emotion symbols it contains, respectively. Attention models build separate semantic representations of the text and of the emotion symbols, and the two are fused into the final semantic representation of the microblog text, on which the sentiment classification model is trained. By combining attention with emotion symbols, the method captures the emotional semantics of microblog text more effectively and improves microblog sentiment classification. Evaluated on the public microblog sentiment datasets of the Natural Language Processing and Chinese Computing (NLPCC) conference, the model achieves the best results on multiple sentiment classification tasks. Compared with the best previously known model, it raises the macro-averaged and micro-averaged F1 scores by 1.39% and 1.26% on the 2013 dataset and by 2.02% and 2.21% on the 2014 dataset.
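
The abstract gives the pipeline only in outline. As a rough illustration, here is a minimal, dependency-free Python sketch of a double attention classifier under several assumptions; it is not the authors' implementation. A tiny hypothetical symbol library (`SYMBOL_LIBRARY`) picks emotion symbols out of an already word-segmented post. The text goes through a bidirectional recurrent encoder, with a plain tanh RNN standing in for the BiLSTM to keep the code short, and the matched symbols go through a tanh fully connected layer. Each sequence is then pooled by its own attention step, scored here as a dot product with a context vector (one common choice; the abstract does not give the scoring function), and the two pooled vectors are fused by concatenation (also an assumption) before a softmax classifier. All weights are random placeholders, and the names `DoubleAttentionSketch`, `attention_pool` and `macro_micro_f1`, the dimensions and the three-class setup are hypothetical. The `macro_micro_f1` helper illustrates the two metrics the abstract reports: macro-F1 averages the per-class F1 scores, whereas micro-F1 pools true positives, false positives and false negatives over all classes before computing one F1.

```python
import math
import random

# Hypothetical miniature emotion symbol library. Only the five categories come from the
# abstract; the paper builds its library from existing sentiment knowledge bases.
SYMBOL_LIBRARY = {"开心": "emotion_word", "非常": "degree_adverb", "不": "negation",
                  "[哈哈]": "emoticon", "给力": "slang"}


def rand_matrix(rng, rows, cols):
    # Random placeholder weights; a real model would learn these.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_pass(inputs, w_in, w_rec):
    # One direction of a plain tanh RNN, standing in for an LSTM cell.
    h = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        states.append(h)
    return states


def attention_pool(vectors, context):
    # Score each vector by its dot product with the context vector, softmax the
    # scores, and return the attention-weighted sum together with the weights.
    weights = softmax([sum(c * x for c, x in zip(context, v)) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


class DoubleAttentionSketch:
    def __init__(self, vocab, emb_dim=4, hidden=3, sym_dim=3, n_classes=3, seed=0):
        rng = random.Random(seed)
        self.emb = {t: [rng.uniform(-1, 1) for _ in range(emb_dim)] for t in vocab}
        self.fwd = (rand_matrix(rng, hidden, emb_dim), rand_matrix(rng, hidden, hidden))
        self.bwd = (rand_matrix(rng, hidden, emb_dim), rand_matrix(rng, hidden, hidden))
        self.w_sym = rand_matrix(rng, sym_dim, emb_dim)  # fully connected symbol encoder
        self.u_text = [rng.uniform(-1, 1) for _ in range(2 * hidden)]  # text attention context
        self.u_sym = [rng.uniform(-1, 1) for _ in range(sym_dim)]  # symbol attention context
        self.w_out = rand_matrix(rng, n_classes, 2 * hidden + sym_dim)

    def predict_proba(self, tokens):
        # tokens: an already word-segmented post; every token must be in vocab.
        x = [self.emb[t] for t in tokens]
        fwd = rnn_pass(x, *self.fwd)
        bwd = rnn_pass(x[::-1], *self.bwd)[::-1]  # re-align backward states with positions
        text_states = [f + b for f, b in zip(fwd, bwd)]  # concatenate both directions
        text_repr, _ = attention_pool(text_states, self.u_text)
        symbols = [t for t in tokens if t in SYMBOL_LIBRARY]
        if symbols:
            enc = [[math.tanh(v) for v in matvec(self.w_sym, self.emb[s])] for s in symbols]
            sym_repr, _ = attention_pool(enc, self.u_sym)
        else:
            sym_repr = [0.0] * len(self.u_sym)  # the post contains no emotion symbols
        fused = text_repr + sym_repr  # fusion by concatenation (an assumption)
        return softmax(matvec(self.w_out, fused))


def macro_micro_f1(gold, pred, labels):
    tp, fp, fn = (dict.fromkeys(labels, 0) for _ in range(3))
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    def f1(t, f_pos, f_neg):
        prec = t / (t + f_pos) if t + f_pos else 0.0
        rec = t / (t + f_neg) if t + f_neg else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)  # mean of per-class F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))  # F1 of pooled counts
    return macro, micro


tokens = ["今天", "非常", "开心", "[哈哈]"]
model = DoubleAttentionSketch(vocab=tokens + ["不", "给力"])
print(len(model.predict_proba(tokens)))  # 3
gold = ["pos", "pos", "neg", "neu"]
pred = ["pos", "neg", "neg", "neu"]
print(macro_micro_f1(gold, pred, ["pos", "neg", "neu"]))
```

Since the weights are untrained, the output probabilities carry no meaning; the sketch only shows how data flows through the two attention branches. For the toy labels on the last lines, the per-class F1 scores are 2/3, 2/3 and 1, so macro-F1 is 7/9 (about 0.778), while micro-F1 is 0.75, which equals accuracy because every post receives exactly one label from a closed label set.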

References

[1] DING Z Y, JIA Y, ZHOU B. Survey of data mining for microblogs[J]. Journal of Computer Research and Development, 2014, 51(4): 691-706. (in Chinese)
[2] TABOADA M, BROOKE J, TOFILOSKI M, et al. Lexicon-based methods for sentiment analysis[J]. Computational Linguistics, 2011, 37(2): 267-307.
[3] WIEBE J, WILSON T, CARDIE C. Annotating expressions of opinions and emotions in language[J]. Language Resources and Evaluation, 2005, 39(2): 165-210.
[4] PANG B, LEE L. Opinion mining and sentiment analysis[J]. Foundations and Trends in Information Retrieval, 2008, 2(1/2): 1-135.
[5] PANG B, LEE L, VAITHYANATHAN S. Thumbs up? Sentiment classification using machine learning techniques[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA, 2002: 79-86.
[6] KIM Y. Convolutional neural networks for sentence classification[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Doha, Qatar, 2014: 1746-1751.
[7] ZHANG Y S, JIANG Y R, TONG Y X. Study of sentiment classification for Chinese microblog based on recurrent neural network[J]. Chinese Journal of Electronics, 2016, 25(4): 601-607.
[8] WANG J, YU L C, LAI K R, et al. Dimensional sentiment analysis using a regional CNN-LSTM model[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics. Berlin, Germany, 2016: 225-230.
[9] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[C]//Proceedings of the International Conference on Learning Representations. San Diego, USA, 2015.
[10] LUONG M T, PHAM H, MANNING C D. Effective approaches to attention-based neural machine translation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal, 2015: 1412-1421.
[11] HERMANN K M, KOČISKÝ T, GREFENSTETTE E, et al. Teaching machines to read and comprehend[C]//Proceedings of the International Conference on Neural Information Processing Systems. Montreal, Canada, 2015: 1693-1701.
[12] RAFFEL C, ELLIS D P W. Feed-forward networks with attention can solve some long-term memory problems[C]//Proceedings of the International Conference on Learning Representations. San Juan, Puerto Rico, 2016.
[13] LI Y Q, LI X, HAN X, et al. A bilingual lexicon-based multi-class semantic orientation analysis for microblogs[J]. Acta Electronica Sinica, 2016, 44(9): 2068-2073. (in Chinese)
[14] BARBOSA L, FENG J. Robust sentiment detection on Twitter from biased and noisy data[C]//Proceedings of the International Conference on Computational Linguistics. Beijing, China, 2010: 36-44.
[15] JIANG F, LIU Y Q, LUAN H B, et al. Microblog sentiment analysis with emoticon space model[J]. Journal of Computer Science and Technology, 2015, 30(5): 1120-1129.
[16] HE Y X, SUN S T, NIU F F, et al. A deep learning model enhanced with emotion semantics for microblog sentiment analysis[J]. Chinese Journal of Computers, 2017, 40(4): 773-790. (in Chinese)
[17] SONG K, FENG S, GAO W, et al. Build emotion lexicon from microblogs by combining effects of seed words and emoticons in a heterogeneous graph[C]//Proceedings of the ACM Conference on Hypertext and Social Media. New York, USA, 2015: 283-292.
[18] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[19] GRAVES A, JAITLY N, MOHAMED A R. Hybrid speech recognition with deep bidirectional LSTM[C]//Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding. Olomouc, Czech Republic, 2013: 273-278.
[20] XU L H, LIN H F, PAN Y, et al. Constructing the affective lexicon ontology[J]. Journal of the China Society for Scientific and Technical Information, 2008, 27(2): 180-185. (in Chinese)
[21] BERMINGHAM A, SMEATON A F. Classifying sentiment in microblogs: Is brevity an advantage?[C]//Proceedings of the ACM International Conference on Information and Knowledge Management. Toronto, Canada, 2010: 1833-1836.