Sentiment Analysis of Twitter Text Using Convolutional Neural Networks
  • English title: Sentiment Analysis of Twitter Data Based on CNN
  • Authors: 王煜涵; 张春云; 赵宝林; 袭肖明; 耿蕾蕾; 崔超然
  • Authors (English): Wang Yuhan; Zhang Chunyun; Zhao Baolin; Xi Xiaoming; Geng Leilei; Cui Chaoran
  • Keywords: Twitter data; sentiment analysis; word embedding model; convolutional neural network (CNN)
  • Chinese journal title: SJCJ
  • English journal title: Journal of Data Acquisition and Processing
  • Affiliations: School of Computer Science and Technology, Shandong University of Finance and Economics; Storage R&D Department, Inspur Electronic Information Industry Co., Ltd.
  • Publication date: 2018-09-15
  • Publisher: 数据采集与处理 (Journal of Data Acquisition and Processing)
  • Year: 2018
  • Issue: v.33; No.151
  • Funding: Shandong Province Higher Education Advantageous Discipline Talent Team Cultivation Program; Shandong Provincial Natural Science Foundation for Distinguished Young Scholars (JQ201316); Shandong Provincial Natural Science Foundation (ZR2016FQ18); Shandong Province Higher Education Science and Technology Program (J17KA065)
  • Language: Chinese
  • Article ID: SJCJ201805017
  • Page count: 7
  • Issue number: 05
  • CN: 32-1367/TN
  • Pages: 157-163
Abstract
With the growing popularity of social networks, sentiment analysis of Twitter text has become a research hotspot in recent years. The sentiment tendencies expressed in tweets are important for mining user needs and predicting major events. However, tweets are short and users write them casually, and most existing sentiment classification methods rely on hand-crafted text features that struggle to capture the deep semantics implicit in the text, so classification performance has been hard to improve. This paper presents a Twitter sentiment classification model based on a convolutional neural network (CNN). The model initializes the input word embeddings with the word2vec method and uses a dynamic CNN architecture to learn deep semantic information from tweets, from which it infers their sentiment tendency. Experimental results show that the proposed model achieves a recall of 82.3%, a marked improvement over traditional classification methods.
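The pipeline the abstract describes (word vectors in, convolution over word windows, a sentiment prediction out) can be illustrated with a minimal forward-pass sketch. This is not the authors' model: it assumes a single convolution layer with ReLU and max-over-time pooling rather than the dynamic CNN architecture the paper mentions, and it fills the embedding table with random numbers where the paper uses word2vec-initialized vectors. All names (`conv_max_pool`, `predict`, the toy vocabulary and sizes) are hypothetical, and the weights are untrained, so the output probabilities carry no meaning beyond showing the data flow.

```python
import math
import random


def conv_max_pool(embedded, filt, bias):
    """Slide one filter over consecutive word windows, apply ReLU, keep the max."""
    width = len(filt)  # number of consecutive words the filter covers
    best = 0.0  # ReLU floor; also the result when the text is shorter than the filter
    for start in range(len(embedded) - width + 1):
        total = bias
        for offset in range(width):
            word_vec = embedded[start + offset]
            total += sum(w * x for w, x in zip(filt[offset], word_vec))
        best = max(best, total)
    return best


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def predict(tokens, embeddings, filters, biases, out_weights, out_bias):
    """Embed tokens, extract one pooled feature per filter, classify linearly."""
    dim = len(next(iter(embeddings.values())))
    unknown = [0.0] * dim  # out-of-vocabulary words map to a zero vector
    embedded = [embeddings.get(tok, unknown) for tok in tokens]
    features = [conv_max_pool(embedded, f, b) for f, b in zip(filters, biases)]
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return softmax(scores)


# Toy setup (assumption): random values stand in for word2vec vectors and
# for trained filter and classifier weights.
rng = random.Random(0)
DIM, WIDTH, N_FILTERS, N_CLASSES = 4, 2, 3, 2
vocab = ["i", "love", "hate", "this", "phone"]
embeddings = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in vocab}
filters = [
    [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(WIDTH)]
    for _ in range(N_FILTERS)
]
biases = [0.0] * N_FILTERS
out_weights = [[rng.uniform(-1, 1) for _ in range(N_FILTERS)] for _ in range(N_CLASSES)]
out_bias = [0.0] * N_CLASSES

probs = predict("i love this phone".split(), embeddings, filters, biases,
                out_weights, out_bias)
print(probs)  # two class probabilities (e.g. negative, positive)
```

In a trained system the embedding table would be loaded from a word2vec model and the filters and output weights would be learned from labeled tweets; the sketch only shows how a tweet of any length is reduced to a fixed-size feature vector before classification.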
