Research and Application of Weibo Comment Sentiment Analysis Based on Fake Comment Recognition

English title: SENTIMENTAL ANALYSIS OF WEIBO COMMENTS BASED ON FAKE COMMENTS RECOGNITION AND ITS APPLICATION
Authors: 罗昌银; 但唐朋; 李艳红; 陈昌昊; 王泰
Authors (English): Luo Changyin; Dan Tangpeng; Li Yanhong; Chen Changhao; Wang Tai; School of Computer, Central China Normal University; School of Computer Science, South-Central University for Nationalities; National Engineering Research Center for E-Learning, Central China Normal University
Keywords: machine learning; sentiment analysis; natural language processing; fake comment recognition; PU learning algorithm
Keywords (English, as published): Machine learning; Sentimental analysis; Natural language processing; Fake comments recognition; Positive and unlabeled learning
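Journal (Chinese): JYRJ
Journal (English): Computer Applications and Software
Affiliations: School of Computer, Central China Normal University; School of Computer Science, South-Central University for Nationalities; National Engineering Research Center for E-Learning, Central China Normal University
Publication date: 2019-04-12
Publisher: 计算机应用与软件 (Computer Applications and Software)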
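Year: 2019
Volume: 36
Issue: 04
Funding: National Natural Science Foundation of China (61309002); Natural Science Foundation of Hubei Province (2017CFB135); Fundamental Research Funds for the Central Universities (CCNU18QN017, CZZ17003)
Language: Chinese
Article ID: JYRJ201904008
Pages: 61-68
Page count: 8
CN: 31-1260/TP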

Abstract

Weibo is a popular social networking platform, and sentiment analysis of the comment text it generates has become an active research topic in artificial intelligence. Because fake comments reduce the accuracy of sentiment analysis, this paper proposes a credibility evaluation system based on the status and behavior of commenting users, which is used to extract features of fake comments. These features are combined with the PU (positive and unlabeled) learning algorithm to identify fake comments. An SVM classifier and a stochastic gradient descent regression model are then applied to the text that remains after fake comments are removed, for subjective sentence classification and sentiment analysis. Experiments show that, after fake comment recognition, sentiment analysis reaches an accuracy of 0.88 and a recall of 0.89, a better result than traditional methods achieve.
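
The abstract does not say which PU learning variant the paper uses, so the following is only a minimal sketch of one common two-step scheme, written under stated assumptions. It takes a small set of comments already labeled as fake (the positives) plus an unlabeled pool, picks the unlabeled rows farthest from the positive centroid as "reliable negatives", and then labels every unlabeled row by its nearest centroid. The two feature columns and the names `two_step_pu` and `negative_fraction` are hypothetical and are not taken from the paper.

```python
from math import dist


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def two_step_pu(positives, unlabeled, negative_fraction=0.3):
    """Return one 0/1 label per unlabeled row (1 = predicted fake).

    Assumption: rows are numeric feature vectors on a comparable scale,
    e.g. hypothetical user-credibility scores in [0, 1].
    """
    pos_center = centroid(positives)

    # Step 1: the unlabeled rows farthest from the positive centroid
    # are treated as reliable negatives.
    ranked = sorted(unlabeled, key=lambda row: dist(row, pos_center), reverse=True)
    n_neg = max(1, int(len(unlabeled) * negative_fraction))
    neg_center = centroid(ranked[:n_neg])

    # Step 2: nearest-centroid classification of every unlabeled row.
    return [
        1 if dist(row, pos_center) < dist(row, neg_center) else 0
        for row in unlabeled
    ]


# Hypothetical features: [posting-burst score, duplicate-text score].
positives = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
unlabeled = [[0.9, 0.9], [0.1, 0.2], [0.2, 0.1],
             [0.15, 0.15], [0.8, 0.85], [0.1, 0.1]]

labels = two_step_pu(positives, unlabeled)
print(labels)  # -> [1, 0, 0, 0, 1, 0]
```

In a fuller pipeline, the rows labeled 1 would be dropped before the subjective-sentence classifier and the sentiment regressor are run. A centroid classifier is used here only to keep the sketch dependency-free; a stronger second-step classifier could be substituted.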
