Deceptive Comment Detection Based on Clustering and Sentence Weighting

Author: ZHANG Jian-xin
Keywords: deceptive review detection; clustering; sentence weighting; neural network
Chinese journal title: RJDK
English journal title: Software Guide
Institution: College of Computer Science and Engineering, Shandong University of Science and Technology
Publication date: 2019-02-15
Publisher: Software Guide
Year: 2019
Issue: v.18; No.196
Language: Chinese
Pages: RJDK201902010
Page count: 4
CN: 02
ISSN: 42-1671/TP
Classification number: 40-43

Abstract

Consumers often consult product reviews before making a purchase, and deceptive reviews can easily mislead them into making the wrong decision. Most existing methods for detecting deceptive spam reviews rely on machine learning and have difficulty learning the latent semantics of a review. This paper therefore proposes a neural network model based on clustering and an attention mechanism to learn the semantic representation of reviews. The model first applies the density-peaks fast search clustering algorithm to find semantic groups in the word vector space and computes weights using KL divergence. It then combines the words in a sentence with the semantic groups to which those words belong to obtain the sentence representation. Experimental results show that the model reaches an accuracy of 82.2%, exceeding existing baselines, so it has practical value for identifying deceptive spam reviews.
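The clustering step named in the abstract can be illustrated with a minimal sketch of density-peaks clustering in the sense of Rodriguez and Laio. It is not the paper's implementation: the function name `density_peaks`, the Gaussian kernel for local density, the cutoff `d_c`, and the fixed number of centres `n_clusters` are all assumptions made for illustration, and this record does not give the settings the author used.

```python
import numpy as np

def density_peaks(points, d_c, n_clusters):
    """Group the rows of `points` with a density-peaks style procedure.

    `d_c` (cutoff distance) and `n_clusters` (number of centres to keep)
    are illustrative parameters, not values taken from the paper.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Local density rho: Gaussian kernel over neighbours, self term excluded.
    kernel = np.exp(-(dist / d_c) ** 2)
    np.fill_diagonal(kernel, 0.0)
    rho = kernel.sum(axis=1)
    # Indices sorted by decreasing density.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)                # distance to nearest denser point
    nearest_higher = np.full(n, -1)    # index of that denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # Centres are the points with the largest rho * delta
    # (ties are resolved in favour of the denser point).
    gamma = rho * delta
    centres = order[np.argsort(-gamma[order], kind="stable")[:n_clusters]]
    labels = np.full(n, -1)
    labels[centres] = np.arange(len(centres))
    # Every other point inherits the label of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centres
```

Called on word vectors, for example `labels, centres = density_peaks(vectors, d_c=0.5, n_clusters=2)`, the function returns one group label per vector, which is the kind of "semantic group" assignment the abstract describes. Fixing the number of centres in advance is a simplification; the original algorithm selects centres by inspecting a decision graph of density against distance.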

References

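[1] LIN Zheng,TAN Songbo,CHENG Xueqi. Sentiment classification based on sentiment key sentence extraction[J]. Journal of Computer Research and Development,2012,49(11):2376-2382. (in Chinese)
[2] LI Suke,JIANG Yanbing. Semi-supervised sentiment classification based on sentiment feature clustering[J]. Journal of Computer Research and Development,2013,50(12):2570-2577. (in Chinese)
[3] OTT M,CHOI Y,CARDIE C,et al. Finding deceptive opinion spam by any stretch of the imagination[C]. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies,2011:309-319.
[4] REN Yafeng,JI Donghong,ZHANG Hongbin,et al. Fake review identification based on the PU learning algorithm[J]. Journal of Computer Research and Development,2015,52(3):639-648. (in Chinese)
[5] DU Weifu,TAN Songbo,YUN Xiaochun,et al. A new method for computing the semantic orientation of sentiment words[J]. Journal of Computer Research and Development,2009,46(10):1713-1720. (in Chinese)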
[6] LI J,OTT M,CARDIE C,et al. Towards a general rule for identifying deceptive opinion spam[C]. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics,2014:1566-1576.
[7] SHOJAEE S,MURAD M A A,AZMAN A B,et al. Detecting deceptive reviews using lexical and syntactic features[C]. 2013 13th International Conference on Intelligent Systems Design and Applications,2013:53-58.
[8] LI F,HUANG M,YANG Y,et al. Learning to identify review spam[C]. IJCAI Proceedings-International Joint Conference on Artificial Intelligence,2011:2488.
[9] HAMMAD A S A,EL-HALEES A. An approach for detecting spam in Arabic opinion reviews[J]. The International Arab Journal of Information Technology,2013,12(1):1-9.
[10] MUKHERJEE A,VENKATARAMAN V,LIU B,et al. What yelp fake review filter might be doing?[C]. Proceedings of the International Conference on Weblogs and Social Media,2013:409-418.
[11] HU Yi,LU Ruzhan,LI Xuening,et al. Text sentiment classification based on language modeling[J]. Journal of Computer Research and Development,2007,44(9):1469-1475. (in Chinese)
[12] MIKOLOV T,SUTSKEVER I,CHEN K,et al. Distributed representations of words and phrases and their compositionality[C]. International Conference on Neural Information Processing Systems,2013:3111-3119.
[13] MNIH A,HINTON G E. A scalable hierarchical distributed language model[C]. Advances in neural information processing systems,2009:1081-1088.
[14] BENGIO Y,DUCHARME R,VINCENT P,et al. A neural probabilistic language model[J]. Journal of Machine Learning Research,2003,3(2):1137-1155.
[15] ZHAO Yanyan,QIN Bing,LIU Ting. Text sentiment analysis[J]. Journal of Software,2010,21(8):1834-1848. (in Chinese)
[16] ZHANG Shan,YU Liubao,HU Changjun. Sentiment analysis of Chinese microblogs based on emoticon images and sentiment words[J]. Computer Science,2012,39(Z11):146-148. (in Chinese)
[17] RODRIGUEZ A,LAIO A. Machine learning clustering by fast search and find of density peaks[J]. Science,2014,344(6191):1492.
[18] JIA Peiling,JIAN Cong,PENG Yanjun. A clustering algorithm by fast search of density peak points based on cluster boundaries[J]. Journal of Nanjing University: Natural Sciences,2017,53(2):368-377. (in Chinese)
[19] OTT M. Linguistic models of deceptive opinion spam[C]. The Workshop on Computational Approaches to Subjectivity,2013:31-33.
[20] KIM Y. Convolutional neural networks for sentence classification[DB/OL]. https://arxiv.org/abs/1408.5882.
[21] HU Xinchen. Research on semantic relation classification based on LSTM[D]. Harbin: Harbin Institute of Technology,2015. (in Chinese)
