A Negative-Information Classification Method Based on Microblog Evolutionary Networks

English title: Micro Blog Evolutionary Network to Classification Method of Negative Information
Authors: 赵一; 何克清; 李昭; 黄贻望
Authors (English): ZHAO Yi; HE Keqing; LI Zhao; HUANG Yiwang; State Key Laboratory of Software Engineering, Computer School, Wuhan University; College of Computer and Information Technology, Three Gorges University
Keywords: sequential minimal optimization (SMO); support vector machine (SVM); evolutionary network; UCI data set; negative information
Chinese journal name: KXTS
English journal name: Journal of Frontiers of Computer Science and Technology
Institutions: State Key Laboratory of Software Engineering, Computer School, Wuhan University; College of Computer and Information Technology, Three Gorges University
Publication date: 2015-10-30 16:18
Publisher: 计算机科学与探索 (Journal of Frontiers of Computer Science and Technology)
Year: 2017
Issue: v.11; No.100
Funding: National Basic Research Program of China (973 Program)
Language: Chinese
Page: KXTS201701011
Page count: 8
CN: 01
ISSN: 11-5602/TP
Classification number: 96-103

Abstract

Based on the repost relationships among Sina Weibo posts, this paper builds an evolutionary network of the posts that users repost and applies the SMO SVM (sequential minimal optimization support vector machine) classification algorithm to classify them. The classifier filters out malicious posts, spam advertisements, and spam marketing messages, so that users can precisely block unwanted posts and bloggers. The method has three steps. First, all Sina Weibo posts are classified using the repost-based evolutionary network and the SVM classification algorithm. Second, complex-network techniques are used to label bloggers who frequently send malicious advertisements, so that they can be blocked in the network. Finally, the sources of spam are traced and each blogger is identified as a malicious reposter or not, which helps curb the spread of spam at the macro level. Compared against actual user feedback from the UCI data set, the experimental results show that the machine-learning classification agrees with the user feedback in 89% of cases.
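The second and third steps described above (labeling frequent spam senders and tracing spam back to its source through repost links) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy posts, the `spam_labels` set (standing in for the output of the SVM classifier), the function names, and the `min_posts` and `threshold` parameters are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: post id -> (author, id of the reposted post, or None if original).
posts = {
    "p1": ("alice", None),
    "p2": ("bob", "p1"),
    "p3": ("carol", "p2"),
    "p4": ("dave", None),
    "p5": ("bob", "p4"),
    "p6": ("erin", None),
}

# Assumed classifier output: ids of the posts that were labeled as spam.
spam_labels = {"p1", "p2", "p3", "p4", "p5"}


def trace_source(post_id, posts):
    """Follow repost links back to the original post and return its id."""
    seen = set()
    while posts[post_id][1] is not None:
        if post_id in seen:  # guard against malformed, cyclic repost data
            raise ValueError("repost chain contains a cycle")
        seen.add(post_id)
        post_id = posts[post_id][1]
    return post_id


def flag_bloggers(posts, spam_labels, min_posts=2, threshold=0.5):
    """Return bloggers with at least min_posts posts whose spam share exceeds threshold."""
    total = defaultdict(int)
    spam = defaultdict(int)
    for post_id, (author, _) in posts.items():
        total[author] += 1
        if post_id in spam_labels:
            spam[author] += 1
    return {a for a in total
            if total[a] >= min_posts and spam[a] / total[a] > threshold}


def spam_sources(posts, spam_labels):
    """Count, per original author, the spam-labeled posts that descend from their posts."""
    counts = defaultdict(int)
    for post_id in spam_labels:
        origin = trace_source(post_id, posts)
        counts[posts[origin][0]] += 1
    return dict(counts)


flagged = flag_bloggers(posts, spam_labels)
sources = spam_sources(posts, spam_labels)
```

In this toy network, `flagged` contains only `bob`, who has two posts that are both labeled as spam, while `sources` attributes three spam posts to `alice` and two to `dave` as originators. The `min_posts` cutoff is a design choice of the sketch: it avoids flagging a blogger on the basis of a single post.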
