Research on Weibo Hotspot Finding Based on Self-Adaptive Incremental Clustering
  • English title: Research on Weibo Hotspot Finding Based on Self-Adaptive Incremental Clustering
  • Authors: 宋慧琳; 彭迪云; 黄欣; 冯俊
  • Authors (English): SONG Huilin; PENG Diyun; HUANG Xin; FENG Jun
  • English keywords: incremental clustering; Weibo; hotspot finding
  • Journal title (Chinese): TRAN
  • Journal title (English): 上海交通大学学报(英文版) [Journal of Shanghai Jiaotong University (English Edition)]
  • Affiliations: School of Management, Nanchang University; School of Economics and Management, Nanchang University; School of Electronics and Information Engineering, Tongji University
  • Publication date: 2019-06-15
  • Publisher: Journal of Shanghai Jiaotong University (Science)
  • Year: 2019
  • Issue: v.24
  • Funding: the Innovation Special Fund for the Postgraduates of Jiangxi Province (No. YC2016-B016)
  • Language: English
  • Pages: TRAN201903014
  • Page count: 8
  • CN: 03
  • ISSN: 31-1943/U
  • Classification number: 94-101
Abstract
Weibo, also known as micro-blog, has become the primary source and communication form of Internet hotspots, owing to its extremely low threshold for releasing information and its interactive communication mode. However, Weibo posts are short texts: the sparsity of their semantic features, together with their colloquial and diversified expressions, makes clustering analysis more difficult. To solve these problems, we use the Biterm topic model (BTM) to extract features from the corpus and the vector space model (VSM) to strengthen the features, reducing the vector dimension and highlighting the main features. Then, an improved incremental clustering algorithm that incorporates Weibo features and a Weibo buzz calculation formula are proposed to describe the buzz of Weibo, so that hotspots can be reasonably discovered. The experimental results show that the incremental clustering algorithm presented in this paper can effectively improve the accuracy of clustering in different dimensions. Meanwhile, the calculation formula of Weibo buzz reasonably describes the evolution process of Weibo buzz from a qualitative point of view, which can help discover the hotspots effectively.
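As a rough illustration of the incremental clustering idea mentioned in the abstract, the sketch below shows a generic single-pass scheme: each incoming post joins the most similar existing cluster, or opens a new cluster when no centroid is similar enough. This is not the paper's algorithm. It assumes plain bag-of-words term counts in place of BTM topic features, a fixed similarity threshold of 0.3 rather than a self-adaptive one, and the names `cosine` and `incremental_cluster` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dict-like)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def incremental_cluster(docs, threshold=0.3):
    """Single-pass clustering over tokenized posts.

    docs: list of token lists, processed in arrival order.
    threshold: assumed minimum similarity for joining a cluster.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []  # each cluster keeps a summed term vector and its members
    for i, tokens in enumerate(docs):
        vec = Counter(tokens)
        best, best_sim = None, threshold
        for c in clusters:
            # The summed vector points in the same direction as the mean
            # centroid, so cosine similarity is unaffected by the scaling.
            sim = cosine(vec, c["sum"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"sum": Counter(vec), "members": [i]})
        else:
            best["sum"].update(vec)  # fold the new post into the cluster
            best["members"].append(i)
    return [c["members"] for c in clusters]
```

Because each post is compared only against cluster summaries, new posts can be absorbed as they arrive without re-clustering earlier ones, which is the property that makes this family of methods suited to streaming short texts.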