Self-adaptive Method Based on Double-vector Model for Microblog Topic Tracking
  • Original title: 基于双向量模型的自适应微博话题追踪方法
  • Authors: HUANG Chang (黄畅); GUO Wen-zhong (郭文忠); GUO Kun (郭昆)
  • Affiliations: College of Mathematics and Computer Sciences, Fuzhou University; Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing; Key Laboratory of Spatial Data Mining & Information Sharing, Ministry of Education
  • Keywords: topic tracking; microblog; self-adaptive; double-vector
  • Journal code: XXWX
  • Journal: Journal of Chinese Computer Systems
  • Publication date: 2019-06-14
  • Publisher: 小型微型计算机系统 (Journal of Chinese Computer Systems)
  • Year: 2019
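  • Volume: 40
  • Issue: 06
  • Pages: 69-75
  • Page count: 7
  • CN: 21-1106/TP
  • Record ID: XXWX201906012
  • Funding: National Natural Science Foundation of China (61300104, 61300103, 61672158); Fujian Provincial Outstanding Young Scientists Fund for Universities (JA12016); Fujian Provincial Program for New Century Excellent Talents in Universities (JA13021); Fujian Provincial Science Fund for Distinguished Young Scholars (2014J06017, 2015J06014); Fujian Provincial Science and Technology Innovation Platform Program (2009J1007, 2014H2005); Natural Science Foundation of Fujian Province (2013J01230, 2014J01232); Fujian Provincial University-Industry Cooperation Project (2014H6014, 2017H6008)
  • Language: Chinese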
Abstract
Microblog posts are short, new Internet words emerge constantly, and topics drift as they develop. To address these characteristics, a self-adaptive microblog topic tracking method based on a double-vector model is proposed. First, the double-vector model vectorizes each text in two ways, with word embeddings and with the vector space model (VSM), which preserves text semantics while also handling microblog neologisms. Second, the topic and each microblog are both represented with the double-vector model, and the cosine similarity between the two representations is taken as the similarity between the topic and the microblog. Third, this similarity is compared with a similarity threshold obtained by self-adaptive learning to decide whether the microblog is relevant to the topic. Finally, the topic model is updated adaptively, which copes effectively with the drift caused by the development of microblog topics. Experimental results show that the method tracks topics in real time and reduces both the miss rate and the false-alarm rate for topic-relevant microblogs.
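The four steps in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the two cosine similarities are combined, how the threshold is learned, or how the topic model is updated, so the weighted sum (`alpha`), the moving-average updates (`rate`, `margin`), the toy embedding table `EMB`, and all function and class names below are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical 2-d word embeddings; a real system would load trained vectors.
EMB = {"quake": (1.0, 0.0), "tremor": (0.9, 0.1),
       "rescue": (0.7, 0.3), "movie": (0.0, 1.0)}
DIM = 2

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def double_vector(tokens):
    """Return (mean word embedding, term-frequency VSM vector) for a token list."""
    known = [EMB[t] for t in tokens if t in EMB]
    emb = ({i: sum(v[i] for v in known) / len(known) for i in range(DIM)}
           if known else {})
    vsm = dict(Counter(tokens))  # words missing from EMB still get a VSM dimension
    return emb, vsm

def similarity(topic, post, alpha=0.5):
    """Assumed combination rule: weighted sum of the two cosine similarities."""
    return (alpha * cosine(topic[0], post[0])
            + (1 - alpha) * cosine(topic[1], post[1]))

def blend(old, new, rate):
    """Moving-average update of a sparse vector (assumed update rule)."""
    keys = set(old) | set(new)
    return {k: (1 - rate) * old.get(k, 0.0) + rate * new.get(k, 0.0) for k in keys}

class Tracker:
    def __init__(self, seed_tokens, threshold=0.3, rate=0.2, margin=0.6):
        self.topic = double_vector(seed_tokens)
        self.threshold = threshold
        self.rate = rate
        self.margin = margin

    def observe(self, tokens):
        """Score one post; on acceptance, adapt the topic model and threshold."""
        post = double_vector(tokens)
        score = similarity(self.topic, post)
        relevant = score >= self.threshold
        if relevant:
            self.topic = (blend(self.topic[0], post[0], self.rate),
                          blend(self.topic[1], post[1], self.rate))
            # Assumed rule: pull the threshold toward a fraction of the score.
            self.threshold = ((1 - self.rate) * self.threshold
                              + self.rate * self.margin * score)
        return relevant, score

tracker = Tracker(["quake", "rescue"])
for post in (["tremor", "rescue", "aftershock"], ["movie", "premiere"]):
    print(post, tracker.observe(post))
```

In this sketch the embedding half lets "tremor" match a topic seeded with "quake", while the VSM half gives the out-of-vocabulary word "aftershock" its own dimension once a relevant post is absorbed into the topic, which is the complementary behavior the abstract attributes to the two vectors.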
