Research on Event Detection Algorithms for Microblogs
Abstract
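Microblogs combine a convenient, fast way of sharing information with a vast network of user relationships, so posts multiply and spread exponentially through that network, which further sharpens the information-age contradiction of abundant data but scarce information. Extracting valuable event information from this massive, noisy microblog data can help users find news about the events they care about and keep up with important events around them. From the standpoint of public-opinion monitoring and polling, it can also assist government departments in emergency management and administrative decision-making.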
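     However, because microblog data is short, irregular, and novel, traditional web text analysis and data mining techniques no longer perform well on it. Event detection faces new challenges in the microblog setting, related research is still exploratory, and effective microblog-oriented event detection methods are urgently needed.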
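     Addressing these characteristics of microblog text, the thesis proposes a complete event detection model for microblogs and gives a detailed algorithm design for each module. Experiments show that the model detects event information in microblog data effectively, promptly, and accurately. The main research content and contributions are as follows: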
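     First, the thesis proposes a self-supervised feature extraction method based on N-gram relation statistics. Unlike traditional event detection models, it shifts the focus of event detection from documents to features and represents microblog features through the characteristics of the microblog data itself, which better expresses the information that microblog text conveys and suits the characteristics of that text.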
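     Second, by introducing the concepts of word activation force and word affinity, the thesis proposes a new word clustering method. With it, the event detection model moves from document clustering to feature clustering, linking isolated microblog features into ordered word clusters that express the detected events.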
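     Third, the thesis designs and implements a complete and effective microblog event detection model. Building on the algorithmic innovations of the proposed model, it also offers solutions for event representation and for evaluating event detection performance, opening up ideas for further work on microblog event detection models.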
Microblogging, known for its convenient, rapid information sharing and its extensive user connections, lets information grow and spread exponentially through the user network, further exacerbating the information age's contradiction between rich data and poor information. Teasing valuable event information out of this massive, complex microblog data can not only help users find information on events they are interested in and learn about news happening around them, but also, from the perspective of public opinion monitoring and polling, help government departments with emergency management and administrative decisions.
     However, because microblog data is short, irregular, and novel, traditional web text analysis and data mining techniques no longer give ideal results. Event detection faces new challenges in the microblog environment, related research is still in an exploratory phase, and an effective event detection method for microblogs is urgently needed.
     Given the textual features of microblogs, this paper presents a complete event detection model for microblogs and describes the algorithm design of every module in detail. Experimental results show that the model can detect event information in microblog data effectively, promptly, and accurately. The main research contents and innovations of the paper are as follows:
     First, the paper presents a new self-supervised feature extraction method based on n-gram relationship statistics. Unlike traditional event detection models, it argues that the focus of event detection should shift from documents to features. By representing microblog features through the microblog data itself, the method expresses the information better and adapts to the characteristics of microblogs.
     Second, the paper proposes a new word clustering method based on the concepts of word activation force and word affinity. With this method, the event detection model is converted from document clustering to feature clustering, associating isolated microblog features into word clusters that express the detected events.
     Third, the paper designs and implements a complete and effective microblog event detection model. Building on the algorithmic innovations it puts forward, the paper also gives corresponding solutions for event detection representation and event detection performance evaluation, opening up ideas for all aspects of work on microblog event detection models.
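
The abstract names word activation force as an ingredient of the word clustering method but does not define it. The sketch below illustrates one common formulation, assuming the definition generally attributed to Guo, Guo and Wang's "Word Activation Forces Map Word Networks": waf(i, j) = (f_ij / f_i) * (f_ij / f_j) / d_ij^2, where f_i and f_j are word frequencies, f_ij counts how often i precedes j within a window, and d_ij is their mean distance in those co-occurrences. The function name, the window size, and the simplified co-occurrence counting are assumptions made for illustration; the thesis may define or parameterize the quantity differently, and word affinity is not covered here.

```python
from collections import Counter, defaultdict


def word_activation_forces(sentences, window=5):
    """Return {(i, j): waf} for ordered pairs where word i precedes word j.

    `sentences` is a list of token lists. `window` (an assumed parameter)
    is the maximum forward distance at which a pair counts as co-occurring.
    """
    freq = Counter()              # f_i: occurrences of each word
    co_freq = Counter()           # f_ij: times i precedes j within the window
    dist_sum = defaultdict(int)   # summed distances, used to get the mean d_ij

    for tokens in sentences:
        freq.update(tokens)
        for pos, word in enumerate(tokens):
            for offset in range(1, window + 1):
                if pos + offset >= len(tokens):
                    break
                pair = (word, tokens[pos + offset])
                co_freq[pair] += 1
                dist_sum[pair] += offset

    waf = {}
    for (i, j), f_ij in co_freq.items():
        d_ij = dist_sum[(i, j)] / f_ij  # mean forward distance from i to j
        waf[(i, j)] = (f_ij / freq[i]) * (f_ij / freq[j]) / (d_ij ** 2)
    return waf


# Toy, already-tokenized posts (hypothetical data).
posts = [
    ["earthquake", "hits", "city"],
    ["earthquake", "hits", "coast"],
]
forces = word_activation_forces(posts)
print(forces[("earthquake", "hits")])
```

Pairs that always occur together and adjacent score highest, so a thresholded force graph gives a natural starting point for grouping words into clusters. The force is directional: waf(i, j) and waf(j, i) generally differ.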
