Research on Driving Force Detection and Hot Spot Prediction for Network Emergencies
Abstract
In recent years the number of network emergencies has risen sharply. Being able to predict the scale of an event in its early outbreak stage, and to understand the forces driving it online, in particular whether someone is pushing it from behind the scenes, would allow a faster and more reasonable response. The main contributions of this thesis are:
     (1) A method for detecting the hidden driving force behind a network emergency. It combines several features of the event, namely heat during the latency period, ID concentration, proportion of simple articles, proportion of newly registered IDs, and geographical concentration of authors, into a composite index that indicates how likely it is that a hidden driving force exists and how strong it is.
     (2) A summary of how the heat of network emergencies is distributed. We propose a "Browse-Reply" model and use it to show that the heat of a naturally occurring network emergency follows a superposition of several Poisson distributions. For an artificially driven event with a weak driving force, the heat still approximately follows this superposition once the driven part is removed.
     (3) A prediction model and indexes. Building on the distribution analysis, we propose a heat prediction model based on curve fitting that gives a rough estimate of an event's total heat in its early outbreak stage. We also define prediction indexes for the different needs of network users and network regulators.
     Because the parts of the thesis depend strongly on one another, the experiments are presented together with the theory at the end of each part rather than collected at the end of the thesis. We compare the "Jia Junpeng event" and the "Li Gang event" to verify the effectiveness of the driving force detection method, and we analyze a number of network emergencies to verify the Poisson distribution pattern. Most of the events fit the Poisson pattern, the prediction algorithm performs well on the great majority of them, and most errors were below 50%.
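The curve-fitting idea in item (3) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual fitting procedure, so everything below is an assumption made for illustration: a single Poisson-shaped wave (the thesis describes a superposition of several), discrete time steps starting at 0, a plain least-squares objective, a grid search over the rate, and the hypothetical names `poisson_pmf` and `fit_total_heat`.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(X = k) for a rate lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def fit_total_heat(early_counts, lam_grid):
    """Fit early_counts[t] ~ total * Poisson(t; lam) by least squares.

    For a fixed lam the best `total` has a closed form, so only lam
    needs a grid search. Returns (total, lam, squared_error).
    Assumption: one wave only; this is not the thesis's own algorithm.
    """
    best = None
    for lam in lam_grid:
        shape = [poisson_pmf(t, lam) for t in range(len(early_counts))]
        denom = sum(v * v for v in shape)
        total = sum(y * v for y, v in zip(early_counts, shape)) / denom
        err = sum((y - total * v) ** 2 for y, v in zip(early_counts, shape))
        if best is None or err < best[2]:
            best = (total, lam, err)
    return best


# Synthetic data (not from the thesis): one wave with total heat 1000, rate 4.
true_total, true_lam = 1000.0, 4.0
full_curve = [true_total * poisson_pmf(t, true_lam) for t in range(15)]
early = full_curve[:4]  # pretend only the first four time steps are observed

lam_grid = [0.5 + 0.1 * i for i in range(100)]  # candidate rates 0.5 .. 10.4
total, lam, err = fit_total_heat(early, lam_grid)
print(f"estimated total heat: {total:.1f}, estimated rate: {lam:.1f}")
```

On noise-free synthetic data the early points are enough to recover the total. With real, noisy counts or several overlapping waves the estimate would be much rougher, which is in line with the abstract's statement that the prediction is approximate.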
