Research on Key Technologies for Internet Public Sentiment Prediction
Abstract
Public sentiment is drawing growing attention in contemporary society. Its meaning varies with the linguistic context. Broadly, definitions fall into two groups: public sentiment in the broad sense and in the narrow sense, and everyday usage mostly refers to the narrow sense. With the rapid development of the Internet, public sentiment has taken on a new form: internet public sentiment. Establishing mechanisms for collecting and analyzing public sentiment, and keeping open the channels through which social conditions and public opinion are reported, is an explicit requirement of the "Decision of the CPC Central Committee on Strengthening the Party's Governing Capacity", adopted at the Fourth Plenary Session of the 16th CPC Central Committee. Acquiring, analyzing, and predicting internet public sentiment is therefore important for building a harmonious society, maintaining social stability, and ensuring rapid and healthy economic development.
     In recent years more and more scholars have turned their attention to internet public sentiment, and the related literature keeps growing. Much of it studies text clustering, hot-topic detection and analysis, and linguistic orientation in depth, but work on predicting internet public sentiment, and on the design of a complete prediction system, remains scarce.
     Building on an analysis and summary of existing techniques, this thesis designs an internet public sentiment (IPS) prediction system and studies its key technologies. The main work includes:
     1) Several web crawling strategies are analyzed, and a crawler based on a heuristic search strategy is adopted to collect web data.
     2) The public sentiment data are preprocessed using a text clustering method based on a hierarchical subject tree, hot-topic detection based on topic attention, and a data aggregation strategy based on an optimized aggregation table.
     3) Gray system theory is discussed. A single-model predictor is built with the gray GM(1,1) model, the modeling steps are given, and the model's limitations are analyzed.
     4) The similarities and differences among three Markov prediction methods are analyzed, and the weighted Markov chain prediction method is adopted.
     5) The IPS prediction system is designed around a combined forecasting method based on gray theory and Markov chains.
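     To make the GM(1,1) step in item 3 concrete, the following is a minimal sketch of the textbook GM(1,1) procedure: accumulate the series, form background values from adjacent means, estimate the development coefficient a and gray input b by least squares, then difference the fitted accumulated series. It is not the thesis's implementation. The function names gm11_fit and gm11_forecast and the sample counts are hypothetical and chosen only for illustration.

```python
import math


def gm11_fit(x0):
    """Estimate GM(1,1) parameters (a, b) from a positive series x0."""
    x0 = [float(v) for v in x0]
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 points")

    # Accumulated generating operation (1-AGO): running sum of x0.
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)

    # Background values: mean of adjacent accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]

    # Least squares for the gray equation x0(k) + a * z(k) = b,
    # i.e. a straight-line fit y = slope * z + b with slope = -a.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    return -slope, (sy - slope * sz) / m


def gm11_forecast(x0, steps):
    """Return fitted values for x0 followed by `steps` forecasts."""
    x0 = [float(v) for v in x0]
    a, b = gm11_fit(x0)
    if abs(a) < 1e-12:
        raise ValueError("a is ~0; the exponential response is undefined")
    c = b / a

    def x1_hat(k):
        # Time response of the whitened equation dx1/dt + a * x1 = b.
        return (x0[0] - c) * math.exp(-a * k) + c

    # Inverse AGO: difference the accumulated predictions.
    horizon = len(x0) + steps
    return [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, horizon)]


if __name__ == "__main__":
    # Hypothetical daily post counts for one topic (illustrative only).
    counts = [120, 135, 151, 170, 188, 210]
    print(gm11_forecast(counts, steps=2)[-2:])
```

     A single exponential curve like this can only follow smooth, monotonic trends, which is one commonly cited limitation of GM(1,1) and a usual motivation for correcting its output with a Markov chain over the fitting errors.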
