Research and Implementation of a Web Public Sentiment System Based on Social Networks
Abstract
Public opinion is the attitude people hold toward an event within a given social space as that event arises, develops, and changes. Public sentiment is public opinion writ large: the sum of people's views, ideas, attitudes, and opinions about social phenomena, and in particular that part of their social and political attitudes that influences and guides the decisions of those who govern. The spread of the Internet has greatly changed how public opinion forms. President Hu has pointed out that "the Internet has become a distribution center for ideological and cultural information and an amplifier of public opinion." Because the Internet is open and free, people from every social stratum can use it easily; because it is virtual and anonymous, people are more willing to express their ideas, views, and positions online. Public opinion spreads, changes, and develops on the network, may eventually form online public sentiment, and may even affect political administration. Online public sentiment has become one of the main components of public sentiment in society and is a reflection of it. However, web information takes many forms, the views of different social strata differ, and the volume of information involved is enormous, so traditional collection and analysis mechanisms struggle to identify public sentiment effectively. An efficient system for collecting, analyzing, and reporting public sentiment is therefore needed.
     This thesis combines theoretical and empirical research. Building on a review of the literature, it uses social network analysis techniques to mine and analyze the data. The research covers four main aspects:
     (1) It introduces a distributed, topic-focused public sentiment crawler based on multiple gateway exits, and describes in detail how the crawler system is built, what its modules do, how it is implemented, and its novel two-layer node structure for pages. The crawler has high time efficiency and application efficiency and addresses the problem of data sourcing.
     (2) Building on a study of several classic hierarchical algorithms, it proposes a new hierarchical clustering algorithm that better handles the instability caused by manually set thresholds. Running on the Hadoop platform improves the algorithm's execution efficiency and lays the foundation for public sentiment discovery.
     (3) It proposes a method for building a three-tier social public sentiment network. The method organizes topics, categories, and events as a network that contains sub-networks, and links in the sites and users that a topic involves, so that the objects under analysis are no longer isolated nodes. This three-tier network is the core of public sentiment discovery.
     (4) It explores mining public sentiment from the three-tier network with social network techniques, proposes the concept of an expansion coefficient, and achieves the discovery of both public sentiment and key nodes.
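     Item (2) refers to hierarchical clustering that does not depend on a manually set threshold. The abstract does not give the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: it uses plain single-linkage agglomerative clustering and cuts the merge sequence at the largest jump between consecutive merge distances. The function names, the largest-gap heuristic, and the sample points are all hypothetical and are not taken from the thesis; the sketch runs on a single machine and does not use Hadoop.

```python
from itertools import combinations
from math import dist


def single_linkage_merges(points):
    """Return the merges (distance, cluster_a, cluster_b) in the order they occur."""
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Single linkage: the distance between two clusters is the distance
        # between their closest members. Pick the closest pair of clusters.
        d, a, b = min(
            (
                (min(dist(points[i], points[j]) for i in a for j in b), a, b)
                for a, b in combinations(clusters, 2)
            ),
            key=lambda t: t[0],
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((d, a, b))
    return merges


def cluster_without_threshold(points):
    """Cluster points, choosing the cut from the data instead of a fixed threshold.

    Assumption (not from the thesis): the cut is placed at the largest gap
    between consecutive merge distances.
    """
    merges = single_linkage_merges(points)
    heights = [d for d, _, _ in merges]
    if len(heights) < 2:
        # Fewer than three points: there is no gap to compare, so leave
        # every point in its own cluster.
        return [[i] for i in range(len(points))]
    gaps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    keep = gaps.index(max(gaps)) + 1  # number of merges to apply before the cut
    clusters = [frozenset([i]) for i in range(len(points))]
    for _, a, b in merges[:keep]:
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Two well-separated groups of three points each (made-up sample data).
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(cluster_without_threshold(sample))
```

     The largest-gap rule is only one of several ways to avoid a hand-tuned threshold; it works when clusters are well separated and can mislead when merge distances grow smoothly.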
