Research on Key Technologies for Monitoring and Analyzing Online Forum Communication on Backbone Links
Abstract
After nearly twenty years of rapid development, the Internet has evolved from a convenient communication tool into a virtual society. Online forums (BBS) are an important part of this virtual society: statements posted there spread widely and quickly, and to some extent steer public opinion. Monitoring and analyzing forum communication is therefore an effective way to understand the current state of public opinion, and it matters for the orderly governance of the online virtual society. This dissertation studies key technologies for monitoring and analyzing forum communication carried over backbone links. The main contributions are as follows:
     1. A "layered and distributed system architecture model for monitoring and analyzing communication information on high-speed backbone networks" is proposed. Horizontally, the model is divided into layers by function and technical characteristics; vertically, distributed monitoring nodes are built following the principle that backbone links are administered by region. The processing in the key stages (network data capture, information extraction, information storage, deep analysis, and coordinated monitoring services) is summarized and abstracted. Data capture, information extraction, and deep analysis are the focus of the subsequent chapters.
     2. A design for a "filtering and distributing device based on logical output port groups" is proposed. The device can flexibly duplicate and filter packets, flexibly allocate the distributed load among the logical ports within a group, and attach external switches for second-level distribution. On this basis, an optimization called the "forward-caching filtering and distributing mechanism with dynamic feedback" is proposed. It enables session-level filtering based on data content and flexible capture of complete related data, and it can significantly reduce the scale of the equipment needed in the information restoration and extraction layer.
     3. A "method for extracting forum information from passively intercepted data based on SVM and hierarchical CRF" is proposed. An SVM analyzes macro-level features of the intercepted data to identify forum sites automatically. A hierarchical CRF determines the behavior type of each forum session and labels the types of the relevant information elements, from which the wrappers used for extraction are generated. The wrappers then automatically extract forum information from backbone traffic.
     4. A "system of forum discourse characteristic parameters based on passively intercepted data" and a "deep analysis method based on those characteristic parameters" are proposed. They take into account the relationships among forum sites, boards, users, and posts, and combine factors such as users' interest in a statement, the degree of user participation in the discussion, the speed at which the statement spreads, and the attention a site receives. By exploiting the properties of passively intercepted data and analyzing the key elements extracted from it, the method derives the regularities of forum discourse and supports situation assessment and trend prediction.
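The load allocation among logical ports described in item 2 can be illustrated with a small sketch. It is not the device design from the dissertation, which the abstract does not specify. It assumes one common approach: hash a direction-independent flow key so that both halves of a session reach the same logical port, and give each port a share of a slot table proportional to a configured weight. The function names, port names, and weights below are all hypothetical.

```python
import hashlib


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Build a direction-independent key so both directions of a session match."""
    # Sort the two endpoints so (A -> B) and (B -> A) produce the same string.
    lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}|{proto}"


def build_slots(weights):
    """Expand {port_name: integer_weight} into a slot table (assumed scheme)."""
    slots = []
    for port, weight in sorted(weights.items()):
        slots.extend([port] * weight)
    return slots


def pick_port(slots, key):
    """Map a flow key to one logical port of the group via a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(slots)
    return slots[index]


# Hypothetical group of three logical ports; p2 is given twice the share.
slots = build_slots({"p0": 1, "p1": 1, "p2": 2})
forward = pick_port(slots, flow_key("10.0.0.1", 40000, "192.0.2.8", 80, "tcp"))
reverse = pick_port(slots, flow_key("192.0.2.8", 80, "10.0.0.1", 40000, "tcp"))
print(forward == reverse)
```

Changing a weight only changes the slot table, so the share of traffic sent to each logical port can be adjusted without touching the hashing step. A real device would also have to handle rebalancing of sessions that are already in progress, which this sketch ignores.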
