Research and Implementation of a Traffic Analysis and Flow Record Analysis System
Abstract
In recent years, the Internet in China, especially the mobile Internet, has developed rapidly. By the end of June 2012, China had 538 million Internet users, and the Internet penetration rate had reached 39.9%. Network traffic monitoring has become an important technical means for ISPs to manage and operate their networks. As network applications diversify, identifying and classifying network traffic faces major challenges, and finding identification methods, used alone or in combination, that achieve high accuracy with a low misclassification rate has become an active research topic. As link speeds increase, the volume of traffic data grows sharply, and conventional analysis methods can no longer meet the demands of massive traffic data analysis. Google's MapReduce programming model has become an important method for massive data analysis, and the open-source Hadoop platform, which clones this model, has been recognized by both academia and industry. Hadoop has become an important tool for massive data analysis.
     This thesis first introduces network traffic identification and classification techniques, including deep packet inspection (DPI) and deep flow inspection (DFI). It then introduces massive data analysis platforms, especially the Hadoop system and its application to flow analysis.
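Deep packet inspection works on payload bytes, while deep flow inspection works from flow-level statistics. The sketch below illustrates that difference. The signature table, the `Flow` record, the size and duration thresholds, and the "DPI first, DFI as fallback" rule in `classify` are all hypothetical choices made for illustration. None of them is taken from TACS.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical payload signatures, for illustration only. Production DPI rule
# sets are much larger and are usually matched with a multi-pattern automaton
# (e.g. Aho-Corasick) instead of the linear scan used here.
SIGNATURES = {
    b"\x13BitTorrent protocol": "BitTorrent",  # BitTorrent handshake
    b"SSH-": "SSH",                            # SSH version banner
    b"GET ": "HTTP",                           # HTTP GET request line
}


def dpi_classify(payload: bytes) -> Optional[str]:
    """Deep packet inspection: return the application whose signature
    appears in the payload, or None if no signature matches."""
    for signature, app in SIGNATURES.items():
        if signature in payload:
            return app
    return None


@dataclass
class Flow:
    """Minimal flow summary (hypothetical fields)."""
    packet_sizes: List[int]  # bytes per packet
    duration_s: float        # flow duration in seconds


def dfi_classify(flow: Flow) -> str:
    """Deep flow inspection: classify from flow statistics only,
    without looking at payload. Thresholds are illustrative guesses."""
    n = len(flow.packet_sizes)
    avg_size = sum(flow.packet_sizes) / n if n else 0.0
    if avg_size >= 1000 and flow.duration_s >= 10:
        return "bulk/streaming"  # long flow of large packets
    if avg_size < 200:
        return "interactive"     # mostly small packets
    return "other"


def classify(payload: bytes, flow: Flow) -> str:
    """Assumed combination: trust a DPI signature hit, else fall back to DFI."""
    app = dpi_classify(payload)
    return app if app is not None else dfi_classify(flow)


if __name__ == "__main__":
    print(classify(b"SSH-2.0-OpenSSH_8.9", Flow([64, 80], 1.0)))  # SSH
    print(classify(b"\x00\x01", Flow([1400] * 50, 30.0)))         # bulk/streaming
```

Payload signatures fail on encrypted or obfuscated traffic. That is one reason a statistics-based fallback like `dfi_classify` can be useful alongside DPI.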
     Building on research into traffic identification technology, we developed a network Traffic Analysis and Classification System (TACS). This thesis describes the main functions of TACS, its overall design, and the design of its key sub-modules. To analyze massive amounts of traffic data, we also developed LogAnalyser, a Hadoop-based system that makes processing and analyzing such data quick and easy. This thesis likewise describes the main functions, overall design, and key sub-modules of LogAnalyser.
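LogAnalyser is described as a Hadoop-based system for analyzing massive amounts of flow-record data. Below is a minimal, single-process sketch of the MapReduce pattern applied to flow records. It computes total bytes per application. The mapper/reducer split follows MapReduce, but several parts are illustrative assumptions and do not describe LogAnalyser's actual design: the comma-separated record layout, the field positions, and the function names `mapper`, `reducer`, and `run_local`.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical flow-record layout, one record per line:
#   src_ip,dst_ip,src_port,dst_port,protocol,app,bytes
APP_FIELD, BYTES_FIELD, NUM_FIELDS = 5, 6, 7


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit an (application, bytes) pair per valid record."""
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != NUM_FIELDS:
            continue  # skip malformed records
        try:
            yield fields[APP_FIELD], int(fields[BYTES_FIELD])
        except ValueError:
            continue  # skip records with a non-numeric byte count


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sum bytes per application. Requires input sorted by
    key, which the MapReduce shuffle/sort step provides."""
    for app, group in groupby(pairs, key=itemgetter(0)):
        yield app, sum(nbytes for _, nbytes in group)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle/sort -> reduce in one process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    records = [
        "10.0.0.1,10.0.0.2,50000,80,TCP,HTTP,1200",
        "10.0.0.3,10.0.0.4,51000,6881,TCP,BitTorrent,50000",
        "10.0.0.1,10.0.0.5,50001,80,TCP,HTTP,800",
        "malformed line",
    ]
    print(run_local(records))  # {'BitTorrent': 50000, 'HTTP': 2000}
```

On a cluster, the same per-record logic could be packaged as a Hadoop Streaming mapper and reducer. These read lines from standard input and write tab-separated key/value lines, and Hadoop performs the sort between the two phases. Whether LogAnalyser uses Streaming or the Java MapReduce API is not stated here.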
     Finally, TACS and LogAnalyser are used to analyze packet data and flow-record data, respectively. The analysis covers the traffic characteristics of P2P streaming applications in ADSL and CDMA networks, and the service distribution and network-quality characteristics of a GPRS network.
References
[1] China Internet Network Information Center (CNNIC). The 30th Statistical Report on Internet Development in China. July 2012. http://www.cnnic.cn/
[2] China Internet Network Information Center (CNNIC). Survey Report on the Development of Mobile Internet in China. March 2012. http://www.cnnic.cn/
[3] Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung. The Google file system. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, Bolton Landing, NY, USA, October 19-22, 2003.
[4] Jeffrey Dean, Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, San Francisco, CA, December 6-8, 2004, pp. 10-10.
[5] Chang F, Dean J, Ghemawat S, et al. Bigtable: A distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI'06), 2006.
[6] 杨洁, 袁仑, 林平, et al. Characterizing Internet Backbone Traffic Based on Deep Packets Inspection and Deep Flows Inspection. China Communications, 2012, 9(5):42-54.
[7] Madhukar A, Williamson C. A longitudinal study of P2P traffic classification. In 14th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, September 2006.
[8] Moore A, Papagiannaki K. Toward the accurate identification of network applications. In Proc. Passive and Active Measurement Workshop (PAM 2005), Boston, MA, USA, March/April 2005.
[9] Thuy T. T. Nguyen, Grenville Armitage. A Survey of Techniques for Internet Traffic Classification using Machine Learning. IEEE Communications Surveys and Tutorials, 2008, 10(4):56-76.
[10] Mingjiang Ye. AutoSig: Automatically Generating Signatures for Applications. IEEE International Conference on Computer and Information Technology, October 2009, pp. 104-109.
[11] 杨洁, 马婧. Signature Based Identification of P2P Streaming Media Traffic. 2010 2nd International Conference on Intellectual Technique in Industrial Practice (ITIP 2010), Changsha, China, September 8-9, 2010, pp. 308-312.
[12] Garcia-Dorado J L, Hernandez J A, Aracil J, et al. On the Duration and Spatial Characteristics of Internet Traffic Measurement Experiments. IEEE Communications Magazine, vol. 46, 2008, pp. 148-155.
[13] Roughan M, Sen S, Spatscheck O, et al. Class-of-Service mapping for QoS: A statistical signature-based approach to IP traffic classification. In: Proc. of the ACM SIGCOMM Internet Measurement Conference, Taormina, 2004, pp. 135-148.
[14] Zuev D, Moore A W. Traffic classification using a statistical approach. In: Dovrolis C, ed. Proc. of PAM 2005, LNCS 3431. Heidelberg: Springer-Verlag, 2005, pp. 321-324.
[15] Nguyen T T T, Armitage G. Training on multiple sub-flows to optimise the use of Machine Learning classifiers in real-world IP networks. In: Proc. of the 31st IEEE LCN 2006, Tampa, 2006, pp. 369-376.
[16] Quinlan J R. C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993, pp. 302.
[17] Tavallaee M, Lu W, Ghorbani A A. Online Classification of Network Flows. In Proceedings of the 2009 Seventh Annual Communication Networks and Services Research Conference (CNSR'09), IEEE Computer Society, Washington, DC, USA, pp. 78-85.
[18] Quinlan J R. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 1996, pp. 725-730.
[19] Schapire R E. The strength of weak learnability. Machine Learning, 1990, 5(2):197-227.
[20] Yang A M, Jiang S Y, Deng H. A P2P Network Traffic Classification Method Using SVM. In Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS'08), IEEE Computer Society, Washington, DC, USA, pp. 398-403.
[21] Liu F, Li Z T, Nie Q B. A new method of P2P traffic identification based on Support Vector Machine at the host level. In Proceedings of the 2009 International Conference on Information Technology and Computer Science (ITCS'09), IEEE Computer Society, Washington, DC, USA, pp. 579-582.
[22] 杨洁, 袁仑, 贺阳, et al. Timely traffic identification on P2P streaming media. The Journal of China Universities of Posts and Telecommunications (English edition), 2012, 19(2):67-73.
[23] Freund Y, Schapire R E. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), August 1997, pp. 119-139.
[24] 覃雄派, 王会举, 杜小勇, et al. Big data analysis: competition and symbiosis of RDBMS and MapReduce. Journal of Software, 2012, 23(1):32-45.
[25] L. Zhiqiang, L. Hongyan, M. Gaoshan. MapReduce-based Backpropagation Neural Network over large scale mobile data. In Proc. 2010 Sixth International Conference on Natural Computation (ICNC), 2010, pp. 1726-1730.
[26] M. Xuefeng, W. Wenjun. A Parallelized Network Traffic Classification Based on Hidden Markov Model. In Proc. 2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2011, pp. 107-112.
[27] L. Youngseok, K. Wonchul, S. Hyeongu. An Internet traffic analysis method with MapReduce. In Proc. 2010 IEEE/IFIP Network Operations and Management Symposium Workshops (NOMS Wksps), 2010, pp. 357-361.
[28] Alfred V. Aho, Margaret J. Corasick. Efficient String Matching: An Aid to Bibliographic Search. Communications of the ACM, 1975, 18(6):333-340.
[29] 丛蓉. Research on sampling-based network traffic classification techniques [Thesis]. Beijing: Beijing University of Posts and Telecommunications, 2012.
[30] K. Papagiannaki, N. Taft, S. Bhattacharyya, et al. On the feasibility of identifying elephants in Internet backbone traffic. Technical Report TR01-ATL-110918, Sprint ATL, Sprint Labs, November 2001.
[31] X. Hei, C. Liang, Y. Liu, et al. A measurement study of a large-scale P2P IPTV system. IEEE Transactions on Multimedia, vol. 9, no. 8, December 2007, pp. 1672-1687.
