Research on High-speed IP Service Awareness Based on Traffic Measurement (基于流量测量的高速IP业务感知技术研究)

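English title: Research on High-speed IP Service Awareness Based on Traffic Measurement
Author: Zhang Zhen (张震)
Thesis level: Doctoral
Discipline: Communication and Information Systems
Chinese keywords: 流量分类; 深度流检测; 深度用户检测; 流量测量; 机器学习; 近邻传播聚类; 半监督学习
English keywords: Traffic Classification; Deep Flow Inspection; Deep User Inspection; Traffic Measurement; Machine Learning; Affinity Propagation Clustering; Semi-supervised Learning
Degree year: 2012
Advisor: Wang Binqiang (汪斌强)
Discipline code: 081001
Degree-granting institution: PLA Information Engineering University (解放军信息工程大学)
Thesis submission date: 2012-10-15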

Abstract

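Service awareness is the fundamental means of understanding the intrinsic nature of a network and keeping track of its operating state, and it underpins application trend analysis, QoS management, network optimization, and anomalous behavior detection. With the rapid development of network technology in recent years, the Internet has changed substantially in both overall scale and architecture: the number of users is growing quickly, service types are increasingly complex, P2P traffic consumes network bandwidth, illegal content is spreading, and port disguising and application-layer encryption are widely used. Traditional detection methods based on ports and payloads can no longer meet current and future service awareness requirements, and traffic classification in high-speed IP networks faces serious challenges.
     This dissertation is supported by the National 863 Program major project "Unified Security Management and Control Network for Tri-Network Convergence". Driven by the project's need to identify and control user terminals and converged services in real time, and using traffic measurement on high-speed IP backbone networks as its data foundation, it focuses on classifying and identifying services in high-speed networks. Given the great potential of machine-learning-based Deep Flow Inspection (DFI) and behavior-based Deep User Inspection (DUI) for traffic classification, the dissertation takes the "flow-level" and "user-level" viewpoints and centers on two questions: how to extract traffic statistical features from backbone links, and how to improve traffic classification performance. The main results are as follows:
     1. To address the high miss rate of traditional heavy-hitter (large-flow) detection algorithms, a traffic measurement algorithm based on the LRU-BF (Least Recently Used & Bloom Filters) strategy is proposed. The algorithm uses LRU eviction and the compact Bloom filter representation to separate "heavy-hitter filtering" from "heavy-hitter identification", which considerably improves measurement accuracy. An analytical expression for the upper bound of the error probability is derived from the Pareto and hypergeometric distributions. Simulations show that, compared with the traditional Naïve-LRU algorithm, LRU-BF keeps error probability and space complexity low while sustaining line-rate packet processing on a single 40 Gbps link.
     2. To address the low accuracy and low space utilization of the classic Naïve Counting Bloom Filter (NCBF), a summary data structure called the Geometric Bloom Filter (GBF) is proposed. By introducing hash fingerprints, partitioning the Bloom filter twice, and placing elements according to bucket load, the structure represents traffic statistics compactly and supports fast queries. The GBF model is analyzed and solved using differential-equation theory, a relation between error probability and computational complexity is established, and the geometric-distribution property of GBF is demonstrated. Comparative simulations against NCBF show that, at equal computational complexity, GBF lowers the error probability to the order of 10^-2 and improves space utilization by about 20%.
     3. To address the low classification accuracy of traditional methods, a traffic classification method based on semi-supervised affinity propagation learning (Traffic Classification based on Semi-supervised Affinity Propagation, SAP) is proposed. Affinity propagation clustering is used to build the classification model, so the classifier is simple to implement, runs efficiently, and does not depend on the choice of initial points. Semi-supervised learning is applied by abstracting a small number of labeled sample flows into pairwise constraints that modify the distance measure between sample flows; an ε-nearest-neighbor distance scaling mechanism and a manifold-similarity distance measure capture prior information about the spatial distribution of sample flows, bringing the classifier closer to real network conditions. The cohesion of SAP's classification is analyzed using the central limit theorem and Chebyshev's inequality. Experiments show that the classification accuracy rises to about 90% while the sum of squared errors remains low.
     4. To address the high computational complexity and low accuracy of the affinity propagation (AP) learning algorithm, a semi-supervised affinity propagation algorithm based on stratified combination (Semi-supervised Affinity Propagation Algorithm based on Stratified Combination, SAP-SC) is proposed. SAP-SC inherits and extends the semi-supervised idea of SAP: through stratified clustering, a single clustering process is divided equally into several SAP clustering passes, with each layer sampling and processing only the data points that are "hard" to cluster; a combination boosting method improves clustering performance, and weighted combined voting decides the cluster of each flow. Theoretical analysis and simulation of accuracy and computational complexity show that, compared with AP and SAP, SAP-SC reduces computational complexity by O(N^(1/2)) and raises classification accuracy to 98%.
     5. To address the "concept drift" problem of traditional machine-learning classification algorithms, a traffic classification mechanism based on the user connection graph (Internet Traffic Classification based on Host Connection Graph, HCG) is proposed. The algorithm uses {IP Address, Port} as the unique user identifier to construct a user connection graph; graph mining theory is applied to partition the graph into mutually disjoint behavior sub-clusters, abstracting communication among users as a form of "social group behavior"; an entropy-based User Behavior Mode (UBM) is defined, and "UBM+Port" is used to map service labels onto the behavior sub-clusters, achieving traffic classification. Simulations on real network link data show that the algorithm overcomes concept drift without sacrificing identification accuracy or computational complexity.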
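
For orientation on the second contribution, here is a minimal Python sketch of a plain counting Bloom filter, the general kind of baseline that NCBF refers to. It is not the GBF structure and not the dissertation's implementation: the class name, the parameters `num_counters` and `num_hashes`, and the use of salted SHA-256 as the hash family are all assumptions made for illustration. The `estimate` method uses minimum-counter selection, which can overestimate a flow's count but never underestimates it.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter (hypothetical names and parameters)."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.m = num_counters
        self.k = num_hashes
        self.counters = [0] * num_counters

    def _indexes(self, key):
        # Assumption: derive k counter indexes from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        # Record one occurrence (for example, one packet of a flow).
        for idx in self._indexes(key):
            self.counters[idx] += 1

    def __contains__(self, key):
        # May return a false positive, never a false negative.
        return all(self.counters[idx] > 0 for idx in self._indexes(key))

    def remove(self, key):
        # Deletion is what the counters add over a bit-vector Bloom filter.
        if key not in self:
            raise ValueError(f"{key!r} is not in the filter")
        for idx in self._indexes(key):
            self.counters[idx] -= 1

    def estimate(self, key):
        # Minimum counter: an upper bound on the true number of insertions.
        return min(self.counters[idx] for idx in self._indexes(key))
```

Collisions between flows inflate shared counters, which is the accuracy and space trade-off that the GBF design described above sets out to improve.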

English abstract

Traffic classification is the basis for deeply understanding the nature of a network and effectively tracking its operation. It is also an important component of network applications including application trend analysis, QoS management, network optimization and anomalous behavior detection. With the rapid development of information technology, the Internet has undergone great changes in overall scale and architecture. The number of users is expanding quickly. Network services are becoming more diverse. P2P traffic constantly consumes backbone bandwidth. Illegal information is spreading throughout the Internet. In particular, port disguising and application-layer encryption are widely used in practice. Faced with these challenges, traffic classification based on ports and payload signatures is no longer adequate.
     As part of the fundamental research task of identifying user terminals and services in the Common Security and Control Framework in Tri-Network Convergence project, which belongs to the National High-Tech Research and Development Program of China (863 Program), this dissertation primarily discusses how to better classify network traffic based on measurement on high-speed backbone links. Considering the great potential of Deep Flow Inspection (DFI) based on machine learning and Deep User Inspection (DUI) based on user behavior in traffic classification, the dissertation centers on two scientific questions from the flow-level and user-level viewpoints: how to extract traffic characteristics from a high-speed backbone link, and how to improve the performance of traffic classification. Its main work and achievements are outlined as follows:
     1. To address the high false negative probability of the naïve algorithm, a novel scheme called LRU-BF (Least Recently Used & Bloom Filters) is presented. To achieve high accuracy, the algorithm adopts LRU eviction and Bloom filter representation to separate heavy-hitter filtering from heavy-hitter recognition. Based on the Pareto distribution and the hypergeometric distribution, analytical expressions for the upper bound of the error probability are derived. Simulation results indicate that LRU-BF achieves space savings and a lower error probability than the Naïve-LRU algorithm, while also supporting 40 Gbps line-speed processing.
     2. To address the lower accuracy and lower space efficiency of Naïve Counting Bloom Filters (NCBF), a novel data structure called Geometric Bloom Filters (GBF) is presented. To achieve space-efficient storage and fast queries, the structure introduces hash fingerprints, partitions the Bloom filter twice, and stores elements based on bucket load. Analytical expressions are derived using the theory of differential equations, and the relation between error probability and space complexity is established. In addition, the geometric-distribution property of GBF is proved. Simulation results indicate that, compared with NCBF, GBF decreases the error probability to 10^-2 and achieves a 20% space saving without sacrificing computational complexity.
     3. To address the inferior accuracy of traditional classification methods, a novel scheme called Semi-supervised internet traffic identification based on Affinity Propagation (SAP) is presented. To avoid the problem of choosing initial points, the method uses affinity propagation clustering to construct the classification model simply and effectively. Following the idea of semi-supervised learning, a few constraints from labeled flows and the prior manifold distribution of the sample space are abstracted, and a manifold similarity is defined. The semi-supervised method can thus both greatly reduce the effort of labeling sampled flows and improve the performance of the classifier. The cohesion performance is analyzed based on the central limit theorem and Chernoff bounds. Experimental results show that the algorithm achieves 90% classification accuracy while keeping a low sum of squared errors.
     4. To address the complexity and accuracy of Affinity Propagation (AP), an improved affinity propagation clustering algorithm called Semi-supervised Affinity Propagation clustering algorithm based on Stratified Combination (SAP-SC) is devised. SAP-SC inherits and extends SAP. Using stratified clustering, the proposed algorithm partitions the whole clustering process equally into several smaller blocks. Focusing on the data that are hard to cluster, every layer employs semi-supervised learning to form pairwise constraints and map each sub-cluster to the corresponding label. To improve clustering performance, an ensemble boosting method is used to weight all layered results together. Finally, theoretical analysis and experimental results show that computational complexity is reduced by O(N^(1/2)) and the overall classification precision is raised to 98%.
     5. To address the concept drift problem of traditional machine learning identification methods, a novel algorithm called traffic classification based on Host Connection Graph (HCG) is proposed. Taking {IP Address, Port} as the unique user identifier, HCG constructs a host connection graph and introduces the concept of user similarity. Based on graph mining theory, social communities are abstracted from communications among hosts by partitioning the graph into mutually disjoint behavior clusters. To perform traffic classification, HCG defines the User Behavior Mode (UBM) to analyze implicit traffic characteristics and maps application labels to every host behavior using UBM and Port. Finally, simulations are conducted on real network traces. The results demonstrate that HCG can overcome the concept drift problem and improve computational complexity without sacrificing accuracy.
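
The user-level idea in the fifth contribution can be illustrated with a small Python sketch. It is a simplified stand-in, not the HCG algorithm itself: connected components are used here in place of the graph-mining partition, and the Shannon entropy of the port distribution inside each cluster is used as a rough proxy for an entropy-based behavior descriptor. The function names, the `(ip, port)` tuple layout, and the sample addresses are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def build_graph(flows):
    """Undirected connection graph whose nodes are (ip, port) endpoints."""
    graph = defaultdict(set)
    for src, dst in flows:
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def connected_components(graph):
    """Assumption: components stand in for disjoint behavior clusters."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


def port_entropy(component):
    """Entropy of the ports used by the endpoints of one cluster."""
    return shannon_entropy(port for _ip, port in component)


if __name__ == "__main__":
    # Hypothetical flows: two clients of one web server, plus a separate pair.
    flows = [
        (("10.0.0.1", 5000), ("10.0.0.9", 80)),
        (("10.0.0.2", 5001), ("10.0.0.9", 80)),
        (("10.0.0.3", 6000), ("10.0.0.4", 6001)),
    ]
    for cluster in connected_components(build_graph(flows)):
        print(sorted(cluster), round(port_entropy(cluster), 3))
```

Because the clusters are derived from who talks to whom rather than from per-flow statistics, a descriptor of this kind depends less on flow features that change over time, which is the intuition behind using connection structure against concept drift.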

