基于布鲁姆过滤器的IP骨干网流量分析前端处理算法研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
本文结合国家863计划“十一五”重大项目“新一代高可信网络”总体技术相关课题的研究需求,重点研究高速骨干网流量分析的前端处理算法及其工程实现技术。以布鲁姆过滤器为切入点,本文的研究工作包括三个方面:第一,布鲁姆过滤器的研究与改进;第二,基于布鲁姆过滤器的前端处理算法的设计;第三,高速骨干网业务流实时分类前端系统的设计与实现技术。
     首先,本文对现有的三种低计算复杂度的计数型布鲁姆过滤器(Counting Bloom Filter, CBF)进行了分析比较,并提出了改进。对Na?ve Counting Bloom Filte(rNCBF)、Space-Code Bloom Filter(SCBF)和d-left Counting Bloom Filter(dlCBF)进行了深入分析,给出了其参数最优设置准则。采用计数误差、空间复杂度和负载适应性为性能指标,对上述三种CBF的性能进行了系统比较。发现虽然就计数误差和空间复杂度而言,dlCBF是三种CBF中最优的,但dlCBF在负载适应性方面却存在缺陷。对dlCBF进行了改进,提出了一种具有良好负载适应性的计数型布鲁姆过滤器BSdlCBF(Binary-Shrinking d-left Counting Bloom Filter)。通过仿真实验,将BSdlCBF和dlCBF、NCBF以及SCBF进行了比较,结果表明BSdlCBF的性能明显优于已有的三种计数型布鲁姆过滤器。
     其次,本文将BSdlCBF应用于前端处理算法的设计。基于BSdlCBF,提出了一种新的骨干网数据流流量测量算法MR-BSdlCBF(Multi-Resolution BSdlCBF),与已有的同类流量测量算法MRSCBF(Multi-Resolution SCBF)相比,MR-BSdlCBF算法的优势是负载适应性好,空间复杂度低,并可记录流标识。在MR-BSdlCBF算法基础上,本文最终提出了一种空间高效的数据包公平抽样算法SEFS(Space-Efficient Fair Sampling),SEFS算法不仅空间复杂度低,而且对于短流的抽样性能明显优于已有的公平抽样算法。SEFS算法较低的空间复杂度使之易于以IP核(Intellectual Property Core)的形式集成到网络设备中去。
     最后,本文实现了骨干网业务流实时分类前端系统。提出了骨干网业务流实时分类系统的前后端分离的系统结构。该系统结构的优点是消除了前后端的紧耦合,从而增强了系统实现的灵活性,提高了业务流分类的精度,降低了骨干网业务流实时分类的实现代价。基于这种系统结构,本文给出了骨干网业务流实时分类前端系统的硬件实现方法,并详细讨论了基于FPGA(Field Programmable Gate Array:现场可编程门阵列)的SEFS算法的实现技术。本文所实现的骨干网业务流实时分类前端系统已经在国家急需的“一种新型互联网内容监管系统”中得到了应用。
According to the fundamental technique research task of the“New Generation Network with High Trustability”project of the National High-Tech Research and Development Program of China (863 Program), this thesis studied the design and implementation of front end processing algorithms for traffic analysis system on high-speed backbone networks. Starting from the research on Bloom filters, this thesis then proposed front end processing algorithms based on an improved Bloom filter. The design and implementation of a front end processing system for real-time backbone traffic classification was presented finally.
     Firstly, this thesis proposed a new Counting Bloom filter (CBF) based on in-depth analysis and thorough comparison of three existing CBF schemes with low computational complexity, that is, the Na?ve Counting Bloom Filter (NCBF), the Space-Code Bloom Filter (SCBF) and the d-left Counting Bloom Filter (dlCBF). The optimal parameter setting rules for the three CBF schemes were presented through in-depth analysis. The performance of the three CBF schemes was compared using metrics of counting error, space complexity and load adaptability. The results showed that dlCBF outperforms NCBF and SCBF in terms of both counting error and space complexity. However, the load adaptability of dlCBF is very limited. Then, a new CBF schema called Binary Shrinking d-left Counting Bloom Filter (BSdlCBF) is proposed. The experimental results show that our proposal BSdlCBF outperforms the existing three CBF schemes in terms of counting error, space complexity and load adaptability.
     Then, this thesis presented front end processing algorithms based on BSdlCBF. A new per-flow traffic measurement algorithm named Multi-Resolution BSdlCBF (MR-BSdlCBF) was proposed. MR-BSdlCBF has better load adaptability and lower space complexity than its same category competitor, that is, the Multi-Resolution SCBF (MRSCBF). Another advantage of MR-BSdlCBF is that it can support flow label recording. Furthermore, this thesis also proposed a Space-Efficient Fair Sampling (SEFS) algorithm based on MR-BSdlCBF. SEFS has not only lower space complexity but also better sampling performance for short flows than existing fair sampling algorithm. It is feasible for network equipments to integrate SEFS as an IP core due to its high space efficiency.
     Lastly, this thesis implemented a front end processing system for real-time traffic classification on high-speed backbone networks. By decomposing the system into front end and back end, this thesis proposed a new architecture for real-time backbone traffic classification system. The benefit of this system architecture is that it eliminates the coupling between front end and back end, thus a more efficient and flexible system implementation can be achieved. Based on such architecture, the hardware implementation of the front end system for real-time backbone traffic classification is presented. Meanwhile, the implementation technique of SEFS algorithm using FPGA is discussed in detail. The front end processing system has been applied in“N ew Generation Internet Content Inspecting System”successfully.
引文
[1]中国互联网信息中心.第6次中国互联网络发展状况调查[EB/OL].2000-7, http://www.cnnic.cn/index/0E/00/11/index.htm.
    [2]中国互联网信息中心.第21次中国互联网络发展状况调查[EB/OL].2008-1, http://www.cnnic.cn/index/0E/00/11/index.htm.
    [3]科技部. 863计划新一代高可信网络重大项目申请指南[EB/OL].2007-11-08. http://www.edu.cn/xin_wen_gong_gao_1114/20071108/t20071108_264142.shtml.
    [4] Danny McPherson. SP Infrastructure Security Survey Results [EB/OL]. 2005, http:// www.arbornetworks.com.
    [5] V. Paxson, K. Asanovic, et al. Rethinking Hardware Support for Network Analysis and Intrusion Prevention[C]. In: Proceedings of the 1st USENIX Workshop on Hot Topics in Security, Seattle, 2006, 187-201.
    [6]周明中.大规模网络IP流行为特性及其测量算法研究[D].南京:东南大学博士学位论文, 2006.8.
    [7] R. Sommer and A. Feldmann. NetFlow: Information loss or win[C].In: ACM SIGCOMM Internet Measurement Workshop, Sacramento, 2002, 211-219.
    [8] IETF. Packet Sampling (PSAMP) Working Group [EB/OL]. http://www.ietf.org/ html.charters/psamp-charter.html.
    [9] IETF. IP Flow Information Export (IPFIX) Working Group [EB/OL]. http://www. ietf.org/html.charters/ipfix-charter.html.
    [10] Cisco System. NetFlow Services Solutions Guide [EB/OL]. http://www.cisco.com/ en/US/docs/ios/solutions_docs/netflow/nfwhite.html.
    [11] Daniela Brauckhoff, Bernhard Tellenbach. Impact of Packet Sampling on Anomaly Detection Metrics[C]. In ACM Internet Measurement Conference, Atlanta, 2006, 42-54.
    [12] Jianning Mai, Chen-Nee Chua. Is Sampled Data Sufficient for Anomaly Detection[C]. In ACM Internet Measurement Conference, Concord, 2006, 187-201.
    [13] Adam Lavitt Kirsch. Hash-Based Data Structures for Extreme Conditions [D]. PHD Dissertation of Harvard University, 2008.5.
    [14] Xilinx. Virtex-5 Family Overview [EB/OL]. http://www.xilinx.com/support/ documentation/.
    [15] IDT. SRAMs[EB/OL]. http://www.idt.com/?catID=58743&source=memory_app.
    [16] Samsung. High Speed SRAM [EB/OL]. http://www.samsung.com/global/business/ semiconductor/products/sram/Products_HighSpeedSRAM.html.
    [17] Cypress. SRAM [EB/OL]. http://www.cypress.com/products/?gid=5&fid=39& GoGatewayCategoryID=All&
    [18] Sarang Dharmapurikar. Algorithms and Architectures for Network Search Processors[D], PHD dissertation of Washington University, 2006.8.
    [19] A. Broder and M. Mitzenmacher. Network applications of Bloom filters: A survey [J].Internet Mathematics, 2004, 1(4):485-509.
    [20] C. Estan. New Directions in Traffic Measurement and Accounting[C]. In: Proc. of ACM Sigcomm, Oklahoma City, 2002, 562-574.
    [21] Abhishek Kumar Jun (Jim) Xu. Space-Code Bloom Filter for Efficient Per-Flow Traffic Measurement[C]. In: Proc. of IEEE Infocom’04, Hongkong, 2004, 315-328.
    [22] F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese. Beyond Bloom filters: From approximate membership checks to approximate state machines[C]. In: Proc. of ACM SIGCOMM, Providence, 2006, 342-356.
    [23] K.C.Claffy, George C. polyzos, Hans-Werner Braun. Application of sampling methodologies to network traffic characterization[C]. In: Proc. of ACM Sigcomm 1993, Madison, 1993, 267-280.
    [24] Nick Duffield, Carsten Lund, Mikkel Thorup. Properties and Prediction of Flow Statistics from Sampled Packet Streams[C]. In Proc. of ACM Internet Measurement Conference, Nashville, 2002, 67-80.
    [25] Nicolas Hohn, Darryl Veitch. Inverting Sampled Traffic[C]. In: Proc. of ACM Internet Measurement Conference, Tokyo, 2003, 45-54.
    [26] Cristian Estan, Ken Keys, David Moore, George Varghese. Building a Better NetFlow[C]. In Proc. of ACM Sigcomm, Hague, 2006, 432-445.
    [27] Jack Drobisz, Kenneth J. Christensen. Adaptive Sampling Methods to Determine Network Traffic Statistics including the Hurst Parameter[C]. In: Proc. of IEEE 23rd Annual Conference on Local Computer Networks, Georgia, 1998, 563-572.
    [28] Baek-Young Choi, Jaesung Park, Zhi-Li Zhang. Adaptive Random Sampling for Traffic Load Measurement[C]. In: Proc. of IEEE ICC’03, Seoul, 2003, 892-905.
    [29]王俊峰,杨建华,周虹霞,谢高岗,周明天.网络测量中自适应数据采集方法[J].软件学报, ,2004: 15(8), 23-35.
    [30] N. G. Duffield, Matthias Grossglauser. Trajectory Sampling for Direct Traffic Observation [J]. IEEE/ACM Transactions on Networking, 2001, 9(3): 1232-1244.
    [31]程光,龚俭,丁伟.基于分组标识的网络流量抽样测量模型[J].电子学报, 2002, 30(12A): 89-93.
    [32] Abhishek Kumar, Jun (Jim) Xu. Sketch Guided Sampling -- Using On-Line Estimates of Flow Size for Adaptive Data Collection[C]. In: Proc. of IEEE Infocom’06, Barcelona, 2005, 467-482.
    [33] Shah, D., Iyer, S., Prahhakar, B., McKeown, N. Maintaining Statistics Counters in Router Line Cards [J]. IEEE Micro, 2002, 22(1): 76-81.
    [34] S. Ramabhadran and G. Varghese. Efficient Implementation of A Statistics Counter Architecture[C]. In: Proc. of ACM SIGMETRICS, Taipei, 2003, 345-355.
    [35] Qi Zhao, Jun Xu, Zhen Liu. Design of a Novel Statistics Counter Architecture with Optimal Space and Time Efficiency[C]. In: Proc. of ACM SIGMETRICS, Manila, 2006, 567-580.
    [36] Chengchen Hu, Sheng Wang, et al. Accurate and Efficient Traffic Monitoring UsingAdaptive Non-linear Sampling Method[C]. In: Porc. of IEEE Infocom’08, Phoenix, AZ, USA, 278-292.
    [37] Shijin Kong, Tao He, Xiaoxin Shao, Xing Li. Time-out Bloom Filter: A New Sampling Method for Recording More Flows[C]. Lecture Notes in Computer Science, 2006, 3961: 590-599.
    [38] Fang, W. Peterson, L. Inter-AS traffic patterns and their implications[C]. In: Proc. of IEEE GLOBECOM, Boston, 1999, 902-1005.
    [39] Ramana Rao Kompella, Cristian Estan. The Power of Slicing in Internet Flow Measurement[C]. In ACM Internet Measurement Conference, Michigan, 2005, 422-435.
    [40] Feldmann, A., Greenberg, A., et al. Deriving traffic demands for operational IP networks: methodologyand experience[J]. IEEE/ACM Transactions on Networking, 2001, 9(3): 265-279.
    [41] Frederic Raspall, Sebastia Sallent and Josep Yufera. Shared State Sampling[C]. In Proc. of ACM Internet Measurement Conference, Nevada, 2006, 145-155.
    [42] Kodialam, M., Lakshman, T. V., and Mohanty, S. Runs bAsed Traffic Estimator (RATE): A simple, Memory Efficient Scheme for Per-Flow Rate Estimation[C]. In Proc. of Proceedings of INFOCOM, Hongkong, 2004, 542-556.
    [43] Hao, F., Kodialam, M., and Lakshman, T. V. ACCEL-RATE: a faster mechanism for memory efficient per-flow traffic estimation[C]. In: Proc. of ACM SIGMETRICS, Missouri, 2004, 242-253.
    [44] Fang Hao, Murali Kodialam, T.V. Lakshman, Hui Zhang. Fast, Memory-Efficient Traffic Estimation by Coincidence Counting[C]. In: Proc. of IEEE Infocom’05, Miami, 2005, 432-445.
    [45] Rade Stanojevi. Scalable Heavy-hitter Identification[EB/OL]. http://www.hamilton.ie/ person/rade/ScalableHH.pdf.
    [46] Zhou Mingzhong , Gong Jian, Ding Wei. A New Algorithm for Long Flows Statistics: MGCBF[C]. In: Proc. of the International Conference on Computational Science 2006. LNCS 3994.112– 119.
    [47]王风宇,云晓春,王晓峰,王勇.高速网络监控中大流量对象的提取[J].软件学报, 2007, 18(12): 3060-3070.
    [48] A. Kumar, M. Sung, J. Xu, and J. Wang. Data Streaming Algorithms for Efficient and Accurate Estimation of Flow Size Distribution[C]. In: Proc. of ACM SIGMETRICS’04, Hongkong, 2004, 411-423.
    [49] T. Karagiannis, K. Papagiannaki, and M. Faloutsos. BLINC: multilevel traffic classification in the dark[C]. In: Proc. of SIGCOMM’05, Philadelphia, PA, USA, 2005, 229–240.
    [50] B. H. Bloom. Space/time Trade-offs in Hash Coding with Allowable Errors [J]. Communications of the ACM, 13(7):422–426, 1970.
    [51] Cohen S and Matias Y. Spectral Bloom Filters[C]. In Proceedings of the ACM SIGMOD, San Diego, California, USA, 2003, 224-236.
    [52] F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese. An ImprovedConstruction for Counting Bloom Filters[C]. European Symposium on Algorithms, Sydney, 2006, 678-680.
    [53] L. Fan, P. Cao, J. Almeida, and A. Z. Broder. Summary Cache: a Scalable Wide-area Web Cache Sharing Protocol [J]. IEEE/ACM Transactions on Networking, 2000, 8(3):281-293.
    [54] A. Pagh, R. Pagh, and S. Rao. An Optimal Bloom Filter Replacement[C]. In: Proc. of the Sixteenth Annual ACM-SIAM Workshop on Discrete Algorithms, Maryland, 2005, 823-829.
    [55] B. Vocking. How Asymmetry Helps Load Balancing[J]. Journal of the ACM on Computer, 2003, 50(4): 568-589.
    [56] NLANR. Passive Measurement and Analysis (PMA) [EB/OL]. http://pma.nlanr.net.
    [57] Network Processing Forum [EB/OL]. http://www.npforum.org/.
    [58] Intel. Intel Network Processors [EB/OL]. http://www.intel.com/design/network/ products/npfamily/index.htm.
    [59] AMCC. AMCC Network Processors [EB/OL]. http://www.amcc.com/.
    [60]汪斌强.可重构路由器构件组研制[R].国家高技术研究发展计划(863计划)项目课题申请书.郑州:国家数字交换系统工程技术研究中心, 2008.6.
    [61] A. Broder and M. Mitzenmacher. Using Multiple Hash Functions to Improve IP Lookups[C]. In: Proc. of IEEE Infocom, Anchorage, 2001, 343-355.
    [62] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese. Bloom Filters via d-Left Hashing and Dynamic Bit Reassignment (Extended Abstract) [R]. Paris: Euorp-NGI, 2006.
    [63]程光,龚俭,丁伟,徐加羚.面向IP流测量的哈希算法研究[J].软件学报, 2005, 16(5): 56-64.
    [64] Nick Duffield, Carsten Lund, Mikkel Thorup. Optimal Combine of Flow Traffic Measurement Results[C]. In: Proceedings of the 5th ACM SIGCOMM conference on Internet Measurem, Harrisburg, 2005, 423-436.
    [65]程光,龚俭,丁伟.基于抽样测量的高速网络实时异常检测模型[J],软件学报, 2003, 14(3):35-44.
    [66] Subhabrata Sen, Oliver Spatscheck, Dongmei Wang. Accurate, Scalable In-Network Identification of P2P Traffic Using Application Signatures[C]. In: Proc. of WWW'04, New York, USA, 2004, 326-332.
    [67] A. W. Moore and D. Zuev. Internet traffic classification using bayesian analysis techniques[C]. In: Proc. of SIGMETRICS, Alberta, Canada, 2005, 50–60.
    [68] Xilinx. Virtex-4 Family Overview [EB/OL]. http://www.xilinx.com/support/ documentation/.
    [69] M. Mitzenmacher and B. Vocking. The asymptotics of selecting the shortest of two, improved [J]. In Analytic Methods in Applied Probability: In Memory of Fridrikh Karpelevich, edited by Y. Suhov, American Mathematical Society, 2003, 187-200.
    [70] M. Mitzenmacher. Studying Balanced Allocations with Differential Equations [J].Combinatorics, Probability and Computing, 1999, 8(5): 473 - 482,.
    [71]徐士良. C常用算法程序集(第三版)[M].北京:清华大学出版社, 2004.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700