一种低存储消耗的超点检测算法研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
超点是在一个测量区间内链接了大量源IP(宿IP)的宿IP(源IP),实时超点检测对网络安全和管理具有重要意义。网络中的许多安全事件,如分布式拒绝服务攻击(DDoS)、蠕虫病毒、端口扫描等都具有类似的行为特征,这些攻击的共同特点是源IP(宿IP)发送或接收到大量来自不同宿IP(源IP)的链接,这些事件都属于超点检测问题。
     尽管已有一些超点检测的算法,但它们存在内存空间消耗较大或测量精度不高的问题。本文提出了一种新的检测超点的算法SuperpointTrap,它最根本的特点是内存消耗少,可以在速度快容量小的内存中运行(如SRAM)。它的有效性来自于一个新的存储数据的结构Cache。对于每个流,在Cache中只用一个比特位来记录它的信息,这个比特所在行位置由此流的源IP确定,列位置由此流的目的IP确定。当Cache中存储的流数目超过阈值时,判定该主机为超点,输出该主机的源IP,并清空它所在Cache中的信息,这样方便存储后到的报文,而且还节省了内存空间。为了进一步降低算法的错误否定率,本文还提出了两种改进算法:P-SuperpointTrap算法和BF-SuperpointTrap算法,并对以上三种算法进行了理论分析。
     本文对这三种算法采用不同数据源的Trace进行实验。实验结果表明,SuperpointTrap超点检测算法不仅节省了内存空间,还可以准确、高效地检测出超点的信息。而且通过与SuperpointTrap进行结果对比,两种改进的算法也可以进一步地降低算法的错误否定率。
A superpoint is a source IP (destination IP) that has communicated with a large number of distinct destinations (sources) during a measurement period. Detecting superpoints in real time is a meaningful work for network security and management. There has the similar behavioral characteristic in many network security incidents, e.g. distributed denial of service attacks (DDOS), worms and port scans. The common feature of these attacks is that the source IP (destination IP) will send or receive a larger number of links from distinct destinations (sources). All these source or destination IPs are the instances of superpoints.
     Although there have been some algorithms for detecting superpoints, they are not control the usage of the memory or do not deliver the desired accuracy. In this paper, we propose a new algorithm for detecting superpoints called as SuperpointTrap. The most essential advantage of SuperpointTrap is that it can work in tight memory space. Its accuracy and efficiency come from a new structure for data storage called Cache For each flow, Cache uses only one bit to record its information. The row and column of this bit are determined by the source IP and destination IP respectively. When the number of the flow in Cache is greater than the threshold, the host is considered as a superpoint. Then, this Superpoint's information is exported and the corresponding information in Cache is cleared to facilitate the following packets and to effectively save the memory consumption. To further reduce the false negative rate (FNR), we also propose two improved algorithms:P-SuperpointTrap and BF-SuperpointTrap and analyze the above three algorithms.
     In this paper, we use different data sources to test our algorithms and adopt false positive rate (FPR) and false negative rate (FNR) as our evaluation metric. The experimental results show that SuperpointTrap can not only saves memory, but also accurately and efficiently detects the superpoints. By the experimental comparison of SuperpointTrap and two improved algorithms, the two improved algorithms can further reduce the false negative rate.
引文
[1]程光,龚俭.互联网流测量[M].东南大学出版社,2008.
    [2]Roberts J W. Traffic Theory and the Internet[J]. IEEE Communications,2001,39(1): 94-99.
    [3]第28次中国互联网发展统计报告[R],中国互联网络信息中心(CNNIC),2012.
    [4]严芬等.DDoS攻击检测综述[J].计算机应用研究,2008,25(4):966-969.
    [5]Spafford EH. The Internet worm program:An analysis[R]. CSD-TR-823, West Lafayette:Department of computer science, purdue University,1988.1-29.
    [6]杨家海,吴建平.互联网络测量理论与应用[M].人民邮电出版社,2009.
    [7]Roesch M. Snort-lightweight intrusion detection for networks[A]. In Proceedings of the 13th USENIX conference on System administration[C]. USENIX Association Berkeley, CA, USA,1999:229-238.
    [8]Plonka D. Flowscan:a network traffic flow reporting and visualization tool[A]. In Proceedings of the 14th USENIX conference on System administration [C]. USENIX Association Berkeley, CA, USA,2000:305-317.
    [9]Venkataraman S, Song D, Gibbons P B et al. New Streaming Algorithms for Fast Detection of Superspreaders[A]. In Proceedings of Network and Distributed System Security Symposium[C].2005.
    [10]Zhao Q, Kumar A, Xu J. Joint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations [A]. In Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement[C]. USENIX Association Berkeley, CA, USA,2005:77-90.
    [11]Kumar A, Sung M, Xu J. Data Streaming Algorithms for Efficient and Accurate Estimation of Flow size Distribution[A]. In Proceedings of SIGMETRICS/Performance[C]. New York, USA.2004:177-188.
    [12]Kumar A, Xu J, Wang J et al. Space-Code Bloom Filter for Efficient per-flow Traffic Measurement [A]. In Proceedings of Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies[C]. Hong Kong,2004:1762-1773.
    [13]Zhang Y, Singh S, Sen S et al. Online identification of hierarchical heavy hitters: Algorithms, evaluation, and application[A]. In Proceedings of the 4th ACM SIGCOMM conference on Internet measurement[C]. New York, NY, USA,2004:101-114.
    [14]Estan C, Varghese G. New Directions in Traffic Measurement and Accounting[A]. In Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications[C]. New York, NY, USA,2002:323-336.
    [15]程光,龚俭,丁伟,吴桦,强士卿.基于自适应抽样的超点检测算法[J].中国科学E辑:信息科学2008.38(10):1679-1696.
    [16]Estan C, Varghese G, Fisk M. Bitmap algorithms for counting active flows on high speed links[A]. In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement [C]. New York, NY, USA,2003:153-166.
    [17]Bloom B H, Space/Time Trade-offs in Hash Coding with Allowable Errors[J], Communications of the ACM.1970,13(7):422-426.
    [18]Kamiyama N, Mori T, Kawahara R. Simple and adaptive identification of superspreaders by flow sampling[A]. In Proceedings of 26th IEEE International Conference on Computer Communications[C]. Anchorage, AK, USA,2007:2481-2485.
    [19]王洪波,程时瑞,林宇.高速网络超链接主机检测中的流抽样算法研究[J].电子学报,2008.36(40):809-818.
    [20]彭艳兵,龚俭,刘卫江,杨望Bloom Filter哈希空间的元素还原[J].电子学报,2006.349(5):822-827.
    [21]Liu I J, Qu W Y, Gong J et al. A novel data streaming method detecting superpoints[A]. IEEE INFOCOM2011 Workshop on Security in Computers, Networking and Communications(SCNC)[C]. Shanghai, China,2011:1042-1047.
    [22]Liu W J, Qu W Y, He X N et al. Detecting superpoints through a reversible counting Bloom filter[J]. Journal of Supercomputing, DOI 10.1007/s11227-010-0511-2.
    [23]刘卫江,景泉,白磊.利用Bloom Filter实现长流识别[J].计算机应用研究.2008,25(1):161-163.
    [24]吴桦,龚俭,杨望.一种基于双重Counter Bloom Filter的长流识别算法[J].软件学报,2010.21(5):1115-1126.
    [25]Lu Y, Wang M, Bonomi F. ElephantTrap:A low cost device for identifying large flows[A], In Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects [C]. Washington, DC, USA,2007:99-108.
    [26]Wang P H, Guan X H, Gong W B. A New Virtual Indexing Method for Measuring Host Connection Degrees [A]. In Proceeding of IEEE INFOCOM 2011 [C]. Shanghai,2011:156-160.
    [27]Li T, Chen S, Ling Y B. Fast and Compact Per-Flow Traffic Measurement through Randomized Counter Sharing [A]. In Proceedings of IEEE INFOCOM 2011 [C]. Shanghai,2011: 1799-1807.
    [28]L.Cottrell.PingERTool[EB/OL]http://www.slac.stanford.edu/xorg/icfa/ntf/tool .html.
    [29]Crovella M, Krishnamurthy B. Internet Measurement:Infrastructure, Traffic and Applications[M]. New York:John Wiley& Sons,2006.
    [30]NLANR[EB/OL]. [2006]. http://moat.nlanr.net/ISMA/Report.
    [31]Tcpdump[EB/OL]. [2006]. http://www.tcpdump.org.
    [32]KC. Claffy. Internet traffic characterization [D]. San Diego:University of California,1994.
    [33]Quittek J, Zseby T, Claise B, et al. Requirements for IP Flow Information Export [J] (IPFIX), IETF RFC 3917,2004.
    [34]Claffy K, Polyzos G C, Braun H W. Traffic Characteristics of The T1 Nsfnet Backbone[J]. proceedings IEEE INFOCOM 93, San Francisco, California, March 28 April 1,1993:885-892.
    [35]Cao J, Jin Y, Chen A et al. Identifying high cardinality internet hosts [A]. In Proceedings of IEEE INFOCOM[C]. Rio de Janeiro,2009:810-818.
    [36]Charikar M, Chen K, Faranch-Coleon M. Finding frequent items in data streams [A]. In Proceedings of the 29th International Colloquium on Automata, Languages and Programming[C]. Springer-Verlag London, UK,2002:693-703.
    [37]Demaine E, Lopez-Ortiz A, Ian-Munro J. Frequency estimation of internet packet streams with limited space[A]. In Proceedings of the 10th Annual European Symposium on Algorithms[C]. Springer-Verlag London, UK,2002:348-360.
    [38]Gibbons P B, Matias Y. New sampling-based summary statistics for improving approximate query answers[A], In Proceedings of the 1998 ACM SIGMOD international conference on Management of data[C]. New York, NY, USA,1998:331-342.
    [39]Karp R, Shenker S, Papadimitriou C. A simple algorithm for finding frequent elements in streams and bags[J]. ACM Transactions on Database Systems (TODS).2003, 28(1):51-55.
    [40]Manku G, Motwani R. Approximate frequency counts over data streams[A]. In Proceedings of the 28th international conference on Very Large Data Bases[C]. Hong Kong, China,2002:346-357.
    [41]Jain R, A Comparison of Hashing Schemes for Address Lookup in Computer Networks[J]. IEEE Transactions on Communications.1992,40(3):1570-1573.
    [42]Cao Z, Wang Z, Zegura E. Performance of Hashing-Based Schemes for Internet Load Balancing [A]. In Proceedings of the Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies[C]. Tel Aviv,2000:332-341.
    [43]程光等.基于统计分析的高速网络分布式抽样测量模型[J].计算机学报,2003,26(10):1266-1273.
    [44]Duffield N G, Grossglauser M. Trajectory Sampling for Direct Traffic Observation[J]. IEEE/ACM Trans.on Networking,2001,9(3):280-292.
    [45]Jain R, Chiu D, Hawe W. A quantitative measure of fairness and discrimination for resource allocation in shared computer systems[R]. DEC Research Report TR-301, 1987:1-38.
    [46]Bhargava R, GoeI A, meyerson A. Using approximate majorization to characterize protocol fairness[A]. In Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems[C]. New York, NY, USA, 2001:330-331.
    [47]Zhao Q, Xu J, Kumar A. Detection of super sources and destinations in high speed networks:algorithms analysis and evaluation[J]. IEEE Journal on Selected Areas in Communications,2006,24(10):1840-1852.
    [48]Singh S, Estan C, Varghese G et al. Automated worm fingerprinting[A]. In Proceedings of the 6th conference on Symposium on Operating Systems Design& Implementation[C]. USENIX Association Berkeley, CA, USA,2004.
    [49]Keys K, Moore D, Estan C. A Robust System for Accurate Real-time Summaries of Internet Traffic[A]. In Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems[C]. New York, NY, USA, 2005:85-96.
    [50]Song H, Dharmapurikar S, Turner J et al. Fast hash table lookup using extended bloom filter:an aid to network processing[A]. In Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications[C]. New York, NY, USA,2005:181-192.
    [51]Yoon M, Li T, Chen S G et al. Fit a compact spread estimator in small high-speed memory[J]. IEEE/ACM Transactions on Networking(TON),2011,19(5):1253-1264.
    [52]Whang K, Vander-Zanden B, Taylor H. A Linear Time Probabilistic Counting Algorithm for Database Applications[J]. ACM Transcctions on Database Systems,1990, 15(2):208-229.
    [53]茆诗松,程依明,濮小龙.概率论与数理统计教程[M].北京:高等教育出版社,2004.
    [54]L. Fan, P. Cao, J. Ameida, and A. Z. Broder, Summary Cache:Scalable Wide-Area Web Cache Sharing Protocol[J], IEEE/ACM Trans.on Networking,2000,8(3):1621-1636.
    [55]SRAM[EB/OL]. http://www.cypress.com/

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700