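Research on Detection Schemes and Algorithms for HTTP-Flooding Attacks

English title: Research on Detection Scheme and Algorithm for HTTP-flooding Attack
Author: Wang Jin (王进)
Degree: Doctoral (Ph.D.)
Discipline: Communication and Information Systems
Chinese keywords (translated): IP network; HTTP flooding; large deviation; clustering
English keywords: IP network; HTTP-flooding; large deviation; cluster
Degree year: 2013
Supervisor: Long Keping (隆克平)
Discipline code: 081001
Degree-granting institution: University of Electronic Science and Technology of China
Submission date: 2013-09-10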

Abstract

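As Web services continue to grow in popularity, their security has drawn close attention from both academia and industry. The HTTP-flooding attack is a new class of distributed denial-of-service (DDoS) attack: by imitating the page-browsing behavior of normal users, it sends large numbers of HTTP GET requests to a target website in order to exhaust the CPU, memory, and other resources of the target's servers, paralyzing the Web service and cutting off access for legitimate users. HTTP-flooding attacks pose a serious challenge to the survivability of Web services and are currently an important security problem for them. Because they are highly stealthy and powerful, HTTP-flooding attacks are difficult to detect, and effective detection and defense methods are still lacking. On the one hand, compared with bandwidth-flooding DDoS attacks, HTTP-flooding attacks generate relatively little traffic and usually do not cause traffic anomalies on the network links serving the victim. On the other hand, compared with TCP SYN-flooding DDoS attacks, HTTP-flooding attack sessions have TCP statistics very similar to those of normal users (for example, the statistical distribution of the different types of TCP packets), so they do not cause anomalies in the server-side TCP packet statistics. Because HTTP-flooding attacks effectively evade existing detection methods, attackers use them more and more often. Little research has addressed HTTP-flooding attacks so far, and most existing work suffers from low detection performance, overly complex algorithms, or poor stability; HTTP flooding therefore remains an open problem. Centered on the core problem of HTTP-flooding detection and drawing on methods from statistical learning, this dissertation studies detection schemes and algorithms for HTTP-flooding attacks from three aspects: quantifying the features of Web users' access behavior, designing detection schemes, and the robustness of those schemes.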
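     This dissertation first examines the features of Web users' access behavior. Focusing on two webpage-semantic features, access-topic popularity and access-logic correlation, it proposes a method based on large-deviation statistics for quantitatively analyzing the webpage-semantic features of Web access behavior. The method effectively quantifies how different Web users differ in these two features and lays the foundation for the subsequent work. Second, centered on access-topic popularity, it designs a new HTTP-flooding detection scheme and algorithm and validates them against several different types of HTTP-flooding attack models. Finally, addressing the reliability of HTTP-flooding detection schemes built on normal users' access behavior, it analyzes how web-crawling logs in the training dataset affect such schemes; with access-topic popularity at its core, the dissertation further proposes an HTTP-flooding detection scheme and algorithm that tolerate noise in the training dataset.
     Specifically, this dissertation carries out research in the following areas: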
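     1. Web user access-behavior features based on webpage semantics, and methods for quantifying them
     Studying the features of Web users' access behavior and how to quantify them is the basis of HTTP-flooding detection: these features characterize the behavioral differences between Web users and are the key to identifying HTTP-flooding attacks effectively. Typical features used in existing detection schemes, such as inter-request intervals and request rates, are easily imitated by attackers, which causes those schemes to fail, so new Web access-behavior features for HTTP-flooding detection are urgently needed. Building on existing research on Web access behavior, this dissertation focuses on two webpage-semantic features, access-topic popularity and access-logic correlation, and studies methods that can effectively quantify behavioral differences between Web users. It uses large-deviation statistics to quantify how Web users differ in access-topic popularity and access-logic correlation, establishes a large-deviation-based framework for quantifying the webpage-semantic behavioral features of Web users, and makes a preliminary analysis of how normal user sessions differ from some common HTTP-flooding attacks in these features, laying the foundation for subsequent HTTP-flooding detection.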
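     The abstract does not give the exact statistic the dissertation uses. As a hedged illustration only, the sketch below assumes the standard large-deviation route via Sanov's theorem: if a session's n requests were drawn i.i.d. from the site-wide popularity distribution P, the chance that its empirical distribution looks like Q decays roughly as exp(-n * D(Q || P)), so n * D(Q || P) can serve as a deviation score. Treating individual pages as "topics", the probability floor for unseen pages, and all function names are assumptions, not the author's method.

```python
import math
from collections import Counter


def empirical_distribution(requests):
    """Empirical distribution Q of the pages (or topics) requested in one session."""
    counts = Counter(requests)
    n = len(requests)
    return {page: c / n for page, c in counts.items()}


def kl_divergence(q, p, floor=1e-9):
    """Relative entropy D(q || p) in nats.

    Pages missing from p get a small floor probability (an assumption)
    so that the score stays finite.
    """
    return sum(
        qi * math.log(qi / max(p.get(page, 0.0), floor))
        for page, qi in q.items()
        if qi > 0
    )


def deviation_score(requests, global_popularity):
    """Large-deviation exponent n * D(Q_n || P); larger means less likely under P."""
    if not requests:
        return 0.0  # an empty session carries no evidence either way
    q = empirical_distribution(requests)
    return len(requests) * kl_divergence(q, global_popularity)
```

     With a global popularity of {"/": 0.5, "/news": 0.3, "/about": 0.2}, a session that requests only /about five times scores 5 * ln 5 (about 8.05), while the mixed session ["/", "/news", "/", "/about", "/"] scores about 0.14, so the repetitive session stands out even though its request rate could be identical.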
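     2. HTTP-flooding detection schemes and algorithms based on users' access-topic popularity
     Centered on users' access-topic popularity, this part designs detection schemes and algorithms capable of detecting several different types of HTTP-flooding attacks. The global webpage click-rate distribution is the basis for quantifying Web users' access-topic popularity; it measures the latest popularity trends of different webpage topics. Because webpage content usually changes over time and detection models inherently lag, the global click-rate distribution changes dynamically. Estimating this distribution accurately and in real time is the key to quantifying access-topic popularity and an important problem that HTTP-flooding detection methods must solve. To address it, the dissertation studies methods for dynamically estimating the global webpage click-rate distribution and proposes an estimation algorithm based on the Exponentially Weighted Moving Average (EWMA) statistic. The algorithm combines the website's historical global click-rate distribution with the targets of current user requests to update the current distribution dynamically; the update algorithm is further corrected to reverse and reduce the influence of malicious attackers on the website's global click-rate distribution.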
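     A minimal sketch of an EWMA click-rate update, assuming the distribution is a dict from page to probability and that each finished session contributes its empirical page distribution with smoothing weight alpha. The rollback function is a hypothetical illustration of reversing a session that is flagged only after it has already been folded in (the detection-lag problem); it is an exact inverse only for the most recent update and an approximation otherwise. Names and the default alpha are assumptions, not the dissertation's algorithm.

```python
from collections import Counter


def session_distribution(requests):
    """Empirical page distribution of one (non-empty) session."""
    counts = Counter(requests)
    return {page: c / len(requests) for page, c in counts.items()}


def ewma_update(popularity, requests, alpha=0.05):
    """p_new(page) = (1 - alpha) * p_old(page) + alpha * q_session(page)."""
    q = session_distribution(requests)
    pages = set(popularity) | set(q)
    return {
        page: (1 - alpha) * popularity.get(page, 0.0) + alpha * q.get(page, 0.0)
        for page in pages
    }


def ewma_rollback(popularity, requests, alpha=0.05, age=0):
    """Hypothetical correction: strip a flagged session's residual weight, then renormalize.

    age is the number of EWMA updates applied after the flagged session, so its
    remaining weight is alpha * (1 - alpha) ** age. With age == 0 this exactly
    inverts ewma_update; with age > 0 it is only an approximation.
    """
    q = session_distribution(requests)
    weight = alpha * (1 - alpha) ** age
    return {
        page: max(0.0, p - weight * q.get(page, 0.0)) / (1 - weight)
        for page, p in popularity.items()
    }
```

     EWMA keeps only one probability per page and adapts to content changes; a larger alpha tracks new trends faster but also lets undetected attack sessions shift the distribution more, which is what the rollback step tries to offset.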
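     3. Robustness of HTTP-flooding detection schemes
     The accuracy of the training data is an important issue for detection methods based on normal users' access behavior and a key factor in their detection performance. Web access logs are the main data source for HTTP-flooding detection schemes, and they usually contain web-crawling logs. Our analysis shows that the differences between web-crawling behavior and normal user access behavior make the resulting detection baseline inaccurate and severely degrade detection performance. Taking Web users' access-topic popularity and access session length as the main features, this dissertation analyzes the joint feature distribution of normal users' access behavior and, on that basis, builds an HTTP-flooding detection scheme that tolerates web-crawling behavior.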
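     One way to picture a crawler-tolerant reference profile is a DBSCAN-style density filter over per-session feature vectors, here assumed to be (access-topic popularity score, session length), both pre-scaled to comparable ranges. The sketch assumes crawler sessions fall in low-density regions of this joint feature space, so only "core" sessions with enough neighbours within eps are kept as the reference profile. eps, min_pts, and all function names are hypothetical, not values or code from the dissertation.

```python
import math


def neighbours(points, i, eps):
    """Indices of training sessions within distance eps of points[i], excluding i itself."""
    return [j for j, q in enumerate(points) if j != i and math.dist(points[i], q) <= eps]


def reference_profile(points, eps, min_pts):
    """Keep DBSCAN-style core sessions: at least min_pts other sessions within eps."""
    return [p for i, p in enumerate(points) if len(neighbours(points, i, eps)) >= min_pts]


def is_suspicious(session_features, profile, eps):
    """Flag a session whose feature vector is farther than eps from every core session."""
    return all(math.dist(session_features, c) > eps for c in profile)
```

     Sparse crawler traces in the noisy log never become core points, so they do not enter the baseline. The pairwise scan is O(n^2); large logs would need a spatial index.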
With Web services becoming more and more popular, Web security attracts increasing attention from academia and industry. HTTP-flooding is a new kind of Distributed-Denial-of-Service (DDoS) attack. It imitates normal web-surfing behavior, sending a large number of legitimate HTTP GET requests to the victim with the aim of exhausting the victim's precious resources (e.g., CPU and memory) and paralyzing its web services. HTTP-flooding attacks seriously challenge the survivability of web applications. Owing to their stealthy attacking behavior, HTTP-flooding attacks are difficult to detect. On one hand, compared with the tremendous traffic of bandwidth-flooding attacks (e.g., an average traffic of 162 Mbps), the low traffic of HTTP-flooding (e.g., 10 Mbps) usually does not cause traffic anomalies. On the other hand, unlike the bogus TCP connections of SYN-flooding, the genuine TCP connections of HTTP-flooding attacks do not significantly change the statistics of TCP SYN packets. Even worse, HTTP-flooding attackers can generate HTTP GET requests just as normal web surfers do. Thus, HTTP-flooding attacks are much harder to detect than other DDoS attacks and can evade the detection approaches designed for bandwidth-flooding and TCP SYN-flooding DDoS. Most existing detection schemes have poor detection performance, so HTTP-flooding remains an open problem. This dissertation focuses on HTTP-flooding and detects HTTP-flooding attacks with statistical learning methods.
     This dissertation first proposes a novel method to efficiently quantify web-surfing preference and surfing semantics. Based on the consistency between an individual's temporal surfing preference and the overall webpage popularity, it analyzes personal surfing differences and detects HTTP-flooding attackers by their behavioral differences. Furthermore, to handle web-crawling traces in the training phase, it associates more surfing features and builds the reference surfing profile according to the distribution density. Specifically, this dissertation studies HTTP-flooding attacks from the following aspects:
     1. Studying the quantification of individual web surfing differences
     Quantifying individual web-surfing differences is critical to HTTP-flooding detection, and selecting appropriate surfing features is the key to quantifying those differences efficiently. Using surfing preference and surfing semantics, this dissertation analyzes the consistency between individual surfing behavior and the corresponding features of the website, and builds a quantification framework based on the large deviation principle. It then makes a preliminary analysis of the surfing differences between normal users and some simple HTTP-flooding attacks.
     2. Detecting HTTP-flooding attacks with individual surfing differences
     Taking surfing preference as the main feature, this dissertation studies surfing-preference-based HTTP-flooding detection. Webpage popularity is the basis for quantifying web-surfing preference, and computing it accurately is the key problem for surfing-preference-based detection. On one hand, because webpage content is updated, webpage popularity changes dynamically. On the other hand, because of detection lag, attack sessions that have not yet been detected take part in updating webpage popularity, biasing it and further degrading detection performance. To address these problems, this dissertation studies how to update webpage popularity dynamically.
     3. Studying web-crawling-behavior-tolerant HTTP-flooding detection
     The accuracy of the training dataset is the key factor determining the performance of detection schemes based on normal web-surfing behavior. Web surfing logs are the main dataset for HTTP-flooding detection and usually include some web-crawling traces, which can degrade the detection of HTTP-flooding attacks. To handle web-crawling traces in the training phase, this dissertation studies an HTTP-flooding detection scheme based on the distribution density of joint features. It builds the reference surfing profile from the noisy web logs and detects HTTP-flooding attacks by comparing each session's surfing profile with the reference surfing profile.

