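Research on Intrusion Detection Methods Based on Support Vector Machines and Bayesian Analysis

Official English title: Research on the Techniques of the Intrusion Detection Based on SVM and Bayesian Analysis
Author: Wu Shuyue (邬书跃)
Degree level: Doctoral
Discipline: Computer Application Technology
Keywords: intrusion detection; machine learning; Bayesian analysis; support vector machine (SVM)
Degree year: 2012
Supervisor: Fan Xiaoping (樊晓平)
Discipline code: 081203
Degree-granting institution: Central South University (中南大学)
Submission date: 2012-05-01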

Abstract

Intrusion detection is a network information security technique for detecting intrusions in computer network systems. In response to the development trends and application demands of intrusion detection, this dissertation studies key intrusion detection methods based on support vector machines (SVM) and Bayesian analysis, addressing the pressing need for both higher detection accuracy and higher detection speed. The main research work and contributions are as follows.
     (1) An SVM co-training model with a mutation factor is proposed for intrusion detection when only a small number of labeled samples is available. The model makes full use of large amounts of unlabeled data: two classifiers are trained iteratively on each other's detection results, which improves the accuracy and stability of the detection algorithm. A mutation factor introduced between co-training iterations reduces the risk that overfitting degrades the training effect. Simulation experiments show that the detection accuracy is 7.72% higher than that of the traditional SVM algorithm, and that the method depends relatively little on the particular training and test datasets.
     (2) An SVM Tri-training method is proposed for intrusion detection with a small number of labeled samples. It also exploits large amounts of unlabeled data, iteratively training three classifiers on one another's detection results. Because it does not require cross-validation, it has a wider range of application, and it achieves higher accuracy. Simulation experiments show that its detection accuracy is 2.1% higher than that of SVM Co-training, and that its advantage grows as the number of iterations increases.
     (3) Anomaly-based intrusion detection systems are often limited by their inability to classify the attacks they detect, so attack classification has drawn considerable attention from security researchers. An efficient attack classification model composed of three interacting components is proposed to classify the attacks detected by an intrusion detection system automatically and systematically; the classifier is trained with an improved Bayesian analysis technique. Simulation results show that the model achieves considerable improvements in both resource usage and attack classification accuracy.
     (4) Intrusion detection systems in high-speed networks commonly suffer from an imbalance between performance and accuracy. To address it, a two-layer model is proposed that identifies and filters out P2P traffic, which makes up a large share of network traffic, before detection. The model combines a Bayesian network algorithm that identifies P2P traffic from the intra-flow traffic features of a single flow with an SVM algorithm that identifies it from behavioral features across multiple flows. Simulation experiments show that, compared with traditional traffic-feature-based identification techniques, detection accuracy improves by 5.4% with good stability.
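
The two-classifier co-training loop of contribution (1) can be sketched as follows. This is a minimal sketch under stated assumptions, not the dissertation's implementation: to stay dependency-free, a nearest-centroid classifier stands in for the SVM; the two views are simply the first and second halves of each feature vector; newly pseudo-labeled samples go into one labeled pool shared by both views; and the mutation step (randomly deferring a fraction of each round's pseudo-labels back to the unlabeled pool) is a hypothetical interpretation, since the abstract does not define the mutation factor. All function names and parameters are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple

Model = Dict[int, List[float]]  # class label -> centroid


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[int]) -> Model:
    """Nearest-centroid stand-in for an SVM: one mean vector per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(model: Model, x: Sequence[float]) -> Tuple[int, float]:
    """Return (label, confidence); confidence is the squared-distance gap to the runner-up class."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(x, c)), label) for label, c in model.items())
    gap = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], gap


def co_train(labeled, labels, unlabeled, split, rounds=5, per_round=2,
             mutation_rate=0.1, seed=0):
    """Two-view co-training; the hypothetical mutation defers some pseudo-labels to later rounds."""
    rng = random.Random(seed)
    L = [list(x) for x in labeled]
    Y = list(labels)
    U = [list(x) for x in unlabeled]
    views = (lambda x: x[:split], lambda x: x[split:])
    for _ in range(rounds):
        if not U:
            break
        picked: Dict[int, int] = {}
        for view in views:
            model = fit_centroids([view(x) for x in L], Y)
            scored = sorted(((predict(model, view(x)), i) for i, x in enumerate(U)),
                            key=lambda t: -t[0][1])
            # Each view contributes its most confident unlabeled samples.
            for (label, _), i in scored[:per_round]:
                picked.setdefault(i, label)
        # Pop in descending index order so the remaining indices stay valid.
        for i in sorted(picked, reverse=True):
            x = U.pop(i)
            if rng.random() < mutation_rate:
                U.append(x)  # mutation (assumed): return the sample to the unlabeled pool
                continue
            L.append(x)
            Y.append(picked[i])
    return tuple(fit_centroids([view(x) for x in L], Y) for view in views)
```

Deferring a sample rather than flipping its label is the more conservative of the possible readings of a "mutation": it slows the accumulation of confidently wrong pseudo-labels without injecting deliberate label noise.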
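
Contribution (2) replaces the two views with three classifiers. The sketch below follows the basic tri-training scheme of Zhou and Li, in which each classifier is retrained on the labeled data plus the unlabeled samples on which the other two agree, and the final decision is a majority vote. It is a simplification under stated assumptions: the error-rate conditions that the original tri-training algorithm uses to decide whether to accept a round's pseudo-labels are omitted, a nearest-centroid classifier again stands in for the SVM, and the three initial classifiers are diversified with a per-class (stratified) bootstrap so that every class stays present. Function names are illustrative.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

Model = Dict[int, List[float]]  # class label -> centroid


def fit(X: Sequence[Sequence[float]], y: Sequence[int]) -> Model:
    """Nearest-centroid stand-in for an SVM."""
    sums: Dict[int, List[float]] = {}
    counts = Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(model: Model, x: Sequence[float]) -> int:
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def stratified_bootstrap(X, y, rng):
    """Resample with replacement inside each class so no class disappears."""
    out_X, out_y = [], []
    for c in sorted(set(y)):
        members = [x for x, label in zip(X, y) if label == c]
        for _ in members:
            out_X.append(rng.choice(members))
            out_y.append(c)
    return out_X, out_y


def tri_train(X, y, unlabeled, rounds=3, seed=0) -> List[Model]:
    rng = random.Random(seed)
    bases = [stratified_bootstrap(X, y, rng) for _ in range(3)]
    models = [fit(bx, by) for bx, by in bases]
    for _ in range(rounds):
        updated = []
        for i in range(3):
            j, k = (m for m in range(3) if m != i)
            # Pseudo-label the samples on which the other two classifiers agree.
            agreed = [(u, predict(models[j], u)) for u in unlabeled
                      if predict(models[j], u) == predict(models[k], u)]
            bx, by = bases[i]
            updated.append(fit(bx + [u for u, _ in agreed], by + [p for _, p in agreed]))
        models = updated  # all three are refreshed from the previous round's models
    return models


def vote(models: List[Model], x: Sequence[float]) -> int:
    """Final decision: majority vote of the three classifiers."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]
```

Because agreement between two peers replaces a held-out validation split as the acceptance signal, no cross-validation step appears anywhere in the loop, which matches the property claimed for the method.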
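
For contribution (3), the abstract states only that the classifier is trained with an improved Bayesian analysis technique; the three components and the nature of the improvement are not described here. As a baseline illustration of Bayesian attack classification, the sketch below trains a plain multinomial naive Bayes classifier with add-one (Laplace) smoothing that maps the tokens of a request already flagged by an anomaly detector to an attack class. The tokenizer, class names and training payloads are hypothetical examples, not data from the dissertation.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


def tokenize(payload: str) -> List[str]:
    """Hypothetical tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", payload.lower())


class NaiveBayesAttackClassifier:
    """Multinomial naive Bayes over payload tokens with add-one smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples: Iterable[Tuple[str, str]]) -> "NaiveBayesAttackClassifier":
        for payload, attack_class in samples:
            tokens = tokenize(payload)
            self.class_counts[attack_class] += 1
            self.token_counts[attack_class].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_posterior(self, tokens: List[str], attack_class: str) -> float:
        """Unnormalized log P(class | tokens): log prior plus smoothed log likelihoods."""
        prior = self.class_counts[attack_class] / sum(self.class_counts.values())
        counts = self.token_counts[attack_class]
        denom = sum(counts.values()) + len(self.vocab)
        return math.log(prior) + sum(math.log((counts[t] + 1) / denom) for t in tokens)

    def classify(self, payload: str) -> str:
        tokens = tokenize(payload)
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))
```

Working in log space avoids underflow on long payloads, and the add-one smoothing keeps a token never seen for a class from forcing that class's posterior to zero.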
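
Contribution (4) places a P2P filter in front of the detector. A minimal sketch of such a two-layer prefilter follows, under explicit assumptions: layer 1 is a Gaussian naive Bayes classifier (the simplest Bayesian network structure) over two per-flow features, mean packet size and log packet count; layer 2 scores each source host with a linear SVM decision function over two cross-flow features, the number of distinct peers and the ratio of distinct destination ports to peers, using hand-picked hypothetical weights rather than trained ones; and a flow skips intrusion detection if either layer flags it. The features, weights, the `Flow` record and all function names are hypothetical; the abstract does not specify them.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


@dataclass
class Flow:  # hypothetical flow record
    src: str
    dst: str
    dst_port: int
    mean_pkt_size: float
    pkt_count: int


def flow_features(f: Flow) -> Tuple[float, float]:
    return f.mean_pkt_size, math.log1p(f.pkt_count)


def fit_gaussian_nb(flows: Sequence[Flow], labels: Sequence[str]):
    """Layer 1: per-class prior plus (mean, std) of each intra-flow feature."""
    by_class: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for f, label in zip(flows, labels):
        by_class[label].append(flow_features(f))
    return {label: (len(rows) / len(flows),
                    [(mean(col), pstdev(col) + 1e-3) for col in zip(*rows)])
            for label, rows in by_class.items()}


def layer1_is_p2p(model, f: Flow) -> bool:
    x = flow_features(f)

    def log_joint(label: str) -> float:
        prior, params = model[label]
        # Gaussian log density up to a constant shared by all classes.
        return math.log(prior) + sum(-math.log(s) - (v - m) ** 2 / (2 * s * s)
                                     for v, (m, s) in zip(x, params))

    return max(model, key=log_joint) == "p2p"


def host_features(flows: Sequence[Flow]) -> Dict[str, Tuple[float, float]]:
    """Layer 2 input: distinct peers and distinct ports per peer for each source host."""
    peers, ports = defaultdict(set), defaultdict(set)
    for f in flows:
        peers[f.src].add(f.dst)
        ports[f.src].add(f.dst_port)
    return {h: (len(peers[h]), len(ports[h]) / len(peers[h])) for h in peers}


def layer2_is_p2p_host(x: Tuple[float, float], w=(0.1, 4.0), b=-4.5) -> bool:
    """Linear SVM decision sign(w.x + b); w and b are hypothetical, not trained values."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


def prefilter(flows: List[Flow], nb_model) -> List[Flow]:
    """Return the flows that still need full intrusion detection."""
    p2p_hosts = {h for h, x in host_features(flows).items() if layer2_is_p2p_host(x)}
    return [f for f in flows if not (layer1_is_p2p(nb_model, f) or f.src in p2p_hosts)]
```

The ports-per-peer feature reflects a common observation that P2P clients contact many peers on many different ports, whereas a web client contacts many servers on a few well-known ports; in practice both layers would be trained on labeled traces rather than configured by hand.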
