Abstract
At present, many misuse detection systems cannot detect unknown attacks. Anomaly detection systems can accurately detect unknown attacks. However, the nature of intrusion detection makes intrusion events and normal events highly imbalanced, which makes it difficult to detect intrusions efficiently with machine learning methods. To address this, an intrusion detection system based on information gain and a random forest classifier is proposed. To counter the class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to the training set. A feature selection method based on information gain is also proposed and used to build a reduced feature subset of the dataset. First, a random forest is trained on the training set to build a misuse detection model, which detects known attacks by matching the features of network connections. Then, using the features retained by information-gain feature selection, network connections whose attack status is uncertain are clustered with the random forest, which allows unknown attacks to be detected. The experiments use the NSL-KDD intrusion detection dataset, an enhanced version of the KDD Cup '99 dataset. Reflecting the inherent nature of intrusion detection, the classes in NSL-KDD are highly imbalanced by design. Experimental results show that a random forest classifier combined with SMOTE and information-gain-based feature selection reaches a detection rate of 0.962 on the minority (anomaly) classes.
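The preprocessing steps named in the abstract are oversampling the minority classes with SMOTE, ranking features by information gain, and scoring per-class detection rate. They can be sketched with the standard library as below. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. Information gain is computed for discrete features, so continuous NSL-KDD attributes would have to be discretized first. "Detection rate" is assumed to mean per-class recall, TP / (TP + FN). The random forest stage itself is not shown.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(rows, labels, k):
    """Indices of the k columns with the highest information gain (hypothetical helper)."""
    n_cols = len(rows[0])
    scores = [information_gain([r[j] for r in rows], labels) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: scores[j], reverse=True)[:k]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (SMOTE-style).

    Each sample lies on the segment between a randomly chosen minority point
    and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = max(1, min(k, len(minority) - 1))
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Neighbours are searched among the other minority points only.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, nb)))
    return synthetic


def detection_rate(y_true, y_pred, positive):
    """Recall of class `positive`: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0
```

As in the abstract, oversampling would be applied only to the training data. The reduced feature subset would then be fed to a random forest. For real use, library implementations such as imbalanced-learn's `SMOTE` and scikit-learn's `RandomForestClassifier` are the usual choices over a hand-rolled version like this one.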