Intrusion Detection System Using Random Forests Classifier and Information Gain
  • English title: Intrusion Detection System Using Random Forests Classifier and Information Gain
  • Authors: WEI Jin-tai (魏金太); GAO Qiong (高穹)
  • Affiliations: Dept. of Information and Art Design, Henan Forestry Vocational College; Luoyang Electronic Equipment Testing Center of China
  • Keywords: network security; intrusion detection (IDS); random forest; feature selection
  • Journal code: HBGG
  • Journal: Journal of North University of China (Natural Science Edition)
  • Publication date: 2018-02-15
  • Publisher: Journal of North University of China (Natural Science Edition) (中北大学学报(自然科学版))
  • Year: 2018
  • Issue: v.39; No.177
  • Funding: National Natural Science Foundation of China (11404398); Key Research Project of the Henan Provincial Department of Science and Technology (142102210097)
  • Language: Chinese
  • Page: HBGG201801014
  • Pages: 7
  • CN: 01
  • ISSN: 14-1332/TH
  • Classification number: 80-85+94
Abstract
At present, many misuse detection systems cannot detect unknown attacks. Anomaly detection systems can detect unknown attacks accurately, but intrusion events are inherently far rarer than normal events, and this severe class imbalance makes it difficult to detect intrusions efficiently with machine learning. To address this, an intrusion detection system based on information gain and a random forest classifier is proposed. To handle the class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to the training data set. A feature selection method based on information gain is proposed and used to build a reduced feature subset of the data set. First, the random forest algorithm builds an intrusion model from the training set, yielding a misuse detection model that detects known attacks by matching the features of network connections. Then, using the features retained by the information-gain feature selection, the network connection data of uncertain attacks is clustered with the random forest, which enables the detection of unknown attacks. The experiments use the NSL-KDD intrusion detection data set, an enhanced version of the KDD CUP'99 data set. Owing to the inherent nature of intrusion detection, the classes in the NSL-KDD data set are highly imbalanced. The experimental results show that the random forest classifier, combined with SMOTE and information-gain-based feature selection, reaches a detection rate of 0.962 on minority classes.
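The two preprocessing ideas named in the abstract, ranking features by information gain and oversampling the minority class by interpolation, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names (`entropy`, `information_gain`, `smote_like`), the toy records, and the parameter defaults are all assumptions made for illustration, and the oversampler is a simplified version of SMOTE's core interpolation step.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy, made-up connection records (NOT taken from NSL-KDD).
protocol = ["tcp", "tcp", "udp", "udp", "tcp", "udp"]
flag = ["SF", "S0", "SF", "S0", "SF", "S0"]
label = ["normal", "attack", "normal", "attack", "normal", "attack"]

ig_protocol = information_gain(protocol, label)
ig_flag = information_gain(flag, label)
# Rank features from most to least informative about the label.
ranking = sorted({"protocol": ig_protocol, "flag": ig_flag}.items(),
                 key=lambda item: item[1], reverse=True)

# Four made-up minority-class points in a 2-D feature space.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_like(minority, n_new=5)
```

In this toy data `flag` separates the two classes perfectly, so it ranks above `protocol`. A feature-reduction step would keep the top-ranked features before training the classifier. In practice one would likely rely on library implementations, for example `RandomForestClassifier` from scikit-learn and `SMOTE` from imbalanced-learn. The abstract does not state which tools the authors used.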