The Research of Intrusion Detection Based on Naive Bayes and One-R

Author: 王翔 (Wang Xiang)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: Feature Selection; Intrusion Detection; Naive Bayes; Data Mining (DM); One-R
Degree year: 2008
Supervisor: 胡学钢 (Hu Xuegang)
Discipline code: 081203
Degree-granting institution: 合肥工业大学 (Hefei University of Technology)
Thesis submission date: 2008-05-01

Abstract

Open network environments let people enjoy the convenience of networking, but attacks on networks and the damage they cause are increasing at the same time. Intrusion detection systems (IDS), one of the essential means of safeguarding network security, are receiving growing attention. From a data mining (DM) perspective, intrusion detection is the process of classifying network audit data, so the classification algorithm at the core of an IDS has become a key problem in data mining research. Because intrusion techniques keep evolving, and because intrusion detection audit data are high-dimensional, massive, and full of redundant attributes, classical classification models cannot guarantee real-time operation, take a long time to train, and achieve low detection accuracy. To improve the real-time behavior, time performance, and accuracy of intrusion detection classification models, this thesis studies the intrusion detection classification problem on the basis of the naive Bayes classification model.
     The main work is as follows:
     (1) The application of data mining techniques to intrusion detection is surveyed. Classical Bayesian classification algorithms and the attribute selection methods commonly used in intrusion detection are studied and compared experimentally.
     (2) The naive Bayes classifier assumes that attributes are conditionally independent, so redundant and irrelevant attributes in intrusion detection audit data should be removed to improve classifier performance. To this end, the One-R idea is introduced into the naive Bayes intrusion detection classification model, and a two-stage attribute selection method based on One-R and supervised by the naive Bayes classification model (One-R-BF) is proposed. Experiments show that One-R-BF outperforms the attribute selection methods commonly used in intrusion detection.
     (3) To meet the real-time requirements that intrusion detection places on classification algorithms, a naive Bayes classification algorithm with fast One-R-based attribute selection (One-R-NBC) is proposed on the basis of One-R-BF and applied to intrusion detection. Experiments show that One-R-NBC outperforms C4.5 in time and space cost as well as in classification accuracy. In particular, when the classifier needs to be updated, One-R-NBC has a clear real-time advantage over C4.5.
     (4) To address the overfitting that a naive Bayes classifier may exhibit, One-R-NBC is improved with a distributed approach, yielding a distributed naive Bayes classifier (D-One-R-NBC). Experiments show that D-One-R-NBC is effective and avoids classifier overfitting to some extent.
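
The abstract does not spell out the two stages of One-R-BF, so the following is only a minimal sketch under an assumed reading: stage 1 ranks attributes by the training accuracy of their One-R rule, and stage 2 walks that ranking and keeps an attribute only if a categorical naive Bayes classifier becomes more accurate on held-out data. All function names, the Laplace smoothing, and the toy records are hypothetical illustrations, not taken from the thesis.

```python
from collections import Counter, defaultdict
import math

def one_r_accuracy(rows, labels, attr):
    """Training accuracy of the One-R rule built on a single attribute."""
    by_value = defaultdict(Counter)
    for row, y in zip(rows, labels):
        by_value[row[attr]][y] += 1
    # Each attribute value predicts its majority class.
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(rows)

def train_nb(rows, labels, attrs):
    """Categorical naive Bayes over the chosen attributes (counts only)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attr, class) -> value counts
    domains = defaultdict(set)           # attr -> values seen in training
    for row, y in zip(rows, labels):
        for a in attrs:
            value_counts[(a, y)][row[a]] += 1
            domains[a].add(row[a])
    return class_counts, value_counts, domains, list(attrs)

def predict_nb(model, row):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, domains, attrs = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for a in attrs:
            seen = value_counts[(a, c)][row[a]]
            score += math.log((seen + 1) / (n_c + len(domains[a])))
        if score > best_score:
            best, best_score = c, score
    return best

def nb_accuracy(train, train_y, valid, valid_y, attrs):
    model = train_nb(train, train_y, attrs)
    hits = sum(predict_nb(model, r) == y for r, y in zip(valid, valid_y))
    return hits / len(valid)

def select_features(train, train_y, valid, valid_y):
    """Assumed two-stage selection: One-R ranking, then a naive Bayes check."""
    n_attrs = len(train[0])
    # Stage 1: rank attributes by One-R accuracy, best first.
    ranked = sorted(range(n_attrs),
                    key=lambda a: one_r_accuracy(train, train_y, a),
                    reverse=True)
    # Stage 2: keep an attribute only if validation accuracy improves.
    selected, best = [], 0.0
    for a in ranked:
        acc = nb_accuracy(train, train_y, valid, valid_y, selected + [a])
        if acc > best:
            selected, best = selected + [a], acc
    return selected, best

# Toy records (hypothetical): attribute 0 decides the class,
# attribute 1 is noise, attribute 2 is constant.
train = [("tcp", "a", "x"), ("tcp", "b", "x"), ("udp", "a", "x"),
         ("udp", "b", "x"), ("tcp", "a", "x"), ("udp", "b", "x")]
train_y = ["normal", "normal", "attack", "attack", "normal", "attack"]
valid = [("tcp", "b", "x"), ("udp", "a", "x")]
valid_y = ["normal", "attack"]

print(select_features(train, train_y, valid, valid_y))
```

On the toy records the informative attribute ranks first and the noisy and constant attributes are discarded. Ranking with One-R first keeps the number of naive Bayes evaluations linear in the number of attributes, which is one plausible reason such a scheme would suit the real-time setting the abstract describes.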

