Research on Good-Point-Set Covering Algorithm and Its Applications in Intrusion Detection

Author: Shi Yao (施尧)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: Covering Algorithm ; Good-Point-Set ; Intrusion Detection ; Feature Selection ; Ensemble Learning
Degree year: 2010
Advisor: Zhang Yanping (张燕平)
Discipline code: 081203
Degree-granting institution: Anhui University (安徽大学)
Thesis submission date: 2010-04-01

Abstract

After an in-depth analysis of the mechanism of artificial neural networks (ANN), Professor Zhang Ling and Academician Zhang Bo proposed the theory and method of constructive learning, which has been applied successfully. The constructive machine learning method uses a sphere projection to turn neurons into classifiers that partition a bounded space. By mapping an unbounded space to a bounded one, it converts the long-unsolved learning problem of neural networks into a covering problem and greatly reduces the complexity of the problem description.
     With the rapid development of computer and network technology, computer systems have become complex, open networked systems. Alongside the convenience this brings, intrusions against hosts and network systems have had many negative effects, which has given rise to active protection technologies such as intrusion detection.
     This thesis proposes the Good-Point-Set Covering Algorithm (GCA), which combines the covering algorithm with good-point-set theory, and validates its effectiveness on UCI data sets. GCA is then applied to intrusion detection: sample sets are selected at different granularities and, combined with ensemble learning theory, an intrusion detection method based on an ensemble of GCA classifiers is proposed, giving a new approach to intrusion detection.
     The main work of this thesis is as follows:
     1. The research background and significance of the covering algorithm and of intrusion detection are introduced. The thesis briefly reviews several classical classification algorithms and the geometric meaning of the M-P (McCulloch-Pitts) neuron, then introduces the constructive learning method, namely the covering algorithm, and several of its improved variants. It summarizes the research significance of ensemble classifiers, their advantages over single classifiers, and the main current ensemble algorithms, and argues that, given the characteristics of the proposed algorithm, the Bagging ensemble method is the better fit for improving the overall algorithm.
     2. Good-point-set theory and its advantage in obtaining weights are introduced. The covering algorithm is a constructive machine learning method whose core idea is to find the weight and threshold of each neuron: the weight is the center of a covering domain and the threshold is its radius. When the weight of a new neuron is constructed, that is, when a sample is chosen as the center of a new cover, the usual practice is to pick one at random within some range or to follow a manually fixed order, neither of which reflects the distribution of the samples. Good-point-set theory overcomes the drawback of random center selection and yields a better covering order, which gave good experimental results.
     3. An intrusion detection model based on GCA and an intrusion detection model based on a GCA ensemble are proposed. Because the samples have a high feature dimension, feature selection by combining individually optimal features is used to reduce dimensionality, and several feature subsets are selected at different granularities; the GCA-based intrusion detection model is built on these subsets. In ensemble learning, multiple single classifiers classify the samples from different angles, much as a person observes something from different sides, which is practical for problems with many feature attributes. To further improve accuracy, ensemble learning is therefore introduced, and the resulting GCA-ensemble intrusion detection method further improves the detection rate.
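
     As an illustration of the idea in item 2, the following is a minimal sketch and not the thesis's implementation. It generates a good point set with the cyclotomic construction of Hua and Wang, r_k = 2cos(2*pi*k/p) with p the smallest prime satisfying (p-3)/2 >= dim, and uses it to fix the order in which samples are tried as cover centers. Several parts are assumptions made only for this example: the abstract does not say how good points are mapped to samples, so the nearest-sample rule in center_order is hypothetical; plain Euclidean distance stands in for the sphere projection and inner-product threshold; features are assumed to be scaled to [0, 1]; and all function names are invented here.

```python
import math


def _is_prime(n):
    return n >= 2 and all(n % k for k in range(2, math.isqrt(n) + 1))


def good_point_set(n, dim):
    """Return n points in the unit cube of dimension dim (Hua-Wang construction)."""
    p = 2 * dim + 3  # smallest candidate with (p - 3) / 2 >= dim
    while not _is_prime(p):
        p += 1
    r = [2.0 * math.cos(2.0 * math.pi * k / p) for k in range(1, dim + 1)]
    # Point i is the fractional part of i * r_k in every coordinate.
    return [[(i * rk) % 1.0 for rk in r] for i in range(1, n + 1)]


def center_order(X, points):
    """ASSUMPTION: each good point nominates its nearest sample as a center."""
    order = []
    for g in points:
        nearest = min(range(len(X)), key=lambda j: math.dist(X[j], g))
        if nearest not in order:
            order.append(nearest)
    # Samples never nominated are appended so every sample can still be covered.
    return order + [j for j in range(len(X)) if j not in order]


def train_covers(X, y, order):
    """Build (center, radius, label) covers, visiting centers in the given order."""
    covered = [False] * len(X)
    covers = []
    for c in order:
        if covered[c]:
            continue
        d = [math.dist(X[c], x) for x in X]
        # Distance to the nearest sample of another class.
        d_other = min((d[j] for j in range(len(X)) if y[j] != y[c]),
                      default=math.inf)
        # Distance to the farthest same-class sample that is still closer.
        d_same = max((d[j] for j in range(len(X))
                      if y[j] == y[c] and d[j] < d_other), default=0.0)
        radius = (d_other + d_same) / 2.0
        covers.append((X[c], radius, y[c]))
        for j in range(len(X)):
            if y[j] == y[c] and d[j] <= radius:
                covered[j] = True
    return covers


def predict(covers, x):
    """Label of the cover whose boundary x is deepest inside (or nearest to)."""
    return min(covers, key=lambda c: math.dist(c[0], x) - c[1])[2]


if __name__ == "__main__":
    X = [[0.10, 0.10], [0.20, 0.15], [0.15, 0.20],
         [0.80, 0.90], [0.90, 0.80], [0.85, 0.85]]
    y = [0, 0, 0, 1, 1, 1]
    order = center_order(X, good_point_set(len(X), 2))
    covers = train_covers(X, y, order)
    print(order, len(covers), [predict(covers, x) for x in X])
```

     Passing the order in as an argument keeps the covering step independent of how centers are chosen, so a random or hand-fixed order can be substituted for comparison with the good-point-set order.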

