关联规则学习与反馈技术及其在网络安全审计系统中的应用研究

英文题名：The Technologies of Association Rule Learning and Feedback and the Study on Their Applications of Network Security Audit System
作者：杨仁华
论文级别：硕士
学科专业名称：计算机软件与理论
中文关键词：关联规则 ; 学习与反馈 ; Apriori ; 数据挖掘 ; 安全审计
英文关键词：association rules ; learning and feedback ; Apriori ; data mining ; security audit
学位年度：2010
导师：刘培玉
学科代码：081202
学位授予单位：山东师范大学
论文提交日期：2010-06-05

摘要

随着计算机技术、微电子技术、通信技术等科学技术的发展,特别是互联网(Internet)以其海量的信息资源、方便快捷高效的信息交流方式等技术的出现与发展,网络已成为人们学习生活的重要工具,但病毒、黑客以及其它不确定的危险问题一直威胁着网络信息安全,同时也时刻考验着人类在应对网络危险、保护网络安全等方面的智慧。
     网络安全审计在网络安全管理的过程中扮演着重要的角色,也是网络环境安全所必须支持的功能。目前国内外普遍采用的是基于专家特征检测方法对网络数据信息安全审计,此类方法存在效率低、对未知行为安全审计自适应能力差等不足。关联规则学习是一类新型的知识发现方法,已成为网络安全审计的重要研究方向。
     本文首先对国内外网络安全审计的现状及面临的问题、发展趋势进行了深入的学习与研究,学习了关联规则的相关概念及度量方法;研究了关联规则学习经典Apriori算法、FP-Growth算法及相关改进算法思想等,特别是对Apriori算法做了重点研究。Apriori算法采用逐层搜索迭代的方法在事务数据库D中寻找频繁项集,此算法理解与实现都非常容易,但存在两处致命缺陷,一是在整个过程中需要N次扫描事务数据库,二是产生较大规模的候选频繁项集。
     本文主要工作是针对Apriori算法的缺点提出了一种基于矩阵的Apriori关联规则学习改进算法,其特点是整个关联规则学习过程中只需要一次扫描事务数据库,同时不产生候选频繁项集。过程描述如下:改进算法对事务数据库D扫描一次,同时将D中的事务Tm与数据项Ik的关系转换成矩阵Matrix(i * j)结构关系,以布尔数据1和0表示数据项Ik是否包含于事务Tm中。改进算法核心思想是对布尔矩阵的行向量(Ik&Ii)进行逻辑与运算,对运算结果按1计数并比较最小支持度,从而得到相关的频繁项集;改进算法对“与”运算的结果进行相应的剪枝操作,以频繁1-项集为前提进行频繁k项集的挖掘学习,然后根据相关度量方法生成有效关联规则。其次,对改进关联规则学习算法及经典Apriori算法进行了仿真实验,实验分析表明改进算法能够有效减少关联规则学习的时间及空间复杂度。
     最后根据改进的关联规则学习算法的核心思想,对网络安全审计模型进行了设计,并给予实现。实验结果表明,改进的关联规则学习算法在网络安全审计的自适应能力上有较好表现,取得了预期效果。
With the rapid development of computer technology, microelectronics, communication technology and other kinds of sciences and technologies, especially the development of the Internet with its large amounts of information resources and rapid convenient efficient way to deliver information, the network has been an important tool in people’s study and day-to-day life, but computer viruses, hackers, and any other uncertain dangerous problems have been threatening the security of the information on the network, testing people’s wisdom to deal with the network danger and protecting the network security at the same time.
     The network security audit plays an important role in the process of network security management, and it is also a function that the network environment security must support. At present the network security audit systems for network data used widely at home and abroad are based on the expert feature detection methods. Its merit is that it can make accurate identification and judgment to the already known dangerous behavior patterns, and the defects are low detection efficiency and low automatic adaptation capability of the security audit of unknown behaviors and so on. Learning of association rules is a new method to discover new knowledge and has been a significant aspect of the study on network security audit.
     This thesis makes a further study on the status quo, the present problems, and developing trends in network security audits at home and abroad at the beginning, describes related conceptions and measure methods, studies the Apriori algorithm, FP-Growth algorithm and some improved algorithm ideas related in association rule learning, especially focuses on the Apriori algorithm. It casts about for frequent item sets in database D using the method of iterative search layer by layer. This algorithm is easy to understand and make it carry out. But it has two vital defects, one is that it needs N times to scan database D during the whole process, and the other one is that it produces lots of candidate frequent item sets.
     For the defects of Apriori, this thesis puts forward an improved matrix-based association rule learning algorithm, with the characteristics that it only needs one time to scan database during the whole learning process and does not produce any frequent item sets. The improved algorithm scans database D one time, switching the relationship of affair Tm and data item Ik to the structural relationship of Matrix (i*j), using Boolean data 1 and 0 to present whether data item Ik is included in affair Tm.. The core idea of the improved algorithm is to make logic and operation of the row vectors(Ik&Ii)in Boolean matrix, taking count of the result by 1 and comparing minimum support degree to achieve the corresponding frequent item sets. The improved algorithm makes some corresponding pruning operation on the result of logic and operation, taking frequent 1– item set as the prerequisite to mine and learn frequent k item set and then creates available association rules according to related measure methods. Then, for the improved association rule learning algorithm and the classical Apriori algorithm, this thesis makes simulation experiments. The results show that the improved algorithm can efficiently cut down the complexity of the time and space of the association rules learning.
     Finally, basing on the core idea of improved association rule learning algorithms, this thesis designs and achieves simple models of network security audit. The experiments show that the improved learning algorithm of association rules in network security audits have better performance of automatic adaptive capacity, achieving the expected effect.

引文

[1] http://research.cnnic.cn/html/1247710466d1051.html[EB/OL].中国互联网信息中心(CNNIC).
    [2] Control Objective for Information and Related Technology (COBIT)[M].Information System and Control Foundation,July 2000.
    [3]拉斯·克关德著.陈永剑等译.网络安全的最终解决方案[M].北京:电子工业出版社,2000.
    [4] NCSC-TG-11.Trusted Network Interpretation Environments Guideline[S].Library No.S235,465. Version 1.August 1990.
    [5] DoD 5200.28-STD.Trusted Computer System Evaluation Criteria[S].Library No.S225,711. NCSC,December 1985.
    [6] Information Technology Security Evaluation Criteria[S].Brussels:Office for Official Publications of the European Communities,June 1991.
    [7] GB17859-1999.计算机信息系统安全等级划分准则[S].北京:中国标准出版社,1999.
    [8] ISO/IEC 15408.Common Criteria for Information Technology Security Evaluation[S].1998.
    [9] GB/T 20945-2007.信息安全技术—信息系统安全审计产品技术要求和评价方法[S].北京:中国标准出版社,2007.
    [10] Anderson,Thane Frivold,Alfonso Valdes.Next-generation Intrusion Detection Expert System (NIDES)[C].SRU-CSK-95-07,May 1995.
    [11]覃爱明,胡昌振,谭惠民.数据挖掘技术在网络攻击检测中的应用[J].计算机工程与应用,2002年11月:177-180.
    [12]赵艳铎.关联规则算法研究及其在网络安全审计系统中的应用[D].北京:清华大学,2005.
    [13]史忠植.知识发现[M].北京:清华大学出版社,2002.
    [14]王莘.基于数据挖掘的审计日志分析技术研究[D].郑州:解放军信息工程大学,2007.
    [15]沙艳鑫.基于关联规则的网络安全审计技术研究[D].郑州:解放军信息工程大学,2005.
    [16]朱胜奎.基于日志数据挖掘的网络安全审计技术研究[D].济南:山东师范大学,2009.
    [17] http://msdn.microsoft.com/zh-cn/library/ms174949.aspx[EB/OL].数据挖掘概念.
    [18] Xindong Wu,Vipin Kumar.Top 10 Algorithms in Data Mining[C].ICDM 2006 Panel,2006.
    [19]杨仁华,刘培玉.基于日志的安全审计系统研究与实现[J].信息技术与信息化,2009,4:29-31.
    [20]瓦普尼克(美)著,许建华等译.统计学习理论[M].北京:电子工业出版社,2009.
    [21] Vladimir N.Vapnik.The Nature of Statistical Learning Theory[M].Springer-Varian New York,1995.
    [22]赵玲涛.基于内容的安全审计跟踪算法及其应用研究[D].上海:上海交通大学,2007.
    [23] R.Agrawal,T.Imielinski,A.Swmai.Mining Association Rules between Sets of Items in Large Databases[C].In:Proceedings of ACM SIGMOD Intl.Conf.on Management of Data,Washington, DC,1993.
    [24] R.Agrawal,R.Srikant.Fast Algorithms for Mining Association Rules[C].In:Proceedings of the 20th Intl.conf.on Very Large Databases,Chile,1994.
    [25] R.Agrawal,R.Srikant.Mining Sequential Patterns[C].In:proceedings of 15th Intl.conf.on data engineering,September 1995.
    [26] R.Agrawal,J.Shaefr.Parallel Mining of Association Rules[R].Desin,Implementation,and Experience.Technical Report FJ10004,IBM Almaden Research Center,CA95120,1996.
    [27] R.Agrawal,Heikii Mannila,R.Srikant.Fast Discovery of Association Rules[A].InU.M.Fayyad,Gregory Piatesky-Shapiro,Advances in Knowledge Discovery and Data Mining[M].California:AAAI/MIT Press, 1999.
    [28] Jiawei Han,Jian Pei and Yiwen Yin.Mining Frequent Patterns without Candidate Generation[C]. SIGMOD’00,2000.
    [29] Jiawei Han,Micheline Kamber(加)著,范明,孟小峰等译.数据挖掘:概念与技术[M].北京:机械工业出版社,2001.
    [30] R.Srikant and R.Agrawal.Mining Sequential Patterns:Generalizations and Patterns Performance Improvements[C].In:Proceedings of the 5th International Conference on Extending Database Technology.1996.
    [31]陈卓,杨炳儒等.序列模式挖掘综述[J].计算机应用研究,2008,25(7):1960-1963,1976.
    [32] Z.Pawlak.Rough Sets:Theoretical Aspects of Reasoning about Data[M].Kluwer Academic Publishers,Norwell,MA,1992.
    [33] Bnin.S and Page.L.The anatomy of a large-scale hyper textual Web search engine[C].In WWW-7, 1998.
    [34] Inokuchi A,Washio T,Motoda H.An apriori-based algorithm for mining frequent substructures from graph data[C].In:Proc.of the 4th European Conf.on Principles and Practice of Data Mining and Knowledge Discovery(PKDD-2000),2000.Proc.of the 8th ACM SIGKDD intl.conf.on Knowledge discovery and data mining. ACM press,2002:13-23.
    [35]王艳辉,吴斌,王柏.频繁子图挖掘算法综述[J].计算机科学,2005,32(10):193-196.
    [36]吕晓玲,谢邦昌著.数据挖掘方法与应用[M].北京:中国人民大学出版社,2009.
    [37]刘乃丽.基于FP-树的关联规则挖掘算法的设计与实现[D].济南:山东大学,2005.
    [38]吴承荣,廖健,张世永.网络安全审计系统的设计与实现[J].计算机工程,1999,25(Special Issue): 171-174.
    [39] Adriano Veloso Wagner Meira Jr.Srinivasan Parthasarathy.New Parallel Algorithms for Frequent Itemset Mining in Very Large Databases[C].Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD’03) 2003 IEEE,Computer Society.
    [40]李超,余昭平.基于矩阵的Apriori算法改进[J].计算机工程,2006,32(23):68-69.
    [41] S.Park,M.S.Chen and P.S.Yu.An Effective Hash Based Algorithm for Mining Association Rules[C].Michael J.Carey and Donovan A.Schneider.Proceedings of the ACM-SIGMOD international Conference on Managent of Data(SIGMOD’95),San Jose,Califormia,1995.ACM Press Publisher, 1995:175-186.
    [42] J.Han and Y.Fu.Discovery of Multiple-Level Association Rules from Large Databases[C].Umeshwar Dayal,Peter M.D.Gray, and Shojiro Nishio.Proceeding of 21st International Conference on Very Large Databases (VLDB’95), Zurich,Switzerland,1995.MorganKaufmann Publisher,1995:420-431.
    [43] A.Savasere,E.Omiecinski and S.Navathe.An efficient algorithm for mining association rules in large databases[C].Umeshwar Dayal,Peter M.D.Gray,and Shojiro Nishio.Proceeding of the 21st International Conference on Very Large Databases (VLDB’95),Zurich,Switzerland,1995. Morgan Kaufmann Publisher,1995: 432-443.
    [44] H.Toivonen.Sampling large databases for association rules[C].T.M.Vijayaraman,Alejandro P.Buchmann,C.Mohan and Nandlal,L.Sarda.Proceedings of the 22nd International Conference on Very Large Databases (VLDB’96),Bombay,India,1996.Morgan Kaufmann Publisher,1996:134-135.
    [45] S.Brin,R.Motwani,J.D.Ullman and S.Tsur.Dynamic Itemset counting and implication rules for market basket data[C]. Joan Peckham. Proceedings of the 1997 ACM-SIGMOD International Conference On Management of Data (SIGMOD’97),Tucson,Arizona,1997.ACM Press Publisher, 1997:255-264.
    [46]朱意霞,姚力文,黄水源,黄龙军.基于排序矩阵和树的关联规则挖掘算法[J].计算机科学,2006, 33(7):196-198.
    [47] http://baike.baidu.com/view/17249.htm?fr=ala0_1_1[EB/OL].信息安全.
    [48]http://news.newhua.com/news1/safe_industry/2010/322/103229625JAA3553GA6226E8126EEHI72D5HJG8G5B21B9646H903D.html[EB/OL].华军软件园资讯中心.
    [49] Wenke Lee,Salvatore J.Stolfo.Data Mining Approaches for Intrusion Detection[J].Computer Science Department Columbia University.
    [50] Wenke Lee,Salvatore J.Stolfo,Philip K.Chan,Eleazar Eskin,Wei Fan,Matthew Miller,Shlomo Hershkop,and Junxin Zhang.Real Time Data Mining-based Intrusion Detection[C]. IEEE Symposium on Security and Privacy, 2001.
    [51] Wenke Lee,Salvatore J.Stolfo,Philip K.Wmok.A Data Mining framework for building Intrusion detection Models[C].IEEE Symposium on Security and Privacy,1999.
    [52] Wenke Lee,Salvatore J.Stolfo,Philip K.Wmok.Mining audit data to build intrusion deteetion models[C].In:Proeeedings of the 4th Inteniational Conference on Knowledge Discovery and Data Mining,New York,AAAI Press,1998.
    [53] White G B,Fish E A,Pooch U W.Cooperating Security Managers:A peer-based Intrusion Detection System[J].IEEE network,1996,10(1):20-24.
    [54] Teresa F,Lunt,Ann Tamaru,Fred Gilham,R.Jagannathan.A real time intrusion detection expert system[C].SRI International,Computer Science Lab,1992.
    [55]吕镇帮,吴广茂.计算机网络安全与安全审计技术研究[J].航空计算技术,1999,29(4):54-58.
    [56]辛义忠.基于数据挖掘的网络安全审计技术的研究与实现[D].沈阳:沈阳工业大学,2004.
    [57]薛张伟.基于反馈的入侵检测技术研究与设计[D].西安:西安电子科技大学,2005.
    [58]李洪培,王新梅.基于神经网络的入侵检测系统模型[J].西安电子科技大学学报,1999,7(5): 667-670.
    [59] Ted Hendy. An Information Assurance Architecture For Army Installations[J].IEEE,2000.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700