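Research on the Application of Data Mining in Intrusion Detection

Author: Qian Yu
Degree level: Master's
Major: Computer Application Technology
Chinese keywords (translated): data mining; intrusion detection system; Markov model; association rules; sequential patterns; weighted association rules
English keywords: Data Mining; Intrusion Detection; Markov Model; Association Rules; Sequence Model; Weighted Association Rules
Degree year: 2004
Supervisor: Zheng Cheng
Discipline code: 081203
Degree-granting institution: Anhui University
Thesis submission date: 2004-05-10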

Abstract

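As a proactive information security measure, intrusion detection effectively compensates for the shortcomings of traditional security protection techniques. By building a dynamic security cycle, it can maximize a system's security assurance capability and reduce the damage that security threats cause to the system.
     Intrusion detection essentially comes down to processing security audit data. However, the growing complexity of operating systems and the rapid expansion of network traffic have caused security audit data to grow at an astonishing rate. Using data mining techniques to extract from audit data the feature models that support judgment and comparison has therefore become a hot topic in intrusion detection research, with significant theoretical and practical value.
     This thesis studies the application of data mining to intrusion detection. The introduction in Chapter 1 surveys data mining techniques and intrusion detection systems, outlines the research content of each chapter, and explains the rationale and significance of the topic. In Chapter 2 I introduce data mining and related issues, including the process, methods, classification, and applications of data mining. Chapter 3 covers intrusion detection and related issues, including system modules, classification, intrusion detection techniques, and the current state of research. These chapters lay the groundwork for the further development and in-depth discussion in the later chapters.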
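     In Chapter 4 I study the application of Markov chains to anomaly detection. This method can identify abnormal system behavior even when the user lacks domain knowledge of network security, which makes it valuable in practice. Experimental results for both single-step and multi-step Markov chains demonstrate the feasibility of applying the method to anomaly detection.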
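The Markov-chain approach summarized above can be illustrated with a minimal Python sketch. It is an illustration in the spirit of the Markov chain model of [NYE2000], not the thesis's own implementation, and it rests on several assumptions: audit data is already reduced to a stream of discrete events (here, hypothetical Unix commands), transition probabilities are estimated from normal training data, and a test window is scored by its average log transition probability. Setting `order=1` gives a single-step chain; `order>1` conditions on the previous k events, which is only one possible reading of "multi-step" (the thesis may instead use k-step transition probabilities). The add-alpha smoothing, the names `train_markov`, `avg_log_prob` and `is_anomalous`, and the threshold of -1.5 are all hypothetical choices.

```python
import math
from collections import Counter, defaultdict

def train_markov(events, order=1):
    """Count transitions from each state (the previous `order` events) to the next event."""
    counts = defaultdict(Counter)
    for i in range(order, len(events)):
        state = tuple(events[i - order:i])
        counts[state][events[i]] += 1
    return counts

def avg_log_prob(model, events, alphabet, order=1, alpha=1.0):
    """Average log transition probability of a test sequence, with add-alpha smoothing."""
    n = len(alphabet)
    total, steps = 0.0, 0
    for i in range(order, len(events)):
        state = tuple(events[i - order:i])
        row = model.get(state, Counter())  # a state never seen in training gets an empty row
        p = (row[events[i]] + alpha) / (sum(row.values()) + alpha * n)
        total += math.log(p)
        steps += 1
    return total / steps if steps else 0.0

def is_anomalous(score, threshold=-1.5):
    """Hypothetical decision rule: a low average log-probability means anomalous."""
    return score < threshold

# Toy data (hypothetical): one user's normal command stream and two test windows.
normal = ["login", "ls", "cd", "ls", "vi", "gcc", "ls", "cd", "ls", "vi", "gcc", "logout"] * 5
alphabet = set(normal) | {"rm", "chmod", "su"}
model = train_markov(normal, order=1)
typical = avg_log_prob(model, ["ls", "cd", "ls", "vi", "gcc"], alphabet)
odd = avg_log_prob(model, ["su", "chmod", "rm", "rm", "su"], alphabet)
print(round(typical, 3), round(odd, 3), is_anomalous(typical), is_anomalous(odd))
```

Smoothing matters in this design: without it, a single never-seen transition would have probability zero and make the log-score undefined.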
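     Chapter 5 focuses on the application of association rules to intrusion detection. I first study various practical association rule algorithms and their improved versions. Applying association rule algorithms to an intrusion detection system raises the detection rate but also raises the false positive rate; to address this, the thesis gives a detailed description of a weighted association rule algorithm and applies it to intrusion detection.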
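The weighting idea can be sketched as follows. This uses an assumed formulation that may differ from the thesis's: each item (an attribute=value pair from an audit record) carries a hypothetical importance weight, and an itemset's weighted support is its ordinary support multiplied by the mean weight of its items, one common choice in the weighted association rule literature. Because weighted support is not downward-closed, Apriori-style pruning no longer applies directly, so this toy version simply enumerates itemsets up to a small length. The records, weights, thresholds and function names are invented for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)

def weighted_support(itemset, transactions, weights):
    """Assumed definition: mean item weight times ordinary support."""
    mean_w = sum(weights[i] for i in itemset) / len(itemset)
    return mean_w * support(itemset, transactions)

def weighted_rules(transactions, weights, min_wsup, min_conf, max_len=3):
    """Rules X -> Y whose itemset X u Y meets min_wsup and whose confidence meets min_conf."""
    items = sorted(set().union(*transactions))
    rules = []
    for k in range(2, max_len + 1):
        for itemset in combinations(items, k):
            sup_xy = support(itemset, transactions)
            if sup_xy == 0 or weighted_support(itemset, transactions, weights) < min_wsup:
                continue
            for r in range(1, k):
                for lhs in combinations(itemset, r):
                    conf = sup_xy / support(lhs, transactions)
                    if conf >= min_conf:
                        rhs = tuple(i for i in itemset if i not in lhs)
                        rules.append((lhs, rhs, round(conf, 3)))
    return rules

# Hypothetical connection records; REJ flags are treated as security-relevant (high weight).
records = [
    {"service=http", "flag=SF", "dst=web"},
    {"service=http", "flag=SF", "dst=web"},
    {"service=ftp", "flag=SF", "dst=ftp"},
    {"service=http", "flag=REJ", "dst=web"},
]
weights = {"service=http": 0.5, "service=ftp": 0.5, "dst=web": 0.5,
           "dst=ftp": 0.5, "flag=SF": 0.2, "flag=REJ": 1.0}
for rule in weighted_rules(records, weights, min_wsup=0.15, min_conf=0.8):
    print(rule)
```

With these numbers, a rule such as flag=REJ -> dst=web survives despite a support of only 0.25 because of its high weight, while the equally frequent pair dst=ftp, service=ftp is dropped (weighted support 0.125), which is the kind of filtering of uninteresting rules the weighting is meant to provide.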
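     Chapter 6 covers the application of sequential patterns to anomaly detection. It first reviews related research in China and abroad, then presents the AprioriAll and AprioriSome algorithms and compares their strengths and weaknesses. Finally, it reports experimental results of applying the algorithms to anomaly detection, demonstrating the feasibility of the method.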
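A compact Python sketch of an AprioriAll-style level-wise search is shown below. It is a simplification, not the thesis's implementation: each sequence element is assumed to be a single event (for example a Unix command) rather than an itemset, so the litemset and transformation phases of the original algorithm [AS95] collapse into counting frequent single events. Candidates of length k are built by joining two frequent (k-1)-sequences that share their first k-2 elements, pruned if any (k-1)-subsequence is infrequent, and counted by subsequence containment; a final maximal phase keeps only patterns not contained in a longer one. AprioriSome, which skips counting some lengths, is not shown. The session data and function names are hypothetical.

```python
def contains(seq, sub):
    """True if sub occurs in seq as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(any(x == y for y in it) for x in sub)

def apriori_all(db, min_sup):
    """Level-wise mining of frequent sequences; min_sup is a fraction in (0, 1]."""
    n = len(db)

    def sup(c):
        return sum(contains(s, c) for s in db) / n

    items = sorted({x for s in db for x in s})
    level = [(x,) for x in items if sup((x,)) >= min_sup]
    frequent = {c: sup(c) for c in level}
    while level:
        prev = set(level)
        candidates = set()
        for p in level:
            for q in level:
                if p[:-1] == q[:-1]:  # join step: same first k-2 elements
                    c = p + q[-1:]
                    # prune step: every (k-1)-subsequence must already be frequent
                    if all(c[:i] + c[i + 1:] in prev for i in range(len(c))):
                        candidates.add(c)
        counted = {c: sup(c) for c in candidates}
        level = sorted(c for c, v in counted.items() if v >= min_sup)
        frequent.update((c, counted[c]) for c in level)
    # maximal phase: drop patterns contained in a longer frequent pattern
    maximal = [c for c in frequent
               if not any(len(d) > len(c) and contains(d, c) for d in frequent)]
    return frequent, maximal

sessions = [
    ["ls", "cd", "vi", "gcc", "ls"],
    ["ls", "vi", "gcc"],
    ["cd", "ls", "vi", "gcc"],
    ["ls", "cat", "vi"],
]
frequent, maximal = apriori_all(sessions, min_sup=0.75)
print(maximal)  # prints [('ls', 'vi', 'gcc')]
```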
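     Chapter 7 summarizes the thesis and outlines directions for future research.
     The main contributions of this thesis are as follows: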

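     (1) Markov chains: the thesis studies in depth how Markov models can detect anomalous behavior when security domain knowledge is lacking, and experiments demonstrate the feasibility of applying them to anomaly detection.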
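     (2) Association rules: the thesis mines rule patterns of users' normal usage from the data provided by DARPA in 1998 and uses them to detect anomalous behavior; experiments demonstrate the feasibility of this approach. Because applying association rule algorithms to an intrusion detection system raises the detection rate but also raises the false positive rate, the thesis applies a weighted association rule algorithm to the intrusion detection system. This method improves the detection rate to some extent, limits the generation of uninteresting rules, and lowers the false positive rate. Finally, experiments demonstrate the feasibility of the method.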
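     (3) Sequential patterns: the thesis presents an anomaly detection method based on sequential pattern mining. The method is very effective when applied to connection session records and to Unix-based hosts, and experiments on 9 Unix users demonstrate its feasibility.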
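The detection step in items (2) and (3), comparing a session against patterns mined from a user's normal data, might look like the hypothetical sketch below. It assumes the user's normal profile is a list of frequent command sequences (for example, the maximal patterns returned by a sequential pattern miner) and scores a new session by the fraction of profile patterns it contains; sessions scoring below an illustrative threshold are flagged. The scoring rule, the threshold of 0.5, and the function names are assumptions for illustration, not the similarity measure used in the thesis or in [LIAN2002].

```python
def contains(seq, pattern):
    """True if pattern occurs in seq as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(any(x == y for y in it) for x in pattern)

def profile_similarity(profile, session):
    """Fraction of the user's normal patterns that appear in the session."""
    if not profile:
        return 0.0
    return sum(contains(session, p) for p in profile) / len(profile)

def is_anomalous(profile, session, threshold=0.5):
    """Hypothetical rule: flag sessions that match too few normal patterns."""
    return profile_similarity(profile, session) < threshold

# Hypothetical normal profile for one user and two sessions to test.
profile = [("ls", "vi", "gcc"), ("cd", "ls"), ("vi", "gcc")]
own_session = ["cd", "ls", "vi", "gcc", "ls"]
foreign_session = ["mail", "pine", "ls", "mail"]
print(profile_similarity(profile, own_session), is_anomalous(profile, foreign_session))  # prints: 1.0 True
```

A short session naturally contains fewer patterns, so a practical version would likely normalize by session length or score fixed-size windows; that refinement is omitted here.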
As an active measure of information assurance, intrusion detection effectively complements traditional protection techniques. The dynamic security cycle, comprising policy, protection, detection and response, can greatly improve the assurance ability of information systems and reduce the damage caused by security threats.
    In essence, intrusion detection can be regarded as the process of analyzing network audit data. With the development of operating systems and network technology, the volume of audit data has increased sharply, so intrusion detection needs effective techniques for processing it. Current research uses data mining to extract characteristic models from the tremendous amount of audit data, and the application of data mining has become one of the most important research topics in intrusion detection.
    The paper studies the application of data mining to intrusion detection systems. The first chapter surveys data mining technologies and intrusion detection systems, then outlines the content of the later chapters and explains the rationale and significance of the thesis. The second chapter introduces data mining and related problems, including the process, methods, classification and applications of data mining. The third chapter introduces intrusion detection systems, including the system model, classification and related technologies. These two chapters form the foundation for the further research in the later chapters.
    The fourth chapter studies the application of Markov chains to anomaly detection. The method can identify anomalous behavior even when users have little knowledge of network security, which gives it considerable practical significance. The chapter presents experimental results for single-step and multi-step Markov chains, and the experiments show the feasibility of the method.
    The fifth chapter focuses on the application of association rules to intrusion detection. First, a variety of practical algorithms and their improved versions are studied. Then, because applying association rules to a detection system improves the detection rate but also increases the false positive rate, the paper gives a weighted association rule algorithm. The method can, to some extent, improve the detection rate of the intrusion detection system, limit the production of uninteresting rules, and decrease the false positive rate. Finally, experiments prove the feasibility of the method.
    The sixth chapter studies the application of sequential patterns to anomaly detection. It first summarizes current research in China and abroad. It then describes the AprioriAll and AprioriSome algorithms and compares their advantages and disadvantages. Finally, it presents experimental results that demonstrate the feasibility of the method.
    The seventh chapter summarizes the whole paper and outlines prospects for future research.
    Main contributions of the paper:
    (1) Markov chains: the use of Markov chain models for anomaly detection is discussed in depth. The experiments indicate that the model can detect anomalous system behavior even with little knowledge of system security.
    (2) Association rules: using normal behavior models mined from the 1998 DARPA training data, the experiments indicate that the method can detect anomalous user behavior. Subsequently, a weighted association rule algorithm is given to solve the problem that applying association rules to a detection system improves the detection rate but also increases the false positive rate. The method can, to some extent, improve the detection rate of the intrusion detection system, limit the production of uninteresting rules, and decrease the false positive rate.

References

[ANDER1998] R. Anderson, A. Khattak. The use of information retrieval techniques for intrusion detection. Web Proceedings of the First International Workshop on Recent Advances in Intrusion Detection (RAID'98). Available at: http://www.raid-symposium.org/raid98.
    [AGRAW1993] R. Agrawal, T. Imielinski, A. Swami. Mining association rules between sets of items in large databases [C]. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. Washington, DC, 1993: 207-216.
    [AGRAW1995] R. Agrawal, R. Srikant. Mining sequential patterns. Proceedings of the 11th International Conference on Data Engineering (ICDE), Taipei, 1995: 3-14.
    [AIS93] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of 1993 ACM-SIGMOD Int. Conf. Management of Data, pages 207-216, Washington, DC, May 1993.
    [AS94] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Research Report RJ19893, IBM Almaden Research Center, San Jose, CA, June 1994.
    [AS95] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of 1995 Int. Conf. Data Engineering (ICDE'95), pages 3-14, Taipei, Mar. 1995.
    [ARP96] A. Arning, R. Agrawal, and P. Raghavan. A linear method for deviation detection in large databases. In Proceedings of 1996 Int. Conference on Knowledge Discovery in Databases and Data Mining, Portland, Oregon, Aug. 1996.
    [Bay98] R. J. Bayardo. Efficiently mining long patterns from databases. In Proceedings of SIGMOD'98, pages 85-93, Seattle, WA, June 1998.
    [BACE2000] R. Bace, P. Mell. Intrusion detection systems. NIST Special Publication on Intrusion Detection Systems. National Institute of Standards and Technology, 2000.
    [BBK98] S. Berchtold, C. Böhm, and H. Kriegel. The Pyramid-Tree: Breaking the Curse of Dimensionality. In Proceedings of SIGMOD'98, pages 142-153, Seattle, WA, June 1998.
    [BR96] R. J. Brachman and T. Anand. The Process of Knowledge Discovery in Databases. Advances in Knowledge Discovery and Data Mining, pages 37-57, 1996.
    [Cle93] W. Cleveland. Visualizing Data. NJ: Hobart Press, 1993.
    [CCZ01] Chen Ning, Chen An, and Zhou Longxiang. An Effective Clustering Algorithm in Large Transaction Databases. Journal of Software, Vol. 12, No. 7, pages 476-484, July 2001.
    [DARPA98] 1998 DARPA Intrusion Detection Evaluation Data Set Overview. Available at: http://www.ll.mit.edu/IST/ideval/data/1998/1998_data_index.html
    [DAVIS1997] B. D. Davison, H. Hirsh. Experiments in UNIX command prediction. Technical Report ML-TR-41, Department of Computer Science, Rutgers University.
    [DAVIS1998a] B. D. Davison, H. Hirsh. Probabilistic Online Action Prediction. In the 1998 AAAI Spring Symposium on Intelligent Environments, 1998.
    [DAVIS1998b] B. D. Davison, H. Hirsh. Predicting Sequences of User Actions. In the AAAI/ICML 1998 Workshop on Predicting the Future: AI Approaches to Time-Series Analysis, AAAI Press, pages 5-12, July 1998.
    [Dev95] J. L. Devore. Probability and Statistics for Engineering and the Sciences, 4th edition. New York: Duxbury Press, 1995.
    [EKSX96] M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In Proceedings of KDD'96, pages 226-231, Portland, OR, Aug. 1996.
    [EP96] J. Elder IV and D. Pregibon. A statistical perspective on knowledge discovery in databases. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, pages 83-115, 1996.
    [FHRS90] K. Fox, R. Henning, J. Reed, and R. Simonian. A neural network approach towards intrusion detection. Technical Report, Harris Corporation, 1990.
    [FORR1996] S. Forrest, S. A. Hofmeyr, A. Somayaji, et al. A sense of self for Unix processes. In Proceedings of the 1996 IEEE Symposium on Security and Privacy, CA: IEEE Computer Society Press, pages 120-128, 1996.
    [GRS99] S. Guha, R. Rastogi, and K. Shim. ROCK: A robust clustering algorithm for categorical attributes. In Proceedings of ICDE'99, pages 512-521, Sydney, Australia, Mar. 1999.
    [Hec96] D. Heckerman. Bayesian networks for data mining. Knowledge Discovery in Databases, Kluwer Academic Publishers, Boston, pages 1-44, 1996.
    [HELM1997] P. Helman, J. Bhangoo. A statistically based system for prioritizing information exploration under uncertainty. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 27(4): 449-466, 1997.
    [HF95] J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. In Proceedings of VLDB'95, pages 420-431, Zurich, Switzerland, Sept. 1995.
    [HK98] A. Hinneburg and D. A. Keim. An efficient approach to clustering in large multimedia databases with noise. In Proceedings of KDD'98, pages 58-65, New York, Aug. 1998.
    [HK2000] J. Han, M. Kamber. Data Mining: Concepts and Techniques [M]. New York: Morgan Kaufmann Publishers, 2000: 1-200.
    [HOFM1998a] S. A. Hofmeyr, S. Forrest, A. Somayaji. Intrusion detection using sequences of system calls. Journal of Computer Security, 6: 151-180, 1998.
    [HPY00] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proceedings of SIGMOD'00, pages 1-12, Dallas, TX, May 2000.
    [HPM00] J. Han, J. Pei, and B. Mortazavi-Asl. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proceedings of KDD'00, pages 355-359, Boston, MA, Aug. 2000.
    [HMK01] Jiawei Han, Micheline Kamber; Chinese translation by Fan Ming, Meng Xiaofeng, et al. Data Mining: Concepts and Techniques (Chinese edition). China Machine Press, 2001.
    [KH95] K. Koperski and J. Han. Discovery of spatial association rules in geographic information databases. In Proceedings of 4th Int. Symp. Large Spatial Databases (SSD'95), pages 47-66, Portland, ME, Aug. 1995.
    [KN97] E. Knorr and R. Ng. A Unified Approach For Mining Outliers: Properties and Computation. In Proceedings of 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD'97), pages 219-222, Newport Beach, California, 1997.
    [KORVE1999] B. Korvemaker, R. Greiner. The trials and tribulations of building an adaptive user interface. Available at: http://www.cs.ualberta.ca/~greiner/PAPERS/AdaptUI.ps, 1999.
    [KORVE2000] B. Korvemaker, R. Greiner. Predicting UNIX command lines: adjusting to user patterns. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 230-235, Austin, TX, July 2000. AAAI Press.
    [KR90] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. New York: John Wiley & Sons, 1990.
    [LANE1997] T. Lane, C. E. Brodley. Sequence matching and learning in anomaly detection for computer security. In AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, pages 43-49, AAAI Press, July 1997.
    [LANE1999] T. Lane. Hidden Markov models for human computer interface modeling. In Proceedings of the IJCAI-99 Workshop on Learning about Users, pages 35-44, 1999.
    [LEE1997] W. Lee, S. J. Stolfo, P. K. Chan. Learning patterns from UNIX process execution traces for intrusion detection. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management, AAAI Press, pages 50-56, 1997.
    [LEE1998a] W. Lee, S. J. Stolfo. Data mining approaches for intrusion detection. In Proceedings of the 7th USENIX Security Symposium, 1998.
    [LEE1998b] W. Lee, S. J. Stolfo, K. W. Mok. Mining audit data to build intrusion detection models. 4th International Conference on Knowledge Discovery and Data Mining, New York, 1998: 66-72.
    [LEE1999a] W. Lee. A Data Mining Framework for Constructing Features and Models for Intrusion Detection Systems. PhD thesis, Columbia University, June 1999.
    [LEE1999b] W. Lee, C. T. Park, S. J. Stolfo. Automated intrusion detection methods using NFR. In Workshop on Intrusion Detection and Network Monitoring 1999. Available at: http://www.securityfocus.com/data/library/nfr_nid.ps
    [LHF98] H. Lu, J. Han, and L. Feng. Stock movement and n-dimensional inter-transaction association rules. In Proceedings of SIGMOD'98 Workshop on Research Issues on Data Mining and Knowledge Discovery, pages 12:1-12:7, Seattle, WA, June 1998.
    [LFH2000] R. Lippmann, D. Fried, I. Graf, J. Haines, et al. Evaluating Intrusion Detection Systems: The 1998 DARPA Off-Line Intrusion Detection Evaluation [A]. In: D. Schnackenberg (Boeing), DARPA Information Survivability Conference and Exposition [C]. Hilton Head: AAAI Press, 2000: 323-325.
    [LIAN2002] Lian Yifeng, Dai Yingxia, Wang Hang. Anomaly detection of user behaviors based on pattern mining. Chinese Journal of Computers, Vol. 25, No. 3, March 2002.
    [LSL96] H. J. Lu, R. Setiono, and H. Liu. Neural rule: a connectionist approach to data mining. In Proceedings of the 21st VLDB Conference, Zurich, Switzerland, pages 478-489, 1996.
    [LTXLZ02] Li Bin, Tan Lixiang, Xie Guangjun, Li Haiying, Zhuang Zhenquan. An algorithm for discovering frequent patterns in asynchronous multiple time series. Journal of Software, Vol. 13, No. 3, pages 110-416, March 2003.
    [LYDQ01] Li Xiongfei, Yuan Senmiao, Dong Liyan, Quan Bo. Research on data mining algorithms with multi-segment support. Chinese Journal of Computers, Vol. 24, No. 6, pages 661-665, June 2001.
    [Mac67] J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1: 281-297, 1967.
    [Me96] L. Mé. Genetic algorithms, a biologically inspired approach for security audit trails analysis. Short paper, in Proceedings of the 1996 IEEE Symposium on Security and Privacy, Oakland, CA, May 1996.
    [MJ2000] R. Mukkamala, J. Gagnon, S. Jajodia. Integrating data mining techniques with intrusion detection methods [A]. In: V. Atluri and J. Hale, Research Advances in Database and Information System Security [C]. Boston, MA: Kluwer Publishers, 2000: 33-46.
    [NYE2000] N. Ye. A Markov chain model of temporal behavior for anomaly detection [A]. Proceedings of the 2000 IEEE Workshop on Information Assurance and Security [C], pages 171-174, 2000.
    [PCY95] J. S. Park, M. S. Chen, and P. S. Yu. An effective hash-based algorithm for mining association rules. In Proceedings of SIGMOD'95, pages 175-186, San Jose, CA, May 1995.
    [QL01] Qin Min, Li Zhizhu. Pruning association analysis in data mining. Journal of Shanghai Jiaotong University, Vol. 35, No. 9, pages 1373-1376, Sept. 2001.
    [Qui86] J. R. Quinlan. Induction of decision trees. Machine Learning, 1: 81-106, 1986.
    [Qui93] J. R. Quinlan. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.
    [BBL02] Bu Dongbo, Bai Shuo, Li Guojie. The principle of granularity in clustering and classification. Chinese Journal of Computers, Vol. 25, No. 8, pages 810-816, Aug. 2002.
    [SA96] R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In Proceedings of SIGMOD'96, pages 1-12, Montreal, Canada, June 1996.
    [SCZ98] G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of VLDB'98, pages 428-439, New York, Aug. 1998.
    [SD90] J. W. Shavlik and T. G. Dietterich. Readings in Machine Learning. San Mateo, CA: Morgan Kaufmann, 1990.
    [SM03] A. H. Sung and S. Mukkamala. Identifying Important Features for Intrusion Detection Using Support Vector Machines and Neural Networks. In Proceedings of the 2003 Symposium on Applications and the Internet, pages 209-216, Orlando, Florida, Jan. 2003.
    [SNYDER2001] D. Snyder. On-line Intrusion Detection Using Sequences of System Calls. Master's thesis, Department of Computer Science, Florida State University, 2001.
    [SPT1997] S. J. Stolfo, A. L. Prodromidis, S. Tselepis, et al. JAM: Java agents for meta-learning over distributed databases [A]. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining [C]. Newport Beach, CA: AAAI Press, 1997: 33-34.
    [UHMM2003] ChangChu Zou. Using hidden Markov model in anomaly intrusion detection [J/OL]. Available at: http://tennis.ecs.umass.edu/~czou/research/HMM
    [UUD1998] The UCI KDD Archive (1998). UNIX User Data. Available at: http://kdd.ics.uci.edu/databases/UNIX_user_data/UNIX_user_data.html
    [WARREN1999] C. Warrender, S. Forrest, B. Pearlmutter. Detecting intrusions using system calls: Alternative data models. In Proceedings of IEEE Symposium on Security and Privacy, pages 133-145, Oakland, California, May 9-12, 1999.
    [XIE1995] Xie Jinhui. Hidden Markov Models (HMM) and Their Applications in Speech Processing [M]. Wuhan: Huazhong University of Science and Technology Press, 1995.
    [ZZJ02] Zhang Li, Zhou Weida, Jiao Licheng. Kernel clustering algorithm. Chinese Journal of Computers, Vol. 25, No. 6, pages 587-590, June 2002.
    [QZ01] Qian Yu, Zheng Cheng. Network intrusion detection based on association rules. Journal of Anhui University, Vol. 26, No. 4, pages 361-40, 2002.
