Research on Density-Based Clustering Algorithm of Data Streams in Intrusion Detection

Original title (Chinese): 入侵检测中基于密度的数据流聚类算法研究
Author: 王彦涛 (Wang Yantao)
Degree level: Master's
Discipline: Computer Software and Theory
Keywords: intrusion detection; data stream mining; clustering algorithm of data streams; D-Stream algorithm
Degree year: 2011
Advisor: 张凤斌 (Zhang Fengbin)
Discipline code: 081202
Degree-granting institution: 哈尔滨理工大学 (Harbin University of Science and Technology)
Thesis submission date: 2011-03-01

Abstract

As computers have become widespread and network technology has developed rapidly, networks have brought people many benefits but have also become the target of many forms of attack. Intrusion detection, as a proactive security technology, effectively counters a wide range of these attacks. Data stream mining is attracting growing attention, and building models over data streams and mining them in real time is important for intrusion detection.
Data stream clustering is a key direction in data stream mining. An intrusion detection model built on a data stream clustering algorithm can update its detection rule base in real time, so applying data stream clustering to intrusion detection has considerable practical value. Existing data stream clustering algorithms, however, have many shortcomings. Taking the D-Stream algorithm as its starting point, this thesis analyzes that algorithm's weaknesses and improves it to better meet the needs of intrusion detection, with the goal of raising the detection rate and lowering the false alarm rate of the intrusion detection system.
First, the thesis reviews the current state of intrusion detection systems and their open problems, the techniques related to data stream mining, the characteristics of data stream clustering algorithms, and the requirements that intrusion detection places on such algorithms. This provides the theoretical basis for the later chapters.
Next, building on a study of D-Stream, the thesis presents M-Stream, a density-based clustering algorithm for data streams. Drawing on the properties of cosine similarity and the Minkowski distance, and introducing the notions of frequency and summary information, it proposes a similarity measure for mixed-attribute data. To control time and space complexity, the algorithm stores nodes and pointers in a tree and a hash table. To address parameter setting, it proposes a density threshold function that keeps stream clustering within a fixed memory budget. For offline clustering, it forms clusters using an extended notion of neighboring cells and uses an in-memory sampling method to discover evolving clusters.
Finally, based on the characteristics of data streams, the thesis designs an intrusion detection model suited to data stream clustering, which updates the rule base in real time through background learning. Experiments on the KDD CUP 1999 dataset show that the algorithm outperforms earlier algorithms and achieves the expected results.
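
As a rough illustration of the density-grid idea in D-Stream, the baseline that M-Stream modifies, the sketch below keeps decayed cell densities in a hash table and labels each cell as dense, transitional, or sparse. The decay update and the two thresholds follow the published D-Stream formulation. The names `DensityGrid` and `Cell`, the parameter defaults, and the uniform cell width are assumptions made here for illustration. The sketch does not reproduce M-Stream's mixed-attribute similarity measure, tree index, or density threshold function, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    density: float = 0.0   # decayed density as of last_update
    last_update: int = 0   # time step of the most recent insert


class DensityGrid:
    """Hypothetical minimal D-Stream-style grid; not the thesis's M-Stream."""

    def __init__(self, cell_width, n_cells, decay=0.998, c_m=3.0, c_l=0.8):
        self.cell_width = cell_width
        self.decay = decay  # decay factor lambda, with 0 < lambda < 1
        # D-Stream thresholds: C_m / (N (1 - lambda)) and C_l / (N (1 - lambda)),
        # where N is the total number of cells in the partitioned space.
        self.dense_threshold = c_m / (n_cells * (1.0 - decay))
        self.sparse_threshold = c_l / (n_cells * (1.0 - decay))
        self.cells = {}  # hash table: cell coordinates -> Cell

    def key(self, point):
        # Map a numeric point to the integer coordinates of its cell.
        return tuple(int(x // self.cell_width) for x in point)

    def insert(self, point, t):
        # Lazy decay: D(g, t) = lambda^(t - t_last) * D(g, t_last) + 1
        cell = self.cells.setdefault(self.key(point), Cell(last_update=t))
        cell.density = cell.density * self.decay ** (t - cell.last_update) + 1.0
        cell.last_update = t
        return cell

    def density_at(self, key, t):
        # Density of a cell at time t without modifying the stored value.
        cell = self.cells.get(key)
        if cell is None:
            return 0.0
        return cell.density * self.decay ** (t - cell.last_update)

    def label(self, key, t):
        d = self.density_at(key, t)
        if d >= self.dense_threshold:
            return "dense"
        if d <= self.sparse_threshold:
            return "sparse"
        return "transitional"


# Toy parameters chosen so the arithmetic is easy to follow by hand.
grid = DensityGrid(cell_width=1.0, n_cells=4, decay=0.5)
grid.insert((0.2, 0.3), t=0)
grid.insert((0.7, 0.9), t=1)      # same cell (0, 0); density becomes 1.5
print(grid.label((0, 0), t=1))    # dense
print(grid.label((0, 0), t=3))    # sparse, since 1.5 * 0.5**2 = 0.375
```

Because densities are decayed lazily on access, only cells that have actually received data occupy memory, which is one reason a hash table suits sparse, high-dimensional streams such as network connection records.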

