Research on Fuzzy Clustering Algorithms in Data Mining
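Abstract
Unlike traditional data processing techniques, data mining can analyze large volumes of information and data and extract useful knowledge from them to support decision making. It is currently a frontier research topic in information and database technology and is widely regarded as one of the most promising key technologies.
     Cluster analysis, one of the main methods of data mining, has attracted growing attention as data mining research has developed. Cluster analysis is a method for grouping data sensibly, and many clustering algorithms have been proposed. This thesis studies in depth the most commonly used of them, the objective-function-based fuzzy C-means (FCM) clustering algorithm, and proposes several improvements that address its shortcomings.
     First, FCM requires the membership degrees of each data point to sum to 1, and applying this probabilistic constraint to fuzzy events lowers clustering accuracy. Taking possibility theory as the theoretical basis, the thesis proposes a new improved possibilistic fuzzy clustering algorithm based on membership uncertainty. The algorithm introduces a possibilistic membership degree and an uncertainty membership degree into the objective function, so that an element of the sample is no longer restricted to belonging to a single cluster, which is closer to real situations. Second, FCM measures dissimilarity with Euclidean distance, which limits it to clustering ellipsoidally distributed data. The thesis instead measures dissimilarity with Mahalanobis distance and represents the input data as matrices, so that more data patterns can be handled and the range of applicability of the clustering is widened.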
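The effect of the sum-to-1 constraint can be illustrated with a small sketch. It compares the standard FCM membership update with a possibilistic typicality of the form used in possibilistic c-means. This is not the thesis's own algorithm: the function names, the fixed cluster centers, and the scale parameter `eta` are assumptions chosen for illustration.

```python
import numpy as np

def sq_distances(X, centers):
    """Squared Euclidean distance from every point to every center, shape (n, c)."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def fcm_memberships(X, centers, m=2.0):
    """Standard FCM membership update; each row sums to 1 over the clusters."""
    d2 = np.maximum(sq_distances(X, centers), 1e-12)  # guard against zero distance
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def possibilistic_typicality(X, centers, eta=1.0, m=2.0):
    """Possibilistic typicality; rows are NOT forced to sum to 1.

    eta is a hypothetical per-cluster scale, fixed to 1.0 here for simplicity.
    """
    d2 = sq_distances(X, centers)
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

# Two fixed, hypothetical cluster centers.
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
X = np.array([
    [2.0, 0.5],    # lies between the clusters, close to both
    [2.0, 50.0],   # distant outlier, also equidistant from both centers
])

u = fcm_memberships(X, centers)
t = possibilistic_typicality(X, centers)
print("FCM memberships:\n", u)
print("Possibilistic typicalities:\n", t)
```

Under the sum-to-1 constraint both points receive the same memberships, because only the ratio of distances matters. The typicality values, in contrast, shrink as a point moves away from every center, so the outlier is distinguishable from the point that genuinely sits between the clusters.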
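     To verify the effectiveness of the proposed methods, experiments were carried out. The experimental results show that the expected effect was achieved.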
Data mining differs from traditional data processing techniques in that it can analyze large volumes of information and extract useful knowledge from them, which helps people make sound decisions. Data mining is a frontier area of information and database technology and is widely regarded as one of the key technologies with the broadest development prospects.
     As one of the main techniques of data mining, cluster analysis is attracting more and more attention. Cluster analysis is a frequently used technique for grouping data sensibly, and many clustering algorithms exist at present. This thesis studies the fuzzy C-means (FCM) algorithm and proposes an improved algorithm that addresses the shortcomings of FCM.
     First, because the condition that the membership degrees of each data point sum to 1 has a negative effect on the accuracy of fuzzy clustering for fuzzy events, the membership degree of the data is studied on the basis of possibility theory. A possibilistic membership degree and an uncertain membership degree are introduced into the algorithm's objective function, so that an element of the sample no longer has to belong to one cluster only, which leads to better results than current clustering algorithms. Second, using Mahalanobis distance widens the applicability of FCM. Last, changing the input objects from vectors to matrices adapts the algorithm to more data patterns.
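To make the second point concrete, the sketch below contrasts squared Euclidean distance with squared Mahalanobis distance for one cluster. Euclidean distance weights every feature direction equally, while Mahalanobis distance rescales each direction by the cluster's covariance. The covariance matrix here is a fixed, hypothetical value; how the thesis estimates it is not stated in the abstract.

```python
import numpy as np

def sq_euclidean(X, center):
    """Squared Euclidean distance from each row of X to the center."""
    diff = X - center
    return (diff ** 2).sum(axis=1)

def sq_mahalanobis(X, center, cov):
    """Squared Mahalanobis distance: diff^T * inv(cov) * diff for each row of X."""
    diff = X - center
    cov_inv = np.linalg.inv(cov)
    return np.einsum("ni,ij,nj->n", diff, cov_inv, diff)

center = np.array([0.0, 0.0])
# Hypothetical covariance of a cluster stretched along the x axis.
cov = np.array([[9.0, 0.0],
                [0.0, 1.0]])

X = np.array([
    [3.0, 0.0],  # 3 units away along the stretched direction
    [0.0, 3.0],  # 3 units away along the narrow direction
])

d_euc = sq_euclidean(X, center)
d_mah = sq_mahalanobis(X, center, cov)
print("squared Euclidean:  ", d_euc)
print("squared Mahalanobis:", d_mah)
```

Both points are equally far from the center in the Euclidean sense, but the Mahalanobis distance treats the point along the stretched axis as much closer. Substituting such a distance into the FCM updates is one way a clustering algorithm can follow elongated clusters.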
     Experiments show that the improved algorithm achieves the expected results.
