Algorithm Improvements Based on FCM Clustering

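English title: Improvement Based on FCM Clustering Algorithm
Author: Ning Shaofen (宁绍芬)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords (translated from Chinese): cluster analysis; fuzzy clustering; fuzzy C-means (FCM) algorithm; initial cluster centers
English keywords: Clustering Analysis; Fuzzy Clustering; FCM; Initial Clustering Centers
Degree year: 2007
Advisor: Ji Guangrong (姬光荣)
Discipline code: 081001
Degree-granting institution: Ocean University of China (中国海洋大学)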

Abstract

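Cluster analysis is an important research area in data mining and an important means of partitioning or grouping data. Clustering is applied very widely: in business as well as in biology, Web document classification, image processing, and other fields. Current clustering algorithms fall broadly into graph-based, hierarchical, density-based, grid-based, model-based, and partitioning methods.
     The fuzzy C-means (FCM) clustering algorithm is one of the most widely used algorithms in unsupervised pattern recognition. It obtains its solution by minimizing an objective function: it randomly selects C points (C being the number of clusters) as the initial cluster centers and then completes the clustering through an iterative process. The algorithm also has inherent shortcomings. It requires the value of C to be known before clustering, which is difficult for inexperienced users. The choice of initial cluster centers also strongly affects the final result: if they are chosen poorly, the objective function may fail to reach the global optimum and instead become trapped in a local minimum.
     This thesis first introduces several commonly used clustering algorithms, with examples. It then focuses on improvements based on FCM clustering, attempting to improve FCM in several respects: the choice of C; the selection of initial cluster centers; replacing cluster centers with cluster cores; modifying the distance measure; and modifying the value of m, the exponent applied to the membership degrees. Experiments use the IRIS dataset, a common benchmark in clustering, to test the improved algorithm and compare it with the standard FCM algorithm, confirming its effectiveness. Finally, the application of FCM clustering to sea fog recognition is briefly discussed.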
Clustering is an important research area in data mining and an important method of partitioning or grouping data. It has been applied in many fields, including commerce, market analysis, biology, and Web document classification, among others. Clustering algorithms can be divided into graph-based, hierarchical, density-based, grid-based, model-based, and partitioning-based algorithms.
     The fuzzy C-means (FCM) clustering algorithm is one of the most widely applied algorithms in unsupervised pattern recognition. FCM obtains its solution by minimizing an objective function. It starts by randomly selecting C initial cluster centers (C is the number of clusters) and then iterates until the clustering is complete. FCM also has drawbacks: the number of clusters must be known in advance, and the result depends on a good choice of initial cluster centers. If poor initial centers are picked, the objective function may converge to a local minimum rather than the global optimum.
     In this thesis, several frequently used clustering algorithms are first discussed with examples. The emphasis is then placed on improvements to FCM: how to decide the number of clusters; how to obtain good initial cluster centers; replacing cluster centers with cluster cores; improving the definition of distance; and modifying the value of m, the exponent applied to the membership degrees. The improved algorithm is tested on the IRIS dataset, which is often used in clustering analysis, and compared with standard FCM; the results confirm its effectiveness. Finally, the application of FCM to sea fog recognition is briefly presented.
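For orientation, the standard FCM iteration that the abstract takes as its baseline can be sketched as follows. This is a minimal sketch of the textbook algorithm, not the improved variant developed in the thesis. The function names (`fcm`, `update_memberships`, `update_centers`, `objective`), the optional `init` parameter, the stopping tolerance, and the sample points are all illustrative assumptions. The objective being minimized is the sum, over all points and clusters, of the membership degree raised to the power m times the squared Euclidean distance from the point to the cluster center.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def update_memberships(points, centers, m):
    """Return rows u[k][i]: degree to which points[k] belongs to cluster i."""
    exponent = 1.0 / (m - 1.0)
    rows = []
    for p in points:
        d2 = [squared_distance(p, v) for v in centers]
        zeros = [i for i, d in enumerate(d2) if d == 0.0]
        if zeros:
            # The point sits exactly on one or more centers: split the
            # membership among those centers and give the rest zero.
            row = [1.0 / len(zeros) if i in zeros else 0.0
                   for i in range(len(centers))]
        else:
            # u_ki is proportional to (1 / d_ki^2)^(1/(m-1)); rows sum to 1.
            inv = [(1.0 / d) ** exponent for d in d2]
            total = sum(inv)
            row = [w / total for w in inv]
        rows.append(row)
    return rows


def update_centers(points, memberships, m):
    """Each center is the mean of all points weighted by u_ki^m."""
    dim = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append([sum(w * p[j] for w, p in zip(weights, points)) / total
                        for j in range(dim)])
    return centers


def objective(points, centers, memberships, m=2.0):
    """FCM objective: sum of u_ki^m times squared distance to center i."""
    return sum((u ** m) * squared_distance(p, v)
               for p, row in zip(points, memberships)
               for u, v in zip(row, centers))


def fcm(points, c, m=2.0, init=None, max_iter=100, tol=1e-6, seed=0):
    """Standard FCM. Assumes m > 1 and more distinct points than clusters.

    If `init` is None, c data points are drawn at random as the initial
    centers, which is the step the abstract describes as sensitive.
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, c)
    centers = [list(v) for v in start]
    for _ in range(max_iter):
        memberships = update_memberships(points, centers, m)
        new_centers = update_centers(points, memberships, m)
        shift = max(abs(a - b)
                    for v, nv in zip(centers, new_centers)
                    for a, b in zip(v, nv))
        centers = new_centers
        if shift < tol:  # centers have stopped moving
            break
    return centers, update_memberships(points, centers, m)


if __name__ == "__main__":
    # Illustrative 2-D points (not the IRIS data used in the thesis).
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
    found_centers, found_u = fcm(data, 2)
    print(found_centers)
    print(objective(data, found_centers, found_u))
```

Running `fcm` with different `seed` values, or passing explicit starting centers through `init`, and comparing the resulting `objective` values is one simple way to observe the dependence on initial centers that motivates the thesis.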

