Abstract
The principles and foundations of the semi-supervised fuzzy C-means clustering (SFCM) algorithm are introduced. When prior information is scarce, SFCM cannot make effective use of the supervisory information carried by the labeled data. To address this shortcoming, an improved semi-supervised fuzzy clustering algorithm, SSFCM, is proposed. SSFCM places a parameter representing the weight of the labeled data points in the iterative update expression for the cluster centers, so that the influence of the supervisory information can be adjusted. Finally, the algorithm is implemented in MATLAB and evaluated on the standard Iris data set. Experimental results indicate that SSFCM outperforms both the standard fuzzy C-means (FCM) algorithm and SFCM, in terms of both clustering accuracy and the number of iterations required.
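The abstract says only that SSFCM puts a weight for the labeled points into the cluster-center update. It does not give the formula. The Python sketch below illustrates one plausible reading and rests on several assumptions that are not taken from the paper. Labeled points keep crisp memberships fixed to their known class. A single hypothetical scalar `w` multiplies the labeled points' terms in the center update. Centers are initialized from the labeled class means. The names `ssfcm_sketch`, `w`, `_memberships`, and `_centers` are illustrative only, and a toy two-cluster dataset stands in for Iris.

```python
import math


def _memberships(points, labels, centers, m):
    """FCM membership update; labeled points keep a crisp, fixed membership."""
    c = len(centers)
    exp = 2.0 / (m - 1.0)
    u = []
    for p, y in zip(points, labels):
        if y is not None:
            # Labeled point: membership fixed to its known cluster (assumption).
            u.append([1.0 if i == y else 0.0 for i in range(c)])
            continue
        d = [math.dist(p, v) for v in centers]
        if 0.0 in d:
            # Point sits exactly on a center: give it full membership there.
            j = d.index(0.0)
            u.append([1.0 if i == j else 0.0 for i in range(c)])
            continue
        # Standard FCM rule: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        u.append([1.0 / sum((d[i] / d[j]) ** exp for j in range(c)) for i in range(c)])
    return u


def _centers(points, labels, u, m, w):
    """Center update in which labeled points' terms are scaled by w (assumed form)."""
    c, dim = len(u[0]), len(points[0])
    centers = []
    for i in range(c):
        num, den = [0.0] * dim, 0.0
        for p, y, uk in zip(points, labels, u):
            # v_i = sum_k s_k u_ik^m x_k / sum_k s_k u_ik^m, s_k = w if labeled else 1.
            weight = (w if y is not None else 1.0) * uk[i] ** m
            den += weight
            for t in range(dim):
                num[t] += weight * p[t]
        centers.append([x / den for x in num])
    return centers


def ssfcm_sketch(points, labels, c, m=2.0, w=2.0, tol=1e-6, max_iter=100):
    """Return (centers, memberships, iterations) for a weighted semi-supervised FCM sketch.

    labels[k] is a cluster index in 0..c-1 for labeled points and None otherwise.
    Requires m > 1, w > 0, and at least one labeled point per cluster.
    """
    # Initialize each center as the mean of its labeled points.
    centers = []
    for i in range(c):
        members = [p for p, y in zip(points, labels) if y == i]
        if not members:
            raise ValueError(f"cluster {i} has no labeled point")
        centers.append([sum(col) / len(members) for col in zip(*members)])
    for it in range(1, max_iter + 1):
        u = _memberships(points, labels, centers, m)
        new_centers = _centers(points, labels, u, m, w)
        # Stop when no center moves more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u, it


pts = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
lab = [0, None, None, 1, None, None]
centers, u, iters = ssfcm_sketch(pts, lab, c=2)
print([max(range(2), key=lambda i: row[i]) for row in u])  # [0, 0, 0, 1, 1, 1]
```

In this sketch, setting `w` above 1 pulls each center toward its labeled points, which is one way to make the influence of supervision adjustable as the abstract describes. With `w = 1`, labeled and unlabeled points contribute equally to the center update.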