Abstract
Traditional clustering algorithms perform poorly when the sample data are insufficient or contaminated by noise. To address this problem, this paper proposes a maximum entropy-based knowledge transfer fuzzy clustering algorithm (MEKTFCA), which extends the classical maximum entropy clustering algorithm by fusing two kinds of knowledge from the source domain: historical cluster centers and historical membership degrees. Both are incorporated into the objective function, so that useful knowledge summarized from the source domain guides the clustering of an insufficient or contaminated target-domain dataset and thereby improves clustering performance. Experiments on transfer scenarios constructed from a group of synthetic datasets and two groups of real datasets demonstrate the effectiveness of the proposed algorithm.
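The idea of letting historical cluster centers guide maximum entropy clustering can be illustrated with a minimal sketch. It is not the objective function of the proposed algorithm, which the abstract does not state. It assumes a simplified objective: the classical maximum entropy clustering terms plus a quadratic penalty `lam * ||v_i - v_i_source||^2` that pulls each target center toward its historical counterpart. The historical-membership term is omitted. The function names and the parameters `gamma` (entropy weight) and `lam` (transfer weight) are hypothetical.

```python
import math

def softmax_memberships(points, centers, gamma):
    """Maximum-entropy memberships: u[i][j] is proportional to
    exp(-||x_j - v_i||^2 / gamma), normalized over clusters i."""
    memberships = [[0.0] * len(points) for _ in centers]
    for j, x in enumerate(points):
        sq = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
        base = min(sq)  # shift by the smallest distance for numerical stability
        weights = [math.exp(-(d - base) / gamma) for d in sq]
        total = sum(weights)
        for i, w in enumerate(weights):
            memberships[i][j] = w / total
    return memberships

def transfer_mec(points, source_centers, gamma=1.0, lam=1.0, iters=50):
    """Alternating updates for the ASSUMED objective
    sum_ij u_ij*||x_j - v_i||^2 + gamma*sum_ij u_ij*ln(u_ij)
      + lam*sum_i ||v_i - source_centers[i]||^2."""
    centers = [list(v) for v in source_centers]  # start from historical centers
    for _ in range(iters):
        u = softmax_memberships(points, centers, gamma)
        for i, prior in enumerate(source_centers):
            mass = sum(u[i])
            for k in range(len(prior)):
                weighted = sum(u[i][j] * x[k] for j, x in enumerate(points))
                # Closed-form minimizer: a blend of the membership-weighted
                # data mean and the historical center.
                centers[i][k] = (weighted + lam * prior[k]) / (mass + lam)
    return centers, softmax_memberships(points, centers, gamma)

# Tiny target-domain dataset (four points) with two historical centers.
target = [(0.1, 0.0), (-0.2, 0.1), (4.9, 5.2), (5.1, 4.8)]
history = [(0.0, 0.0), (5.0, 5.0)]
centers, u = transfer_mec(target, history, gamma=1.0, lam=0.5)
```

In this sketch, a larger `lam` keeps the target centers closer to the historical ones, which helps when target data are scarce or noisy. With `lam = 0` the center update reduces to the classical maximum entropy clustering update.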