Abstract
A single clustering method falls far short of the needs of practical data analysis, and the K-Means algorithm cannot readily handle high-dimensional or non-metric data. To address these problems, a Clustering Combination Algorithm based on Non-metric MultiDimensional Scaling (NMDSCCA) is proposed. First, non-metric multidimensional scaling is used to reduce the dimensionality of non-metric, high-dimensional data. Then, the principal component variables obtained from the reduction are used as input variables, and the K-Means algorithm serves as the base clusterer. This overcomes the inability of K-Means to handle categorical data and its limitations with high-dimensional variables, making the algorithm more generally applicable. Simulation experiments show that the new algorithm achieves better clustering performance than both the traditional K-Means algorithm and a clustering combination algorithm based on Principal Component Analysis (PCA), and that it converges faster when applied to big data.
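The two-stage pipeline described above (non-metric MDS for dimensionality reduction, then K-Means on the reduced coordinates) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes scikit-learn's `MDS` with `metric=False` and `dissimilarity="precomputed"` (the long-standing parameter names, which newer scikit-learn releases may rename), a simple-matching dissimilarity for categorical records, and synthetic toy data. The function names `simple_matching_dissimilarity` and `nmds_kmeans` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS


def simple_matching_dissimilarity(X):
    """Fraction of attributes on which two categorical records differ.

    Assumption: the abstract does not name a dissimilarity measure;
    simple matching is used here only for illustration.
    """
    # X has shape (n_samples, n_features); broadcasting compares every pair.
    return (X[:, None, :] != X[None, :, :]).mean(axis=2)


def nmds_kmeans(X_cat, n_components=2, n_clusters=3, seed=0):
    """Reduce categorical data with non-metric MDS, then cluster with K-Means."""
    d = simple_matching_dissimilarity(X_cat)
    # metric=False selects non-metric (ordinal) MDS, which only tries to
    # preserve the rank order of the pairwise dissimilarities.
    nmds = MDS(
        n_components=n_components,
        metric=False,
        dissimilarity="precomputed",
        n_init=4,
        random_state=seed,
    )
    embedding = nmds.fit_transform(d)
    # K-Means acts as the base clusterer on the low-dimensional coordinates.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(embedding)
    return embedding, labels


# Toy data (hypothetical): 3 groups of 30 records, 20 categorical
# attributes with 5 levels each, and about 10% of entries re-drawn at random.
rng = np.random.default_rng(0)
prototypes = rng.integers(0, 5, size=(3, 20))
X_cat = np.repeat(prototypes, 30, axis=0)
noise = rng.random(X_cat.shape) < 0.1
X_cat[noise] = rng.integers(0, 5, size=noise.sum())

embedding, labels = nmds_kmeans(X_cat)
print(embedding.shape, np.bincount(labels))
```

Because K-Means only ever sees the NMDS coordinates, the categorical attributes never need a numeric encoding. The choice of dissimilarity measure and of `n_components` is left open here and would have to follow the paper's own settings.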