Clustering Combination Algorithm Based on Non-Metric Multidimensional Scaling

Original title: 基于非度量多维缩放的聚类组合算法
Authors: ZHOU Wenjuan (周文娟); ZHAO Lifeng (赵礼峰)
Affiliation: College of Science, Nanjing University of Posts and Telecommunications
Keywords: non-metric multidimensional scaling; K-Means algorithm; clustering analysis; clustering combination; high-dimensional data; Principal Component Analysis (PCA)
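Journal: Journal of Computer Applications (计算机应用)
Journal code: JSJY
Publication date: 2018-06-30
Year: 2018
Volume: 38
Issue: S1
Pages: 72-77
Page count: 6
CN: 51-1307/TP
Record ID: JSJY2018S1017
Funding: Youth Fund of the National Natural Science Foundation of China (61304169)
Language: Chinese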

Abstract

A single clustering method often cannot meet the needs of real-world data analysis, and K-Means clustering in particular struggles with high-dimensional and non-metric data. To address these problems, a Clustering Combination Algorithm based on Non-metric Multi-Dimensional Scaling (NMDSCCA) is proposed. The algorithm first applies non-metric multidimensional scaling to reduce the dimensionality of high-dimensional, non-metric data. The principal component variables obtained after dimensionality reduction are then used as input variables, with the K-Means algorithm serving as the base clusterer. This removes the limitations of K-Means in handling categorical data and high-dimensional variables, making the approach more generally applicable. Simulation experiments show that the new algorithm outperforms both the traditional K-Means algorithm and the clustering combination algorithm based on Principal Component Analysis (PCA) in clustering quality, and that it converges faster when applied to large data sets.
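The two-stage pipeline described in the abstract (non-metric multidimensional scaling, then K-Means on the reduced coordinates) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the toy categorical records, the simple-matching dissimilarity, the SMACOF-style iteration with pool-adjacent-violators monotone regression used for the NMDS step, and the farthest-point initialisation of K-Means. The function names `simple_matching`, `pava`, `nmds`, and `kmeans` are hypothetical.

```python
import math
import random


def simple_matching(a, b):
    """Fraction of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pava(values):
    """Least-squares non-decreasing fit (pool adjacent violators)."""
    blocks = []  # each block holds [sum, count]
    for v in values:
        blocks.append([v, 1])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return [s / c for s, c in blocks for _ in range(c)]


def nmds(delta, dim=2, iters=300, seed=0):
    """Non-metric MDS sketch: only the rank order of delta is used."""
    n = len(delta)
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(iters):
        dist = {(i, j): math.dist(X[i], X[j]) for i, j in pairs}
        # Order pairs by dissimilarity; ties are broken by current distance.
        order = sorted(pairs, key=lambda p: (delta[p[0]][p[1]], dist[p]))
        fitted = pava([dist[p] for p in order])
        # Rescale the monotone fit so the configuration cannot shrink to a point.
        scale = math.sqrt(len(pairs) / sum(f * f for f in fitted))
        target = {p: f * scale for p, f in zip(order, fitted)}
        # Guttman transform: move points so distances approach the targets.
        new_X = [[0.0] * dim for _ in range(n)]
        for i, j in pairs:
            if dist[(i, j)] > 1e-12:
                ratio = target[(i, j)] / dist[(i, j)]
                for a in range(dim):
                    step = ratio * (X[i][a] - X[j][a]) / n
                    new_X[i][a] += step
                    new_X[j][a] -= step
        X = new_X
    return X


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical categorical records: K-Means cannot take these directly.
records = [
    ("red", "small", "round"), ("red", "small", "oval"), ("red", "medium", "round"),
    ("blue", "large", "square"), ("blue", "large", "flat"), ("blue", "huge", "square"),
]
delta = [[simple_matching(a, b) for b in records] for a in records]
embedding = nmds(delta)          # stage 1: reduce to 2 numeric coordinates
labels = kmeans(embedding, k=2)  # stage 2: base clusterer on the embedding
print(labels)
```

The design point is that K-Means only ever sees numeric coordinates. The categorical attributes enter solely through the rank order of the pairwise dissimilarities, which is why non-metric data can be handled at all.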

References

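[1] STREHL A, GHOSH J. Cluster ensembles: a knowledge reuse framework for combining multiple partitions[J]. Journal of Machine Learning Research, 2002, 3(3): 583-617.
[2] ZHANG X L, BRODLEY C E. Solving cluster ensemble problems by bipartite graph partitioning[C]//Proceedings of the 21st International Conference on Machine Learning. New York: ACM, 2004: 9-15.
[3] WANG Minfeng, ZHU Minchen. A new clustering combination algorithm[J]. Journal of Fuzhou University (Natural Science Edition), 2010, 38(6): 819-823. (in Chinese)
[4] MENG Zijian, MA Jianghong. An improved K-means algorithm with selectable initial cluster centers[J]. Statistics & Decision, 2014(12): 12-14. (in Chinese)
[5] HAN Lingbo. Research on optimizing the number of clusters in the K-means algorithm[J]. Journal of Sichuan University of Science & Engineering (Natural Science Edition), 2012, 25(2): 77-84. (in Chinese)
[6] XU Yong, CHEN Liang. A K-means clustering algorithm based on dimensionality reduction[J]. Journal of Hunan City University (Natural Science Edition), 2017, 26(1): 54-61. (in Chinese)
[7] YU Shixiao. Non-metric multidimensional scaling and its application in community classification[J]. Acta Phytoecologica Sinica, 1995, 19(2): 128-136. (in Chinese)
[8] RIVAS M N, BURTON O T, WISE P, et al. A microbiota signature associated with experimental food allergy promotes allergic sensitization and anaphylaxis[J]. Journal of Allergy and Clinical Immunology, 2013, 131(1): 201-212.
[9] WANG Binhui. Multivariate Statistical Analysis and Modeling with R[M]. Guangzhou: Jinan University Press, 2011: 267-279. (in Chinese)
[10] ZHANG Xuegong. Pattern Recognition[M]. Beijing: Tsinghua University Press, 2015: 173-177. (in Chinese)
[11] ZHOU Aiwu, CHEN Baolou, WANG Yan. Research on and improvement of the K-Means algorithm[J]. Computer Technology and Development, 2012, 22(10): 101-104. (in Chinese)
[12] SONG Yuan. Research on several problems of determining the optimal number of clusters in cluster analysis[D]. Yanbian: Yanbian University, 2013: 11-27. (in Chinese)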
