Clustering combination algorithm based on non-metric multidimensional scaling (基于非度量多维缩放的聚类组合算法)
  • English title: Clustering combination algorithm based on non-metric multidimensional scaling
  • Authors: 周文娟; 赵礼峰
  • Authors (English): ZHOU Wenjuan; ZHAO Lifeng; College of Science, Nanjing University of Posts and Telecommunications
  • Keywords: non-metric multidimensional scaling; K-Means algorithm; clustering analysis; clustering combination; high-dimensional data; Principal Component Analysis (PCA)
  • Chinese journal title: JSJY
  • English journal title: Journal of Computer Applications
  • Institution: College of Science, Nanjing University of Posts and Telecommunications (南京邮电大学理学院)
  • Publication date: 2018-06-30
  • Publisher: Journal of Computer Applications (计算机应用)
  • Year: 2018
  • Issue: v.38
  • Funding: National Natural Science Foundation of China, Youth Fund project (61304169)
  • Language: Chinese
  • Pages: JSJY2018S1017
  • Page count: 6
  • CN: S1
  • ISSN: 51-1307/TP
  • Classification number: 72-77
Abstract
A single clustering method often cannot meet the needs of real-world data analysis, and K-Means clustering in particular struggles with high-dimensional and non-metric data. To address these problems, a Clustering Combination Algorithm based on Non-metric MultiDimensional Scaling (NMDSCCA) is proposed. The algorithm first applies non-metric multidimensional scaling to reduce the dimensionality of high-dimensional, non-metric data. The principal component variables obtained after dimensionality reduction are then used as input variables, with the K-Means algorithm as the base clusterer. This removes the limitations of K-Means in handling categorical data and high-dimensional variables, and makes the algorithm more generally applicable. Simulation experiments show that the new algorithm outperforms both the traditional K-Means algorithm and the clustering combination algorithm based on Principal Component Analysis (PCA) in clustering quality, and that it converges faster when applied to big data.
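The two-stage idea in the abstract (non-metric multidimensional scaling first, K-Means on the reduced coordinates second) can be illustrated with a minimal sketch. This is not the authors' NMDSCCA implementation; it only shows the general shape of such a pipeline under several assumptions: a simple-matching dissimilarity for categorical records, a SMACOF-style update with isotonic regression for the scaling step, a deterministic landmark start, farthest-first seeding for K-Means, and a made-up toy data set. All function names (`simple_matching`, `isotonic`, `nmds`, `kmeans`) are hypothetical.

```python
import math


def simple_matching(a, b):
    """Fraction of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def isotonic(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return [s / c for s, c in blocks for _ in range(c)]


def nmds(diss, dim=2, iters=100):
    """Non-metric MDS sketch: only the rank order of `diss` is used."""
    n = len(diss)
    pairs = sorted(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: diss[ij[0]][ij[1]])
    # Assumption: deterministic start from dissimilarities to `dim` landmarks.
    landmarks = [0]
    while len(landmarks) < dim:
        landmarks.append(max(range(n),
                             key=lambda j: min(diss[l][j] for l in landmarks)))
    X = [[diss[i][l] for l in landmarks] for i in range(n)]
    for _ in range(iters):
        dist = [math.dist(X[i], X[j]) for i, j in pairs]
        dhat = isotonic(dist)  # disparities, monotone in the rank order
        scale = math.sqrt(len(pairs) / (sum(d * d for d in dhat) or 1.0))
        B = [[0.0] * n for _ in range(n)]
        for (i, j), d, target in zip(pairs, dist, dhat):
            if d > 1e-12:
                ratio = scale * target / d
                B[i][j] = B[j][i] = -ratio
                B[i][i] += ratio
                B[j][j] += ratio
        # Guttman transform: X <- (1/n) * B @ X
        X = [[sum(B[i][m] * X[m][c] for m in range(n)) / n
              for c in range(dim)] for i in range(n)]
    return X


def kmeans(points, k, iters=100):
    """Lloyd's K-Means with farthest-first seeding (an assumption here)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels


# Toy categorical records (hypothetical data, two obvious groups).
records = [
    ("red", "small", "round", "soft"),
    ("red", "small", "round", "hard"),
    ("red", "small", "oval", "soft"),
    ("blue", "large", "square", "hard"),
    ("blue", "large", "square", "soft"),
    ("blue", "large", "oval", "hard"),
]
diss = [[simple_matching(a, b) for b in records] for a in records]
embedding = nmds(diss, dim=2)   # stage 1: reduce to two coordinates
labels = kmeans(embedding, 2)   # stage 2: cluster the coordinates
print(labels)
```

Because the scaling step needs only a dissimilarity matrix, categorical records never have to be given numeric coordinates before K-Means runs, which is the point the abstract makes about K-Means and categorical data. In practice a library implementation with several restarts would likely be preferable to this hand-written loop.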
