Optimized Density Peak Clustering Algorithm by Natural Nearest Neighbor

Original title: 自然最近邻优化的密度峰值聚类算法
Authors: JIN Hui (金辉); QIAN Xuezhong (钱雪忠)
Affiliation: Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, School of Internet of Things Engineering, Jiangnan University
Keywords: density peak; natural nearest neighbor; local density; sparse regions; similarity between clusters
Journal code: KXTS
Journal: Journal of Frontiers of Computer Science and Technology
Publication date: 2018-05-19 15:46
Publisher: Journal of Frontiers of Computer Science and Technology (计算机科学与探索)
Year: 2019
Issue: v.13; No.127
Funding: National Natural Science Foundation of China (No.61673193); Fundamental Research Funds for the Central Universities (Nos.JUSRP51635B, JUSRP51510)
Language: Chinese
Page: KXTS201904019
Page count: 10
CN: 04
ISSN: 11-5602/TP
Classification number: 175-184

Abstract

Existing density-based clustering algorithms are sensitive to their parameters and perform poorly on non-spherical data and complex manifold data. To address these problems, a new clustering algorithm based on density peaks is proposed. The algorithm first determines the local density of each data point using the concept of natural nearest neighbors. It then identifies cluster centers from the observation that density peaks have the highest local density and are separated from one another by sparse regions. Finally, it introduces a new notion of similarity between clusters to handle complex manifold structure. In experiments on synthetic and real-world data sets, the algorithm outperforms DPC (clustering by fast search and find of density peaks), DBSCAN (density-based spatial clustering of applications with noise), and K-means, and its advantage is especially pronounced on non-spherical and complex manifold data.
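
The first step of the abstract, a parameter-free neighborhood feeding a local density, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The search loop follows the general idea of the natural neighbor method in reference [16]: grow the neighborhood size until the number of points with no reverse neighbor stops changing. The abstract does not state the density formula, so `knn_density` is a stand-in assumption (neighborhood size divided by the summed distance to the nearest neighbors), and the names `natural_neighbor_search`, `knn_density`, `lam`, and `nb` are hypothetical.

```python
import numpy as np


def natural_neighbor_search(X):
    """Grow the neighborhood size r until the orphan count stabilizes.

    Returns (lam, nb, dist, order): the final neighborhood size, the number
    of reverse neighbors per point, the distance matrix, and neighbor ranks.
    """
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    keyed = dist.copy()
    np.fill_diagonal(keyed, -1.0)  # force each point to rank first in its own row
    order = np.argsort(keyed, axis=1, kind="stable")
    nb = np.zeros(n, dtype=int)
    prev_orphans = -1
    lam = n - 1
    for r in range(1, n):
        # Every point "votes" for its r-th nearest neighbor.
        np.add.at(nb, order[:, r], 1)
        orphans = int(np.sum(nb == 0))  # points nobody has voted for yet
        if orphans == 0 or orphans == prev_orphans:
            lam = r
            break
        prev_orphans = orphans
    return lam, nb, dist, order


def knn_density(dist, order, k):
    """Assumed density: k over the summed distance to the k nearest neighbors."""
    nearest = np.take_along_axis(dist, order[:, 1:k + 1], axis=1)
    return k / nearest.sum(axis=1)


# Toy data: a unit square, its center, and one far-away outlier.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]], dtype=float)
lam, nb, dist, order = natural_neighbor_search(X)
rho = knn_density(dist, order, lam)
print("neighborhood size:", lam)
print("densest point:", int(np.argmax(rho)), "sparsest point:", int(np.argmin(rho)))
```

On this toy input the center of the square receives the highest density and the outlier the lowest, with no neighborhood size supplied by the user. The later steps named in the abstract, choosing centers separated by sparse regions and merging clusters by inter-cluster similarity, are not sketched here because the abstract does not define them.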

References

[1] Tran T N, Drab K, Daszykowski M. Revised DBSCAN algorithm to cluster data with dense adjacent clusters[J]. Chemometrics & Intelligent Laboratory Systems, 2013, 120(2): 92-96.
[2] Rodriguez A, Laio A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492.
[3] Zhang W, Li J. Extended fast search clustering algorithm: widely density clusters, no density peaks[J]. arXiv:1505.05610, 2015.
[4] Liu Y H, Ma Z M, Yu F. Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy[J]. Knowledge-Based Systems, 2017, 133: 208-220.
[5] Bie R, Mehmood R, Ruan S, et al. Adaptive fuzzy clustering by fast search and find of density peaks[J]. Personal & Ubiquitous Computing, 2016, 20(5): 785-793.
[6] Du M J, Ding S F, Xue Y. A novel density peaks clustering algorithm for mixed data[J]. Pattern Recognition Letters, 2017, 97: 46-53.
[7] Wang G T, Song Q B. Automatic clustering via outward statistical testing on density metrics[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(8): 1971-1985.
[8] Ding S F, Du M J, Sun T F, et al. An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood[J]. Knowledge-Based Systems, 2017, 133: 294-313.
[9] Lv Y H, Ma T H, Tang M L, et al. An efficient and scalable density-based clustering algorithm for datasets with complex structures[J]. Neurocomputing, 2015, 171: 9-22.
[10] Cheng D D, Zhu Q S, Huang J L, et al. Natural neighbor-based clustering algorithm with local representatives[J]. Knowledge-Based Systems, 2017, 123: 238-253.
[11] Yang L J, Zhu Q S, Huang J L, et al. Adaptive edited natural neighbor algorithm[J]. Neurocomputing, 2017, 230: 427-433.
[12] Huang J L, Zhu Q S, Yang L J, et al. QCC: a novel clustering algorithm based on quasi-cluster centers[J]. Machine Learning, 2017, 106(3): 337-357.
[13] Xie J Y, Gao H C, Xie W X, et al. Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J]. Information Sciences, 2016, 354: 19-40.
[14] Du M J, Ding S F, Jia H J. Study on density peaks clustering based on k-nearest neighbors and principal component analysis[J]. Knowledge-Based Systems, 2016, 99: 135-145.
[15] Geng Y A, Li Q Y, Zheng R, et al. RECOME: a new density-based clustering algorithm using relative KNN kernel density[J]. Information Sciences, 2018, 436/437: 13-30.
[16] Zhu Q S, Feng J, Huang J L. Natural neighbor: a self-adaptive neighborhood method without parameter K[J]. Pattern Recognition Letters, 2016, 80: 30-36.
