Abstract
To address the parameter sensitivity of existing density-based clustering algorithms and their poor performance on non-spherical data and complex manifold data, a new clustering algorithm based on density peaks is proposed. The algorithm first determines the local density of each data point using the concept of natural nearest neighbors. It then identifies cluster centers according to the principle that density peaks have the highest local density and are separated from one another by sparse regions. Finally, it introduces a new notion of similarity between clusters to handle complex manifold structures. In experiments on synthetic and real data sets, the algorithm outperforms DPC (clustering by fast search and find of density peaks), DBSCAN (density-based spatial clustering of applications with noise), and K-means, with the largest advantage on non-spherical and complex manifold data.
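The first step described above, estimating local density from natural nearest neighbors, can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption: the neighbor search follows the commonly described parameter-free scheme (grow the neighborhood size r until the number of points with no reverse neighbor stops changing), and the density score in `local_density` is a hypothetical stand-in, not the authors' definition. The function names are illustrative only.

```python
import math


def natural_neighbors(points):
    """Return (r, nn): the neighborhood size reached and, for each point,
    the set of mutual (natural) neighbors found at that size.

    Assumed stopping rule: stop once every point has a reverse neighbor,
    or the count of points without one is unchanged from the last round.
    """
    n = len(points)
    # For each point, all other indices sorted by increasing distance.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j, i=i: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    knn = [set() for _ in range(n)]   # r-nearest neighbors found so far
    reverse_count = [0] * n           # how often each point was chosen
    prev_orphans = -1
    r = 0
    while r < n - 1:
        for i in range(n):
            j = order[i][r]           # the (r+1)-th nearest neighbor of i
            knn[i].add(j)
            reverse_count[j] += 1
        r += 1
        orphans = sum(1 for c in reverse_count if c == 0)
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    # Natural neighbors are mutual: j is in knn[i] and i is in knn[j].
    nn = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, nn


def local_density(points, nn):
    """Hypothetical density score: number of natural neighbors divided by
    the mean distance to them (not the paper's formula)."""
    rho = []
    for i, neigh in enumerate(nn):
        if not neigh:
            rho.append(0.0)
            continue
        mean_d = sum(math.dist(points[i], points[j]) for j in neigh) / len(neigh)
        rho.append(len(neigh) / mean_d)
    return rho


if __name__ == "__main__":
    # Two small, well-separated groups, each with a central point.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    r, nn = natural_neighbors(pts)
    print(r, local_density(pts, nn))
```

Under these assumptions no neighborhood size has to be supplied by the user, which is the kind of parameter sensitivity the abstract aims to avoid. The later steps (selecting density peaks as centers and merging clusters by the proposed inter-cluster similarity) are not sketched, since the abstract does not specify them.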