Density Peaks Clustering Optimized by K-Nearest-Neighbor Similarity

English title: Density Peaks Clustering Optimized by K Nearest Neighbor's Similarity
Authors: ZHU Qingfeng; GE Hongwei
Keywords: clustering; density peaks; similarity; K nearest neighbor
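Journal: Computer Engineering and Applications (CNKI journal code: JSGG)
Affiliations: Ministry of Education Key Laboratory of Advanced Process Control for Light Industry (Jiangnan University); School of Internet of Things Engineering, Jiangnan University
Publication date: 2018-04-09 17:36
Year: 2019
Volume and issue: Vol. 55, No. 2 (cumulative No. 921)
Pages: 154-159, 258 (7 pages)
Language: Chinese
CNKI article ID: JSGG201902024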

Abstract

In density peaks clustering, each remaining point is assigned using only the distance between the sample point and its pointing point (the nearest point with higher density). This makes the method unsuitable for manifold clustering problems such as the Circleblock and Lineblobs datasets. To address this, a density peaks clustering algorithm optimized by K-nearest-neighbor similarity is proposed. After the density and pointing point of every point are computed, a similarity function is used to find the K nearest neighbors of each point. This K-nearest-neighbor information is then used to judge whether each sample's pointing point is correct, and a new, correct pointing point is sought for every point whose pointing point is wrong, which effectively reduces misassignments. Experiments on synthetic datasets and UCI datasets show that the new algorithm achieves higher accuracy.
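The abstract does not state the paper's density formula, its similarity function, or its exact rule for correcting a pointing point, so the following is only a minimal Python sketch of the general pipeline under explicit assumptions. It uses a Gaussian-kernel local density, standard density-peaks pointing points (nearest denser point) with centers chosen by the largest product of density and distance, and a hypothetical K-nearest-neighbor check: if a point's pointing point is not among its K most similar points, the point is re-pointed to its most similar denser neighbor. The function name `dpc_knn_sketch`, its parameters `n_centers`, `k` and `dc`, and the Gaussian similarity are all illustrative assumptions, not the authors' method. With this assumed kernel, the K most similar points are simply the K nearest ones.

```python
import math
from typing import List, Sequence, Tuple


def dpc_knn_sketch(points: Sequence[Tuple[float, float]], n_centers: int,
                   k: int, dc: float) -> List[int]:
    """Hypothetical sketch of density peaks clustering (DPC) with a
    K-nearest-neighbor check on each point's pointing point.
    Density, similarity and re-pointing rules are assumptions, not the
    paper's exact formulas. Requires n_centers >= 1."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    def similarity(i: int, j: int) -> float:
        # Assumed similarity: Gaussian kernel of the Euclidean distance.
        return math.exp(-(dist[i][j] / dc) ** 2)

    # Local density rho: sum of kernel similarities (one common DPC variant).
    rho = [sum(similarity(i, j) for j in range(n) if j != i) for i in range(n)]

    # Rank points by decreasing density (index breaks ties).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: r for r, i in enumerate(order)}

    # Pointing point: nearest higher-ranked point; delta is the distance to it.
    parent = [-1] * n
    delta = [0.0] * n
    for r in range(1, n):
        i = order[r]
        j = min(order[:r], key=lambda m: dist[i][m])
        parent[i], delta[i] = j, dist[i][j]
    delta[order[0]] = max(max(row) for row in dist)

    # Centers: largest rho * delta; the densest point always comes first.
    by_gamma = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))
    center_label = {c: lab for lab, c in enumerate(by_gamma[:n_centers])}

    # Assumed K-nearest-neighbor check: if the pointing point is not among the
    # K most similar points, re-point to the most similar denser neighbor;
    # keep the original pointing point when no such neighbor exists.
    for i in range(n):
        if i in center_label:
            continue
        knn = sorted((j for j in range(n) if j != i),
                     key=lambda j: -similarity(i, j))[:k]
        if parent[i] not in knn:
            denser = [j for j in knn if rank[j] < rank[i]]
            if denser:
                parent[i] = denser[0]

    # Label in density order; every parent is denser, so it is labeled first.
    labels = [-1] * n
    for i in order:
        labels[i] = center_label[i] if i in center_label else labels[parent[i]]
    return labels
```

Because every re-pointed point still points to a denser neighbor, the pointers keep forming trees rooted at the densest point and the centers, so a single pass in density order labels everything. For example, on two well-separated groups of five points, `dpc_knn_sketch(points, n_centers=2, k=3, dc=1.0)` gives one label per group.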
