数据点的密度引力聚类新算法

英文篇名：Density Attraction Clustering Algorithm Between Data Points
作者：温晓芳 ; 杨志翀 ; 陈梅
英文作者：WEN Xiaofang;YANG Zhichong;CHEN Mei;School of Electronic and Information Engineering,Lanzhou Jiaotong University;
关键词：聚类分析 ; 任意簇 ; 密度引力 ; 局部密度
英文关键词：clustering;;various clusters;;density attraction;;local density
中文刊名：KXTS
英文刊名：Journal of Frontiers of Computer Science and Technology
机构：兰州交通大学电子与信息工程学院;
出版日期：2018-01-19 09:05
出版单位：计算机科学与探索
年：2018
期：v.12;No.123
基金：国家自然科学基金No.61762057;; 甘肃省高等学校科研项目基金No.2017A-027~~
语种：中文;
页：KXTS201812013
页数：11
CN：12
ISSN：11-5602/TP
分类号：130-140

摘要

现有的很多聚类算法在各种数据集中检测任意簇时通常不能获得好的性能。通过把每个数据点看作自然界中的质点,定义了数据点间密度引力的概念,在此基础上提出了一种新的具有鲁棒性的密度引力聚类算法。首先根据每个数据点的周围邻居分布稀疏程度获得其局部密度,然后迭代地将每个数据点分配给密度比它大且距其最近的互近邻点形成初始簇,最后将具有共同数据点的初始簇进行合并得到最终簇。实验将提出的新算法在六个不同维度、不同类型的数据集上分别与三种经典算法、三种新算法进行了测试,结果表明该算法的聚类性能优于对比算法,且可以在不同维度的数据集中发现任意簇。
Existing clustering algorithms often fail in dealing with datasets with various clusters.By regarding each data point as a particle,a new robust density attracted clustering algorithm is proposed to better detect the cluster.First,the local density of a point is obtained according to the dense degree of its neighbors.Then,each point is assigned to its mutual nearest neighbor with greater density to form initial clusters.Next,the initial clusters including the same points are merged together to form the final clusters.By comparing the new algorithm with the three classical and the three state-of-the-art methods on six various datasets respectively,the results show that the new proposed algorithm shows the best performance,and can be used to find various clusters in datasets with different dimensions.

引文

[1]Roiger R J.Data mining:a tutorial-based primer[M].Boca Raton:CRC Press,Inc,2017.
    [2]Guang Junye,Liu Mingxia,Zhang Daoqiang.Application of effective distance in clustering algorithms[J].Journal of Frontiers of Computer Science and Technology,2017,11(3):406-413.
    [3]Jain A K,Murty M N,Flynn P J.Data clustering:a review[J].ACM Computing Surveys,1999,31(3):264-323.
    [4]Chen Mei.Research on clustering algorithm for complex data[D].Lanzhou:Lanzhou University,2016.
    [5]Ester M,Kriegel H P,Sander J,et al.A density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining,Portland,Aug2-4,1996.Menlo Park:AAAI,1996:226-231.
    [6]Ankerst M,Breunig M M,Kriegel H P,et al.OPTICS:ordering points to identify the clustering structure[C]//Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data,Philadelphia,Jun 1-3,1999.New York:ACM,1999:49-60.
    [7]Karypis G,Han E H,Kumar V.Chameleon:hierarchical clustering using dynamic modeling[J].IEEE Computer,1999,32(8):68-75.
    [8]Guha S,Rastogi R,Shim K.CURE:an efficient clustering algorithm for large databases[C]//Proceedings of the ACMSIGMOD International Conference on Management of Data,Seattle,Jun 1-4,1998.New York:ACM,1998:73-84.
    [9]Guha S,Rastogi R,Shim K.ROCK:a robust clustering algorithm for categorical attributes[J].Information Systems,2000,25(5):345-366.
    [10]Frey B J,Dueck D.Clustering by passing messages between data points[J].Science,2007,315(5814):972-976.
    [11]Shao Junming,Han Zhichao,Yang Qinli,et al.Commnity detection based on distance dynamics[C]//Proceedings of the21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,Sydney,Aug 10-13,2015.New York:ACM,2015:1075-1084.
    [12]Chen Mei,Li Longjie,Wang Bo,et al.Effectively clustering by finding density backbone based-on k NN[J].Pattern Recognition,2016,60:486-498.
    [13]Guidotti R,Monreale A,Nanni M,et al.Clustering individual transactional data for masses of users[C]//Proceedings of the23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,Halifax,Aug 13-17,2017.New York:ACM,2017:195-204.
    [14]Kobren A,Monath N,Krishnamurthy A,et al.A hierarchical algorithm for extreme clustering[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,Halifax,Aug 13-17,2017.New York:ACM,2017:255-264.
    [15]Beis J S,Lowe D G.Shape indexing using approximate nearest-neighbour search in high-dimensional spaces[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,San Juan,Jun 17-19,1997.Washington:IEEE Computer Society,1997:1000-1006.
    [16]Sugiyama M,Yamamoto A.A fast and flexible clustering algorithm using binary discretization[C]//Proceedings of the IEEE International Conference on Data Mining,Vancouver,Dec 11-14,2011.Washington:IEEE Computer Society,2011:1212-1217.
    [17]Huang Hao,Gao Yunjun,Chiew K,et al.Towards effective and efficient mining of arbitrary shaped clusters[C]//Proceedings of the IEEE 30th International Conference on Data Engineering,Chicago,Mar 31-Apr 4,2014.Washington:IEEE Computer Society,2014:28-39.
    [18]Rodriguez A,Laio A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
    [2]光俊叶,刘明霞,张道强.有效距离在聚类算法中的应用[J].计算机科学与探索,2017,11(3):406-413.
    [4]陈梅.面向复杂数据的聚类算法研究[D].兰州:兰州大学,2016.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700