摘要
现有的很多聚类算法在各种数据集中检测任意簇时通常不能获得好的性能。通过把每个数据点看作自然界中的质点,定义了数据点间密度引力的概念,在此基础上提出了一种新的具有鲁棒性的密度引力聚类算法。首先根据每个数据点的周围邻居分布稀疏程度获得其局部密度,然后迭代地将每个数据点分配给密度比它大且距其最近的互近邻点形成初始簇,最后将具有共同数据点的初始簇进行合并得到最终簇。实验将提出的新算法在六个不同维度、不同类型的数据集上分别与三种经典算法、三种新算法进行了测试,结果表明该算法的聚类性能优于对比算法,且可以在不同维度的数据集中发现任意簇。
Existing clustering algorithms often fail in dealing with datasets with various clusters.By regarding each data point as a particle,a new robust density attracted clustering algorithm is proposed to better detect the cluster.First,the local density of a point is obtained according to the dense degree of its neighbors.Then,each point is assigned to its mutual nearest neighbor with greater density to form initial clusters.Next,the initial clusters including the same points are merged together to form the final clusters.By comparing the new algorithm with the three classical and the three state-of-the-art methods on six various datasets respectively,the results show that the new proposed algorithm shows the best performance,and can be used to find various clusters in datasets with different dimensions.
引文
[1]Roiger R J.Data mining:a tutorial-based primer[M].Boca Raton:CRC Press,Inc,2017.
[2]Guang Junye,Liu Mingxia,Zhang Daoqiang.Application of effective distance in clustering algorithms[J].Journal of Frontiers of Computer Science and Technology,2017,11(3):406-413.
[3]Jain A K,Murty M N,Flynn P J.Data clustering:a review[J].ACM Computing Surveys,1999,31(3):264-323.
[4]Chen Mei.Research on clustering algorithm for complex data[D].Lanzhou:Lanzhou University,2016.
[5]Ester M,Kriegel H P,Sander J,et al.A density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining,Portland,Aug2-4,1996.Menlo Park:AAAI,1996:226-231.
[6]Ankerst M,Breunig M M,Kriegel H P,et al.OPTICS:ordering points to identify the clustering structure[C]//Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data,Philadelphia,Jun 1-3,1999.New York:ACM,1999:49-60.
[7]Karypis G,Han E H,Kumar V.Chameleon:hierarchical clustering using dynamic modeling[J].IEEE Computer,1999,32(8):68-75.
[8]Guha S,Rastogi R,Shim K.CURE:an efficient clustering algorithm for large databases[C]//Proceedings of the ACMSIGMOD International Conference on Management of Data,Seattle,Jun 1-4,1998.New York:ACM,1998:73-84.
[9]Guha S,Rastogi R,Shim K.ROCK:a robust clustering algorithm for categorical attributes[J].Information Systems,2000,25(5):345-366.
[10]Frey B J,Dueck D.Clustering by passing messages between data points[J].Science,2007,315(5814):972-976.
[11]Shao Junming,Han Zhichao,Yang Qinli,et al.Commnity detection based on distance dynamics[C]//Proceedings of the21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,Sydney,Aug 10-13,2015.New York:ACM,2015:1075-1084.
[12]Chen Mei,Li Longjie,Wang Bo,et al.Effectively clustering by finding density backbone based-on k NN[J].Pattern Recognition,2016,60:486-498.
[13]Guidotti R,Monreale A,Nanni M,et al.Clustering individual transactional data for masses of users[C]//Proceedings of the23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,Halifax,Aug 13-17,2017.New York:ACM,2017:195-204.
[14]Kobren A,Monath N,Krishnamurthy A,et al.A hierarchical algorithm for extreme clustering[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,Halifax,Aug 13-17,2017.New York:ACM,2017:255-264.
[15]Beis J S,Lowe D G.Shape indexing using approximate nearest-neighbour search in high-dimensional spaces[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,San Juan,Jun 17-19,1997.Washington:IEEE Computer Society,1997:1000-1006.
[16]Sugiyama M,Yamamoto A.A fast and flexible clustering algorithm using binary discretization[C]//Proceedings of the IEEE International Conference on Data Mining,Vancouver,Dec 11-14,2011.Washington:IEEE Computer Society,2011:1212-1217.
[17]Huang Hao,Gao Yunjun,Chiew K,et al.Towards effective and efficient mining of arbitrary shaped clusters[C]//Proceedings of the IEEE 30th International Conference on Data Engineering,Chicago,Mar 31-Apr 4,2014.Washington:IEEE Computer Society,2014:28-39.
[18]Rodriguez A,Laio A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
[2]光俊叶,刘明霞,张道强.有效距离在聚类算法中的应用[J].计算机科学与探索,2017,11(3):406-413.
[4]陈梅.面向复杂数据的聚类算法研究[D].兰州:兰州大学,2016.