Abstract
The data competition algorithm has three weaknesses: it measures similarity with Euclidean distance, it requires the number of clusters to be specified manually, and it cannot determine cluster centers automatically and accurately. To address these problems, this paper proposes a data competition clustering algorithm that determines cluster centers automatically. First, the concept of a data field is introduced so that the computed potential values better reflect the true distribution of the data set. Next, the potential of each data point is combined with its local minimum distance to form a decision graph, from which the cluster centers are determined automatically. Finally, clustering is completed according to the nearest-neighbor principle. Experimental results on synthetic and real data sets show that the proposed algorithm achieves better clustering performance than the original data competition algorithm.
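The three steps named in the abstract (potential from a data field, a decision graph built from potential and local minimum distance, nearest-neighbor assignment) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption rather than the paper's method: the Gaussian-like potential, the rule that flags a point as a center when the product of its potential and local minimum distance exceeds the mean by `n_std` standard deviations, and the names `potential`, `local_min_distance`, `cluster`, `sigma`, and `n_std` are all hypothetical.

```python
import numpy as np

def potential(X, sigma):
    """Assumed Gaussian-like data-field potential of every point."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))        # pairwise Euclidean distances
    phi = np.exp(-(d / sigma) ** 2).sum(axis=1)  # self-term adds a constant 1
    return phi, d

def local_min_distance(phi, d):
    """Distance from each point to its nearest point of higher potential."""
    n = len(phi)
    order = np.argsort(-phi, kind="stable")      # indices by descending potential
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = d[i].max()                # top point: use its largest distance
            continue
        higher = order[:rank]                    # points ranked above i
        j = higher[np.argmin(d[i, higher])]
        delta[i] = d[i, j]
        parent[i] = j
    return delta, parent, order

def cluster(X, sigma=1.0, n_std=2.0):
    """Pick centers from the decision values, then assign the other points."""
    phi, d = potential(X, sigma)
    delta, parent, order = local_min_distance(phi, d)
    gamma = phi * delta                          # decision value per point
    # Hypothetical automatic rule (not from the source): outliers in gamma.
    is_center = gamma > gamma.mean() + n_std * gamma.std()
    is_center[order[0]] = True                   # highest-potential point is always a center
    centers = np.flatnonzero(is_center)
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(len(centers))
    # Nearest-neighbor assignment: inherit the label of the nearest
    # higher-potential point, visiting points in descending potential.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Calling `cluster(X)` on an `(n, d)` array returns one label per point and the indices of the selected centers. The width `sigma` controls how far each point's influence reaches and would need tuning for each data set.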