Automatic Selection Method of Cluster Centers Based on a Positive-Sequence Iterative Selection Strategy
  • English title: Automatic Selection Method of Cluster Center Based on
  • Authors: WANG Wanliang; LÜ Chuang; ZHAO Yanwei; GAO Nan; YANG Xiaohan; ZHANG Zhaojuan
  • Keywords: Cluster Center; Decision Function; Positive Sequence Iterative; Density Peak Clustering; Data Mining
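  • Journal: Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
  • CNKI journal code: MSSB
  • Institution: College of Computer Science and Technology, Zhejiang University of Technology
  • Publication date: 2019-02-15
  • Year: 2019
  • Volume: 32 (cumulative No. 188)
  • Issue: 02
  • Pages: 57-66 (10 pages)
  • CN: 34-1089/TP
  • CNKI article ID: MSSB201902007
  • Funding: Supported by the National Natural Science Foundation of China (No. 61572438, 61702456, 61873240)
  • Language: Chinese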
Abstract
The decision function of the density peak clustering algorithm (clustering by fast search and find of density peaks, CFSFDP) cannot determine cluster centers automatically and effectively. To address this, a density peak clustering algorithm that determines cluster centers automatically, AUTO-CFSFDP, is proposed. First, normalization is applied so that the two variables in the decision function, which are otherwise unevenly distributed, become evenly distributed. Second, a selection strategy based on positive-sequence iteration is used to determine the cluster centers: the elbow point is searched for according to the variation trend of the number of cluster core points, and the points before the elbow point are taken as cluster centers to complete the clustering. Finally, the performance of AUTO-CFSFDP is evaluated on UCI datasets. Without increasing time complexity, AUTO-CFSFDP can cluster datasets of arbitrary distribution shapes and shows good adaptability and clustering results.
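The abstract outlines the pipeline without giving formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It follows the standard CFSFDP steps of Rodriguez and Laio (Gaussian-kernel local density ρ, distance δ to the nearest higher-density point, decision value γ = ρ·δ, and assignment of each remaining point to the cluster of its nearest higher-density neighbor), and it assumes min-max scaling of ρ and δ as the normalization step. The abstract does not define cluster core points, so the paper's core-point-count criterion is not reproduced: the hypothetical `select_centers` instead scans the sorted γ values in descending (positive) order and cuts at the largest drop, a plain largest-gap rule. All function names, the `max_centers` cap and the caller-supplied cutoff distance `dc` are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

Points = Sequence[Sequence[float]]


def decision_graph(points: Points, dc: float) -> Tuple[List[int], List[float], List[float], List[Optional[int]]]:
    """CFSFDP quantities: density order, rho, delta and nearest higher-density point."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Gaussian-kernel local density (one of the kernels used with CFSFDP).
    rho = [sum(math.exp(-(d[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Descending density; ties broken by index so "higher density" is well defined.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest: List[Optional[int]] = [None] * n
    # The densest point gets its largest distance to any other point.
    delta[order[0]] = max(d[order[0]])
    for pos in range(1, n):
        i = order[pos]
        # Nearest point that precedes i in the density order.
        j = min(order[:pos], key=lambda k: d[i][k])
        delta[i], nearest[i] = d[i][j], j
    return order, rho, delta, nearest


def min_max(values: Sequence[float]) -> List[float]:
    """Min-max normalization to [0, 1] (ASSUMED form of the paper's normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_centers(gamma: Sequence[float], order: Sequence[int],
                   max_centers: Optional[int] = None) -> List[int]:
    """HYPOTHETICAL stand-in for the positive-sequence iterative selection.

    Scans gamma in descending order and cuts at the largest drop;
    it does NOT use the paper's cluster-core-point counts.
    """
    rank = {i: r for r, i in enumerate(order)}
    ranked = sorted(range(len(gamma)), key=lambda i: (-gamma[i], rank[i]))
    g = [gamma[i] for i in ranked]
    limit = max(1, min(max_centers or len(g) // 2, len(g) - 1))
    best_k, best_drop = 1, -1.0
    for k in range(1, limit + 1):  # k = number of centers kept so far
        drop = g[k - 1] - g[k]
        if drop > best_drop:
            best_k, best_drop = k, drop
    return ranked[:best_k]


def auto_cfsfdp_sketch(points: Points, dc: float) -> Tuple[List[int], List[int]]:
    """Return (center indices, cluster label per point). Assumes len(points) >= 2."""
    order, rho, delta, nearest = decision_graph(points, dc)
    gamma = [r * s for r, s in zip(min_max(rho), min_max(delta))]
    centers = select_centers(gamma, order)
    labels = [-1] * len(points)
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # The densest point always ranks first in gamma, so every
    # non-center point has an already-labelled higher-density parent.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest[i]]
    return centers, labels


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    blob_b = [(x + 10, y + 10) for x, y in blob_a]
    centers, labels = auto_cfsfdp_sketch(blob_a + blob_b, dc=1.0)
    print(sorted(centers), labels)
```

Run as a script, the example prints `[4, 9] [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]`: the middle point of each unit square (indices 4 and 9) is chosen as a cluster center and every corner joins the center of its own square. Like CFSFDP itself, the sketch computes all pairwise distances, so it costs O(n²) time and memory; the selection scan only adds a sort of the n γ values.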