Improvement of the global K-means algorithm based on density

Original title (Chinese): 基于密度的全局K-means算法的改进
Authors: XU Juan (徐娟); FAN Jing (范菁); CHEN Chu-tian (陈楚天); QU Jin-shuai (曲金帅)
Affiliations: University Key Laboratory of Information and Communication Security Disaster Backup and Recovery in Yunnan Province, Yunnan Minzu University; School of Electrical and Information Technology, Yunnan Minzu University
Keywords: GK-means algorithm; FGK-means algorithm; DGK-means algorithm; high-density number
Journal code: YNMZ
Journal: Journal of Yunnan Minzu University (Natural Sciences Edition)
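Publication date: 2019-03-19 14:49
Publisher: Journal of Yunnan Minzu University (Natural Sciences Edition)
Year: 2019
Volume/issue: v.28; No.114
Funding: National Natural Science Foundation of China (61540063); Yunnan Applied Basic Research Program (2018FD055); Scientific Research Fund of the Yunnan Provincial Department of Education (2017ZDX045); Yunnan Minzu University Research Project (2017QN02); Open Fund for University Science and Technology Innovation Teams of Yunnan Province
Language: Chinese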
File ID: YNMZ201902014
Page count: 5
Issue: 02
CN: 53-1192/N
Pages: 60-64

Abstract

When selecting the center of the next cluster, the global K-means clustering algorithm and the fast global K-means clustering algorithm must evaluate the within-cluster squared-error function with every point in the dataset taken, one by one, as a candidate center. However, a dataset contains many noise points that cannot plausibly serve as candidate centers. To eliminate these noise points, a DGK-means algorithm based on the high-density number is proposed and tested on four datasets from the UCI repository. The experiments verify that, with clustering quality remaining stable, the improved DGK-means algorithm takes less clustering time and is more efficient than the global K-means algorithm and the fast global K-means algorithm.
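The idea in the abstract can be illustrated with a small sketch: run global K-means incrementally, but try only dense points as candidates for each new center. This is not the paper's DGK-means. The abstract does not define the high-density number, so the density rule used here (keep points whose neighbor count within `radius` is at least the dataset average) is an assumption, and the names `lloyd`, `dense_candidates`, `density_global_kmeans`, and `radius` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centers, iters=50):
    """Plain K-means (Lloyd iterations) from given centers; returns (centers, sse)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        new = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new == centers:
            break
        centers = new
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


def dense_candidates(points, radius):
    """ASSUMED density rule: keep points with at least the average neighbor count."""
    counts = [sum(1 for q in points if sq_dist(p, q) <= radius ** 2) for p in points]
    mean = sum(counts) / len(counts)
    return [p for p, c in zip(points, counts) if c >= mean]


def density_global_kmeans(points, k, radius):
    """Global K-means in which only dense points are tried as the next center."""
    points = [tuple(p) for p in points]
    # For k = 1 the best center is the mean of all points.
    centers = [tuple(sum(col) / len(points) for col in zip(*points))]
    centers, sse = lloyd(points, centers)
    candidates = dense_candidates(points, radius)
    for _ in range(1, k):
        best = None
        for cand in candidates:
            trial = lloyd(points, centers + [cand])
            if best is None or trial[1] < best[1]:
                best = trial
        centers, sse = best
    return centers, sse


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (5, 30)]  # the last point is isolated noise
    print(density_global_kmeans(data, k=3, radius=2.0))
```

With the density filter, the isolated point is never tried as a new center, so each step runs K-means once per dense candidate instead of once per data point. The isolated point is still assigned to a cluster during the K-means iterations.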
