Gravitation Based Clustering Algorithm Using Representative Points with Mass

Authors: Zhang Xiaomin (张晓民); Zhang Feng (张枫); Liu Liming (刘黎明)
Keywords: clustering; mass; representative point; universal gravitation
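Journal code: NKDZ
Journal: Acta Scientiarum Naturalium Universitatis Nankaiensis
Affiliation: Software School, Nanyang Institute of Technology
Publication date: 2016-08-20
Publisher: Acta Scientiarum Naturalium Universitatis Nankaiensis (南开大学学报(自然科学版))
Year: 2016
Volume: 49
Funding: Key Scientific Research Project of Higher Education Institutions of Henan Province (15A520089)
Language: Chinese
File ID: NKDZ201604002
Page count: 8
Issue: 04
CN: 12-1105/N
Pages: 10-17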

Abstract

To cluster large-scale datasets quickly and effectively, this paper proposes GCARM, a gravitation-based clustering algorithm using representative points with mass. The algorithm first scans the dataset and uses a K-ary tree structure to condense nearby objects into representative points with mass. It then computes the universal gravitation between representative points and connects those whose attraction exceeds a preset threshold; each maximal connected set of objects is a cluster. Experimental results show that GCARM identifies clusters of arbitrary shape and arbitrary size and removes noise without sacrificing accuracy, and that it is efficient and scalable.
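
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GCARM implementation. Several parts are assumptions made for illustration: the K-ary tree is replaced by a simple single-pass condensation with a hypothetical `radius` parameter, attraction is taken as m1 * m2 / d^2 with the gravitational constant set to 1, and noise is removed by discarding connected components whose total mass falls below a hypothetical `min_mass`. All function and parameter names are invented here.

```python
import math
from collections import defaultdict


def condense(points, radius):
    """Merge each point into the nearest representative within `radius`,
    or start a new one. A stand-in for the K-ary tree (assumption)."""
    reps = []  # each entry: [centroid as list of floats, mass as int]
    for p in points:
        best, best_d = None, radius
        for rep in reps:
            d = math.dist(rep[0], p)
            if d <= best_d:
                best, best_d = rep, d
        if best is None:
            reps.append([list(p), 1])
        else:
            c, m = best
            # Update the centroid as a running mean and increase the mass.
            best[0] = [(ci * m + pi) / (m + 1) for ci, pi in zip(c, p)]
            best[1] = m + 1
    return reps


def gravitation(rep_a, rep_b):
    """Attraction m1 * m2 / d^2 with G = 1 (assumed form)."""
    d2 = sum((a - b) ** 2 for a, b in zip(rep_a[0], rep_b[0]))
    return math.inf if d2 == 0 else rep_a[1] * rep_b[1] / d2


def cluster(points, radius, force_threshold, min_mass=2):
    """Connect representatives whose attraction exceeds the threshold and
    return the connected components as clusters."""
    reps = condense(points, radius)
    parent = list(range(len(reps)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            if gravitation(reps[i], reps[j]) > force_threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, rep in enumerate(reps):
        groups[find(i)].append(rep)
    # Treat low-mass components as noise (assumption, not from the paper).
    return [g for g in groups.values() if sum(r[1] for r in g) >= min_mass]
```

The pairwise loop over representatives is quadratic in their number, which is acceptable only because condensation is expected to leave far fewer representatives than original objects. How the paper limits this cost is not stated in the abstract.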

