改进的最小生成树自适应空间点聚类算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:Improved Adaptive Spatial Points Clustering Algorithm Based on Minimum Spanning Tree
  • 作者:颜金彪 ; 郑文武 ; 段晓旗 ; 邓运员 ; 郭元军 ; 胡最
  • 英文作者:YAN Jinbiao;ZHENG Wenwu;DUAN Xiaoqi;DENG Yunyuan;GUO Yuanjun;HU Zui;National-Local Joint Engineering Laboratory on Digital Preservation and Innovative Technologies for the Culture of Traditional Villages and Towns;Cooperative Innovation Center for Digitalization of Cultural Heritage in Traditional Villages and Towns;
  • 关键词:最小生成树 ; 全局裁剪 ; 局部裁剪 ; 自适应 ; 聚类
  • 英文关键词:Minimum Spanning Tree;;global clipping;;local clipping;;adaptive;;clustering
  • 中文刊名:DQXX
  • 英文刊名:Journal of Geo-Information Science
  • 机构:传统村镇文化数字化保护与创意利用技术国家地方联合工程实验室;湖南省古村古镇文化遗产数字化传承协同创新中心;
  • 出版日期:2018-07-11 17:19
  • 出版单位:地球信息科学学报
  • 年:2018
  • 期:v.20;No.131
  • 基金:国家自然科学基金项目(41471118;41771150;41771188);; 衡阳师范学院青年项目(16A01;17A02)~~
  • 语种:中文;
  • 页:DQXX201807004
  • 页数:8
  • CN:07
  • ISSN:11-5809/P
  • 分类号:21-28
摘要
针对传统的最小生成树聚类算法存在使用全局不变阈值确定噪声边,聚类需要用户根据经验确定初始化聚类参数,如"边权值倍数容差","边长变化因子"等,聚类不能发现局部噪声的问题,本文提出了一种改进的最小生成树自适应空间点聚类算法。该算法在无需用户输入参数的前提下,克服主观因素的影响,根据最小生成树边长的数理统计特征定义裁剪因子。算法首先从宏观层面对最小生成树进行首轮删枝操作,消除全局环境下的噪声边,进而根据各子树的边长统计情况,自适应设定局部裁剪因子,进行第二轮删枝操作,消除局部环境下的噪声边。最后,采用1个模拟数据和1个实际应用验证算法的有效性,结果表明本文提出的改进算法在无需人为提供经验参数的环境下能够发现任意形状、不同密度的簇,能够准确的识别出空间点中的噪声数据,从而能够实现空间点数据背后隐藏信息的自动挖掘。
        In this paper, we proposed an improved adaptive spatial point clustering algorithm based on minimum spanning tree(MSTAA in abbreviation) to solve the problems existed in the traditional clustering algorithms.The first problem of these classical clustering algorithms is that the noise edges are determined using the global invariant. Another one is that the initial clustering parameters such as edge weight tolerance, edge variation factor,the number of clusters and initial clustering centers are determined by the users empirically. Furthermore, these algorithms can't find the noise edges at the local level. Based on these problems mentioned above, the algorithm put forward in this article aims to overcome the influence of subjective factors by defining two clipping factors.These trimming factors do not need to be determined by the users and can be automatically obtained according to the statistical features of the side length in the minimum spanning tree. The detailed realization process is as follows. In the first place, the pruning operation on the minimum spanning tree from the global level is carried out, which can eliminate the noises in the global environment. After the first round of tailoring, the initial minimum spanning tree becomes sub-tree collections. In the second place, in order to eliminate the noises at the local level, the algorithm performs the second round of pruning operation by setting the adaptive local cutting factor in the light of the side length statistics of each sub-tree. After the above two rounds of cutting, the MSTAA algorithm will get the final clustering result. In order to validate the effectiveness of the algorithm, both a simulated data and a practical application are adopted. By comparing with 4 classical clustering algorithms(k-means, DBSCAN, SEMST,HEMST), we find that the improved algorithm presented in this paper could find clusters of arbitrary shape and density in the environment where no one provides experience parameters. At the same time, not only does the MSTAA algorithm eliminate the obvious global noise points, but also it can distinguish noise points at the local environment so as to ensure a high similarity degree of cluster point set. All of the features of the MSTAA algorithm mentioned above make it possible to automatically mine hidden information behind spatial point data.
引文
[1]Eckley D C,Curtin K M.Evaluating the spatiotemporal clustering of traffic incidents[J].Computers,Environment and Urban Systems,2013,37:70-81.
    [2]Dale M R T.Spatial pattern analysis in plant ecology[M].Cambridge:Cambridge university press,2000.
    [3]Miller H J,Han J Geographic.Data mining and knowledge discovery,second edition[M].New York:CRC Press,2009.
    [4]Grubesic T H,Mack E A.Spatio-temporal interaction of urban crime[J].Journal of Quantitative Criminology,2008,24(3):285-306.
    [5]Wang M,Wang A,Li A.Mining spatial-temporal clusters from geo-databases[J].Advanced Data Mining and Applications,2006:263-270.
    [6]邓敏,刘启亮,李光强,等.一种基于似最小生成树的空间聚类算法[J].武汉大学学报·信息科学版,2010,35(11):1360-1364.[Dend M,Liu Q L,Li G Q.A spatial clustering algorithm based on minimum spanning tree-like[J].Geomatics and Information Science of Wuhan University,2010,35(11):1360-1364.]
    [7]Macqueen J.Some methods for classification and analysis of multivariate observations[C].Proceeding of Berkeley Symposium on Mathematical Statistics and Probability.1967:281-297.
    [8]Frey B J,Dueck D.Clustering by passing messages between data points[J].Science,2007,315(5814):972-976.
    [9]孙磊磊.AP聚类算法研究及其在电子病历挖掘中的应用[D].大连:大连理工大学,2017.[Sun L L.Study on affinity propagation clustering algorithm and its application in mining electronic medical records[D].Dalian:Dalian University of Technology,2017.]
    [10]Guha S,Rastogi R,Shim K.Cure:An efficient clustering algorithm for large databases[J].International System,2001,26(1)35-38.
    [11]Ester M.A density-based algorithm for discovering clusters in large spatial databases with noise[C].Proceeding of International Conference on Knowledg Discovery and Data Mining,Portland,1996:226-231.
    [12]Wang W,Yang J,Muntz R R.STING:A statistical information grid approach to spatial data mining[C].International Conference on Very Large Data Bases,Morgan Kaufmann Publishers Inc.1997:186-195.
    [13]汪闽,周成虎,裴韬,等.一种带控制节点的最小生成树聚类方法[J].中国图象图形学报,2002,7(8):765-770.[Wang Min,Zhou C H,Pei T,et al.A MST based clustering method with controlling vertexes[J].Journal of Image and Graphics,2002,7(8):765-770.]
    [14]Grygorash O,Zhou Y,Jorgensen Z.Minimum Spanning Tree based clustering algorithms[C].IEEE International Conference on TOOLS with Artificial Intelligence,IEEE,2006:73-81.
    [15]王小乐,刘青宝,陆昌辉,等.一种最小生成树聚类算法[J].小型微型计算机系统,2009,30(5):877-882.[Wang X L,Liu Q B,Lu C H.Minimum Spanning Tree clustering algorithm[J].Journal of Chinese Computer Systems,2009,30(5):877-882.]
    [16]徐晨凯,高茂庭.改进的最小生成树自适应分层聚类算法[J].计算机工程与应用,2014,50(22):149-153.[Xu C K,Gao M T.Improved adaptive hierarchical clustering algorithm based on minimum spanning tree[J].Computer Engineering and Applications,2014,50(22):149-153.]
    [17]邱雪松,蔺艳斐,邵苏杰,等.一种面向智能电网数据采集的传感器聚合布局构造算法[J].电子与信息学报,2015,37(10):2411-2417.[Qiu X S,Lin Y F,Shao S J,et al.Sensor aggregation distribution construction algorithm for smart grid data collection system[J].Journal of Electronics&Information Technology,2015,37(10):2411:2417.]
    [18]贾瑞玉,李振.基于最小生成树的层次K-means聚类算法[J].微电子学与计算机,2016,33(3):86-88.[Jia R Y,Li Z.The level of K-means clustering algorithm based on the Minimum Spanning Tree[J].Microelectronics and Computer,2016,33(3):86-88.]
    [19]朱利,邱媛媛,于帅,等.一种基于快速k-近邻的最小生成树离群检测方法[J].计算机学报,2017,40(12):2856-2870.[Zhu L,Qiu Y Y,Yu S,et al.A fast k NN-Based MST outlier detection method[J].Chinese Journal of Computers,2017,40(12):2856-2870.]
    [20]Humphrey G.The psychology of the gestalt[J].Journal of Educational Psychology,1924,15(7):401-412.
    [21]Deng M,Liu Q,Cheng T,et al.An adaptive spatial clusteringalgorithmbasedondelaunaytriangulation[J].Computers Environment&Urban Systems,2011,35(4):320-332.
    [22]余杰,吕品,郑昌文.Delaunay三角网构建方法比较研究[J].中国图象图形学报,2010,15(8):1158-1167.[Yu J,Lu P,Zheng C W.A comparative research on methods of Delaunay triangulation[J].Journal of Image and Graphics,2010,15(8):1158-1167.]
    [23]Graham R L,Hell P.On the history of the minimum Sspanning tree problem[J].Annals of the History of Computing,1985,7(1):43-57.
    [24]Asano T,Bhattacharya B,Keil M,et al.Clustering algorithms based on minimum and maximum spanning trees[C].Symposium on Computational Geometry,Data Base systems and Logic Programming,1988:252-257.
    [25]Xu X,Ester M,Kriegel H P,et al.A Distribution-Based clustering algorithm for mining in large spatial databases[C].International Conference on Data Engineering,1998.Proceedings,IEEE,1998:324-331.
    [26]Zahn C T.Graph-theoretical methods for detecting and describing gestalt clusters[J].IEEE Transactions on Computers,1971,20(1):68-86.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700