蚁群算法及其在聚类中的应用

英文题名：Ant Colony Algorithm and Its Application in Clustering
作者：张新建
论文级别：硕士
学科专业名称：计算机软件与理论
中文关键词：蚁群算法 ; 蚁群优化算法 ; 聚类 ; FCM算法
英文关键词：Ant algorithm ; Ant colony optimization ; clustering ; FCM algorithm
学位年度：2010
导师：贾瑞玉
学科代码：081202
学位授予单位：安徽大学
论文提交日期：2010-04-01

摘要

蚁群算法是通过对自然界中真实蚂蚁的集体行为的观察、模拟而得到一种仿生优化算法,它具有很好的并行性、分布性。根据蚂蚁群体不同的集体行为特征,蚁群算法可分为受蚂蚁觅食行为启发的模型和受孵化分类启发的模型、受劳动分工和协作运输启发的模型。本文重点研究了前两种蚁群算法模型。
     受蚂蚁觅食行为启发的模型又称为蚁群优化算法(ACO),是继模拟退火算法、遗传算法、禁忌搜索等之后又一启发式智能优化算法。目前它已成功应用于求解TSP问题、地图着色、路径车辆调度等优化问题。本文针对蚁群算法收敛时间长、易陷入局部最优的缺点,通过对路径上信息素的更新方式作出动态调整、建立信息素平滑机制,进而使得不同路径上的信息素的更新速度有所不同,从而使改进后算法能够有效地缩短搜索的时间,并能对最终解进行优化,避免过早的陷入局部最优。
     聚类是数据挖掘的重要技术之一,它可按照某种规则将数据对象划分为多个类或簇,使同一类的数据对象有较高的相似度,而不同类的数据对象差异较大。受蚂蚁的觅食过程启发的聚类算法又被称为基于蚂蚁觅食原理的聚类算法。把蚂蚁觅食行为分为搜索食物和搬运食物两个环节,同时把数据对象视为蚂蚁,把聚类中心视为“食物源”,这样数据对象的聚类过程就可以转化为蚂蚁觅食的过程,在信息素的引导下蚂蚁就可以完成数据对象的聚类。但在该算法中没有区分数据对象不同属性的重要性,本文通过采用离差最大化方法,对每个属性根据它的重要性为它赋予一个权值,从而改进了原算法中的距离计算,使得相似的数据对象能快速的聚集到一起,从而避免了大量无效的相似度计算,提高了算法的效率。
     受孵化分类启发的模型又称为蚂蚁堆形成原理聚类算法。很多种类的蚂蚁都能够将卵和小幼虫紧密地排列成束并放置在巢穴孵化区的中心,而最大的幼虫位于孵化束的外围。Deneubourg等人根据这一现象最先提出了一个基本模型(BM)来模拟该现象,对基本模型比较成功的改进有LF算法。模糊聚类算法思想来源于Ruspini于1969年提出的模糊划分思想,是指在涉及事物之间的模糊界限时按一定要求对事物进行分类的数学方法,本文所介绍的FCM是其中的一种。本文通过分析LF算法和FCM算法的优缺点,以及它们的互补性,提出了基于LF算法的改进FCM算法,即将这两个算法进行融合,同时也采用了离差最大化赋权方法,对算法中的距离计算进行改进,进一步提高了算法的性能。
Ant colony algorithm which is inspired by the behavior of ant colony is parallelism, distributed. According to the behavior characteristics of ant colony in different aspects, ant algorithm can be classified as model inspired by ant looking for food behavior and model inspired by brood sorting, model inspired by division of laborj, model inspired by cooperative transport. This dissertation focuses on the two models mentioned above.
     Model inspired by ant looking for food behavior is also called ants colony optimization (ACO), which is a new kind of evolution computation algorithms based on bionics, and is a new heuristic intelligent optimization algorithm after simulated annealing, genetic algorithm and tabu search. After the algorithm being proposed, ACO has applied into TSP, quadratic assignment, graph coloring, vehicle route successfully. Aiming at solving disadvantages of ACO, this dissertation proposed an ant colony algorithm with dynamic update pheromone. Improved algorithm can shorten searching time, optimize solution finally and avoid the premature fall into local optimum.
     Clustering is an important technique in data mining, it is a rule in accordance with the data objects into multiple categories or clusters to make the same kind of data objects have a higher degree of similarity, but not quite different kind of data object.Clustering algorithm inspired by ants looking for food behavior is also known as clustering algorithm based on the theory of ants looking for food behavior. In the abstract of the ants in nature looking for food behavior, the behavior of looking for food is divided into two areas of searching food and transporting food, while the data object as ants, the cluster center as a "food source". So the clustering process can be described as ant foraging process. The clustering algorithm does not distinguish between the different attributes of the importance of data objects, this dissertation uses the maximum deviation algorithm, for each attribute according to its importance as it gives a weight. These make similar objects fast clustering, and avoid a lot of useless computation, which improves the efficiency of algorithm.
     Model inspired by brood sorting can also be called ant clustering model. Many kinds of ants can closely arrange eggs and larvae in a sheaf around the middle of nest area, and the biggest larvae located at the edge of area. Deneubourg and his colleagues were the first to propose the basic model (BM) to simulate this phenomenon. LF algorithm is a successfully improved BM. Fuzzy clustering algorithm is inspired by the theory of Fuzzy partition which is proposed by Ruspini in 1969. The FCM algorithm which is used in the dissertation is one of Fuzzy clustering algorithm. The dissertation analyzes the advantages and disadvantages of the LF algorithm and FCM algorithm, and their complementary analysis, LF algorithm is proposed based on improved FCM algorithm, the forthcoming integration of these two algorithms, but also used to maximize the weighted deviation algorithm, the algorithm in the distance calculation be improved to further enhance the performance of the algorithm.

引文

[1]Jiawei Han, Micheline Kamber.范明,孟小峰等译.数据挖掘：概念与技术[M].北京：机械工业出版社,2001
    [2]王立伟.数据挖掘的研究现状综述[J]：图书与情报.2008,5：42.
    [3]张伟,刘勇军等.数据挖掘发展研究[J]：计算机科学.2001,28(7)：81.
    [4]杨燕,靳蕃等. 聚类有效性评价综述[J].计算机应用研究.2008.25(6)：1630.
    [5]黄金花.聚类算法的分析与比较[J].职校论坛.2008.13：254.
    [6]T Zhang, R Ramakrishnan, M Livny. BIRCH:An efficient data clustering method for very large databases [C], In Proceedings 1996 ACM-SIGMOD International Conference Management of Data, Montreal, Canada, 1996:103-114
    [7]S Guha, R Rastogi, K Shim. Cure:An efficient clustering algorithm for large databases [C], In Proceedings 1998 ACM-SIGMOD International Conference Management of Data, Seattle, WA,1998:73-84
    [8]Harting J., Wong M. A K-means clustering algorithm. Applied Statistics, 1979,(28);100-108.
    [9]吴景岚,朱文兴.基于k中心点的迭代局部搜索聚类算法[J].计算机研究与发展.2004.41：247.
    [10]蔡颖砚,谢昆青,马修军.屏蔽了输人参数敏感性的DBSCAN改进算法[J].北京大学学报(自然科学版).2004,40(3)：480-486.
    [11]A Hinneburg, D A Keim. An efficient approach to clustering in large multimedia databases with noise [C], In Proceedings 1998 International Conference Knowledge Discovery and Data Mining, New York,1998:58-65
    [12]W Wang, J Yang, R Muntz. STING:A statistical information grid approach to spatial data mining [C], In Proceeding 1997 International Conference Very Large Data Bases, Athens, Greece,1997:186-195
    [13]G Sheikholeslami, S Chatterjee, A Zhang. WaveCluster:A multi resolution clustering approach for very large spatial databases [C], In Proceedings 1998 International Conference Very Large Data Bases, New York, 1998:428-439
    [14]R Agrawal, J Gehrke, D Gunopulos, P Raghavan. Automatic subspace clustering of high dimensional data for data mining applications [C], In-Proceedings 1998 ACM-SIGMOD International Conference Management of Data, Seattle, WA,1998:94-105
    [15]Fisher D. Improving inference through conceptual clustering. In Proc.1987 AAAI Conf, Seattle, WA.1987,461-465.
    [16]Kohonen T. Self-Organization and Associative Memory (Third Edition) [M], Berlin:Springer Verlag, 1989
    [17]Dorigo M, Bonabeau E, Theraulaz G. Ant algorithm and stigmergy [J]. Future Generation Computer System,2000,16(8)-851-871
    [18]Dorigo M, Stutzle T. An experimental study of the simple ant colony optimization algorithm [C],N Mastorakis(Ed.),2001 WSES International Conference on Evolutionary Computation (EC'01):253-258, WSES Press, 2001
    [19]张惟皎,刘春煌,尹晓峰.蚁群算法在数据挖掘中的应用研究[J].计算机工程与应用.2004.28：171.
    [20]章义刚,贾瑞玉等.快速蚁群算法求解圆排列问题[J].计算机技术与发展.2007.17(8)：48-50.
    [21]刘念涛,刘希玉.基于改进的启发式蚁群算法的聚类问题的研究[J].计算机技术与发展.2007.17(8)：37-39.
    [22]段海滨.蚁群算法原理及其应用[J].北京：科学出版社,2005.
    [23]Dorigo M, Maniezzo V, Colorni A. Positive feedback as a search strategy [C]. Technical report,91-016, Dipartimento di Elettronica, Politecnico di Milano, Milan,1991
    [24]Colorni A, Dorigo M, Maniezzo V. Distributed optimization by ant colonies [C]. In F.J. Varela & P. Bourgine (Ed.), Proceedings of the First European Conference on Artificial Life:134-142, Cambridge, MA, MIT Press, 1992
    [25]Dorigo M. Optimization Learning and Natural Algorithm [D], PhD thesis, Dipartimento di Elettronica, Politecnico di Milano, Milan,1992
    [26]Dorigo M, Gambardella L M. A study of some properties of Ant-Q [C], In H Voigt, W Ebeling, I Rechenberg, H Schwefel(Eds.), Proceedings of PPSN-IV, Fourth International Conference on Parallel Problem Solving from Nature, vol.1141 of Lecture Notes in Comouter Science, Belin, Springer-Verlag,1996:656-665
    [27]Dorigo M, Gambardella L M. Ant Colony System:A cooperative learning approach to the traveling salesman problem [J]. IEEE Transactions on Evolutionary Computation,1997,1(1):53-66
    [28]胡祥培,丁秋雷,李永先.蚁群算法研究评述[J].管理工程学报.2008.22(2)：75-76.
    [29]杨瑞,张海英,潘永湘.混合自适应蚁群算法及其应用研究[J].西安理工大学学报,2005.21(4)：405-408.
    [30]Bullnheimer B, Hartl R F, Sreauss C. A new rank-based version of the Ant System:A computational study [J]. Central European Journal for Operations Research and Economics,1999,7 (1):25-38
    [31]吴庆洪,张纪会,徐心如.具有变异特征的蚁群算法[J].计算机研究与发展,1999.36(10)：1240-1245.
    [32]吴斌,史忠植.一种基于蚁群算法的TSP问题分段求解算法[J].计算机学报,2001.21(12)：1328-1333.
    [33]徐精明,曹先彬,王煦法.多态蚁群算法[J].中国科学技术大学学报,2005.35(1)：59-65.
    [34]郝晋,石立宝,周家启.求解复杂TSP问题的随机扰动蚁群算法[J].系统工程理论与实践,2002.9：88-92.
    [35]梁静,钱省三,马良.基于双层蚂蚁算法的半导体炉管制程批高度研究[J]. 系统工程理论与实践,2005,12：96—101.
    [36]许殿,史小卫,程睿.回归蚁群算法[J].西安电子科技大学学报.2005.32(6)：944—947.
    [37]高尚.解旅行商问题的混沌蚁群算法[J].系统工程理论与实践.2005.9：100-105.
    [38]熊伟清,周扬,魏平.具有灾变的动态蚁群算法[J].电路与系统学报.2005.10(6)：98—102.
    [39]吴斌,赵燕伟.蚁群算法的研究现状[J].自动化仪表,2004.25(1)：1-4.
    [40]丁建立,许增强,袁著祉.遗传算法与蚁群算法的融合[J].计算机研究与发展,2003.40(9)：1351-1356.
    [41]朱庆保,杨志军.基于变异和动态信息素更新的蚁群优化算法[J].软件学报,2004.15(2)：185-192.
    [42]段海滨,王道波,于秀芬.蚁群算法的研究现状及其展望[J].中国工程科学.2007.9(2)：99-100.
    [43]Wagner I A, Lindenbaum M, Bruckstein A M. Smell as a computational resource—A lesson we can learn from the ant [C]. In M Y Vardi (Ed.), Proceedings of the Fourth Israeli Symposium on Theory of Computing and System (ISTCS-99):219-230. Los Alamitos, CA, IEEE Computer Society Press, 1996.
    [44]N. Monmarche.Algorithms de fourmis artificielles:applications a la classification et al'optimisation. PhD thesis, Universities France Rabelais.2000.
    [45]N. Monmarche. On data clustering with artificial ants. A. A. Freitas, editor, AAA1-99 &GECC0-99 workshop on Data Mining with Evolutionary Algorithms:Research Directions, Orlando, Florida, July 18,23-26.
    [46]H Azzag, N Monmarchc, M Slimance et al. AntTree:A new model for clustering with artificial ants[C]. In:IEEE Congress on Evolutionary Computation, Canberra, Australia,2003:8-12
    [47]Hanene Azzag, Gilles Venturini. A hierarchical ant based clustering algorithm and its use in three real-world applications[J]. European Journal of Operational Research,2007,179(6):906-922
    [48]N. Labroche, N. Monmarche and G.Venturini. AntClust:ant clustering and web usage mining[R]. Proc. of the GECCO Conference, Chicago,2003
    [49]郑松,侯迪波,周泽魁.动态调整选择策略的改进蚁群算法[J].控制与决策.2008.23(2)：225-228.
    [50]张建华,江贺,张宪超.蚁群聚类算法综述[J].计算机工程与应用.2006.16：172.
    [51]马永红,周荣喜,李振光.基于离差最大化的决策者权重的确定方法[J].北京化工大学学报.2007.34(2)：177.
    [52]王应明.运用离差最大化方法进行多指标决策与排序[J].系统工程与电子技术,1998,20(7)：24-26.
    [53]王应明,张军奎.基于标准差和平均差的权系数确定方法及其应用[J].数理统计与管理.2003,22(3)：22-26.
    [54]王坚强.基于离差优化的信息不完全确定的多准则分类方法[J].控制与决策.2006,21(5)：513-516.
    .[55]任全,李为民,张文.客观赋权法指导下的部分权重信息多属性决定方法研究[J].数学的实践与认识,2004.34(7)：86-90.
    [56]李正义,曾雪兰,覃菊莹.离差最大化特征加权模糊C-划分的聚类分析[J].模糊系统与数学,2008,22(4)：171-172.
    [57]张群,熊英,黄庆炬.基于蚁群算法的数据挖掘方法研究[J].湖北工业大学学报,2007,22(2)：5-9.
    [58]Zadeh L A. Fuzzy sets. Inf Cont,1965,8:338-353.
    [59]Ruspini E H. A new approach to clustering. Inf Cont,1969,15:22-32.
    [60]高新波,谢维信.模糊聚类理论发展及应用的研究发展[J].科学通报,1999,44(21).
    [61]Tamura S, Higuchi S, Tanaka K. Pattern classification based on fuzzy relations. IEEE SMC,1971,1(1):217-242
    [62]Zkim Le. Fuzzy relation compositions and pattern recognition. Inf Sci,1996,89:107～130
    [63]Wu Z, Leathy R. An optimal graph theoretic approach to data clustering: theory and its application to image segmentation. IEEE PAMI,1993,15(11): 1101-1113.
    [64]J. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York,1981.
    [65]宫改云,高新波,伍忠东.FCM聚类算法中模糊加权指数m的优先方法[J].模糊系统与数学,2005.19(1)：144.
    [66]安良,胡男,良梅等.一种改进的模糊C均值(FCM)聚类算法[J].合肥工业大学学报,2003,26(3)：354-358.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700