图论在聚类分析中的应用
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
本文讨论聚类分析中的系统聚类法、模糊聚类法和灰色聚类法,着重探讨其聚类的图论方法。
     第一章 介绍聚类分析的基本知识。
     第二章 讨论系统聚类分析及其中著名的最短距离法,指出了最短距离法即为图论中求赋权连通图最小支撑树Kruskal算法的变形,得到了最短距离法的优良性质:
     定理2.1对最短距离法和k=n-1,n-2,…,1成立。它说明最短距离法得到的系统H=(P_n,P_(n-1),…,P_1)对每个m=n-1,n-2,…,2,P_m既是分离性最好的m-剖分,又是相似性最好的m-剖分。
     第三章讨论模糊聚类分析及其常用的传递闭包法,指出这种聚类方法的复杂性,定义了模糊关系图G=(X,E,(?))及其中两点u,v间的连通强度S(u,v),证明了S是X上的模糊等价关系、G=(X,E,(?))的最大支撑树具有如下性质:
     定理3.2 设T是模糊关系图G=(X,E,(?))的一棵支撑树,则以下陈述等价:
     (1)T是G的最大支撑树
     (2)对于任意的e'∈E(T),从T中移出e'得T_1,T_2,则w(e')=(?)w(e);
     (3)对任意的u,v∈X(u≠v),T中u,v之间的惟一路是G中最优(u,v)路。并建立了G=(X,E,(?))中任两点的连通强度和(?)的传递闭包t((?))之间的关系:
    
    定理“.“设X二!xl,花,…,xn}
    的传递闭包t(尽)=尽”=(‘(”,),、。
    图,则
    ,尽=(动nx,是X上的模糊相似关系,尽
    G=(X,E,豹是与相对应的模糊关系
    t(B从,x’)二稍=s(x,.,xj.)·
    由此仿照图论中求赋权连通图最小支撑树的Dijkstra算法,给出了模糊聚类的
    最大支撑树算法如下:
    设模糊相似矩阵尽=(份)nx,,聚类水平为兄,令人表示树的结点集,记
    份二r(气,xj
    (l)任选一点vl:X,记鸿={vl},令l(vl)=o,P(vl)==。,l(v):=r(vl,v),
    P(v):=vl(丫,。X、鸿),k:=
    (2)算
    z(v)誊z(v·),令凡十,一人。{v·},对每个;。x\人*,,若
     r(v’,v)>l(v),则l(v):=r(v’,v),P(v):=v’.
     (3)若k二n一1,据指针P(.)倒向追踪得最大支撑树T;T去掉w(e)<兄的
     所有边e后的各连通分支即为X在之水平上的一个分类,停,否则,-
     令k片k+1,回(2).
    而且其复杂性仅为口(矛),最后通过实例进行了验证.
     第四章讨论灰色关联聚类分析及其传统的逐行检查法,分析用这种方法进
    行灰色关联聚类的复杂性为口(m4).仿照模糊聚类分析的最大支撑树方法中连
    通强度的定义和相关结论,给出了赋权图G=(犷,E,D)中两点u,v间的连通强度
    S(。,‘)及其最大支撑树的一个性质·定义了G=(V,E,D)的兄一截图叹和连通
    闭包G两个概念,得到了如下结论:
    
    定理4.2设赋权图G=(代E,D),则任取兄任[0,l],G;=认.
    由此分析说明了指标集X中两点u,v在兄水平上能够归为一类,等价于它们在
    关联图G=(X,E,A)的最大支撑树T的兄一截图瓜中连通·据此建立了灰色关
    联聚类的最大支撑树方法,其复杂性仅为口(mZ),有效降低了运算的复杂性,最
    后通过实例进行了验证.
     关键词:系统聚类;模糊聚类;灰色关联聚类;最小支撑树;最大支撑树
     中图分类号:O巧7.6
In this paper, we discuss hierarchical clustering methods, fuzzy clustering methods and grey clustering methods in cluster analysis, especially graph theoretic approach.Chapter one introduces the basic knowledge of cluster analysis.In Chapter two we discuss the hierarchical clustering analysis and the famous minimum distance method, and suggest that the method is a transformation of the Kruskal algorithm for the minimum spanning tree of a weighted connected graph in graph theory, obtain a good property of minimum distance method:Theorem 2.1 for minimum distance method and k = n-1,n-2,...,1.It implies that any Pm for m=n~1, n-2,...,2 in the system H = (Pn,Pn-1,...,P1)obtained by the minimum distance method is not only a separation best m -partition, but also a similarity best m -partition.In Chapter three we discuss fuzzy clustering analysis and the well-known transitive closure method, indicate the complexity of the method, define theconnected intensity S(u,v) between vertices u,v in the fuzzy relation graph G = (X,E,R), prove that S is a fuzzy equivalence relation and the maximum
    
    spanning tree of G=(X,E, R) has the following properties:Theorem 3. 2 If Tis a spanning tree of fuzzy relation graph G = (X,E,R),then the following statements are equivalent:(1) T is a maximum spanning tree of G;(2) For any e' ∈ E(T), let T1, T2 be the two components of T - e', then(3) For any u,v∈(u≠v), the unique path L between u,v in T is an optimal (u,v) path of G.and develop the relation between the connected intensity in G = (X,E,R) and the transitive closure t(R) of R :Theorem 3.3 If X = {X1,X2,...,Xn} , R = (rij)n×n is a fuzzy resembling relation of X, the transitive closure of R is t(R) = Rn =(rij(n))n×n,G=(X,E,R) is the responding fuzzy relation graph of (X, R), thenFrom above we imitate the Dijkstra algorithm for the minimum spanning tree of a weighted connected graph, give an algorithm of the fuzzy clustering with maximum spanning tree:If is a fuzzy resembling matrix and X is the clustering level, letting Ak be a set of vertices in the tree and rij = r(xi ,xj), then(1) Choose a vertex v1∈X, let A1 = {v1}, l(v1) = 0, p(v1) = , l(v):=r(v1,v),(2) Calculate max for any if
    
    r(v*,v)>l(v), then l(v):=r(v*,v) and p(v):=v*.(3) If k=n-1, use the point function p(·) to find out back tracking a maximum spanning tree T of G; each connected component of T with all edges e of w(e) < λ removed is a cluster of X on the level λ, stop.Otherwise, let k := k +1 and return to (2). The complexity of the algorithm is only O(n2). We check the algorithm by anexample finally.In Chapter four we discuss grey correlation clustering analysis and the traditional line-by-line check-up method, point out that the complexity of the methodis O(m4), by imitating the definitions of the connected intensity and related results in the maximum spanning tree algorithm of fuzzy clustering analysis, we give the definition of connected intensity in a weighted graph G = (V, E, D) and property of its maximum spanning tree. Furthermore we define the λ - cross graph Gλ and the connected closure G of G = (V,E, D), obtain a property of Gλand G:Theorem 4. 2 If G = (V,E,D) is a weighted graph, then for any λ∈[0,l],-By which we indicate that vertices u and v in X clustered together in level λ, if and only if they are connected in the λ- cross graph Tλ of a maximumspanning tree T of the correlation graph G = (X, E, A). From this we develop the maximum spanning tree algorithm of grey correlation clustering analysis with the complexity is only O(m), which reduce the amount of operation effectively. Finally we check the algorithm by an example.
引文
[1] 张尧庭,方开泰,多元统计分析引论[M].北京:科学出版社,1999.
    [2] 方开泰,聚类分析[J].数学的实践与认识,1978,(1),(2).
    [3] 罗积玉,邢瑛,经济统计分析方法及预测[M].北京:清华大学出版社,1987.
    [4] 肖位枢,模糊数学基础及应用[M].北京:航空工业出版社,1987.
    [5] 刘思峰,郭天榜,党耀国等,灰色系统理论及其应用[M].北京:科学出版社,1999.
    [6] 高敬振,图论在聚类分析中的应用[J].数学的实践与认识,1991,(3).
    [7] [美]J.A.邦迪,U.S.R.默蒂著,吴望名等译,图论及其应用[M].北京:科学出版社,1984.
    [8] [美]C.H.Papadimitriou,K.Steiglitz著,刘振宏,蔡茂诚译,组合最优化[M].北京:清华大学出版社,1988.
    [9] 彭祖赠,孙韫玉,模糊(Fuzzy)数学及其应用[M].武汉:武汉大学出版社,2002.
    [10] 李相缟,李洪兴,陈世权,汪培庄,模糊聚类分析及其应用[M].贵阳:贵州科学技术出版社,1994.
    [11] 杨留记,Fuzzy关系传递闭包的若干定理[J].模糊数学,1984,(1).
    [12] 彭祖赠,孙韫玉,Fuzzy聚类分析[J].模糊数学,1982,2(1).
    [13] 李秀珍,模糊聚类分析法及其应用[J].长沙大学学报,1999,(4).
    [14] 张韬,纪德云,模糊聚类分析法[J].沈阳大学学报(自然科学版),2000,(2).
    [15] 何清,模糊聚类分析理论与应用研究进展[J].模糊系统与数学,1998,(2).
    [16] 吴望名,弗晰图与弗晰树[J].数学的实践与认识,1980,(1).
    [17] 管梅谷,求最小树的破圈法[J].数学的实践与认识,1975,(4).
    [18] 朱剑英,应用模糊数学方法的若干关键问题及处理方法[J].模糊系统与数学,1992,6(2).
    
    [19] 邓聚龙,灰色系统理论教程[M].武汉:华中理工大学出版社,1990.
    [20] 刘思峰,灰色系统理论的产生、发展及前沿动态,中国管理科学[J].2003,11.
    [21] 高敬振,灰色综合评判的数学模型[J].系统工程理论与实践,1994,14(12).
    [22] 高敬振,灰色线性规划最优解与最优值的漂移[J].山东师范大学学报(自然科学版),2004,19(2).
    [23] 陈世联,段万春,绝对关联度及其应用[J].系统工程理论与实践,1998,(6).
    [24] 何文章,郭鹏,关于灰色关联度中的几个问题的探讨[J].数理统计与管理,1999,18(3).
    [25] 张绍良,张国良.灰色关联度计算方法比较及其存在问题分析[J].系统工程,1996,14(3).
    [26] 吴元奇,冯荣扬,聚类分析计算方法的理论及结果比较[J].湛江海洋大学学报,2002,(1).
    [27] 孙效功,杨作升,基于灰色关联度的聚类分析方法[J].青岛海洋大学学报,1995,(2).
    [28] 肖新平,灰色聚类分析法的改进及其应用[J].系统工程,1996,16(2).
    [29] 邱学军,灰色聚类关联分析法及其应用[J].系统工程理论与实践,1995,(1).
    [30] Dang Yaoguo, Valuation of Grey Clustering with Human Experience[J]. The Journal of Grey System, 1995, Vol. 7.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700