Clustering ensemble method based on evidence theory

English title: Clustering ensemble method based on belief function theory
Authors: LI Feng; LI Shoumei; Thierry Denoeux
English authors: LI Feng; LI Shoumei; Denoeux Thierry; College of Applied Sciences, Beijing University of Technology; Centre National de la Recherche Scientifique, Sorbonne Universités, Université de Technologie de Compiègne
Keywords: evidence theory; clustering ensemble; relational representation; co-association matrix; transitive closure
English keywords: belief function; clustering ensemble; relational representation; co-association matrix; transitive closure
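Chinese journal title: NJXZ
English journal title: Journal of Nanjing University of Information Science & Technology (Natural Science Edition)
Institutions: College of Applied Sciences, Beijing University of Technology; Sorbonne Universités, Université de Technologie de Compiègne / Centre National de la Recherche Scientifique
Publication date: 2019-05-28
Publisher: Journal of Nanjing University of Information Science & Technology (Natural Science Edition)
Year: 2019
Issue: v.11; No.61
Funding: National Natural Science Foundation of China (11571024); 2018 Beijing University of Technology Graduate Overseas Training Program
Language: Chinese
Page: NJXZ201903013
Number of pages: 8
CN: 03
ISSN: 32-1801/N
Classification number: 96-103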

Abstract

The result of a single clustering method can be unstable. To overcome this problem, we propose a new clustering ensemble method based on evidence theory (also known as Dempster-Shafer theory or belief function theory). In most cases, a clustering ensemble method consists of two key steps: generating a set of base partitions, and combining them into a final clustering result. Our method focuses on the second step. After the base partitions are obtained in the first step, they are converted into an intermediate form, which we call a relational representation. Within evidence theory, the relational representations are treated as sources of evidence that may not be fully reliable, so they are first preprocessed by discounting and then fused with different combination rules. A belief matrix or plausibility matrix is extracted from the fused relational representation and regarded as a co-association matrix between objects. To make full use of the transitivity between objects, the co-association matrix is treated as a fuzzy relation, and its transitive closure is computed to yield a fuzzy equivalence relation. This fuzzy equivalence relation is then regarded as new similarity data, and the final partition is obtained with a clustering method that can handle similarity data. Experimental results show the stability and effectiveness of the proposed method.
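
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: each pair of objects is described on the two-element frame {S, D} ("same cluster" / "different cluster"), every base partition is discounted at the same rate alpha, Dempster's rule is the only combination rule shown, and a simple level cut of the fuzzy equivalence relation stands in for the final similarity-based clustering step. All function names are hypothetical.

```python
from itertools import combinations


def pair_mass(labels, i, j, alpha):
    """Discounted mass (m_S, m_D, m_Omega) for the pair (i, j) from one base partition."""
    same = labels[i] == labels[j]
    # Categorical evidence discounted at rate alpha: mass alpha is moved to Omega.
    return ((1 - alpha) if same else 0.0, 0.0 if same else (1 - alpha), alpha)


def dempster(m1, m2):
    """Dempster's rule on the two-element frame {S, D}."""
    s1, d1, o1 = m1
    s2, d2, o2 = m2
    conflict = s1 * d2 + d1 * s2
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    s = (s1 * s2 + s1 * o2 + o1 * s2) / norm
    d = (d1 * d2 + d1 * o2 + o1 * d2) / norm
    return (s, d, o1 * o2 / norm)


def belief_matrix(partitions, alpha=0.2):
    """Fuse all base partitions pair by pair and return Bel({S}) as a matrix."""
    n = len(partitions[0])
    bel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        m = (0.0, 0.0, 1.0)  # vacuous mass: total ignorance
        for labels in partitions:
            m = dempster(m, pair_mass(labels, i, j, alpha))
        bel[i][j] = bel[j][i] = m[0]  # on this frame Bel({S}) = m({S})
    return bel


def transitive_closure(r):
    """Max-min transitive closure of a reflexive, symmetric fuzzy relation."""
    n = len(r)
    r = [row[:] for row in r]  # work on a copy
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                best = max(min(r[i][k], r[k][j]) for k in range(n))
                if best > r[i][j]:
                    r[i][j] = best
                    changed = True
    return r


def level_cut_clusters(r, level):
    """Equivalence classes of the crisp relation r[i][j] >= level (stand-in final step)."""
    labels = [-1] * len(r)
    next_label = 0
    for i in range(len(r)):
        if labels[i] == -1:
            for j in range(len(r)):
                if r[i][j] >= level:
                    labels[j] = next_label
            next_label += 1
    return labels


# Three hypothetical base partitions of six objects (cluster labels per object).
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 2],
]
bel = belief_matrix(base_partitions, alpha=0.2)
closure = transitive_closure(bel)
final_labels = level_cut_clusters(closure, level=0.5)
print(final_labels)
```

In this toy example, objects 0 and 3 are separated by every base partition, so their fused belief of being in the same cluster is 0. The transitive closure still assigns them a small positive similarity through object 2, which is the kind of transitivity the abstract refers to. Replacing m[0] with m[0] + m[2] in belief_matrix would give the plausibility matrix instead of the belief matrix.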
