MD fuzzy k-modes Clustering Algorithm for Categorical Matrix-Object Data
  • English title: A MD fuzzy k-modes Algorithm for Clustering Categorical Matrix-Object Data
  • Authors: Li Shunyong (李顺勇); Zhang Miaomiao (张苗苗); Cao Fuyuan (曹付元)
  • Keywords: matrix-object data; MD fuzzy k-modes algorithm; dissimilarity measure; cluster centers; clustering
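  • Journal code: JFYZ
  • Journal: Journal of Computer Research and Development (计算机研究与发展)
  • Affiliations: School of Mathematical Sciences, Shanxi University; School of Computer and Information Technology, Shanxi University
  • Publication date: 2019-06-15
  • Year: 2019
  • Volume: 56
  • Issue: 06
  • Funding: National Natural Science Foundation of China (61573229); Shanxi Province Basic Research Program (201701D121004); Shanxi Province Research Funding Project for Returned Overseas Scholars (2017-020); Shanxi Province Higher Education Teaching Reform and Innovation Project (J2017002)
  • Language: Chinese
  • Article ID: JFYZ201906020
  • Pages: 195-207 (13 pages)
  • CN: 11-1777/TP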
Abstract
Traditional clustering algorithms generally operate on single-valued attribute data. In many practical applications, however, each object is described by several feature vectors; for example, a customer may buy several products during a single shopping trip. An object described by multiple feature vectors is called a matrix object, and a data set made up of such objects is called a matrix-object data set. Research on clustering algorithms for categorical matrix-object data is still relatively scarce, and many problems remain open. Building on the clustering process of the fuzzy k-modes algorithm, this paper proposes the matrix-object data fuzzy k-modes (MD fuzzy k-modes) algorithm for clustering categorical matrix-object data. Drawing on the concept of fuzzy sets, the algorithm introduces a fuzzy factor β, redefines the dissimilarity measure between two categorical matrix objects, and provides a heuristic algorithm for updating the cluster centers. The effectiveness of the MD fuzzy k-modes algorithm is verified on five real-world data sets, and the relationship between the fuzzy factor β and the membership degree w is analyzed. In the era of big data, using the MD fuzzy k-modes algorithm to cluster objects described by multiple records makes it easier to discover customers' spending habits and preferences, and thus to make more targeted recommendations.
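The abstract names the ingredients (a dissimilarity between matrix objects, fuzzy memberships w, and a heuristic cluster-center update) but not the formulas. The sketch below is therefore not the paper's MD fuzzy k-modes algorithm. It plugs an assumed dissimilarity, the mean simple-matching distance from an object's records to a one-record center, into the standard alternating iteration of Huang and Ng's fuzzy k-modes, with the usual fuzziness exponent `alpha > 1`. The paper's fuzzy factor β and its heuristic center update are not reproduced, and every function name here is hypothetical.

```python
import random
from collections import Counter

# Data model (assumption): a matrix object is a list of records, and every
# record is a tuple of categorical values over the same m attributes,
# e.g. one (drink, snack) record per item a customer bought.


def record_distance(record, center):
    """Simple-matching distance: number of attributes whose values differ."""
    return sum(1 for a, b in zip(record, center) if a != b)


def object_distance(obj, center):
    """ASSUMED matrix-object dissimilarity (not the paper's): the mean
    simple-matching distance from each record of obj to a one-record center."""
    return sum(record_distance(r, center) for r in obj) / len(obj)


def update_memberships(objects, centers, alpha):
    """Fuzzy k-modes membership update; W[i][c] is object i's degree in cluster c."""
    k = len(centers)
    exponent = 1.0 / (alpha - 1.0)
    W = []
    for obj in objects:
        d = [object_distance(obj, z) for z in centers]
        if 0.0 in d:
            # The object coincides with a center: give it crisp membership.
            hit = d.index(0.0)
            W.append([1.0 if c == hit else 0.0 for c in range(k)])
        else:
            W.append([1.0 / sum((d[c] / d[h]) ** exponent for h in range(k))
                      for c in range(k)])
    return W


def update_centers(objects, W, k, alpha):
    """Per attribute, pick the category with the largest weighted frequency.
    Each record of object i votes with weight W[i][c]**alpha / len(object i),
    which minimises the cost under the assumed dissimilarity."""
    m = len(objects[0][0])
    centers = []
    for c in range(k):
        center = []
        for j in range(m):
            votes = Counter()
            for i, obj in enumerate(objects):
                weight = W[i][c] ** alpha / len(obj)
                for record in obj:
                    votes[record[j]] += weight
            center.append(max(votes, key=votes.get))
        centers.append(tuple(center))
    return centers


def objective(objects, centers, W, alpha):
    """Cost: sum over objects i and clusters c of W[i][c]**alpha * d(object i, center c)."""
    return sum(W[i][c] ** alpha * object_distance(obj, z)
               for i, obj in enumerate(objects)
               for c, z in enumerate(centers))


def fuzzy_kmodes_matrix_sketch(objects, k, alpha=1.5, init_centers=None,
                               max_iter=100, tol=1e-9, seed=0):
    """Alternate membership and center updates until the cost stops decreasing."""
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    if init_centers is None:
        rng = random.Random(seed)
        # Seed each center with a random record from a distinct random object.
        init_centers = [tuple(rng.choice(obj)) for obj in rng.sample(objects, k)]
    centers = list(init_centers)
    previous = float("inf")
    for _ in range(max_iter):
        W = update_memberships(objects, centers, alpha)
        centers = update_centers(objects, W, k, alpha)
        cost = objective(objects, centers, W, alpha)
        if previous - cost < tol:
            break
        previous = cost
    return centers, update_memberships(objects, centers, alpha)


# Hypothetical toy data: each customer is a matrix object of purchase records.
customers = [
    [("milk", "bread"), ("milk", "eggs")],
    [("milk", "bread")],
    [("milk", "bread"), ("milk", "bread"), ("tea", "bread")],
    [("beer", "chips"), ("beer", "salsa")],
    [("beer", "chips")],
    [("wine", "chips"), ("beer", "chips")],
]

if __name__ == "__main__":
    centers, W = fuzzy_kmodes_matrix_sketch(
        customers, k=2, init_centers=[("milk", "bread"), ("beer", "chips")])
    print(centers)
    print([[round(w, 3) for w in row] for row in W])
```

Because the assumed dissimilarity averages over an object's records, the center update reduces to a per-attribute weighted vote in which each record of object i contributes w^alpha divided by the number of records in that object. A different dissimilarity, such as the one the paper defines, would in general need a different center update, which is presumably why the paper gives a dedicated heuristic for it.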