A probabilistic framework for optimizing projected clusters with categorical attributes
Details
  • Author: LiFei Chen
  • Keywords: projective clustering; projected cluster; categorical data; probabilistic framework; kernel density estimation; attribute weighting; 072104
  • Journal: SCIENCE CHINA Information Sciences
  • Year: 2015
  • Issue date: July 2015
  • Volume: 58
  • Issue: 7
  • Pages: 1-15
  • Full-text size: 535 KB
  • References:
    1. Aggarwal C C, Procopiuc C, Wolf J L, et al. Fast algorithms for projected clustering. ACM SIGMOD Rec, 1999, 28: 61–72
    2. Moise G, Sander J, Ester M. Robust projected clustering. Knowl Inf Syst, 2008, 14: 273–298
    3. Chen L, Jiang Q, Wang S. Model-based method for projective clustering. IEEE Trans Knowl Data Eng, 2012, 24: 1291–1305
    4. Huang J Z, Ng M K, Rong H, et al. Automated variable weighting in k-means type clustering. IEEE Trans Patt Anal Mach Intell, 2005, 27: 657–668
    5. Poon L, Zhang N, Chen T, et al. Variable selection in model-based clustering: to do or to facilitate. In: Proceedings of the 27th International Conference on Machine Learning, Haifa, 2010. 887–894
    6. Light R J, Margolin B H. An analysis of variance for categorical data. J Am Stat Assoc, 1971, 66: 534–544
    7. San O M, Huynh V N, Nakamori Y. An alternative extension of the k-means algorithm for clustering categorical data. Int J Appl Math Comput Sci, 2004, 14: 241–247
    8. Huang Z. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov, 1998, 2: 283–304
    9. Chan E Y, Ching W K, Ng M K, et al. An optimization algorithm for clustering using weighted dissimilarity measures. Patt Recogn, 2004, 37: 943–952
    10. Bai L, Liang J, Dang C, et al. A novel attribute weighting algorithm for clustering high-dimensional categorical data. Patt Recogn, 2011, 44: 2843–2861
    11. Xiong T, Wang S, Mayers A, et al. DHCC: divisive hierarchical clustering of categorical data. Data Min Knowl Discov, 2012, 24: 103–135
    12. Chen L, Wang S. Central clustering of categorical data with automated feature weighting. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, 2013. 1260–1266
    13. Cao F, Liang J, Li D, et al. A weighting k-modes algorithm for subspace clustering of categorical data. Neurocomputing, 2013, 108: 23–30
    14. Boriah S, Chandola V, Kumar V. Similarity measures for categorical data: a comparative evaluation. In: Proceedings of the 8th SIAM International Conference on Data Mining, Atlanta, 2008. 243–254
    15. Parsons L, Haque E, Liu H. Subspace clustering for high dimensional data: a review. ACM SIGKDD Explor Newslett, 2004, 6: 90–105
    16. Gan G, Wu J. Subspace clustering for high dimensional categorical data. ACM SIGKDD Explor Newslett, 2004, 6: 87–94
    17. Bai L, Liang J, Dang C, et al. The impact of cluster representatives on the convergence of the k-modes type clustering. IEEE Trans Patt Anal Mach Intell, 2013, 35: 1509–1522
    18. Sen P K. Gini diversity index, Hamming distance and curse of dimensionality. Metron Int J Stat, 2005, LXIII: 329–349
    19. Tao J, Chung F, Wang S. A kernel learning framework for domain adaptation learning. Sci China Inf Sci, 2012, 55: 1983–2007
    20. Ouyang D, Li Q, Racine J. Cross-validation and the estimation of probability distributions with categorical data. Nonparametr Stat, 2006, 18: 69–100
    21. Li Q, Racine J S. Nonparametric Econometrics: Theory and Practice. Princeton: Princeton University Press, 2007
    22. Aitchison J, Aitken C. Multivariate binary discrimination by the kernel method. Biometrika, 1976, 63: 413–420
    23. Hofmann T, Schölkopf B, Smola A J. Kernel methods in machine learning. Ann Stat, 2008, 36: 1171–1220
    24. Zhou K, Fu C, Yang S. Fuzziness parameter selection in fuzzy c-means: the perspective of cluster validation. Sci China Inf Sci, 2014, 57: 112206
    25. Jain A K, Murty M N, Flynn P J. Data clustering: a review. ACM Comput Surv, 1999, 31: 264–323
    26. Li T, Ma S, Ogihara M. Entropy-based criterion in categorical clustering. In: Proceedings of the 21st International Conference on Machine Learning, Alberta, 2004. 536–543
    27. Wang K, Yan X, Chen L. Geometric double-entity model for recognizing far-near relations of clusters. Sci China Inf Sci, 2011, 54: 2040–2050
  • Author affiliation: LiFei Chen (1)

    1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, 350117, China
  • Journal category: Computer Science
  • Journal subjects: Chinese Library of Science;
    Information Systems and Communication Service
  • Publisher: Science China Press, co-published with Springer
  • ISSN: 1869-1919
Abstract
The ability to discover projected clusters in high-dimensional data is essential for many machine learning applications. Projective clustering of categorical data remains a challenge because of the difficulty of learning adaptive weights for categorical attributes in coordination with cluster optimization. In this paper, a probability-based learning framework is proposed that allows both the attribute weights and the center-based clusters to be optimized through kernel density estimation on categorical attributes. A novel algorithm for projective clustering of categorical data is then derived, based on the new learning approach to the kernel bandwidth selection problem. We show that the attribute weight is substantially connected to the kernel bandwidth, while the optimized cluster center corresponds to the normalized frequency estimator on the categorical attributes. Experimental results on synthetic and real-world data show that the proposed method significantly outperforms state-of-the-art algorithms.
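The abstract ties attribute weights to kernel bandwidths and cluster centers to normalized frequency estimators. A minimal sketch of the Aitchison–Aitken kernel estimator for a single categorical attribute (reference 22 above) illustrates that connection; the function name and encoding of categories as integers 0..c-1 are illustrative assumptions, not taken from the paper:

```python
from collections import Counter

def aitchison_aitken_kde(values, num_categories, bandwidth):
    """Smoothed category probabilities via the Aitchison-Aitken kernel.

    K(x, x_i) = 1 - lam        if x == x_i,
                lam / (c - 1)  otherwise,
    with bandwidth lam in [0, (c - 1) / c].  lam = 0 recovers the plain
    normalized frequency estimator; lam = (c - 1) / c fully smooths the
    estimate toward the uniform distribution over the c categories.
    """
    n = len(values)
    counts = Counter(values)
    c, lam = num_categories, bandwidth
    return {
        x: (counts.get(x, 0) * (1.0 - lam)
            + (n - counts.get(x, 0)) * lam / (c - 1)) / n
        for x in range(c)
    }

sample = [0, 0, 0, 1, 2]                      # toy attribute, c = 3 categories
print(aitchison_aitken_kde(sample, 3, 0.0))   # lam = 0: frequencies 0.6, 0.2, 0.2
print(aitchison_aitken_kde(sample, 3, 2/3))   # lam = 2/3: uniform, each 1/3
```

In the framework described above, a small bandwidth on an attribute marks it as relevant to the cluster (the density concentrates on the modal category), while a bandwidth near its upper bound flattens the attribute's contribution, which is the mechanism that lets bandwidth play the role of an attribute weight.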
