Comparative study of matrix refinement approaches for ensemble clustering
  • Authors: Natthakan Iam-On (1), Tossapon Boongoen (2)

    1. School of Information Technology, Mae Fah Luang University, Chiang Rai 57100, Thailand
    2. Department of Mathematics and Computer Science, Royal Thai Air Force Academy, Bangkok 10220, Thailand
  • Keywords: Cluster ensemble ; Multiple clusterings ; Summarization ; Information matrix
  • Journal: Machine Learning
  • Publication year: 2015
  • Publication date: January 2015
  • Volume: 98
  • Issue: 1-2
  • Pages: 269-300
  • Full-text size: 4,245 KB
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics
    Automation and Robotics
    Computing Methodologies
    Simulation and Modeling
    Language Translation and Linguistics
  • Publisher: Springer Netherlands
  • ISSN: 1573-0565
Cluster ensembles, or consensus clusterings, have been shown to outperform standard clustering algorithms in accuracy and robustness across a variety of datasets. This meta-learning formalism also helps users overcome the dilemma of selecting an appropriate technique and its parameters. Since the approach was introduced, several research areas have emerged with the common purpose of enhancing the effectiveness and applicability of cluster ensembles. These include the selection of ensemble members, the imputation of missing values, and the summarization of ensemble members. This paper reviews the matrix refinement approaches recently proposed in the literature for summarizing the information of multiple clusterings. Using various benchmark datasets and quality measures, it compares these techniques empirically and draws a practical guideline from the findings.
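
As an illustration of what "summarizing information of multiple clusterings" in a matrix can look like, the following minimal sketch builds a pairwise co-association matrix, a widely used summary in the cluster ensemble literature in which entry (i, j) is the fraction of base clusterings that place points i and j in the same cluster. It is not taken from the paper and is not claimed to be one of the refinement approaches it compares; the function name `co_association` and the toy `ensemble` are assumptions made for the example.

```python
def co_association(labelings):
    """Return the n x n co-association matrix for a list of base clusterings.

    labelings: list of m label sequences, each of length n (one label per point).
    Entry [i][j] is the fraction of the m clusterings in which points i and j
    receive the same label. Hypothetical helper, for illustration only.
    """
    m = len(labelings)
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalize co-occurrence counts by the ensemble size.
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


# Toy ensemble: three base clusterings of four points (assumed data).
# Label values are arbitrary per clustering; only co-membership matters.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
ca = co_association(ensemble)
```

Here points 0 and 1 are grouped together by every base clustering, so `ca[0][1]` is 1.0, while points 0 and 3 are never grouped together, so `ca[0][3]` is 0.0. A consensus partition is typically obtained by applying a similarity-based clustering algorithm to such a matrix.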
