Comparative study of matrix refinement approaches for ensemble clustering

Authors: Natthakan Iam-On (1)
Tossapon Boongoen (2)

1. School of Information Technology ; Mae Fah Luang University ; Chiang Rai ; 57100 ; Thailand
2. Department of Mathematics and Computer Science ; Royal Thai Air Force Academy ; Bangkok ; 10220 ; Thailand
Keywords: Cluster ensemble ; Multiple clusterings ; Summarization ; Information matrix
Journal: Machine Learning
Publication year: 2015
Publication date: January 2015
Volume: 98
Issue: 1-2
Pages: 269-300
Full-text size: 4,245 KB
Journal category: Computer Science
Journal subjects: Artificial Intelligence and Robotics ; Automation and Robotics ; Computing Methodologies ; Simulation and Modeling ; Language Translation and Linguistics
Publisher: Springer Netherlands
ISSN: 1573-0565

Abstract

Cluster ensembles, or consensus clusterings, have been shown to be better than any standard clustering algorithm at improving accuracy and robustness across various datasets. This meta-learning formalism also helps users overcome the dilemma of selecting an appropriate technique and the parameters for that technique. Since the approach was introduced, different research areas have emerged with the common purpose of enhancing the effectiveness and applicability of cluster ensembles. These include the selection of ensemble members, the imputation of missing values, and the summarization of ensemble members. In particular, this paper reviews the matrix refinement approaches recently proposed in the literature for summarizing the information in multiple clusterings. Using various benchmark datasets and quality measures, a comparative study of these techniques is carried out to provide empirical findings from which a practical guideline can be drawn.
