An improved clustering ensemble method based link analysis

Authors: Zhi-Feng Hao (1) (2)
Li-Juan Wang (1) (2)
Rui-Chu Cai (1)
Wen Wen (1)

1. Faculty of Computer, Guangdong University of Technology, Guangzhou 510006, China
2. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
Keywords: K-means clustering; Clustering ensemble; Link analysis
Journal: World Wide Web
Publication year: 2015
Publication date: March 2015
Volume: 18
Issue: 2
Pages: 185-195
Journal category: Computer Science
Journal subjects: Information Systems Applications and The Internet
Database Management
Operating Systems
Publisher: Springer Netherlands
ISSN: 1573-1413

Abstract

A clustering ensemble aggregates several base clusterings into a consensus clustering that is more accurate, stable and meaningful than the result of a single standard clustering algorithm. In this paper, the ensemble information is described by a data-cluster association matrix. However, most data-cluster association matrices overlook an important type of information: the relationships between clusters. This paper proposes a new method, WETU, which refines the data-cluster association matrix with a link-based similarity measure. The refined matrix is computed from the similarity of clusters across all base clustering results, not within a single base clustering result. In addition, WETU can provide more discriminative information than CSM and WTU. The data-cluster association matrix is refined into a high-level real-valued matrix, which can be aggregated by a real-valued method such as Global k-means. Experiments on a synthetic dataset and on UCI datasets show that the proposed method outperforms standard K-means, the base clustering algorithm, CSM+Global k-means and WTU+Global k-means.
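The abstract does not give the WETU formula, so the following Python sketch only illustrates the general idea of a data-cluster association matrix and its refinement into a real-valued matrix. It is not the authors' method: the Jaccard overlap used here is a stand-in for the paper's link-based similarity, and the function names (`member_sets`, `binary_association`, `refined_association`) and the toy labelings are hypothetical.

```python
def member_sets(base_labelings):
    """List every cluster of every base clustering as (clustering index, label, members)."""
    clusters = []
    for m, labels in enumerate(base_labelings):
        for lab in sorted(set(labels)):
            members = frozenset(i for i, l in enumerate(labels) if l == lab)
            clusters.append((m, lab, members))
    return clusters


def binary_association(base_labelings):
    """Binary data-cluster association matrix: entry is 1.0 if the point is in the cluster."""
    clusters = member_sets(base_labelings)
    n = len(base_labelings[0])
    return [[1.0 if i in members else 0.0 for (_, _, members) in clusters]
            for i in range(n)]


def jaccard(a, b):
    """Stand-in cluster similarity (ASSUMPTION: not the paper's link-based measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def refined_association(base_labelings):
    """Replace each 0 entry with a similarity between that cluster and the
    clusters the point belongs to in all base clusterings."""
    clusters = member_sets(base_labelings)
    n = len(base_labelings[0])
    refined = []
    for i in range(n):
        own = [members for (_, _, members) in clusters if i in members]
        row = []
        for (_, _, members) in clusters:
            if i in members:
                row.append(1.0)
            else:
                row.append(max(jaccard(members, o) for o in own))
        refined.append(row)
    return refined


# Toy ensemble: two base clusterings of four points (hypothetical data).
base_labelings = [[0, 0, 1, 1], [0, 0, 0, 1]]
binary = binary_association(base_labelings)
refined = refined_association(base_labelings)
```

Each row of `refined` is a real-valued feature vector for one data point, so the rows could then be passed to any real-valued clustering method (the abstract names Global k-means) to obtain the consensus partition.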
