Semi-supervised hybrid clustering by integrating Gaussian mixture model and distance metric learning
  • Authors: Yihao Zhang ; Junhao Wen ; Xibin Wang
  • Keywords: Semi-supervised clustering ; Gaussian mixture model ; Distance metric learning ; Expectation maximization
  • Journal: Journal of Intelligent Information Systems
  • Publication year: 2015
  • Publication date: August 2015
  • Volume: 45
  • Issue: 1
  • Pages: 113-130
  • Full-text size: 549 KB
  • Author affiliations: Yihao Zhang (1)
    Junhao Wen (1) (2)
    Xibin Wang (1)
    Zhuo Jiang (1)

    1. College of Computer Science, Chongqing University, Chongqing, 400030, China
    2. College of Software Engineering, Chongqing University, Chongqing, 400030, China
  • Journal category: Computer Science
  • Journal subjects: Data Structures, Cryptology and Information Theory
    Artificial Intelligence and Robotics
    Document Preparation and Text Processing
    Business Information Systems
  • Publisher: Springer Netherlands
  • ISSN: 1573-7675
Abstract
Semi-supervised clustering aims to aid and bias unsupervised clustering by employing a small amount of supervised information. This information is generally given as pairwise constraints, which are used either to modify the objective function or to learn the distance measure. Many previous studies have shown that clustering algorithms based on a distance metric significantly outperform those based on a probability distribution on some data sets, while the opposite holds on others, so how to balance the two approaches becomes a key problem. In this paper, we propose a semi-supervised hybrid clustering algorithm that provides a principled framework for integrating a distance metric into the Gaussian mixture model, considering not only the intrinsic geometric information but also the probability distribution information of the data. Rather than using only the pairwise constraints, the labeled data are used to initialize the Gaussian distribution parameters and to construct the weight matrix of the regularizer, and we then adopt the Kullback-Leibler divergence as the "distance measurement" to regularize the objective function. Experiments on several UCI data sets and on real-world data sets for Chinese Word Sense Induction demonstrate the effectiveness of our semi-supervised clustering algorithm.
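
The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact objective, so the equal-weight spherical mixture, the same-label weight matrix, the symmetric form of the Kullback-Leibler divergence, and all function names (`seed_means`, `posteriors`, `constraint_weights`, `kl_regularizer`) are assumptions made for illustration only.

```python
import numpy as np

def seed_means(X, labels, n_clusters):
    """Initial component means from labeled points (label -1 means unlabeled)."""
    return np.stack([X[labels == k].mean(axis=0) for k in range(n_clusters)])

def posteriors(X, means, var=1.0):
    """Responsibilities of an equal-weight spherical GMM (simplifying assumption)."""
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    logp = -0.5 * sq / var
    logp -= logp.max(axis=1, keepdims=True)  # stabilise the exponentials
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def constraint_weights(labels):
    """W[i, j] = 1 when i and j are both labeled and share a label (assumed form)."""
    known = labels >= 0
    same = labels[:, None] == labels[None, :]
    W = (same & known[:, None] & known[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def kl_regularizer(P, W, eps=1e-12):
    """Sum over pairs of W[i, j] times the symmetric KL divergence of rows P[i], P[j]."""
    logP = np.log(P + eps)
    self_term = (P * logP).sum(axis=1)   # sum_k P[i, k] * log P[i, k]
    cross = P @ logP.T                   # cross[i, j] = sum_k P[i, k] * log P[j, k]
    D = self_term[:, None] - cross       # D[i, j] = KL(P[i] || P[j])
    return float((W * 0.5 * (D + D.T)).sum())

# Toy data: two well-separated groups, four labeled points, two unlabeled.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0],
              [5.1, 4.9], [0.1, 0.3], [4.8, 5.2]])
labels = np.array([0, -1, 1, -1, 0, 1])

means = seed_means(X, labels, n_clusters=2)
P = posteriors(X, means)
W = constraint_weights(labels)
penalty = kl_regularizer(P, W)
```

In a full algorithm of this kind, a penalty like `penalty` would be subtracted (with a trade-off coefficient) from the mixture log-likelihood inside the expectation-maximization loop, so that points linked in `W` are pushed toward similar posterior distributions.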
