Criteria for Mixture-Model Clustering with Side-Information
详细信息    查看全文
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2017
  • 出版时间:2017
  • 年:2017
  • 卷:10163
  • 期:1
  • 页码:43-59
  • 丛书名:Pattern Recognition Applications and Methods
  • ISBN:978-3-319-53375-9
  • 卷排序:10163
文摘
The estimation of mixture models is a well-known approach for cluster analysis and several criteria have been proposed to select the number of clusters. In this paper, we consider mixture models using side-information, which gives the constraint that some data in a group originate from the same source. Then the usual criteria are not suitable. An EM (Expectation-Maximization) algorithm has been previously developed to jointly allow the determination of the model parameters and the data labelling, for a given number of clusters. In this work we adapt three usual criteria, which are the bayesian information criterion (BIC), the Akaike information criterion (AIC), and the entropy criterion (NEC), so that they can take into consideration the side-information. One simulated problem and two real data sets have been used to show the relevance of the modified criterion versions and compare the criteria. The efficiency of both the EM algorithm and the criteria, for selecting the right number of clusters while getting a good clustering, is in relation with the amount of side-information. Side-information being mainly useful when the clusters overlap, the best criterion is the modified BIC.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700