An effective and interpretable method for document classification
详细信息    查看全文
文摘
As the number of documents has been rapidly increasing in recent time, automatic text categorization is becoming a more important and fundamental task in information retrieval and text mining. Accuracy and interpretability are two important aspects of a text classifier. While the accuracy of a classifier measures the ability to correctly classify unseen data, interpretability is the ability of the classifier to be understood by humans and provide reasons why each data instance is assigned to a label. This paper proposes an interpretable classification method by exploiting the Dirichlet process mixture model of von Mises–Fisher distributions for directional data. By using the labeled information of the training data explicitly and determining automatically the number of topics for each class, the learned topics are coherent, relevant and discriminative. They help interpret as well as distinguish classes. Our experimental results showed the advantages of our approach in terms of separability, interpretability and effectiveness in classification task of datasets with high dimension and complex distribution. Our method is highly competitive with state-of-the-art approaches.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700