An effective and interpretable method for document classification

Authors: Ngo Van Linh; Nguyen Kim Anh; Khoat Than…
Keywords: Variational inference; Bayesian nonparametrics; Classification; Von Mises–Fisher distribution
Journal: Knowledge and Information Systems
Publication year: 2017
Publication date: March 2017
Volume: 50
Issue: 3
Pages: 763-793
Journal category: Computer Science
Journal subjects: Information Systems and Communication Service; IT in Business
Publisher: Springer London
ISSN: 0219-3116

Abstract

As the number of documents has grown rapidly in recent years, automatic text categorization has become an increasingly important and fundamental task in information retrieval and text mining. Accuracy and interpretability are two important aspects of a text classifier. Accuracy measures a classifier's ability to correctly classify unseen data, whereas interpretability is its ability to be understood by humans and to provide reasons why each data instance is assigned to a label. This paper proposes an interpretable classification method that exploits the Dirichlet process mixture model of von Mises–Fisher distributions for directional data. Because the method uses the label information of the training data explicitly and automatically determines the number of topics for each class, the learned topics are coherent, relevant and discriminative. They help to interpret as well as distinguish classes. Our experimental results show the advantages of our approach in terms of separability, interpretability and effectiveness in classifying high-dimensional datasets with complex distributions. Our method is highly competitive with state-of-the-art approaches.
