Supervised labeled latent Dirichlet allocation for document categorization


Authors: Ximing Li (1) (2)
Jihong Ouyang (1) (2)
Xiaotang Zhou (1) (2)
You Lu (1) (2)
Yanhui Liu (1) (2)

1. College of Computer Science and Technology ; Jilin University ; Changchun ; China
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education ; Jilin University ; Changchun ; China
Keywords: Supervised ; Topic modeling ; Latent Dirichlet allocation ; Multi-label classification
Journal: Applied Intelligence
Publication year: 2015
Publication date: April 2015
Volume: 42
Issue: 3
Pages: 581-593
Full-text size: 1,304 KB
Journal category: Computer Science
Journal subjects: Artificial Intelligence and Robotics ; Mechanical Engineering ; Manufacturing, Machines and Tools
Publisher: Springer Netherlands
ISSN: 1573-7497

Abstract

Recently, supervised topic modeling approaches have received considerable attention. However, the representative labeled latent Dirichlet allocation (L-LDA) method tends to over-focus on the pre-assigned labels and gives insufficient consideration to potentially lost labels and common semantics. To overcome these problems, we propose an extension of L-LDA, namely supervised labeled latent Dirichlet allocation (SL-LDA), for document categorization. Our model makes two fundamental assumptions, Prior 1 and Prior 2, that relax the restriction on label sampling and extend the concept of topics. We develop a Gibbs expectation-maximization algorithm to learn the SL-LDA model. Quantitative experimental results demonstrate that SL-LDA is competitive with state-of-the-art approaches on both single-label and multi-label corpora.
