Group topic model: organizing topics into groups


Authors: Ximing Li; Jihong Ouyang; You Lu; Xiaotang Zhou; Tian Tian
Keywords: Topic modeling; Latent Dirichlet allocation; Group; Variational inference; Online learning; Document clustering
Journal: Information Retrieval
Publication year: 2015
Publication date: February 2015
Volume: 18
Issue: 1
Pages: 1-25
Full text size: 1,158 KB

Latent Dirichlet allocation (LDA) defines hidden topics to capture latent semantics in text documents. However, it assumes that all documents are represented by the same set of topics, which results in the “forced topic” problem. To solve this problem, we developed group latent Dirichlet allocation (GLDA). GLDA uses two kinds of topics: local topics and global topics. Highly related local topics are organized into groups to describe local semantics, whereas global topics are shared by all documents to describe background semantics. GLDA uses variational inference algorithms for both offline and online data. We evaluated the proposed model on topic modeling and document clustering. Our experimental results indicate that GLDA achieves competitive performance compared with state-of-the-art approaches.
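
The split between grouped local topics and shared global topics can be illustrated with a toy sampler. This is only a sketch under stated assumptions, not the generative process defined in the paper: it assumes each document is tied to a single group, draws its topic proportions from a symmetric Dirichlet over that group's local topics plus the global topics, and uses hypothetical names (`generate_document`, `groups`, `global_topics`, `topic_word`) and made-up sizes throughout.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(n_words, groups, global_topics, topic_word, rng, alpha=0.5):
    """Toy generative step (an assumption, not the paper's exact process).

    groups        : list of lists of local topic ids, one list per group
    global_topics : topic ids shared by every document
    topic_word    : dict mapping topic id -> word distribution over the vocabulary
    """
    # Assumption: a document belongs to exactly one group, chosen uniformly.
    group_id = rng.randrange(len(groups))
    # The document may only use its group's local topics and the global topics.
    allowed = groups[group_id] + global_topics
    theta = sample_dirichlet([alpha] * len(allowed), rng)

    topics, words = [], []
    for _ in range(n_words):
        z = allowed[sample_categorical(theta, rng)]  # topic assignment
        w = sample_categorical(topic_word[z], rng)   # word id drawn from that topic
        topics.append(z)
        words.append(w)
    return group_id, topics, words


rng = random.Random(0)
vocab_size = 20
groups = [[0, 1], [2, 3, 4]]  # local topics organized into two groups
global_topics = [5]           # one background topic shared by all documents
topic_word = {k: sample_dirichlet([0.5] * vocab_size, rng) for k in range(6)}

group_id, topics, words = generate_document(30, groups, global_topics, topic_word, rng)
print("group:", group_id)
print("topics used:", sorted(set(topics)))
```

Under these assumptions a document never draws a local topic from another group, which is one way to picture how grouping avoids forcing every topic onto every document, while the global topic remains available everywhere to absorb background words.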
