Supervised labeled latent Dirichlet allocation for document categorization
  • Authors: Ximing Li (1) (2)
    Jihong Ouyang (1) (2)
    Xiaotang Zhou (1) (2)
    You Lu (1) (2)
    Yanhui Liu (1) (2)

    1. College of Computer Science and Technology, Jilin University, Changchun, China
    2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
  • Keywords: Supervised ; Topic modeling ; Latent Dirichlet allocation ; Multi-label classification
  • Journal: Applied Intelligence
  • Publication year: 2015
  • Publication date: April 2015
  • Volume: 42
  • Issue: 3
  • Pages: 581-593
  • Full-text size: 1,304 KB
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics
    Mechanical Engineering
    Manufacturing, Machines and Tools
  • Publisher: Springer Netherlands
  • ISSN: 1573-7497
Abstract
Recently, supervised topic modeling approaches have received considerable attention. However, the representative labeled latent Dirichlet allocation (L-LDA) method has a tendency to over-focus on the pre-assigned labels, and does not give potentially lost labels and common semantics sufficient consideration. To overcome these problems, we propose an extension of L-LDA, namely supervised labeled latent Dirichlet allocation (SL-LDA), for document categorization. Our model makes two fundamental assumptions, i.e., Prior 1 and Prior 2, that relax the restriction of label sampling and extend the concept of topics. In this paper, we develop a Gibbs expectation-maximization algorithm to learn the SL-LDA model. Quantitative experimental results demonstrate that SL-LDA is competitive with state-of-the-art approaches on both single-label and multi-label corpora.
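
To make the "over-focus on the pre-assigned labels" concrete, the following is a minimal sketch of collapsed Gibbs sampling for the baseline L-LDA, in which each token's topic may only be drawn from the labels attached to its document. It does not implement SL-LDA: Prior 1, Prior 2, and the Gibbs expectation-maximization procedure are not specified in this record. The function name `labeled_lda_gibbs`, its parameters, and the hyperparameter values are illustrative assumptions.

```python
import random

def labeled_lda_gibbs(docs, doc_labels, num_labels, vocab_size,
                      alpha=0.1, beta=0.01, iters=50, seed=0):
    """Illustrative L-LDA sampler (hypothetical helper, not the paper's code).

    docs:       list of documents, each a list of word ids in [0, vocab_size)
    doc_labels: list of label-id lists, one per document (topics == labels)
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_labels for _ in docs]                # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_labels)]   # topic-word counts
    n_k = [0] * num_labels                                 # tokens per topic

    # Initialise every token with a topic chosen among the document's labels.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.choice(doc_labels[d])
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            allowed = doc_labels[d]  # the L-LDA restriction on label sampling
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Standard collapsed conditional, evaluated only on allowed topics.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in allowed
                ]
                k = rng.choices(allowed, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw

# Toy corpus: label 2 exists in the label set but is attached to no document.
docs = [[0, 1, 0, 2], [3, 4, 3], [0, 3, 1, 4]]
labels = [[0], [1], [0, 1]]
z, n_dk, n_kw = labeled_lda_gibbs(docs, labels, num_labels=3, vocab_size=5)
```

In this sketch a label that is missing from a document's annotation can never receive any of that document's tokens, which is the restriction the abstract says SL-LDA relaxes.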
