Discriminative learning of generative models: large margin multinomial mixture models for document classification
  • Authors: Hui Jiang; Zhenyu Pan; Pingzhao Hu
  • Keywords: Discriminative learning; Large margin estimation (LME); Multinomial mixture model (MMM); Linear programming; Document classification; Approximation-maximization (AM)
  • Journal: Pattern Analysis & Applications
  • Publication year: 2015
  • Publication date: August 2015
  • Volume: 18
  • Issue: 3
  • Pages: 535-551
  • Full-text size: 1,279 KB
  • Affiliations: Hui Jiang (1)
    Zhenyu Pan (1)
    Pingzhao Hu (1)

    1. Department of Computer Science and Engineering, York University, 4700 Keele Street, Toronto, ON, M3J 1P3, Canada
  • Journal category: Computer Science
  • Journal subject: Pattern Recognition
  • Publisher: Springer London
  • ISSN: 1433-755X
Abstract
In this paper, a novel discriminative learning method is proposed to estimate generative models for multi-class pattern classification tasks, where a discriminative objective function is formulated with separation margins according to a discriminative learning criterion such as large margin estimation (LME). Furthermore, the so-called approximation-maximization (AM) method is proposed to optimize the discriminative objective function w.r.t. the parameters of the generative models. The AM approach provides a good framework for dealing with latent variables in generative models, and it is flexible enough to discriminatively learn many rather complicated generative models. In this paper, we are interested in a group of generative models derived from multinomial distributions. Under some minor relaxation conditions, it is shown that the AM-based discriminative learning methods for these generative models result in linear programming (LP) problems that can be solved effectively and efficiently even for rather large-scale models. As a case study, we learn multinomial mixture models (MMMs) for text document classification based on the large margin criterion. The proposed methods have been evaluated on the standard RCV1 text corpus. Experimental results show that large margin MMMs significantly outperform conventional MMMs as well as purely discriminative models such as support vector machines (SVMs), with over 25% relative classification error reduction observed on three independent RCV1 test sets.
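
To make the notion of a separation margin for multinomial mixture models concrete, the following is a minimal Python sketch under stated assumptions. It is not the paper's AM or LP procedure, which the abstract does not spell out. It assumes uniform class priors and the commonly used margin definition (log-likelihood of the true class minus the best competing class). The function names, the toy class labels, and all parameter values are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def mmm_log_likelihood(counts, weights, components):
    """Log-likelihood of a word-count vector under a multinomial mixture.

    counts:     word counts for one document, length V
    weights:    K mixture weights summing to 1
    components: K word distributions, each of length V, strictly positive
    The multinomial coefficient is omitted: it is identical for every
    class and cancels when two class scores are subtracted.
    """
    per_component = [
        math.log(w) + sum(c * math.log(p) for c, p in zip(counts, probs))
        for w, probs in zip(weights, components)
    ]
    return log_sum_exp(per_component)


def separation_margin(counts, label, models):
    """True-class score minus the best rival score (assumed margin form)."""
    scores = {
        name: mmm_log_likelihood(counts, weights, components)
        for name, (weights, components) in models.items()
    }
    best_rival = max(s for name, s in scores.items() if name != label)
    return scores[label] - best_rival


# Hypothetical toy models over a 3-word vocabulary (illustrative values only).
models = {
    "sports": ([0.5, 0.5], [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]),
    "finance": ([1.0], [[0.1, 0.2, 0.7]]),
}
doc = [3, 1, 0]  # counts of each vocabulary word in one document

margin_if_sports = separation_margin(doc, "sports", models)
margin_if_finance = separation_margin(doc, "finance", models)
print(margin_if_sports > 0, margin_if_finance < 0)
```

A positive margin means the document is classified correctly under the labeled class. Large margin estimation, as described in the abstract, adjusts the model parameters so that such margins on the training set become as large as possible.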
