Efficient histogram dictionary learning for text/image modeling and classification
  • Author: Minyoung Kim
  • Keywords: Text/image histograms; Dictionary learning; Mathematical optimization; Text/image classification/annotation
  • Journal: Data Mining and Knowledge Discovery
  • Publication year: 2017
  • Publication date: January 2017
  • Volume: 31
  • Issue: 1
  • Pages: 203-232
  • Journal category: Computer Science
  • Journal subjects: Data Mining and Knowledge Discovery; Artificial Intelligence (incl. Robotics); Information Storage and Retrieval; Statistics for Engineering, Physics, Computer Science, Chemistry and Earth Sciences
  • Publisher: Springer US
  • ISSN: 1573-756X
Abstract
Histograms are an effective representation for text and image data. Although recent Bayesian topic models for histograms, such as latent Dirichlet allocation and its variants, have been shown to be successful, they often suffer from computational overhead when inferring a large number of hidden variables. In this paper we consider a different modeling strategy: forming a dictionary of base histograms whose convex combination yields the histogram of an observed text/image document. The dictionary entries are learned from data, which establishes a direct or indirect association between specific topics/keywords and the base histograms. Given a learned dictionary, the coding of an observed histogram provides succinct and salient information useful for classification. One of our main contributions is a very efficient dictionary learning algorithm based on Nesterov’s recent smooth optimization technique, combined with analytic solution methods for the quadratic minimization sub-problems. Beyond its faster theoretical convergence rate, in actual running time our algorithm is 20–30 times faster than general-purpose optimizers such as interior-point methods. In classification/annotation tasks on several text/image datasets, our approach exhibits comparable or often superior performance to existing Bayesian models, while being significantly faster than their variational inference.
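
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the coding step it describes: given a fixed dictionary of base histograms, find convex-combination weights that best reproduce an observed histogram. It assumes a squared-error objective and uses a generic Nesterov-style accelerated projected gradient with a Euclidean projection onto the probability simplex. This is a stand-in technique, not the paper's method, and the names `project_to_simplex`, `mix`, and `encode` are hypothetical.

```python
import math


def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:  # holds exactly for the leading (largest) entries
            theta = t
    return [max(x - theta, 0.0) for x in v]


def mix(dictionary, weights):
    """Convex combination of base histograms (one base histogram per row)."""
    n_bins = len(dictionary[0])
    return [sum(w * base[j] for w, base in zip(weights, dictionary))
            for j in range(n_bins)]


def encode(histogram, dictionary, n_iters=500):
    """Minimize 0.5 * ||mix(dictionary, a) - histogram||^2 over the simplex.

    Assumption: squared-error objective, solved by accelerated projected
    gradient (FISTA-style momentum). Not the algorithm from the paper.
    """
    k = len(dictionary)
    dot = lambda x, y: sum(xi * yi for xi, yi in zip(x, y))
    gram = [[dot(dictionary[i], dictionary[j]) for j in range(k)]
            for i in range(k)]
    target = [dot(base, histogram) for base in dictionary]
    # The Gram matrix is positive semidefinite, so its trace bounds its
    # largest eigenvalue; 1 / trace is therefore a safe step size.
    # Base histograms are nonzero, so the trace is positive.
    step = 1.0 / sum(gram[i][i] for i in range(k))

    a = [1.0 / k] * k   # current weights, start from uniform
    y = a[:]            # extrapolated point
    t = 1.0             # momentum parameter
    for _ in range(n_iters):
        grad = [dot(gram[i], y) - target[i] for i in range(k)]
        a_new = project_to_simplex([yi - step * gi for yi, gi in zip(y, grad)])
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        momentum = (t - 1.0) / t_new
        y = [an + momentum * (an - ao) for an, ao in zip(a_new, a)]
        a, t = a_new, t_new
    return a
```

For instance, mixing the two toy base histograms `[0.7, 0.2, 0.1]` and `[0.1, 0.2, 0.7]` with weights `[0.25, 0.75]` and passing the result to `encode` should return weights close to the originals. The resulting weight vector is the kind of compact code that the abstract suggests feeding to a classifier.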
