Real-Time Document Classification Based on Deep CNN and Extreme Learning Machine

Original title (Chinese): 基于深度CNN和极限学习机相结合的实时文档分类
Authors: Yan He (闫河); Wang Peng (王鹏); Dong Yingyan (董莺艳); Luo Cheng (罗成); Li Huan (李焕)
Keywords: document image classification; CNN; transfer learning
Journal code: JYRJ
Journal: Computer Applications and Software
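Affiliations: College of Computer Science and Engineering, Chongqing University of Technology; Liangjiang Artificial Intelligence College, Chongqing University of Technology
Publication date: 2019-03-12
Publisher: Computer Applications and Software (计算机应用与软件)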
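Year: 2019
Volume: 36
Funding: National Natural Science Foundation of China, General Program (61173184); Natural Science Foundation of Chongqing (cstc2018jcyjAX0694)
Language: Chinese
Article ID: JYRJ201903033
Page count: 6
Issue: 03
CN: 31-1260/TP
Pages: 180-185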

Abstract

This paper proposes a method for real-time training and testing in document image classification. In practical applications, both the accuracy and the efficiency of training are critical to document image recognition. Existing deep learning methods do not meet this requirement, because training and fine-tuning a deep network architecture takes a long time. To address this, the paper proposes a two-stage computer vision approach: in the first stage, a deep network is trained to serve as a feature extractor; in the second stage, an extreme learning machine (ELM) performs the classification. The method outperforms current state-of-the-art methods based on deep learning, reaching a final accuracy of 83.45% on the Tobacco-3482 dataset. Compared with the previous method based on convolutional neural networks (CNN), the relative error is reduced by 26%. Training the ELM takes only 1.156 s, and prediction over 2,482 images takes 3.083 s in total (about 1.24 ms per image). The method is therefore suitable for large-scale real-time applications.
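
The second stage described in the abstract can be illustrated with a minimal NumPy sketch of an ELM classifier. This is not the authors' implementation: the class name `ELMClassifier`, the helper `synthetic_features`, the tanh activation, the hidden-layer size, and the ridge term `reg` are all assumptions chosen for illustration, and synthetic vectors stand in for the features that the first-stage deep network would produce.

```python
import numpy as np


class ELMClassifier:
    """Single-hidden-layer ELM (illustrative sketch, not the paper's code).

    Hidden weights are drawn at random and never trained; only the output
    weights `beta` are computed, in closed form, by regularized least squares.
    """

    def __init__(self, n_hidden=64, reg=1e-2, seed=0):
        self.n_hidden = n_hidden  # assumed hidden-layer size
        self.reg = reg            # assumed ridge regularization strength
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random nonlinear projection of the input features.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.classes_ = np.unique(y)
        # Fixed random input weights, scaled to limit tanh saturation.
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.W /= np.sqrt(n_features)
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # One-hot target matrix, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Solve (H^T H + reg * I) beta = H^T T for the output weights.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


def synthetic_features(n_per_class, n_classes=3, dim=20, seed=0):
    """Hypothetical stand-in for CNN features: one Gaussian blob per class."""
    # Class centers use a fixed seed so train and test share the same classes.
    centers = np.random.default_rng(42).normal(scale=3.0, size=(n_classes, dim))
    rng = np.random.default_rng(seed)
    X = np.concatenate(
        [c + rng.standard_normal((n_per_class, dim)) for c in centers]
    )
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


X_train, y_train = synthetic_features(100, seed=1)
X_test, y_test = synthetic_features(50, seed=2)
model = ELMClassifier().fit(X_train, y_train)
accuracy = float((model.predict(X_test) == y_test).mean())
print(f"accuracy on synthetic features: {accuracy:.3f}")
```

Because the hidden weights stay fixed and only `beta` is obtained from a single linear solve, fitting involves no iterative backpropagation, which is consistent with the short ELM training time reported in the abstract.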

