Abstract
This paper presents a method for document image classification in which both training and testing run in real time. In practical applications, the accuracy and the efficiency of training are both critical for document image recognition. Existing deep learning methods do not meet this requirement, because training and fine-tuning deep network architectures takes a long time. To address this problem, a new computer-vision-based method with two stages is proposed: in the first stage, a deep network is trained and used as a feature extractor; in the second stage, an extreme learning machine (ELM) performs the classification. The method outperforms current state-of-the-art deep-learning-based approaches, reaching a final accuracy of 83.45% on the Tobacco-3482 dataset. Compared with the previous method based on convolutional neural networks (CNNs), the relative error is reduced by 26%. Training the ELM takes only 1.156 s, and the overall prediction time for 2,482 images is 3.083 s. The method is therefore suitable for large-scale real-time applications.
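The second stage can be illustrated with a minimal sketch of an ELM classifier, assuming the deep-network features have already been extracted into a matrix. This is not the authors' implementation: the class name `ELMClassifier`, the hidden-layer size, the sigmoid activation, the ridge term `reg`, and the synthetic stand-in features in `demo` are all assumptions made for illustration. The sketch shows why ELM training is fast: the hidden weights are drawn at random and kept fixed, so only the output weights are fitted, by solving a single linear system.

```python
import numpy as np


class ELMClassifier:
    """Illustrative single-hidden-layer ELM (hypothetical helper, not the paper's code)."""

    def __init__(self, n_hidden=128, reg=1e-3, seed=0):
        self.n_hidden = n_hidden  # assumed hidden-layer size
        self.reg = reg            # assumed ridge regularisation strength
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the fixed random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        self.classes_ = np.unique(y)
        # Random input weights and biases; these are never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden)) / np.sqrt(n_features)
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # One-hot target matrix, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Closed-form regularised least squares for the output weights.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


def demo(seed=0):
    """Train and test on synthetic vectors standing in for extracted CNN features."""
    rng = np.random.default_rng(seed)
    n_classes, n_features, n_per_class = 3, 20, 200
    centers = rng.normal(scale=3.0, size=(n_classes, n_features))
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = centers[y] + rng.normal(size=(y.size, n_features))
    order = rng.permutation(y.size)
    train, test = order[:400], order[400:]
    model = ELMClassifier().fit(X[train], y[train])
    return float(np.mean(model.predict(X[test]) == y[test]))


if __name__ == "__main__":
    print(f"held-out accuracy on synthetic features: {demo():.3f}")
```

In a real pipeline, `X` would hold the activations of a late layer of the trained deep network rather than synthetic vectors. Because fitting reduces to one linear solve, training cost grows with the hidden-layer size and not with the number of gradient-descent epochs.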