Abstract
High-accuracy OCR systems for railway container numbers are still lacking in China. This paper presents a railway container number OCR system whose recognition accuracy exceeds 98%. The system automatically segments the characters in each acquired image. Because the available datasets are small and lack diversity, data augmentation is used to enlarge the training set for the CNNs. A network structure search based on LeNet-5 yields two convolutional neural networks, Digit Net for digit recognition and Letter Net for letter recognition, which reach recognition accuracies of 99.7% and 99.2%, respectively, on the test set.
References
[1] Islam N, Islam Z, Noor N. A Survey on Optical Character Recognition System[J]. ITB Journal of Information and Communication Technology, 2017.
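[2] Chen Yonghuang. Research and implementation of container number recognition technology[D]. Huazhong University of Science and Technology, 2013. (in Chinese)
[3] Huang Shenguang, Weng Maonan, Shi Yu, Liu Qing. Container number recognition based on computer vision[J]. Port Operation (港口装卸), 2018(1): 1-4. (in Chinese)
[4] Liu Xuan. Research on an intelligent recognition system for railway container numbers and wagon types[D]. Southwest Jiaotong University, 2018. (in Chinese)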
[5] Smith R. An Overview of the Tesseract OCR Engine[C]//International Conference on Document Analysis and Recognition, 2007.
[6] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.