Research on a Multi-Directional Scene Text Recognition Algorithm Based on Improved TextBoxes++
  • Chinese title: 基于改进TextBoxes++的多方向场景文字识别算法的研究
  • Author: LI Wei-chong (李伟冲)
  • Affiliation: College of Computer Science, Sichuan University (四川大学计算机学院)
  • Keywords: Text Recognition; OCR; Text Detection; End-to-End Text Recognition
  • Journal: Modern Computer / 现代计算机(专业版) (journal code: XDJS)
  • Publication date: 2018-12-25
  • Issue: 2018, No. 636
  • Pages: 69-74 (6 pages)
  • CN: 44-1415/TP
  • Database record: XDJS201836016
  • Funding: National Natural Science Foundation of China (No. 61332001)
  • Language: Chinese
Abstract
Multi-oriented natural scene text recognition is one of the most difficult and valuable challenges in computer vision. Most existing methods handle only horizontal text, or treat text detection and recognition as separate tasks. Building on TextBoxes++, a state-of-the-art multi-oriented scene text detector, this paper proposes a unified, end-to-end trainable method that detects and recognizes multi-oriented text simultaneously. To accommodate multi-oriented text, the detection branch of TextBoxes++ is extended with a prediction of the angle of each quadrilateral text box; a text recognition branch is added to the TextBoxes++ network to recognize the detected text; and RoIRotate is introduced to share convolutional features between detection and recognition. Experiments on the public ICDAR 2015 and ICDAR 2017 MLT datasets demonstrate the effectiveness of the proposed method.
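The abstract names RoIRotate as the mechanism that lets the detection and recognition branches share convolutional features. The paper's implementation is not reproduced here, so the following is only a minimal PyTorch sketch of the general idea: each predicted oriented box is warped by an affine transform into an axis-aligned, fixed-height feature strip that a recognizer can consume. The function name roi_rotate, the (cx, cy, w, h, angle) box parameterization, and the output sizes out_h/max_w are illustrative assumptions, not the paper's API.

```python
import math

import torch
import torch.nn.functional as F


def roi_rotate(feature_map, boxes, out_h=8, max_w=64):
    """Hypothetical RoIRotate sketch: warp oriented text regions of a shared
    feature map into axis-aligned, fixed-height strips.

    feature_map : (1, C, H, W) tensor from the detection backbone.
    boxes       : iterable of (cx, cy, w, h, angle) in feature-map pixels,
                  where angle (radians) is the predicted box orientation.
    Returns a (N, C, out_h, max_w) batch of cropped features.
    """
    _, C, H, W = feature_map.shape
    crops = []
    for cx, cy, w, h, angle in boxes:
        # Keep the box aspect ratio at the fixed output height.
        out_w = min(max_w, max(1, int(round(out_h * w / h))))
        cos, sin = math.cos(angle), math.sin(angle)
        # Affine matrix mapping normalized output coordinates onto the
        # rotated box inside the input feature map (grid_sample convention).
        theta = torch.tensor(
            [[w / W * cos, -h / W * sin, 2 * cx / W - 1],
             [w / H * sin,  h / H * cos, 2 * cy / H - 1]],
            dtype=feature_map.dtype, device=feature_map.device,
        ).unsqueeze(0)
        grid = F.affine_grid(theta, size=(1, C, out_h, out_w), align_corners=False)
        crop = F.grid_sample(feature_map, grid, align_corners=False)
        # Right-pad to a common width so the crops can be batched for the
        # recognition branch (e.g. a CNN + BiLSTM + CTC recognizer).
        crops.append(F.pad(crop, (0, max_w - out_w)))
    return torch.cat(crops, dim=0)


# Toy usage: one 45-degree box on a random 32-channel feature map.
features = torch.randn(1, 32, 128, 128)
strips = roi_rotate(features, [(64.0, 64.0, 40.0, 10.0, math.pi / 4)])
print(strips.shape)  # torch.Size([1, 32, 8, 64])
```

In a full pipeline of this kind, the padded strips would be fed to the recognition branch and trained jointly with the detection losses, with variable strip widths typically handled through per-crop masks or CTC blank padding.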
References
[1] M. Liao, B. Shi, X. Bai, X. Wang, W. Liu. TextBoxes++: A Single-Shot Oriented Scene Text Detector. IEEE Transactions on Image Processing, 27 (2018): 3676-3690.
[2] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, J. Liang. EAST: An Efficient and Accurate Scene Text Detector. arXiv preprint arXiv:1704.03155, 2017.
[3] W. He, X.-Y. Zhang, F. Yin, C.-L. Liu. Deep Direct Regression for Multi-Oriented Scene Text Detection. arXiv preprint arXiv:1703.08289, 2017.
[4] B. Shi, X. Bai, S. Belongie. Detecting Oriented Text in Natural Images by Linking Segments. arXiv preprint arXiv:1703.06520, 2017.
[5] Q. Yang, M. Cheng, W. Zhou. IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection. arXiv preprint arXiv:1805.01167, 2018.
[6] P. He, W. Huang, T. He, Q. Zhu, Y. Qiao, X. Li. Single Shot Text Detector with Regional Attention. arXiv preprint arXiv:1709.00138, 2017.
[7] M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman. Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition. arXiv preprint arXiv:1406.2227, 2014.
[8] X. Liu, D. Liang, S. Yan. FOTS: Fast Oriented Text Spotting with a Unified Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; arXiv preprint arXiv:1801.01671.
[9] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision, 2016.
[10] K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, abs/1409.1556, 2014.
[11] A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems, 2012: 1097-1105.
[12] A. Gupta, A. Vedaldi, A. Zisserman. Synthetic Data for Text Localization in Natural Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2315-2324.
[13] L. Huang, Y. Yang, Y. Deng, Y. Yu. DenseBox: Unifying Landmark Localization with End to End Object Detection. arXiv preprint arXiv:1509.04874, 2015.
[14] A. Graves, S. Fernandez, F. Gomez, J. Schmidhuber. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. In Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006: 369-376.
[15] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, E. Valveny. ICDAR 2015 Competition on Robust Reading. In Proceedings of ICDAR, 2015.
[16] ICDAR 2017 Robust Reading Competitions. http://rrc.cvc.uab.es/. Online; accessed 2017-11-1.
[17] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, L. P. de las Heras. ICDAR 2013 Robust Reading Competition. In 2013 12th International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2013: 1484-1493.
[18] D. Kingma, J. Ba. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980, 2014.
[19] B. Shi, X. Bai, S. Belongie. Detecting Oriented Text in Natural Images by Linking Segments. arXiv preprint arXiv:1703.06520, 2017.
[20] L. Gomez, D. Karatzas. TextProposals: A Text-Specific Selective Search Algorithm for Word Spotting in the Wild. Pattern Recognition, 2017, 70: 60-74.
[21] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, et al. ICDAR 2015 Competition on Robust Reading. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2015: 1156-1160.
