Deep learning for scene text detection and recognition

Authors: Xiang Bai; Mingkun Yang; Baoguang Shi; Minghui Liao
Keywords: deep learning; scene text; text detection; text recognition; computer vision
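Journal: Scientia Sinica Informationis (中国科学:信息科学)
CNKI journal code: PZKX
Affiliation: School of Electronic Information and Communications, Huazhong University of Science and Technology
Publication date: 2018-05-20
Year: 2018
Volume: 48
Issue: 05
Funding: National Natural Science Foundation of China (Grant Nos. 61733007, 61222308, 61573160); Open Project of the State Key Laboratory of Digital Publishing Technology (Grant No. F2016001)
Language: Chinese
CNKI article ID: PZKX201805006
Number of pages: 14
CN: 11-5846/TP
Classification number: 51-64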

Abstract

Scene text detection and recognition is a general-purpose text recognition technology and has become a hot research topic in computer vision and document analysis in recent years. It is widely applied in geolocation, license plate recognition, autonomous driving, and other fields. Compared with traditional document text detection and recognition, scene text varies far more in font, color, scale, layout, and background. Owing to its superior performance, deep learning has become the mainstream approach in this field. This paper reviews the authors' representative deep-learning-based work in this field and discusses future research trends.
