Abstract
Scene text detection and recognition, a general-purpose text recognition technology, has become an active research topic in computer vision and document analysis in recent years. It is widely applied in geolocation, license plate recognition, autonomous driving, and other areas. Compared with text in traditional documents, scene text varies far more in font, color, scale, layout, and background, and deep learning has become the mainstream approach in this field owing to its superior performance. This paper reviews the authors' representative deep-learning-based work in this area and discusses future research trends.
1) https://github.com/MhLiao/TextBoxes.
2) https://github.com/bgshih/seglink.
3) https://github.com/bgshih/crnn.
4) ICDAR2017 competition on multi-lingual scene text detection and script identification. http://rrc.cvc.uab.es/?ch=8.