YOLO Network Character Recognition Method with Variable Candidate Box Density for the International Phonetic Alphabet

Authors: ZHENG Yi; QI Donglian; WANG Zhenyu
Affiliations: School of Humanities, Zhejiang University; College of Electrical Engineering, Zhejiang University
Keywords: International Phonetic Alphabet (IPA); character detection and recognition; You Only Look Once (YOLO) network; deep learning
Journal code: JSJY
Journal: Journal of Computer Applications
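Publication date: 2019-01-15 14:59
Publisher: Journal of Computer Applications (计算机应用)
Year: 2019
Issue: v.39; No.346
Funding: National Natural Science Foundation of China (61571394); Science and Technology Project of Zhejiang Province (2019C01001); Zhejiang University Interdisciplinary Pre-research Special Project (2018FZA122)
Language: Chinese
Page: JSJY201906021
Number of pages: 5
CN: 06
ISSN: 51-1307/TP
Classification number: 125-129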

Abstract

To address the low recognition accuracy and poor practicality of traditional character feature extraction methods when applied to the International Phonetic Alphabet (IPA), a You Only Look Once (YOLO) network character recognition method with variable candidate box density is proposed. Starting from the YOLO network, the distribution density of candidate boxes is changed to suit IPA character images, in which characters are closely packed along the X axis and vary widely in type and form. Specifically, the density of candidate boxes is increased along the X axis and reduced along the Y axis, yielding the YOLO-IPA network. The method was tested on a dataset of 1,360 images covering 72 classes of IPA characters, collected from 《汉语方音字汇》 (Chinese Dialect Vocabulary). The experimental results show a recognition rate of 93.72% for larger characters and 89.31% for smaller characters, a substantial improvement in accuracy over traditional character recognition algorithms. Detection time was under 1 s in the experimental environment, so the method can meet the needs of real-time applications.
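
The idea of changing candidate box density per axis can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the grid dimensions of YOLO-IPA, so the 14 x 4 grid below is a hypothetical choice, and the function names `grid_cell_centers` and `responsible_cell` are invented for illustration. The 448 x 448 input and 7 x 7 grid are those of the original YOLO network. The sketch shows how two horizontally adjacent characters that share one cell in a square grid land in separate cells once the grid is denser along X.

```python
from typing import List, Tuple


def grid_cell_centers(width: float, height: float,
                      cols: int, rows: int) -> List[Tuple[float, float]]:
    """Return the (x, y) centers of a cols x rows grid laid over an image."""
    cell_w = width / cols
    cell_h = height / rows
    return [((c + 0.5) * cell_w, (r + 0.5) * cell_h)
            for r in range(rows) for c in range(cols)]


def responsible_cell(cx: float, cy: float, width: float, height: float,
                     cols: int, rows: int) -> Tuple[int, int]:
    """Return the (col, row) index of the grid cell containing a box center."""
    col = min(int(cx / width * cols), cols - 1)
    row = min(int(cy / height * rows), rows - 1)
    return col, row


if __name__ == "__main__":
    W = H = 448  # input size of the original YOLO network

    # Two narrow, horizontally adjacent characters (hypothetical box centers).
    char_a = (70.0, 224.0)
    char_b = (100.0, 224.0)

    # Square 7 x 7 grid, as in the original YOLO network.
    print(responsible_cell(*char_a, W, H, 7, 7),
          responsible_cell(*char_b, W, H, 7, 7))

    # Hypothetical grid that is denser along X and sparser along Y.
    print(responsible_cell(*char_a, W, H, 14, 4),
          responsible_cell(*char_b, W, H, 14, 4))
```

With the square grid both centers fall in the same cell, so the two characters compete for that cell's predictions. With the denser X axis they fall in neighboring cells, while the total cell count (56 versus 49) stays close to the original.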

