Speech recognition of isolated words based on convolution neural networks (基于卷积神经网络的孤立词语音识别)

English title: Speech recognition of isolated words based on convolution neural networks
Authors: HOU Yi-min (侯一民); LI Yong-ping (李永平)
Affiliation: School of Automation Engineering, Northeast Electric Power University
Keywords: convolutional neural networks; speech recognition; local receptive field; weight sharing; pooling
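Chinese journal name: SJSJ
English journal name: Computer Engineering and Design
Institution: School of Automation Engineering, Northeast Electric Power University (东北电力大学自动化工程学院)
Publication date: 2019-06-16
Publisher: Computer Engineering and Design (计算机工程与设计)
Year: 2019
Issue: v.40; No.390
Funding: Jilin Province Science and Technology Development Plan Fund Project (20150414051GH)
Language: Chinese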
Page: SJSJ201906044
Number of pages: 6
CN: 06
ISSN: 11-1775/TP
Classification number: 258-263

Abstract

To reduce the number of trainable parameters and the training time while improving the recognition accuracy for isolated words, this paper proposes applying convolutional neural networks (CNNs) to speech recognition. The network's local receptive fields, weight sharing, and pooling greatly reduce the size of the trained model while preserving recognition performance. The paper analyzes in depth how the number and size of the convolution kernels in the convolutional layers and the pooling parameters of the pooling layers affect the recognition results. A dynamic time warping network first normalizes the feature parameters of pronunciation units that span different numbers of frames to a common frame count, and the normalized features are then fed into the network for recognition. Experiments on a self-built corpus show that, compared with a conventional deep neural network, the CNN improves recognition accuracy by 12%, making it a strong model for speech recognition.
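Two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`resample_frames`, `dense_params`, `conv_params`) and all layer sizes are hypothetical, and plain linear interpolation along the time axis stands in for the paper's dynamic-time-warping-based normalization, whose details this record does not give. The sketch shows how utterances of different lengths can be brought to one frame count, and why weight sharing keeps a convolutional layer far smaller than a fully connected one.

```python
from typing import List, Sequence


def resample_frames(frames: Sequence[Sequence[float]], target_len: int) -> List[List[float]]:
    """Stretch or squeeze a (num_frames x num_features) sequence to target_len frames.

    Assumption: linear interpolation along time, a simple stand-in for the
    paper's DTW-based normalization (not the same algorithm).
    """
    n = len(frames)
    if n == 0 or target_len <= 0:
        raise ValueError("need at least one input frame and a positive target length")
    if n == 1:
        return [list(frames[0]) for _ in range(target_len)]
    if target_len == 1:
        return [list(frames[0])]
    out = []
    for t in range(target_len):
        pos = t * (n - 1) / (target_len - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo                          # weight of the later frame
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h: int, kernel_w: int, in_channels: int, n_kernels: int) -> int:
    """Weights plus biases of a 2-D convolutional layer; each kernel is shared across positions."""
    return kernel_h * kernel_w * in_channels * n_kernels + n_kernels


# Hypothetical example: a 3-frame utterance with 2 features per frame, stretched to 5 frames.
short = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
fixed = resample_frames(short, 5)

# Hypothetical sizes: 40 frames x 24 features = 960 inputs.
n_dense = dense_params(960, 256)    # 246016 parameters
n_conv = conv_params(5, 5, 1, 32)   # 832 parameters
```

Because the kernel weights are reused at every position of the time-frequency input, the convolutional layer's parameter count depends only on the kernel size and the number of kernels, not on the input size, which is why a fixed frame count matters mainly for the layers that follow it.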

References

[1] Abdel-Hamid O, Mohamed A r, Jiang H, et al. Convolutional neural networks for speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(10): 1533-1545.
[2] Qimike·Batexi, HUANG Hao, WANG Xianhui. Uighur speech recognition based on deep neural network[J]. Computer Engineering and Design, 2015, 36(8): 2239-2244 (in Chinese). [其米克·巴特西, 黄浩, 王羡慧. 基于深度神经网络的维吾尔语语音识别[J]. 计算机工程与设计, 2015, 36(8): 2239-2244.]
[3] Maimaitiaili·Tuerxun, DAI Lirong. Deep neural network based Uyghur large vocabulary continuous speech recognition[J]. Journal of Data Acquisition & Processing, 2015, 30(2): 365-371 (in Chinese). [麦麦提艾力·吐尔逊, 戴礼荣. 深度神经网络在维吾尔语大词汇量连续语音识别中的应用[J]. 数据采集与处理, 2015, 30(2): 365-371.]
[4] DAI Lirong, ZHANG Shiliang, HUANG Zhiying. Deep learning for speech recognition: Review of state-of-the-arts technologies and prospects[J]. Journal of Data Acquisition & Processing, 2017, 32(2): 221-231 (in Chinese). [戴礼荣, 张仕良, 黄智颖. 基于深度学习的语音识别技术现状与展望[J]. 数据采集与处理, 2017, 32(2): 221-231.]
[5] Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition[J]. IEEE Signal Processing Magazine, 2012, 29(6): 82-97.
[6] MEI Junjie. Research on speech recognition based on convolutional neural network[D]. Beijing: Beijing Jiaotong University, 2017 (in Chinese). [梅俊杰. 基于卷积神经网络的语音识别研究[D]. 北京: 北京交通大学, 2017.]
[7] Sainath T N, Mohamed A r, Kingsbury B, et al. Deep convolutional neural networks for LVCSR[C]//IEEE International Conference on Acoustics, Speech and Signal Processing, 2013: 8614-8618.
[8] Abdel-Hamid O, Deng L, Yu D. Exploring convolutional neural network structures and optimization techniques for speech recognition[C]//INTERSPEECH, 2013: 3366.
[9] Zheng Yi, Liu Qi, Chen Enhong, et al. Time series classification using multi-channels deep convolutional neural networks[C]//Proceedings of 15th International Conference on Web-Age Information Management. Macau: Springer Verlag, 2014: 298-310.
[10] WANG Gongpeng, DUAN Meng, NIU Changyong. Stochastic gradient descent algorithm based on convolutional neural network[J]. Computer Engineering and Design, 2018, 39(2): 441-445 (in Chinese). [王功鹏, 段萌, 牛常勇. 基于卷积神经网络的随机梯度下降算法[J]. 计算机工程与设计, 2018, 39(2): 441-445.]
[11] HU Qing, LIU Benyong. Speaker recognition algorithm based on convolution neural network[J]. Journal of Computer Applications, 2016, 36(S1): 79-81 (in Chinese). [胡青, 刘本永. 基于卷积神经网络的说话人识别算法[J]. 计算机应用, 2016, 36(S1): 79-81.]
[12] ZHU Xixiang, LIU Fengshan, ZHANG Chao. Research on vehicle speech recognition based on one-dimensional convolutional neural network[J]. Microelectronics and Computers, 2017, 34(11): 21-25 (in Chinese). [朱锡祥, 刘凤山, 张超. 基于一维卷积神经网络的车载语音识别研究[J]. 微电子学与计算机, 2017, 34(11): 21-25.]
[13] Li S J, Liu Z Q, Chan A B. Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network[J]. International Journal of Computer Vision, 2015, 113(1): 19-36.
[14] Zheng W Q, Yu J S, Zou Y X. An experimental study of speech emotion recognition based on deep convolutional neural networks[C]//International Conference on Affective Computing and Intelligent Interaction. Xi’an: IEEE, 2015: 827-831.
[15] WANG Shanhai, JING Xinxing, YANG Haiyan. Research on speech recognition based on deep learning neural network for isolated words[J]. Application Research of Computers, 2015, 32(8): 2289-2291 (in Chinese). [王山海, 景新幸, 杨海燕. 基于深度学习神经网络的孤立词语音识别的研究[J]. 计算机应用研究, 2015, 32(8): 2289-2291.]
