Abstract
When a Convolutional Neural Network (CNN) is trained on a small set of voiceprint samples, the network does not reach a good convergence state, which leads to a low recognition rate. To address this, this paper proposes a new voiceprint recognition method. A deep CNN is used to extract rich, latent voiceprint features, which improves the recognition rate. To overcome the shortage of training samples, an image-increasing algorithm based on the principle of convex-lens imaging is applied during CNN training. In addition, Fast Batch Normalization (FBN) is introduced into the convolutional process to speed up network convergence and shorten training time. The method is trained, validated, and tested on the TIMIT speech database, which contains speech from 630 speakers. Experimental results show that the training time of the FBN-Alexnet network is 48.2% shorter than that of the original Alexnet network, and that, compared with the GMM, GMM-UBM, and GMM-SVM methods, the proposed method improves the recognition rate by 7.3%, 2.2%, and 2.8%, respectively. This indicates that the method is effective for voiceprint recognition with small samples.
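The abstract does not spell out how the convex-lens imaging principle is turned into an image-increasing algorithm. Purely as an illustration, the sketch below assumes the principle is used to derive scale factors: for a thin convex lens with focal length f and object distance u > f, the lens equation 1/u + 1/v = 1/f gives a magnification m = v/u = f/(u - f), so placing a spectrogram image at different "object distances" yields enlarged copies (f < u < 2f), an unchanged copy (u = 2f), or reduced copies (u > 2f). The function names, the nearest-neighbour resize, the center crop/pad step, and the chosen distances are all hypothetical, not the paper's method.

```python
"""Hypothetical sketch: thin-lens-inspired image augmentation.

ASSUMPTION: the paper's actual algorithm is not given in the abstract; here the
convex-lens principle only supplies scale factors derived from object distances.
"""
from typing import List

Image = List[List[float]]


def lens_magnification(u: float, f: float) -> float:
    """Magnification |v/u| of a thin convex lens from 1/u + 1/v = 1/f (needs u > f)."""
    if u <= f:
        raise ValueError("object must lie beyond the focal length for a real image")
    v = u * f / (u - f)  # image distance
    return v / u  # equals f / (u - f)


def rescale_nearest(img: Image, scale: float) -> Image:
    """Nearest-neighbour resize of a 2-D image by the factor `scale`."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    return [
        [img[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))] for c in range(new_w)]
        for r in range(new_h)
    ]


def center_fit(img: Image, h: int, w: int, fill: float = 0.0) -> Image:
    """Center-crop or pad `img` to exactly h x w so every copy keeps the CNN input size."""
    ih, iw = len(img), len(img[0])
    top, left = (ih - h) // 2, (iw - w) // 2  # negative offsets mean padding
    out = []
    for r in range(h):
        sr = r + top
        row = []
        for c in range(w):
            sc = c + left
            inside = 0 <= sr < ih and 0 <= sc < iw
            row.append(img[sr][sc] if inside else fill)
        out.append(row)
    return out


def lens_augment(img: Image, f: float, distances: List[float]) -> List[Image]:
    """Return one rescaled, size-preserving copy per hypothetical object distance u."""
    h, w = len(img), len(img[0])
    return [center_fit(rescale_nearest(img, lens_magnification(u, f)), h, w) for u in distances]


if __name__ == "__main__":
    spectrogram = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    # With f = 1.0, the distances 1.5, 2.0 and 3.0 give magnifications 2.0, 1.0 and 0.5.
    for copy in lens_augment(spectrogram, 1.0, [1.5, 2.0, 3.0]):
        print(copy)
```

Each source image therefore yields one extra training sample per chosen distance, all at the original input size, which is the property a small-sample CNN training set needs; how many distances to use, and whether to resize the time and frequency axes differently, are open design choices.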
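The abstract names Fast Batch Normalization (FBN) but does not say how it differs from ordinary batch normalization, so no FBN-specific step is attempted here. For reference, the sketch below shows only the standard batch-normalization forward pass in training mode: each feature is normalized with the mini-batch mean and (biased) variance, then scaled by a learned gamma and shifted by a learned beta. For brevity it works on an N x D feature matrix; in a convolutional layer the same statistics would instead be taken per channel across the batch and all spatial positions. The function name and signature are illustrative assumptions.

```python
"""Standard batch-normalization forward pass (training mode).

This is ordinary BN, shown only as the baseline that FBN modifies; the abstract
does not describe FBN itself, so nothing here should be read as the paper's method.
"""
import math
from typing import List, Tuple


def batch_norm_forward(
    batch: List[List[float]],
    gamma: List[float],
    beta: List[float],
    eps: float = 1e-5,
) -> Tuple[List[List[float]], List[float], List[float]]:
    """Normalize each feature over the mini-batch, then scale by gamma and shift by beta.

    batch has N samples x D features; returns (outputs, per-feature means, per-feature variances).
    """
    n, d = len(batch), len(batch[0])
    means = [sum(x[j] for x in batch) / n for j in range(d)]
    # Biased (divide-by-N) variance over the batch, as in the BN training-time forward pass.
    variances = [sum((x[j] - means[j]) ** 2 for x in batch) / n for j in range(d)]
    # eps keeps the division stable when a feature is constant across the batch.
    out = [
        [gamma[j] * (x[j] - means[j]) / math.sqrt(variances[j] + eps) + beta[j] for j in range(d)]
        for x in batch
    ]
    return out, means, variances


if __name__ == "__main__":
    outputs, mu, var = batch_norm_forward([[1.0, 10.0], [3.0, 10.0]], [1.0, 1.0], [0.0, 0.0])
    print(mu, var)  # per-feature batch statistics
    print(outputs)  # first feature close to -1 and +1, constant second feature maps to 0
```

Keeping each layer's inputs near zero mean and unit variance is what lets BN-style layers tolerate larger learning rates and converge in fewer iterations, which is the effect the abstract attributes to FBN.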