Small Sample Voiceprint Recognition Method Based on Deep Learning

Authors: LI Jing (李靓); SUN Cunwei (孙存威); XIE Kai (谢凯); HE Jianbiao (贺建飚)
Keywords: voiceprint recognition; deep learning; FBN-Alexnet network; small sample; Fast Batch Normalization (FBN); image increasing algorithm
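Chinese journal title (abbreviated): JSJC
English journal title: Computer Engineering
Institutions: School of Electronic and Information, Yangtze University; School of Computer Science, Yangtze University; College of Information Science and Engineering, Central South University
Publication date: 2018-02-08 15:56
Publisher: Computer Engineering (计算机工程)
Year: 2019
Volume and issue: v.45; No.498
Funding: National Natural Science Foundation of China (61272147); Hubei Provincial Department of Education Project (B2015446); Yangtze University Youth Fund (2016cqn10); College Students' Innovation and Entrepreneurship Program Fund (2017009)
Language: Chinese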
Record ID: JSJC201903044
Page count: 7
Issue: 03
CN: 31-1289/TP
Pages: 268-273+278

Abstract

When a Convolutional Neural Network (CNN) is trained on a small set of voiceprint samples, it fails to converge well, which results in a low recognition rate. To address this, the paper proposes a new voiceprint recognition method. A deep CNN extracts rich latent voiceprint features. To compensate for the shortage of training samples, an image increasing algorithm based on the principle of convex lens imaging is applied during CNN training. Fast Batch Normalization (FBN) is introduced into the convolution process to speed up network convergence and shorten training time. Training, validation, and testing use the TIMIT speech database, which contains the voices of 630 speakers. Experimental results show that the FBN-Alexnet network reduces training time by 48.2% compared with the original Alexnet network, and that the proposed method improves the recognition rate by 7.3%, 2.2%, and 2.8% over the GMM, GMM-UBM, and GMM-SVM methods, respectively, indicating that it is effective for small sample voiceprint recognition.
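
The abstract names an image increasing algorithm based on convex lens imaging but does not spell out its steps. The following is a minimal sketch of one way such an idea could work, not the paper's actual algorithm. It assumes the thin-lens equation 1/f = 1/u + 1/v, so that an object at distance u from a lens of focal length f is imaged with magnification f / (u - f), and it rescales one spectrogram-like image by that factor for several object distances. The function names, the nearest-neighbour resizing, and the choice to ignore the inversion of a real image are all assumptions made for illustration.

```python
def lens_magnification(focal_length, object_distance):
    """Size ratio (image / object) for a real image formed by a thin convex lens.

    From 1/f = 1/u + 1/v we get v = f*u / (u - f), hence v/u = f / (u - f).
    """
    if object_distance <= focal_length:
        raise ValueError("object must lie beyond the focal point for a real image")
    return focal_length / (object_distance - focal_length)


def resize_nearest(image, scale):
    """Rescale a 2D grid (list of rows) by `scale` using nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    return [
        [
            image[min(src_h - 1, int(r / scale))][min(src_w - 1, int(c / scale))]
            for c in range(dst_w)
        ]
        for r in range(dst_h)
    ]


def augment_with_lens(image, focal_length, object_distances):
    """Hypothetical augmentation: one rescaled copy of `image` per object distance."""
    return [
        resize_nearest(image, lens_magnification(focal_length, u))
        for u in object_distances
    ]


if __name__ == "__main__":
    # A tiny 4x4 stand-in for a spectrogram image.
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    for copy in augment_with_lens(img, 1.0, [3.0, 2.0, 1.5]):
        print(len(copy), "x", len(copy[0]))
```

In this sketch, moving the hypothetical object from three focal lengths to 1.5 focal lengths sweeps the magnification from 0.5 to 2, so a single source image yields several differently sized copies. A training pipeline would still need to crop or pad them to the fixed input size the CNN expects.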

