Short utterance speaker recognition algorithm based on multi-featured i-vector

Authors: SUN Nian; ZHANG Yi; LIN Haibo; HUANG Chao
Keywords: speaker recognition; i-vector; short utterance; multi-feature; Principal Component Analysis (PCA); Linear Discriminant Analysis (LDA)
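Journal code: JSJY
Journal: Journal of Computer Applications
Affiliations: School of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications; School of Automation, Chongqing University of Posts and Telecommunications
Publication date: 2018-06-08 16:59
Publisher: Journal of Computer Applications (计算机应用)
Year: 2018
Volume/issue: Vol. 38; No. 338
Funding: Key Project of the Chongqing Basic Science and Frontier Technology Research Program (cstc2015jcyjBX0066)
Language: Chinese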
CNKI file code: JSJY201810016
Page count: 5
Issue: 10
CN: 51-1307/TP
Pages: 93-97

Abstract

When the test speech is long enough, a single acoustic feature carries enough information and discriminative power for speaker recognition. When the test speech is very short, however, the signal contains too little speaker information and recognition performance drops sharply. To address this lack of speaker information in short utterances, a short utterance speaker recognition algorithm based on multi-featured i-vector is proposed. The algorithm first extracts several different acoustic feature vectors and concatenates them into one high-dimensional feature vector. It then applies Principal Component Analysis (PCA) to remove the correlation among the components of this vector, so that the resulting features are orthogonal. Finally, Linear Discriminant Analysis (LDA) retains the most discriminative directions and, to some extent, reduces the dimensionality, which yields better speaker recognition performance. In experiments on the TIMIT corpus with short utterances of the same duration (2 s), the proposed algorithm reduces the Equal Error Rate (EER) by 72.16%, 69.47% and 73.62% relative to single-feature i-vector systems based on Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC) and Perceptual Log Area Ratio (PLAR) coefficients, respectively. For short utterances of different durations, the proposed algorithm reduces both the EER and the Detection Cost Function (DCF) by roughly 50% compared with the single-feature i-vector systems. Together, these two sets of results indicate that the multi-feature approach extracts speaker-specific information more fully from short utterances and improves speaker recognition performance.
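The abstract describes a feature-level front end: concatenate several acoustic feature streams, decorrelate the result with PCA, then project it with LDA. The following is a minimal NumPy sketch of that chain on synthetic data, not the authors' implementation. The helper names (`concat_features`, `fit_pca`, `fit_lda`), the stream dimensions, the number of retained PCA components and the use of speaker labels on individual frames are all illustrative assumptions; feature extraction itself (MFCC, LPCC, PLAR) and i-vector extraction are not shown, and the abstract does not state exactly where the projection sits relative to i-vector training.

```python
import numpy as np


def concat_features(*streams):
    """Stack per-frame feature matrices (frames x dims) side by side,
    giving one high-dimensional vector per frame."""
    if len({s.shape[0] for s in streams}) != 1:
        raise ValueError("all feature streams need the same number of frames")
    return np.hstack(streams)


def fit_pca(X, n_components=None):
    """Return (mean, projection); projected components are uncorrelated."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # largest variance first
    return mean, eigvecs[:, order[:n_components]]


def fit_lda(X, labels, n_components, reg=1e-6):
    """Fisher LDA: solve Sb w = lambda Sw w through a Cholesky factor of Sw.
    Returns the projection W and the (regularised) within-class scatter."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)            # within-class scatter
        diff = (mc - mu)[:, None]
        Sb += Xc.shape[0] * diff @ diff.T        # between-class scatter
    Sw += reg * np.eye(d)                        # keep Sw positive definite
    Linv = np.linalg.inv(np.linalg.cholesky(Sw)) # Sw = L L^T
    eigvals, V = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(eigvals)[::-1][:n_components]
    return Linv.T @ V[:, order], Sw              # map back: W = L^-T V


# Synthetic stand-ins for three feature streams (dims are hypothetical).
rng = np.random.default_rng(0)
n_spk, n_per = 5, 40
labels = np.repeat(np.arange(n_spk), n_per)
mfcc = rng.normal(size=(n_spk * n_per, 13)) + labels[:, None] * 0.5
lpcc = rng.normal(size=(n_spk * n_per, 12)) + labels[:, None] * 0.3
plar = rng.normal(size=(n_spk * n_per, 12))

X = concat_features(mfcc, lpcc, plar)            # 200 x 37
mean, P = fit_pca(X, n_components=30)
Z = (X - mean) @ P                               # decorrelated, 200 x 30
W, Sw = fit_lda(Z, labels, n_components=n_spk - 1)
Y = Z @ W                                        # 200 x 4
print(X.shape, Z.shape, Y.shape)                 # (200, 37) (200, 30) (200, 4)
```

With C speakers, Fisher LDA yields at most C-1 useful directions because the between-class scatter has rank at most C-1, which is why the demo keeps `n_spk - 1` components. The Cholesky route solves the generalized eigenproblem with NumPy alone; `scipy.linalg.eigh(Sb, Sw)` is an alternative when SciPy is available.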
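EER, DCF and the relative reductions quoted in the abstract are standard detection metrics computed from trial scores. The sketch below shows one way to compute them, under stated assumptions: higher scores mean "same speaker", ties between target and non-target scores are broken arbitrarily, and the DCF defaults (`c_miss=10`, `c_fa=1`, `p_target=0.01`, the NIST SRE 2008 setting) are an assumption because the abstract does not give the paper's cost parameters. All function names are hypothetical.

```python
import numpy as np


def miss_fa_curves(target_scores, nontarget_scores):
    """Miss and false-alarm rates for every threshold between sorted scores.
    Trials scoring at or below the threshold are rejected."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    scores = np.concatenate([tgt, non])
    labels = np.concatenate([np.ones(tgt.size), np.zeros(non.size)])
    labels = labels[np.argsort(scores, kind="mergesort")]
    # Index 0 = threshold below every score (accept all); last = reject all.
    miss = np.concatenate([[0.0], np.cumsum(labels) / tgt.size])
    fa = np.concatenate([[1.0], 1.0 - np.cumsum(1 - labels) / non.size])
    return miss, fa


def eer(target_scores, nontarget_scores):
    """Equal Error Rate, taken where |miss - fa| is smallest."""
    miss, fa = miss_fa_curves(target_scores, nontarget_scores)
    i = np.argmin(np.abs(miss - fa))
    return float((miss[i] + fa[i]) / 2)


def min_dcf(target_scores, nontarget_scores, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum (unnormalised) detection cost over all thresholds.
    Default cost parameters are an assumption (NIST SRE 2008 style)."""
    miss, fa = miss_fa_curves(target_scores, nontarget_scores)
    cost = c_miss * miss * p_target + c_fa * fa * (1 - p_target)
    return float(cost.min())


def relative_reduction(baseline, proposed):
    """Relative decrease in percent: (baseline - proposed) / baseline * 100."""
    return (baseline - proposed) / baseline * 100.0


print(eer([2, 3], [0, 1]))             # 0.0 (perfectly separated scores)
print(relative_reduction(8.0, 2.0))    # 75.0
```

The toy inputs above are not taken from the paper. Many toolkits also report a normalised DCF, dividing by `min(c_miss * p_target, c_fa * (1 - p_target))`; the sketch leaves the cost unnormalised.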
