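Study of Speaker Recognition System

Original title (Chinese): 说话人识别系统的研究
Author: Jiang Chungang (蒋纯纲)
Thesis level: Master's
Discipline: Control Theory and Control Engineering
Keywords: endpoint detection; feature extraction; linear prediction cepstral coefficients (LPCC); Mel-frequency cepstral coefficients (MFCC); speaker recognition; vector quantization (VQ)
Degree year: 2008
Advisor: Qu Baida (屈百达)
Discipline code: 081101
Degree-granting institution: Jiangnan University
Thesis submission date: 2008-05-01
Defense committee chair: Yang Huizhong (杨慧中)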

Abstract

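Speaker recognition, a form of biometric authentication, automatically identifies a speaker from speech parameters in the voice waveform that reflect the speaker's physiological and behavioral characteristics. Its convenience, low cost, and accuracy have attracted wide attention, and it is increasingly becoming an important and widespread means of security verification in daily life and work. Developing a speaker recognition method with a high recognition rate and strong robustness is therefore a goal for many researchers in China and abroad.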
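     This thesis analyzes the basic principles and system structure of speaker recognition, surveys existing speaker recognition techniques, and builds a speaker recognition system that uses linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC) as feature parameters and vector quantization (VQ) as the recognition method. The work carried out to improve recognition performance is summarized as follows: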
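     First, speech endpoint detection algorithms are studied. Commonly used methods based on short-time energy, short-time average zero-crossing rate, fractal theory after wavelet transform, and band variance are introduced, and simulation experiments illustrate the characteristics of each. After analyzing the shortcomings of these algorithms, improved endpoint detection algorithms based on subband band variance and power spectral entropy are proposed; simulation results demonstrate their superiority.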
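     Next, feature extraction algorithms are studied, focusing on several common speech feature parameters (LPC, LPCC, MFCC), with theoretical derivations given for MFCC and LPCC. A new feature parameter is also proposed: perceptual cepstral coefficients based on the minimum variance distortionless response (PMCC).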
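     Then, speaker recognition methods are studied. Commonly used methods are briefly introduced: dynamic time warping (DTW), vector quantization (VQ), hidden Markov models (HMM), Gaussian mixture models (GMM), artificial neural networks (ANN), and support vector machines (SVM). The basic principles and applications of VQ are described in detail, and an improved VQ method is proposed and adopted as the recognition method of this system.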
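     Finally, the implementation of the system is studied. LPCC and MFCC feature parameters are extracted. Speaker recognition experiments are first run on LPCC and MFCC using VQ with different codebook sizes and different speech durations, and then using the improved VQ method with different durations. The results are compared and analyzed, and the best-performing method, the standard-deviation-based WDMVQ algorithm, is chosen as the recognition method of the system.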
Speaker recognition, one of the biometric techniques, recognizes a speaker's identity from the voice, which contains physiological and behavioral characteristics specific to each individual. It has attracted much attention for its advantages in convenience, economy, and accuracy, and has become an important and popular authentication technique in daily life and work. A more robust speaker recognition method with a high recognition rate is therefore a goal for researchers at home and abroad.
     By analyzing the general principles and system structure of speaker recognition and considering existing speaker recognition technology, linear prediction cepstrum coefficients (LPCC) and Mel-frequency cepstrum coefficients (MFCC) are adopted as feature parameters, and vector quantization (VQ) is used as the recognition method to build a speaker recognition system. To improve recognition performance, the following work was carried out:
     First, endpoint detection is studied. Some classic endpoint detection methods are discussed, based on short-time energy, average zero-crossing rate, fractal dimension after wavelet transform, and spectrum variance. The related results show the characteristics of each method. After analyzing the shortcomings of those algorithms, endpoint detection algorithms based on adaptive subband spectral entropy and power entropy are proposed, and the experimental results demonstrate their superiority.
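As an illustration of the classic baseline mentioned above, the following is a minimal sketch of endpoint detection from short-time energy, with a short-time zero-crossing rate function shown alongside. It is not the improved algorithm proposed in the thesis. The frame length, hop size, `energy_ratio` threshold, and the synthetic test signal are all assumptions chosen for the example.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames (the tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Sum of squared sample values in one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_endpoints(samples, frame_len=200, hop=100, energy_ratio=0.1):
    """Return (first, last) indices of the frames whose energy exceeds a
    fraction of the maximum frame energy, or None if no frame qualifies.
    energy_ratio is an illustrative threshold, not a value from the thesis."""
    energies = [short_time_energy(f)
                for f in frame_signal(samples, frame_len, hop)]
    if not energies:
        return None
    threshold = energy_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]

# Synthetic signal: silence, a 200 Hz tone sampled at 8 kHz, silence.
fs = 8000
silence = [0.0] * 1600
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(3200)]
signal = silence + tone + silence

print(detect_endpoints(signal))          # frame indices of the detected segment
print(zero_crossing_rate(tone[:200]))    # low for a low-frequency tone
```

A fixed energy threshold of this kind tends to fail in noise, which is the usual motivation for entropy-based and subband methods such as those the thesis proposes.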
     Second, feature extraction is studied, mainly some common speech feature parameters such as LPC, LPCC, and MFCC. MFCC and LPCC are derived theoretically. A new feature is also proposed: perceptual cepstral coefficients based on the minimum variance distortionless response (PMCC).
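To make the relationship between LPC and LPCC concrete, here is a minimal sketch that estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, then converts them to cepstral coefficients with the standard recursion for an all-pole model. It assumes the convention x[n] ≈ Σ a[k]·x[n−k] and omits the gain term c[0]; the function names and the synthetic first-order test signal are assumptions for illustration and are not taken from the thesis.

```python
import random

def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.
    Returns (a, error): a[k-1] is the coefficient a_k of the predictor
    x[n] ~ sum_k a_k * x[n-k], and error is the final prediction error."""
    if r[0] <= 0:
        raise ValueError("frame has no energy")
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error

def lpc_to_lpcc(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of the model 1 / (1 - sum a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)                 # c[0] (gain term) is not computed
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

# Synthetic frame: first-order autoregressive noise, x[n] = 0.5 x[n-1] + e[n].
rng = random.Random(0)
x, prev = [], 0.0
for _ in range(2000):
    prev = 0.5 * prev + rng.gauss(0.0, 1.0)
    x.append(prev)

a, err = levinson_durbin(autocorrelation(x, 2), 2)
lpcc = lpc_to_lpcc(a, 6)
print(a)      # the first coefficient should be close to 0.5
print(lpcc)
```

In practice each frame would be pre-emphasized and windowed before the autocorrelation step; those stages are left out to keep the sketch short.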
     Third, speaker recognition methods are studied. Several methods are presented, such as DTW, VQ, HMM, GMM, ANN, and SVM. In particular, the basic principle and application of VQ are presented in detail. An improved VQ method is also proposed and used as the recognition method of this system.
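The basic VQ approach can be sketched as follows: one codebook is trained per speaker from that speaker's feature vectors, and a test utterance is assigned to the speaker whose codebook yields the lowest average quantization distortion. This is a plain LBG-style sketch under stated assumptions, not the improved VQ method of the thesis. The additive splitting perturbation `eps`, the squared Euclidean distance, and the two-dimensional synthetic "features" are illustrative choices only.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(codebook, vec):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], vec))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train_codebook(vectors, size, n_iter=20, eps=0.01):
    """LBG-style training: start from the global centroid, split every
    codeword into a perturbed pair, then refine with k-means passes.
    size is assumed to be a power of two."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        codebook = [[x + s * eps for x in cw]
                    for cw in codebook for s in (+1, -1)]
        for _ in range(n_iter):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(codebook, v)].append(v)
            # An empty cell keeps its previous codeword.
            codebook = [centroid(c) if c else cw
                        for c, cw in zip(cells, codebook)]
    return codebook

def avg_distortion(codebook, vectors):
    """Mean distance from each vector to its nearest codeword."""
    return sum(sq_dist(codebook[nearest(codebook, v)], v)
               for v in vectors) / len(vectors)

def identify(codebooks, vectors):
    """Speaker whose codebook gives the lowest average distortion."""
    return min(codebooks,
               key=lambda spk: avg_distortion(codebooks[spk], vectors))

# Synthetic 2-D "feature vectors" standing in for LPCC or MFCC frames.
rng = random.Random(0)

def cloud(centers, n_per_center, spread=0.5):
    return [[rng.gauss(cx, spread), rng.gauss(cy, spread)]
            for cx, cy in centers for _ in range(n_per_center)]

train = {"A": cloud([(0, 0), (2, 2)], 50), "B": cloud([(5, 5), (7, 3)], 50)}
codebooks = {spk: train_codebook(vecs, size=4) for spk, vecs in train.items()}
test_a = cloud([(0, 0), (2, 2)], 10)
print(identify(codebooks, test_a))
```

Codebook size and the amount of speech used for training and testing are the two quantities varied in the experiments the abstract describes; here they correspond to `size` and the number of vectors passed in.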
     Finally, the implementation of the system is studied. LPCC and MFCC features are extracted. Speaker recognition experiments are run with LPCC and MFCC using VQ with different codebook sizes and speech durations, and then using the improved VQ with different durations. The experimental results are compared and analyzed, and the best recognition method, WDMVQ based on standard deviation, is adopted as the speaker recognition method of this system.

