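Research and Implementation on Speaker Recognition System Based on GMM

Original title: 基于GMM的说话人识别系统研究与实现
Author: Chen Qiang (陈强)
Degree level: Master's
Discipline: Signal and Information Processing
Keywords: Speech Processing ; Speaker Recognition ; GMM (Gaussian mixture model) ; VQ (vector quantization) ; Feature Extraction
Degree year: 2010
Supervisor: Que Dashun (阙大顺)
Discipline code: 081002
Degree-granting institution: Wuhan University of Technology
Submission date: 2010-04-01
Chair of the defense committee: Yang Jie (杨杰)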

Abstract

Speaker recognition, also known as voiceprint recognition, aims to identify or verify a speaker from the characteristics of his or her voice. With the rapid development of networked information technology, identity authentication that is digital, unobtrusive, and convenient has become increasingly important. As a biometric authentication technology, speaker recognition has broad application prospects in surveillance, identity verification, forensic investigation, and financial security, and it has become an active research topic in speech signal processing. The key problems in speaker recognition are feature extraction from the speech signal and pattern matching. Building on a study of the main current speaker recognition algorithms, this thesis examines cepstral feature extraction methods based on acoustic properties and pattern matching methods based on template matching and on probability statistics. It implements a speaker recognition system based on vector quantization (VQ) and focuses on the design of a text-independent speaker recognition system based on the Gaussian mixture model (GMM).
     The main research content is as follows:
     (1) The development of speaker recognition technology and its research hotspots and difficulties are summarized, and the main existing speaker recognition algorithms are analyzed and discussed.
     (2) Speech preprocessing for speaker recognition is studied, with emphasis on an improved spectral subtraction algorithm for speech enhancement. The enhancement effect is analyzed experimentally, and the improvement increases the robustness of the speaker recognition system in noisy environments. The principles and methods of feature extraction are also studied, and the extraction of pitch features, LPCC and MFCC parameters, and differential cepstral parameters is implemented in simulation.
     (3) Based on an analysis of the basic principles of VQ, the LBG algorithm, and VQ codebook initialization, a VQ-based speaker recognition system is designed and implemented, including model parameter training and matching-based recognition. Experiments analyze recognition performance under different model parameters and different speech sample durations.
     (4) To improve the recognition rate and stability, a GMM-based speaker recognition system is designed on the basis of a study of the expectation-maximization (EM) algorithm for GMM parameter estimation, model parameter initialization, and the training and recognition procedures. Simulation experiments are completed, and recognition performance is analyzed under different model parameters, feature extraction methods, speech sample durations, and noise environments with different signal-to-noise ratios (SNR).
     (5) Open-set speaker recognition methods and methods for selecting the speaker verification threshold are analyzed. An open-set method that performs identification first and verification second is studied, the "rejection problem" for out-of-set impostors is analyzed, and the performance of open-set speaker recognition systems based on the VQ and GMM models is analyzed and compared.
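Items (4) and (5) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes diagonal-covariance GMMs, random initialization from data points, synthetic 12-dimensional vectors standing in for MFCC features, and a hand-picked rejection threshold. All function names and parameter values here are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of every frame under every component; shape (T, M)."""
    diff = X[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
    return -0.5 * (log_norm + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

def train_gmm(X, n_components=4, n_iter=50, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to frames X (T, D) with EM."""
    rng = np.random.default_rng(seed)
    T, _ = X.shape
    # Assumed initialization: random frames as means, global variance.
    means = X[rng.choice(T, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ X ** 2) / nk[:, None] - means ** 2
        variances = np.maximum(variances, var_floor)
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of utterance X under one model."""
    weights, means, variances = gmm
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    return float(np.mean(logsumexp(log_p, axis=1)))

def identify_open_set(X, models, threshold):
    """Identify the best-scoring enrolled speaker, then verify the score.

    Returns (name, score); name is None when the best score falls below
    the threshold, i.e. the utterance is rejected as out-of-set.
    """
    scores = {name: avg_log_likelihood(X, g) for name, g in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores[best]

# Synthetic stand-in for per-speaker feature frames (assumption, not real speech).
rng = np.random.default_rng(1)

def fake_features(center, n=400, dim=12):
    return center + rng.normal(size=(n, dim))

models = {
    "spk_a": train_gmm(fake_features(0.0)),
    "spk_b": train_gmm(fake_features(3.0)),
}
print(identify_open_set(fake_features(3.0, n=200), models, threshold=-30.0))
print(identify_open_set(fake_features(10.0, n=200), models, threshold=-30.0))
```

The first call scores an utterance drawn from an enrolled speaker's distribution, and the second scores one drawn far from both enrolled speakers. In practice the threshold would be chosen from held-out target and impostor scores instead of being fixed by hand.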

