Research on Text-Independent Speaker Verification Based on Broad Phones
Abstract
Since the 1980s, text-independent speaker verification has attracted growing interest from researchers as a branch of pattern recognition. Today the dominant text-independent speaker verification systems are built on the Gaussian Mixture Model combined with a Universal Background Model (GMM-UBM). Because such systems ignore what the speaker says and in which language, their value in engineering and commercial applications is considerably reduced. To compensate for this drawback, speaker verification based on broad phones has drawn academic attention over the past two years. A broad-phone approach can combine techniques from speech recognition and text-independent speaker verification, and it can also borrow techniques from text-dependent speaker verification, which has been comparatively successful in commercial use. In addition, it copes well with the problems caused by the diversity of languages spoken by the enrolled speakers.
     Starting from the definition of broad phones, this work studies broad-phone-based speaker verification in depth. The author proposes a complete definition of broad phones, a training method for the broad phonetic Hidden Markov Models, and an overall framework for a broad-phone-based speaker verification system whose performance is comparable to that of the prevailing GMM-UBM systems. To improve the efficiency of front-end processing in the phone recognizer, the author proposes a rapid Vocal Tract Length Normalization algorithm; to improve the speaker adaptation phase, the author proposes a robust speaker adaptation algorithm. Beyond the Hidden Markov Model based broad-phone system, the author also proposes a system in which the weights obtained from eigenvoice speaker adaptive training span a speaker space, and a Support Vector Machine makes the verification decision in that space. This system provides information complementary to the decisions of conventional GMM-UBM systems.
