Speaker Identification Based on Feature Selection and Fusion Methods (基于特征选择及其融合方法的说话人识别)

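English title: Effective Feature Selection and Relevant Merged Methods-based Speaker Identification
Author: Sun Yanqun (孙彦群)
Degree level: Master's
Discipline: Signal and Information Processing
Chinese keywords: 有效特征集 (effective feature set); 算法融合 (algorithm fusion); 说话人识别 (speaker identification); GMM; MMI
English keywords: Effective feature set; Merged methods; Speaker identification; GMM; MMI
Degree year: 2011
Advisor: Yu Yibiao (俞一彪)
Discipline code: 081002
Degree-granting institution: Soochow University (苏州大学)
Thesis submission date: 2011-04-01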

Abstract

Speaker identification determines who is speaking by processing that person's speech. Speech is easy to acquire: unlike a key or a card, it does not have to be carried around, it does not affect physical comfort, and it cannot be lost, so it is very convenient in everyday use. Speaker identification products can therefore be expected to bring considerable convenience to daily life. After a period of rapid development, however, the field has gone a fairly long time without a breakthrough. Some researchers and institutions have proposed new methods, while others have proposed improved or fused variants of existing algorithms. For example, building on in-depth phonetic research, some researchers have proposed methods for selecting effective feature sets that strongly characterize a speaker's individual traits, and have reported fairly good experimental results. So far, though, research has not reached the essence of speech: the speaker's individual information and the semantic information have not been extracted and represented. Deeper study of speech is therefore still needed.
This thesis first introduces the basic theory and some basic methods of speaker identification. It then builds a speaker identification system based on the Gaussian mixture model (GMM), in which effective speech data characterizing speaker identity are extracted, modeled, and evaluated. On the basis of theoretical analysis and a large number of targeted experiments, a speaker identification method based on effective feature set selection is proposed. The method divides the feature data that characterize a speaker into speaker-specific and common parts, and experiments show that it is reasonable and effective. Building on this, and drawing on other established methods, a speaker identification method based on effective method fusion is proposed; experiments confirm that it improves the recognition performance of the system. Combining the GMM with maximum mutual information (MMI) improves the overall performance of the system, and incorporating effective feature selection on top of this combination improves performance further. In addition, speaker identification based on voiced speech is analyzed, and experiments show that voiced speech is relatively effective at characterizing speaker identity. A real-time speaker identification system was also built in MATLAB and tested in an ordinary student dormitory environment, where it achieved fairly good identification results.
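As a rough illustration of the GMM-based identification step mentioned in the abstract, the sketch below scores a sequence of feature frames against one diagonal-covariance GMM per speaker and picks the speaker with the highest average log-likelihood. It is not the thesis's implementation: the function names, the toy model parameters, and the two-dimensional "features" are all assumptions made for illustration, and the models are assumed to be already trained (no EM training, MMI training, or feature selection is shown).

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM.

    gmm is a list of (weight, mean, var) tuples (hypothetical layout).
    """
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def average_loglik(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one model."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)

def identify(frames, speaker_models):
    """Return the best-scoring speaker name and the score of every model."""
    scores = {
        name: average_loglik(frames, gmm)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores

# Toy, hand-written models standing in for trained speaker GMMs (assumption).
models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
test_frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]

best, scores = identify(test_frames, models)
print(best, scores)
```

In a real system the frames would be acoustic feature vectors extracted from speech, and any frame or feature selection would be applied before the scoring step.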

