Text-Independent Speaker Recognition
Abstract
Speaker recognition is convenient, inexpensive and accurate, which gives it broad application prospects in biometric identification. Existing speaker recognition techniques perform well under ideal conditions, but in real environments a variety of factors keep them from being widely deployed. One of the most important is that training requires a large amount of computation and recognition is not fast enough for real-time use, in part because a test utterance must be matched against every speaker model. How to shorten training time and recognition time without lowering the recognition rate has therefore become an active research topic in this field.
     The support vector machine (SVM) is a pattern classification method based on the structural risk minimization principle. It has clear advantages for nonlinear, high-dimensional samples and gives good results in speaker recognition based on speech samples. However, training a speaker SVM model on all the speech parameters consumes a large amount of memory and computing time, and at recognition time the test utterance has to be matched against all reference models. This thesis studies these two problems in depth and proposes solutions. The main work is as follows:
     1. To address the large-sample training problem of the standard SVM in speaker recognition, a speaker identification method based on the multi-reduced support vector machine (MRSVM) is proposed. PCA and kernel-based fuzzy clustering are used to reduce the dimensionality and the number of training samples, respectively. Experimental results show that the method markedly reduces the training data, training time and storage required by the standard SVM without degrading the recognition rate, and that the system is more robust.
     2. To speed up identification, a hierarchical speaker identification (HSI) method based on a PCA classifier and MRSVM is proposed. The PCA classifier needs no training and is simple and fast, so it is used to make a quick preliminary decision over all registered speakers. The SVM has strong classification ability, so, based on the preliminary result, only a subset of the MRSVM models is evaluated to reach the final decision, which shortens identification time. Experiments show that HSI achieves identification performance similar to the traditional method while being much faster, and that the system scales well: speakers can be added or removed easily.
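The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas under stated assumptions. It uses scikit-learn and synthetic Gaussian data in place of real speech features. Plain k-means stands in for the kernel-based fuzzy clustering, a one-vs-rest RBF `SVC` per speaker stands in for an MRSVM model, and the "PCA classifier" is assumed to rank speakers by the reconstruction error of a per-speaker PCA subspace. All names, sizes and parameters (`prototypes`, `shortlist`, `identify`, `top_k`, 8 components, 16 prototypes) are hypothetical and not taken from the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_SPEAKERS, N_FRAMES, DIM = 6, 300, 20   # hypothetical sizes

# Synthetic stand-in for per-frame speech features: one Gaussian blob per speaker.
centers = rng.normal(scale=4.0, size=(N_SPEAKERS, DIM))

def sample_frames(speaker, n_frames):
    return centers[speaker] + rng.normal(size=(n_frames, DIM))

train = {s: sample_frames(s, N_FRAMES) for s in range(N_SPEAKERS)}

# Reduction step 1: a shared PCA lowers the feature dimension.
shared_pca = PCA(n_components=8).fit(np.vstack(list(train.values())))

# Reduction step 2: clustering replaces each speaker's frames with a few prototypes.
# Assumption: plain k-means is used here instead of kernel-based fuzzy clustering.
def prototypes(frames, n_prototypes=16):
    low_dim = shared_pca.transform(frames)
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=0).fit(low_dim)
    return km.cluster_centers_

protos = {s: prototypes(f) for s, f in train.items()}

# One SVM per speaker, trained on the reduced set: own prototypes vs. all others.
def train_speaker_svm(speaker):
    X = np.vstack([protos[s] for s in protos])
    y = np.concatenate([np.full(len(protos[s]), int(s == speaker)) for s in protos])
    return SVC(kernel="rbf", gamma="scale").fit(X, y)

svms = {s: train_speaker_svm(s) for s in train}

# Coarse stage (assumed form of the PCA classifier): a per-speaker PCA subspace,
# where a lower reconstruction error means a better match.
speaker_pcas = {s: PCA(n_components=8).fit(f) for s, f in train.items()}

def reconstruction_error(pca, frames):
    rebuilt = pca.inverse_transform(pca.transform(frames))
    return float(np.mean(np.sum((frames - rebuilt) ** 2, axis=1)))

def shortlist(frames, top_k=2):
    errors = {s: reconstruction_error(p, frames) for s, p in speaker_pcas.items()}
    return sorted(errors, key=errors.get)[:top_k]

# Fine stage: score only the shortlisted speakers' SVMs and pick the best one.
def identify(frames, top_k=2):
    low_dim = shared_pca.transform(frames)
    candidates = shortlist(frames, top_k)
    scores = {s: svms[s].decision_function(low_dim).mean() for s in candidates}
    return max(scores, key=scores.get)

test_utterances = {s: sample_frames(s, 50) for s in range(N_SPEAKERS)}
predictions = {s: identify(f) for s, f in test_utterances.items()}
print(predictions)
```

In this sketch each SVM is trained on 96 prototypes instead of 1,800 frames, and `identify` evaluates only `top_k` of the speaker SVMs per utterance. Enrolling a new speaker adds one PCA subspace and one SVM, although in this one-vs-rest layout the existing SVMs would also need retraining to include the newcomer as a negative class.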