Research on Speaker Recognition Based on Support Vector Machines

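Author: 周畅宇 (Zhou Changyu)
Degree level: Master's
Discipline: Computer Application Technology
Keywords (Chinese): 说话人识别 (speaker recognition) ; 支持向量机 (support vector machine) ; 核函数 (kernel function) ; 样本约简 (sample reduction) ; 支持聚类区提取 (support cluster abstracting)
Keywords (English): support vector machine ; kernel function ; sample reducing ; support cluster abstraction
Degree year: 2009
Advisor: 梁昔明 (Liang Ximing)
Discipline code: 081203
Degree-granting institution: 中南大学 (Central South University)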

Abstract

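In speaker recognition, methods based on the support vector machine (SVM) are a current research focus. Compared with other pattern recognition methods, the SVM differs in two main respects: first, it uses a nonlinear kernel function to represent the inner product in the feature space; second, it realizes structural risk minimization through the optimal separating hyperplane with the maximum classification margin. These properties account for the wide application of SVM methods.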
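     This thesis describes in detail the basic principles and implementation of speaker recognition. It first studies feature extraction in some depth, presenting the theoretical basis and computation of the two most widely used features, linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC), and combines these parameters with their difference (delta) parameters to test how accurately they capture speaker-specific characteristics. Speaker recognition systems are built with the different feature parameters, and their effect on recognition rate and noise robustness is examined.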
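As a rough illustration of combining cepstral features with their difference parameters, the sketch below appends regression-based delta coefficients to each frame. The abstract does not say how the thesis computes the differences, so the regression formula, the window `width`, the edge handling, and the function names `delta` and `append_deltas` are all assumptions made for this example.

```python
def delta(frames, width=2):
    """Regression-based delta coefficients for a list of feature frames.

    frames: list of equal-length lists (one list of cepstral values per frame).
    width:  number of frames looked at on each side (an assumed default).
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so edge frames reuse the first/last frame.
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def append_deltas(frames, width=2):
    """Concatenate each static frame with its delta coefficients."""
    return [static + dyn for static, dyn in zip(frames, delta(frames, width))]
```

Calling `append_deltas` on a sequence of 12-dimensional MFCC or LPCC frames would yield 24-dimensional frames; which static features and window size work best is exactly what the thesis tests experimentally.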
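     The kernel function is the core mechanism of the SVM model; the choice of kernel type and the tuning of its parameters are critical to classification accuracy. The thesis presents the basic theory of kernel functions, simulates and analyses the commonly used polynomial, radial basis function (RBF), and multilayer perceptron (sigmoid) kernels, and tests the recognition rate and robustness of the system on clean and noisy speech.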
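For reference, the three kernel families named above can be written in a few lines. This is a minimal sketch in their textbook forms; the parameter names (`degree`, `coef0`, `gamma`, `scale`, `offset`) and their default values are illustrative assumptions, not the settings tuned in the thesis.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """K(x, y) = (<x, y> + coef0) ** degree."""
    return (dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, scale=0.1, offset=-1.0):
    """K(x, y) = tanh(scale * <x, y> + offset), the multilayer perceptron kernel."""
    return math.tanh(scale * dot(x, y) + offset)
```

Unlike the polynomial and RBF kernels, the sigmoid kernel is not positive semi-definite for every choice of `scale` and `offset`, which is one reason its parameters need care.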
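     To shorten the training time of a speaker recognition system, the samples need to be reduced before SVM training. The thesis summarizes the theoretical results in this area and proposes a new reduction method, Support Cluster Abstracting (SCA). It presents the theoretical basis of the method and its concrete implementation steps, and compares SCA with traditional methods experimentally. The experiments demonstrate how accurately the algorithm describes the boundary of linearly separable samples, and evaluate its reduction rate and recognition rate on linearly non-separable samples, namely speech samples.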
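     How well the SCA parameters are chosen determines whether the reduced set contains all the support vectors while lightening the SVM training load as much as possible. The thesis tunes the relevant parameters (the fan-out coefficient, the number of clusters, and the approximation degree, or closeness, factor) by trial and error. Experiments show that, compared with other reduction methods, the tuned SCA method achieves a relatively high recognition rate at a relatively high reduction rate, in agreement with theoretical expectations. Experiments also examine the performance differences between the SCA-SVM model, the plain SVM model, and other speaker models.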
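The abstract does not spell out the SCA steps, so the following is not the thesis's algorithm. It is only a generic sketch of the underlying idea of clustering-based sample reduction: cluster each class, then keep the clusters that lie closest to the other class, since support vectors are expected near the class boundary. The functions `kmeans` and `boundary_clusters` and the parameters `k` and `keep` are hypothetical stand-ins and do not correspond to the fan-out coefficient or approximation degree factor of SCA.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic, evenly spaced initial centroids."""
    n = len(points)
    centroids = [tuple(points[i * n // k]) for i in range(k)]

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    # Final labels are taken against the final centroids.
    return [nearest(p) for p in points], centroids


def boundary_clusters(own, other, k, keep):
    """Return the samples of `own` in the `keep` clusters nearest to `other`."""
    labels, own_centroids = kmeans(own, k)
    _, other_centroids = kmeans(other, k)
    # Distance from each of our centroids to the closest opposing centroid.
    gap = [min(math.dist(c, o) for o in other_centroids) for c in own_centroids]
    chosen = set(sorted(range(k), key=lambda c: gap[c])[:keep])
    return [p for p, lab in zip(own, labels) if lab in chosen]
```

Running `boundary_clusters` once per class and training the SVM only on the union of the returned samples trades some risk of discarding a true support vector for a smaller training set, which is the trade-off the parameter tuning described above is meant to control.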
In the field of speaker recognition, methods based on the support vector machine (SVM) are a current research focus. Unlike other pattern recognition techniques, the SVM has two distinctive characteristics. First, it expresses the inner product in the feature space through a nonlinear kernel function. Second, it implements the structural risk minimization principle by means of the optimal separating hyperplane with maximum margin. These characteristics make the SVM widely applicable.
     This thesis describes the fundamental theory and implementation of speaker recognition. It begins with a detailed study of feature extraction, covering the theory and computation of linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC). These features are combined with their difference (delta) parameters into several feature vectors, which are tested for how accurately they capture speaker-specific characteristics. The thesis also examines the effect of the various feature parameters on recognition rate and noise robustness.
     Because the kernel function is central to the SVM, and classification accuracy depends strongly on the choice of kernel and its parameters, the thesis reviews the basic theory of kernel functions. It simulates and analyses the polynomial, radial basis function (RBF), and sigmoid kernels, and reports the recognition rate and robustness of the system on clean and noisy speech.
     The size of the sample set strongly affects SVM training time, so the sample set is reduced before training. The thesis reviews existing work on sample reduction and proposes a new reduction algorithm, Support Cluster Abstracting (SCA), presenting its theoretical basis and concrete implementation steps. SCA is then compared with other methods through simulation and analysis. On linearly separable samples, the experiments test how accurately the algorithm describes the class boundary; on linearly non-separable samples (speech samples), they measure its reduction rate and recognition rate.
     The choice of SCA parameters determines whether the reduced sample set contains all the support vectors while relieving the burden of SVM training as far as possible. The parameters, namely the fan-out coefficient k, the number of clusters C, and the approximation degree factor a, are set by trial and error. The results show that, compared with other reduction algorithms, the tuned SCA achieves a relatively high recognition rate at a relatively high reduction rate, consistent with theoretical predictions. The thesis also compares the performance of the SCA-SVM model with the plain SVM model and other speaker recognition models.

