Research on Speaker Adaptation in Speech Recognition
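Abstract
Today, a variety of efficient and fast algorithms make it possible to build real-time continuous speech recognition systems, but in practical use a change of speaker degrades system performance. Speaker adaptation uses a small amount of adaptation data to improve system performance and can largely resolve this acoustic mismatch. Building on a large-vocabulary continuous speech recognition platform, this thesis studies speaker adaptation techniques. The specific work and contributions are as follows: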
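     1. Comparison of the MAP and MLLR algorithms
     Starting from a discussion of speaker-induced acoustic variability, the thesis studies two model-based adaptation algorithms: maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) estimation. The experimental results show that either form of adaptation yields some improvement in recognition rate. The two algorithms differ in that MAP has good asymptotic properties but poor convergence, whereas MLLR greatly improves convergence but has weaker asymptotic properties than MAP.
     The thesis discusses how prior knowledge of the initial model parameters affects MAP adaptation, and how the regression classes affect MLLR adaptation. It further studies the effect of applying the two algorithms cumulatively; the results show that combining MAP and MLLR works better than using either alone. The thesis also discusses the equivalence between feature-level normalization algorithms and the acoustic-model-based MLLR algorithm, and gives a unified algorithmic framework.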
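     2. An improved clustering-based speaker adaptation algorithm
     The thesis proposes a speaker-clustering adaptation framework that uses the weighted cross likelihood ratio between models as its distance measure. During recognition, finding the correlation between the training speakers and the test speaker, and making full use of the available adaptation and training data, is an effective way to improve speaker adaptation performance. Here, speakers are represented by Gaussian mixture models, and speaker clustering reduces the number of reference models, giving a coarse classification. On this basis, reference speakers are selected according to the acoustic characteristics of the test speaker, achieving fast speaker adaptation. The thesis also uses a universal background model as the baseline for every speaker model, to increase the coupling between models.
     When generating the target speaker model, the thesis uses the acoustic statistics produced during model training to obtain the required model parameters quickly. The experimental results show that, after coarse classification of reference speakers by speaker clustering, the recognition rate improves considerably over the baseline system. Moreover, this approach of coarse classification followed by fine recognition performs well across different numbers of mixture components.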
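     3. Dynamic selection of reference speakers and its improvement
     Based on an analysis of reference speaker selection techniques, the thesis proposes a dynamic reference speaker selection technique based on support vector machines (Speaker Support Vector Selection, SSVS). The key to good adaptation is the number of reference speakers and whether their data suffice to describe the distribution of all reference speakers. A support vector machine automatically finds the support vectors that discriminate well between classes, so the thesis proposes treating reference speakers as support vectors and selecting them as part of SVM training, which meets the requirements of optimality and dynamic selection. SSVS turns reference speaker selection from a manual process into an automatic one while satisfying the requirements of both acoustic-model completeness and acoustic similarity. Experiments show that this method achieves good adaptation results.
     On this basis, the thesis improves SSVS by selecting reference speakers directly through the support vectors that represent them (Reference Support Speaker Selection, RSSS). The key to dynamic reference speaker selection is finding the support vectors that represent the reference speakers. The thesis uses the SVM kernel function to compute the distance between two samples in the high-dimensional feature space and, after traversing the training set, obtains the set of samples near the optimal separating surface; these samples are the required reference-speaker support vectors. A confidence measure is also used to constrain the support vector selection. The experimental data show that RSSS-based speaker selection effectively improves system performance.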
Today, various efficient and fast algorithms make it possible to build continuous speech recognition systems. However, when there is a mismatch between the test speaker and the training speakers, recognition performance degrades severely. Speaker adaptation techniques aim to improve recognition performance in the test environment using a small amount of data. This thesis focuses on speaker adaptation based on our large vocabulary continuous speech recognition (LVCSR) system. The research and innovations are described in detail as follows:
     1. Comparison of the MAP and MLLR Algorithms
     Two classical model-based adaptation algorithms, MLLR and MAP, are discussed in the thesis. The experimental results show that either method improves recognition results over the baseline system. The difference between the two algorithms is that MAP has desirable asymptotic properties whereas MLLR has better convergence properties. In MAP, prior knowledge of the model parameters affects the result of speaker adaptation, and in MLLR the regression classes also influence the final results, so both are discussed in the thesis. Further research focuses on an adaptation policy that combines the two algorithms; the experimental results show that the combined method is better than either one alone. A unified view of feature-space normalization algorithms and the acoustic-model-based MLLR algorithm is also presented.
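The contrast between the two updates can be sketched in a few lines. This is a minimal illustration of the textbook forms, not the thesis implementation: the function names, the prior weight `tau`, and the toy numbers are assumptions, and the MLLR transform `A`, `b` is taken as given rather than estimated.

```python
def map_mean_update(prior_mean, frames, posteriors, tau=10.0):
    """MAP re-estimate of one Gaussian mean.

    tau (an assumed hyperparameter) is the weight of the prior mean;
    posteriors are the per-frame occupation probabilities of this Gaussian.
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    weighted = [sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)]
    return [(tau * prior_mean[d] + weighted[d]) / (tau + occupancy) for d in range(dim)]


def mllr_transform_means(means, A, b):
    """Apply one shared affine transform mu' = A mu + b to every mean in a regression class."""
    return [
        [sum(A[i][j] * mu[j] for j in range(len(mu))) + b[i] for i in range(len(b))]
        for mu in means
    ]


# Toy data: the adaptation frames all sit at 2.0, the prior mean at 0.0.
few = map_mean_update([0.0, 0.0], [[2.0, 2.0]] * 10, [1.0] * 10)
many = map_mean_update([0.0, 0.0], [[2.0, 2.0]] * 1000, [1.0] * 1000)

# One transform moves every mean in the class, including Gaussians with no adaptation data.
shifted = mllr_transform_means([[0.0, 0.0], [2.0, 3.0]],
                               A=[[1.0, 0.0], [0.0, 1.0]], b=[1.0, -1.0])
```

With ten frames the MAP estimate lies halfway between prior and data, and with a thousand frames it is close to the data mean, which is the asymptotic behavior mentioned above. The MLLR transform instead updates all means in a class at once, which is why it helps sooner when data are scarce.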
     2. An improvement to clustering-based speaker adaptation
     A new measure for speaker clustering based on the cross likelihood ratio (CLR) is proposed. During recognition, an effective means of improving adaptation is to exploit the correlation between the test speaker and the training speakers and to make full use of the available adaptation and training data. GMM-based speaker clustering is adopted to reduce the number of reference models. On this basis, appropriate reference speakers are chosen according to the acoustic features of the test speaker, realizing rapid speaker adaptation. In the clustering process, the model CLR is used as the distance measure, and a universal background model is used to provide a tighter coupling between the speaker models.
     The adapted model can be calculated from previously stored hidden Markov model (HMM) statistics, which makes adaptation quick. Using speaker clustering to perform speaker classification yields better performance across different numbers of model mixture components.
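For orientation, the commonly used data-based form of the cross likelihood ratio between two speakers can be sketched as below. This is an assumption-laden illustration: the thesis uses a weighted, model-level variant whose exact form is not given in this abstract, and the GMM layout (a list of `(weight, mean, variance)` tuples with diagonal variances) and all names are hypothetical.

```python
import math


def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    gmm is a list of (weight, mean, variance) tuples (an assumed layout).
    """
    total = 0.0
    for x in frames:
        logs = []
        for w, mu, var in gmm:
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            logs.append(ll)
        peak = max(logs)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(v - peak) for v in logs))
    return total / len(frames)


def cross_likelihood_ratio(frames_a, frames_b, gmm_a, gmm_b, ubm):
    """CLR of speakers a and b, normalized by a universal background model.

    Larger values mean the two speakers are acoustically closer.
    """
    return ((gmm_avg_loglik(frames_a, gmm_b) - gmm_avg_loglik(frames_a, ubm))
            + (gmm_avg_loglik(frames_b, gmm_a) - gmm_avg_loglik(frames_b, ubm)))


# Toy speakers: a and b are close, c is far away.
frames_a = [[0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]]
frames_b = [[0.2, 0.2], [0.7, -0.3], [-0.3, 0.7]]
frames_c = [[5.0, 5.0], [5.5, 4.5], [4.5, 5.5]]
gmm_a = [(1.0, [0.0, 0.0], [1.0, 1.0])]
gmm_b = [(1.0, [0.2, 0.2], [1.0, 1.0])]
gmm_c = [(1.0, [5.0, 5.0], [1.0, 1.0])]
ubm = [(0.5, [0.0, 0.0], [4.0, 4.0]), (0.5, [5.0, 5.0], [4.0, 4.0])]

clr_ab = cross_likelihood_ratio(frames_a, frames_b, gmm_a, gmm_b, ubm)
clr_ac = cross_likelihood_ratio(frames_a, frames_c, gmm_a, gmm_c, ubm)
```

A clustering routine would typically turn this similarity into a distance, for example by negating it, before grouping speakers.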
     3. Dynamic selection of reference speakers and a related improvement
     This thesis proposes a new method for dynamic selection of reference speakers using a support vector machine (SVM), named SSVS, together with a related improvement named RSSS. Good adaptation performance depends not only on the number of selected speakers but also on whether their statistics are sufficient to describe the distribution of the reference speakers. How to select them remains a tricky problem that relies on experiments. Selecting a dynamic rather than a fixed number of close speakers allows a trade-off between good coverage and small variance among the cohorts. We use an SVM to find the subset of training speakers who are acoustically close to the test speaker; this outperforms general speaker selection methods because it chooses an optimal set of reference models and saves computation time. Experimental results show that the SSVS algorithm yields a relatively accurate model.
     The dynamic selection of reference speakers thus depends on finding appropriate support vectors. In the thesis, the kernel function is used to compute the distance between two samples in the high-dimensional feature space; we traverse the training set and obtain the set of samples near the optimal classification surface, and these samples represent the reference speakers. A confidence measure is also applied to the selection process. The experimental results show that the proposed method effectively improves recognition accuracy.
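The kernel-based distance mentioned above follows from the identity ||phi(x) - phi(y)||^2 = K(x,x) - 2K(x,y) + K(y,y), so the feature map never has to be computed explicitly. The sketch below illustrates only that identity. The RBF kernel, its `gamma` value, and the function names are assumptions, and the selection of samples near the separating surface and the confidence constraint are not reproduced.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed width parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def linear_kernel(x, y):
    """Plain dot product; with it the feature-space distance is the Euclidean distance."""
    return sum(a * b for a, b in zip(x, y))


def feature_space_distance(x, y, kernel=rbf_kernel):
    """Distance between phi(x) and phi(y) computed from kernel values only."""
    squared = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors


same = feature_space_distance([1.0, 2.0], [1.0, 2.0])
far = feature_space_distance([0.0, 0.0], [100.0, 100.0])
euclid = feature_space_distance([0.0, 0.0], [3.0, 4.0], kernel=linear_kernel)
```

With an RBF kernel every sample has unit norm in feature space, so distances are bounded by the square root of 2, and very distant samples approach that bound.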
