Research on Password Recognition Methods in Noisy Environments Based on CDHMM
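Abstract
The noise robustness of speech recognition systems is the key factor in moving speech recognition technology from the laboratory into practical application, and it is currently both a research focus and a difficult problem in the field. Because a speech recognition system is training-based, the mismatch between the noise in the actual application environment and the noise in the training environment from which the system parameters are estimated is the main cause of performance degradation. This thesis builds a Chinese password recognition system based on the continuous-density hidden Markov model (CDHMM) and, on that basis, studies methods for recognizing short spoken Chinese passwords under additive stationary background noise from two directions: selecting speech feature parameters with strong noise robustness, and compensating and correcting the model parameters.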
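A CDHMM scores an utterance with Gaussian-mixture output densities instead of discrete codebook symbols. As an illustration of how one password model could be evaluated, the sketch below computes log P(O | λ) with the forward algorithm in the log domain, using diagonal-covariance mixtures and a left-to-right topology without skips. The model layout (each state a list of (weight, mean, variance) tuples), the self-loop probability of 0.6, and all function names are assumptions for illustration; the abstract does not describe the system's topology or code. In recognition, the utterance would typically be assigned to the password model with the highest score.

```python
import math

NEG_INF = -math.inf


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))); tolerates -inf entries."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_emission(x, mixture):
    """Log output density of one state; mixture is a list of (weight, mean, var)."""
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in mixture])


def left_to_right_log_a(n_states, stay=0.6):
    """Hypothetical helper: log transition matrix of an n-state left-to-right HMM, no skips."""
    log_a = [[NEG_INF] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        log_a[i][i] = math.log(stay)
        log_a[i][i + 1] = math.log(1.0 - stay)
    log_a[-1][-1] = 0.0  # the last state can only loop on itself
    return log_a


def forward_log_likelihood(obs, log_pi, log_a, states):
    """Forward algorithm in the log domain: returns log P(obs | model)."""
    n = len(states)
    alpha = [log_pi[j] + log_emission(obs[0], states[j]) for j in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_a[i][j] for i in range(n)])
            + log_emission(x, states[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


# Usage: a toy 3-state model with one Gaussian per state and 1-D features.
states = [[(1.0, [0.0], [1.0])], [(1.0, [3.0], [1.0])], [(1.0, [6.0], [1.0])]]
log_pi = [0.0, NEG_INF, NEG_INF]  # always start in the first state
log_a = left_to_right_log_a(3)
print(forward_log_likelihood([[0.1], [2.9], [6.2]], log_pi, log_a, states))
print(forward_log_likelihood([[6.2], [2.9], [0.1]], log_pi, log_a, states))
```

The number of states and the number of mixture components per state are exactly the settings whose best values the thesis determines experimentally; here they are simply fixed by the shape of `states`.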
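    The research in this thesis covers the following aspects:
    1. A recognition system for short spoken Chinese passwords was built, and the training of HMM parameters was studied in the process.
    2. The choice of the number of states and the number of Gaussian mixtures was analyzed, and extensive experiments determined the state number and mixture number best suited to short Chinese utterances (passwords).
    3. Noise robustness at the feature-parameter level was studied in depth. Experiments compared the noise robustness of static feature parameters and higher-order dynamic parameters and identified the feature form that performs relatively well in typical noise environments.
    4. Building on noise reduction at the feature-parameter level, a noise compensation method based on HMMs and cepstral features was proposed. It compensates and corrects model parameters trained in a clean environment so that the training and test environments match. Experiments verified the feasibility of the method, and combining model-level with feature-level noise robustness gave the system relatively good noise robustness.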
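The abstract compares static features with higher-order dynamic features but does not say how the dynamic terms are computed. A common choice is the regression formula d[t] = Σ_{k=1..K} k·(c[t+k] - c[t-k]) / (2·Σ_{k=1..K} k²), applied once for first-order (delta) coefficients and again on the deltas for second-order (delta-delta) coefficients. The sketch below assumes that formula, a window of K = 2, and edge frames repeated at the utterance boundaries; the function names are hypothetical, and this is not necessarily the exact formulation used in the thesis.

```python
def deltas(frames, K=2):
    """Regression-based dynamic coefficients for a list of equal-length feature vectors.

    d[t] = sum_{k=1..K} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..K} k^2),
    with the first and last frames repeated at the utterance boundaries.
    """
    T = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(k * k for k in range(1, K + 1))
    out = []
    for t in range(T):
        acc = [0.0] * dim
        for k in range(1, K + 1):
            nxt = frames[min(t + k, T - 1)]
            prv = frames[max(t - k, 0)]
            for d in range(dim):
                acc[d] += k * (nxt[d] - prv[d])
        out.append([a / denom for a in acc])
    return out


def add_dynamic_features(frames, K=2):
    """Append first- and second-order dynamics: each frame becomes [static, delta, delta-delta]."""
    d1 = deltas(frames, K)
    d2 = deltas(d1, K)
    return [list(s) + a + b for s, a, b in zip(frames, d1, d2)]


# Example with 13 cepstral coefficients per frame (values are synthetic placeholders).
frames = [[0.1 * t + 0.01 * i for i in range(13)] for t in range(20)]
features = add_dynamic_features(frames)
print(len(features), len(features[0]))  # 20 frames, 39 dimensions
```

Because the regression averages over several neighbouring frames, the dynamic terms partly cancel a slowly varying additive offset in the static coefficients, which is one common explanation for why they can behave differently from static features under noise.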
Noise robustness is one of the crucial factors determining the practicability of a speech recognition system, and it has therefore become a focus of research in automatic speech recognition. Because a speech recognition system is training-based, its performance degrades drastically when the noise characteristics of the practical environment differ greatly from those of the training environment. In this dissertation, we construct a Chinese password recognition system based on CDHMM. A series of techniques, such as choosing robust features, amending recognition models, and compensating model parameters, are presented.
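To reproduce the kind of training/test mismatch described above, clean test utterances are commonly mixed with a recorded stationary noise at a controlled signal-to-noise ratio. The abstract does not describe how its noisy test data were prepared, so this is only an assumed setup: the sketch scales the noise by a gain g chosen so that 10·log10(P_speech / (g²·P_noise)) equals the target SNR. A synthetic tone and white Gaussian noise stand in for real speech and real background noise, and the function names are hypothetical.

```python
import math
import random


def signal_power(samples):
    """Mean power of a sample sequence."""
    return sum(s * s for s in samples) / len(samples)


def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + g * noise, with g chosen so the mixture has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    p_clean = signal_power(clean)
    p_noise = signal_power(noise)
    # SNR = 10*log10(p_clean / (g^2 * p_noise))  =>  g = sqrt(p_clean / (p_noise * 10^(SNR/10)))
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR of a mixture, measured against the known clean signal."""
    residual = [y - s for s, y in zip(clean, noisy)]
    return 10.0 * math.log10(signal_power(clean) / signal_power(residual))


# Assumed example: a 440 Hz tone at 8 kHz sampling and seeded white Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
for target in (20, 10, 0):
    noisy = add_noise_at_snr(clean, noise, target)
    print(target, round(measured_snr_db(clean, noisy), 6))
```

Training on clean data and testing on such mixtures at decreasing SNR is one way to expose the degradation that the dissertation attributes to environment mismatch.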
    The main content and results of this dissertation fall into four parts:
    First, we build a Chinese password recognition system based on CDHMM and study the training of HMM parameters.
    Second, to optimize the system settings, we discuss several important issues in short-utterance recognition, such as the number of states in the Markov chain, the size of the training set, and the number of Gaussian mixtures.
    Third, we study noise-robust feature parameters in depth. Through extensive experiments, we compare the robustness of static and dynamic parameters and identify the better-performing parameter form.
    Fourth, since the mismatch between training and practical environments is the fundamental reason for the performance degradation of automatic speech recognition, we propose a method that compensates and amends the HMMs to adapt them to noisy environments. Experiments show that better noise robustness can be achieved, especially in stationary background noise environments.
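The abstract does not give the compensation formula. A widely used way to move a clean-trained model toward a noisy environment in the cepstral domain is the log-add approximation from parallel model combination: map each clean-speech Gaussian mean and a noise mean from cepstra back to log filter-bank energies with an inverse DCT, add them in the linear spectral domain, then return to cepstra. The sketch below shows that generic technique, not necessarily the method proposed in this dissertation. It assumes stationary additive noise summarized by a single cepstral mean, an orthonormal DCT with as many cepstral coefficients as filter-bank channels (no truncation or liftering), a gain term fixed at 1, and a hypothetical model layout in which each state is a list of (weight, mean, variance) tuples. Only the static means are compensated; variances are left unchanged.

```python
import math


def _scale(k, n):
    """Normalization factor of the orthonormal DCT."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """Orthonormal DCT-II: log filter-bank energies -> cepstrum."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct() (orthonormal DCT-III): cepstrum -> log filter-bank energies."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def log_add_compensate_mean(clean_cep_mean, noise_cep_mean, gain=1.0):
    """Log-add approximation: combine speech and noise means in the linear spectral domain."""
    log_s = idct(clean_cep_mean)
    log_n = idct(noise_cep_mean)
    combined = [math.log(gain * math.exp(a) + math.exp(b)) for a, b in zip(log_s, log_n)]
    return dct(combined)


def compensate_model(states, noise_cep_mean, gain=1.0):
    """Return a copy of the model with every Gaussian mean compensated; weights and variances kept."""
    return [[(w, log_add_compensate_mean(m, noise_cep_mean, gain), v) for w, m, v in mixture]
            for mixture in states]


# Usage with made-up 4-channel values: one state, two mixture components.
speech_log_fbank = [1.0, 2.0, 0.5, 3.0]
noise_log_fbank = [1.5, 1.5, 1.5, 1.5]  # flat, stationary noise (assumed)
model = [[(0.5, dct(speech_log_fbank), [1.0] * 4), (0.5, dct(speech_log_fbank), [2.0] * 4)]]
noisy_model = compensate_model(model, dct(noise_log_fbank))
print([round(v, 3) for v in noisy_model[0][0][1]])
```

In practice the noise mean would typically be estimated from non-speech frames of the test signal, and delta and delta-delta means would need their own compensation rule.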