Research on Robustness Techniques for Continuous Speech Recognition
Abstract
Inter-speaker variation, channel distortion, and background noise cause a mismatch between training and testing conditions, severely degrading the performance of speaker-independent continuous speech recognition systems. To improve the robustness and adaptability of Chinese continuous speech recognition systems, this dissertation studies key techniques in speaker normalization, speech enhancement, endpoint detection, feature compensation, and uncertainty decoding from three perspectives: signal space, feature space, and model space. Several new ideas and methods are proposed and validated through extensive experiments. The main contributions are as follows:
     1. The bilinear frequency warping method is introduced into vocal tract length normalization. Traditional frequency warping methods assume an oversimplified vocal tract model and change the bandwidth of the transformed signal's spectrum. Based on the cut-off frequency mapping formula for a low-pass filter under the bilinear transform, this dissertation derives the frequency warping factor that aligns the third formants of different speakers or speaker groups. This factor is then used to transform the positions and widths of the Mel filterbank bilinearly, yielding vocal-tract-length-normalized feature vectors. The method avoids a linear search over warping factors and inherits the advantages of the bilinear transform: the warped spectrum is continuous and its bandwidth is unchanged. Experiments show it to be a fast robustness technique, particularly well suited to unsupervised operation. After vocal tract length normalization of the features, in isolated-word recognition the baseline system trained on adult male speech improved from 71.50% to 91.00% on adult female speech and from 71.00% to 84.00% on children's speech; in continuous speech recognition, the HMM acoustic models trained on male speech improved from 13.91% to 50.56% on female speech.
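As a rough illustration of the warping step, the sketch below implements the first-order all-pass (bilinear) frequency map and a grid search for the warp factor that aligns a speaker's third formant with a reference value. The grid search is a simple stand-in for the dissertation's closed-form cut-off frequency mapping, and the formant frequencies and sampling rate are hypothetical:

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """First-order all-pass frequency map: continuous, monotone on [0, pi],
    and it maps pi to pi, so the spectral bandwidth is preserved."""
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

def solve_alpha(f_src, f_tgt, fs, n_grid=20001):
    """Grid-search the warp factor alpha so the source formant frequency is
    mapped onto the target formant frequency (illustrative stand-in for the
    closed-form cut-off frequency mapping used in the dissertation)."""
    w_src = 2.0 * np.pi * f_src / fs
    w_tgt = 2.0 * np.pi * f_tgt / fs
    alphas = np.linspace(-0.99, 0.99, n_grid)
    return alphas[np.argmin(np.abs(bilinear_warp(w_src, alphas) - w_tgt))]

# Hypothetical example: align a male reference F3 (2900 Hz) with a female F3 (3300 Hz)
alpha = solve_alpha(2900.0, 3300.0, fs=16000.0)
```

With the factor in hand, warping the Mel filterbank amounts to mapping each filter's lower, center, and upper frequencies through `bilinear_warp` before computing MFCCs.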
     2. A Gaussian Mixture Model (GMM) classifier is used to classify the channel environment of each test utterance. In multi-channel speech recognition, a baseline system whose channel matches the test utterance's channel achieves markedly higher accuracy than a baseline trained on data from a single channel or from several channels pooled together. If a GMM is trained on each channel's data, the differences between channels are captured by the differences between the GMMs, which makes the channels separable. In this dissertation, the training data of each telephone channel is used to train a corresponding GMM channel model and HMM acoustic model; at recognition time, the channel of the test utterance is classified and the HMM acoustic model of that channel is selected to recognize it. Experimental results show that this method effectively improves recognition accuracy in multi-channel environments.
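The channel-selection scheme can be sketched with a small self-contained classifier. The diagonal-covariance EM below is a minimal, illustrative stand-in for a full GMM trainer; in practice each GMM would be fit on cepstral features of one channel's data, and the selected channel's HMM set would then decode the utterance:

```python
import numpy as np

class DiagGMM:
    """Minimal diagonal-covariance GMM trained with EM (illustrative only)."""
    def __init__(self, n_components=4, n_iter=50, seed=0):
        self.k, self.n_iter, self.seed = n_components, n_iter, seed

    def fit(self, X):
        rng = np.random.default_rng(self.seed)
        n, _ = X.shape
        self.w = np.full(self.k, 1.0 / self.k)
        self.mu = X[rng.choice(n, self.k, replace=False)].copy()
        self.var = np.tile(X.var(axis=0) + 1e-6, (self.k, 1))
        for _ in range(self.n_iter):
            r = self._resp(X)                      # E-step: responsibilities
            nk = r.sum(axis=0) + 1e-12
            self.w = nk / n                        # M-step: mixture weights
            self.mu = (r.T @ X) / nk[:, None]      # component means
            self.var = (r.T @ X**2) / nk[:, None] - self.mu**2 + 1e-6
        return self

    def _log_joint(self, X):
        """log( w_k * N(x | mu_k, diag(var_k)) ) for every frame and component."""
        quad = ((X[:, None, :] - self.mu)**2 / self.var).sum(axis=2)
        return np.log(self.w) - 0.5 * (np.log(2 * np.pi * self.var).sum(axis=1) + quad)

    def _resp(self, X):
        lp = self._log_joint(X)
        lp -= lp.max(axis=1, keepdims=True)
        p = np.exp(lp)
        return p / p.sum(axis=1, keepdims=True)

    def score(self, X):
        """Average per-frame log-likelihood of an utterance."""
        lp = self._log_joint(X)
        m = lp.max(axis=1)
        return float((m + np.log(np.exp(lp - m[:, None]).sum(axis=1))).mean())

def classify_channel(channel_gmms, utterance_feats):
    """Pick the channel whose GMM scores the utterance highest; the matching
    channel-specific HMM set would then be used for recognition."""
    return max(channel_gmms, key=lambda ch: channel_gmms[ch].score(utterance_feats))
```

For example, with per-channel GMMs such as `{'ch1': g1, 'ch2': g2}`, `classify_channel` returns the key of the best-matching channel model.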
     3. A subspace denoising algorithm based on the discrete cosine transform and the auditory masking effect is derived. The discrete cosine transform is used to approximate the Karhunen-Loeve transform in the eigendecomposition, and a perceptual filter based on the Johnston masking model post-filters the enhanced speech. The fast DCT-based eigendecomposition reduces the computational complexity from O(N^3) to O(N^2), while residual noise is effectively suppressed.
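The DCT-as-KLT idea can be sketched per frame: transform with an orthonormal DCT-II matrix (which approximately diagonalizes the Toeplitz covariance of speech, avoiding an explicit O(N^3) eigendecomposition), attenuate coefficients with a gain rule, and invert. The Wiener-type gain here is an illustrative stand-in for the dissertation's estimator, and the Johnston perceptual post-filter is omitted:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, the KLT stand-in for Toeplitz covariances."""
    k = np.arange(n)[:, None]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * np.arange(n) + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def subspace_denoise_frame(noisy, noise_var, mu=1.0):
    """Denoise one frame: DCT, per-coefficient Wiener-type gain, inverse DCT.

    Clean-coefficient variances are estimated by subtracting the noise
    variance from the noisy coefficient power (a simplified subspace gain)."""
    C = dct_matrix(len(noisy))
    y = C @ noisy                               # transform-domain coefficients
    sig_var = np.maximum(y**2 - noise_var, 0.0) # estimated clean variance per coeff
    gain = sig_var / (sig_var + mu * noise_var + 1e-12)
    return C.T @ (gain * y)                     # inverse DCT (orthonormal: C^T = C^-1)
```

Coefficients dominated by noise get a gain near zero, which is what suppresses the residual noise; a perceptual post-filter would further shape what remains below the masking threshold.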
     4. The feature-space energy entropy is defined. Traditional endpoint detection methods often fail when the background noise is colored or its energy varies. The space of noisy speech can be decomposed into orthogonal signal-plus-noise and noise subspaces. Since speech is produced by a deterministic nonlinear dynamical system, its energy concentrates in the signal-plus-noise subspace, whereas the energy of random noise spreads approximately uniformly over the whole noisy-speech space. Speech and noise therefore exhibit different spatial energy distributions and hence different spatial energy entropies. In this dissertation, the covariance matrix of the speech signal is eigendecomposed, the energy probability distribution over the eigenspace is computed from the eigenvalues, and the feature-space energy entropy is thus defined.
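A minimal sketch of the entropy measure: embed a frame as overlapping delay vectors, form a sample covariance matrix, and take the entropy of its normalized eigenvalue distribution. The embedding dimension of 10 is an arbitrary illustrative choice, not the dissertation's setting:

```python
import numpy as np

def eigen_energy_entropy(frame, dim=10):
    """Energy entropy over the eigen-spectrum of the frame's covariance.

    Speech concentrates energy in a few eigen-directions (low entropy),
    while white-ish noise spreads it almost evenly (high entropy)."""
    X = np.lib.stride_tricks.sliding_window_view(frame, dim)  # (n, dim) delay vectors
    R = (X.T @ X) / X.shape[0]                                # sample covariance
    eig = np.maximum(np.linalg.eigvalsh(R), 0.0)
    p = eig / (eig.sum() + 1e-12)                             # eigen-energy distribution
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())
```

An endpoint detector built on this would mark frames whose entropy falls below a noise-adapted threshold as speech, which stays usable when the noise is colored or its level drifts.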
