Research on Speech Recognition Algorithms in Noisy Environments
Abstract
With the rapid development of speech recognition technology, the performance of Automatic Speech Recognition (ASR) systems has improved substantially. As a convenient, fast, and effective mode of human-computer interaction (HCI), ASR has gradually entered people's daily lives. In practice, however, a mismatch between the training and recognition environments often degrades recognition performance dramatically. Improving the robustness of ASR systems in noisy environments is therefore one of the key problems that decides whether they can move out of the laboratory and into widespread practical use.
     Building on a summary and analysis of existing robust speech recognition algorithms and of how noise affects an ASR system, this thesis presents research on speech enhancement, feature enhancement, and model compensation/enhancement, addressing the signal space, feature space, and model space of the ASR system respectively. The main contributions are as follows:
     A dynamic noise power spectrum estimation method is proposed, and the a priori SNR-based Wiener filter is improved accordingly. The algorithm first uses sub-band spectral entropy for accurate and robust endpoint detection, separating speech from non-speech segments. Within non-speech segments, the noise power spectrum is estimated frame by frame, and each new estimate is weighted with the previous one; this weighted spectrum, rather than a fixed noise power spectrum, is used to estimate the a priori Signal-to-Noise Ratio (SNR). Experimental results show that the proposed enhancement algorithm improves the recognition accuracy of the ASR system.
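As a rough sketch of the two ingredients described above — sub-band spectral entropy for distinguishing speech from non-speech frames, and a decision-directed a priori SNR driving the Wiener gain — the following minimal Python illustration may help; the band count, smoothing factors, and flooring constants are assumed defaults, not values from the thesis:

```python
import math

def subband_spectral_entropy(power_spec, n_bands=4):
    """Normalized sub-band spectral entropy: near 1 for flat (noise-like)
    frames, near 0 for peaky (speech-like) frames."""
    band = len(power_spec) // n_bands
    energies = [sum(power_spec[b * band:(b + 1) * band]) for b in range(n_bands)]
    total = sum(energies) + 1e-12
    probs = [e / total + 1e-12 for e in energies]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n_bands)

def update_noise_spectrum(noise_psd, frame_psd, weight=0.9):
    """Weight the running noise power spectrum with the current
    non-speech frame instead of keeping a fixed estimate."""
    return [weight * n + (1.0 - weight) * y for n, y in zip(noise_psd, frame_psd)]

def wiener_gain(noisy_psd, noise_psd, prev_clean_psd, alpha=0.98):
    """Per-bin Wiener gain from a decision-directed a priori SNR."""
    gains = []
    for y, n, s in zip(noisy_psd, noise_psd, prev_clean_psd):
        post = max(y / (n + 1e-12) - 1.0, 0.0)      # a posteriori SNR minus 1
        xi = alpha * s / (n + 1e-12) + (1.0 - alpha) * post
        gains.append(xi / (1.0 + xi))
    return gains
```

A frame is then classified as non-speech when its entropy exceeds a threshold, fed to `update_noise_spectrum`, and every frame is attenuated bin-by-bin by `wiener_gain`.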
     A denoising algorithm based on multi-order autocorrelation is studied, with the aim of suppressing noise while preserving the spectral structure of the speech signal. Because the multi-order autocorrelation sequence of a speech signal is only weakly affected by noise, the observation sequence obtained after repeated autocorrelation is used as the recognizer input in place of the noisy speech sequence, thereby suppressing the noise. The derivation of the algorithm is given, and recognition experiments under different autocorrelation orders are carried out and analyzed.
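The core operation — applying autocorrelation repeatedly, so that periodic speech structure survives while uncorrelated noise collapses toward lag zero — can be sketched as follows. This is a toy illustration with renormalization added to keep values bounded; it does not reproduce the thesis derivation:

```python
def autocorr(x):
    """One-sided autocorrelation sequence of a finite signal."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(n)]

def multi_order_autocorr(x, order=2):
    """Apply autocorrelation `order` times; periodic components survive,
    while uncorrelated noise is progressively suppressed."""
    for _ in range(order):
        x = autocorr(x)
        peak = max(abs(v) for v in x) or 1.0
        x = [v / peak for v in x]        # renormalize to avoid overflow
    return x
```

For a periodic input, the peaks of the result still fall at multiples of the period, which is the property that lets the sequence replace the noisy waveform as recognizer input.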
     A robust speech feature extraction algorithm based on frequency-domain Independent Component Analysis (ICA) is proposed to resolve the mismatch between training and testing features in convolutive noise environments. Noisy speech is first converted from the time domain to the frequency domain via the Short-Time Fourier Transform (STFT); a complex-valued ICA algorithm then separates the short-time spectrum of the speech signal from that of the noisy speech, and Mel-Frequency Cepstral Coefficients (MFCC) and their first-order differentials are computed from the separated spectrum as feature parameters. Experimental results show that these frequency-domain ICA features are robust to convolutive noise.
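The feature side of this pipeline — the Mel frequency mapping, the DCT that turns log filterbank energies into cepstral coefficients, and the first-order differential — can be sketched as below. The STFT, the complex ICA separation, and the filterbank itself are omitted, and the function names are illustrative:

```python
import math

def hz_to_mel(f):
    """Standard Mel scale mapping."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_from_log_energies(log_e, n_ceps=13):
    """DCT-II of log Mel filterbank energies -> cepstral coefficients."""
    n = len(log_e)
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n)
                for j in range(n))
            for i in range(n_ceps)]

def first_order_delta(frames):
    """Simple first-order differential across consecutive cepstral frames."""
    return [[c - p for c, p in zip(cur, prev)]
            for prev, cur in zip(frames, frames[1:])]
```

In the proposed scheme, `log_e` would come from the ICA-separated speech spectrum rather than the noisy one, which is what restores the train/test match.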
     A permutation alignment algorithm based on Dynamic Time Warping (DTW) is proposed to resolve the permutation ambiguity in frequency-domain ICA of speech signals. Since signals in adjacent frequency bins are highly similar, DTW is used to compare the separated components of neighboring bins, and the component order in each bin is adjusted according to the comparison. Experimental results show that DTW-based alignment reduces the number of permutation errors and improves the quality of the separated speech.
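A minimal sketch of the idea: compute a DTW distance between the amplitude envelopes of the separated components in adjacent frequency bins, and swap the component order in the current bin whenever the swapped assignment matches its neighbor better. This is a two-source illustration, not the thesis algorithm verbatim:

```python
def dtw_distance(a, b):
    """Classic DTW distance between two 1-D envelope sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def align_permutation(prev_envs, cur_envs):
    """Keep or swap the two separated outputs in the current bin so that
    each output stays DTW-closest to its neighbor-bin counterpart."""
    keep = (dtw_distance(prev_envs[0], cur_envs[0]) +
            dtw_distance(prev_envs[1], cur_envs[1]))
    swap = (dtw_distance(prev_envs[0], cur_envs[1]) +
            dtw_distance(prev_envs[1], cur_envs[0]))
    return cur_envs if keep <= swap else (cur_envs[1], cur_envs[0])
```

Sweeping `align_permutation` across bins from low to high frequency propagates a consistent source ordering through the whole spectrum.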
     The fundamental principle of the Parallel Model Combination (PMC) algorithm for model compensation is studied in depth, and its implementation is derived for both additive and convolutive noise. In addition, a dual-channel noise spectrum estimation method for convolutive environments is proposed: frequency-domain ICA first separates the short-time spectra of speech and noise in the reference channel, and the noise short-time spectrum is then obtained in the main channel by subtracting the estimated "clean" speech short-time spectrum from that of the noisy speech. Experiments validate the accuracy of the estimated noise, and recognition results show that PMC model compensation effectively improves the robustness of the ASR system in noisy environments.
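Two pieces of this chapter lend themselves to a short sketch: the log-add approximation commonly used in PMC to combine clean-speech and noise log-spectral means, and the main-channel subtraction that yields the noise short-time spectrum in the dual-channel scheme. Both functions below are illustrative simplifications (single mixture, diagonal dimensions, magnitude domain):

```python
import math

def pmc_log_add(mu_speech, mu_noise, gain=1.0):
    """Log-add approximation of PMC: combine clean-speech and noise
    log-spectral means into a noisy-speech mean, dimension by dimension."""
    return [math.log(math.exp(s) + gain * math.exp(n))
            for s, n in zip(mu_speech, mu_noise)]

def dual_channel_noise_spectrum(noisy_mag, clean_est_mag):
    """Main-channel noise short-time spectrum: noisy spectrum minus the
    reference-channel 'clean' speech estimate, floored at zero."""
    return [max(y - s, 0.0) for y, s in zip(noisy_mag, clean_est_mag)]
```

The resulting noise estimate is exactly what PMC needs to build its noise model in the convolutive case.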
     Finally, because the recognition rate of a conventional full-band HMM drops when only some frequency bands are corrupted by noise, a hybrid robust recognition model combining parallel sub-band Hidden Markov Models (HMM) with a neural network (NN) is proposed. The full-band HMM is split into several sub-band HMMs, each performing recognition independently; new feature parameters are then extracted from the outputs of all sub-bands and merged by a neural network to yield a global recognition decision. Experimental results show that the proposed hybrid model is more robust to noisy speech.
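The fusion stage can be caricatured as follows: per-band scores (here just log band energies standing in for sub-band HMM outputs) are merged by a single sigmoid unit into a global decision. This is a deliberately tiny stand-in; a real system would use trained sub-band recognizers and learned network weights, which are free parameters here:

```python
import math

def subband_log_energies(power_spec, n_bands=4):
    """Stand-in for per-sub-band recognizer outputs: log energy per band."""
    band = len(power_spec) // n_bands
    return [math.log(sum(power_spec[b * band:(b + 1) * band]) + 1e-12)
            for b in range(n_bands)]

def fuse(scores, weights, bias=0.0):
    """One sigmoid unit merging sub-band scores into a global decision."""
    z = sum(w * s for w, s in zip(weights, scores)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

The point of the architecture is that a band wiped out by narrow-band noise only corrupts one input to `fuse`, rather than the whole full-band feature vector.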
