Research of Several Problems on Speech Enhancement Systems

Original title: 语音增强相关问题研究
Author: Fang Yu (方瑜)
Degree level: Doctoral
Discipline: Signal and Information Processing
Keywords: speech enhancement; a priori signal-to-noise ratio; voice activity detection; short-time spectral estimation
Degree year: 2012
Advisor: Guo Jun (郭军)
Discipline code: 081002
Degree-granting institution: Beijing University of Posts and Telecommunications
Thesis submission date: 2011-05-08

Abstract

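In speech communication, coding, transmission, and recognition technologies for clean speech are already mature. In the presence of background noise and channel noise, however, the performance of signal processing systems degrades sharply, and speech quality ultimately suffers badly. Research in this area has been under way for many years and has produced results for stationary or slowly varying noise. In complex, rapidly varying non-stationary noise, existing techniques still have many shortcomings: the quality and intelligibility of the denoised speech are seriously affected, and the processed speech performs far worse in a signal processing system than clean speech does. To address these problems, this thesis focuses on the front-end processing of speech and studies speech enhancement and related problems. The main work and contributions are as follows: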
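     1. Existing speech enhancement algorithms are discussed in detail and, according to the number of microphones used at the receiver, divided into single-channel and multi-channel algorithms. Both classes are described and studied in detail, and they form the theoretical basis for the further research.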
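     2. The a priori signal-to-noise ratio (SNR) is a key parameter in speech enhancement, and its estimate is decisive for the performance of an enhancement system. Existing a priori SNR estimators suffer from delay, and their smoothing factor cannot adapt to changes in the noise environment. As a result the enhanced speech is reverberant or distorted, and its quality, clarity, and intelligibility are impaired. To solve these problems, an a priori SNR estimation algorithm based on a posteriori compensation is proposed. The algorithm takes into account the influence of both the predicted a priori SNR and the a posteriori SNR on the estimate, compensating for the jitter of the a posteriori SNR estimate and the delay of the a priori SNR estimate. It also jointly considers inter-frame correlation and correlation between frequency bins, computing a smoothing factor for each frame so that the noisy speech is smoothed to a different degree from frame to frame and the new estimator adapts to the noise environment. Finally, the updated a priori SNR is used in a speech enhancement system and its performance is evaluated by simulation. The proposed estimator can be widely used in speech enhancement systems based on short-time spectral estimation. The processed speech improves in quality, clarity, and intelligibility, and the estimator is better suited to non-stationary noise.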
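     3. Frequency-domain speech enhancement algorithms are currently the most commonly used because of their low computational complexity. In existing frequency-domain algorithms, speech and noise are mixed together in the spectrum, so it cannot be reliably determined whether a change in magnitude comes from speech or from noise. Frequency-domain smoothing is therefore prone to misjudgments that produce short-time spectral peaks, which lead to musical noise and speech distortion. To address these drawbacks, a speech enhancement algorithm based on cepstral smoothing is proposed. Exploiting the properties of the signal in the cepstral domain, the algorithm transforms the signal from the frequency domain to the cepstral domain and smooths the envelope, the fine structure, and the noise to different degrees. It suppresses noise while removing spectral peaks as far as possible and preserving speech onsets and low-energy speech. The proposed algorithm can be widely applied to existing frequency-domain speech enhancement algorithms and a priori SNR estimators. It effectively suppresses musical noise, preserves the characteristics of speech, improves the performance of speech enhancement systems, and addresses the performance degradation of existing algorithms at low SNR and in non-stationary noise.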
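     4. Voice activity detection (VAD) is the first step of a speech enhancement system and an important part of it, and its accuracy directly affects the performance of the system. After classifying, studying, and comparing VAD algorithms, this thesis examines the problems of traditional VAD algorithms based on the direct-decision likelihood ratio test. Because these algorithms use a fixed decision threshold, they cannot track changes in the noise well, and they are prone to wrong decisions when the signal is heavily corrupted by noise or the noise is non-stationary. To remedy this, a VAD algorithm based on an SNR-adaptive likelihood ratio test is proposed. The algorithm sets the decision threshold for each frame according to the amount of noise in that frame and then makes the decision. Experiments show that the proposed approach is applicable to existing likelihood-ratio-test-based VAD algorithms, improves their performance, and effectively resolves their tendency to make wrong decisions in non-stationary environments.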
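
The a priori SNR estimation described in item 2 can be illustrated with a minimal sketch. It is not the algorithm proposed in the thesis, whose formulas are not given in this abstract. It shows the classical decision-directed estimator, in which a smoothing factor weights the previous frame's estimate against the current a posteriori SNR, combined with a purely hypothetical rule (`adaptive_alpha`) that lowers the smoothing factor when the a posteriori SNR changes quickly. All function names, parameter names, and default values are assumptions made for illustration.

```python
def wiener_gain(xi):
    """Wiener suppression gain for an a priori SNR value xi."""
    return xi / (1.0 + xi)


def adaptive_alpha(gamma_now, gamma_prev, alpha_min=0.5, alpha_max=0.98):
    """Hypothetical per-bin smoothing factor (an assumption, not the thesis's rule).

    A stable a posteriori SNR gives alpha near alpha_max (heavy smoothing,
    little jitter); a fast change gives alpha near alpha_min (less delay).
    """
    change = abs(gamma_now - gamma_prev) / (gamma_now + gamma_prev + 1e-12)
    return alpha_max - (alpha_max - alpha_min) * change


def decision_directed(noisy_power, noise_power, alpha_fn=adaptive_alpha, xi_min=1e-3):
    """Track the a priori SNR over frames.

    noisy_power: list of frames, each a list of noisy power values |Y(k)|^2.
    noise_power: list of noise power estimates, one per frequency bin.
    Returns one list of a priori SNR estimates per frame.
    """
    n_bins = len(noise_power)
    prev_clean_power = [0.0] * n_bins  # |A(k)|^2 estimated in the previous frame
    prev_gamma = None
    xi_track = []
    for frame in noisy_power:
        gamma = [frame[k] / noise_power[k] for k in range(n_bins)]  # a posteriori SNR
        if prev_gamma is None:
            prev_gamma = gamma
        xi = []
        for k in range(n_bins):
            alpha = alpha_fn(gamma[k], prev_gamma[k])
            estimate = (alpha * prev_clean_power[k] / noise_power[k]
                        + (1.0 - alpha) * max(gamma[k] - 1.0, 0.0))
            xi.append(max(estimate, xi_min))
        # Clean-speech power estimate that feeds the next frame.
        prev_clean_power = [wiener_gain(xi[k]) ** 2 * frame[k] for k in range(n_bins)]
        prev_gamma = gamma
        xi_track.append(xi)
    return xi_track
```

With a fixed smoothing factor, for example `alpha_fn=lambda g, g_prev: 0.98`, the same function reduces to the standard decision-directed estimator, whose estimate rises more slowly at a speech onset than the adaptive variant in this sketch.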
Coding, transmission, and recognition technologies for clean speech are well developed in speech communication systems. In the presence of background noise and channel noise, however, the performance of signal processing systems degrades dramatically, which in turn impairs speech quality. Research on speech enhancement has been under way for many years and has achieved good performance in stationary and slowly time-varying noise. In complicated environments with fast time-varying, non-stationary noise, existing noise reduction techniques still have many deficiencies, and the quality and intelligibility of the processed speech can be seriously affected. Enhanced speech therefore performs worse in a signal processing system than clean speech. To address these issues, this thesis focuses on the front-end processing of speech and studies speech enhancement and related problems.
     1. Existing speech enhancement algorithms are discussed and classified into two types, single-channel and multi-channel algorithms. Both types are described in this thesis and form the theoretical basis for the further research.
     2. Estimation of the a priori signal-to-noise ratio (SNR) is a crucial part of speech enhancement algorithms. To solve the delay problem of existing a priori SNR estimators and to track the speech signal in a changing noise environment, a new a priori SNR estimation algorithm is proposed. It takes the influence of both the a priori SNR and the a posteriori SNR into account, which addresses the jitter of the a posteriori SNR and the delay of the a priori SNR. Inter-frame and intra-frame correlations are also considered: a smoothing factor is calculated for each frame, so that the noisy speech is smoothed to a different degree in each frame. Finally, the updated a priori SNR is applied in a speech enhancement system, and simulations are run to evaluate the modified system. The simulation results show that the proposed algorithm improves the performance of the speech enhancement system and performs better in non-stationary noise. The proposed algorithm can be widely used in speech enhancement systems based on short-time spectral estimation.
     3. Frequency-domain speech enhancement algorithms are among the most widely used. Because speech and noise are mixed in the frequency domain, existing algorithms cannot reliably tell whether a change in magnitude comes from speech or from noise, and their frequency-domain smoothing produces short-time spectral peaks. The proposed algorithm first transforms the relevant parameters of the frequency-domain speech enhancement algorithm to the cepstral domain. It then applies different degrees of smoothing to the envelope, the fine structure, and the noise in order to suppress the spectral peaks, compensating for the frequency-domain algorithm and suppressing noise while preserving the characteristics of speech. The proposed algorithm can be widely applied to existing frequency-domain speech enhancement algorithms and a priori SNR estimators, and it effectively reduces musical noise and improves the performance of speech enhancement systems.
     4. Voice activity detection (VAD) is an important enabling technology for a variety of speech-based applications. VAD based on the direct-decision likelihood ratio test cannot track variations in the noise well, which results in wrong decisions. We therefore propose a new VAD algorithm based on an adaptive-threshold likelihood ratio test. In the proposed algorithm, a decision threshold is set for each frame according to its SNR, and the decision is then made with that adaptive threshold. Simulation results show that the proposed algorithm improves VAD performance and resolves the wrong decisions made by VAD based on the direct-decision likelihood ratio test.
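
The likelihood ratio test with a per-frame threshold described in item 4 can be sketched as follows. This is an illustration under stated assumptions, not the thesis's algorithm. The per-bin log likelihood ratio follows the usual Gaussian statistical model, with the a priori SNR replaced by its maximum-likelihood estimate. The linear mapping from frame SNR to threshold (`adaptive_threshold`), its direction, and all constants are hypothetical, since the abstract does not specify how the threshold depends on the SNR.

```python
import math


def log_likelihood_ratio(gamma):
    """Per-bin log likelihood ratio (speech present vs. absent) for an
    a posteriori SNR gamma, using the ML a priori SNR estimate max(gamma - 1, 0)."""
    xi = max(gamma - 1.0, 0.0)
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)


def adaptive_threshold(snr_db, snr_lo=0.0, snr_hi=20.0, eta_lo=0.2, eta_hi=1.0):
    """Hypothetical mapping from frame SNR (dB) to a decision threshold.

    The SNR is clamped to [snr_lo, snr_hi] and the threshold is interpolated
    linearly between eta_lo and eta_hi. All four constants are assumptions.
    """
    clamped = min(max(snr_db, snr_lo), snr_hi)
    return eta_lo + (eta_hi - eta_lo) * (clamped - snr_lo) / (snr_hi - snr_lo)


def vad_decision(noisy_power, noise_power):
    """Return True if the frame is classified as speech.

    noisy_power: noisy power values |Y(k)|^2 for one frame.
    noise_power: noise power estimates, one per frequency bin.
    """
    gammas = [y / d for y, d in zip(noisy_power, noise_power)]
    mean_llr = sum(log_likelihood_ratio(g) for g in gammas) / len(gammas)
    # Crude frame SNR estimate: mean a posteriori SNR minus one, floored.
    frame_snr = max(sum(gammas) / len(gammas) - 1.0, 1e-6)
    snr_db = 10.0 * math.log10(frame_snr)
    return mean_llr > adaptive_threshold(snr_db)
```

A fixed-threshold detector corresponds to replacing `adaptive_threshold(snr_db)` with a constant. The sketch only changes how the threshold is chosen, which is the part of the test that item 4 makes adaptive.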

