Research on Transform-Domain Speech Enhancement Algorithms
Abstract
Speech enhancement is an important means of removing noise interference from speech, with important applications in speech recognition, low-bit-rate speech coding, and human-machine speech interaction. This dissertation focuses on transform-domain speech enhancement algorithms and carries out the following research:
     For the estimation of the a priori SNR parameter in speech enhancement, an improved two-step noise reduction algorithm is proposed in the frequency domain; it requires no prior conditions on the gain function of the enhancement system and avoids the shortcomings of the original algorithm while retaining its simple structure. In the DCT domain, a new single-channel speech enhancement algorithm combining Wiener filtering is proposed; exploiting the correlation between speech components at successive times in the DCT domain together with minimum mean square error theory, it achieves an optimal estimate of the clean speech components. Based on the concept and properties of the generalized Gaussian distribution model and its shape parameter, a new method for estimating the Laplacian model factor in the DCT domain is proposed; it eliminates the influence of the noise components on estimation accuracy and effectively improves estimation quality. By simultaneously diagonalizing the global covariance matrices of the speech and noise signals received by a microphone array, a multi-channel speech enhancement algorithm is improved in the subspace domain, achieving optimal estimation of the speech signal against colored-noise backgrounds. Finally, a new statistical model of the speech signal is proposed in the adaptive KLT domain, and based on this model a novel single-channel speech enhancement algorithm using maximum a posteriori estimation theory is developed.
In everyday life, speech is often corrupted by ambient acoustic noise, which degrades the performance of digital voice processors and can even diminish a communication system's ability to convey information. A speech enhancement system is therefore strongly needed, whose task is to improve speech quality and ensure the reliability of digital voice communication systems.
     Depending on the manner of processing, speech enhancement algorithms fall into two categories: those operating in the time domain and those operating in a transform domain. Time-domain algorithms process the noisy speech without any transformation, directly estimating the clean speech signal by exploiting its stationarity or correlation properties. Transform-domain algorithms differ in the choices made at each processing stage. The first is the analysis stage, in which the signal is mapped into some domain via a transformation (e.g., the DFT, DCT, or KLT). The second, which is the heart of most algorithms, is the suppression stage, in which the transformed signal is multiplied by a gain function designed to attenuate the acoustic noise while preserving the speech. The last is the synthesis stage, in which the modified signal is mapped back to the time domain by the inverse transformation. Transform-domain algorithms are found to enhance noisy speech better than time-domain ones for several reasons, the main one being that the transformation provides significant energy compaction and reduces the correlation of the clean speech; the estimator can then process each noisy component individually, which makes it easier to remove the noise embedded in the noisy speech signal.
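     To make the three stages concrete, the following is a minimal Python sketch of a transform-domain enhancer built around the DFT. The Hann window, frame and hop sizes, and the simple Wiener-style gain driven by a per-bin noise PSD estimate (noise_psd, of length frame_len//2 + 1) are illustrative assumptions, not parameters taken from this thesis.

```python
import numpy as np

def enhance(noisy, noise_psd, frame_len=256, hop=128):
    """Generic transform-domain enhancement: analysis, suppression, synthesis."""
    window = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len, hop):
        frame = noisy[start:start + frame_len] * window
        # 1) Analysis: transform the frame into some domain (here the DFT).
        spectrum = np.fft.rfft(frame)
        # 2) Suppression: multiply by a gain that attenuates the noise.
        #    This Wiener-style gain from an SNR estimate is an illustrative choice.
        snr = np.maximum(np.abs(spectrum) ** 2 / noise_psd - 1.0, 1e-3)
        gain = snr / (1.0 + snr)
        spectrum *= gain
        # 3) Synthesis: inverse transform and overlap-add back to the time domain.
        out[start:start + frame_len] += np.fft.irfft(spectrum) * window
    return out
```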
     After a general presentation of speech enhancement algorithms, chapter three addresses the problem of a priori signal-to-noise ratio (SNR) estimation in the DFT domain. The well-known decision-directed (DD) approach drastically limits the level of musical noise, but its a priori SNR estimate is biased because it depends on the speech spectrum estimated in the previous frame; the gain function therefore matches the previous frame rather than the current one, which degrades the noise reduction performance. The two-step noise reduction (TSNR) technique recently proposed by Plapous removes this bias, but its performance depends on the choice of the gain function, and its a priori SNR estimate still cannot reduce the residual musical noise to the lowest level. To remove the bias of both approaches, a modified two-step a priori SNR estimator in the style of the TSNR method is proposed. In its second step, the proposed approach directly computes the squared clean speech component from the a priori SNR estimated by the DD approach, so the result is not tied to any particular gain function; the drawback of the TSNR method is thus removed while the advantages of the DD approach are retained. Experimental results show the improved performance of the proposed approach under different noise conditions.
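     A minimal per-frame sketch of the two baseline estimators discussed above, in the usual notation (a posteriori SNR gamma = |Y|^2 / lambda_d), may make the bias clearer. The smoothing constant alpha = 0.98 and the Wiener gain in the second step are the standard textbook choices; the thesis's modified estimator computes the squared clean component directly and so avoids the gain-function dependence illustrated here.

```python
import numpy as np

def dd_then_second_step(noisy_power, noise_psd, prev_clean_power, alpha=0.98):
    """A priori SNR for one frame: DD first step, then a TSNR-style refinement."""
    post_snr = noisy_power / noise_psd               # a posteriori SNR, per bin
    # Step 1: decision-directed (Ephraim-Malah) estimate; the term built on
    # prev_clean_power = |S(k, l-1)|^2 is what biases it toward the past frame.
    xi_dd = alpha * prev_clean_power / noise_psd \
            + (1.0 - alpha) * np.maximum(post_snr - 1.0, 0.0)
    # Step 2: re-estimate the clean speech power with a gain built from xi_dd,
    # which removes the one-frame delay of the DD estimate but ties the result
    # to the chosen gain rule (here Wiener).
    gain = xi_dd / (1.0 + xi_dd)
    xi_tsnr = (gain ** 2) * post_snr
    return xi_tsnr
```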
     In chapter four, after analyzing why the DCT outperforms the DFT for noisy speech enhancement and examining the shortcomings of classic DCT-based algorithms, the statistical correlation of successive speech components across time is investigated in the DCT domain. It is found that speech components in successive frames are significantly correlated, and that the correlation coefficient follows a cosine-shaped distribution along the frequency index. Based on this result, a novel single-microphone algorithm using the DCT and a Wiener filter is proposed. The algorithm does not rely on any speech signal model; it attains the optimal estimate of the clean speech components from successive noisy components via minimum mean square error estimation in the DCT domain, thereby overcoming the independence assumption that classic methods impose on speech components. Simulation results demonstrate that the proposed algorithm performs well in both objective and subjective tests with different kinds of noise.
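     The following sketch illustrates how such an MMSE estimate can use two successive frames: for each DCT bin, a two-tap Wiener filter is solved from an assumed-known clean variance var_s, noise variance var_n, and inter-frame correlation coefficient rho. Treating the noise as uncorrelated across frames, and the argument names themselves, are assumptions of this sketch, not the thesis's exact formulation.

```python
import numpy as np

def two_frame_wiener(y_curr, y_prev, var_s, var_n, rho):
    """Per-bin MMSE (Wiener) estimate of a clean DCT coefficient from the
    current and previous noisy coefficients of the same bin."""
    s_est = np.empty_like(y_curr)
    for k in range(len(y_curr)):
        # Covariance of the noisy observations [Y(k,l), Y(k,l-1)]; the noise
        # is assumed uncorrelated across frames, so it only adds var_n on the
        # diagonal.
        R_yy = np.array([[var_s[k] + var_n[k], rho[k] * var_s[k]],
                         [rho[k] * var_s[k],   var_s[k] + var_n[k]]])
        # Cross-covariance between the clean S(k,l) and the observations.
        r_sy = np.array([var_s[k], rho[k] * var_s[k]])
        w = np.linalg.solve(R_yy, r_sy)              # Wiener weights
        s_est[k] = w @ np.array([y_curr[k], y_prev[k]])
    return s_est
```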
     Most research on noisy speech enhancement rests on the basic assumption that, in the transform domain, the coefficients of both the clean speech and the noise are jointly zero-mean Gaussian random variables. This Gaussian assumption is motivated by the central limit theorem, since these coefficients are simply weighted sums of a large number of speech samples. In chapter five, however, we show that in the DCT domain the Laplacian distribution fits the DCT coefficients of clean speech better than the conventional Gaussian distribution, and on this basis we give MMSE and ML estimators for speech enhancement employing the Laplacian-Gaussian mixture model proposed by Gazor, which yields better noise reduction than methods built on the Gaussian model. In that approach, however, the Laplacian factor of the clean speech is estimated from the noisy speech rather than the clean speech, so the resulting factor is inaccurate because of the interfering noise energy. To improve the performance further, we present two novel approaches to Laplacian factor estimation based on the properties of the generalized Gaussian distribution model and its shape parameter. The proposed approaches estimate the Laplacian factor indirectly through its relation to the variance of the clean speech components under the Laplacian assumption, while keeping the resulting method simple. The algorithms are unaffected by the noise components and estimate the Laplacian factor accurately. Experimental results show the improved performance of the proposed algorithms compared with the original method.
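     Under the Laplacian density p(s) = exp(-|s|/b) / (2b), the variance satisfies var_s = 2 b^2, so the factor b can be recovered indirectly from a clean-variance estimate. The sketch below, with hypothetical argument names, uses simple noise-variance subtraction to remove the influence of the noise energy; it illustrates the indirect relation the chapter exploits, not the exact estimators based on the GGD shape parameter proposed there.

```python
import numpy as np

def laplacian_factor(y, var_n, eps=1e-10):
    """Estimate the Laplacian factor b of clean DCT speech coefficients
    from noisy coefficients y (bins x frames) and a noise variance var_n."""
    var_y = np.mean(y ** 2, axis=-1)          # per-bin variance of the
                                              # zero-mean noisy coefficients
    var_s = np.maximum(var_y - var_n, eps)    # remove the noise energy;
                                              # eps is an illustrative floor
    return np.sqrt(var_s / 2.0)               # b such that var_s = 2 b^2
```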
     In chapter six, we address multi-channel speech enhancement in the subspace domain. The subspace-based method introduced by Ephraim and Van Trees is an optimal estimator that minimizes the speech distortion subject to the constraint that the residual noise falls below a preset threshold, but the major drawback of single-channel subspace noise reduction is the musical noise it incurs. Multi-channel methods perform well for noisy speech enhancement, but they often need a large number of microphones. To cope with the drawbacks of both classes, a combination of the single- and multi-channel techniques was proposed in [116]: a multi-channel system with a post-filter derived from the signal subspace decomposition, in which the covariance matrices required to design the filter are estimated from data gathered by the different microphones. This method, however, has a major drawback: it assumes the noise is white. To extend the work to colored noise, chapter six proposes an improved multi-channel speech enhancement approach based on the signal subspace. By simultaneously diagonalizing the overall covariance matrices of the clean speech and the noise observed by the microphone array, the proposed algorithm estimates the clean speech subspace without any assumption on the stochastic properties of the noise. It relies on no signal model and efficiently attains the optimal estimate of speech corrupted by colored noise, overcoming the original method's restriction to white noise. Simulation results demonstrate that the algorithm performs well in both objective and subjective tests.
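     The joint diagonalization itself reduces to a generalized eigenvalue problem, as the following sketch shows. The eigendomain gain with parameter mu is an illustrative suppression rule, not the thesis's exact estimator.

```python
import numpy as np
from scipy.linalg import eigh

def subspace_filter(R_y, R_n, mu=1.0):
    """Linear estimator from joint diagonalization of the noisy-speech and
    noise covariance matrices, valid for colored noise."""
    # Solve R_y v = lam R_n v; the returned V satisfies V.T @ R_n @ V = I
    # and V.T @ R_y @ V = diag(lam), i.e. it diagonalizes both matrices.
    lam, V = eigh(R_y, R_n)
    # Since R_y = R_s + R_n, the clean speech contributes lam - 1 in this
    # domain; clipping at zero identifies the speech subspace.
    lam_s = np.maximum(lam - 1.0, 0.0)
    G = np.diag(lam_s / (lam_s + mu))         # eigendomain suppression gain
    V_inv_T = np.linalg.inv(V.T)              # maps back to the signal space
    return V_inv_T @ G @ V.T                  # estimator H: s_hat = H @ y
```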
     The subspace technique requires an accurate estimate of the eigenvalues and eigenvectors of the noisy speech covariance matrix. These are commonly computed by the KLT after the covariance matrix has been estimated, a very time-consuming process; moreover, since speech is not a stationary process, performance can be improved by adaptive subspace tracking. In chapter seven, we adopt the projection approximation subspace tracking method introduced by Yang, an adaptive KLT that tracks the eigenvectors of the covariance matrix with an RLS algorithm. We then investigate the probability distribution of the speech components and the correlation between adjacent components in the adaptive KLT domain, and present a new speech model for enhancing noisy speech that takes the time correlation between speech components into account. Based on this model, a novel speech enhancement algorithm using MAP estimation is proposed; it incorporates inter-frame correlation, in the form of a joint probability density function, into the MAP estimator under a Gaussian model for the speech and noise components. The resulting estimator remains simple and avoids the deficiencies of classic approaches to noisy speech enhancement in the adaptive KLT domain. In simulations with speech degraded by various noises, the proposed algorithm shows improved performance on a number of objective and subjective measures.
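     For reference, one RLS update of Yang's PAST recursion might look as follows; W is the n-by-r basis being tracked, P the r-by-r inverse correlation matrix of the projected data, and beta the forgetting factor. PAST as written tracks the dominant subspace; the deflation variant PASTd would track individual eigenvectors.

```python
import numpy as np

def past_update(W, P, x, beta=0.99):
    """One step of projection approximation subspace tracking (PAST)."""
    y = W.T @ x                        # project the new sample onto the basis
    h = P @ y
    g = h / (beta + y @ h)             # RLS gain vector
    P = (P - np.outer(g, h)) / beta    # update inverse correlation of y
    e = x - W @ y                      # projection approximation error
    W = W + np.outer(e, g)             # rotate the basis toward the new data
    return W, P
```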
     In chapter eight, conclusions of our work are drawn.
References
[1] H. L. Van-Trees. Detection, estimation, and modulation theory. New York: Wiley, 1968.
    [2] B. Gold and L. R. Rabiner. Parallel processing techniques for estimating pitch periods of speech in the time domain. The Journal of the Acoustical Society of America, 1969, 46(2B): 442-448.
    [3] M. R. Sambur. Adaptive noise canceling for speech signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1978, 26(5): 419-423.
    [4] T. W. Parsons. Separation of speech from interfering speech by means of harmonic selection. The Journal of the Acoustical Society of America, 1976, 60(4): 911-918.
    [5] B. Widrow. Adaptive noise cancelling: principles and applications. In Proceedings of the IEEE, 1975, 63(12): 1692-1716.
    [6] R. H. Frazier, S. Samsam, L. D. Braida and A. V. Oppenheim. Enhancement of speech by adaptive filtering. In Proceedings of IEEE ICASSP’76, 1976, 1: 251-253.
    [7] R. J. McAulay and M. L. Malpass. Speech enhancement using a soft-decision noise suppression filter. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980, 28(2): 137-145.
    [8] R. D. Preuss. Evaluation of a correlation subtraction method for enhancing speech degraded by additive white noise. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1978, 26(5): 471-472.
    [9] S. F. Boll. Suppression of acoustics noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979, 27(2): 113-120.
    [10] M. Berouti, R. Schwartz and J. Makhoul. Enhancement of speech corrupted by acoustic noise. In Proceedings of IEEE ICASSP’79, 1979, 4: 208-211.
    [11] Y. Ephraim, D. Malah and B. H. Juang. On the application of hidden Markov models for enhancing noisy speech. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989, 37(12): 1846-1856.
    [12] Y. Ephraim and H. L. Van-Trees. A signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing, 1995, 3(4): 251-266.
    [13] K. Y. Lee, S. McLaughlin and K. Shirai. Speech enhancement based on neural predictive hidden Markov model. Signal Processing, 1998, 65(3): 373-381.
    [14] L. P. Ainsleigh and C. K. Chui. A B-wavelet-based noise reduction algorithm. IEEE Transactions on Signal Processing, 1996, 44(5): 1279-1284.
    [15] Z. Goh, K. Tan and T. G. Tan. Postprocessing method for suppressing musical noise generated by spectral subtraction. IEEE Transactions on Speech and Audio Processing, 1998, 6(3): 287-292.
    [16] N. Virag. Single channel speech enhancement based on masking properties of the human auditory system. IEEE Transactions on Speech and Audio Processing, 1999, 7(2): 126-137.
    [17] Z. L. Yu and M. H. Er. Robust subspace analysis and its application in microphone array for speech enhancement. IEICE Transactions on Fundamentals, 2005, E88-A(7): 1708-1715.
    [18] T. Sekiya and T. Kobayashi. Speech enhancement based on multiple directivity patterns using a microphone array. In Proceedings of IEEE ICASSP’04, 2004, 1: 877-880.
    [19] E. Visser and T. W. Lee. Speech enhancement using blind source separation and two-channel energy based speech detection. In Proceedings of IEEE ICASSP’03, 2003. 884-887.
    [20] L. Y. Siow and S. Nordholm. A hybrid speech enhancement system employing blind source separation and adaptive noise cancellation. In Proceedings of IEEE NORSIG, 2004. 204-207.
    [21] K. Kokkinakis and A. K. Nandi. Multichannel blind deconvolution for source separation in convolutive mixtures of speech. IEEE Transactions Audio Speech and Language Processing, 2006, 14(1): 200-212.
    [22] I. Y. Soon and S. N. Koh. Speech enhancement using 2-D Fourier transform. IEEE Transactions on Speech and Audio Processing, 2003, 11(6): 717-724.
    [23] C. H. You, S. N. Koh and S. Rahardja. Subband Kalman filtering incorporating masking properties for noisy speech signal. Speech Communication, 2007, 49(7): 558-573.
    [24] Y. Nagata, T. Fujioka and M. Abe. Speech enhancement based on auto gain control. IEEE Transactions on Audio Speech and Language Processing, 2006, 14(1): 177-190.
    [25] I. Cohen. On the decision- directed estimation approach of Ephraim and Malah. In Proceedings of IEEE ICASSP’04, 2004, 1: 293-296.
    [26] N. S. Kim and J. H. Chang. Spectral enhancement based on global soft decision. IEEE Signal Processing Letters, 2000, 7(5): 108-110.
    [27] R. Martin. Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Transactions on Speech and Audio Processing, 2001, 9(5): 504-512.
    [28] M. E. Hamid and T. Fukabayashi. A two-stage method for single-channel speech enhancement. IEICE Transactions on Fundamentals, 2006, E89-A(4): 1058-1068.
    [29] R. Monzingo and T. Miller. Introduction to Adaptive Arrays. New York: Wiley, 1980.
    [30] M. Brandstein and D. Ward. Microphone arrays. Germany: Springer, 2001.
    [31] O. Hoshuyama, A. Sugiyama and A. Hirano. A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters. IEEE Transactions on Signal Processing, 1999, 47(10): 2677–2684.
    [32] M. Brandstein. An event-based method for microphone array speech enhancement. In Proceedings of IEEE ICASSP’99, 1999. 953-956.
    [33] F. Asano, S. Hayamizu, T. Yamada and S. Nakamura. Speech enhancement based on the subspace method. IEEE Transactions on Speech and Audio Processing, 2000, 8(5):497-507.
    [34] I. A. McCowan and H. Bourlard. Microphone array post-filter based on noise field coherence. IEEE Transactions on Speech and Audio Processing, 2003, 11(6): 709-716.
    [35] S. Doclo and M. Moonen. Multimicrophone noise reduction using recursive GSVD-based optimal filtering with ANC postprocessing stage. IEEE Transactions on Speech and Audio Processing, 2005, 13(1): 53-69.
    [36] C. Marro, Y. Mahieux and K. U. Simmer. Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering. IEEE Transactions on Speech and Audio Processing, 1998, 6(3): 240-259.
    [37] L. A. McCowan and H. Bourlard. Microphone array post-filter based on noise field coherence. IEEE Transactions on Speech and Audio Processing, 2003, 11(6): 709-716.
    [38] S. Gannot and I. Cohen. Speech enhancement based on the general transfer function GSC and postfiltering. IEEE Transactions on Speech and Audio Processing, 2004, 12(6): 561-571.
    [39] Y. Yang, Y. Yang and D. Yu. Research on microphone arrays and their noise-reduction performance. Computer Engineering, 2006, 32(2): 191-193. (in Chinese)
    [40] O. Hong. Microphone array speech enhancement technology and its applications. Sensors and Instrumentation, 2006, 22(1): 142-145. (in Chinese)
    [41] M. Gabrea. Speech signal recovery in colored noise using an adaptive Kalman filtering. In Proceedings of IEEE CCECE, 2002, 2: 974-979.
    [42] M. Gabrea. Robust adaptive Kalman filtering-based speech enhancement algorithm. In Proceedings of IEEE ICASSP’04, 2004, 1: 301-304.
    [43] Y. Tsukamoto and A. Kawamura. Speech enhancement based on MAP estimation using a variable speech distribution. IEICE Transactions on Fundamentals, 2007, E90-A(4): 1587-1593.
    [44] U. Mittal and N. Phamdo. Signal/noise KLT based approach for enhancing speech degraded by colored noise. IEEE Transactions on Speech and Audio Processing, 2000, 8(2): 159-167.
    [45] M. Y. Kim and W. Bastiaan. KLT-based adaptive classified VQ of the speech signal. IEEE Transactions on Speech and Audio Processing, 2004, 12(3): 277-289.
    [46] X. Zou, L. Chen and X. Zhang. A speech enhancement algorithm based on a Gamma speech model. Journal on Communications, 2006, 27(10): 118-123. (in Chinese)
    [47] X. Li, H. Xie and R. Zhang. Speech enhancement based on the discrete cosine transform. Journal of Harbin Engineering University, 2007, 28(2): 198-202. (in Chinese)
    [48] C. A. Medina and A. Alcaim. Using neural networks wavelet denoising of speech for threshold selection. Electronics Letters, 2003, 39(25): 1869-1871.
    [49] Y. Hu and P. C. Loizou. Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Transactions on Speech and Audio Processing, 2004, 12(1): 59- 67.
    [50] S. Y. Low, S. Nordholm and R. Togneri. Convolutive blind signal separation with post-processing. IEEE Transactions on Speech and Audio Processing, 2004, 12(5): 539-548.
    [51] H. M. Park, H. S. Oh and S. Y. Lee. Adaptive noise canceling based on independent component analysis. Electronics Letters, 2002, 38(15): 832-833.
    [52] J. Sohn, N. S. Kim and W. Sung. A statistical model-based voice activity detection. IEEE Signal Processing Letters, 1999, 6(1): 1-3.
    [53] S. Gazor and W. Zhang. A soft voice activity detector based on a Laplacian-Gaussian model. IEEE Transactions on Speech and Audio Processing, 2003, 11(5): 498-505.
    [54] L. Rabiner and M. Sambur. Voiced-unvoiced-silence detection using the Itakura LPC distance measure. In Proceedings of IEEE ICASSP’77, 1977. 323-326.
    [55] J. A. Haigh and J. S. Mason. Robust voice activity detection using cepstral features. In Proceedings of IEEE TENCON, 1993. 321-324.
    [56] J. Ramírez, J. C. Segura and C. Benitez. Efficient voice activity detection algorithms using long-term speech information. Speech Communication, 2004, 42(3-4): 271-287.
    [57] W. Xu, Q. Ding and B. Wang. A speech endpoint detection algorithm based on feature-space energy entropy. Journal on Communications, 2003, 24(11): 125-132. (in Chinese)
    [58] S. G. Tanyer and H. Ozer. Voice activity detection in non-stationary noise. IEEE Transactions on Speech and Audio Processing, 2000, 8(4): 478-482.
    [59] F. Beritelli, S. Casale and S. Serrano. A low-complexity speech-pause detection algorithm for communication in noisy environments. European Transactions on Telecommunications, 2004, 15(1): 33-38.
    [60] M. Li and J. Li. A correlation-based voice activity detector. Audio Engineering, 1995, 11(11): 6-9. (in Chinese)
    [61] X. Jiang, T. Yao and H. Fu. Single-channel speech enhancement based on minimum statistics and the masking effect. Journal on Communications, 2003, 24(6): 23-1. (in Chinese)
    [62] S. Lu and L. Shi. A speech enhancement method based on auditory simulation. Acta Acustica, 1996, 21(6): 879-883. (in Chinese)
    [63] J. Zhang, Z. Cao and Z. Ma. A speech enhancement method based on the auditory masking effect. Journal of Tsinghua University (Science and Technology), 2001, 41(7): 1-4. (in Chinese)
    [64] H. Cai and B. Yuan. A speech enhancement algorithm based on an auditory masking model. Journal on Communications, 2003, 23(8): 93-98. (in Chinese)
    [65] X. Ma, F. Yin, X. Lu, et al. A wavelet-transform-based speech enhancement method for microphone arrays. Journal of Dalian University of Technology, 2003, 43(4): 824-839. (in Chinese)
    [66] Z. Meng. A multi-scale, multi-threshold speech enhancement method based on the wavelet transform. Journal of Wuhan University of Technology, 2001, 25(2): 209-212. (in Chinese)
    [67] S. Araki, R. Mukai, S. Makino, T. Nishikawa and H. Saruwatari. The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech. IEEE Transactions on Speech and Audio Processing, 2003, 11(2): 109-115.
    [68] T. Nishikawa, H. Saruwatari, K. Shikano, S. Araki and S. Makino. Multi-stage ICA for blind source separation of real acoustic convolutive mixture. In Proceedings of ICA’03, 2003. 523-528.
    [69] N. Mitianoudis and M. E. Davies. Audio source separation of convolutive mixtures. IEEE Transactions on Speech and Audio Processing, 2003, 11(5): 489-497.
    [70] K. Rahbar and J. P. Reilly. A frequency domain method for blind source separation of convolutive audio mixtures. IEEE Transactions on Speech and Audio Processing, 2005, 13(5): 832-844.
    [71] S. Y. Low, S. Nordholm and R. Togneri. Convolutive blind signal separation with post-processing. IEEE Transactions on Speech and Audio Processing, 2004, 12(5): 539-548.
    [72] J. F. Cardoso. Blind signal separation: statistical principles. In Proceedings of the IEEE, 1998, 86(10): 2009-2025.
    [73] P. Smaragdis. Blind separation of convolved mixtures in the frequency domain. Neurocomputing, 1998, 22(1-3): 21-34.
    [74] R. Mukai, S. Araki, H. Sawada and S. Makino. Evaluation of separation and dereverberation performance in frequency domain blind source separation. Acoustical Science and Technology, 2004, 25(2): 119-126.
    [75] J. Anemüller and B. Kollmeier. Amplitude modulation decorrelation for convolutive blind source separation. In Proceedings of ICA’00, 2000. 215-220.
    [76] N. Murata, S. Ikeda and A. Ziehe. An approach to blind source separation based on temporal structure of speech signals. Neurocomputing, 2001, 41(1-4): 1-24.
    [77] S. Kurita, H. Saruwatari, S. Kajita, K. Takeda and F. Itakura. Evaluation of blind signal separation method using directivity pattern under reverberant conditions. In Proceedings of IEEE ICASSP’00, 2000. 3140-3143.
    [78] M. Z. Ikram and D. R. Morgan. A beamforming approach to permutation alignment for multichannel frequency-domain blind speech separation. In Proceedings of IEEE ICASSP’02, 2002. 881-884.
    [79] H. Saruwatari, S. Kurita and K. Takeda. Blind source separation combining independent component analysis and beamforming. EURASIP Journal on Applied Signal Processing, 2003, 11: 1135-1146.
    [80] R. Mukai, H. Sawada, S. Araki and S. Makino. Frequency domain blind source separation for many speech signals. In Proceedings of ICA’04, 2004. 461-469.
    [81] R. Mukai, H. Sawada, S. Araki and S. Makino. Frequency domain blind source separation using small and large spacing sensor pairs. In Proceedings of ISCAS’04, 2004. 1-4.
    [82] H. Sawada, R. Mukai, S. Araki and S. Makino. A robust and precise method for solving permutation problem of frequency-domain blind source separation. IEEE Transactions on Speech and Audio Processing, 2004, 12(5): 530-538.
    [83] L. Parra and C. Spence. Convolutive blind separation of non-stationary sources. IEEE Transactions on Speech and Audio Processing, 2000, 8(3): 320-327.
    [84] L. Schobben and W. Sommen. A frequency domain blind signal separation method based on decorrelation. IEEE Transactions on Signal Processing, 2002, 50(8): 1855-1865.
    [85] A. Zhang, T. Qiu and X. Zhang. Research on frequency-domain blind separation of convolutive mixtures. Journal of Dalian University of Technology, 2004, 44(5): 723-728. (in Chinese)
    [86] W. Jiang, J. Lu, H. Zhang and M. Gao. Blind source separation of speech signals based on the amplitude correlation of adjacent frequency bins. Journal of Circuits and Systems, 2005, 10(3): 1-4. (in Chinese)
    [87] S. Ding, J. Huang, D. Wei and A. Cichocki. A near real-time approach for convolutive blind source separation. IEEE Transactions on Circuits and Systems, 2006, 53(1): 114-128.
    [88] X. Zhang, J. Liu and D. Feng. A fast frequency-domain blind speech separation system. Signal Processing, 2005, 21(5): 434-438. (in Chinese)
    [89] S. Makino, H. Sawada, R. Mukai and S. Araki. Blind source separation of convolutive mixtures of speech in frequency domain. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2005, E88-A(7): 1640-1655.
    [90] R. Prasad, H. Saruwatari and K. Shikano. Blind separation of speech by fixed-point ICA with source adaptive negentropy approximation. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2005, E88-A(7): 1683-1692.
    [91] K. Kokkinakis and A. K. Nandi. Multichannel blind deconvolution for source separation in convolutive mixtures of speech. IEEE Transactions on Audio Speech and Language Processing, 2006, 14(1): 200-212.
    [92] J. R. Deller, J. Hansen and J. G. Proakis. Discrete-time processing of speech signals. New York: IEEE Press, 2000.
    [93] S. R. Quackenbush, T. P. Barnwell and M. A. Clements. Objective measures of speech quality. Englewood Cliffs, New Jersey: Prentice-Hall, 1988.
    [94] Y. Ephraim and D. Malah. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984, 32 (6): 1109-1121.
    [95] P. Scalart and J. V. Filho. Speech enhancement based on a priori signal to noise estimation. In Proceedings of IEEE ICASSP’96. 1996. 629-632.
    [96] G. Rong, L. Zhang and X. Wu. A speech enhancement algorithm with a priori SNR estimation based on iterative spectral gain. Journal of Nanjing University of Posts and Telecommunications (Natural Science), 2007, 27(5): 56-59. (in Chinese)
    [97] Z. Chen, Q. Zeng and Q. Liu. A joint frequency-domain speech enhancement algorithm with adaptive a priori SNR estimation. Journal of Electronics and Information Technology, 2007, 29(2): 439-442. (in Chinese)
    [98] S. Ogata and T. Shimamura. Reinforced spectral subtraction method to enhance speech signal. Electrical and Electronic Technology, 2001, 8(1): 242-245.
    [99] Y. S. Park and J. H. Chang. A novel approach to a robust a priori SNR estimator in speech enhancement. IEICE Transactions on Communication, 2007, E90-B(8): 2182-2185.
    [100] C. Plapous, C. Marro and P. Scalart. Improved signal-to-noise ratio estimation for speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, 2006, 14(6): 2098-2108.
    [101] O. Cappé. Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Transactions on Speech and Audio Processing, 1994, 2(2): 345-349.
    [102] J. Wexler and S. Raz. Discrete Gabor expansions. Signal Processing, 1990, 21(3): 207-220.
    [103] S. Kay. Fundamentals of statistical signal processing: estimation theory. Upper Saddle River, New Jersey: Prentice-Hall, 1993.
    [104] Y. Ephraim and D. Malah. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1985, 33(2): 443-445.
    [105] P. C. Loizou. Speech enhancement based on perceptually motivated Bayesian estimators of magnitude spectrum. IEEE Transactions on Speech and Audio Processing, 2005, 13(5): 857-869.
    [106] C. H. You, S. N. Koh and S. Rahardja. Beta-order MMSE spectral amplitude estimation for speech enhancement. IEEE Transactions on Speech and Audio Processing, 2005, 13(4): 475-486.
    [107] I. S. Gradshteyn and I. M. Ryzhik. Table of integrals, series, and products. New York: Academic, 1980.
    [108] I. Y. Soon, S. N. Koh and C. K. Yeo. Noisy speech enhancement using discrete cosine transform. Speech Communication, 1998, 24(3): 249-257.
    [109] J. H. Chang. Warped discrete cosine transform-based noisy speech enhancement. IEEE Transactions on Circuits and Systems II: Express Briefs, 2005, 52(9): 535-539.
    [110] S. Gazor and W. Zhang. Speech probability distribution. IEEE Signal Processing Letters, 2003, 10(7): 204-207.
    [111] S. Gazor and W. Zhang. Speech enhancement employing Laplacian-Gaussian mixture. IEEE Transactions on Speech and Audio Processing, 2005, 13(5): 896-904.
    [112] S. Gazor. Employing Laplacian-Gaussian densities for speech enhancement. In Proceedings of IEEE ICASSP’04, 2004. 297-230.
    [113] N. Ahmed, T. Natarajan and K. R. Rao. Discrete cosine transform. IEEE Transactions on Computers, 1974, C-23: 90-93.
    [114] W. H. Chen, C. H. Smith and S. C. Fralick. A fast computational algorithm for the discrete cosine transform. IEEE Transactions on Communication, 1977, COM-25(9): 1004-1009.
    [115] M. J. Narasimha and A. M. Peterson. On the computation of the discrete cosine transform. IEEE Transactions on Communication, 1978, COM-26(6): 934-936.
    [116] F. Jabloun and B. Champagne. A multi-microphone signal subspace approach for speech enhancement. In Proceedings of IEEE ICASSP’01, 2001, 1: 205-208.
    [117] T. H. Dat, K. Takeda and F. Itakura. Gamma modeling of speech power and its on-line estimation for statistical speech enhancement. IEICE Transactions on Information and System, 2006, E89-D(3): 1040-1049.
    [118] R. Martin. Speech enhancement based on minimum mean-square error estimation and supergaussian priors. IEEE Transactions on Speech and Audio Processing, 2005, 13(5): 845-856.
    [119] J. W. Shin, J. Chang and N. S. Kim. Statistical modeling of speech signals based on generalized Gamma distribution. IEEE Signal Processing Letters, 2005, 12(3): 258-261.
    [120] N. Farvardin and J. W. Modestino. Optimum quantizer performance for a class of non-Gaussian memoryless sources. IEEE Transactions on Information Theory, 1984, 30 (3): 485-497.
    [121] A. Cichocki and S. Amari. Adaptive blind signal and image processing: learning algorithms and application. New York: Wiley, 2002.
    [122] F. Müller. Distribution shape of two-dimensional DCT coefficients of natural images. Electronics Letters, 1993, 29(22): 1935-1936.
    [123] R. L. Joshi and T. R. Fischer. Comparison of generalized Gaussian and Laplacian modeling in DCT image coding. IEEE Signal Processing Letters, 1995, 2(5): 81-82.
    [124] V. K. Goyal, J. Zhang and M. Vetterli. Transform coding with backward adaptive update. IEEE Transactions on Information Theory, 2000, 46(4): 1623-1633.
    [125] T. Murakami, T. Hoya and Y. Ishida. Speech enhancement by spectral subtraction based on subspace decomposition. IEICE Transactions on Fundamentals, 2005, E88-A(3): 690-701.
    [126] J. Jensen, R. Heusdens and S. H. Jensen. A perceptual subspace approach for modeling of speech and audio signals with damped sinusoids. IEEE Transactions on Speech and Audio Processing, 2004, 12(2): 121-132.
    [127] J. U. Kim, S. G. Kim and C. D. Yoo. The incorporation of masking threshold to subspace speech enhancement. In Proceedings of IEEE ICASSP’03, 2003, 1: 76-79.
    [128] K. Hermus and P. Wambacq. Assessment of signal subspace based speech enhancement for noise robust speech recognition. In Proceedings of IEEE ICASSP’04, 2004, 1: 945-949.
    [129] C. H. You, S. Rahardja and S. N. Koh. Audible noise reduction in eigendomain for speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, 2007, 15(6): 1753-1765.
    [130] P. Aarabi and G. Shi. Phase-based dual-microphone robust speech enhancement. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 2004, 34(4): 1763-1773.
    [131] T. Sekiya and T. Kobayashi. Speech enhancement based on multiple directivity patterns using a microphone array. In Proceedings of IEEE ICASSP’04, 2004, 1: 877-880.
    [132] S. Lefkimmiatis and P. Maragos. A generalized estimation approach for linear and nonlinear microphone array post-filters. Speech Communication, 2007, 49(8): 657-666.
    [133] G. Reuven, S. Gannot and I. Cohen. Joint noise reduction and acoustic echo cancellation using the transfer-function generalized sidelobe canceller. Speech Communication, 2007, 49(8): 623-635.
    [134] G. Kim and N. I. Cho. Frequency domain multi-channel noise reduction based on the spatial subspace decomposition and noise eigenvalue modification. Speech Communication, 2008, 50(5): 382-391.
    [135] H. Lev-Ari and Y. Ephraim. Extension of the signal subspace speech enhancement approach to colored noise. IEEE Signal Processing Letters, 2003, 10(4): 104-106.
    [136] A. Rezayee and S. Gazor. An adaptive KLT approach for speech enhancement. IEEE Transactions on Speech and Audio Processing, 2001, 9(2): 87-95.
    [137] Y. Hu and P. C. Loizou. A generalized subspace approach for enhancing speech corrupted by colored noise. IEEE Transactions on Speech and Audio Processing, 2003, 11(4): 334-341.
    [138] J. L. Flanagan. Computer-steered microphone arrays for sound transduction in large rooms. The Journal of the Acoustical Society of America, 1985, 78(5): 1508-1518.
    [139] W. Kellermann. A self-steering digital microphone array. In Proceedings of IEEE ICASSP’91, 1991. 3581-3584.
    [140] O. L. Frost. An algorithm for linearly constrained adaptive array processing. In Proceedings of the IEEE, 1972, 60(8): 926-935.
    [141] L. J. Griffiths and C. W. Jim. An alternative approach to linearly constrained adaptive beamforming. IEEE Transactions on Antennas and Propagation, 1982, 30(1): 27-34.
    [142] S. R. Searle. Matrix algebra useful for statistics. New York: Wiley, 1982.
    [143] B. Yang. Projection approximation subspace tracking. IEEE Transactions on Signal Processing, 1995, 43(1): 95-107.
    [144] V. Solo and X. Kong. Performance analysis of adaptive eigenanalysis algorithms. IEEE Transactions on Signal Processing, 1998, 46(3): 636-646.
    [145] T. Gustafsson. Instrumental variable subspace tracking using projection approximation. IEEE Transactions on Signal Processing, 1998, 46(3): 669-681.
    [146] S. Lü and N. He. Application of an improved PASTd algorithm in large adaptive arrays. Journal of Beijing University of Aeronautics and Astronautics, 2005, 31(9): 949-952. (in Chinese)
    [147] I. Cohen. Speech enhancement using a noncausal a priori SNR estimation. IEEE Signal Processing Letters, 2004, 11(9): 725-728.
    [148] I. Cohen. Relaxed statistical model for speech enhancement and a priori SNR estimation. IEEE Transactions on Speech and Audio Processing, 2005, 13(5): 870-881.
    [149] E. Zavarehei and S. Vaseghi. Speech enhancement in temporal DFT trajectories using Kalman filters. In Proceedings of Interspeech’05, 2005. 2077-2080.
    [150] E. Zavarehei and S. Vaseghi and Q. Yin. Speech enhancement using Kalman filters for restoration of short-time DFT trajectories. In Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding, 2005. 313-318.
    [151] E. Zavarehei, S. Vaseghi and Q. Yin. Inter-frame modeling of DFT trajectories of speech and noise for speech enhancement using Kalman filters. Speech Communication, 2006, 48(11): 1546-1555.
    [152] T. Lotter and P. Vary. Speech enhancement by MAP spectral amplitude estimation using a super-gaussian speech model. EURASIP Journal on Applied Signal Processing, 2005, 7(1): 1110–1126.
    [153] P. Wolfe and S. Godsill. Simple alternatives to the Ephraim and Malah suppression rule for speech enhancement. In Proceedings of the 11th IEEE Workshop on Statistical Signal Process, 2001. 496-499.
    [154] T. H. Dat, K. Takeda and F. Itakura. Generalized Gamma modeling of speech and its online estimation for speech enhancement. In Proceedings of IEEE ICASSP’05, 2005. 181-184.
    [155] T. Lotter and P. Vary. Noise reduction by maximum a posteriori spectral amplitude estimation with supergaussian speech model. In Proceedings of International Workshop on Acoustic Echo and Noise Control, 2003. 83-86.
