Research on Model-Based Speech Enhancement Methods and Quality Assessment
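Abstract
Speech enhancement algorithms can be divided into two classes according to how they process the speech signal: model-based methods and model-free methods. Compared with model-based methods, model-free methods fall short in several respects. Some of them require two microphones, one for the noise input and one for the speech input, which is usually hard to arrange, particularly in applications that need real-time processing such as hearing aids. A major drawback of model-free methods is that they must assume the noise is relatively stationary; when the noise changes too quickly, their results are unsatisfactory. Some model-free methods, such as the widely used spectral subtraction, also introduce musical noise. Model-based methods exploit the statistical properties or short-time correlation of the speech signal in the time domain to develop targeted noise-removal techniques. By the nature of their enhancement mechanism they avoid generating musical noise, and they can handle nonstationary noise well.
     Using stochastic signal processing as the theoretical tool and dynamic models to represent speech, this thesis studies several model-based speech enhancement methods with the aim of improving the performance of existing speech enhancement algorithms. It also studies subjective and objective methods for evaluating speech quality. The main contributions are as follows:
     1. Within a subband H∞ filtering framework, a single-channel speech enhancement method that incorporates the auditory masking properties of the human ear is proposed. The method requires no assumptions about the statistics of the excitation noise or the additive noise. The speech signal is decomposed into subband signals, and iterative H∞ filtering is used to estimate low-order AR parameters of each subband signal. A noise masking threshold is introduced when estimating the subband noise, which improves the H∞ filtering and reduces speech distortion. Simulation results show that the algorithm not only lowers the computational load but also achieves better enhancement in both subjective and objective tests.
     2. Because of differences in pronunciation or vocalization, different speakers produce the same phoneme at different energy levels, and an HMM cannot describe this variation explicitly. Within the HMM framework, the problem is addressed by parameterizing and modeling the speech gain. The speech HMM and the time-invariant gain parameters are obtained offline from training data, while the time-varying parameters are updated online from the observed noisy speech. The noisy speech is processed by a bank of parallel H∞ filters, and the clean-speech estimate is computed as the weighted sum of the filter outputs. The IMM (interacting multiple model) algorithm is introduced so that the parallel filters interact efficiently, improving enhancement performance without significantly increasing computational complexity. Experiments show that the proposed method removes background noise effectively while keeping the distortion of the processed speech low.
     3. For speech corrupted by colored noise, a single-channel speech enhancement method based on the unscented particle filter is proposed. Time-varying AR models are used for both the clean speech and the noise, and the unscented particle filter estimates the AR model parameters and filters out the colored noise. Unlike the proposal distributions chosen by most common particle filters, the unscented particle filter uses an unscented Kalman filter to generate its proposal distribution. Because the latest observation is taken into account when the particles are updated, the unscented particle filter can improve estimation performance with fewer particles than a conventional particle filter needs. Simulation results show that the algorithm enhances speech well against a colored-noise background.
     4. To predict the quality of speech processed by enhancement algorithms, the performance of several objective measures is evaluated. Clean speech is mixed with three types of noise and then processed by six classes of enhancement algorithms, and the objective measures described in the thesis are used to assess the distortion the algorithms introduce. Subjective tests of the enhanced speech follow the methodology of ITU-T P.835, which rates enhanced speech on three scales: speech signal distortion, background noise, and overall quality. Finally, multivariate adaptive regression analysis is used to derive a new composite objective measure that is highly correlated with subjective quality.
     5. A new objective speech quality assessment method based on a GMM-HMM and non-uniform linear prediction cepstral coefficients is proposed. Non-uniform linear prediction cepstral coefficients extracted from clean speech are used to train the GMM-HMM, which yields a reference model of clean speech. A consistency measure is then obtained between the reference model and the non-uniform linear prediction cepstral coefficient vectors of the distorted speech. Finally, a multivariate nonlinear regression model maps the consistency measure to subjective MOS, giving an objective model for predicting MOS. Experiments on objective quality evaluation with this model show that the proposed output-based objective speech quality assessment algorithm outperforms the algorithm in the ITU-T P.563 standard.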
Depending on how the signal is processed, current speech enhancement techniques fall into two major classes: model-free methods and model-based methods. Model-free techniques are deficient in several respects compared with model-based methods. Some model-free techniques need two microphones, one to record the noise and one to record the speech. This is usually impractical, especially in online enhancement applications such as hearing aids. A major source of the problems with model-free methods is the assumption that the noise is relatively stationary; their results are usually unsatisfactory when the noise characteristics change quickly. Furthermore, some model-free techniques (e.g. spectral subtraction) may introduce an audible "musical" artifact that acts as signal-dependent interference. Model-based methods estimate the clean speech signal in the time domain using the statistical or short-time correlation characteristics of speech. Their enhancement mechanism avoids the musical-noise problem from the outset, and some model-based methods perform reasonably well in nonstationary noise.
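The musical-noise artifact attributed to spectral subtraction can be illustrated with a toy example. The sketch below is a minimal magnitude-domain subtraction, not any specific published variant; the function name `spectral_subtract` and the numbers are made up for illustration. After the noise estimate is subtracted and negative values are clipped, a few isolated bins survive in each frame, and it is such scattered residual peaks that are heard as musical tones.

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, floor=0.0):
    """Subtract a fixed noise magnitude estimate from every frame, bin by bin.

    noisy_mag: array of shape (frames, bins) with magnitude spectra.
    noise_mag: array of shape (bins,) with the estimated noise magnitude.
    Values that would become negative are clipped to `floor`.
    """
    return np.maximum(noisy_mag - noise_mag, floor)

# Illustrative numbers only: two frames, three frequency bins.
noisy = np.array([[1.0, 0.4, 0.6],
                  [0.3, 0.7, 0.45]])
noise = np.array([0.5, 0.5, 0.5])
enhanced = spectral_subtract(noisy, noise)
```

Because the noise estimate is fixed while the actual noise fluctuates from frame to frame, which bins survive changes randomly over time, which is also why the stationarity assumption matters for this family of methods.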
     Using dynamic models of the speech signal and stochastic signal processing techniques, this dissertation studies several model-based methods with the aim of improving speech enhancement performance. It also explores subjective and objective evaluation of speech quality. The main contents of this thesis are as follows:
     1. A novel approach that incorporates the masking threshold into subband H∞ filtering is proposed for single-channel speech enhancement. No statistical assumptions have to be made about the driving process or the observation noise. Subband speech signals are obtained by subband decomposition, and an iterative H∞ filtering scheme is then used to estimate low-order autoregressive (AR) parameters. The masking threshold of each subband is introduced when estimating the noise, which improves on conventional H∞ filtering and reduces speech distortion. Simulation results show that the proposed method not only reduces computational complexity but also performs better in both objective and subjective tests.
     2. An HMM cannot explicitly model the different energy levels at which a phone is spoken, which typically arise from differences in pronunciation and/or vocalization between individual speakers. This thesis addresses the problem by parameterizing and modeling the speech gain within the HMM framework. Through the introduction of gain variables, energy variation in speech is modeled in a unified framework. The time-invariant parameters of the speech gain models are obtained offline from training data, together with the remaining HMM parameters, and the time-varying parameters are estimated online from the observed noisy speech signal. The noisy speech is filtered by a fixed number of parallel H∞ filters, and the clean-speech estimate is obtained as the weighted sum of the filter outputs. Because the IMM (interacting multiple model) algorithm handles the interactions between the parallel filters efficiently, enhancement performance improves without much increase in complexity. The results show that the proposed method significantly reduces background noise and introduces less speech distortion than conventional algorithms.
     3. For speech corrupted by colored noise, a novel single-microphone speech enhancement algorithm based on the unscented particle filter (UPF) is proposed. Speech and noise are both modeled with time-varying autoregressive (TVAR) models. The unscented particle filter is applied to estimate the AR model parameters and to filter out the colored noise. Instead of the most popular choices of proposal distribution, the unscented particle filter uses an unscented Kalman filter (UKF) to generate the importance proposal distribution. This lets the particle filter incorporate the latest observations when updating the particles, so that estimation performance improves with fewer particles. Simulation results demonstrate that the proposed algorithm performs well in the presence of colored noise.
     4. The performance of several objective measures is evaluated in terms of predicting the quality of noisy speech enhanced by noise suppression algorithms. The objective measures are assessed over a wide range of distortions introduced by six classes of speech enhancement algorithms applied to speech corrupted by three types of real-world noise. The subjective quality ratings were obtained using the ITU-T P.835 methodology, which is designed to evaluate enhanced speech along three dimensions: signal distortion, noise distortion, and overall quality. The thesis reports the correlations of the objective measures with these three subjective rating scales. A new composite objective measure is proposed that combines the individual objective measures using multivariate adaptive regression analysis. The composite measure correlates very well with subjective quality.
     5. A novel approach to output-based speech quality evaluation based on the Non-uniform Linear Prediction Cepstrum (NLPC) and a GMM-HMM is proposed. First, spectrum warping is performed by applying the Bark Bilinear Transform (BBT) to a uniform frequency grid, producing a grid that reflects the non-uniform frequency resolution of the human ear. The NLPC is then computed by applying linear prediction to the warped spectrum. A GMM-HMM trained on features extracted from clean speech signals serves as a model of normative behavior. A measure of consistency between the coefficient vectors of the degraded speech and the clean-speech model is obtained. Finally, a multivariate nonlinear regression model maps the consistency measure to the subjective Mean Opinion Score (MOS), yielding an objective predictor of MOS. Simulation results indicate that the proposed output-based objective quality measure performs better than the algorithm in the ITU-T P.563 standard.
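Item 1 relies on an AR model of speech filtered by an H∞ filter. As a rough illustration, the sketch below writes an AR(p) model in state-space form and applies one recursion of a commonly used textbook form of the discrete-time H∞ filter. The abstract does not give the thesis's own recursion, so the equations, the function names `ar_state_space` and `hinf_step`, the identity weighting on the estimation error, and all numeric values are assumptions. In this form, setting `theta = 0` reduces the recursion to the Kalman one-step predictor.

```python
import numpy as np

def ar_state_space(ar_coeffs):
    """State-space matrices for s[n] = sum_k a[k] * s[n-1-k] + w[n].

    The state holds the p most recent samples, newest first.
    """
    a = np.asarray(ar_coeffs, dtype=float)
    p = a.size
    F = np.zeros((p, p))
    F[0, :] = a                     # predict the newest sample from the history
    F[1:, :-1] = np.eye(p - 1)      # shift the older samples down by one slot
    H = np.zeros((1, p))
    H[0, 0] = 1.0                   # the observation sees only the newest sample
    return F, H

def hinf_step(x, P, y, F, H, Q, R, L, theta):
    """One H-infinity recursion (assumed textbook form, identity error weighting).

    x, P  : current state estimate (p x 1) and its matrix P (p x p)
    y     : current noisy observation (1 x 1)
    L     : selects the combination of states to be estimated
    theta : performance bound parameter; theta = 0 gives the Kalman predictor
    """
    n = P.shape[0]
    Rinv = np.linalg.inv(R)
    M = np.linalg.inv(np.eye(n) - theta * (L.T @ L) @ P + H.T @ Rinv @ H @ P)
    K = P @ M @ H.T @ Rinv
    x_next = F @ (x + K @ (y - H @ x))
    P_next = F @ P @ M @ F.T + Q
    return x_next, P_next

# Illustrative AR(2) model and noise levels (not taken from the thesis).
F, H = ar_state_space([1.5, -0.7])
Q = np.diag([1.0, 0.0])             # excitation enters the newest sample only
R = np.array([[0.5]])
x0, P0 = np.zeros((2, 1)), np.eye(2)
x1, P1 = hinf_step(x0, P0, np.array([[1.0]]), F, H, Q, R, L=H, theta=0.1)
```

A positive `theta` makes the filter more conservative than the Kalman filter, which is the sense in which no statistical assumptions on the driving process or the observation noise are needed; the matrix being inverted must stay well conditioned, which limits how large `theta` can be.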
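Item 2 combines the outputs of parallel filters through the IMM algorithm. The sketch below shows only the probability bookkeeping of a generic IMM cycle (mixing, model-probability update, weighted sum of outputs), following the standard description of the algorithm. The per-model filters themselves are omitted, and the transition matrix, likelihoods, and filter outputs are invented numbers rather than values from the thesis.

```python
import numpy as np

def imm_mix(mu, trans):
    """Mixing step of the IMM algorithm.

    mu    : model probabilities at the previous step, shape (m,)
    trans : Markov transition matrix, trans[i, j] = P(model j now | model i before)
    Returns the predicted model probabilities c, shape (m,), and the mixing
    weights w, where w[i, j] = P(model i before | model j now).
    """
    c = trans.T @ mu
    w = (trans * mu[:, None]) / c[None, :]
    return c, w

def imm_update(c, likelihoods):
    """Update the model probabilities with each filter's measurement likelihood."""
    post = c * likelihoods
    return post / post.sum()

def combine(outputs, mu):
    """Weighted sum of the per-model filter outputs, shape (m, samples)."""
    return mu @ outputs

# Illustrative values for two models.
mu_prev = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
c, w = imm_mix(mu_prev, trans)
mu_new = imm_update(c, likelihoods=np.array([2.0, 1.0]))
estimate = combine(np.array([[1.0, 2.0],
                             [3.0, 4.0]]), mu_new)
```

In a full IMM cycle the weights `w` would be used to blend the previous state estimates before each filter runs, which is how the parallel filters interact at roughly the cost of running them independently.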
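The unscented Kalman filter used in item 3 to build the proposal distribution rests on the unscented transform, which propagates a mean and covariance through a function by means of a small set of sigma points. The sketch below is a minimal version of the scaled unscented transform with assumed default parameters (`alpha`, `beta`, `kappa`); it is not the thesis's particle filter, and the example function and numbers are illustrative only.

```python
import numpy as np

def sigma_points(mean, cov, alpha=1.0, beta=2.0, kappa=0.0):
    """Return 2n+1 sigma points with their mean and covariance weights."""
    n = mean.size
    lam = alpha ** 2 * (n + kappa) - n
    root = np.linalg.cholesky((n + lam) * cov)   # columns are the point offsets
    pts = np.vstack([mean, mean + root.T, mean - root.T])
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1.0 - alpha ** 2 + beta)
    return pts, wm, wc

def unscented_transform(f, mean, cov):
    """Approximate the mean and covariance of f(x) for x with the given moments."""
    pts, wm, wc = sigma_points(mean, cov)
    ys = np.array([f(p) for p in pts])
    y_mean = wm @ ys
    d = ys - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov

# Sanity check on a linear map, for which the transform is exact.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
b = np.array([1.0, -1.0])
m = np.array([0.5, -1.0])
P = np.array([[2.0, 0.3],
              [0.3, 1.0]])
y_mean, y_cov = unscented_transform(lambda x: A @ x + b, m, P)
```

For a nonlinear `f` the result is an approximation, and because it can be recomputed for every particle with the newest observation folded in, it gives a proposal that sits closer to the posterior than the plain transition prior.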
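Item 4 builds its composite measure with multivariate adaptive regression, which fits a sum of hinge functions of the inputs. The sketch below shows only that basic building block: a pair of hinge functions at one fixed knot, fitted by ordinary least squares. The adaptive search over knots and variables that the full technique performs is left out, and the data are synthetic; in the thesis's setting `x` would be an objective measure and `y` a subjective rating.

```python
import numpy as np

def hinge_design(x, knot):
    """Design matrix with an intercept and the two hinge functions at `knot`."""
    return np.column_stack([
        np.ones_like(x),
        np.maximum(0.0, x - knot),   # active above the knot
        np.maximum(0.0, knot - x),   # active below the knot
    ])

def fit_hinges(x, y, knot):
    """Least-squares coefficients for the hinge basis at a fixed knot."""
    coef, *_ = np.linalg.lstsq(hinge_design(x, knot), y, rcond=None)
    return coef

def predict(x, knot, coef):
    return hinge_design(x, knot) @ coef

# Synthetic piecewise-linear data: flat below 1.0, slope 2 above it.
x = np.linspace(0.0, 2.0, 9)
y = 1.0 + 2.0 * np.maximum(0.0, x - 1.0)
coef = fit_hinges(x, y, knot=1.0)
```

Because each hinge is zero on one side of its knot, a sum of a few of them can follow relationships that saturate or change slope, which a single linear combination of objective measures cannot do.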
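The spectrum warping in item 5 can be pictured with the frequency mapping of a first-order allpass filter, which is what a bilinear transform applies to the frequency axis. The sketch below uses the closed-form approximation for the Bark-matching allpass coefficient published by Smith and Abel; the constants are quoted from that paper as best recalled and should be checked against it, and the function names are illustrative. How the thesis computes the NLPC from the warped grid is not specified in the abstract and is not shown.

```python
import numpy as np

def bark_warp_coefficient(fs_hz):
    """Approximate allpass coefficient that matches the Bark scale at rate fs_hz.

    Assumed form: rho = 1.0674 * sqrt((2/pi) * arctan(0.06583 * fs_kHz)) - 0.1916.
    """
    fs_khz = fs_hz / 1000.0
    return 1.0674 * np.sqrt((2.0 / np.pi) * np.arctan(0.06583 * fs_khz)) - 0.1916

def warp_frequency(omega, rho):
    """Map normalized frequency omega in [0, pi] through a first-order allpass."""
    return omega + 2.0 * np.arctan2(rho * np.sin(omega), 1.0 - rho * np.cos(omega))

# Warp a uniform grid at a 16 kHz sampling rate.
rho = bark_warp_coefficient(16000.0)
omega = np.linspace(0.0, np.pi, 50)
warped = warp_frequency(omega, rho)
```

The warped grid still runs from 0 to π but spends more of its range on low frequencies, so a fixed-order linear predictor fitted on it resolves the low end more finely, in line with the ear's frequency resolution.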
