Research on Several Problems in Noisy Speech Coding
Abstract
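With the rapid development of mobile communication technology and the steady expansion of speech communication, speech is now routinely transmitted in noisy environments, and the speech signal is inevitably affected by surrounding background noise. For parametric coding, both the accuracy of speech parameter extraction and the way the parameters are quantized and coded strongly affect communication quality. Methods for extracting the pitch period from noisy speech, extracting the linear prediction coefficients that describe the vocal tract, quantizing parameters efficiently, and suppressing noise in speech coding are therefore of considerable research value and practical interest.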
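     The pitch period is an important excitation-source parameter in speech coding. With practical use in mind, a fast, low-complexity pitch period estimation method based on AMDF and ACF is proposed. Applying autocorrelation to the AMDF values of the speech signal improves the accuracy of pitch estimation. The AMDF values of each frame are transformed so that each multiplication in the autocorrelation becomes a single addition. Because the algorithm uses only addition, subtraction, and absolute-value operations, its computational complexity is low, and it is suitable for applications that need real-time pitch estimation. A fast pitch estimation method suited to hardware implementation is also given, and a real-time pitch estimation system for speech signals is implemented on an FPGA (Spartan-II XC2S30vq100-6). Few existing pitch estimation algorithms are suited to direct hardware implementation, yet hardware is the preferred way to obtain pitch estimates when the pitch period must be extracted in real time.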
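     To address the difficulty of accurately estimating the pitch period of noisy speech when the signal-to-noise ratio (SNR) is low, a pitch detection method based on glottal closure instants (GCI) and the wavelet transform is proposed. The wavelet transform is used to detect, directly from the speech signal, the abrupt signal changes at the GCIs, from which the pitch period is obtained. A low-pass pre-filter reduces the influence of noise and formants, so a single-level wavelet transform achieves fairly high detection accuracy and noise robustness while lowering the computational complexity of pitch estimation.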
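     To address the difficulty of accurately extracting linear prediction coefficients directly from noisy speech, a method based on spectral subtraction is given. Because both the energy and the frequency content of background noise vary over time, the noise power spectrum is estimated with a minimum-statistics tracking method, which follows these changes dynamically. Spectral subtraction then yields an estimate of the clean speech power spectrum, from which the linear prediction coefficients are extracted. Experimental results show that spectral subtraction improves the accuracy of the extracted linear prediction coefficients.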
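     Quantization coding is a key technique in parametric coding. The thesis examines in some depth several common vector quantization methods for line spectral frequency parameters and gives a new quantization method based on the Gaussian mixture model (GMM). A distinguishing feature of the method is that its computational cost and storage size do not change with the number of quantization bits. Because the GMM quantizer can describe many aspects of the distribution of the parameter space, a nonlinear quantizer design can be used, which improves quantization accuracy while reducing computation and storage.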
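     When noise contamination is severe, speech enhancement is usually applied at the front end. The thesis proposes a Kalman-filter speech enhancement algorithm based on the slowly varying nature of the vocal tract. Because the shape of the vocal tract changes slowly during speech, the vocal tract coefficients also change slowly. The algorithm first converts the linear prediction coefficients to line spectral frequency parameters, then applies first-order smoothing to the line spectral frequency parameters of adjacent frames, which modifies the state transition matrix and suppresses isolated residual noise in the enhanced speech. Compared with conventional Kalman-filter and Wiener-filter speech enhancement algorithms, the proposed algorithm yields further improvements in both segmental SNR and PESQ scores. When the SNR of the speech signal is low, using the proposed algorithm as the front-end processing stage of the speech coder improves coding quality.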
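     The research was supported by the National Natural Science Foundation of China (No. 60272039) and the open fund of the Ministry of Education-Microsoft Key Laboratory (No. 06 120806).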
Mobile communication technology is developing rapidly and the range of speech communication is expanding. Speech communication often takes place against background noise, which corrupts the speech signal. In parametric coding, the accuracy of the extracted speech parameters greatly affects the quality of the coded speech. Extracting the pitch and the linear prediction coefficients from noisy speech, quantizing parameters effectively, and reducing noise are therefore important for both research and applications.
     The pitch is an important excitation-source parameter in speech coding. A pitch detection algorithm based on AMDF and ACF with low computational cost is proposed for real-time applications. First, AMDF values are computed for a frame of speech, and the ACF is then applied to these AMDF values. To reduce computational cost and complexity, the AMDF values of the frame are transformed into one-bit signals before the ACF is computed, which also reduces the effect of the amplitude and formants of the speech signal on pitch detection. The pitch period is obtained from the ACF of the one-bit signals, so the multiplications in the short-time autocorrelation function are replaced by simple additions. A real-time pitch detector based on a field-programmable gate array (FPGA) is also proposed. The memories, gates, and sequential circuits of a Spartan-II XC2S30 chip are used to implement these algorithms, meeting the needs of real-time pitch detection.
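The AMDF, one-bit, ACF chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: the abstract does not say how AMDF values are mapped to one bit, so thresholding at the mean is an assumption, and the function names and the choice of computing the AMDF over `2 * max_lag` lags are hypothetical.

```python
import math

def amdf(frame, num_lags):
    """Average magnitude difference of `frame` for lags 0..num_lags-1."""
    n = len(frame) - num_lags  # number of sample pairs compared at every lag
    if n <= 0:
        raise ValueError("frame must be longer than num_lags")
    return [sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
            for lag in range(num_lags)]

def to_one_bit(values):
    """Map each value to +1 or -1 around the mean (assumed threshold)."""
    mean = sum(values) / len(values)
    return [1 if v >= mean else -1 for v in values]

def one_bit_acf(bits, lag):
    """Autocorrelation of a +/-1 sequence at one lag.

    Each product is +1 when the two bits agree and -1 otherwise, so the
    sum needs only a comparison and an add or subtract, no multiplication.
    """
    acc = 0
    for i in range(len(bits) - lag):
        acc += 1 if bits[i] == bits[i + lag] else -1
    return acc

def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the largest one-bit ACF value."""
    bits = to_one_bit(amdf(frame, 2 * max_lag))
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: one_bit_acf(bits, lag))

# Usage: a synthetic tone with a period of 50 samples.
frame = [math.sin(2 * math.pi * n / 50) for n in range(400)]
print(estimate_pitch_period(frame, 20, 100))
```

The AMDF of a periodic signal is itself periodic in the lag, which is why autocorrelating the (one-bit) AMDF sequence exposes the pitch period. A real detector would also need a voiced/unvoiced decision, which this sketch omits.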
     The pitch of noisy speech cannot be estimated correctly when the SNR of the speech signal is low. A pitch detection method for noisy speech based on GCI and the discrete wavelet transform is proposed. The GCI positions in the speech are located with the wavelet transform, and the pitch is then calculated from them. A third-order elliptic low-pass filter reduces the effect of noise and speech formants on pitch detection. The precision of pitch detection is increased, and the computational cost and complexity are lower than those of multi-scale wavelet transform algorithms.
     It is difficult to extract linear prediction coefficients from a noisy speech signal. A method based on spectral subtraction is proposed for extracting them. Because the energy and frequency content of the noise change over time, the minimum-statistics tracking method is used to estimate the noise power spectrum. The power spectrum of the speech signal is obtained by spectral subtraction, and the linear prediction coefficients are then extracted from it. Experimental results show that the method increases the accuracy of the extracted linear prediction coefficients.
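As a rough sketch of this pipeline: estimate the noise power spectrum, subtract it from the noisy power spectrum, convert the result to autocorrelation values, and solve for the linear prediction coefficients with the Levinson-Durbin recursion. Everything below is an assumption made for illustration. In particular, `track_noise_floor` is only a simplified sliding-minimum tracker and omits the bias compensation and adaptive smoothing of the full minimum-statistics method, and the spectral floor, smoothing constant, and function names are hypothetical.

```python
import math

def track_noise_floor(power_frames, alpha=0.8, window=5):
    """Per-bin minimum of recursively smoothed power spectra over the
    last `window` frames (simplified stand-in for minimum statistics)."""
    smoothed, history = None, []
    for frame in power_frames:
        if smoothed is None:
            smoothed = list(frame)
        else:
            smoothed = [alpha * s + (1.0 - alpha) * p
                        for s, p in zip(smoothed, frame)]
        history = (history + [smoothed])[-window:]
    return [min(bins) for bins in zip(*history)]

def spectral_subtract(noisy_power, noise_power, floor=1e-3):
    """Power spectral subtraction with a small spectral floor."""
    return [max(p - n, floor * p) for p, n in zip(noisy_power, noise_power)]

def autocorrelation(power, order):
    """Inverse DFT of a full-length symmetric power spectrum, lags 0..order."""
    size = len(power)
    return [sum(p * math.cos(2.0 * math.pi * k * n / size)
                for n, p in enumerate(power)) / size
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations; returns [1, a1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        reflection = -acc / err
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + reflection * prev[m - k]
        a[m] = reflection
        err *= 1.0 - reflection * reflection
    return a

def lpc_from_noisy_power(noisy_power, noise_power, order):
    clean = spectral_subtract(noisy_power, noise_power)
    return levinson_durbin(autocorrelation(clean, order), order)

# Usage: an AR(1) "speech" spectrum 1 / |1 - 0.5 e^{-jw}|^2 plus flat noise.
size = 64
speech = [1.0 / (1.25 - math.cos(2.0 * math.pi * n / size)) for n in range(size)]
noise = track_noise_floor([[0.3] * size for _ in range(4)])
noisy = [s + 0.3 for s in speech]
lpc = lpc_from_noisy_power(noisy, noise, 2)
print([round(c, 6) for c in lpc])
```

In this idealized example the noise estimate is exact, so the recovered coefficients should be close to those of the underlying AR(1) model. With real noisy speech the subtraction is only approximate and the floor matters.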
     Quantization is very important in parametric coding. The thesis studies in depth several common methods for vector quantization of line spectral frequency parameters. Vector quantization based on Gaussian mixture models (GMM) is computationally efficient and has low memory requirements, and its complexity is independent of the bit rate of the system. The GMM quantizer can describe much of the information about the distribution of the parameter space. With a nonlinear quantization design, computational cost and memory requirements are reduced and quantization precision is increased.
     Speech enhancement is used as a pre-processing stage when the speech signal is seriously corrupted. A speech enhancement algorithm based on the spectral envelope and Kalman smoothing is proposed. Exploiting the slow variation of the vocal tract parameters, the linear prediction coefficients are converted into line spectral frequency parameters, and the parameters of the current and previous frames are then smoothed, which reduces isolated residual noise. The quality of the enhanced speech is evaluated with segmental SNR and ITU PESQ scores. Experimental results indicate that the proposed algorithm achieves clear improvements over the conventional Kalman smoother and the Wiener filter.
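The inter-frame smoothing step alone might look like the sketch below. It covers only the smoothing of line spectral frequency (LSF) vectors, not the LPC-to-LSF conversion or the Kalman filter itself. The recursive form, the weight `beta`, and the function names are assumptions for illustration; the abstract does not give the smoothing constant.

```python
def smooth_lsf(previous, current, beta=0.5):
    """First-order smoothing of one LSF vector against the previous one.

    A convex combination of two ascending vectors is itself ascending,
    so the LSF ordering property is preserved.
    """
    if len(previous) != len(current):
        raise ValueError("LSF vectors must have the same length")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [beta * p + (1.0 - beta) * c for p, c in zip(previous, current)]

def smooth_track(lsf_frames, beta=0.5):
    """Smooth a sequence of LSF vectors recursively, frame by frame."""
    smoothed = [list(lsf_frames[0])]
    for frame in lsf_frames[1:]:
        smoothed.append(smooth_lsf(smoothed[-1], frame, beta))
    return smoothed

# Usage: two-dimensional LSF vectors (in radians) for three frames.
track = smooth_track([[0.2, 0.5], [0.4, 0.9], [0.4, 0.9]], beta=0.5)
print(track)
```

In an enhancement system of the kind described, the smoothed LSFs would be converted back to linear prediction coefficients and used to build the state transition matrix of the Kalman filter for each frame.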
