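Research on the Mixed Excitation Linear Prediction Vocoder Algorithm

Author: Lai Changqing (赖长庆)
Thesis level: Master's
Discipline: Signal and Information Processing
Keywords (translated from Chinese): mixed excitation; multi-stage vector quantization; aperiodic flag; adaptive spectral enhancement; pulse dispersion; multi-band excitation; Fourier spectral magnitudes
English keywords: Multi-band Excitation Linear Prediction; Jitter; Adaptive Spectrum Enhancement; Pulse Dispersion; Super Resolution Pitch Detection Algorithm; Multi-stage Vector Quantization
Degree year: 2003
Supervisors: Liu Yakang (刘亚康); Zhu Xueyong (朱学勇)
Discipline code: 081002
Degree-granting institution: University of Electronic Science and Technology of China (电子科技大学)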

Abstract

In recent years, with the rapid development of broadband communication, the bandwidth occupied by speech no longer seems to be a problem in wired communication. In wireless communication, however, bandwidth remains a scarce resource. In military and secure communication in particular, advances in speech coding translate quickly into better anti-jamming capability, better security, and higher system capacity. In speech storage, the growing popularity of portable digital recorders has also created an urgent demand for coding algorithms that deliver high-quality synthesized speech. These needs are the driving force behind speech coding.
    The classical linear prediction (LPC) vocoder is very efficient and can encode speech at very low bit rates (800 to 2400 bps). Unfortunately, its synthesized speech sounds unnatural and often contains artifacts such as buzzes, thumps, and tonal noise.
    The mixed excitation linear prediction (MELP) vocoder is a recently proposed, high-performance speech coding scheme built on the classical LPC vocoder. Research on it is still very active and has already produced good results: synthesized speech with a MOS of about 3.0 can be obtained at 1.2 kbps, with fairly strong robustness to background noise. The MELP vocoder inherits the coding efficiency of the classical LPC vocoder and adds several new features to mimic natural human speech. It uses a mixed pulse and noise excitation to remove the buzz of classical LPC, and it introduces a jittery voicing state to overcome tonal noise. Parameter interpolation, pulse dispersion, and adaptive spectral enhancement improve the naturalness and intelligibility of the synthesized speech. In addition, multi-band excitation gives the vocoder its robustness to background noise.
    Based on the U.S. Federal Standard 2.4 kbps MELP algorithm, this thesis builds a software platform in MATLAB for analyzing the MELP algorithm, analyzes its performance, and proposes some improvements. It also discusses software and hardware implementation in light of the characteristics of the algorithm.
    Chapter 2 introduces the principles of the MELP vocoder model and describes its features in detail, focusing on the essence of each feature and its contribution to synthesized speech quality. Chapter 3 presents the basic MELP algorithm in detail, with emphasis on advanced techniques such as multi-stage vector quantization (MSVQ) and the super-resolution pitch detection algorithm (SRPDA), and evaluates some of these techniques experimentally. Chapter 4 uses the MATLAB platform to run encoding and decoding experiments on speech signals and analyzes the role that each MELP feature plays in speech coding. Finally, recommendations are given for software and hardware implementation based on the characteristics of the MELP vocoder.
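
To make the mixed pulse and noise excitation and the jittery voicing state mentioned in the abstract more concrete, the following Python sketch may help. It is an illustration under stated assumptions, not the thesis's MATLAB platform and not the federal-standard algorithm: the function name `mixed_excitation`, its parameters, and the jitter value in the usage line are hypothetical, and the blend is done over the full band, whereas MELP applies the pulse/noise mix per frequency band.

```python
import numpy as np

def mixed_excitation(n_samples, pitch_period, voicing, jitter=0.0, seed=0):
    """Blend a pulse train with white noise (simplified, full-band sketch).

    voicing: 0.0 (pure noise) .. 1.0 (pure pulses); hypothetical parameter.
    jitter:  maximum fractional perturbation of each pitch period (< 1.0).
    """
    rng = np.random.default_rng(seed)
    pulses = np.zeros(n_samples)
    pos = 0.0
    while pos < n_samples:
        pulses[int(pos)] = 1.0
        # Perturb every period independently; jitter=0 gives a periodic train.
        pos += pitch_period * (1.0 + jitter * rng.uniform(-1.0, 1.0))
    # Scale the pulse train to unit average power, matching the noise.
    pulses *= np.sqrt(n_samples / np.count_nonzero(pulses))
    noise = rng.standard_normal(n_samples)
    # Square-root weights keep the expected total power roughly constant.
    return np.sqrt(voicing) * pulses + np.sqrt(1.0 - voicing) * noise

# Example: a mostly voiced frame with jittered pitch periods (assumed values).
frame = mixed_excitation(n_samples=180, pitch_period=40, voicing=0.8, jitter=0.25)
```

A strictly periodic pulse train (`jitter=0.0`, `voicing=1.0`) corresponds to the classical LPC voiced excitation. Mixing in noise and perturbing the pulse positions are the two ingredients the abstract credits with reducing buzz and tonal noise.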

