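The Mixed Excitation Linear Prediction Speech Coding Algorithm and Its Implementation Based on DSP

Original title: 基于DSP的混合激励线性预测语音编码算法及其实现
Author: Wang Jun (王军)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords: low-rate speech coding; mixed excitation linear prediction coding; vector quantization; pitch period estimation
Degree year: 2004
Supervisor: Zhao Jiyin (赵继印)
Discipline code: 081001
Degree-granting institution: Jilin University (吉林大学)
Thesis submission date: 2004-04-01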

Abstract

Introduction
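    In mobile, satellite, and military communication systems, speech coding plays an important role in compressing the transmission bandwidth of the speech signal and lowering the channel bit rate, thereby improving channel utilization. Speech coding has advanced rapidly in recent years, and as signal processing and communication technology have developed, research has come to focus on the study and implementation of low-bit-rate and very-low-bit-rate coding algorithms.
    The traditional LPC vocoder uses a simple binary excitation model that cannot adequately capture the characteristics of real speech, so the quality and robustness of its synthesized speech are poor. The code-excited linear prediction (CELP) low-rate speech coding algorithm searches an adaptive codebook and a fixed codebook for the best code vectors to use as excitation, according to a minimum perceptually weighted error criterion. It synthesizes fairly high-quality speech at rates of 8 to 16 kbps. When the rate is lowered further, too few bits remain to represent the excitation vector, and the quality of the synthesized speech degrades quickly. In recent years, research at home and abroad on speech coding at 4 kbps and below has been represented mainly by the AMBE, MELP, WI, and STC algorithms. All of them substantially reduce the transmission bit rate and so save bandwidth.
    Among current low-bit-rate speech coding approaches, mixed excitation linear prediction (MELP) is a comparatively good method, and the 2.4 kbps MELP coder has been adopted as the new US Federal speech coding standard. The algorithm combines the strengths of the LPC and MBE algorithms and reproduces good speech at low bit rates. Building on an analysis of the FTR 1024A 2.4 kbps MELP algorithm, this thesis studies its core algorithms in detail through extensive experiments, improves pitch period detection, the transmission of LSF coefficients, vector quantization, and speech synthesis, and proposes an improved low-rate MELP speech coding algorithm at a bit rate of about 1.8 kbps.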
    I. The Improved Low-Rate MELP Speech Coding Algorithm
    1. Building the MELP Model
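    The standard MELP algorithm builds on the traditional LPC vocoder and adds five features: (1) mixed excitation, (2) aperiodic pulses, (3) adaptive spectral enhancement, (4) pulse dispersion, and (5) Fourier magnitude modeling. These additions greatly improve how the excitation source is constructed in the original LPC parametric model and remove the mechanical or buzzy tonal noise sometimes heard in LPC-synthesized speech. They let the MELP coder model more of the characteristics of natural speech, so that a MELP vocoder can produce high-quality speech at low bit rates, which has made it one of the most promising methods in low-rate speech coding.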
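    Unlike the simple voiced/unvoiced decision of LPC10, MELP uses a mixed excitation source: a bank of band-pass filters splits the speech signal into five sub-bands, a voiced/unvoiced decision is made for each band, and at the synthesizer the five sub-band signals are summed to form the mixed excitation. Its main function is to reduce the buzz of the LPC vocoder.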
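    The band-wise mixing described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the band edges, the 8 kHz sampling rate, the linear pulse/noise weighting, and every function name are assumptions made for illustration, and the mixing is done on DFT bins instead of with a time-domain filter bank to keep the example short.

```python
import cmath
import math
import random

FS = 8000  # sampling rate in Hz (assumed)
# Assumed band edges in Hz; the text only states that there are five sub-bands.
BAND_EDGES = [0, 500, 1000, 2000, 3000, 4000]


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a list of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT that returns the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def band_of(freq_hz):
    """Index (0..4) of the sub-band that contains freq_hz."""
    for b in range(5):
        if freq_hz < BAND_EDGES[b + 1]:
            return b
    return 4  # the Nyquist frequency belongs to the top band


def mixed_excitation(pulse, noise, voicing):
    """Blend pulse and noise per band; voicing[b] in [0, 1] weights the pulse."""
    n = len(pulse)
    pulse_spec, noise_spec = dft(pulse), dft(noise)
    mixed = []
    for k in range(n):
        # Bins k and n-k represent the same frequency, so they get the same
        # weight and the output signal stays real.
        freq = min(k, n - k) * FS / n
        v = voicing[band_of(freq)]
        mixed.append(v * pulse_spec[k] + (1.0 - v) * noise_spec[k])
    return idft_real(mixed)


def pulse_train(length, period):
    """Unit pulses spaced one pitch period apart."""
    return [1.0 if t % period == 0 else 0.0 for t in range(length)]


# Example: the three lowest bands are voiced, the two highest are unvoiced.
rng = random.Random(0)
pulse = pulse_train(160, 40)
noise = [rng.uniform(-1.0, 1.0) for _ in range(160)]
excitation = mixed_excitation(pulse, noise, [1.0, 1.0, 1.0, 0.0, 0.0])
```

    With all five strengths set to 1 the function returns the pulse train unchanged, which corresponds to the purely periodic excitation of a classical LPC vocoder; lowering the strength of the upper bands replaces their harmonics with noise.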
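    When the input signal is voiced, the MELP coder can synthesize speech with either periodic or aperiodic pulses. Aperiodic pulses are used mostly in speech segments at unvoiced-to-voiced or voiced-to-unvoiced transitions. They let the decoder reproduce irregular glottal pulses without introducing extra tonal artifacts.
    The adaptive spectral enhancement filter is a pole/zero filter whose purpose is to make the synthesized speech match the waveform of natural speech more closely in the formant regions.
    Pulse dispersion post-processes the synthesized speech with a fixed pulse-shaping filter. It spreads the energy of the excitation signal over the whole pitch period. This gives the synthesized speech a better waveform match with the original speech in the regions away from the formants and helps remove some of the harsh noise in the synthesized speech.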
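    In the encoder, the residual signal obtained by LPC inverse filtering is Fourier transformed, and the values of its first 10 harmonics are quantized and sent to the decoder, where they are used to synthesize the periodic pulses. This helps improve the naturalness of the synthesized speech, especially for male voices and in the presence of background noise.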
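    As a rough illustration of the harmonic measurement described above, the sketch below evaluates the spectrum of a residual frame directly at the first ten multiples of the pitch frequency. The function names, the direct evaluation (instead of an FFT with peak picking), and the unit-RMS normalization are assumptions for illustration, not details given in the abstract.

```python
import cmath
import math


def harmonic_magnitudes(residual, pitch_period, num_harmonics=10):
    """Magnitude of the residual's spectrum at the first pitch harmonics.

    Harmonic k lies at an angular frequency of 2*pi*k / pitch_period
    radians per sample.
    """
    mags = []
    for k in range(1, num_harmonics + 1):
        omega = 2.0 * math.pi * k / pitch_period
        value = sum(s * cmath.exp(-1j * omega * t)
                    for t, s in enumerate(residual))
        mags.append(abs(value))
    return mags


def normalize_rms(mags):
    """Scale the magnitudes to unit RMS so only the spectral shape remains."""
    rms = math.sqrt(sum(m * m for m in mags) / len(mags))
    return [m / rms for m in mags] if rms > 0.0 else list(mags)
```

    An ideal pulse-train residual gives equal magnitudes at every harmonic, so after normalization each value is 1; deviations from that flat shape are what the decoder uses to shape its periodic pulses. Ten harmonics fit below the Nyquist frequency only when the pitch period is longer than 20 samples.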
    2. Speech Analysis
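    The input speech signal is first preprocessed by a high-pass filter with a cutoff frequency of 60 Hz, whose purpose is to suppress 50 Hz mains interference. The pitch period is then extracted with the normalized pitch detection algorithm proposed in this thesis. The algorithm uses the signals of the previous and following frames together with the long-term average pitch period, which keeps the pitch period continuous across adjacent frames. Linear interpolation is used to search for fractional pitch, which improves the precision of the pitch estimate. Classical algorithms sometimes detect a multiple of the true pitch period; this algorithm adds a multiple check to remove that estimation error. Extensive experiments show that the algorithm not only has the accuracy and reliability of pitch smoothing algorithms but can also extract the pitch estimate in real time within the current frame.
    MELP coding is an LPC-based parametric coding method and, like all traditional LPC-based analysis-synthesis methods, analyzes and transmits its parameters frame by frame. The weakness of this approach is that it ignores the fact that the vocal tract response changes slowly while speech is produced, that is, the similarity between adjacent frames. This thesis uses a normalized autocorrelation function to express the similarity of the LPC coefficients of adjacent frames; when the similarity exceeds a threshold, the LPC coefficients of the current frame need not be transmitted and are replaced by those of the preceding frame. Experiments show that with this method the LPC coefficients of roughly 50% of speech frames can be replaced in this way, which greatly reduces the coding bit rate with little effect on the quality of the reproduced speech.
    Next, the sub-band voicing strengths and the aperiodic pulse flag are determined, the LPC coefficients are derived with the Durbin algorithm, and the peak value of the residual signal is computed to update the sub-band voicing strengths; the gain is then computed and the average pitch period is updated. The input signal is passed through the linear prediction filter formed from the quantized prediction coefficients to obtain the residual signal, and the Fourier magnitudes at the first ten pitch harmonics of the residual are computed.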
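    Three of the analysis steps described above can be sketched in a few lines: pitch estimation by normalized correlation with a check for pitch multiples, the Durbin recursion for the LPC coefficients, and the decision to reuse the previous frame's coefficients. This is a simplified illustration only. The thesis's algorithm also uses neighbouring frames, the long-term average pitch, and fractional interpolation, none of which are shown, and the lag range, the 0.9 ratio, the 0.98 threshold, and all function names are assumptions.

```python
import math


def normalized_correlation(u, v):
    """Normalized cross-correlation of two equal-length sequences, in [-1, 1]."""
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return num / den if den > 0.0 else 0.0


def estimate_pitch(frame, min_lag=20, max_lag=160, ratio=0.9):
    """Pick the best integer lag, then prefer a submultiple that scores
    almost as well, which guards against reporting a pitch multiple."""
    scores = {lag: normalized_correlation(frame[:-lag], frame[lag:])
              for lag in range(min_lag, max_lag + 1)}
    best = max(scores, key=scores.get)
    for divisor in (4, 3, 2):
        candidate = round(best / divisor)
        if candidate >= min_lag and scores[candidate] > ratio * scores[best]:
            return candidate
    return best


def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Durbin recursion: predictor coefficients a[1..order] for the model
    x[t] ~ sum_k a[k] * x[t - k], plus the final prediction error."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # degenerate frame, for example pure silence
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


def choose_lpc(current, previous, threshold=0.98):
    """Return (coefficients to use, whether they must be transmitted)."""
    if previous is not None and \
            normalized_correlation(current, previous) > threshold:
        return previous, False  # reuse the earlier frame and send nothing
    return current, True
```

    In a coder built along these lines, each frame would pass through `autocorrelation` and `levinson_durbin`, and `choose_lpc` would decide whether the new coefficients spend any bits. A one-bit flag per frame would be needed to tell the decoder which case applies; that flag is also an assumption, since the abstract does not describe the signalling.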
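    3. Parameter Encoding and Decoding
    Speech analysis yields the speech parameters of this algorithm. The bit allocation of the coding scheme is given in Table 4-1. The logarithm of the pitch period is quantized with a 99-level uniform quantizer, and the result is mapped to a 7-bit codeword by table lookup. The gain is quantized and coded with 8 bits, of which 5 bits come from a uniform quantizer and the remaining 3 bits from a second quantization step. The sub-band voicing strengths (Bpvc) are quantized and coded with 4 bits.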
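    The logarithmic pitch quantizer described above can be sketched as follows. Only the 99 levels and the 7-bit codeword come from the text; the pitch range of 20 to 160 samples and the function names are assumptions, and the lookup table that maps each index to its codeword is not reproduced here.

```python
import math

LEVELS = 99                            # number of quantizer levels (from the text)
MIN_PITCH, MAX_PITCH = 20.0, 160.0     # assumed pitch range in samples

LOG_MIN = math.log10(MIN_PITCH)
LOG_MAX = math.log10(MAX_PITCH)
STEP = (LOG_MAX - LOG_MIN) / (LEVELS - 1)   # uniform step in the log domain


def quantize_pitch(pitch):
    """Map a pitch period to a quantizer index in 0..LEVELS-1."""
    clipped = min(max(pitch, MIN_PITCH), MAX_PITCH)
    return int(round((math.log10(clipped) - LOG_MIN) / STEP))


def dequantize_pitch(index):
    """Reconstruct the pitch period that a quantizer index stands for."""
    return 10.0 ** (LOG_MIN + index * STEP)
```

    Because the step is uniform in the logarithm, the relative error is the same at every pitch value: under the assumed range it stays below about 1.1%. The 99 indices fit into 7 bits because 99 is less than 2^7 = 128.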
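    The standard MELP algorithm uses four-stage vector quantization with a search path of 8. Considering that the codebooks used in the standard MELP algorithm are too large, and that the correction contributed by the fourth stage of the quantized codebook vector is still rather large, this thesis...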
Introduction
    In mobile, satellite, and military communication systems, speech coding plays an important role in improving channel utilization by compressing the transmission bandwidth and reducing the transmission bit rate of the speech signal. Speech coding has advanced rapidly in recent years. With the development of signal processing and communication technology, speech coding research has come to focus on the study and realization of low and very low bit rate speech coding algorithms.
    The traditional LPC vocoder models the speech signal too simply, making a single voiced/unvoiced decision across the whole speech spectrum, so its synthesized speech lacks naturalness and robustness. The code-excited linear prediction (CELP) algorithm constructs an LPC excitation signal by choosing vectors from two codebooks, an “adaptive” codebook and a “stochastic” codebook, under a minimum perceptually weighted error criterion. The algorithm achieves high synthesized speech quality, but that quality falls quickly as the coding bit rate continues to drop. In recent years, the representative algorithms in the study of speech coding at 4 kbps and below have included AMBE, MELP, WI, and STC. These algorithms greatly reduce the coding rate and thereby save bandwidth.
    MELP is a good algorithm among current low bit rate speech coders. The MELP coder has been adopted as the new US Federal Standard at 2.4 kbps. The algorithm combines the merits of the LPC and MBE algorithms. Careful research and many experiments have been carried out on speech analysis, parameter encoding/decoding, and speech synthesis. New methods and improvements are employed in pitch detection, vector quantization, and the transmission of LPC parameters. An improved 1.8 kbps MELP coding algorithm is proposed.
    I. An Improved MELP Low Bit Rate Speech Coding Algorithm
    1. MELP Model

    The MELP coder is based on the LPC model with additional features including mixed excitation, aperiodic pulses, adaptive spectral enhancement, pulse dispersion filtering, and Fourier magnitude modeling. These additional features greatly improve the excitation structure of the LPC model and eliminate the mechanical tonal noise that appears in LPC speech synthesis. They allow the MELP vocoder to model natural speech more accurately, so it can synthesize high-quality speech. It has become one of the most promising low bit rate speech coding methods.
    Unlike the simple voiced/unvoiced decision of LPC10, the MELP vocoder adopts mixed excitation. Each frame is divided into five frequency bands, and a voiced/unvoiced decision is made in every band. The five sub-band signals are summed to yield the mixed excitation, which reduces the buzz of the LPC vocoder.
    When the input signal is voiced, the MELP encoder can synthesize speech with either periodic or aperiodic pulses. Aperiodic pulses are mostly used in transitions between unvoiced and voiced speech. They can reproduce irregular glottal pulses without introducing other tones.
    The adaptive spectral enhancement filter is a pole/zero filter. It gives the synthesized speech a better waveform match with natural speech in the formant regions.
    Pulse dispersion filtering processes the synthesized speech with a fixed pulse-shaping filter. It spreads the energy of the excitation signal over the whole pitch period. This gives the synthesized speech a better waveform match with natural speech in the regions away from the formants and helps remove some harsh noise.
    In the encoder, a Fourier transform is applied to the residual signal obtained by LPC inverse filtering, and the first 10 harmonic magnitudes are quantized and passed to the decoder, where they are used to synthesize the periodic pulses. This helps improve the naturalness of the synthesized speech, especially for male voices and in background noise.
    2. Speech Analysis
    The input signal is first preprocessed by a 60 Hz high-pass filter in order to suppress 50Hz p

