Research on Very Low Bit Rate Speech Coding Algorithms

Original title: 超低速率语音编码算法研究
Author: Zou Huihua (邹绘华)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords: variable frame rate speech coding; mixed excitation; pitch extraction; spectral slope; linear interpolation
Degree year: 2002
Advisors: Li Shuangtian (李双田); Sun Yi (孙怡)
Discipline code: 081001
Degree-granting institution: Dalian University of Technology
Submission date: 2002-03-10

Abstract

Speech coding plays an important role in digital communication systems, and very low bit rate speech coding is especially significant where the transmission bit rate is tightly constrained.
    As an important low bit rate algorithm, the Mixed Excitation Linear Prediction (MELP) algorithm, adopted as a U.S. Federal Standard, achieves fairly good speech quality at 2.4 kb/s. Perceptible problems remain, however, particularly in non-stationary speech segments and in coding efficiency.
    This thesis studies the MELP algorithm in depth. To improve coding efficiency, it proposes a fixed frame rate inter-frame interpolation algorithm, and it proposes a BD-VAD algorithm built on the voice activity detection (VAD) algorithm of G.729B. After surveying variable-rate speech coding algorithms and analyzing the inter-frame correlation of the speech parameters in this analysis system, the thesis proposes an inter-frame interpolation algorithm constrained by spectral slope to reduce the bit rate further; its speech quality and computational complexity are close to those of the original algorithm.
    The speech coding/decoding system built on this scheme lowers the transmission rate to 300-800 b/s. Comparison of the reconstructed speech signals and subjective listening tests indicate that its performance is close to, or only slightly below, that of the 2.4 kb/s MELP algorithm recommended by the U.S. Federal Standard.
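
The idea behind interpolation-based rate reduction can be illustrated with a minimal sketch: the encoder transmits only selected "anchor" frames, and the decoder rebuilds the skipped frames by linear interpolation. This is not the thesis's algorithm. The selection rule below (a maximum absolute deviation threshold) is only a stand-in for the spectral slope constraint, whose details the abstract does not give, and all names (`select_anchors`, `reconstruct`, `threshold`, `max_gap`) are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one parameter vector per frame (e.g. spectral parameters)


def interpolate(left: Frame, right: Frame, t: float) -> List[float]:
    """Linear interpolation: t=0 returns left, t=1 returns right."""
    return [(1.0 - t) * a + t * b for a, b in zip(left, right)]


def max_interp_error(frames: Sequence[Frame], start: int, end: int) -> float:
    """Largest absolute deviation of the frames strictly between start and end
    from the straight line joining frames[start] and frames[end]."""
    worst = 0.0
    span = end - start
    for k in range(start + 1, end):
        approx = interpolate(frames[start], frames[end], (k - start) / span)
        worst = max(worst, max(abs(x - y) for x, y in zip(frames[k], approx)))
    return worst


def select_anchors(frames: Sequence[Frame], threshold: float, max_gap: int = 4) -> List[int]:
    """Greedy encoder-side choice of frames to transmit (assumed rule, not the
    thesis's): extend each gap while the skipped frames stay within threshold."""
    anchors = [0]
    start, n = 0, len(frames)
    while start < n - 1:
        end = start + 1
        while (end + 1 < n and end + 1 - start <= max_gap
               and max_interp_error(frames, start, end + 1) <= threshold):
            end += 1
        anchors.append(end)
        start = end
    return anchors


def reconstruct(frames: Sequence[Frame], anchors: Sequence[int]) -> List[List[float]]:
    """Decoder side: rebuild every frame using only the anchor frames."""
    out: List[List[float]] = []
    for a, b in zip(anchors, anchors[1:]):
        for k in range(a, b):
            out.append(interpolate(frames[a], frames[b], (k - a) / (b - a)))
    out.append(list(frames[anchors[-1]]))
    return out
```

With a slowly varying parameter track, most frames are skipped and recovered almost exactly; an abrupt change forces an extra anchor. This is the trade-off between bit rate and quality in non-stationary segments that the abstract refers to.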

