Research and Optimized Implementation of the MELP Speech Coder and the AC-3 Audio Decoder

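Author: Lou Jiajia (楼佳佳)
Degree level: Master's
Discipline: Information and Communication Engineering
Keywords: MELP; BF561; optimization; AC-3; fixed-point implementation; ARM
Degree year: 2007
Advisor: Liu Yunhai (刘云海)
Discipline code: 081001
Degree-granting institution: Zhejiang University
Thesis submission date: 2007-08-01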

Abstract

As the information society develops, the compression and transmission of sound has become an important indicator of a society's level of development. Speech coding technology for human communication and audio coding technology for television and radio broadcasting have both become indispensable parts of modern life, and each plays an important role in multimedia, network communication, and secure communication.
The Mixed Excitation Linear Prediction (MELP) vocoder is widely used in commercial and military speech communication because of its low bit rate, low delay, and low complexity, so an optimized implementation has real practical value. The BF561 is a new-generation, high-performance DSP from ADI. It has a dual-core architecture with a core clock of up to 600 MHz and a rich set of peripheral interfaces, which makes it one of the ideal platforms for multimedia applications. This thesis implements and optimizes the MELP encoder on the BF561.
The AC-3 audio standard became part of the United States high-definition television standard as early as 1995. It is widely used in film, television, and broadcasting, and it is the audio compression standard used by most digital television sources. As digital television becomes more widespread in China, implementing and optimizing the decoding algorithm of this standard is also of great practical significance. The DM6446 is a recently introduced dual-core DSP from TI that integrates an ARM and a DSP and is well suited to multimedia applications. This thesis presents an optimized fixed-point implementation of the AC-3 decoder on the ARM of the DM6446.
The thesis first gives a brief overview of the current state and development of speech and audio coding and of embedded applications of codecs. Second, it introduces the basic principles of speech coding and describes each algorithm module in the order of the MELP encoding flow. Third, it briefly introduces the MELP encoding system and its development platform, including the architecture of the BF561, the system configuration of the evaluation board, and the Visual DSP software platform, and it describes how the encoder's input and output interfaces are implemented and how the system run time is set. Fourth, it describes the optimization methods used on the BF561 platform, chiefly DSP-oriented C code optimization, assembly code optimization, and memory allocation for the MELP encoding algorithm, and illustrates them with several examples. Fifth, it describes in detail the AC-3 audio decoding algorithm, the implementation platform, the system framework, and the implementation of the input and output interfaces; after the AC-3 decoder is implemented in floating point, it is converted to fixed point and its memory usage is optimized so that it finally runs on the ARM of the DM6446. Finally, the thesis summarizes the work and proposes directions for future effort.

