Research on Audio Perceptual Coding Models and Key Technologies
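Abstract
Research on audio perceptual coding models concentrates on two areas: audio compression algorithms, and implementation techniques for audio codecs. With the spread of mobile communication networks, audio content is distributed more often and more easily, but mobile terminals have limited computing power and storage capacity. Low-complexity, high-quality audio coding algorithms and their system implementation have therefore become an active research topic in digital audio processing.
     To realize a low-complexity, high-quality audio codec, this thesis works on two levels. At the algorithm level, it takes the AAC coding system, which stands out among audio perceptual coding models, as its subject and optimizes the three key modules (frequency-domain transform, psychoacoustic analysis, and quantization coding) to reduce computational complexity and encoding time while preserving coding quality. At the implementation level, it adopts an SOPC design strategy in which a soft-core microprocessor and dedicated IP cores cooperate through hardware/software co-design, and implements a low-complexity, high-quality AAC codec system on an FPGA development platform.
     The main work and innovations of this thesis are as follows:
     (1) The filter bank is a computation-intensive module of the audio perceptual coding model and accounts for a large share of the computation. This thesis studies fast implementations of the filter bank and proposes two improved MDCT/IMDCT schemes, one based on a recursive structure and one based on an N/8-point FFT kernel. Both are suited to IP-core design and allow the MDCT and IMDCT to share one circuit. The first scheme has a regular circuit structure, uses few hardware resources, and offers high speed and throughput; compared with existing recursive algorithms, it completes an N-point MDCT/IMDCT in only N^2/16 cycles. The second scheme, compared with the currently popular implementations based on an N/4-point FFT kernel, adds some adders but needs fewer multipliers, reduces computation error, and nearly doubles the computation speed.
     (2) To suppress pre-echo, the psychoacoustic module of the audio perceptual coding model uses transient analysis to judge whether the signal is transient, which guides adaptive switching between long and short blocks in transform coding. Combining the characteristics of human hearing with those of audio coding, this thesis drafts a framework for a fitted model of the auditory perception threshold. It also analyzes the shortcomings of block-type selection based on perceptual entropy (PE) and proposes a simple transient analysis method, time-domain peak detection, which quickly identifies transients in the time domain so that stationary and transient signals are transformed with different window lengths to obtain good time and frequency resolution. This raises the computation speed of the psychoacoustic model with little effect on audio quality.
     (3) The audio perceptual coding model uses Brandenburg's two-loop quantization structure, which achieves good coding quality but converges slowly, needs many iterations, and cannot run in real time. Guided by the design of the original quantization module, this thesis proposes a quantization-coding structure based on noise prediction. By establishing the constraint between the common scalefactor and the per-band scalefactors, it narrows the iteration range of the quantization step, speeds up convergence, and reduces the computational complexity of the quantization module. Compared with the original two-loop iteration, computation speed doubles with little effect on audio quality. Correspondingly, for the inverse quantization module, an improved table look-up method is proposed; compared with existing algorithms, it uses 50% less storage and keeps the computation error within the order of 10^-6.
     (4) In line with the real-time and programmability requirements of embedded systems, this thesis proposes a programmable experimental model of a digital audio codec system based on an SOPC architecture. Taking MPEG AAC as the test case, it lowers codec computational complexity through algorithmic improvements to key modules, hardware optimization of selected circuits, and hardware/software co-design. With coding quality preserved, the system's encoding speed doubles and decoding runs in real time. Subjective and objective evaluations gave good coding-quality scores.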
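As background for item (1), the following is a minimal reference sketch of the transform pair that the filter bank computes: a direct windowed MDCT, its inverse, and 50%-overlap overlap-add. It is not the recursive or N/8-point FFT scheme of the thesis; it is the plain O(N^2) definition that a fast implementation would be checked against. The function names, the sine window, and the frame length are assumptions chosen for illustration.

```python
import math


def sine_window(frame_len):
    """Sine window; satisfies w[n]^2 + w[n + frame_len/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / frame_len) for n in range(frame_len)]


def mdct(frame):
    """Direct MDCT: 2M input samples -> M coefficients (reference form, O(N^2))."""
    m = len(frame) // 2
    n0 = 0.5 + m / 2.0  # phase offset required for time-domain aliasing cancellation
    return [
        sum(frame[n] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
            for n in range(2 * m))
        for k in range(m)
    ]


def imdct(coeffs):
    """Direct IMDCT: M coefficients -> 2M time-aliased samples (2/M scaling)."""
    m = len(coeffs)
    n0 = 0.5 + m / 2.0
    return [
        (2.0 / m) * sum(coeffs[k] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
                        for k in range(m))
        for n in range(2 * m)
    ]


def analysis_synthesis(signal, m):
    """Windowed MDCT -> IMDCT -> overlap-add; len(signal) must be a multiple of m."""
    win = sine_window(2 * m)
    # Pad one half-frame on each side so every sample is covered by two frames.
    padded = [0.0] * m + list(signal) + [0.0] * m
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * m + 1, m):
        frame = [padded[start + n] * win[n] for n in range(2 * m)]
        rec = imdct(mdct(frame))
        for n in range(2 * m):
            out[start + n] += rec[n] * win[n]  # synthesis window, then overlap-add
    return out[m:m + len(signal)]


if __name__ == "__main__":
    test_signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
    restored = analysis_synthesis(test_signal, 8)
    print(max(abs(a - b) for a, b in zip(test_signal, restored)))
```

Because the aliasing introduced by each frame cancels against its neighbours, the overlap-added output matches the input up to rounding error. A fast MDCT/IMDCT core can be validated by comparing its coefficients with `mdct` on the same frames.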
Research on audio perceptual coding technology concentrates on two aspects: optimization of audio compression algorithms, and hardware design and implementation of those algorithms. With the spread of mobile networks, audio content is distributed more frequently and conveniently. Because mobile terminals have limited computing capability and storage capacity, realizing a low-complexity, high-quality audio coding system has become one of the most active research topics in digital audio processing.
     To achieve a high-quality audio codec with low complexity, this thesis focuses on two improvements. First, the key AAC technologies (frequency transform, psychoacoustic analysis, and quantization) are optimized at the algorithm level to reduce computational complexity. Second, following an SOPC design strategy, a real-time MPEG AAC codec system is implemented by combining a soft-core microprocessor with IP cores.
     The main work and innovations are as follows:
     (1) The filterbank is a computation-intensive part of the audio perceptual coding model and accounts for a large share of the computation. Two methods for accelerating the filterbank are proposed, one based on a recursive structure and the other on an N/8-point FFT kernel; both are suitable for an IP core design shared by the MDCT and IMDCT. Compared with other recursive algorithms, the first approach reduces the computation to N^2/16 cycles and performs better in computation speed, data throughput, and hardware utilization. Although existing algorithms based on an N/4-point FFT kernel use fewer adders, the second method needs fewer multipliers and nearly doubles the computation rate.
     (2) To suppress pre-echo, the psychoacoustic module of the audio perceptual coding model uses transient analysis to switch the transform length adaptively. Based on the characteristics of human hearing and of audio compression, a hypothesized perceptual threshold model is presented. In addition, a time-domain block switching method replaces the PE-based algorithm and quickly identifies transient signals. As a result, the psychoacoustic model runs faster with little effect on audio quality.
     (3) The quantization module of the audio coding system employs Brandenburg's dual-loop architecture to obtain good quality, but its high complexity makes it unsuitable for real-time applications. A simplification of the dual-loop structure is proposed on the basis of a noise approximation formula. Using the relation between the common scalefactor and the scalefactors of the individual scalefactor bands, the iteration range of the quantization step is narrowed to speed up convergence. Experiments showed that the sound reconstructed with the proposed approach is of almost the same quality as that obtained with the original quantization module. In the decoder, a modified look-up table method performs the inverse non-uniform quantization; compared with existing methods, it uses 50% less storage and reduces calculation error.
     (4) A programmable model of a digital audio codec is developed on an SOPC architecture. Taking MPEG AAC as an example, hardware/software co-design is used to reduce the computational complexity of the codec system. FPGA implementation results showed that the codec achieves a higher encoding speed and decodes in real time. Both objective and subjective evaluation tests indicated good audio quality.
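To make the idea in item (2) concrete, here is a minimal sketch of one plausible time-domain peak test for long/short block selection. The abstract does not give the thesis's actual decision rule, so everything specific here is an assumption for illustration: the function name `choose_block_type`, the split into 8 sub-blocks, the attack ratio of 4, and the silence floor.

```python
import math


def choose_block_type(frame, prev_peak, num_sub=8, ratio=4.0, floor=1e-3):
    """Return ("long" | "short", last_sub_block_peak) for one frame.

    Hypothetical rule: flag a transient when the peak magnitude of a
    sub-block exceeds `ratio` times the peak of the sub-block before it.
    `floor` keeps near-silence from producing a zero reference level.
    """
    sub_len = len(frame) // num_sub
    reference = max(prev_peak, floor)
    transient = False
    for i in range(num_sub):
        sub = frame[i * sub_len:(i + 1) * sub_len]
        peak = max(abs(s) for s in sub)
        if peak > ratio * reference:
            transient = True  # sudden rise in level: an attack
        reference = max(peak, floor)  # compare the next sub-block to this one
    return ("short" if transient else "long"), reference


if __name__ == "__main__":
    steady = [0.1 * math.sin(2 * math.pi * n / 64) for n in range(1024)]
    attack = [0.0] * 512 + [0.8 * math.sin(2 * math.pi * n / 64) for n in range(512)]
    print(choose_block_type(steady, prev_peak=0.1)[0])
    print(choose_block_type(attack, prev_peak=0.0)[0])
```

A test of this kind needs only comparisons on the time-domain samples, with no FFT and no perceptual-entropy calculation, which is the kind of saving the abstract attributes to the time-domain approach. The returned peak is passed in as `prev_peak` for the next frame so that attacks at frame boundaries are not missed.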
