Optimization and Design of an AAC Perceptual Audio Coding Algorithm

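English title: Optimized Design of Audio Perceptual Coding Algorithm on AAC
Author: Fang Zhen (方贞)
Degree level: Master's
Discipline: Computer Application Technology
Keywords (translated from Chinese): Perceptual audio; Psychoacoustics; Dynamic blocking; Pre-echo
Keywords (English, as given): Perceptual Audio; Psychological Acoustics; Dynamic Block; Pre echo
Degree year: 2012
Advisor: Yu Shaojun (余绍军)
Discipline code: 081203
Degree-granting institution: Central South University of Forestry and Technology (中南林业科技大学)
Thesis submission date: 2012-05-01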

Abstract

With the rapid development of networked information and multimedia technology, demand for multimedia services has become increasingly urgent. Multimedia and computer network technology have gradually merged and now reach into every aspect of economic and social life. To ease storage and transmission, perceptual audio coding based on psychoacoustic models has been widely adopted.
     Advanced Audio Coding (AAC) is currently the most advanced perceptual audio coding technology. Its advantages lie mainly in three areas: a high signal compression ratio, a modular quantization, encoding, and decoding process, and perfectly transparent reconstructed audio quality. As an audio coding standard used widely across many fields, AAC has distinctive strengths and considerable potential market value. The standard AAC algorithm nevertheless has shortcomings: its high complexity consumes substantial computation time and system resources and introduces coding delay, which works against the real-time requirements of today's perceptual audio coding applications. Experimental analysis shows that the system resource and computational cost of AAC encoding is concentrated in the encoder's main modules: quantization and coding, the psychoacoustic model, and the filter bank.
     To obtain a low-complexity, efficient audio codec that still delivers perfectly transparent reconstructed audio quality, this thesis draws on the psychoacoustic model, the core theory of perceptual audio coding, and uses objective parameters to reflect subjective listening quality. For the pre-echo problem that arises during quantization and coding, it proposes an adaptive window switching algorithm based on dynamic blocking. The algorithm applies different block partitioning to audio signals of different sampling rates and combines this with the temporal masking effect of the psychoacoustic model, so that transient signals are identified more accurately and windows are switched appropriately. Pre-echo is thus controlled and eliminated in the time domain, which makes the resulting quantization noise imperceptible and yields efficient, high-fidelity audio coding. While preserving the quality of the reconstructed audio, the algorithm reduces computational complexity and shortens encoding time.
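The window switching idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the block sizes, the detection rule, or any thresholds, so everything specific here is an assumption made for illustration: the function names, the mapping from sampling rate to sub-block count, the energy-ratio test, and the default threshold and floor values (which assume samples normalized to [-1, 1]). Only the 1024-sample frame length and the choice between a long window and short windows reflect standard AAC behaviour. This is not the thesis's algorithm.

```python
FRAME_LEN = 1024  # samples per AAC frame (one long block per channel)


def sub_block_count(sample_rate):
    """Hypothetical rule: split the frame into more sub-blocks at higher rates."""
    if sample_rate >= 44100:
        return 16
    if sample_rate >= 22050:
        return 8
    return 4


def block_energies(frame, n_blocks):
    """Sum of squared samples in each of n_blocks equal-length sub-blocks."""
    size = len(frame) // n_blocks
    return [
        sum(x * x for x in frame[i * size:(i + 1) * size])
        for i in range(n_blocks)
    ]


def choose_window(frame, sample_rate, ratio_threshold=10.0, energy_floor=1e-4):
    """Return "short" if any sub-block is much louder than the one before it."""
    energies = block_energies(frame, sub_block_count(sample_rate))
    for previous, current in zip(energies, energies[1:]):
        # energy_floor keeps near-silent blocks from triggering on tiny noise.
        if current > ratio_threshold * max(previous, energy_floor):
            return "short"
    return "long"
```

Comparing each sub-block only with its predecessor is a crude stand-in for temporal masking: a sudden rise in energy after a quiet stretch is the case where quantization noise spread across a long window would be audible as pre-echo. The sketch also looks at a single frame in isolation, so an attack in the first sub-block is not detected. A fuller version would presumably carry the last sub-block energy over from the previous frame.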

