Transcoding and Its Applications in Scalable Video Coding

Author: Liu Hui (柳辉)
Degree level: Doctoral
Discipline: Signal and Information Processing
Keywords: Scalable Video Coding ; SVC ; H.264/AVC ; video transcoding ; performance comparison ; fast motion estimation ; fast mode decision ; bitrate reduction
Degree year: 2009
Advisors: Li Shipeng (李世鹏) ; Li Houqiang (李厚强)
Discipline code: 081002
Degree-granting institution: University of Science and Technology of China
Submission date: 2009-05-02

Abstract

With the rapid development of network and multimedia technologies, network-based multimedia applications have become widespread. The current application environment is characterized by heterogeneous networks, diverse terminal devices, and complex multimedia applications, which together pose difficulties and challenges for multimedia delivery. Scalable video coding embeds multiple layered sub-bitstreams to provide temporal, spatial, and quality scalability, and therefore adapts to such environments far better than traditional single-layer coding. Video transcoding, which has long been widely studied and performs well, converts an already encoded bitstream into other formats according to user requirements, and is likewise highly adaptable. Both are key techniques that promise to address these problems, and both are active research topics in video processing and communication, with theoretical significance and broad practical value.
     Building on an in-depth analysis of the technical features and application scenarios of the latest international scalable video coding standard, Scalable Video Coding (SVC), the scalable extension of H.264/AVC, this dissertation studies the corresponding transcoding techniques. It aims to solve the technical problems that transcoding faces under different network environments, user experiences, and coding formats. The main work and contributions are as follows:
     1. A performance comparison between SVC and video transcoding.
     The dissertation analyzes SVC theoretically and tests it in practice on video content with a broad range of characteristics. It compares SVC with video transcoding in detail, through experiments and data analysis, in terms of encoder complexity, coding performance, decoder complexity, and scalable application scenarios. The comparison identifies the strengths and weaknesses of SVC and provides theoretical analysis and data support for assessing its application prospects.
     2. A general framework and algorithms for spatial-resolution transcoding from SVC to H.264/AVC.
     The dissertation studies the spatial-resolution transcoding process from SVC to H.264/AVC, analyzes the similarities and differences between the two standards, and proposes a general spatial-resolution transcoding framework. It focuses on pixel-domain algorithms for fast motion estimation, fast mode decision, and motion re-use, achieving low-complexity SVC-to-H.264/AVC spatial transcoding while maintaining good transcoding performance.
     3. A general framework and algorithms for combined transcoding from SVC to H.264/AVC.
     Building on the proposed spatial-resolution transcoding technique, the dissertation studies quality (bitrate) transcoding from SVC to H.264/AVC and proposes a general framework that combines spatial-resolution and quality transcoding. It examines the problems of combined transcoding in both closed-loop and open-loop pixel-domain algorithms, and proposes a joint closed-loop/open-loop transcoding method together with a bottom-up fast mode decision algorithm. These enable fast transcoding with only a small performance loss, striking a good tradeoff between complexity and rate-distortion performance.
     In summary, this dissertation investigates the SVC standard and, by combining scalable video coding with transcoding, studies spatial and quality transcoding techniques, obtaining a number of useful results.
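The abstract names motion re-use and fast motion estimation but does not spell out the algorithms. The following is only a rough sketch of the general idea behind motion re-use in spatial downscaling, not the dissertation's method. It assumes 2:1 dyadic downscaling, motion vectors stored as integer pairs in quarter-pel units, a component-wise median to merge the vectors of the co-located higher-resolution macroblocks, and a caller-supplied matching cost. The function names `reuse_motion_vector` and `refine_motion_vector` are hypothetical.

```python
from statistics import median_low
from typing import Callable, Iterable, Tuple

MotionVector = Tuple[int, int]  # (dx, dy), assumed to be in quarter-pel units


def _scale_component(value: int, scale: int) -> int:
    """Divide by `scale`, rounding to nearest with ties away from zero."""
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) * 2 + scale) // (2 * scale))


def reuse_motion_vector(source_mvs: Iterable[MotionVector], scale: int = 2) -> MotionVector:
    """Derive one candidate vector for a downscaled macroblock.

    The vectors of the co-located higher-resolution macroblocks are merged
    with a component-wise (lower) median and then scaled to the target
    resolution. The median merge is an illustrative assumption.
    """
    mvs = list(source_mvs)
    if not mvs:
        raise ValueError("at least one source motion vector is required")
    dx = median_low([mv[0] for mv in mvs])
    dy = median_low([mv[1] for mv in mvs])
    return (_scale_component(dx, scale), _scale_component(dy, scale))


def refine_motion_vector(
    candidate: MotionVector,
    cost: Callable[[MotionVector], float],
    radius: int = 1,
) -> MotionVector:
    """Search a small window around the re-used vector instead of a full search.

    `cost` is a hypothetical matching cost supplied by the caller, for
    example a sum of absolute differences against the reference frame.
    """
    window = [
        (candidate[0] + ox, candidate[1] + oy)
        for oy in range(-radius, radius + 1)
        for ox in range(-radius, radius + 1)
    ]
    return min(window, key=cost)


if __name__ == "__main__":
    # Four source macroblocks map onto one macroblock after 2:1 downscaling.
    seed = reuse_motion_vector([(8, -4), (10, -4), (8, -6), (12, -2)])
    print(seed)
```

The saving comes from the second step: with a re-used seed, the search evaluates only (2 × radius + 1)² positions, whereas a full search covers the whole search range. How much quality is lost depends on how well the seed predicts the true motion, which is the kind of complexity versus rate-distortion tradeoff the abstract describes.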

