Research on Multiview Video Coding Compression Algorithms for 3D Video Applications

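English title: Researches on the Compression Algorithm of Multiview Video Coding for 3D Video
Author: Zhu Ling (朱玲)
Degree level: Master's
Discipline: Signal and Information Processing
Chinese keywords: 三维视频 ; 多视角视频编码 ; 视角切换 ; 视角合成预测
English keywords: 3D video coding ; Multiview Video Coding ; Viewpoint Switching ; View Synthesis Prediction
Degree year: 2010
Advisor: Li Houqiang (李厚强)
Discipline code: 081002
Degree-granting institution: University of Science and Technology of China (中国科学技术大学)
Thesis submission date: 2010-05-01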

Abstract

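With the rapid development of digital multimedia technology and rising user expectations, 3D video applications are becoming a consumer trend and attracting growing attention. Compared with traditional 2D video, 3D video adds depth information about the real scene and gives users a more stereoscopic visual experience. 3D video usually consists of multiview texture information and the corresponding depth information. The resulting data volume is huge and challenges both storage and transmission, so the data must be compressed effectively, especially the multiview texture video, which accounts for the overwhelming share of the information. Efficient compression, coding, and transmission of multiview video is therefore the key to advancing 3D video applications, and research on multiview video coding compression algorithms for 3D video applications has broad practical value as well as theoretical significance.
     Taking 3D video applications as its premise, this thesis studies several key technologies of multiview video coding. The main work and contributions are:
     1. A solution for viewpoint switching in multiview video streaming systems that uses redundant pictures coded with inter-view prediction.
     In a typical free-viewpoint video streaming system, user demands vary. Transmitting the video of every view to every client regardless of those differences is clearly inefficient, so viewpoint switching is generally used instead. This thesis proposes a solution that implements viewpoint switching with redundant pictures coded in inter-view prediction mode. For each original picture at a potential switching point, several redundant pictures are encoded according to the characteristics of the view it belongs to. When a switch occurs, the best redundant picture for the specific switching scenario is selected and transmitted in place of the original picture. Experimental results show that, compared with the traditional approach of switching views at key frames, the proposed solution both improves compression performance and saves transmission bandwidth.
     2. A study of how view synthesis prediction in multiview video coding affects the overall performance of 3D video coding.
     In 3D video applications, whether stereoscopic display or free-viewpoint browsing, the decoder needs depth information. The depth information serves to synthesize virtual views and is not displayed directly. It is therefore worthwhile to exploit this property at the encoder to improve the compression efficiency of the multiview texture video and, in turn, of 3D video coding as a whole. Through extensive experiments, this thesis analyzes the effect of view synthesis prediction on multiview video coding performance. The results show that this prediction mode improves coding performance and that a virtual view can provide more accurate prediction than the existing real reference views. The extensive experimental data can serve as a reference for researchers working on 3D video coding.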
With the rapid development of multimedia technologies and rising user demands, three-dimensional (3D) video has become a trend in the consumer market and is attracting great attention.
     Compared with traditional two-dimensional (2D) video, 3D video provides a more realistic stereoscopic experience. It commonly consists of multiview texture video sequences and the corresponding depth information. The amount of data is large, which challenges storage and transmission, especially under limited bandwidth. It is therefore necessary to compress and transmit 3D video data efficiently, particularly the multiview texture information, which constitutes the largest portion. The key to promoting the use of 3D video lies in the efficient coding and transmission of multiview video, so research on the compression of multiview video is of great theoretical and practical significance.
     This thesis focuses on the key technologies of multiview video coding for 3D video applications. The main work and contributions are as follows.
     First, this thesis proposes a novel scheme that enables viewpoint switching in multiview video streaming. In a typical free-viewpoint video streaming system, users' demands differ. Transmitting the entire multiview stream to every client is clearly inefficient, so viewpoint switching methods are usually adopted. We present a new scheme that enables viewpoint switching through inter-view-predicted redundant pictures. For each desired switching point, sets of redundant pictures are encoded using only inter-view prediction. When a switch happens at a switching point, an appropriate redundant picture is chosen adaptively, based on which views were transmitted before and which will be transmitted after the switching point, and this redundant picture is transmitted instead of the original one. Experimental results show that the proposed viewpoint switching method not only saves transmission bandwidth but also improves compression performance.
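     The selection step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `RedundantPicture` record, the function name, and the rule "pick the smallest candidate whose inter-view references are all available" are hypothetical, and the sketch assumes that a view counts as available at the switching point only if it is transmitted both before and after the switch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class RedundantPicture:
    """One redundant coding of a switch-point picture (hypothetical record)."""
    target_view: int                 # view the picture belongs to
    reference_views: FrozenSet[int]  # views used for inter-view prediction
    size_bits: int                   # coded size of this redundant picture


def choose_redundant_picture(
    candidates: Iterable[RedundantPicture],
    target_view: int,
    views_before: Iterable[int],
    views_after: Iterable[int],
) -> Optional[RedundantPicture]:
    """Pick the cheapest redundant picture the client can decode.

    Assumption: a view is usable as an inter-view reference at the
    switching point only if it is streamed both before and after the switch.
    """
    available = set(views_before) & set(views_after)
    usable = [
        p for p in candidates
        if p.target_view == target_view and p.reference_views <= available
    ]
    # No usable candidate: the caller must fall back to another mechanism.
    return min(usable, key=lambda p: p.size_bits, default=None)


# Example: the client watches views 1 and 3, then adds view 2.
candidates = [
    RedundantPicture(2, frozenset({1}), 900),
    RedundantPicture(2, frozenset({1, 3}), 700),
    RedundantPicture(2, frozenset({0}), 500),
]
best = choose_redundant_picture(candidates, 2, {1, 3}, {1, 2, 3})
```

     Here `best` is the candidate predicted from views 1 and 3, because both references are already at the client and it is smaller than the single-reference alternative.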
     Second, the thesis investigates how introducing view synthesis prediction (VSP) into multiview video coding affects the overall performance of 3D video coding. In 3D video applications, whether stereoscopic display or free-viewpoint browsing, user terminals use the depth information only to synthesize intermediate virtual views, not for display. It is therefore worthwhile to reuse the encoded depth maps, which are available at both the encoder and the decoder, to improve the coding efficiency of 3D video. In this thesis, a large number of experiments were conducted to investigate the impact of view synthesis prediction on 3D video coding. The results demonstrate that VSP improves the performance of multiview video coding. Moreover, they show that a synthesized virtual view provides more accurate prediction than the existing reference views. The research also provides a large amount of experimental data for researchers in the field.
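     To make the idea of a synthesized reference concrete, the following Python sketch warps one row of a reference view into a virtual view using its depth map. The abstract does not state which synthesis method the thesis uses. The sketch assumes rectified, horizontally aligned cameras, the common 8-bit inverse-depth quantization of depth maps (255 is nearest), and a virtual camera positioned so that pixels shift left. All function and parameter names are hypothetical.

```python
def depth_value_to_disparity(v, focal_px, baseline, z_near, z_far):
    """Convert an 8-bit depth-map sample to a disparity in pixels.

    Assumes inverse-depth quantization: v = 255 maps to z_near, v = 0 to z_far.
    """
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return focal_px * baseline * inv_z


def synthesize_row(texture_row, depth_row, focal_px, baseline, z_near, z_far):
    """Forward-warp one row of the reference view into the virtual view.

    Positions that receive no pixel stay None (holes); when two pixels land
    on the same position, the nearer one (larger depth value) wins.
    """
    width = len(texture_row)
    out = [None] * width
    out_depth = [-1] * width
    for x, (pixel, v) in enumerate(zip(texture_row, depth_row)):
        d = depth_value_to_disparity(v, focal_px, baseline, z_near, z_far)
        xt = x - int(round(d))  # assumed shift direction: to the left
        if 0 <= xt < width and v > out_depth[xt]:
            out[xt] = pixel
            out_depth[xt] = v
    return out


# Two background pixels (depth 0) followed by two foreground pixels (depth 255).
row = synthesize_row([10, 20, 30, 40], [0, 0, 255, 255],
                     focal_px=1.0, baseline=1.0, z_near=1.0, z_far=100.0)
```

     In this toy example the foreground pixels move one position to the left and occlude a background pixel, leaving a hole at the right edge. A view-synthesis-predicted block would be taken from such a warped picture instead of from a real neighbouring view.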

