Research on Hole-Filling Methods for 2D-to-3D Video Conversion
Abstract
With the development of display devices, 3D TV has become popular on the market, and glasses-free (autostereoscopic) 3D TV is entering everyday life. 3D display gives viewers a more realistic and immersive experience, yet the supply of multi-view content for autostereoscopic devices falls far short of consumer demand. This shortage of 3D content is one of the factors holding back the adoption of 3D technology, which makes 2D-to-3D and stereo-to-multiview conversion key technologies for 3D video generation. Depth-Image-Based Rendering (DIBR) has become an important approach to 3D content creation thanks to its good compatibility, high quality, and flexibility. From an ordinary texture image and its corresponding depth map, a virtual viewpoint image can be synthesized with adjustable parallax relative to the original; several such virtual views, together with the original, form the multi-view content played on an autostereoscopic display. Generating multi-view content for autostereoscopic devices from existing video has therefore become an urgent market need. DIBR-based virtual view generation consists of three main steps: depth map acquisition and processing, 3D warping, and hole filling. Because objects in a scene lie at different depths and occlude one another, an object or background occluded in the current view may become exposed in the virtual view. Such exposed regions lack valid information and are commonly called holes (disocclusions). If they are handled poorly, the viewer's visual experience suffers severely. Video contains a wide variety of complex scenes, so synthesizing plausible content for hole regions is very challenging, and hole filling is thus a crucial stage of the DIBR pipeline. Spatial consistency and temporal stability are the two key properties of a virtual view; satisfying both yields a good viewing experience, whereas naive hole-filling algorithms often cause texture misalignment, blurring, edge artifacts, and inter-frame flicker. Targeting the holes in DIBR-generated virtual views, this thesis analyzes and summarizes existing algorithms and exploits both image-neighborhood and inter-frame information to fill holes and improve virtual view quality. It examines in detail the influence of depth map quality on view synthesis, a fast inpainting-based filling method, and an exemplar-based filling method, all of which improve the quality of the synthesized virtual views.
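The 3D-warping step of the DIBR pipeline described above can be sketched as follows. This is a minimal illustrative model, not the thesis's actual renderer: the disparity model (`max_disp` times a normalized depth) and the z-buffer policy are assumptions chosen to show how disocclusion holes arise.

```python
import numpy as np

def warp_view(image, depth, max_disp=8):
    """Forward-warp a single-channel image to a virtual right view.
    `depth` is normalized to [0, 1]; nearer pixels (depth -> 1) get a
    larger horizontal disparity.  Target pixels that receive no source
    pixel stay -1: these are the disocclusion holes DIBR must fill."""
    h, w = image.shape
    virtual = np.full((h, w), -1, dtype=np.int32)   # -1 marks a hole
    zbuf = np.full((h, w), -np.inf)                 # keep the nearest contributor
    disp = np.round(max_disp * depth).astype(int)
    for y in range(h):
        for x in range(w):
            xv = x - disp[y, x]                     # shift left for a right view
            if 0 <= xv < w and depth[y, x] > zbuf[y, xv]:
                virtual[y, xv] = image[y, x]
                zbuf[y, xv] = depth[y, x]
    return virtual

# A foreground bar (depth 1) over a flat background (depth 0): the area
# the foreground vacates in the virtual view becomes a hole.
img = np.full((4, 12), 100, dtype=np.int32)
img[:, 5:8] = 200                    # foreground object
dep = np.zeros((4, 12)); dep[:, 5:8] = 1.0
view = warp_view(img, dep, max_disp=3)
print((view == -1).sum())  # → 12 hole pixels exposed behind the object
```

Note how the hole appears exactly where the shifted foreground uncovers background that the source view never observed; this is the region the following chapters fill.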
     The main work of this thesis covers the following aspects:
     1) Depth map preprocessing: implements and compares bilateral filtering, joint bilateral filtering, and combined spatio-temporal filtering for improving depth map quality, and experimentally explores how preprocessing the depth map improves the synthesized virtual view.
     2) To address the visible artifacts of edge interpolation when filling holes, a fast image inpainting technique is applied to the hole regions. The repair uses a directional fast marching method that propagates from background toward foreground, ensuring that points closer to the initial hole boundary are filled first. As a neighborhood method it is computationally simple, restores regions of simple texture well, and its repair process is unaffected by depth map quality.
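The propagation order of the fast marching method above can be approximated with a breadth-first traversal from the hole boundary. This sketch is a deliberate simplification: Telea's actual method solves an eikonal equation and weights each known neighbour by distance, direction, and level-set normal (and the thesis's variant additionally biases propagation from background to foreground); here only the fill-nearest-boundary-first ordering is kept.

```python
import numpy as np
from collections import deque

def fmm_style_fill(image, mask):
    """Fill hole pixels (mask == 1) in order of distance from the initial
    hole boundary (BFS approximates the fast-marching front), each pixel
    taking the mean of its already-known 4-neighbours."""
    img = image.astype(float).copy()
    known = (mask == 0)
    h, w = mask.shape
    nbrs = ((1, 0), (-1, 0), (0, 1), (0, -1))
    q = deque()
    # seed the front with hole pixels adjacent to at least one known pixel
    for y in range(h):
        for x in range(w):
            if not known[y, x] and any(
                    0 <= y+dy < h and 0 <= x+dx < w and known[y+dy, x+dx]
                    for dy, dx in nbrs):
                q.append((y, x))
    queued = set(q)
    while q:
        y, x = q.popleft()
        vals = [img[y+dy, x+dx] for dy, dx in nbrs
                if 0 <= y+dy < h and 0 <= x+dx < w and known[y+dy, x+dx]]
        img[y, x] = sum(vals) / len(vals)   # inpaint from known neighbours
        known[y, x] = True
        for dy, dx in nbrs:                 # advance the front inward
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not known[ny, nx] \
                    and (ny, nx) not in queued:
                q.append((ny, nx)); queued.add((ny, nx))
    return img

# Corrupt a 2x2 block of a constant image, then fill it.
img = np.full((6, 6), 7.0)
mask = np.zeros((6, 6), dtype=int); mask[2:4, 2:4] = 1
img[mask == 1] = 0
filled = fmm_style_fill(img, mask)
```

On smooth regions this recovers the surroundings exactly, which matches the thesis's observation that the neighborhood method works well for simple textures but not for structured ones.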
     3) For holes in video, an exemplar-based inpainting method samples image patches across frames or from the spatial neighborhood and fills each hole with the patch most similar to the hole pixel's surroundings. Using inter-frame information, regions occluded in the current view are filled by searching other frames in which the corresponding content appears. The thesis analyzes and improves the key issues of the exemplar-based method, namely the priority computation and the matching-cost computation, raising matching accuracy. Exploiting inter-frame correlation, the required information is searched and matched across frames, and the best candidate is used to fill the hole region, which preserves the spatio-temporal consistency of the virtual view and yields high-quality view synthesis.
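The matching-cost step of the exemplar-based method can be sketched as a masked sum-of-squared-differences search, in the style of Criminisi et al. This is an illustrative single step: the priority term (confidence times data term) that decides *which* boundary patch to fill first, and the inter-frame extension of the search range, are omitted; patch size and the exhaustive scan are assumptions.

```python
import numpy as np

def best_exemplar(image, mask, target, psize=3):
    """For the patch centred at `target` (a hole-boundary pixel), scan all
    fully-known source patches and return the centre of the one with the
    smallest SSD over the target patch's KNOWN pixels only."""
    r = psize // 2
    h, w = image.shape
    ty, tx = target
    tpatch = image[ty-r:ty+r+1, tx-r:tx+r+1].astype(float)
    tknown = mask[ty-r:ty+r+1, tx-r:tx+r+1] == 0     # compare known pixels only
    best, best_cost = None, np.inf
    for y in range(r, h - r):
        for x in range(r, w - r):
            if mask[y-r:y+r+1, x-r:x+r+1].any():
                continue                              # source must be hole-free
            spatch = image[y-r:y+r+1, x-r:x+r+1].astype(float)
            cost = ((spatch - tpatch)[tknown] ** 2).sum()
            if cost < best_cost:
                best, best_cost = (y, x), cost
    return best

# Periodic stripe texture with one hole pixel: the best exemplar must
# match the stripe phase, so the copied centre restores the pattern.
img = np.zeros((7, 8), dtype=float)
img[:, 1::2] = 255                    # vertical stripes, period 2
mask = np.zeros((7, 8), dtype=int)
mask[3, 3] = 1                        # single hole pixel
img[3, 3] = -1                        # corrupt it
by, bx = best_exemplar(img, mask, (3, 3))
img[3, 3] = img[by, bx]               # copy the matched exemplar's centre
print(img[3, 3])                      # → 255.0, the stripe value restored
```

In the thesis's video setting the same masked-SSD comparison is run over co-located windows in neighbouring frames as well, so content occluded in the current frame can be recovered where it is visible in another frame.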
     4) Combining the algorithms above, a demonstration application for virtual view generation was developed based on MFC and OpenCV.
