Research on Key Techniques in Multi-view Video Coding
Abstract
With the development of stereoscopic display technology, research on three-dimensional (3D) vision has been gaining momentum, and 3D video signals are expected to become a major component of future multimedia communication. Multi-view video is currently an important way of representing 3D video signals: it carries the depth information of a scene, represents natural scenes with greater realism, and shows broad application prospects in 3D TV, free-viewpoint TV, immersive videoconferencing, and virtual reality. As the number of viewpoints grows, data compression becomes a central research topic in this field. In multi-view video, besides the strong spatial and temporal correlation within each video stream, there is also cross-correlation among the views; how to exploit inter-view disparity information effectively to remove redundancy is therefore the key to improving multi-view video coding efficiency. To improve the compression efficiency of multi-view video, this thesis analyzes and investigates the prediction structure of multi-view video coding, motion and disparity vector prediction, disparity prediction coding based on color-difference compensation, and object-based stereo video coding.
     This thesis first analyzes the characteristics of disparity prediction in multi-view video and the relative strength of the various correlations, and on this basis proposes an H.264-based multi-view video coding scheme. A global disparity prediction mode is introduced into disparity prediction and integrated into the multi-mode predictive coding of H.264, improving compression efficiency. To reduce the number of bits needed to encode disparity vectors and motion vectors, an improved disparity and motion vector prediction method is proposed; besides the spatial correlation of disparity and motion vectors, it also exploits their correspondence across neighboring views and neighboring time instants.
     In multi-view video, the cameras are placed at different positions and thus receive light of different intensities, and their gains and black levels cannot be kept perfectly identical. As a result, the captured multi-view images differ in color (both luminance and chrominance signals), which severely degrades the compression performance of multi-view video. To further improve compression efficiency, this thesis studies disparity prediction coding based on color-difference compensation in depth. After analyzing and modeling the color differences between images of different views, two disparity prediction methods based on color-difference compensation are proposed and implemented: global linear color-difference compensation and global nonlinear color-difference compensation. Experimental results confirm that the proposed color-difference compensation methods markedly improve disparity prediction and raise the compression efficiency of multi-view video.
     Object-based stereo video coding is highly practical in applications such as stereo videoconferencing and video retrieval, and correctly segmenting stereo video objects from the stereo video signal is a prerequisite for object-based stereo video coding. Accordingly, this thesis analyzes the characteristics of stereo video object segmentation, proposes an object-based disparity estimation algorithm, extracts video objects through disparity segmentation and change detection, and finally designs an object-based video coding scheme. Results show that the scheme improves compression efficiency considerably.
Studies on three-dimensional (3D) vision have recently become increasingly popular due to advances in 3D display technologies, and 3D video signals will be an important part of future multimedia communication. As the most important 3D representation at present, multi-view video has been widely used in 3D TV, free-viewpoint TV, immersive videoconferencing, virtual reality, and related fields. However, its huge volume of data is a major obstacle to its application: the multi-fold increase in bandwidth over existing single-view video makes multi-view video data extremely difficult to transmit and store. This thesis mainly concerns the problem of highly efficient multi-view video coding (MVC). To achieve high compression efficiency, the correlation among different views must be exploited in the coding scheme; to this end, several efficient encoding schemes are proposed in this thesis.
     First, an H.264-based multi-view video coding scheme is introduced. It uses the advanced prediction tools of H.264 to remove the spatial, temporal, and inter-view correlation in multi-view video. According to the characteristics of multi-view video, global disparity coding is employed: based on an eight-parameter global disparity model, two global disparity coding schemes are proposed. To decrease the bit rate spent on motion vectors (MVs) and disparity vectors (DVs), an optimized MV/DV prediction method is proposed; it utilizes not only the correlation of neighboring blocks but also that of corresponding blocks in adjacent images.
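The two ideas above can be illustrated with a minimal sketch. The eight-parameter global disparity model is assumed here to be a perspective (projective) mapping, a common choice for eight-parameter global motion/disparity models; the vector predictor extends the usual median of spatial neighbors with a candidate from the corresponding block in the adjacent view or frame. Function names and the exact candidate set are illustrative, not the thesis's definitions.

```python
import statistics

def global_disparity(x, y, p):
    """Map pixel (x, y) of the current view to its predicted position in
    the reference view under an eight-parameter perspective model
    p = (a0, a1, a2, b0, b1, b2, c0, c1)."""
    a0, a1, a2, b0, b1, b2, c0, c1 = p
    w = 1.0 + c0 * x + c1 * y            # perspective denominator
    return (a0 + a1 * x + a2 * y) / w, (b0 + b1 * x + b2 * y) / w

def predict_vector(left, top, top_right, corresponding):
    """Component-wise median over the three spatial neighbors plus the
    vector of the corresponding block in the adjacent view/time instant."""
    cands = [left, top, top_right, corresponding]
    return (statistics.median_low([v[0] for v in cands]),
            statistics.median_low([v[1] for v in cands]))

# With c0 = c1 = 0 the model degenerates to a simple affine shift:
# p below predicts a pure horizontal disparity of 5 pixels.
print(global_disparity(10, 20, (5, 1, 0, 0, 0, 1, 0, 0)))  # (15.0, 20.0)
print(predict_vector((1, 0), (2, 0), (3, 0), (2, 1)))      # (2, 0)
```

Only the residual between the actual vector and this prediction needs to be entropy-coded, which is where the bit savings come from.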
     Owing to the scatter of camera parameters, there is serious color fluctuation (including brightness variation) among different views, which degrades compression performance. To improve the coding efficiency of MVC, color-fluctuation compensation is investigated. First, the color fluctuation is analyzed based on image-formation theory; then, several compensation methods are proposed according to the simplified models: global linear color-variation compensation, global nonlinear color-variation compensation, local color-variation compensation, and global-local-adaptive color-variation compensation. Experimental results show that these methods can greatly improve the performance of disparity prediction and the coding efficiency of MVC.
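The global linear variant can be sketched as a gain/offset model applied to the whole reference view. Moment matching (matching mean and standard deviation) is one common way to fit such a model; the thesis may estimate the parameters differently, e.g., by least squares over matched blocks, so treat this as an illustrative assumption.

```python
from statistics import mean, pstdev

def fit_global_linear(ref, cur):
    """Estimate gain a and offset b so that a * ref + b matches cur,
    by matching the first and second moments of co-located samples."""
    a = pstdev(cur) / pstdev(ref)
    b = mean(cur) - a * mean(ref)
    return a, b

def compensate(ref, a, b):
    """Apply the fitted global linear color-variation model to the
    reference view before disparity prediction."""
    return [a * v + b for v in ref]

# A view whose camera applies gain 1.2 and offset 10 to the reference:
ref = [10.0, 20.0, 30.0, 40.0]
cur = [1.2 * v + 10 for v in ref]
a, b = fit_global_linear(ref, cur)
print(round(a, 6), round(b, 6))  # 1.2 10.0
```

After compensation the reference samples coincide with the current view, so the disparity-prediction residual (and hence the bit rate) drops.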
     Finally, object-based stereo and multi-view video technologies are discussed. An efficient disparity matching algorithm is presented; a stereo video object segmentation method based on the disparity field and higher-order statistics is then proposed; finally, an object-based stereo video coder is implemented. The performance of the object-based encoder verifies the validity of our algorithms.
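The segmentation pipeline combines two cues: temporal change detection (which pixels moved) and the disparity field (which pixels are near the cameras). A minimal sketch, with hypothetical thresholds and frames represented as nested lists of intensities:

```python
def change_mask(prev, cur, threshold=15):
    """Temporal change detection: mark pixels whose absolute frame
    difference exceeds the threshold as candidate moving pixels."""
    return [[abs(c - p) > threshold for p, c in zip(rp, rc)]
            for rp, rc in zip(prev, cur)]

def disparity_mask(disparity, near=8):
    """Disparity segmentation: large disparity means close to the
    cameras, taken here as the foreground layer."""
    return [[d >= near for d in row] for row in disparity]

def object_mask(prev, cur, disparity):
    """A pixel belongs to the video object if it both changed over
    time and lies in the near-disparity layer."""
    cm, dm = change_mask(prev, cur), disparity_mask(disparity)
    return [[c and d for c, d in zip(rc, rd)] for rc, rd in zip(cm, dm)]
```

The resulting mask delimits the video object plane, which an object-based coder then encodes (shape plus texture) separately from the background.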
