Research and Implementation of Virtual Viewpoint Synthesis Algorithm in Multi-view Video System

Author: Li Fang
Degree: Master's
Discipline: Computer Science and Technology
Keywords: multi-view video; virtual viewpoint synthesis; video object segmentation; scene representation
Degree year: 2005
Supervisor: Yang Shiqiang
Discipline code: 081203
Degree-granting institution: Tsinghua University
Submission date: 2005-05-01

Abstract

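Multi-view video has been a major research direction in video processing in recent years. It was proposed for emerging interactive multimedia applications. The binocular stereoscopic video and multi-view video playback it covers are expected to become practical within the next few years, and it addresses the representation, interaction, storage and transmission of 3D interactive video. Its main feature is that users can interactively choose the viewpoint from which they watch. Virtual viewpoint synthesis is the key technique for providing this interaction: it generates virtual images that give a smooth transition while the viewpoint is being switched. This thesis studies the layered description of video content, video object segmentation and methods for synthesizing virtual viewpoint images, and designs and implements a virtual viewpoint synthesis algorithm for large-angle convergent capture conditions.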
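    The main contents of the thesis are:
    (1) Virtual conference scene representation and foreground extraction and tracking based on multi-view video
    We build a semi-immersive projection display to construct the virtual environment and propose a watershed-based video object segmentation algorithm that extracts the conference participants stably and in real time against the static background of the virtual conference. A linear viewpoint synthesis algorithm is then used to perform position-based reconstruction and synthesis of the multi-view video.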
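The abstract does not specify how the watershed segmentation is set up. The following is a minimal sketch under stated assumptions, not the thesis's algorithm: frames are grayscale 2D lists, the static background image is known, pixels whose difference from the background is at most `low` or at least `high` seed background and foreground markers (both thresholds are hypothetical), and the remaining pixels are assigned by marker-controlled flooding. The flooding uses a priority queue over 4-connected neighbours with the intensity step as edge weight, which grows a minimum spanning forest from the markers, one standard formulation of the seeded watershed.

```python
import heapq

BG, FG = 0, 1  # region labels

def _neighbours(y, x, h, w):
    """Yield the 4-connected neighbours of (y, x) inside an h x w image."""
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx

def segment_foreground(frame, background, low=10, high=40):
    """Label every pixel BG or FG by marker-controlled flooding.

    frame, background: equal-sized 2D lists of grayscale values.
    low, high: hypothetical thresholds on |frame - background| that place
    the initial background / foreground markers.
    """
    h, w = len(frame), len(frame[0])
    labels = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = abs(frame[y][x] - background[y][x])
            if d <= low:
                labels[y][x] = BG
            elif d >= high:
                labels[y][x] = FG

    # Heap entries: (edge weight, tie-break counter, y, x, label to propagate).
    # The edge weight is the intensity step between neighbours, so regions
    # grow across smooth areas first and meet along strong image edges.
    heap, counter = [], 0

    def push_neighbours(y, x):
        nonlocal counter
        for ny, nx in _neighbours(y, x, h, w):
            if labels[ny][nx] is None:
                weight = abs(frame[ny][nx] - frame[y][x])
                heapq.heappush(heap, (weight, counter, ny, nx, labels[y][x]))
                counter += 1

    for y in range(h):
        for x in range(w):
            if labels[y][x] is not None:
                push_neighbours(y, x)

    while heap:
        _, _, y, x, label = heapq.heappop(heap)
        if labels[y][x] is None:  # the cheapest arrival claims the pixel
            labels[y][x] = label
            push_neighbours(y, x)
    return labels  # pixels unreachable from any marker stay None

# Usage: the third pixel differs only slightly from the background (uncertain),
# but joins the object because it is smoothly connected to it.
print(segment_foreground([[50, 50, 200, 200, 200, 50]],
                         [[50, 50, 180, 50, 50, 50]]))  # [[0, 0, 1, 1, 1, 0]]
```

Because flooding follows the cheapest intensity steps, a pixel whose colour happens to resemble the background is still attached to the object it is smoothly connected to, which a plain threshold on the background difference would miss.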
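    (2) Design and implementation of the synthesis algorithm for the multi-view video system
    This part designs and implements the synthesis algorithm within the multi-view video system. The synthesis algorithm handles two parts, the foreground objects and the background, and treats them by different means. Foreground synthesis has three steps: extracting and tracking image features, building an incomplete 3D structure of the corresponding video object, and interpolating to generate the intermediate images. For the background, a sprite method generates a panoramic image of the scene covered by the camera system. The virtual viewpoint image is then formed by fusing the foreground and background.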
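The abstract lists feature extraction and tracking as the first step of foreground synthesis without naming a detector. As an assumption, the sketch below uses the Shi-Tomasi criterion, which scores each pixel by the smaller eigenvalue of the local gradient structure tensor; `min_score`, `min_dist` and the 3x3 window are illustrative values, and the frame-to-frame tracking step (for example a Lucas-Kanade tracker) is not shown.

```python
import math

def min_eigenvalue_map(img, half=1):
    """Shi-Tomasi score for each pixel: the smaller eigenvalue of the gradient
    structure tensor summed over a (2*half+1) x (2*half+1) window.
    Pixels too close to the border keep a score of 0."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):  # central differences, zero on the 1-pixel border
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    score = [[0.0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            a = b = c = 0.0  # structure tensor [[a, b], [b, c]]
            for v in range(y - half, y + half + 1):
                for u in range(x - half, x + half + 1):
                    a += ix[v][u] * ix[v][u]
                    b += ix[v][u] * iy[v][u]
                    c += iy[v][u] * iy[v][u]
            # Closed-form smaller eigenvalue of a symmetric 2x2 matrix.
            score[y][x] = (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return score

def good_features(img, max_count=10, min_score=1.0, min_dist=2):
    """Return up to max_count (y, x) points, strongest first, at least
    min_dist apart (chessboard distance). Parameter values are illustrative."""
    score = min_eigenvalue_map(img)
    candidates = sorted(
        ((s, y, x) for y, row in enumerate(score) for x, s in enumerate(row) if s >= min_score),
        reverse=True,
    )
    chosen = []
    for _, y, x in candidates:
        if all(max(abs(y - cy), abs(x - cx)) >= min_dist for cy, cx in chosen):
            chosen.append((y, x))
            if len(chosen) == max_count:
                break
    return chosen

# Usage: a bright quadrant whose corner sits at row 3, column 3.
corner = [[100 if x >= 3 and y >= 3 else 0 for x in range(7)] for y in range(7)]
print(good_features(corner)[0])  # (3, 3)
```

A straight edge has one zero eigenvalue and is rejected, while a corner has strong gradients in two directions; points of the latter kind constrain motion in both image directions and are therefore easier to track.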
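    (3) Improvement of the virtual viewpoint image synthesis method for multi-view video
    The improvements target two aspects: the quality of the synthesized images and the overall processing speed of the synthesis algorithm. The quality of the synthesized virtual image is an important criterion in the final evaluation of the algorithm; we improve it in two respects, namely the quality and quantity of the tracked image features and the handling of video object edges. For processing speed, we give design schemes at two levels: system-level module design and algorithm-level optimization. These improvements were verified on test sequences.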
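The abstract mentions the handling of video object edges and the fusion of foreground and background, but gives no details. One simple possibility, shown here only as an assumed illustration, is to feather the binary object mask into a soft alpha that ramps from 0 at the background to 1 at `feather` pixels inside the object (4-connected distance, computed by a multi-source breadth-first search), and then alpha-blend the synthesized foreground over the background. The function names and the linear ramp are hypothetical choices.

```python
from collections import deque

def feathered_alpha(mask, feather=2):
    """Soft alpha in [0, 1] from a binary object mask (1 = object pixel).

    Alpha grows linearly with the 4-connected distance to the nearest
    background pixel and reaches 1 at `feather` pixels (hypothetical ramp)."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:  # multi-source BFS outward from every background pixel
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    # A mask with no background pixels at all is treated as fully opaque.
    return [[1.0 if d is None else min(1.0, d / feather) for d in row] for row in dist]

def composite(fg, bg, alpha):
    """Per-pixel alpha blend: alpha * foreground + (1 - alpha) * background."""
    return [[a * f + (1.0 - a) * b for f, b, a in zip(fr, br, ar)]
            for fr, br, ar in zip(fg, bg, alpha)]

# Usage: a three-pixel object whose border pixels are blended half-and-half.
alpha = feathered_alpha([[0, 1, 1, 1, 0]])
print(alpha)                                   # [[0.0, 0.5, 1.0, 0.5, 0.0]]
print(composite([[200] * 5], [[0] * 5], alpha))  # [[0.0, 100.0, 200.0, 100.0, 0.0]]
```

Softening only the outermost pixels hides small segmentation errors along the object boundary at the cost of a slightly blurred silhouette; a larger `feather` trades more blur for fewer visible seams.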
Multi-view video has recently become a major topic in video processing research. It has been proposed for interactive multimedia applications, including binocular stereoscopic video and multi-view video broadcasting, which are expected to become practical in the coming years. Multi-view video needs to solve the representation, interaction, storage and transmission problems of 3D interactive video. Its key feature is an interaction function that lets users select the viewpoint interactively. Virtual viewpoint synthesis is the key technique in multi-view video: it produces smooth virtual video during viewpoint transitions. In this thesis, we mainly focus on layered representation, video object segmentation and virtual viewpoint synthesis, and we implement a virtual viewpoint synthesis algorithm for a large-angle convergent capture system.
    The main contributions of this thesis are as follows:
    (1) Research and implementation of virtual teleconferencing representation and foreground segmentation based on multi-view video.
    We construct a virtual teleconferencing environment using a semi-immersive projection display device. We propose a watershed-based algorithm that stably extracts video objects in real time from the static background. We then implement position-based multi-view video reconstruction and synthesis.
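Position-based linear synthesis is not described further in the abstract. A minimal sketch, assuming two real cameras with known centres and point correspondences between their images: the virtual camera centre is projected onto the segment joining the real centres to obtain a weight `alpha`, and corresponding positions and intensities are linearly interpolated with that weight. Linear interpolation of this kind yields geometrically valid in-between views only when the two images are rectified (parallel views), as in view morphing; convergent cameras would need a pre-warp first. `view_weight` and `interpolate_points` are hypothetical names.

```python
def view_weight(cam0, cam1, virtual):
    """Fraction alpha in [0, 1] of the way from cam0 to cam1, obtained by
    projecting the virtual camera centre onto the segment cam0-cam1."""
    seg = [b - a for a, b in zip(cam0, cam1)]
    rel = [v - a for a, v in zip(cam0, virtual)]
    length2 = sum(s * s for s in seg)
    if length2 == 0:
        raise ValueError("camera centres coincide")
    t = sum(s * r for s, r in zip(seg, rel)) / length2
    return min(1.0, max(0.0, t))  # clamp to the segment between the real cameras

def interpolate_points(matches, alpha):
    """matches: list of ((x0, y0), (x1, y1), intensity0, intensity1) tuples,
    one per correspondence between view 0 and view 1 (assumed rectified).
    Returns (x, y, intensity) for each point in the virtual view."""
    out = []
    for (x0, y0), (x1, y1), c0, c1 in matches:
        out.append(((1 - alpha) * x0 + alpha * x1,
                    (1 - alpha) * y0 + alpha * y1,
                    (1 - alpha) * c0 + alpha * c1))
    return out

# Usage: a virtual camera a quarter of the way from camera 0 to camera 1.
alpha = view_weight((0, 0, 0), (4, 0, 0), (1, 2, 0))
print(interpolate_points([((10, 20), (30, 20), 100, 200)], alpha))  # [(15.0, 20.0, 125.0)]
```

The interpolated points would still have to be splatted or triangulated into a dense virtual image; the sketch only shows how the camera position drives the interpolation weight.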
    (2) Research and implementation of the synthesis algorithm in the multi-view video system.
    The main task of this part is to design and implement the synthesis algorithm in our multi-view video system. The algorithm deals with two kinds of data: foreground data and background data. For the foreground data, the algorithm consists of three parts: selecting and tracking feature points, building an incomplete 3D structure, and interpolating the virtual viewpoint images. For the background data, we adopt a sprite method to produce a panoramic image. Finally, we combine the two results to obtain the virtual viewpoint image.
    (3) Improvement of the synthesis algorithm in the multi-view video system.
    The improvement has two aims: increasing the quality of the synthesized image and reducing the execution time of the whole algorithm. To increase synthesis quality, we mainly improve the method of tracking feature points. To reduce execution time, we make improvements to both the system structure and the algorithm. The improvements are tested on standard test sequences.
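The sprite step is described only as producing a panoramic image of the scene covered by the cameras. The sketch below assumes that a 3x3 homography from each frame to sprite coordinates is already known (for example from a separate global motion estimation step, not shown) and that a foreground mask is available for each frame; it forward-maps background pixels with nearest-neighbour rounding and averages overlapping contributions. Practical sprite generators typically also refine the motion estimate and use more robust blending, so this illustrates only the data flow; `build_sprite` is a hypothetical name.

```python
def apply_homography(H, x, y):
    """Map image point (x, y) through a 3x3 homography H (row-major lists)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

def build_sprite(frames, masks, homographies, width, height):
    """Average the background pixels of several frames into one sprite.

    frames: 2D grayscale lists; masks: same sizes, 1 where the pixel is
    foreground (excluded); homographies: assumed-known frame -> sprite
    mappings, one per frame. Sprite cells never covered are None."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for img, mask, H in zip(frames, masks, homographies):
        for y, row in enumerate(img):
            for x, value in enumerate(row):
                if mask[y][x]:
                    continue  # keep moving foreground objects out of the background sprite
                u, v = apply_homography(H, x, y)
                su, sv = round(u), round(v)  # nearest-neighbour forward mapping
                if 0 <= su < width and 0 <= sv < height:
                    total[sv][su] += value
                    count[sv][su] += 1
    return [[t / c if c else None for t, c in zip(tr, cr)]
            for tr, cr in zip(total, count)]

# Usage: the second frame is shifted one pixel to the right in sprite coordinates.
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
T = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
print(build_sprite([[[10, 20]], [[20, 30]]], [[[0, 0]], [[0, 0]]], [I, T], 3, 1))
# [[10.0, 20.0, 30.0]]
```

Once the sprite exists, the background for any virtual viewpoint can be cut from it with the corresponding homography, and the synthesized foreground is composited on top.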
