Research on Object Segmentation Algorithms for Stereo Images and Stereo Video

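English title: Study on Object Segmentation in Stereo Images and Stereo Videos
Author: 周茜
Degree level: Master's
Discipline: Communication and Information Systems
Chinese keywords: 立体图像 ; 立体视频 ; 对象分割 ; 立体匹配 ; 视差图 ; 二次帧差
English keywords: stereo image ; stereo video ; object segmentation ; stereo matching ; disparity map ; twice frame difference
Degree year: 2007
Advisor: 王世刚
Discipline code: 081001
Degree-granting institution: Jilin University (吉林大学)
Thesis submission date: 2007-04-23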

Abstract

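Stereo video images are designed to emulate human stereo perception and are widely used in many fields. However, their data volume is very large, so they must be compressed by coding. The accuracy of object segmentation directly affects the effectiveness of object-based stereo video coding. This thesis studies object segmentation in stereo images and stereo video. For stereo images, an object segmentation method based on the disparity map is proposed. For stereo video, disparity map segmentation is combined with the motion information of video objects, yielding a moving object segmentation method based on the disparity map and twice frame difference motion detection.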
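     The proposed stereo image segmentation method first performs stereo matching on an undersampled stereo pair and uses the resulting disparity map to make an initial segmentation of objects lying in different disparity planes. The initial result is then oversampled to obtain a primary object segmentation template. Based on this template, a second segmentation is performed on the stereo pair and combined with edge detection results to obtain the precise target object. Simulation experiments show that the method effectively addresses the excessive computational cost of region-based matching, improves matching efficiency and accuracy, extracts targets at different disparity layers of a stereo pair separately, and is very effective for stereo pairs with small occlusions.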
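The undersample, match, and oversample pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes a rectified grayscale pair, uses a sum-of-absolute-differences cost (the abstract does not name the matching cost), and the function names and parameters (`downsample`, `scanline_disparity`, `layer_mask`, `upsample_mask`, `max_disp`, `half_win`, `factor`) are hypothetical. Morphological cleanup, the second segmentation, and edge detection are omitted.

```python
import numpy as np

def downsample(img, factor=2):
    """Undersample by keeping every `factor`-th pixel in both directions."""
    return img[::factor, ::factor]

def scanline_disparity(left, right, max_disp=8, half_win=2):
    """Integer disparity per pixel from a 1-D SAD window along scan lines.

    Assumes a rectified pair, so left[y, x] is compared with right[y, x - d].
    """
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    kernel = np.ones(2 * half_win + 1)           # 1-D horizontal window
    best_cost = np.full((h, w), np.inf)
    disparity = np.zeros((h, w), dtype=np.int64)
    for d in range(max_disp + 1):
        # Columns x < d have no candidate in the right image: fixed penalty.
        diff = np.full((h, w), 255.0)
        diff[:, d:] = np.abs(left[:, d:] - right[:, :w - d])
        # Sum the absolute differences over the window, row by row.
        cost = np.apply_along_axis(np.convolve, 1, diff, kernel, mode="same")
        better = cost < best_cost                # ties keep the smaller d
        disparity[better] = d
        best_cost[better] = cost[better]
    return disparity

def layer_mask(disparity, d_min, d_max):
    """Select the pixels whose disparity lies in one disparity layer."""
    return (disparity >= d_min) & (disparity <= d_max)

def upsample_mask(mask, factor=2):
    """Oversample a mask back to full resolution by pixel repetition."""
    return np.repeat(np.repeat(mask, factor, axis=0), factor, axis=1)
```

A typical call sequence would downsample both views, run `scanline_disparity` on the small images, pick a layer with `layer_mask`, and pass the result to `upsample_mask` to obtain a full-resolution template. Matching at half resolution also halves the disparity search range, which is where most of the saving in computation comes from.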
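     For stereo video, this thesis proposes an object segmentation algorithm based on disparity and twice frame difference motion detection. The algorithm combines the advantages of disparity map segmentation and twice frame difference motion segmentation. It first segments the disparity map to obtain primary segmentation templates for targets in different disparity layers. Twice frame difference motion detection is then carried out within the template regions, and edge detection is used to correct object boundaries, yielding the precise moving target. The method has low computational complexity and is easy to implement. Analysis of the simulation results shows that it segments well even when moving objects in different disparity layers overlap or the background itself is moving.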
Stereo systems are designed to emulate human stereo perception and have many applications. One major obstacle to the application of stereo video is the extremely large amount of data associated with a stereo sequence. To store or transmit a stereo sequence at a reasonable cost, the data must be compressed substantially. There are two main approaches to compressing stereo video: block-based coding and object-based coding. Block-based coding is simple and robust but may cause blocking artifacts. Object-based coding is attractive because it avoids blocking artifacts and achieves higher coding efficiency. Furthermore, it can describe a scene in a structural way. It has therefore become an active research area and a developing trend in stereo video compression. As the name implies, object-based coding relies on detecting foreground objects in a video scene, so video object segmentation is a crucial step in object-based stereo video coding.
     Object segmentation is also an important step in many computer vision and multimedia tasks. In image analysis, an image is split into several regions. The regions of particular interest are called objects or foreground, and the remaining regions are called background. Image segmentation is one of the most difficult tasks in image processing because images are affected by illumination changes, background clutter, occlusions, and so on. No single segmentation algorithm works for all kinds of images. Object segmentation in stereo images and video is therefore a challenging problem.
     Stereo object segmentation algorithms are derived from image segmentation and single-channel video segmentation algorithms. Some researchers segment objects in one channel using traditional image segmentation algorithms and then obtain the objects in the other channel by stereo matching. Others perform the segmentation on the depth map. This approach tends to be more accurate because boundaries in the depth information lie close to the true object boundaries.
     This thesis addresses object segmentation in stereo images and stereo video. For stereo images, an object segmentation algorithm based on the disparity map is proposed. For stereo video, a segmentation algorithm combining disparity map segmentation with twice frame difference segmentation is proposed. Each algorithm has two steps: both share the first step (1), which is followed by step (2) for stereo images and by step (3) for stereo video:
     (1) The first segmentation step is based on the disparity map, because the disparity map contains the depth information of the 3-D scene. The disparity map is obtained by stereo matching. For a parallel camera configuration, the epipolar lines are parallel to the horizontal scan lines, so the search can be constrained to the horizontal scan lines. For a converging camera configuration, the epipolar lines become parallel to the horizontal scan lines after epipolar rectification. A one-dimensional window is therefore used for matching. To reduce computation, stereo matching is performed on undersampled stereo images. Using the disparity smoothness constraint, objects in different disparity planes are located. After morphological processing and upsampling, the first segmentation result is obtained.
     (2) The second segmentation step for stereo images builds on the first segmentation result. It exploits two characteristics of boundary pixels: first, their matching error is much larger; second, the disparity on their left side differs from the disparity on their right side. A one-dimensional window is used to search for matches along corresponding scan lines, with the search range bounded by the disparities of the background and the object. If the matching error is smaller than a given threshold, the pixels are matching points; otherwise, they belong to the object boundary. Combining the second segmentation with edge detection yields the final object. This algorithm is effective for stereo images with small occlusions.
     (3) Moving object segmentation in stereo video is performed within the regions obtained from the first segmentation step, using the twice frame difference algorithm. Differencing the current frame with the previous frame and with the next frame gives two frame difference images. A higher-order statistic is used to decide whether each pixel in a frame difference image belongs to a moving object, and the moving object is extracted from each of the two difference images. The intersection of the two results is taken as the moving object of the current frame. Combining moving object segmentation with edge detection yields the final moving object. This algorithm can segment moving objects in different disparity planes. Over-segmentation is avoided when moving objects interact or when the background contains motion.
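Step (3) can be sketched roughly as follows. This is an illustrative sketch only: the abstract does not say which higher-order statistic is used, so a local fourth-order moment about the window mean is assumed here as one possible choice, and the names `local_fourth_moment`, `moving_mask`, `twice_frame_difference`, `threshold`, `half_win`, and `region` are hypothetical. The edge-detection refinement is omitted.

```python
import numpy as np

def local_fourth_moment(diff, half_win=1):
    """Fourth-order moment of `diff` about its local mean in a square window."""
    k = 2 * half_win + 1
    h, w = diff.shape
    padded = np.pad(diff, half_win, mode="edge")
    # Stack the k*k shifted views; axis 0 enumerates the samples of each window.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(k) for dx in range(k)])
    mean = windows.mean(axis=0)
    return ((windows - mean) ** 4).mean(axis=0)

def moving_mask(frame_a, frame_b, threshold, half_win=1):
    """Mark pixels whose frame-difference statistic exceeds `threshold`."""
    # Convert to float first so unsigned 8-bit frames do not wrap around.
    diff = frame_a.astype(np.float64) - frame_b.astype(np.float64)
    return local_fourth_moment(diff, half_win) > threshold

def twice_frame_difference(prev, curr, nxt, threshold, region=None):
    """Intersect the motion masks of (curr, prev) and (nxt, curr).

    `region` is an optional boolean template, for example a disparity-layer
    mask from the first segmentation step, that restricts the detection.
    """
    backward = moving_mask(curr, prev, threshold)
    forward = moving_mask(nxt, curr, threshold)
    mask = backward & forward
    if region is not None:
        mask &= region
    return mask
```

Taking the intersection is what removes the trail at the object's previous position and the area it will occupy in the next frame: each of those appears in only one of the two difference images. With the statistic assumed here, the interior of a large, uniformly changing area has zero local variance and is not flagged, so in practice the detected pixels concentrate near moving edges.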
     The results show that the proposed algorithms perform well for object segmentation in stereo images and stereo video. The algorithms are also fast and easy to implement.
     The second segmentation step for stereo images is effective for stereo images with small occlusions. When occlusions are large, more complicated occlusion detection and disparity compensation are required. This work uses only the gray-level information of images and videos. Incorporating color information in future work is expected to yield more accurate segmentation results.

