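Research on Key Problems in Online Video Segmentation (在线视频分割关键问题研究)

English title: Online Video Segmentation
Author: Zhong Fan (钟凡)
Thesis level: Doctoral
Discipline: Applied Mathematics
Keywords: video segmentation; real-time; color model; background model; ambiguity; confidence; graph cut
Degree year: 2010
Advisors: Peng Qunsheng (彭群生); Qin Xueying (秦学英)
Discipline code: 070104
Degree-granting institution: Zhejiang University (浙江大学)
Submission date: 2010-09-13
Chair of the defense committee: Shi Jiaoying (石教英)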

Abstract

In recent years, as computing power has grown, video-related applications have kept expanding, and many of them are real-time systems, such as augmented reality and video conferencing. These systems often need an accurate segmentation of the objects of interest in the video so that the corresponding regions can be processed in a special way. Unlike traditional offline video segmentation, online video segmentation must run at real-time speed and allows no user interaction while segmentation is in progress. It therefore places high demands on both the performance and the reliability of the segmentation algorithm, and many problems remain open.
     This dissertation studies a series of key problems in online video segmentation. Its contents and contributions are as follows:
     (1) It reviews and summarizes previous work on online video segmentation and, on that basis, identifies the four problems studied here: robust segmentation of scenes with a stationary background, real-time segmentation of scenes with a dynamic background, automatic initialization of the segmentation, and real-time post-processing of the segmentation result.
     (2) It proposes a confidence-based color modeling method that improves the robustness of segmentation in scenes with a stationary background. By estimating the confidence of the global color model and of the background model, the method combines the two optimally at each pixel, which greatly reduces the errors caused by similar colors shared by the foreground and the background.
     (3) It proposes a real-time transductive video segmentation algorithm that extends online video segmentation to the case of a moving viewpoint. Its core is a local color model based on temporal continuity, which is more accurate than the global color model and can also be constructed in real time. With this model, the segmentation result of the previous frame can be propagated accurately to the current frame without any background information, which makes real-time segmentation of scenes with a dynamic background possible.
     (4) It proposes an automatic initialization method for online video segmentation. Traditional initialization requires either a background image or the segmentation result of the first frame, or it needs offline training for a specific scene, all of which are inconvenient in practice. The proposed method is based on a new motion segmentation algorithm that needs no training and can extract a complete segmentation of the foreground from two adjacent frames while the foreground is moving.
     (5) It proposes a real-time post-processing algorithm that removes minor errors near the boundary of a binary segmentation. Binary segmentation results are error-prone near the boundary, which causes flickering. Traditional methods improve the composite by blurring (feathering) the boundary, but the results are often unsatisfactory, while more sophisticated methods cannot run in real time. The proposed algorithm meets the requirements on accuracy and speed at the same time.
     (6) It concludes with a summary of the work and lists some problems that deserve further study.
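
The per-pixel combination described in item (2) can be illustrated with a minimal sketch. The abstract does not state how the confidences are estimated or how the two models are combined, so the confidence-weighted average below is an assumption chosen only for illustration, and the function name and the grids of probabilities and confidences are hypothetical inputs rather than the dissertation's formulation.

```python
from typing import List

Grid = List[List[float]]  # one value per pixel, row by row


def combine_foreground_probability(
    p_global: Grid, p_background: Grid, c_global: Grid, c_background: Grid
) -> Grid:
    """Blend two per-pixel foreground probabilities by their confidences.

    p_global / p_background: foreground probability from the global color
    model and from the background model (hypothetical inputs).
    c_global / c_background: non-negative confidence of each model per pixel.
    The weighted average is an assumed rule, not taken from the source.
    """
    combined: Grid = []
    for pg_row, pb_row, cg_row, cb_row in zip(
        p_global, p_background, c_global, c_background
    ):
        row = []
        for pg, pb, cg, cb in zip(pg_row, pb_row, cg_row, cb_row):
            total = cg + cb
            if total == 0.0:
                # Neither model is trusted here: stay undecided.
                row.append(0.5)
            else:
                row.append((cg * pg + cb * pb) / total)
        combined.append(row)
    return combined


if __name__ == "__main__":
    # Pixel 0: the global color model is ambiguous (0.5) and has low
    # confidence, so the background model dominates the result.
    print(combine_foreground_probability(
        [[0.5, 1.0]], [[1.0, 0.5]], [[0.25, 1.0]], [[0.75, 0.0]]
    ))
```

In a full pipeline the combined probability would typically feed the data term of a graph cut energy. That step is omitted here because the abstract gives no details about it.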
