Stereo Matching for Three Dimensional Visual Communication

Original title: 面向三维可视通讯的立体匹配方法
Author: Chai Dengfeng (柴登峰)
Degree level: Doctoral
Discipline: Applied Mathematics
Keywords: stereo matching ; 3D visual communication ; view synthesis ; foreground/background segmentation ; Markov Random Fields ; graph cut ; pixel labelling
Degree year: 2006
Advisor: Peng Qunsheng (彭群生)
Discipline code: 070104
Degree-granting institution: Zhejiang University
Thesis submission date: 2006-10-30

Abstract

Stereo vision is a core research area of computer vision. After decades of effort, research on multiple view geometry has achieved breakthroughs, with its theory maturing and its methods becoming established. Stereo matching has also advanced considerably: the disparity field is modelled as a Markov random field, stereo matching is formulated as a pixel labelling problem, and graph cut and belief propagation algorithms are used to estimate the disparity field, with very good experimental results. In recent years, new application areas such as three dimensional visual communication and image-based rendering have emerged and place new demands on stereo matching. Targeting these applications and focusing on quality and efficiency, this thesis studies stereo matching using Markov random fields as the modelling tool and graph cut as the solver. The main contributions are:
     1. A bisection approach for pixel labelling. The whole label set is first assigned to each pixel; the set is then split into two subsets and the one with the higher cost of assigning it to the pixel is discarded, and this is repeated until the label set contains a single label. A multi-label problem is thereby converted into a sequence of binary labelling problems, which provides an approximate solution to an NP-hard problem. We give a probabilistic interpretation of this labelling process, construct an objective (energy) function from it, and prove that the constructed function can be minimized exactly via graph cut. On this basis we design a bit setting algorithm for pixel labelling, which sets one bit of each pixel's label at each step. Its complexity is log_2 n (n is the number of labels), whereas the expansion algorithm (α-expansion algorithm), the most efficient existing algorithm of its kind, has complexity n*k (k > 1). We apply the bit setting algorithm to stereo matching and compare it with the expansion algorithm: at comparable matching quality, the bisection approach is considerably more efficient. The approach places no special requirements on the stereo images, is general, and can also be applied to image restoration, motion estimation and other fields.
     2. Bilayer stereo matching. After reviewing and analysing existing layered stereo matching methods, we propose, for scenes in which the foreground and background are separated from each other and each is continuous, a method that first determines the disparity fields of the foreground layer and the background layer and then combines them into the overall disparity field. The whole matching is thus decomposed into a series of binary labelling problems, avoiding model fitting and iterative refinement. Within this framework we further give an objective function that fuses colour, contrast and shape information to separate foreground and background regions. Experimental results show that bilayer stereo matching greatly improves matching quality, and a comparison with Layered Dynamic Programming shows some advantage in both quality and efficiency.
     3. Based on the two methods above, we give solutions and implementation techniques for two key problems in three dimensional visual communication systems: gaze correction and foreground/background segmentation. In particular, we propose a view synthesis algorithm based on a bilayer representation of the scene and a foreground/background segmentation algorithm based on the bisection approach for pixel labelling. Further experimental results demonstrate the effectiveness of the methods.
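
The bisection idea in contribution 1 can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not give the energy function, so the sketch assumes that the cost of a label subset at a pixel is the smallest data cost among its labels, and it decides each pixel independently instead of solving each binary step with graph cut (there is no smoothness term). The function name `bit_setting` and the `data_cost` layout are hypothetical. The sketch only shows how n labels are resolved in log_2 n binary steps, one bit of the label per step.

```python
import math


def bit_setting(data_cost):
    """Resolve one label per pixel by repeated halving of the label set.

    data_cost[p][l] is the (assumed) cost of giving label l to pixel p.
    The number of labels must be a power of two.
    Returns (labels, number_of_binary_steps).
    """
    n = len(data_cost[0])
    steps = int(math.log2(n))
    if 2 ** steps != n:
        raise ValueError("number of labels must be a power of two")

    # Every pixel starts with the whole label set [0, n).
    low = [0] * len(data_cost)
    width = n
    for _ in range(steps):
        width //= 2  # split the current label set into two halves
        for p, costs in enumerate(data_cost):
            # Assumed subset cost: the cheapest label inside the subset.
            lower_half = min(costs[low[p]:low[p] + width])
            upper_half = min(costs[low[p] + width:low[p] + 2 * width])
            # Discard the more expensive half. In the thesis this binary
            # decision is made jointly for all pixels via graph cut.
            if upper_half < lower_half:
                low[p] += width
    return low, steps


labels, steps = bit_setting([[5, 1, 4, 7], [3, 9, 2, 8]])
print(labels, steps)
```

With four labels the loop runs two binary steps. Replacing the per-pixel comparison with a binary graph cut over all pixels would bring the sketch closer to the method described above, but the energy terms needed for that are not stated in the abstract.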

