Bayesian Contour Structure Model and Its Application to Object Segmentation

Original title (Chinese): 贝叶斯边缘结构模型及其在物体分割中的应用
Author: Yangyu Tao (陶阳宇)
Degree level: Doctoral
Discipline: Circuits and Systems
Keywords: image edge analysis; object segmentation; perceptual edge grouping; edge structure model; statistical learning; prior knowledge
Degree year: 2009
Advisor: Heung-Yeung Shum (沈向洋)
Discipline code: 080902
Degree-granting institution: University of Science and Technology of China
Thesis submission date: 2009-10-01

Abstract

Analyzing image edges and extracting shape features is a fundamental problem in computer vision, with wide application in vision tasks such as shape matching, image segmentation, and object recognition. Researchers have proposed many effective methods for edge analysis. Among them, perceptual edge grouping based on the Gestalt principles has recently become a focus of research. These methods build on evidence from cognitive psychology that the human brain is sensitive to salient features such as proximity, continuity, parallelism, and symmetry; they group and link scattered image edges into semantically meaningful contours. However, conventional edge grouping methods capture only generic saliency and carry no prior knowledge about the shapes of a specific class of objects. This thesis studies how to incorporate class-specific shape priors into perceptual edge grouping and extend it to object segmentation, that is, to extracting the boundary contours of objects of a specific class.
     Based on the structural features shared by the shapes of objects in the same class, we propose a robust contour structure model that expresses, in probabilistic form, the confidence that image edges form an object boundary. The model is trained from sample data by statistical learning. For object segmentation, we further design a novel energy function that combines the contour structure model with classic perceptual grouping, yielding a globally optimal object segmentation method. Specifically, the main contributions are:
     1. We propose a Bayesian Contour Structure Model (BCSM). BCSM describes the posterior probability that a sub-structure formed by a group of edges is a fragment of an object boundary, given the observed image features. The probabilistic formulation is more flexible and robust than conventional deterministic structure models. BCSM is a local shape model and, compared with global shape models, handles object deformation, changes in pose or viewpoint, and occlusion better. To further improve robustness, BCSM is not trained with a parametric model such as a Gaussian; instead, Boosting is used to iteratively approximate the probability distribution of object contour structures. A further advantage of Boosting is that it can combine multiple image features, such as geometry, color, and texture, and perform feature selection among them. Through statistical learning from samples, BCSM captures structural features specific to the object class rather than generic perceptual saliency. Experiments show that BCSM describes object shape features well and discriminates them well from background clutter edges.
     2. We design an incremental training scheme to address the small-sample learning problem. Training BCSM requires a sufficient number of labeled samples. When few images of the object are available, training directly on the small labeled set may over-fit and reduce the generalization ability of the model. How to use a small amount of labeled data to augment the training set is therefore also studied here. We first develop a supervised classification algorithm based on the minimum incremental coding length method. The classifier generalizes well even in the small-sample setting and is well suited to classifying data with few samples. We then apply it in the incremental BCSM training scheme: early in training, unlabeled samples are classified and merged into the training set, so that the training set is labeled automatically online.
     3. Based on BCSM, we define a new perceptual grouping energy function for extracting object contours. The energy function describes the joint posterior probability of the object contour. Assuming that the edge sub-structures are statistically independent, we factorize the posterior probability of the whole contour into the product of the posterior probabilities of the sub-structures it contains. Under the maximum likelihood criterion, searching the image for the optimal object contour is therefore equivalent to maximizing this joint probability. Taking the logarithm converts the probability maximization into an energy minimization. Prior studies indicate that minimizing such an energy directly suffers from the "small cycle" bias: the result is often a small, meaningless contour formed by image noise. To reduce this bias, we normalize the energy by the contour length, so the final energy takes a ratio form whose numerator describes the likelihood of the object contour and whose denominator is the contour length. To optimize the normalized energy, we construct a graph from the image edges and solve it with a minimum ratio cycle algorithm. Because this optimization is globally optimal, the extracted object contour is the optimal solution under the maximum likelihood criterion. Experiments on natural images demonstrate the feasibility of the proposed BCSM-based perceptual grouping method for object contour extraction.
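The chain of steps in contribution 3 (factorization, logarithm, length normalization) can be written out as a short derivation. This is only a sketch consistent with the abstract, not the thesis's own formulation: the symbols C, s_i, f_i, w_i and \ell_i are notation assumed here for illustration.

```latex
% Assumed notation (not from the thesis): C is a candidate closed contour made of
% sub-structures s_1..s_n, f_i are the image features observed for s_i,
% and \ell_i > 0 is the length contributed by s_i.
\begin{align}
P(C \mid f_1,\dots,f_n) &= \prod_{i=1}^{n} P(s_i \mid f_i)
  && \text{(independence assumption)} \\
E(C) &= -\log P(C \mid f_1,\dots,f_n) = \sum_{i=1}^{n} w_i,
  \qquad w_i = -\log P(s_i \mid f_i) \ge 0 \\
\tilde{E}(C) &= \frac{\sum_{i=1}^{n} w_i}{\sum_{i=1}^{n} \ell_i}
  && \text{(length-normalized energy)}
\end{align}
```

Since every w_i is nonnegative, E(C) can only grow as sub-structures are added, which is why unnormalized minimization favors very short cycles. Dividing by the total length instead compares contours by their average cost per unit length.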

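A ratio-form energy of this kind can be minimized over the cycles of a graph by a generic parametric search. The sketch below is one standard way to do so (binary search on the ratio, with Bellman-Ford negative-cycle detection) and is not necessarily the algorithm used in the thesis. It assumes a directed graph whose edges carry a nonnegative cost and a positive length; the names `min_cycle_ratio`, `has_negative_cycle` and the edge layout are hypothetical, and only the optimal ratio value is returned, not the contour itself.

```python
from typing import List, Optional, Tuple

# Hypothetical edge layout: (source, target, cost, length), with length > 0.
Edge = Tuple[int, int, float, float]


def has_negative_cycle(n: int, edges: List[Edge], lam: float) -> bool:
    """Bellman-Ford test: does some cycle have sum(cost - lam * length) < 0?"""
    dist = [0.0] * n  # all-zero start acts like a virtual source to every node
    for _ in range(n):
        changed = False
        for u, v, cost, length in edges:
            w = cost - lam * length
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances converged, so no negative cycle
    return True  # still relaxing after n passes, so a negative cycle exists


def min_cycle_ratio(n: int, edges: List[Edge], iters: int = 60) -> Optional[float]:
    """Approximate min over cycles of sum(cost) / sum(length), or None if acyclic."""
    ratios = [cost / length for _, _, cost, length in edges]
    lo, hi = min(ratios), max(ratios)  # every cycle ratio lies in [lo, hi]
    # With lam = hi + 1 every edge weight is negative, so any cycle is detected.
    if not has_negative_cycle(n, edges, hi + 1.0):
        return None
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if has_negative_cycle(n, edges, mid):
            hi = mid  # some cycle has ratio below mid
        else:
            lo = mid  # no cycle has ratio below mid
    return hi


if __name__ == "__main__":
    # Two cycles: 0->1->0 with ratio 2/2 = 1.0, and 1->2->3->1 with ratio 1.5/5 = 0.3.
    demo_edges = [
        (0, 1, 1.0, 1.0), (1, 0, 1.0, 1.0),
        (1, 2, 0.5, 2.0), (2, 3, 0.5, 2.0), (3, 1, 0.5, 1.0),
    ]
    print(min_cycle_ratio(4, demo_edges))  # approximately 0.3
```

The key observation is that a cycle has ratio below a threshold exactly when it is a negative cycle under the edge weights cost minus threshold times length, so each bisection step reduces to a negative-cycle test.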