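Research and Discussion on Object Recognition Methods in Machine Vision

Author: Liming Wang (王利明)
Thesis level: Doctoral
Discipline: Computer Application Technology
Keywords (Chinese): 机器视觉 ; 物体识别 ; 模式分类 ; 形状特征 ; 轮廓线选择 ; 姿态分析 ; 轮廓线聚类 ; 图像扩散
Keywords (English): computer vision ; object recognition ; pattern classification ; shape feature ; contour selection ; pose estimation ; contour grouping ; image diffusion
Degree year: 2009
Advisor: I-fan Shen (沈一帆)
Discipline code: 081203
Degree-granting institution: Fudan University (复旦大学)
Thesis submission date: 2009-10-05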

Abstract

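Machine vision (computer vision) is the science of enabling computers to perceive image data intelligently. Object recognition is a fundamental research problem in the field and is essential to the goal of image understanding. Effective object recognition algorithms are a prerequisite for many applications, including image retrieval, video surveillance, medical image processing, and industrial robotics. Object recognition is driving industry, medicine, transportation, and national defense toward automation and intelligent operation, and may fundamentally change how these fields develop; as the underlying technology spreads into everyday applications, it has even begun to enter people's daily lives. Nevertheless, object recognition is still at an early, rapidly developing stage: specialized solutions exist for some particular applications, but no general, robust theoretical and algorithmic framework has yet emerged. This thesis presents the author's research on this topic.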
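     First, Chapter 3 proposes a concise object recognition algorithm based on shape point features. Within this framework, the traditional shape context descriptor is improved in two respects, and local matches are sought in the image using the improved shape features. A generalized Hough transform voting procedure then organizes the matches into object detection hypotheses. The improvements to the feature are mainly intended to avoid interference from the background on the object's shape features and to make the features more tolerant of object deformation. Using a prior model to search the image for possible object hypotheses in this way is a top-down recognition process; it generally achieves high recall, but its precision is unsatisfactory. To improve precision, a classifier is applied after this first step to separate true hypotheses from false ones, and bottom-up image segmentation information is combined to obtain the object's foreground region in the image.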
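
As a rough illustration of the descriptor this chapter starts from, the sketch below computes a plain log-polar shape context histogram for one reference point. It is the standard descriptor, not the improved version proposed in Chapter 3; the function name, bin counts, radial limits, and the normalization by mean distance from the reference point are all assumptions made for this example.

```python
import math

def shape_context(points, index, n_radial=5, n_angular=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of all other points as seen from points[index].

    Illustrative sketch only: bin counts and radial limits are assumed values.
    Requires at least one other point that does not coincide with the reference.
    """
    px, py = points[index]
    offsets = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    # Normalise by the mean distance to the reference point (a simplification)
    # so that the histogram does not change when the shape is rescaled.
    mean_dist = sum(dists) / len(dists)
    log_span = math.log(r_max / r_min)
    hist = [[0] * n_angular for _ in range(n_radial)]
    for (dx, dy), d in zip(offsets, dists):
        r = d / mean_dist
        if r >= r_max:
            continue  # points beyond the outer ring are ignored
        if r <= r_min:
            r_bin = 0  # very close points fall into the innermost ring
        else:
            r_bin = min(n_radial - 1, int(n_radial * math.log(r / r_min) / log_span))
        theta = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = int(n_angular * theta / (2 * math.pi)) % n_angular
        hist[r_bin][a_bin] += 1
    return hist
```

Because only offsets from the reference point are binned, translating the whole point set leaves the histogram unchanged. Every point inside the outer ring contributes to the histogram, whether it belongs to the object or to the background, which is the sensitivity to clutter that the improvements described above are meant to reduce.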
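     Background clutter has always been a major factor affecting object recognition performance. For image features built from a center point and its neighborhood, clutter often severely corrupts the useful information. Most recognition methods adopt a learning strategy, using massive amounts of training data to teach the computer which feature dimensions are important and which should be ignored. Chapter 4 proposes a new method that effectively overcomes interference from the background while using only a single training exemplar per object class. A distinguishing trait of the method is that it uses bottom-up contours as the basic image elements and extracts shape features over a very large neighborhood. Choosing a very large neighborhood in fact aggravates the interference of background clutter with the useful signal, especially for objects with elongated shapes. To deal with clutter, the method imitates the selection mechanism of human vision: it selects combinations of contours, generates shape features from the selected contours, and matches them against the model. The experimental results show that features selected in this way remove background signal from the feature dimensions effectively and achieve the final recognition goal. The integrity of each contour's low-level semantics is preserved throughout selection and matching.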
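
The selection idea can be illustrated with a toy example. In the sketch below each contour contributes a fixed histogram, a contour is either taken whole or left out (mirroring the integrity constraint), and the subset whose summed histogram is closest to the model histogram is kept. The exhaustive search, the L1 distance, and all names are assumptions made for this illustration; the abstract does not say how the thesis carries out the selection.

```python
from itertools import combinations

def select_contours(contour_hists, model_hist):
    """Pick the subset of whole contours whose summed histogram best fits the model.

    Toy brute-force search (exponential in the number of contours), intended
    only to illustrate selection; it is not the optimisation used in the thesis.
    """
    n_bins = len(model_hist)
    best_subset = ()
    best_cost = sum(abs(m) for m in model_hist)  # cost of selecting nothing
    for k in range(1, len(contour_hists) + 1):
        for subset in combinations(range(len(contour_hists)), k):
            # A contour is used in full or not at all, so it is never split.
            total = [sum(contour_hists[i][b] for i in subset) for b in range(n_bins)]
            cost = sum(abs(t - m) for t, m in zip(total, model_hist))
            if cost < best_cost:
                best_subset, best_cost = subset, cost
    return best_subset, best_cost

# Two object contours, one clutter contour, and one contour that fits poorly.
model = [2, 1, 0, 3]
contours = [[2, 0, 0, 1], [0, 1, 0, 2], [0, 0, 4, 0], [1, 1, 1, 1]]
subset, cost = select_contours(contours, model)
```

In this example only contours 0 and 1 together reproduce the model histogram, so the clutter contour, which would put mass into a bin the model leaves empty, is excluded from the feature.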
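     Object recognition is not only a matter of deciding whether an object of some class is present in an image, or where it is located; it also includes higher-level understanding of the object, of which pose estimation is one example. Chapter 5 proposes a pose estimation algorithm that is likewise based on selection. Selection is performed not only over the contours in the image but also over the model's pose parameters. The plausibility of a pose is judged by matching the contours selected from the image against the pose selected from the model. In the experiments, the algorithm is applied to an image database of baseball players, where it performs well and yields fairly accurate pose estimates.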
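     Chapter 6 addresses a sub-topic of object recognition: contour grouping (clustering). Unlike top-down recognition by contour selection, which in essence uses a model to group the foreground contours into one class, contour grouping here remains a bottom-up process. The method aims to group contours so as to obtain a higher-level representation of the image signal, and it uses a second, related image (related by depth, by motion, or by similarity) to help achieve this.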
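     In the transportation domain, street scenes have long been an important setting for vision algorithms. Chapter 7 uses a hybrid recognition approach to recognize objects in street scenes: traffic lights, road signs, street lamps, fire hydrants, trees, and cars. Because these object classes differ in their intrinsic properties (some are rigid, some consist of texture, and some are semi-rigid or easily deformed), different methods are applied to different classes as appropriate. Detection results for each class are reported.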
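     Shape features are extracted mainly from boundary information. To obtain clear boundaries and thereby strengthen the descriptive power of shape features, Chapter 8 proposes an image diffusion algorithm whose parameters adapt automatically. The goal of the algorithm is to remove image noise, in particular random noise and texture noise, while preserving image structure. A new kernel function is designed to improve structure preservation, and in the implementation bottom-up image segmentation information is used to adjust the kernel parameters adaptively. The experimental results show that this diffusion algorithm improves edge detection results and thereby the descriptive performance of shape features.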
Computer vision aims to enable and advance intelligent perception of input image data. Image content is understood by recognizing the objects in images; object detection and recognition is therefore a fundamental subject. An efficient object detection algorithm is a prerequisite for many applications, including but not limited to content-based image retrieval, video surveillance, medical image processing, and industrial robotics. Object recognition techniques will influence industry, medical science, traffic control, and national defense, and may change the way these fields develop. As the technique spreads, it is expected to become an integral part of our daily lives. However, despite its rapid development, object recognition is still at an early stage: many algorithms have been designed for specific applications, but no general and robust algorithmic framework is available. This thesis presents my work on this research topic.
     In Chapter 3, a simple but efficient object recognition algorithm based on point-wise shape features is introduced. Improved Shape Context features are used to find matches in images, followed by a generalized Hough transform that organizes the qualified matches; object hypotheses are selected from this voting procedure. The Shape Context is improved mainly to avoid background clutter and to gain tolerance to shape deformation. Finding matches with a pre-defined object model in this way is called top-down recognition. It usually has high recall but suffers from low precision. To overcome this, the hypotheses are further verified using a discriminative classifier. Object regions in the image are finally obtained by combining cues from bottom-up image segmentation.
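
The voting step can be sketched as follows, assuming each local match carries the image position of the matched feature, the offset from that feature to the object center stored in the model, and a match weight. The accumulator cell size, the vote threshold, and the function name are illustrative assumptions rather than details taken from the thesis.

```python
from collections import Counter

def hough_vote(matches, cell=10, min_votes=3.0):
    """Accumulate object-center votes and return the cells that pass a threshold.

    matches: iterable of ((x, y), (dx, dy), weight), where (dx, dy) is the
    model's offset from the matched feature to the object center.
    Returns (center_x, center_y, votes) tuples, strongest first.
    """
    accumulator = Counter()
    for (x, y), (dx, dy), weight in matches:
        cx, cy = x + dx, y + dy  # center predicted by this match
        accumulator[(int(cx // cell), int(cy // cell))] += weight
    hypotheses = [
        ((i + 0.5) * cell, (j + 0.5) * cell, votes)
        for (i, j), votes in accumulator.items()
        if votes >= min_votes
    ]
    hypotheses.sort(key=lambda h: -h[2])
    return hypotheses

# Three matches agree on a center near (52, 48); one stray match does not.
matches = [
    ((40, 40), (12, 8), 1.0),
    ((60, 50), (-7, -3), 1.0),
    ((50, 30), (1, 19), 1.0),
    ((10, 10), (5, 5), 1.0),
]
hypotheses = hough_vote(matches)
```

Only the cell that collects the three consistent votes survives the threshold, and the stray match is discarded. Lowering the threshold admits more hypotheses, including weakly supported ones, which is consistent with the high recall and low precision noted above and with the need for a later verification step.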
     Background clutter has always degraded the performance of object recognition. It often corrupts image features by adding useless information to feature dimensions. Most previous methods learn which dimensions are more important from many training exemplars. Chapter 4 proposes a method for detecting objects against heavily cluttered backgrounds using only a single exemplar. It takes contours from bottom-up contour grouping as its basic perceptual elements and uses a much larger context for shape feature extraction. Normally, enlarging the context area worsens the problem of background clutter, especially for objects with elongated structures. We tackle this problem by introducing a selection process on image contours, and on model contours as well, during which contour integrity is maintained. Selected image contours are compared against selected model contours. The experimental results show that the selected shape features remove background clutter effectively and achieve good detection results.
     Object recognition is not only about determining the existence or positions of objects in an image, but also about higher-level analysis such as pose estimation. Chapter 5 introduces a pose estimation algorithm that uses the same selection strategy. Selection is applied not merely to image contours but to model pose parameters as well. Valid poses are proposed by matching features generated from selected image contours against different model poses. The algorithm's performance is demonstrated on a baseball image database.
     Chapter 6 turns to an important sub-topic of object recognition: contour clustering. Unlike top-down recognition using contour selection, which can also be seen as contour clustering driven by pre-defined model knowledge, contour clustering here remains a bottom-up process, assisted by another related image (stereo, motion, or similarity).
     Street scenes are an important stage for vision algorithms. Besides pedestrians, there are many other interesting objects on the streets. We are particularly interested in traffic lights, road signs, lamps, fire hydrants, trees, and cars, and we want to detect them in street scenes. Because of the variation between object classes (some are rigid objects, some are composed of textured regions, and some are semi-rigid or deformable), hybrid detectors are used to detect them efficiently, as detailed in Chapter 7.
     Shape features are mostly extracted from an edge map. To acquire a clear edge map and thus improve the shape feature descriptor, Chapter 8 proposes a structure-preserving image diffusion technique that uses adaptive, grouping-based bandwidth selection. The goal is to preserve image structure while removing noise, especially random and texture noise. The contributions include a novel diffusion kernel with better structure-preserving performance and an adaptive selection of kernel parameters. Experiments show that it improves shape feature descriptors by enhancing edge detection results.
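
To make the idea of structure-preserving diffusion concrete, the sketch below applies the classical Perona-Malik scheme to a one-dimensional signal. It is a stand-in chosen for illustration, not the kernel proposed in Chapter 8: the conductance function, the fixed contrast parameter `kappa` (which the thesis instead adapts using segmentation information), the step size, and the iteration count are all assumptions.

```python
import math

def perona_malik_1d(signal, n_iter=20, kappa=10.0, step=0.2):
    """Edge-preserving smoothing of a 1-D signal by nonlinear diffusion.

    Classical Perona-Malik update with exponential conductance; kappa is a
    fixed, assumed contrast threshold. Differences much smaller than kappa
    are smoothed, differences much larger than kappa are left almost intact.
    """
    u = [float(v) for v in signal]
    for _ in range(n_iter):
        new = u[:]
        for i in range(len(u)):
            # Differences to the two neighbours (zero flux at the borders).
            left = u[i - 1] - u[i] if i > 0 else 0.0
            right = u[i + 1] - u[i] if i < len(u) - 1 else 0.0
            # Conductance drops towards zero across strong edges.
            c_left = math.exp(-((left / kappa) ** 2))
            c_right = math.exp(-((right / kappa) ** 2))
            new[i] = u[i] + step * (c_left * left + c_right * right)
        u = new
    return u

# A noisy step edge: small oscillations on two plateaus 100 units apart.
noisy = [0, 1, 0, 1, 0, 100, 101, 100, 101, 100]
smoothed = perona_malik_1d(noisy)
```

The small oscillations on each plateau are flattened while the jump between the plateaus is kept, which is the behavior an edge detector benefits from. With a single fixed `kappa`, texture whose contrast is comparable to real edges would be preserved as well, which suggests why an adaptive choice of kernel parameters is attractive.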
