Research on Scene Understanding Methods Based on Probabilistic Graphical Models

Original title (Chinese): 基于概率图模型的场景理解方法研究
Author: Mao Ling (毛凌)
Thesis level: Doctoral
Discipline: Signal and Information Processing
Keywords: Scene Understanding; Semantic Segmentation; Object Detection; Joint Object Detection and Semantic Segmentation; Conditional Random Field Model
Degree year: 2013
Advisor: Xie Mei (解梅)
Discipline code: 081002
Degree-granting institution: University of Electronic Science and Technology of China (电子科技大学)
Submission date: 2013-03-01

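Abstract

Scene understanding is both a fundamental problem and the ultimate goal of computer vision research. Its results are widely applied in robot navigation, security, medical care, web search, and many other areas of everyday life, which gives it considerable academic and practical value. Guided by the idea of “divide and conquer”, the individual subtasks of scene understanding, such as object detection, image segmentation, and scene classification, have all made breakthrough progress, yet holistic scene understanding remains far from achieved. In recent years, guided by the complementary idea of “merging into one”, researchers have proposed semantic segmentation as a way to integrate these subtasks toward the final goal of scene understanding, and on that basis have proposed joint object detection and semantic segmentation. Semantic segmentation not only achieves a degree of understanding of the visual scene but also serves as the basis for inferring other high-level semantics. Joint object detection and semantic segmentation additionally localizes each object and obtains the number of objects while performing semantic segmentation. Existing results, however, are not yet satisfactory. This dissertation therefore addresses object detection, semantic segmentation, and joint object detection and semantic segmentation, which are both active and difficult research topics, and uses probabilistic graphical models to study the shortcomings of existing work and propose corresponding solutions. The main content and contributions are as follows:
     1. The dissertation studies how to construct advanced conditional random field (CRF) models that accurately reflect the constraints present in real visual scenes and thereby improve semantic segmentation performance. Three models are proposed:
     (1) A pairwise CRF model based on an augmented texton map (hereafter Model I). The model consists of a unary term and a pairwise term: the unary term is built from a joint boosting (JointBoost) classifier, and the pairwise term encodes the smoothness constraint between neighbouring pixels. The model has a simple form, which simplifies the learning of its parameters. To describe texture better, the original texton map is augmented with local feature descriptors such as LBP, SIFT, and Color SIFT. To obtain a more discriminative feature representation, texton-layout filters are defined on the augmented texton map, bringing in shape, location, and context information, and are used as the weak classifiers of the JointBoost classifier. Experimental results show that the model achieves good semantic segmentation results.
     (2) A higher-order CRF model based on a global same-topic constraint (hereafter Model II). To overcome the limitations of Model I, a higher-order term reflecting a global same-topic constraint is introduced, yielding a higher-order CRF model. The input image is first segmented several times with normalized cuts; a topic model is then used to discover same-topic segments; a higher-order term is then defined on those segments; finally, this term is combined with Model I by weighted mixing to obtain the higher-order CRF model. The model accounts not only for the constraint that local texture features place on pixel labels but also for the global constraint that same-topic segments share a consistent label, and it achieves good semantic segmentation results in experiments.
     (3) A hierarchical CRF model that fuses two basic processing units, pixels and segments. The model consists of three layers: an observation layer, a pixel layer, and a segment layer. The observation layer is the original image. The pixel layer is Model I with pixels as the basic processing unit; it reflects the constraint of local texture features on pixel labels and the smoothness constraint between pixels. The segment layer is Model I with segments as the basic processing unit; it reflects the constraint of region descriptor features on segment labels, the region consistency constraint, and the smoothness constraint between segments. The model defines an association energy term between each segment and the pixels inside it, fusing the two units and overcoming the drawbacks of using either unit alone. Two ways of obtaining the segment layer are used: multiple segmentations and constrained parametric min-cuts. In addition, a new first- and second-order pooling method is proposed to obtain more stable and reliable descriptors of segmented regions.
     2. An object detection method based on partial least squares (PLS) analysis is proposed. First, a multi-scale sliding-window search is run over the input image, and dense sampling yields a high-dimensional feature description of each window. Second, PLS is used to extract a small number of latent components from the original high-dimensional features; these components span a low-dimensional feature space and give a new representation of the object. Third, a method is proposed for choosing the best number of latent components using a model quality ratio. Finally, mean shift with a Gaussian kernel is used for non-maximum suppression, removing overlapping detection bounding boxes to produce the final detections. Experimental results show that the method outperforms PCA for dimensionality reduction, yielding a more discriminative low-dimensional representation, and that its detection performance exceeds that of the classic algorithm proposed by Dalal.
     3. A new higher-order CRF model is proposed for joint object detection and semantic segmentation. The basic idea is to extend Model II with a higher-order object detection energy term, so that the detector's judgement about the labels of the pixels inside a search window enters the energy function as a constraint. This constraint “competes” with the others, namely local texture features, the smoothness prior between pixels, and label consistency of pixels within a segment, and together they determine the label of each pixel. Two ways of generating the detection energy term are proposed. The first uses the detector's output directly. The second extracts both global shape features and local texture features from the bounding box, obtains a more robust representation through first- and second-order pooling, and then uses a logistic regression classifier to obtain a more accurate detection confidence from which the energy term is derived. Experimental results show that the model performs object detection and semantic segmentation simultaneously and improves semantic segmentation performance, outperforming many current semantic segmentation algorithms.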
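
To make the "unary term plus pairwise smoothness term" structure of Model I concrete, the following is a minimal sketch of how the energy of a candidate labelling could be evaluated. It is not the author's implementation: the function name `pairwise_crf_energy`, the 4-connected neighbourhood, and the Potts form of the pairwise penalty are all assumptions made for illustration, and the unary costs are toy numbers standing in for classifier outputs.

```python
from typing import List


def pairwise_crf_energy(labels: List[List[int]],
                        unary_cost: List[List[List[float]]],
                        smooth_weight: float = 1.0) -> float:
    """Return the unary cost of a labelling plus a Potts penalty on 4-connected neighbours.

    unary_cost[r][c][k] is the (assumed) cost of giving pixel (r, c) label k.
    The Potts form of the pairwise term is an assumption made for illustration.
    """
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            energy += unary_cost[r][c][labels[r][c]]
            # Count each horizontal and vertical edge exactly once.
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                energy += smooth_weight
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                energy += smooth_weight
    return energy


# Toy 2x2 image with two labels; only the bottom-right pixel weakly prefers label 1.
unary_cost = [[[0.0, 1.0], [0.0, 1.0]],
              [[0.0, 1.0], [0.6, 0.4]]]
smooth_labels = [[0, 0], [0, 0]]
noisy_labels = [[0, 0], [0, 1]]

energy_smooth = pairwise_crf_energy(smooth_labels, unary_cost, smooth_weight=0.5)
energy_noisy = pairwise_crf_energy(noisy_labels, unary_cost, smooth_weight=0.5)
```

In this toy case the all-zero labelling has the lower energy, because the two label changes around the isolated pixel cost more than its small unary gain. Higher-order terms such as those of Model II would add further summands defined over groups of pixels rather than pairs.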
Scene understanding, an important basic problem and the ultimate goal of computer vision, has been widely applied in many fields, such as robot navigation, security, medical treatment, and web search. Following the idea of “divide and conquer”, each branch of scene understanding, including object detection, image segmentation, and scene classification, has made breakthroughs. However, overall scene understanding is far from achieved. In recent years, following the idea of merging these subtasks, scholars have put forward the concept of semantic segmentation, and later joint object detection and semantic segmentation, so as to realize the ultimate goal of scene understanding. In some sense, scene understanding can be formulated as semantic segmentation, and other high-level semantic information can be obtained from it. Joint object detection and semantic segmentation localizes each object and provides the number of objects while also achieving semantic segmentation. However, current research results are not satisfactory. This dissertation focuses on research hotspots and difficulties including object detection, semantic segmentation, and joint object detection and semantic segmentation. To overcome the shortcomings of existing methods, it proposes solutions based on probabilistic graphical models. The main contributions are described below:
     1. This dissertation studies how to build advanced conditional random field models that accurately reflect real constraints in the visual scene and thus improve semantic segmentation performance. It puts forward three models:
     (1) A pairwise conditional random field model based on the enhanced texton map (model I). This model is composed of a unary term and a pairwise term. The unary term is constructed by a JointBoost classifier, and the pairwise term reflects the smoothness constraint between adjacent pixels. The model is simple, which simplifies the learning of the model parameters. To describe texture characteristics better, LBP, SIFT, and Color SIFT are used to enhance the original texton map. To obtain more discriminative features, the texton-layout filter is defined on the enhanced texton map and used as the weak classifier of JointBoost, which introduces shape, location, and context information. The experimental results show that the model achieves good semantic segmentation performance.
     (2) A higher-order conditional random field model based on the global same-topic constraint (model II). To overcome the limitations of model I, a higher-order term reflecting the global same-topic constraint is introduced to build a higher-order conditional random field model. Firstly, normalized cuts segmentation is performed several times; secondly, same-topic segments are found using a topic model; then the higher-order term is defined on the same-topic segments; finally, the higher-order term and model I are combined to obtain the higher-order conditional random field model. This model not only considers the local texture feature constraint on pixel categories, but also reflects the category consistency of same-topic segments. Good semantic segmentation results are obtained in the experiments.
     (3) A hierarchical conditional random field model fusing both basic processing units, i.e. pixels and segments. This model is composed of an observation data layer, a pixel layer, and a segmentation layer. The observation data layer is the original image. Model I based on pixels constitutes the pixel layer, which reflects the local texture constraint on pixel categories and the smoothness constraint between neighbouring pixels. Model I based on segments constitutes the segmentation layer, which reflects the constraint of features extracted from segments on segment categories, the region consistency constraint, and the smoothness constraint between neighbouring segments. An association energy term is defined on segments and the pixels within them; it fuses both basic units and overcomes the defects of using only one processing unit. This dissertation adopts two methods to generate the segmentation layer, i.e. multiple segmentations and constrained parametric min-cuts. In addition, it presents a new first- and second-order pooling method to describe segmented regions more stably and reliably.
     2. This dissertation proposes an object detection method based on partial least squares analysis. Firstly, a multi-scale sliding-window search is performed, and a high-dimensional feature description is obtained through dense sampling. Secondly, the partial least squares method is used to extract a few latent components from the original high-dimensional features, which constitute a low-dimensional feature space. A quality ratio is used to determine the best number of latent components. Finally, mean shift with a Gaussian kernel is used to perform non-maximum suppression, which removes overlapping bounding boxes and gives the final detection result. The experimental results show that the method is better than PCA at reducing dimensions, yields a more discriminative low-dimensional feature representation, and obtains better results than Dalal's algorithm.
     3. This dissertation proposes a new higher-order conditional random field model to solve the problem of joint object detection and semantic segmentation. Its basic idea is as follows: on the basis of model II, an object detection higher-order energy term is defined, which introduces the results obtained by the object detector into the energy function as a constraint. This constraint competes with other constraints, e.g. local texture features, the smoothness prior between pixels, and the region consistency constraint, to jointly determine the category of each pixel. Additionally, this dissertation puts forward two methods to generate the detection energy term: one directly uses the results generated by the object detector; the other extracts global shape features and local texture features from the bounding box at the same time, obtains a more robust representation of these features through first- and second-order pooling, and then computes the detection energy term from the output of a logistic regression classifier. The experimental results show that the model can complete both the object detection and semantic segmentation tasks simultaneously. Moreover, it is superior to many current semantic segmentation algorithms.
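
The non-maximum suppression step in contribution 2 relies on mean shift with a Gaussian kernel. The sketch below illustrates the general technique under simplifying assumptions: detections are reduced to 2-D box centres with positive scores, scale is ignored, and nearby converged modes are merged with a hypothetical threshold of half the bandwidth. The function names and all parameter values are illustrative and are not taken from the dissertation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def gaussian_weight(a: Point, b: Point, bandwidth: float) -> float:
    """Gaussian kernel value between two points for the given bandwidth."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def shift_to_mode(start: Point, points: List[Point], scores: List[float],
                  bandwidth: float, max_iter: int = 100, tol: float = 1e-5) -> Point:
    """Iterate the score-weighted mean-shift update until the point stops moving."""
    y = start
    for _ in range(max_iter):
        # Scores are assumed positive, so the total weight never vanishes
        # when the search starts at one of the detections.
        ws = [s * gaussian_weight(y, p, bandwidth) for p, s in zip(points, scores)]
        total = sum(ws)
        new_y = (sum(w * p[0] for w, p in zip(ws, points)) / total,
                 sum(w * p[1] for w, p in zip(ws, points)) / total)
        if math.dist(y, new_y) < tol:
            return new_y
        y = new_y
    return y


def suppress_detections(points: List[Point], scores: List[float],
                        bandwidth: float) -> List[Point]:
    """Keep one mode per cluster of overlapping detections."""
    modes: List[Point] = []
    for p in points:
        m = shift_to_mode(p, points, scores, bandwidth)
        # Merge threshold (bandwidth / 2) is an illustrative assumption.
        if all(math.dist(m, k) > bandwidth / 2 for k in modes):
            modes.append(m)
    return modes


# Two groups of overlapping detections: three near (10, 10), two near (50, 50).
detections = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (50.0, 50.0), (51.0, 50.0)]
scores = [1.0] * len(detections)
modes = suppress_detections(detections, scores, bandwidth=3.0)
```

With these toy inputs the five detections collapse to two modes, one per group. A detector operating over multiple scales would typically run the same procedure in a joint position and scale space, which this sketch leaves out.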

    [129] D. M. Blei, A. Y. Ng, M. I. Jordan. Latent Dirichlet allocation[J]. Journal of Machine LearningResearch,2003,3(4-5):993-1022
    [130] T. L. Griffiths, M. Steyvers. Finding scientific topics[J]. Proceedings of the National Academyof Sciences of the United States of America,2004,101(5228-5235)
    [131] L. Ladicky, C. Russell, P. Kohli, et al. Graph cut based inference with co-occurrencestatistics[C]. Proceedings of the11th European Conference on Computer Vision, Heraklion,Crete, Greece,2010,239-253
    [132] A. Rabinovich, A. Vedaldi, S. Belongie. Does image segmentation improve objectcategorization?[R].2007
    [133] A. Rabinovich, A. Vedaldi, C. Galleguillos, et al. Objects in context[C]. Proceedings of the11th IEEE International Conference on Computer Vision, Piscataway, NJ, USA,2007,1153-1160
    [134] I. Endres, D. Hoiem. Category independent object proposals[C]. Proceedings of the11thEuropean Conference on Computer Vision, Heraklion, Crete, Greece,2010,575-588
    [135] J. Carreira, C. Sminchisescu. Constrained parametric min-cuts for automatic objectsegmentation[C]. Proceedings of the2010IEEE Computer Society Conference on ComputerVision and Pattern Recognition, San Francisco, CA, United states,2010,3241-3248
    [136] C. Russell, P. Kohli, L. u. Ladicky, et al. Exact and approximate inference in associativehierarchical networks using graph cuts[C]. Proceedings of the26th Conference on Uncertaintyin Artificial Intelligence, Catalina Island, CA, United states,2010,501-508
    [137] R. Unnikrishnan, C. Pantofaru, M. Hebert. Toward objective evaluation of imagesegmentation algorithms[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2007,29(6):929-944
    [138] T. Malisiewicz, A. A. Efros. Improving spatial support for objects via multiplesegmentations[C]. Proceedings of the2007British Machine Vision Conference, University ofWarwick, UK,2007,1-8
    [139] C. Fowlkes, D. Martin, J. Malik. Learning affinity functions for image segmentation:Combining patch-based and gradient-based approaches[C]. Proceedings of the2003IEEEComputer Society Conference on Computer Vision and Pattern Recognition, Madison, WI,United states,2003, II/54-II/61
    [140] D. R. Martin, C. C. Fowlkes, J. Malik. Learning to detect natural image boundaries using localbrightness, color, and texture cues[J]. IEEE Transactions on Pattern Analysis and MachineIntelligence,2004,26(5):530-549
    [141] P. F. Felzenszwalb, D. P. Huttenlocher. Efficient graph-based image segmentation[J].International Journal of Computer Vision,2004,59(2):167-181
    [142] M. Maire, P. Arbelaez, C. Fowlkes, et al. Using contours to detect and localize junctions innatural images[C]. Proceedings of the26th IEEE Conference on Computer Vision and PatternRecognition, Anchorage, AK, United states,2008,1-8
    [143] V. Kolmogorov, Y. Boykov, C. Rother. Applications of parametric maxflow in computervision[C]. Proceedings of the11th International Conference on Computer Vision, Rio deJaneiro, Brazil,2007,560-567
    [144] S. Wang, J. M. Siskind. Image segmentation with ratio cut[J]. IEEE Transactions on PatternAnalysis and Machine Intelligence,2003,25(6):675-690
    [145] L. Breiman. Random forests[J]. Machine learning,2001,45(1):5-32
    [146] A. Jaiantilal. Classification and regression by randomforest-matlab[/OL]. http://code.google.com/p/randomforest-matlab,2009
    [147] J. Carbonell, J. Goldstein. The use of MMR, diversity-based reranking for reorderingdocuments and producing summaries[C]. Proceedings of the21st International ACM SIGIRConference on Research and Development in Information Retrieval, New York, NY, USA,1998,335-336
    [148] Y. L. Boureau, J. Ponce, Y. Lecun. A theoretical analysis of feature pooling in visualrecognition[C]. Proceedings of the27th International Conference on Machine Learning, Haifa,Israel,2010,111-118
    [149] Y. L. Boureau, N. Le Roux, F. Bach, et al. Ask the locals: Multi-way local pooling for imagerecognition[C]. Proceedings of the2011IEEE International Conference on Computer Vision,Barcelona, Spain,2011,2651-2658
    [150] K. Chatfield, V. Lempitsky, A. Vedaldi, et al. Pipeline details[/OL]. http://www.robots.ox. ac.uk/~vgg/research/encoding_eval/,2011
    [151] C. Schmid, R. Mohr. Local grayvalue invariants for image retrieval[J]. IEEE Transactions onPattern Analysis and Machine Intelligence,1997,19(5):530-535
    [152] S. Lazebnik, C. Schmid, J. Ponce. Beyond bags of features: Spatial pyramid matching forrecognizing natural scene categories[C]. Proceedings of the2006IEEE Computer SocietyConference on Computer Vision and Pattern Recognition, New York, NY, United states,2006,2169-2178
    [153] F. Jurie, B. Triggs. Creating efficient codebooks for visual recognition[C]. Proceedings of the10th IEEE International Conference on Computer Vision, Beijing, China,2005,604-610
    [154] O. Boiman, E. Shechtman, M. Irani. In defense of nearest-neighbor based imageclassification[C]. Proceedings of the26th IEEE Conference on Computer Vision and PatternRecognition, Anchorage, AK, United states,2008,1-8
    [155] T. Hastie, R. Tibshirani, J. Friedman. The elements of statistical learning: data mining,inference, and prediction[M].2009
    [156] R. Bhatia. Positive definite matrices[M]. Princeton University Press,2007
    [157] V. Arsigny, P. Fillard, X. Pennec, et al. Geometric means in a novel vector space structure onsymmetric positive-definite matrices[J]. SIAM Journal on Matrix Analysis and Applications,2007,29(1):328-347
    [158] P. I. Davies, N. J. Higham. A Schur-Parlett algorithm for computing matrix functions[J].SIAM Journal on Matrix Analysis and Applications,2003,25(2):464-485
    [159] T. Ojala, M. Pietikainen, T. Maenpaa. Multiresolution gray-scale and rotation invariant textureclassification with local binary patterns[J]. IEEE Transactions on Pattern Analysis andMachine Intelligence,2002,24(7):971-987
    [160] M. Everingham, L. VanGool, C. K. I. Williams, et al. PASCAL visual object classes challenge2007[/OL]. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/in dex.html,2007
    [161] O. Barinova, V. Lempitsky, P. Kohli. On detection of multiple object instances using houghtransforms[C]. Proceedings of the2010IEEE Computer Society Conference on ComputerVision and Pattern Recognition, San Francisco, CA, United states,2010,2233-2240
    [162] H. Wold. Soft modelling by latent variables: the non-linear iterative partial least squares(NIPALS) approach[J]. Perspectives in Probability and Statistics, In Honor of MS Bartlett,1975,117-144
    [163] S. Wold, A. Ruhe, H. Wold, et al. The collinearity problem in linear regression. The partialleast squares (PLS) approach to generalized inverses[J]. SIAM Journal on Scientific andStatistical Computing,1984,5(3):735-743
    [164] R. Rosipal, N. Kramer. Overview and recent advances in partial least squares[C]. Proceedingsof the Subspace, Latent Structure and Feature Selection-Statistical and OptimizationPerspectives Workshop, SLSFS2005, Feb23-252005, Bohinj, Slovenia,2005,34-51
    [165] A. L. Boulesteix, K. Strimmer. Partial least squares: a versatile tool for the analysis ofhigh-dimensional genomic data[J]. Briefings in bioinformatics,2007,8(1):32-44
    [166] J. J. Dai, L. Lieu, D. Rocke. Dimension reduction for classification with gene expressionmicroarray data[J]. Statistical Applications in Genetics and Molecular Biology,2006,5(1):Article6
    [167] D. V. Nguyen, D. M. Rocke. Multi-class cancer classification via partial least squares withgene expression profiles[J]. Bioinformatics,2002,18(9):1216-1226
    [168] D. V. Nguyen, D. M. Rocke. Tumor classification by partial least squares using microarraygene expression data[J]. Bioinformatics,2002,18(1):39-50
    [169] H. Abdi. Partial least squares regression and projection on latent structure regression (PLSRegression)[J]. Wiley Interdisciplinary Reviews: Computational Statistics,2010,2(1):97-106
    [170] D. Comaniciu, V. Ramesh, P. Meer. The variable bandwidth mean shift and data-driven scaleselection[C]. Proceedings of the8th International Conference on Computer Vision, Vancouver,BC, United states,2001,438-445
    [171] D. Comaniciu. An algorithm for data-driven bandwidth selection[J]. IEEE Transactions onPattern Analysis and Machine Intelligence,2003,25(2):281-288
    [172] M. Enzweiler, D. M. Gavrila. Monocular pedestrian detection: Survey and experiments[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2009,31(12):2179-2195
    [173] N. DALAL. Finding People in Images and Videos[D].2006
    [174] R. B. Girshick, P. F. Felzenszwalb, D. McAllester. Object detection with grammar models[C].Proceedings of the25th Annual Conference on Neural Information Processing Systems,Granada, Spain,2011,
    [175] J. Carreira, F. Li, C. Sminchisescu. Object recognition by sequential figure-ground ranking[J].International Journal of Computer Vision,2012,98(3):243-262
    [176] D. Lowe. Distinctive image features from scale-invariant keypoints[J]. International Journal ofComputer Vision,2004,60(2):91-110
    [177] K. Van De Sande, T. Gevers, C. Snoek. Evaluating color descriptors for object and scenerecognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,32(9):1582-1596
    [178] A. Vedaldi, B. Fulkerson. VLFeat: An open and portable library of computer visionalgorithms[/OL]. http://www.vlfeat.org/,2012
    [179] F. Schroff. Semantic image segmentation and web-supervised visual learning[D]. Oxford,2009
