Quasi-Dense Image Matching and Co-Segmentation
Abstract
Today, with the rapid development of software and hardware, images have become the dominant medium for recording information. Compared with numbers and text, image data carries far richer information; for humans this information is more objective, of higher semantic level, and closer to reality. In other words, images reflect the scenes, objects, and relationships among objects in the real world. To a machine, however, an image is merely data organized in a particular format. How to make machines understand the content of images is therefore one of the key problems in making machines understand the real world, and this is why image understanding has become one of the most fundamental and important topics in computer vision and pattern recognition.
     To understand an image, a computer usually starts from low-level information, extracts representative information from the image, and organizes it; higher-level information is then explored by finding correspondences between images. This closely mirrors the nature of human cognition. Image features fall into two categories: global image features and local image features. Global features capture the image as a whole; they are simple and efficient, but very sensitive to image transformations, noise, and occlusion. Local features focus on image details; compared with global features they are robust to transformations, noise, and occlusion, at the cost of longer processing time. With rapid hardware advances and the demands of real-world tasks, local image features have become the main object of study. Starting from visual information, this dissertation studies local feature description, scene-level quasi-dense matching, object-level (quasi-dense) matching, and object co-segmentation.
     1) A method that endows feature descriptors with mirror-reflection invariance. Although many feature description methods have been designed that effectively handle scaling, rotation, and viewpoint changes, mirror reflection has rarely been studied. This dissertation proposes a mirror-reflection-invariant description framework that makes traditional descriptors robust to mirror reflection while preserving their original properties and advantages, including translation, rotation, and scale invariance. Rather than designing an entirely new descriptor, we propose a framework that can endow most existing descriptors with reflection invariance, which broadens the applicable range of feature description.
     2) A feature matching method with a triangle geometric constraint. Given feature descriptions, matching descriptors is another key step in image understanding. Matching performance is measured in two ways: 1) the number of correct matches and 2) the ratio of correct matches. The number of correct matches is critical for tasks such as 3D reconstruction, while the correct-match ratio guarantees the correctness of the result. Depending on the application, traditional methods usually sacrifice one to improve the other; this trade-off limits performance improvements in image understanding and its applications. This dissertation proposes a geometrically constrained matching method, Triangle-Constraint feature matching, that improves both metrics simultaneously and yields quasi-dense, high-precision matching results, breaking this long-standing bottleneck.
     3) An object-level feature matching algorithm and an object co-segmentation method. Compared with pixels and local features, objects are a more effective unit of human cognition. This dissertation therefore builds on descriptor matching to mine the relationships between objects in images. Object-level matching exploits the scale, rotation, spatial relations, and descriptor similarity of matched feature pairs to find correspondences between objects effectively, without any prior information. Because feature matches exist as point pairs, they can hardly cover an entire object. To overcome this, the dissertation finally designs a co-segmentation method to extract more information about the corresponding objects.
     The dissertation proceeds along the main line of feature description, feature matching, object-level matching, and co-segmentation. To verify the effectiveness and robustness of the proposed methods, their performance is analyzed qualitatively and quantitatively on extensive simulated and real data. The experimental results show that the proposed methods significantly outperform the compared methods.
Recently, as software and hardware have developed rapidly, images have become the dominant carrier of information. Compared with numbers and text, the content of images is much richer: it is more objective, of higher semantic level, and closer to reality. In other words, images reflect scenes, objects, and the relationships between objects in the real world. To a computer, however, an image is merely data organized in a specific format. How to understand image content is therefore one of the key problems in making machines intelligent, which is why image understanding has become one of the most fundamental and important topics in the fields of computer vision and pattern recognition.
     To understand what images represent, representative information is usually extracted from pixels and organized by feature description. Higher-level information is then explored by finding correspondences between images. This procedure closely mirrors the way human beings perceive the real world. Image features can be grouped into two categories, i.e. global image features and local image features. The former focus on the whole image; their advantage is efficiency, but they are very sensitive to image transformations, noise, and occlusion. In contrast, local image features capture the local characteristics of an image; they are relatively robust to image transformations, noise, and occlusion, at the cost of longer computation time. Driven by advances in hardware and the demands of real-world tasks, local image features have attracted increasing attention from researchers. This dissertation starts from visual information and focuses on image feature description, scene-level quasi-dense matching, object-level (quasi-dense) matching, and object co-segmentation.
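Finding correspondences between local descriptors is usually done by nearest-neighbor search with an ambiguity check. As a minimal illustration of this standard pipeline (a Lowe-style ratio test, not the dissertation's own matching method):

```python
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.8):
    """Accept a match only when the nearest descriptor in the other
    image is clearly closer than the second-nearest one.
    desc1, desc2: (n, d) arrays of feature descriptors."""
    matches = []
    for i, d in enumerate(desc1):
        dist = np.linalg.norm(desc2 - d, axis=1)   # distances to all candidates
        j, k = np.argsort(dist)[:2]                # nearest and second-nearest
        if dist[j] < ratio * dist[k]:              # unambiguous nearest neighbor
            matches.append((i, j))
    return matches
```

Ambiguous descriptors, whose two nearest candidates are almost equally close, are discarded, which trades the number of matches for accuracy, the trade-off discussed in part 2 below.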
     1) Mirror Reflection Invariant Description Method. Although researchers have developed many image feature descriptors that effectively handle scale, rotation, and viewpoint changes, mirror reflection remains difficult and has received limited attention. In this work, we propose a framework that makes descriptors mirror-reflection invariant, enriching most existing descriptors with reflection invariance while preserving their original advantages. Descriptors with more invariances broaden the applicable range of image feature description.
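The general idea of reflection invariance can be sketched as follows (a toy illustration under our own assumptions, not the exact framework proposed here): describe both the patch and its horizontal flip, then keep a canonical one of the two candidates.

```python
import numpy as np

def column_histogram(patch):
    """Toy descriptor: per-column intensity sums. It is order-sensitive,
    so on its own it is NOT mirror invariant."""
    return patch.sum(axis=0)

def mirror_invariant(describe, patch):
    """Describe the patch and its mirror image, then keep the
    lexicographically smaller vector. A patch and its mirror yield
    the same candidate set, hence the same output."""
    d1 = describe(patch)
    d2 = describe(np.fliplr(patch))
    return d1 if tuple(d1) <= tuple(d2) else d2
```

Because the candidate set {d1, d2} is identical for a patch and its mirror, the canonical choice is invariant while the underlying descriptor is left unchanged.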
     2) Geometric Constraint Based Image Feature Matching Method. In addition, matching image feature descriptors is another key issue of image understanding. Matching performance is measured by two metrics: the number of correct matches and the matching accuracy. Depending on application requirements, traditional matching methods usually improve one metric by sacrificing the other, which limits the performance of both image understanding itself and its applications. This dissertation proposes a matching method that enforces a geometric constraint, i.e. the Triangle Constraint, to improve the number of correct matches and the matching accuracy simultaneously, thus obtaining precise, quasi-dense matching results.
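The triangle idea can be illustrated with a toy orientation-consistency check (a simplified sketch of how triangle geometry constrains matches, not the dissertation's Triangle-Constraint algorithm): a correct match should form triangles with other correct matches whose orientation is preserved across the two images.

```python
import itertools

def tri_sign(a, b, c):
    """Orientation of triangle (a, b, c): sign of twice its signed area."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def triangle_filter(matches, thresh=0.7):
    """matches: list of ((x1, y1), (x2, y2)) putative correspondences.
    Keep a match when most triangles it forms with two other matches
    have the same orientation in both images."""
    kept = []
    for i, (p1, p2) in enumerate(matches):
        rest = matches[:i] + matches[i + 1:]
        pairs = list(itertools.combinations(rest, 2))
        votes = sum(tri_sign(p1, a1, b1) == tri_sign(p2, a2, b2)
                    for (a1, a2), (b1, b2) in pairs)
        if pairs and votes >= thresh * len(pairs):
            kept.append((p1, p2))
    return kept
```

Note that a mirror-reflected view flips the orientation of every triangle, so this naive check also rejects mirrored correspondences, which connects back to the reflection-invariance problem of part 1.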
     3) Object-level Matching and Co-segmentation. Based on the matching results, we further explore the object-level relationships within images. The exploration uses the scale, rotation, relative position, and descriptor similarity of matched feature pairs, without any prior knowledge, to distinguish different objects. Because image features are point-based, the matched pairs alone are unlikely to recover whole objects. To recover and extract as much object information as possible, we finally design a co-segmentation scheme.
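One common way to separate objects from matches without priors is to group matches by the similarity transform each pair implies; matches voting for one transform likely lie on one object. The sketch below is hypothetical (the greedy grouping, tuple layout, and tolerances are illustrative assumptions, not the dissertation's algorithm):

```python
import math

def group_matches(matches, scale_tol=0.25, angle_tol=0.3):
    """matches: list of (scale1, angle1, scale2, angle2) tuples giving a
    feature's scale and orientation in each image. Greedily group matches
    whose implied scale ratio and rotation agree; each group is seeded by
    the transform of its first member."""
    groups = []
    for s1, a1, s2, a2 in matches:
        ratio = s2 / s1                       # implied scale change
        rot = (a2 - a1) % (2 * math.pi)       # implied rotation
        for g in groups:
            d_rot = abs(rot - g['rot'])
            d_rot = min(d_rot, 2 * math.pi - d_rot)   # wrap-around distance
            if abs(math.log(ratio / g['ratio'])) < scale_tol and d_rot < angle_tol:
                g['members'].append((s1, a1, s2, a2))
                break
        else:
            groups.append({'ratio': ratio, 'rot': rot,
                           'members': [(s1, a1, s2, a2)]})
    return groups
```

Each resulting group gives a seed region for one object, from which a co-segmentation step can then recover the full object mask.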
     Extensive experiments on both simulated and real data demonstrate the effectiveness and robustness of the proposed methods, quantitatively and qualitatively. The results show that the proposed methods outperform the state of the art.
