Research on Image Classification Based on Local Invariant Features
Abstract
With the development of digital technology, effectively organizing, retrieving, and classifying large volumes of image data has become a prominent research topic, and image classification is one of its key techniques. This dissertation studies image classification methods based on local invariant features, covering two main aspects: image classification based on a multi-codebook bag-of-words model, and an improved image classification algorithm based on probabilistic latent semantic analysis. The main contributions are as follows:
     1. A class-specific codebook generation method based on a local-feature saliency criterion is proposed, together with a multiclass classifier built on multiple codebooks. The bag-of-words (BoW) model makes effective use of local invariant features and has been widely applied to image classification in recent years. Codebook generation is a key step in the BoW model; the class-specific codebook generation algorithm proposed here alleviates, to some extent, the loss of discriminative power caused by clustering directly with K-means and similar methods. The multiclass classifier based on multiple codebooks exploits the discriminative power of each codebook while reducing the dimensionality of the BoW vector and making it easy to add new classes.
     2. A probabilistic latent semantic analysis (pLSA) incorporating spatial information is proposed and applied to scene classification. In the BoW model, visual words are obtained by clustering features, which inevitably produces some synonyms and polysemous words. pLSA can effectively resolve the polysemy and synonymy of BoW. However, typical pLSA-based image classification methods do not exploit the spatial information of an image, which is very important for classification tasks. This work incorporates spatial information into pLSA and improves classification accuracy. The spatial information has two parts: the neighborhood words of each visual word, used as context, and the spatial position of each latent topic.
     Finally, simulation experiments were designed on the basis of the above theory; the results matched expectations and validated the theoretical conclusions and the feasibility of practical application.
As digital technology has developed rapidly in recent years, effectively organizing, searching, and classifying vast quantities of images has become a valuable research subject, with image classification as one of its most important parts. In this dissertation, we discuss image classification techniques based on local invariant features, specifically techniques built on the bag-of-words (BoW) model and probabilistic latent semantic analysis (pLSA). The main work includes the following:
     First, we propose a new method for building class-specific codebooks based on feature significance, together with a multiclass classifier that uses these codebooks. The BoW model, now widely used in image classification, was proposed to make efficient use of local invariant features. The codebook is an important part of BoW, but the k-means-style clustering typically used to build it may lower the codebook's discriminative ability. The technique introduced here alleviates this loss. The multiclass classifier based on class-specific codebooks takes advantage of each codebook's discriminative ability while lowering the dimension of the BoW vector.
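The BoW pipeline described above — cluster local descriptors into a codebook, then quantize each image's descriptors into a word histogram — can be sketched as follows. This is a toy illustration with synthetic data standing in for real local descriptors (e.g. SIFT) and a plain k-means codebook, not the thesis's class-specific, saliency-based method; all names and dimensions here are illustrative assumptions.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Toy k-means codebook: cluster local descriptors into k visual words.
    (Illustrative only; a real system would cluster SIFT descriptors,
    and the thesis builds one such codebook per class.)"""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        dist = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = dist.argmin(axis=1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bow_histogram(descriptors, codebook):
    """Quantize an image's descriptors against the codebook and return
    a normalized bag-of-words histogram (the image's BoW vector)."""
    dist = np.linalg.norm(descriptors[:, None] - codebook[None], axis=2)
    words = dist.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# synthetic "descriptors" standing in for local invariant features
rng = np.random.default_rng(1)
train_descs = rng.normal(size=(200, 8))   # 200 descriptors, 8-D
codebook = build_codebook(train_descs, k=10)
h = bow_histogram(rng.normal(size=(50, 8)), codebook)
print(h.shape, round(h.sum(), 6))  # (10,) 1.0
```

In the multi-codebook setting, an image would be quantized against each class's codebook separately, keeping per-codebook vectors small rather than concatenating everything into one high-dimensional histogram.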
     Second, we propose a pLSA that incorporates spatial information and apply it to scene classification. In BoW, visual words come from feature clustering, which may generate "polysemy" and "synonymy". pLSA can resolve these problems and has been used successfully in scene classification as an intermediate representation of images; however, it does not utilize the spatial information of an image, which is important for scene classification tasks. To improve classification accuracy, we propose a method that incorporates spatial information, drawn from both neighboring words and topic positions, into pLSA. An image can then be represented by the position distribution of each latent topic, and a classifier is trained on this topic-position distribution vector for each image.
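For reference, the plain pLSA that the proposed method extends can be sketched as a small EM loop over a document-word count matrix (here, "documents" are images and "words" are visual words). This is the standard Hofmann formulation only, on synthetic counts; the thesis's additions of neighbor-word context and topic-position information are omitted, and the variable names are our own.

```python
import numpy as np

def plsa(counts, n_topics, iters=50, seed=0):
    """Minimal pLSA via EM on a document-word count matrix.
    counts: (n_docs, n_words) integer matrix of word occurrences.
    Returns P(z|d) of shape (n_docs, n_topics) and P(w|z) of shape
    (n_topics, n_words). Sketch only; dense and unoptimized."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # random normalized initialization of the two factor distributions
    p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: responsibilities P(z|d,w), shape (docs, words, topics)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        joint /= joint.sum(axis=2, keepdims=True) + 1e-12
        # M-step: reweight responsibilities by observed counts
        weighted = counts[:, :, None] * joint
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# synthetic image-word counts: 6 "images", 12 visual words
rng = np.random.default_rng(2)
X = rng.integers(1, 5, size=(6, 12))
p_z_d, p_w_z = plsa(X, n_topics=3)
print(p_z_d.shape, p_w_z.shape)  # (6, 3) (3, 12)
```

In the proposed variant, the per-image representation fed to the classifier would additionally encode where each latent topic occurs in the image, rather than using P(z|d) alone.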
