Research on Scene Classification Algorithms Combining Contextual Features with the Spatial Pyramid Model
Abstract
Scene image classification is the process of automatically categorizing images that carry different semantic information, following the principles of human visual perception. It provides important environmental cues for object recognition and other vision tasks, and has become a research hotspot in computer vision. Analogous to words in text, modeling images with visual words yields a mid-level representation that establishes an effective semantic description of scenes. Building on the bag-of-words model for scene images, this thesis conducts the following research on feature extraction, visual vocabulary construction, and bag-of-words description:
     First, traditional visual vocabularies are built from independent local visual features and ignore the adjacency relations among image features. To address this defect, we construct a class of visual features that incorporate multi-directional contextual information, form the visual vocabulary with a category-specific generation strategy, and then apply the spatial pyramid model to perform scene classification. The method organically couples similarity in the feature domain with contextual relations in the spatial domain while distinguishing categories, and achieves good classification results in experiments.
     Second, given the important role of contextual relations in image feature representation, we further study the effectiveness of neighborhood information. Using the flatness of image regions, we design an unsupervised, adaptive contextual feature extraction scheme, generate visual vocabularies per image category, and, combined with a sparse-coding spatial pyramid model, encode the visual features as a joint distribution over visual words to perform scene classification. By selecting the more effective contextual features in an image, the method clearly improves classification accuracy.
     Finally, to further mine different visual attributes of images, we introduce the local self-similarity descriptor on top of the adaptive contextual features and construct a joint description from the two complementary features. A category-specific vocabulary generation strategy and sparse coding form the bag-of-words description, and a discriminative spatial pyramid representation is built with partial least squares to perform scene classification. The method strengthens the adaptability and discriminative power of the bag-of-words description, performs well in experiments, and is especially effective on complex indoor scenes.
Scene image classification is the process by which computer systems automatically classify image sets carrying semantic information, following the visual perception mechanism of humans. Scene classification has become an active research topic in computer vision, providing important environmental cues for object recognition and other vision tasks. Analogous to words in text data, modeling scene images with a visual vocabulary forms a mid-level representation that describes the semantic content of scene images effectively. Based on the bag-of-words model of scene images, we focus on feature extraction, visual word formation, and visual word representation in the following research:
     First of all, traditional visual words are formed from local features independently and consider nothing about the relations among features. To overcome this defect, we propose a kind of local visual feature that includes multi-directional context information and use a category-specific strategy to form the visual vocabulary; the spatial pyramid model is then combined to accomplish scene classification. For each scene category, this method combines feature similarity and contextual relations. Experiments show that it performs better than existing methods.
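The spatial pyramid step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes local features have already been quantized to visual-word indices, and all function and variable names are illustrative.

```python
import numpy as np

def spatial_pyramid_histogram(xy, words, vocab_size, img_w, img_h, levels=2):
    """Concatenate per-cell visual-word histograms over a 2^l x 2^l grid
    for each pyramid level l = 0..levels.
    xy: (N, 2) keypoint coordinates; words: (N,) word indices in [0, vocab_size)."""
    feats = []
    for l in range(levels + 1):
        cells = 2 ** l
        # grid-cell index of each keypoint along x and y
        cx = np.minimum((xy[:, 0] * cells // img_w).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells // img_h).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                mask = (cx == i) & (cy == j)
                feats.append(np.bincount(words[mask], minlength=vocab_size))
    v = np.concatenate(feats).astype(float)
    return v / max(v.sum(), 1.0)  # L1-normalize the concatenated histogram

# toy usage: 4 keypoints on a 100x100 image, vocabulary of 3 words
xy = np.array([[10, 10], [90, 10], [10, 90], [90, 90]])
words = np.array([0, 1, 2, 0])
h = spatial_pyramid_histogram(xy, words, 3, 100, 100, levels=1)
# levels 0 and 1 give 1 + 4 = 5 cells, so the vector has 5 * 3 = 15 entries
```

In the full method the histograms per level would also be weighted before matching; the sketch omits that for brevity.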
     Secondly, since contextual relations play an important role in the feature representation of images, we investigate the effectiveness of context information further. Using the flatness of image regions, a new feature extraction method forms adaptive context features in an unsupervised manner, and the visual words are formed per image category. The distribution vectors of visual words are then computed with a spatial pyramid model based on sparse coding, and scene classification is accomplished with these vectors. By choosing effective context features, the method achieves a clearly higher accuracy in experiments.
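Sparse coding, as used in the pyramid above, replaces hard vector quantization by solving an L1-regularized reconstruction problem per descriptor. A minimal sketch using plain iterative soft-thresholding (ISTA) is shown below; the dictionary, regularization weight, and iteration count are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 by ISTA
    (iterative soft-thresholding). D: (d, k) dictionary of visual words."""
    L = np.linalg.norm(D, 2) ** 2       # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)        # gradient of the reconstruction term
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

# toy dictionary: two orthonormal atoms in R^2; the code of [1, 0]
# shrinks toward the first atom with coefficient 1 - lam = 0.9
D = np.eye(2)
a = sparse_code(np.array([1.0, 0.0]), D, lam=0.1)
```

The resulting sparse codes are then max-pooled over pyramid cells instead of being counted, which is what makes a linear classifier effective on top of them.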
     Finally, in order to capture further visual properties of scene images, we introduce a local self-similarity descriptor on the basis of the adaptive context features. An image representation is then built by combining these two complementary features; the visual word model is obtained with the category-specific strategy and sparse coding, and a discriminative spatial pyramid representation is derived by applying partial least squares to accomplish scene classification. This makes the bag-of-words model more flexible and discriminative. Experiments show that the method achieves higher accuracy and performs especially well on complicated indoor images.
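The local self-similarity descriptor referenced above compares a small patch with its surrounding region and bins the resulting similarity surface on a log-polar grid. The following is a simplified sketch of that idea under assumed parameters (patch radius, region radius, bin counts are illustrative), not the descriptor exactly as used in the thesis.

```python
import numpy as np

def local_self_similarity(img, y, x, patch=2, region=10,
                          radial_bins=3, angular_bins=8):
    """Simplified local self-similarity descriptor (Shechtman-Irani style):
    correlate the patch at (y, x) with every patch in a surrounding region
    via sum-of-squared-differences, map SSD to a similarity in (0, 1], and
    keep the maximum similarity in each log-polar (radius, angle) bin."""
    center = img[y - patch:y + patch + 1, x - patch:x + patch + 1]
    desc = np.zeros((radial_bins, angular_bins))
    for dy in range(-region, region + 1):
        for dx in range(-region, region + 1):
            if dy == 0 and dx == 0:
                continue
            cand = img[y + dy - patch:y + dy + patch + 1,
                       x + dx - patch:x + dx + patch + 1]
            ssd = np.sum((center - cand) ** 2)
            sim = np.exp(-ssd / (center.size * 255.0))   # SSD -> similarity
            # log-polar bin of the displacement (dy, dx)
            r = np.log1p(np.hypot(dy, dx)) / np.log1p(np.hypot(region, region))
            rb = min(int(r * radial_bins), radial_bins - 1)
            ab = int((np.arctan2(dy, dx) + np.pi) / (2 * np.pi)
                     * angular_bins) % angular_bins
            desc[rb, ab] = max(desc[rb, ab], sim)
    return desc.ravel()

# toy usage on a uniform image: every patch matches itself perfectly,
# so every bin reaches the maximum similarity of 1
img = np.ones((40, 40))
d = local_self_similarity(img, 20, 20)
```

Because it is built from patch correlations rather than gradients, this descriptor captures repeated local structure, which is complementary to the contextual features above.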
