Image Semantic Representation Based Scene Classification Research (面向图像语义描述的场景分类研究)

Author: Gu Guanghua (顾广华)
Thesis level: Doctoral
Discipline: Signal and Information Processing
Keywords: Scene Classification; Feature Fusion; Superpixel Lattice; Spatial Pyramid; Contextual Information; Feature Mapping
Degree year: 2013
Advisor: Zhao Yao (赵耀)
Discipline code: 081002
Degree-granting institution: Beijing Jiaotong University
Submission date: 2012-12-01

Abstract

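How to make computers classify and manage massive image data efficiently, in a way consistent with human understanding, has become a pressing problem in image understanding. Scene analysis and understanding make semantic image classification possible, and scene classification is recognized as a key topic within it. The main contributions of this thesis are: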
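     (1) A scene classification method based on feature fusion weighted by local entropy. Since different feature descriptors suit different types of scene images, the thesis fuses two local feature descriptors to increase the discriminative power of the scene image description. First, the complexity of a scene image is quantified by computing its local entropy, from which a flatness measure is defined; the flatness of a scene category is obtained by summing the flatness of every image in that category. Second, two local feature descriptors are extracted, one suited to smooth regions and the other to regions with variation, and an image histogram representation is built from each. Third, the category flatness is used to compute weighting coefficients for the two local features, and the two histogram representations, each formed from an independent local descriptor, are fused by weighting to obtain the best description of the images in that category. Finally, a probabilistic generative model is trained to perform scene classification. Experimental results show that the method is, to some degree, generally applicable across different types of image feature descriptions.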
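     (2) A scene classification method based on a spatial pyramid image representation over superpixel lattices. Because the traditional bag-of-words image representation ignores spatial information, the thesis introduces it through contextual features and a spatial pyramid image representation. First, multi-scale contextual features are constructed so that local spatial structure is included in the feature description. Second, the image is partitioned into blocks with a superpixel lattice whose resolution is determined by the pyramid level. Third, for each sub-block produced by the superpixel lattice at each level, a histogram representation is generated from the visual dictionary, and these are combined with certain weights into the histogram representation of the whole image. Finally, a classifier is trained to perform scene classification. The superpixel lattice partition avoids forcibly cutting through objects in the image, which preserves the semantic consistency of objects within each sub-region. Experimental results confirm the benefit of contextual information and superpixel lattice partitioning in scene classification.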
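     (3) A scene classification method based on feature mapping by locality-constrained linear coding. After visual features are extracted from the images and clustered into a visual codebook, the features are mapped onto the codebook to form the image representation. The thesis proposes a locality-constrained linear coding feature mapping method based on sum-max pooling: the t codewords with the highest probability are linearly weighted and averaged to give the coding result. The effect of t on classification performance is analyzed, as is the relationship between codebook size and classification performance. Experiments show that the method improves the correlation among codewords and the robustness of the feature mapping, and achieves good scene classification performance.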
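
The entropy-weighted fusion of contribution (1) might be sketched roughly as below. The abstract does not give the formulas, so everything specific here is an assumption: flatness is taken as one minus the normalized mean patch entropy, the category weight is the summed flatness divided by the number of images, and the function names (`local_entropy`, `flatness`, `category_weight`, `fuse`) are hypothetical. Images are assumed to be 8-bit grayscale arrays at least one patch in size.

```python
import numpy as np

def local_entropy(gray, patch=8, bins=16):
    """Mean Shannon entropy (bits) of gray-level histograms over non-overlapping patches."""
    h, w = gray.shape
    entropies = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            block = gray[r:r + patch, c:c + patch]
            counts, _ = np.histogram(block, bins=bins, range=(0, 256))
            p = counts[counts > 0] / counts.sum()
            entropies.append(-(p * np.log2(p)).sum())
    return float(np.mean(entropies))

def flatness(gray, patch=8, bins=16):
    """Assumed flatness in [0, 1]: 1 for a constant image, near 0 for a noisy one."""
    return 1.0 - local_entropy(gray, patch, bins) / np.log2(bins)

def category_weight(images, patch=8, bins=16):
    """Sum the per-image flatness over a category, then normalize by the image count (assumption)."""
    total = sum(flatness(img, patch, bins) for img in images)
    return total / len(images)

def fuse(hist_smooth, hist_change, w):
    """Weighted fusion of two histograms; w weights the descriptor suited to smooth regions."""
    fused = w * np.asarray(hist_smooth, float) + (1.0 - w) * np.asarray(hist_change, float)
    return fused / fused.sum()
```

Under these assumptions a flat category pushes the fused histogram toward the smooth-region descriptor, while a textured category favors the descriptor suited to variation.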
How to make computers classify and manage vast amounts of image data in the way humans understand them has become an urgent problem in image understanding. Scene analysis and understanding make semantic image classification possible, and scene classification is recognized as a key issue within it. This thesis builds middle-level semantic image representations from visual image features, establishing and modeling middle-level semantic concepts to bridge the semantic gap between low-level features and high-level semantics. It achieves the following research results:
     (1) This thesis proposes a scene classification algorithm based on feature fusion weighted by local entropy. Because different feature descriptors suit different scene images, the thesis fuses two local feature descriptors to strengthen the discriminative power of scene image descriptions. First, the complexity of a scene image is analyzed quantitatively through its local entropy, and the flatness of the image is defined from it. The flatness of each scene category is then calculated by summing the flatness of every image in that category. Second, two local feature descriptors are extracted, one suited to smooth regions and one to regions with variation, and an image histogram representation is constructed from each. Third, weighting coefficients are derived from the flatness of the scene category, and the optimal image representation is obtained by the weighted fusion of the two histogram representations. Finally, a generative model is trained to perform scene classification. Experimental results show that the method is, to some degree, generally applicable across different image feature descriptions.
     (2) This thesis presents a scene classification method based on a spatial pyramid image representation over superpixel lattices. Because the traditional image representation based on the bag-of-words (BOW) model ignores spatial information, the thesis adds it by applying contextual features and a spatial pyramid image representation. First, multi-scale contextual features are constructed so that local spatial structure is included in the feature descriptions. Second, the superpixel lattice method is applied to segment the image, with the resolution determined by the pyramid level. Third, a histogram representation is formed from the visual dictionary for each sub-block produced by the superpixel lattice at each level. These partial representations are weighted and combined into the histogram representation of the whole image. Finally, a classifier is trained to perform scene image classification. The superpixel lattice segmentation avoids forcibly cutting through objects in the image, which preserves the semantic consistency of objects within each sub-region. Experimental results demonstrate the benefit of contextual information and superpixel lattice segmentation in the scene classification task.
     (3) This thesis proposes a scene classification method based on feature mapping by locality-constrained linear coding. Visual features are extracted from the images and clustered into a visual codebook, and the features are then mapped onto the codebook to form the image representation. The feature mapping method in this thesis is a form of locality-constrained linear coding based on sum-max pooling. The t codewords with the highest probability are selected and weighted, and the average of the weighted values is taken as the coding result. The thesis discusses how scene classification performance depends on the value of t and on the codebook size. Experiments show that the proposed method improves the correlation among codewords and the robustness of the feature mapping, and achieves good scene classification performance.
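
The pyramid histogram construction of contribution (2) can be illustrated with a minimal sketch. Two substitutions are made and should be read as assumptions: regular 2^l x 2^l grid cells stand in for superpixel lattice cells, which would need a separate segmentation step, and the level weights follow the usual spatial pyramid matching convention because the abstract only says "certain weights". The function name `pyramid_histogram` and its parameters are hypothetical.

```python
import numpy as np

def pyramid_histogram(xy, words, width, height, vocab_size, levels=3):
    """Concatenate weighted per-cell visual-word histograms over a grid pyramid.

    xy:    (N, 2) feature positions as (x, y) pixel coordinates
    words: (N,) visual-word index of each feature, in [0, vocab_size)
    """
    xy = np.asarray(xy, dtype=float).reshape(-1, 2)
    words = np.asarray(words, dtype=int)
    top = levels - 1
    parts = []
    for level in range(levels):
        n = 2 ** level  # n x n cells at this level
        # Assumed level weights: coarse levels count less than fine ones.
        weight = 1.0 / 2 ** top if level == 0 else 1.0 / 2 ** (top - level + 1)
        col = np.minimum((xy[:, 0] / width * n).astype(int), n - 1)
        row = np.minimum((xy[:, 1] / height * n).astype(int), n - 1)
        cell = row * n + col
        hist = np.zeros((n * n, vocab_size))
        np.add.at(hist, (cell, words), 1.0)  # count words per cell
        parts.append(weight * hist.ravel())
    # Normalize by the feature count so images with more features are comparable.
    return np.concatenate(parts) / max(len(words), 1)
```

Replacing the `row` and `col` computation with a lookup into a superpixel lattice label map would bring the sketch closer to the method described, since each feature would then be assigned to the lattice cell that contains it.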
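
The top-t coding step of contribution (3) might look like the following sketch. It is a simplified soft assignment, not the analytical locality-constrained linear coding solution, and it is not claimed to be the thesis's exact rule. The Gaussian-kernel probability model, the `sigma` parameter, the averaging used for pooling, and the names `top_t_codes` and `image_representation` are all assumptions made for illustration.

```python
import numpy as np

def top_t_codes(descriptors, codebook, t=5, sigma=1.0):
    """Code each descriptor over its t most probable codewords.

    Returns an (N, K) array; each row has at most t nonzero entries summing to 1.
    """
    X = np.asarray(descriptors, dtype=float)
    B = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every codeword: (N, K).
    d2 = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    # Assumed probability model: Gaussian kernel on distance, normalized per row.
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    # Keep only the t most probable codewords and renormalize their weights.
    idx = np.argsort(-p, axis=1)[:, :t]
    rows = np.arange(p.shape[0])[:, None]
    codes = np.zeros_like(p)
    codes[rows, idx] = p[rows, idx]
    codes /= codes.sum(axis=1, keepdims=True)
    return codes

def image_representation(descriptors, codebook, t=5, sigma=1.0):
    """Pool the per-descriptor codes into one vector by averaging (assumed pooling)."""
    return top_t_codes(descriptors, codebook, t, sigma).mean(axis=0)
```

Sweeping `t` and the number of codebook rows in such a sketch mirrors the two parameters whose effect on classification performance the thesis reports studying.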

