Scene Recognition Algorithm Based on Sparse Autoencoder

Original title: 基于稀疏自动编码机的场景识别算法
Authors: XIE Lin (谢林); LI Feifei (李菲菲); CHEN Qiu (陈虬)
Affiliation: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology
Keywords: scene recognition; sparse autoencoder; spatial pyramid pooling; local normalization; HOG features; SVM
Journal: Electronic Science and Technology (电子科技; abbreviation DZKK)
Publication date: 2019-01-15
Publisher: Electronic Science and Technology (电子科技)
Year: 2019
Issue: v.32; No.352
Funding: Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2012XX, ES2014XX)
Language: Chinese
Pages: DZKK201901009
Page count: 5
CN: 01
ISSN: 61-1291/TN
Classification number: 42-45+55

Abstract

To narrow the semantic gap between low-level features and high-level concepts in scene recognition, this paper proposes a scene recognition method based on a sparse autoencoder. The method uses a feature encoding scheme that combines a sparse autoencoder with spatial pyramid pooling. Local HOG descriptors are first extracted from the scene image and then encoded into sparse features by a modified sparse autoencoder. Spatial pyramid pooling and local normalization of these sparse features yield a representation of the whole image, which a linear SVM then classifies. Experiments on the standard Scene-15 dataset show that the algorithm raises recognition accuracy to 81.97%.
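
The encoding and pooling stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the modified sparse autoencoder, so a single ReLU layer with untrained random weights stands in for the encoder, max pooling is assumed for the pyramid, and "local normalization" is assumed to mean per-cell L2 normalization. The function names, pyramid levels (1x1, 2x2, 4x4), dimensions, and random input data are all hypothetical.

```python
import numpy as np


def encode(descriptors, W, b):
    """Map local descriptors (n, d) to non-negative codes (n, k).

    Assumption: a single ReLU layer stands in for the trained encoder.
    """
    return np.maximum(0.0, descriptors @ W + b)


def spatial_pyramid_pool(codes, xy, image_size, levels=(1, 2, 4), eps=1e-8):
    """Max-pool codes over a spatial pyramid and L2-normalize each cell.

    codes: (n, k) codes, one row per local patch.
    xy: (n, 2) patch centers as (x, y) pixel coordinates.
    image_size: (width, height) of the image.
    Returns a vector of length k * sum(g * g for g in levels).
    """
    width, height = image_size
    k = codes.shape[1]
    pooled = []
    for g in levels:
        # Grid cell index of every patch at this pyramid level.
        col = np.minimum((xy[:, 0] * g / width).astype(int), g - 1)
        row = np.minimum((xy[:, 1] * g / height).astype(int), g - 1)
        cell = row * g + col
        for c in range(g * g):
            members = codes[cell == c]
            v = members.max(axis=0) if len(members) else np.zeros(k)
            # Assumed form of "local normalization": per-cell L2 norm.
            pooled.append(v / (np.linalg.norm(v) + eps))
    return np.concatenate(pooled)


rng = np.random.default_rng(0)
n, d, k = 200, 36, 64                     # patches, descriptor dim, code dim (made up)
descriptors = rng.random((n, d))          # random stand-in for HOG descriptors
xy = rng.random((n, 2)) * [320, 240]      # random patch centers in a 320x240 image
W = rng.normal(size=(d, k)) * 0.1         # untrained encoder weights
b = np.zeros(k)

feature = spatial_pyramid_pool(encode(descriptors, W, b), xy, (320, 240))
print(feature.shape)  # (1344,) = 64 codes * (1 + 4 + 16) cells
```

In a full pipeline the random descriptors would be replaced by HOG descriptors computed on a dense grid of patches, the encoder weights would come from training the autoencoder with a sparsity penalty, and the resulting image vectors would be passed to a linear SVM.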

References

[1] Lowe D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[2] Bay H, Tuytelaars T, Van Gool L. SURF: speeded up robust features[C]. Graz: European Conference on Computer Vision, 2006.
[3] Ojala T, Pietikainen M, Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971-987.
[4] Sivic J, Zisserman A. Video Google: a text retrieval approach to object matching in videos[C]. Nice: IEEE International Conference on Computer Vision, 2003.
[5] Wang J, Yang J, Yu K, et al. Locality-constrained linear coding for image classification[C]. San Francisco: IEEE International Conference on Computer Vision and Pattern Recognition, 2010.
[6] Gao S, Tsang I, Chia L, et al. Local features are not lonely - Laplacian sparse coding for image classification[C]. San Francisco: IEEE International Conference on Computer Vision and Pattern Recognition, 2010.
[7] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]. San Diego: IEEE International Conference on Computer Vision and Pattern Recognition, 2005.
[8] Japkowicz N, Hanson S, Gluck M. Nonlinear autoassociation is not equivalent to PCA[J]. Neural Computation, 2000, 12(3): 531-545.
[9] Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks[C]. Ft. Lauderdale: International Conference on Artificial Intelligence and Statistics, 2011.
[10] Li F, Perona P. A Bayesian hierarchical model for learning natural scene categories[C]. San Diego: IEEE International Conference on Computer Vision and Pattern Recognition, 2005.
[11] Lazebnik S, Schmid C, Ponce J. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories[C]. New York: IEEE International Conference on Computer Vision and Pattern Recognition, 2006.
[12] Fan R, Chang K, Hsieh C, et al. LIBLINEAR: a library for large linear classification[J]. Journal of Machine Learning Research, 2008, 9: 1871-1874.
[13] Parizi S, Oberlin J, Felzenszwalb P. Reconfigurable models for scene recognition[C]. Providence: IEEE International Conference on Computer Vision and Pattern Recognition, 2012.
[14] Yang J, Yu K, Gong Y, et al. Linear spatial pyramid matching using sparse coding for image classification[C]. Miami: IEEE International Conference on Computer Vision and Pattern Recognition, 2009.
[15] Li L, Su H, Xing E, et al. Object bank: a high-level image representation for scene classification & semantic feature sparsification[C]. Vancouver: International Conference on Neural Information Processing Systems, 2010.
[16] Harada T, Ushiku Y, Yamashita Y, et al. Discriminative spatial pyramid[C]. Colorado Springs: IEEE International Conference on Computer Vision and Pattern Recognition, 2011.