Abstract
To narrow the semantic gap between low-level features and high-level concepts in scene recognition, a scene recognition method based on the sparse autoencoder is proposed. The method uses a feature encoding technique that combines a sparse autoencoder with spatial pyramid pooling. First, local HOG descriptors are extracted from each scene image and encoded into sparse features by a modified sparse autoencoder. Spatial pyramid pooling and local normalization are then applied to these sparse features to obtain a representation of the whole image. Finally, a linear SVM performs the classification. Experiments on the standard Scene-15 dataset show that the method raises recognition accuracy to 81.97%.
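The encoding and pooling stages described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: a rectifier (ReLU) encoder stands in for the "modified" sparse autoencoder, max pooling is used over 1x1, 2x2, and 4x4 grids, and "local normalization" is read as L2 normalization of each pyramid cell. The names `encode` and `spatial_pyramid_pool`, the toy weights, and the toy descriptors are all hypothetical. In practice the descriptors would be HOG vectors and the weights would be learned.

```python
import math

def encode(descriptor, weights, biases):
    """Encoder half of an autoencoder (assumed ReLU): code = max(0, W x + b)."""
    return [max(0.0, sum(w * x for w, x in zip(row, descriptor)) + b)
            for row, b in zip(weights, biases)]

def spatial_pyramid_pool(codes, positions, levels=(1, 2, 4)):
    """Max-pool codes over g x g grids, L2-normalize each cell, concatenate.

    positions holds (x, y) patch centers scaled to [0, 1].
    """
    dim = len(codes[0])
    feature = []
    for g in levels:
        cells = [[0.0] * dim for _ in range(g * g)]
        for code, (x, y) in zip(codes, positions):
            col = min(int(x * g), g - 1)  # clamp x == 1.0 into the last column
            row = min(int(y * g), g - 1)
            cell = cells[row * g + col]
            for k, value in enumerate(code):
                if value > cell[k]:
                    cell[k] = value
        for cell in cells:
            norm = math.sqrt(sum(v * v for v in cell))
            # Empty cells stay all-zero instead of dividing by zero.
            feature.extend(v / norm if norm > 0 else 0.0 for v in cell)
    return feature

# Toy stand-ins: 2-D "descriptors" and a 3-unit encoder (illustrative values only).
weights = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
biases = [0.0, 0.0, -0.5]
descriptors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.2, 0.1]]
positions = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.9, 0.9)]

codes = [encode(d, weights, biases) for d in descriptors]
image_feature = spatial_pyramid_pool(codes, positions)
print(len(image_feature))  # 3 code units * (1 + 4 + 16) cells = 63
```

The resulting fixed-length vector is what a linear SVM would take as input. The classifier itself is omitted here to keep the sketch dependency-free.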