Geometric Neural Phrase Pooling: Modeling the Spatial Co-occurrence of Neurons
  • Keywords: Image classification; Convolutional Neural Networks; Spatial co-occurrence of neurons; Geometric Neural Phrase Pooling
  • Publication title: Lecture Notes in Computer Science
  • Publication year: 2016
  • Volume: 9905
  • Issue: 1
  • Pages: 645-661
  • Full-text size: 1,281 KB
  • Author affiliations: Lingxi Xie (17)
    Qi Tian (18)
    John Flynn (19)
    Jingdong Wang (20)
    Alan Yuille (17)

    17. Center for Imaging Science, The Johns Hopkins University, Baltimore, MD, USA
    18. Department of Computer Science, University of Texas at San Antonio, San Antonio, TX, USA
    19. Department of Statistics, University of California, Los Angeles, CA, USA
    20. Microsoft Research, Beijing, China
  • Book series title: Computer Vision – ECCV 2016
  • ISBN:978-3-319-46448-0
  • Publication category: Computer Science
  • Publication subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN:1611-3349
Abstract
Deep Convolutional Neural Networks (CNNs) play an important role in state-of-the-art visual recognition. This paper focuses on modeling the spatial co-occurrence of neuron responses, which has received little attention in previous work. To this end, we treat the neurons in a hidden layer as neural words and construct a set of geometric neural phrases on top of them. The idea of grouping neural words into neural phrases is borrowed from the Bag-of-Visual-Words (BoVW) model. We then propose the Geometric Neural Phrase Pooling (GNPP) algorithm to encode these neural phrases efficiently. GNPP acts as a new type of hidden layer that penalizes isolated neuron responses after convolution, and it can be inserted into a CNN model with little extra computational overhead. Experimental results show that GNPP produces significant and consistent accuracy gains in image classification.
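
The abstract does not spell out the pooling rule, so the following is only a rough sketch of how a layer might favor spatially supported responses over isolated ones. It assumes a single-channel map of non-negative (post-ReLU) responses, a 3x3 neighborhood as the "phrase", and a rule that adds a weighted maximum of the neighbors to the center response. The function name `phrase_pool`, the weight `sigma`, and the neighborhood size are all hypothetical choices for illustration, not the formulation from the paper.

```python
def phrase_pool(response_map, sigma=0.5):
    """Hypothetical sketch: combine each response with its 3x3 neighbours.

    response_map: list of rows of non-negative floats (one channel).
    sigma: assumed weight given to the strongest neighbouring response.
    """
    height = len(response_map)
    width = len(response_map[0])
    pooled = []
    for row in range(height):
        pooled_row = []
        for col in range(width):
            # Collect the in-bounds neighbours, excluding the centre itself.
            neighbours = [
                response_map[r][c]
                for r in range(max(0, row - 1), min(height, row + 2))
                for c in range(max(0, col - 1), min(width, col + 2))
                if (r, c) != (row, col)
            ]
            strongest_side = max(neighbours) if neighbours else 0.0
            # Centre response plus a weighted contribution from its neighbours.
            pooled_row.append(response_map[row][col] + sigma * strongest_side)
        pooled.append(pooled_row)
    return pooled


# An isolated response versus one with an active neighbour.
isolated = [[0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0]]
supported = [[0.0, 0.0, 0.0],
             [0.0, 1.0, 1.0],
             [0.0, 0.0, 0.0]]

print(phrase_pool(isolated)[1][1])
print(phrase_pool(supported)[1][1])
```

Under these assumptions, the isolated response keeps its original value while the supported one is boosted, so after any subsequent normalization the isolated response would carry relatively less weight.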
