Geometric Neural Phrase Pooling: Modeling the Spatial Co-occurrence of Neurons


Keywords: Image classification; Convolutional Neural Networks; Spatial co-occurrence of neurons; Geometric Neural Phrase Pooling
Journal: Lecture Notes in Computer Science
Publication year: 2016
Volume: 9905
Issue: 1
Pages: 645-661
Full-text size: 1,281 KB
Authors and affiliations: Lingxi Xie (17)
Qi Tian (18)
John Flynn (19)
Jingdong Wang (20)
Alan Yuille (17)

17. Center for Imaging Science, The Johns Hopkins University, Baltimore, MD, USA
18. Department of Computer Science, University of Texas at San Antonio, San Antonio, TX, USA
19. Department of Statistics, University of California, Los Angeles, CA, USA
20. Microsoft Research, Beijing, China
Book series title: Computer Vision – ECCV 2016
ISBN: 978-3-319-46448-0
Publication category: Computer Science
Publication subjects: Artificial Intelligence and Robotics
Computer Communication Networks
Software Engineering
Data Encryption
Database Management
Computation by Abstract Devices
Algorithm Analysis and Problem Complexity
Publisher: Springer Berlin / Heidelberg
ISSN: 1611-3349
Volume order: 9905

Abstract

Deep Convolutional Neural Networks (CNNs) play an important role in state-of-the-art visual recognition. This paper focuses on modeling the spatial co-occurrence of neuron responses, which has received little attention in previous work. To this end, we treat the neurons in a hidden layer as neural words and construct a set of geometric neural phrases on top of them. The idea of grouping neural words into neural phrases is borrowed from the Bag-of-Visual-Words (BoVW) model. Next, the Geometric Neural Phrase Pooling (GNPP) algorithm is proposed to encode these neural phrases efficiently. GNPP acts as a new type of hidden layer that penalizes isolated neuron responses after convolution, and it can be inserted into a CNN model with little extra computational overhead. Experimental results show that GNPP yields significant and consistent accuracy gains in image classification.
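
The abstract gives no formula for GNPP, so the following is only an illustrative sketch of one way a layer could favor spatially co-occurring responses over isolated ones. It rests on assumptions that do not come from this record: the function name gnpp_sketch, the 3x3 neighborhood, the smoothing weight sigma, and the rule "central response plus sigma times the maximum neighboring response" are all hypothetical choices, not necessarily the authors' formulation.

```python
import numpy as np


def gnpp_sketch(x, sigma=0.5):
    """Hypothetical phrase-pooling sketch (not the paper's exact method).

    x     : array of shape (H, W, D), non-negative responses after convolution.
    sigma : assumed smoothing weight applied to the neighboring responses.
    Returns an array of the same shape as x.
    """
    h, w, _ = x.shape
    # Zero-pad the two spatial axes so border positions also have 8 neighbors.
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="constant")
    # Gather the 8 shifted copies of the map (the "side words"),
    # skipping the unshifted copy (the "central word").
    side = [
        padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w, :]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    ]
    # Channel-wise maximum over the 8 neighbors at every position.
    side_max = np.max(np.stack(side, axis=0), axis=0)
    # A response with active neighbors is boosted; an isolated response
    # keeps its original value and is therefore penalized in relative terms.
    return x + sigma * side_max
```

Under these assumptions the operation has no learnable parameters and leaves the shape of the feature map unchanged, which is consistent with the abstract's remark that the layer can be inserted into a CNN with little extra computational overhead.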
