Local Alignments for Fine-Grained Categorization


Authors: Efstratios Gavves (1)
Basura Fernando (2)
Cees G. M. Snoek (1)
Arnold W. M. Smeulders (1)
Tinne Tuytelaars (2)
Keywords: Alignment; Image representation; Object classification
Journal: International Journal of Computer Vision
Publication date: January 2015
Volume: 111
Issue: 2
Pages: 191–212
Author affiliations:

1. University of Amsterdam, Amsterdam, Netherlands
2. KU Leuven, ESAT PSI, iMinds, Leuven, Belgium
Journal category: Computer Science
Journal subjects: Computer Imaging, Vision, Pattern Recognition and Graphics
Artificial Intelligence and Robotics
Image Processing and Computer Vision
Pattern Recognition
Publisher: Springer Netherlands
ISSN: 1573-1405

Abstract

The aim of this paper is fine-grained categorization without human interaction. Unlike prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just their overall shape. Classification then proceeds by examining the corresponding regions of the alignments. More specifically, the alignments are used either to transfer part annotations from training images to unseen images (supervised alignment) or to blindly yet consistently segment the object into a number of regions (unsupervised alignment). We further argue that, for distinguishing sub-classes, distribution-based features such as color Fisher vectors are better suited to describing the localized appearance of fine-grained categories than popular matching-oriented, shape-sensitive features such as HOG. Such features capture the subtle local differences between sub-classes while remaining robust to misalignments between distinctive details. We evaluate the local alignments on the CUB-2011 and Stanford Dogs datasets, which contain 200 bird species and 120 dog species, respectively, that are visually very hard to distinguish. In our experiments we study and show the benefit of the color Fisher vector parameterization, the influence of the alignment partitioning, and the significance of object segmentation for fine-grained categorization. Furthermore, we show that by using object detectors as voters to generate object-confidence saliency maps, we arrive at fully unsupervised, yet highly accurate, fine-grained categorization. The proposed local alignments set a new state of the art on both the fine-grained birds and dogs datasets, even without any human intervention. Moreover, the local alignments reveal which appearance details are most decisive for each fine-grained object category.
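The abstract argues for color Fisher vectors as the descriptor of each aligned region. The following is a minimal sketch of a generic (improved) Fisher vector encoding in the style of Perronnin, Sánchez and Mensink (2010): gradients of a diagonal-covariance GMM log-likelihood with respect to its means and variances, followed by power and L2 normalization. It assumes the GMM (weights, means, variances) has already been trained and that local descriptors are plain lists of floats. The function names `fisher_vector` and `region_representation`, and the choice to encode each aligned region separately and concatenate the results, are illustrative assumptions, not the authors' exact pipeline or parameters.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def fisher_vector(descriptors: Sequence[Vector], weights: Sequence[float],
                  means: Sequence[Vector], variances: Sequence[Vector]) -> List[float]:
    """Improved Fisher vector of a set of local descriptors (sketch).

    Assumes a pre-trained diagonal GMM with K components in D dimensions.
    Output layout (an arbitrary choice here): all K*D mean gradients,
    then all K*D variance gradients, i.e. 2*K*D values in total.
    """
    K, D, T = len(weights), len(means[0]), len(descriptors)
    if T == 0:
        return [0.0] * (2 * K * D)  # empty region: no evidence
    g_mu = [[0.0] * D for _ in range(K)]
    g_sigma = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        # Log of weight * Gaussian density for each component.
        log_p = []
        for k in range(K):
            s = math.log(weights[k])
            for d in range(D):
                diff = x[d] - means[k][d]
                s -= 0.5 * (math.log(2 * math.pi * variances[k][d])
                            + diff * diff / variances[k][d])
            log_p.append(s)
        # Soft assignments (posteriors), computed stably via log-sum-exp.
        m = max(log_p)
        exps = [math.exp(lp - m) for lp in log_p]
        z = sum(exps)
        gamma = [e / z for e in exps]
        # Accumulate gradients w.r.t. means and (standard-deviation) variances.
        for k in range(K):
            for d in range(D):
                u = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                g_mu[k][d] += gamma[k] * u
                g_sigma[k][d] += gamma[k] * (u * u - 1.0)
    fv: List[float] = []
    for k in range(K):
        fv.extend(g / (T * math.sqrt(weights[k])) for g in g_mu[k])
    for k in range(K):
        fv.extend(g / (T * math.sqrt(2.0 * weights[k])) for g in g_sigma[k])
    # Power (signed square-root) normalization, then L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv


def region_representation(regions: Sequence[Sequence[Vector]], weights, means,
                          variances) -> List[float]:
    """Hypothetical helper: one Fisher vector per aligned region, concatenated
    in a fixed region order so that corresponding regions line up across images."""
    out: List[float] = []
    for descs in regions:
        out.extend(fisher_vector(descs, weights, means, variances))
    return out
```

Because each region's vector summarizes the distribution of all descriptors that fall inside it, shifting a detail slightly within its region changes the encoding only a little, which is one way to read the abstract's claim about robustness to misalignment.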
