Abstract
With the rapid development of big data and hardware, fine-grained classification has emerged as a task whose purpose is to divide coarse-grained categories into subcategories. To exploit the subtle differences between classes, we propose a fine-grained image classification algorithm based on RPN (Region Proposal Network) and B-CNN (Bilinear CNN). First, online hard example mining (OHEM) is used to select the images that have the greatest impact on the recognition result, which helps prevent overfitting. The selected images are then fed into an RPN improved with soft non-maximum suppression (Soft-NMS), which yields images with object-level annotations while reducing the probability of false negatives. Finally, the images with object-level annotations are fed into an improved B-CNN, which fuses features from different layers and strengthens their spatial connections. Experimental results show that the average recognition accuracy reaches 85.50% on CUB-200-2011 and 90.10% on Stanford Dogs.
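The OHEM step selects training images by how much they hurt the current model. A minimal sketch, assuming the per-image losses from a forward pass are already available as plain floats: keep only the `num_hard` highest-loss images and average the loss over them, so only those images contribute gradients. The function names and the fixed `num_hard` budget are hypothetical; the original OHEM of Shrivastava et al. [10] ranks region proposals rather than whole images and also applies NMS to remove near-duplicate RoIs before ranking, which this sketch omits.

```python
from typing import List, Sequence


def ohem_select(losses: Sequence[float], num_hard: int) -> List[int]:
    """Return the indices (in ascending order) of the num_hard highest-loss examples.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if num_hard <= 0:
        return []
    # Rank examples from hardest (largest loss) to easiest.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:num_hard])


def ohem_loss(losses: Sequence[float], num_hard: int) -> float:
    """Mean loss over the selected hard examples; easy examples contribute nothing."""
    hard = ohem_select(losses, num_hard)
    if not hard:
        return 0.0
    return sum(losses[i] for i in hard) / len(hard)
```

For example, with losses `[0.25, 2.0, 0.125, 1.5, 0.5]` and `num_hard=2`, the selected indices are `[1, 3]` and the OHEM loss is `1.75`; the three easy images are simply left out of the backward pass.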
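Soft-NMS, used in the RPN to reduce false negatives, replaces the hard suppression step of standard NMS with a score decay: instead of discarding every proposal whose IoU with the current best box exceeds a threshold, it lowers that proposal's score as a function of the overlap, so an adjacent true object can still survive. Below is a minimal pure-Python sketch of the Gaussian variant of Soft-NMS, assuming boxes in `(x1, y1, x2, y2)` form; `sigma=0.5` and `score_thresh=0.001` are illustrative defaults, not values taken from this paper.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    sigma: float = 0.5,          # assumed Gaussian width, illustrative only
    score_thresh: float = 0.001,  # assumed cut-off for negligible scores
) -> List[Tuple[int, float]]:
    """Gaussian Soft-NMS; returns (box index, rescored score) in selection order."""
    current = list(scores)  # working copy of the scores, decayed in place
    remaining = list(range(len(boxes)))
    kept: List[Tuple[int, float]] = []
    while remaining:
        # Pick the highest-scoring box that is still a candidate.
        best = max(remaining, key=lambda i: current[i])
        remaining.remove(best)
        kept.append((best, current[best]))
        for i in remaining:
            # Decay instead of discard: larger overlap means a stronger penalty.
            overlap = iou(boxes[best], boxes[i])
            current[i] *= math.exp(-(overlap ** 2) / sigma)
        # Drop candidates whose decayed score has become negligible.
        remaining = [i for i in remaining if current[i] >= score_thresh]
    return kept
```

With two heavily overlapping boxes scored 0.9 and 0.8 and a distant box scored 0.7, hard NMS at an IoU threshold of 0.5 would discard the second box; here it is kept with a reduced score of roughly 0.32 and ranked after the distant box.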
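The B-CNN stage rests on bilinear pooling as described by Lin et al. [4]: at every spatial location the feature vectors of two CNN streams are combined by an outer product, the products are summed over all locations, and the result is passed through a signed square root and L2 normalization. A minimal pure-Python sketch, assuming each feature map has already been flattened to a list of per-location channel vectors over the same locations. Passing maps taken from two different layers (resized to the same spatial grid) is one hypothetical way to read "fusing features of different layers"; the improved B-CNN's actual fusion scheme is not specified here.

```python
import math
from typing import List, Sequence

# Each feature map is a list of spatial locations; each location holds one channel vector.
FeatureMap = Sequence[Sequence[float]]


def bilinear_pool(fa: FeatureMap, fb: FeatureMap) -> List[float]:
    """Bilinear pooling of two feature maps defined on the same spatial locations.

    Returns the flattened (C_a * C_b) descriptor after sum pooling of the
    per-location outer products, signed square root and L2 normalization.
    """
    if len(fa) != len(fb) or not fa:
        raise ValueError("feature maps must be non-empty and share the same locations")
    ca, cb = len(fa[0]), len(fb[0])
    pooled = [0.0] * (ca * cb)
    for xa, xb in zip(fa, fb):
        # Accumulate the outer product xa * xb^T for this location.
        for i in range(ca):
            for j in range(cb):
                pooled[i * cb + j] += xa[i] * xb[j]
    # Signed square root: sign(x) * sqrt(|x|).
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    # L2 normalization (left unchanged if the vector is all zeros).
    norm = math.sqrt(sum(v * v for v in rooted))
    return [v / norm for v in rooted] if norm > 0 else rooted
```

Because the descriptor has `C_a * C_b` entries, it grows quickly with channel count; this is the main cost of bilinear models and the reason the two streams are often kept narrow.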
References
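[1] Peng Yanfei, Tao Jin, Zi Lingling. Remote sensing image retrieval based on convolutional neural network and E2LSH[J]. Computer Applications and Software, 2018, 35(7):250-255. (in Chinese)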
[2] Dasgupta R, Namboodiri A M. Leveraging multiple tasks to regularize fine-grained classification[C]//International Conference on Pattern Recognition. IEEE, 2017:3476-3481.
[3] Sang N, Chen Y, Gao C, et al. Detection of vehicle parts based on Faster R-CNN and relative position information[C]//Pattern Recognition and Computer Vision. 2018:83.
[4] Lin T Y, RoyChowdhury A, Maji S. Bilinear CNN Models for Fine-Grained Visual Recognition[C]//IEEE International Conference on Computer Vision. IEEE, 2016:1449-1457.
[5] Huang S, Xu Z, Tao D, et al. Part-Stacked CNN for Fine-Grained Visual Categorization[C]//Computer Vision and Pattern Recognition. IEEE, 2016:1173-1182.
[6] Shen Z, Jiang Y G, Wang D, et al. Iterative object and part transfer for fine-grained recognition[C]//IEEE International Conference on Multimedia and Expo. IEEE, 2017:1470-1475.
[7] Yao H, Zhang S, Zhang Y, et al. Coarse-to-Fine Description for Fine-Grained Visual Categorization[J]. IEEE Transactions on Image Processing, 2016, 25(10):4858-4872.
[8] Liu X, Xia T, Wang J, et al. Fully Convolutional Attention Networks for Fine-Grained Recognition[EB]. arXiv:1603.06765, 2017.
[9] Murabito F, Spampinato C, Palazzo S, et al. Top-Down Saliency Detection Driven by Visual Classification[J]. Computer Vision & Image Understanding, 2018, 40(7):1130-1141.
[10] Shrivastava A, Gupta A, Girshick R. Training Region-Based Object Detectors with Online Hard Example Mining[C]//IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2016:761-769.
[11] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 1. MIT Press, 2015:91-99.
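[12] Yang Guoliang, Wang Zhiyuan, Zhang Yu, et al. Natural scene text detection based on a vertical region regression network[J]. Computer Engineering & Science, 2018, 40(7):1256-1263. (in Chinese)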
[13] Yeung S, Russakovsky O, Mori G, et al. End-to-end learning of action detection from frame glimpses in videos[C]//IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2016:2678-2687.
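[14] Luo Jianhao, Wu Jianxin. A survey on fine-grained image categorization using deep convolutional features[J]. Acta Automatica Sinica, 2017, 43(8):1306-1318. (in Chinese)
[15] Yang Xing. Research on fine-grained classification algorithms based on the B-CNN model[D]. Beijing: China University of Geosciences (Beijing), 2017. (in Chinese)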
[16] Yang Z, Yang D, Dyer C, et al. Hierarchical attention networks for document classification[C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016:1480-1489.