Research on Fine-Grained Image Classification Algorithm Based on RPN and B-CNN
  • English title: FINE-GRAINED IMAGE CLASSIFICATION ALGORITHM BASED ON RPN AND B-CNN
  • Authors: 赵浩如; 张永; 刘国柱
  • English authors: Zhao Haoru; Zhang Yong; Liu Guozhu; College of Information Science and Technology, Qingdao University of Science and Technology
  • Keywords: Fine-grained classification; Inter-class difference; Bilinear CNN; Non-maximum suppression; Feature fusion
  • English keywords: Fine-grained classification; Inter-class difference; B-CNN; Non-maximum suppression; Feature fusion
  • Journal code: JYRJ
  • Journal (English): Computer Applications and Software
  • Journal (Chinese): 计算机应用与软件
  • Institution: College of Information Science and Technology, Qingdao University of Science and Technology
  • Publication date: 2019-03-12
  • Year: 2019
  • Volume: 36
  • Issue: 3
  • Pages: 216-219+270
  • Page count: 5
  • CN: 31-1260/TP
  • Funding: National Natural Science Foundation of China (61472196, 61672305, 61702295); Natural Science Foundation of Shandong Province (ZR2014FM015)
  • Language: Chinese
  • Record ID: JYRJ201903039
Abstract
With the rapid development of big data and hardware, the task of fine-grained classification has emerged; its goal is to divide coarse-grained categories into subclasses. To exploit the subtle differences between classes, we propose a fine-grained image classification algorithm based on RPN (Region Proposal Network) and B-CNN (Bilinear CNN). First, online hard example mining (OHEM) is used to select the images that most influence the recognition result, which helps prevent over-fitting. The selected images are then fed into an RPN improved with soft non-maximum suppression (soft-NMS), which produces object-level annotations while reducing the false-negative rate. Finally, the images with object-level annotations are fed into an improved B-CNN that fuses features from different layers and strengthens their spatial relationships. Experimental results show that the average recognition accuracy reaches 85.50% on CUB200-2011 and 90.10% on Stanford Dogs.
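The abstract's key change to the RPN is replacing hard NMS with soft-NMS: instead of discarding every proposal that overlaps a higher-scoring box beyond a threshold, soft-NMS decays the overlapping proposal's score, so heavily occluded objects are less likely to be suppressed outright (this is what lowers the false-negative rate). A minimal NumPy sketch of the Gaussian-decay variant follows; the `sigma` and `score_thresh` values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS over boxes in [x1, y1, x2, y2] format.

    Returns the indices of kept boxes in order of selection.
    """
    boxes = boxes.astype(float)
    scores = scores.astype(float).copy()
    idxs = np.arange(len(scores))
    keep = []
    while len(idxs) > 0:
        # Select the remaining box with the highest (possibly decayed) score.
        top = np.argmax(scores[idxs])
        best = idxs[top]
        keep.append(int(best))
        idxs = np.delete(idxs, top)
        if len(idxs) == 0:
            break
        # Vectorized IoU of the selected box against all remaining boxes.
        b, rest = boxes[best], boxes[idxs]
        x1 = np.maximum(b[0], rest[:, 0]); y1 = np.maximum(b[1], rest[:, 1])
        x2 = np.minimum(b[2], rest[:, 2]); y2 = np.minimum(b[3], rest[:, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        area_r = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_b + area_r - inter)
        # Gaussian decay: overlapping boxes lose confidence but survive,
        # unlike hard NMS, which would delete them above a fixed IoU cutoff.
        scores[idxs] *= np.exp(-(iou ** 2) / sigma)
        # Drop only boxes whose score has decayed to near zero.
        idxs = idxs[scores[idxs] > score_thresh]
    return keep
```

With two heavily overlapping boxes, hard NMS at a 0.5 IoU threshold would discard the second box entirely; soft-NMS merely demotes it, so it is still returned (after the non-overlapping box) unless its score decays below `score_thresh`.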