Improved Faster RCNN Training Method Based on Hard Negative Mining

Original title (Chinese): 基于难负样本挖掘的改进Faster RCNN训练方法
Authors: AI Tuo (艾拓); LIANG Ya-ling (梁亚玲); DU Ming-hui (杜明辉)
Keywords: Faster RCNN; Object detection; Hard negative mining; Bootstrap sampling
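Journal code: JSJA
Journal: Computer Science (计算机科学)
Affiliation: School of Electronics and Information, South China University of Technology
Publication date: 2018-05-15
Publisher: Computer Science (计算机科学)
Year: 2018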
Volume: 45
Funding: Supported by the Guangzhou Science and Technology Program (201707010070)
Language: Chinese
File name: JSJA201805045
Page count: 5
Issue: 05
CN: 50-1075/TP
Pages: 257-261

Abstract

Faster RCNN (Faster Region-based Convolutional Neural Network), an object detection method, suffers from data imbalance during training: negative examples far outnumber positive ones. To address this, a discriminant function that combines localization error and classification error is proposed for identifying hard positive examples. Building on this function and on hard negative mining, an improved bootstrap sampling method is proposed, along with a "five-step training method" that incorporates this sampling into Faster RCNN training. Compared with the traditional Faster RCNN training method, the five-step method strengthens learning on hard examples, improves the network's generalization ability, and reduces false detections. The resulting model attains a 2.4% higher mean Average Precision (mAP) on the Pascal VOC 2007 dataset, lowers the false positive rate by 3.2% at the same detection rate on FDDB (Face Detection Data Set and Benchmark), and fits bounding boxes more closely.
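
The abstract names the ingredients of the sampling step but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the discriminant for hard positives is a weighted sum of the per-sample classification loss and localization loss, and that hard negatives are confident detections that overlap no ground-truth box. The names `Detection`, `mine_hard_examples`, `lam`, `pos_thresh`, `match_iou`, and `neg_score`, and all threshold values, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    score: float     # classifier confidence for the predicted object class
    cls_loss: float  # per-sample classification loss (assumed to be given)
    loc_loss: float  # per-sample box-regression loss (assumed to be given)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mine_hard_examples(
    dets: List[Detection],
    gt_boxes: List[Box],
    lam: float = 1.0,         # assumed weight on the localization term
    pos_thresh: float = 1.0,  # assumed cutoff on the combined loss
    match_iou: float = 0.5,   # assumed IoU needed to count as a positive
    neg_score: float = 0.8,   # assumed confidence that makes a negative "hard"
) -> Tuple[List[Detection], List[Detection]]:
    """Split detections into hard positives and hard negatives."""
    hard_pos: List[Detection] = []
    hard_neg: List[Detection] = []
    for d in dets:
        best = max((iou(d.box, g) for g in gt_boxes), default=0.0)
        if best >= match_iou:
            # Matched to an object: hard when the combined loss is large.
            if d.cls_loss + lam * d.loc_loss > pos_thresh:
                hard_pos.append(d)
        elif d.score >= neg_score:
            # Confident detection on background: a false positive.
            hard_neg.append(d)
    return hard_pos, hard_neg


gt = [(0.0, 0.0, 10.0, 10.0)]
dets = [
    Detection((0.0, 0.0, 10.0, 10.0), 0.90, 0.1, 0.1),     # easy positive
    Detection((1.0, 1.0, 11.0, 11.0), 0.40, 0.9, 0.6),     # hard positive
    Detection((50.0, 50.0, 60.0, 60.0), 0.95, 0.05, 0.0),  # hard negative
    Detection((70.0, 70.0, 80.0, 80.0), 0.20, 0.2, 0.0),   # easy negative
]
hard_pos, hard_neg = mine_hard_examples(dets, gt)
```

In a bootstrap-style loop, the two returned lists would be added to the next round's training set so that later rounds concentrate on the examples the current model handles worst. How the five training steps schedule this is not stated in the abstract.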
