Pedestrian Detection Based on Escaped Negative Samples Mining
  • Authors: LIU Zhihan; LI Minxian; ZHAO Chunxia
  • Keywords: convolutional neural networks; pedestrian detection; hard negative mining
  • Journal: Computer & Digital Engineering (计算机与数字工程); journal code: JSSG
  • Affiliation: School of Computer Science and Engineering, Nanjing University of Science and Technology
  • Publication date: 2019-02-20
  • Year: 2019
  • Issue: Vol. 47, No. 352 (Issue 02)
  • Language: Chinese
  • Article ID: JSSG201902036
  • Pages: 181-186 (6 pages)
  • CN: 42-1372/TP
Abstract
Many modern object-detection methods, such as Faster R-CNN, are based on convolutional neural networks. In surveillance video, complex backgrounds and the varied postures of pedestrians produce many false positives. Hard negative mining can reduce these false positives to some extent, but Faster R-CNN, because of its end-to-end training strategy, does not use it: when generating samples, it considers only the samples around the ground-truth boxes. To this end, this paper proposes a novel algorithm for mining these escaped negative samples, which exploits the confidence of the candidate boxes when generating negative samples for the classifier, so as to produce more representative negatives. The method yields consistent improvements in recall, precision, and F1 measure, not only on static pedestrian datasets such as INRIA but also on surveillance-video datasets (PKU-SVD-B and Caltech). Because it only changes the sample-generation algorithm and requires no additional hyper-parameters, it does not increase the computation of Faster R-CNN and is easy to implement.
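The sample-generation idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the function names, the IoU and confidence thresholds (`neg_iou_hi`, `conf_thresh`), and the top-k selection are all assumptions. The intent it captures is that confident background proposals far from any ground-truth box (the "escaped" negatives) are ranked by classifier confidence and kept as training negatives, instead of sampling negatives only around the ground truth.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def mine_escaped_negatives(proposals, scores, gt_boxes,
                           neg_iou_hi=0.3, conf_thresh=0.5, k=64):
    """Select negatives for the classifier: background proposals (low IoU
    with every ground-truth box) whose classifier confidence is high, i.e.
    confident false positives that plain around-the-ground-truth sampling
    would miss. Returns proposal indices, most confident first."""
    max_iou = np.zeros(len(proposals))
    for gt in gt_boxes:
        max_iou = np.maximum(max_iou, iou(gt, proposals))
    bg = np.where(max_iou < neg_iou_hi)[0]        # background candidates
    hard = bg[scores[bg] >= conf_thresh]          # confident false positives
    order = hard[np.argsort(-scores[hard])]       # rank by confidence
    return order[:k]
```

Because the change is confined to how negatives are chosen, a sketch like this slots into the existing training loop without new hyper-parameters beyond the thresholds already implicit in Faster R-CNN's sampling.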
