Feature-Enhanced SSD Algorithm and Its Application to Object Detection

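English title: Feature Enhancement SSD for Object Detection
Authors: 谭红臣; 李淑华; 刘彬; 刘秀平
English authors: Tan Hongchen; Li Shuhua; Liu Bin; Liu Xiuping (School of Mathematical Sciences, Dalian University of Technology)
Keywords: SSD algorithm; object detection; feature fusion; network structure
English keywords: single shot multibox detector; object detection; feature fusion; network structure
Chinese journal name: JSJF
English journal name: Journal of Computer-Aided Design & Computer Graphics
Institution: School of Mathematical Sciences, Dalian University of Technology
Publication date: 2019-04-15
Publisher: Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报)
Year: 2019
Issue: v.31
Language: Chinese
Pages: JSJF201904007
Page count: 7
CN: 04
ISSN: 11-2925/TP
Classification number: 63-69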

Abstract

In the single shot multibox detector (SSD), feature layers at different scales are hard to fuse so that they complement one another. To address this, the paper proposes a feature enhancement SSD (FE-SSD) algorithm for object detection. First, a size-preserving convolution is applied to the feature map at every scale of the SSD feature pyramid. Next, the features before and after the convolution are fused, producing a new set of pyramid feature layers. Finally, the new pyramid is fed to the multibox detectors, which perform object detection and localization and predict the final results. In experiments on the public PASCAL VOC2007 test set with 300×300 input images, the network achieves a mean average precision (mAP) of 78.0% at a detection speed of 82.5 frames per second (FPS). In an extended experiment, FE-SSD also outperforms SSD at detecting blurry objects in images.
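The three steps in the abstract (size-preserving convolution at each scale, fusion of the pre- and post-convolution features, a new pyramid for the detectors) can be illustrated with a minimal pure-Python sketch. The abstract does not specify the kernel size, the fusion operator, or the channel layout, so everything concrete here is an assumption made for illustration: a 3×3 kernel with zero padding and stride 1, element-wise sum as the fusion rule, single-channel feature maps, and the hypothetical names `conv3x3_same`, `fuse`, and `enhance_pyramid`. It is not the authors' implementation.

```python
from typing import List

FeatureMap = List[List[float]]  # one single-channel H x W feature map


def conv3x3_same(fmap: FeatureMap, kernel: List[List[float]]) -> FeatureMap:
    """3x3 convolution (cross-correlation form, as in most deep learning
    libraries), stride 1, zero padding 1, so the output size equals the input size."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # positions outside the map count as zero
                        acc += kernel[di + 1][dj + 1] * fmap[y][x]
            out[i][j] = acc
    return out


def fuse(original: FeatureMap, convolved: FeatureMap) -> FeatureMap:
    """Assumed fusion rule: element-wise sum of pre- and post-convolution features."""
    return [[a + b for a, b in zip(ro, rc)] for ro, rc in zip(original, convolved)]


def enhance_pyramid(pyramid: List[FeatureMap],
                    kernels: List[List[List[float]]]) -> List[FeatureMap]:
    """Build the new pyramid: one fused map per scale, each the same size as its input."""
    return [fuse(f, conv3x3_same(f, k)) for f, k in zip(pyramid, kernels)]


# Toy pyramid with two scales (4x4 and 2x2), filled with ones.
pyramid = [[[1.0] * 4 for _ in range(4)], [[1.0] * 2 for _ in range(2)]]
box_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]  # hypothetical averaging kernel
new_pyramid = enhance_pyramid(pyramid, [box_kernel, box_kernel])
sizes = [(len(f), len(f[0])) for f in new_pyramid]
print(sizes)  # [(4, 4), (2, 2)]
```

Because the convolution preserves spatial size, each fused map lines up position by position with its original, which is what lets the new pyramid replace the old one as input to the detection heads without changing their layout.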

References

[1]Girshick R,Donahue J,Darrell T,et al.Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2014:580-587
    [2]Girshick R.Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision.Los Alamitos:IEEE Computer Society Press,2015:1440-1448
    [3]Ren S Q,He K M,Girshick R,et al.Faster R-CNN:towards real-time object detection with region proposal networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems.Cambridge:MIT Press,2015,1:91-99
    [4]Redmon J,Divvala S,Girshick R,et al.You only look once:unified,real-time object detection[C]//Proceedings of the IEEE International Conference on Computer Vision.Los Alamitos:IEEE Computer Society Press,2016:779-788
    [5]Liu W,Anguelov D,Erhan D,et al.SSD:single shot multibox detector[C]//Proceedings of European Conference on Computer Vision.Aire-la-Ville:Eurographics Association Press,2016:21-37
    [6]He K M,Zhang X Y,Ren S Q,et al.Spatial pyramid pooling in deep convolutional networks for visual recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2015,37(9):1904-1916
    [7]Dai J F,Li Y,He K M,et al.R-FCN:object detection via region-based fully convolutional networks[OL].[2018-06-17].http://cn.arxiv.org/abs/1605.06409
    [8]Fu C Y,Liu W,Ranga A,et al.DSSD:deconvolutional single shot detector[OL].[2018-06-17].http://cn.arxiv.org/abs/1701.06659
    [9]Shen Z Q,Liu Z,Li J G,et al.DSOD:learning deeply supervised object detectors from scratch[C]//Proceedings of the IEEE International Conference on Computer Vision.Los Alamitos:IEEE Computer Society Press,2017:1937-1945
    [10]Lin T Y,Dollar P,Girshick R,et al.Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2017:936-944
    [11]Cao G M,Xie X M,Yang W Z,et al.Feature-fused SSD:fast detection for small objects[OL].[2018-06-17].https://arxiv.org/abs/1709.05054
    [12]Chen C Y,Liu M Y,Tuzel O,et al.R-CNN for small object detection[C]//Proceedings of Asian Conference on Computer Vision.Gewerbestrasse:Springer International Publishing AG,2016:214-230
    [13]Bell S,Lawrence Zitnick C,Bala K,et al.Inside-outside net:detecting objects in context with skip pooling and recurrent neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2016:2874-2883
    [14]Kong T,Yao A,Chen Y R,et al.HyperNet:towards accurate region proposal generation and joint object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2016:845-853
    [15]Chen X Z,Kundu K,Zhu Y K,et al.3D object proposals for accurate object class detection[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems.Cambridge:MIT Press,2015:424-432
    [16]Hu P Y,Ramanan D.Finding tiny faces[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2017:1522-1530
    [17]Li J N,Liang X D,Wei Y C,et al.Perceptual generative adversarial networks for small object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2017:1951-1959
    [18]Goodfellow I J,Pouget-Abadie J,Mirza M,et al.Generative adversarial nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems.Cambridge:MIT Press,2014,2:2672-2680
    [19]Pinheiro P O,Lin T Y,Collobert R,et al.Learning to refine object segments[C]//Proceedings of European Conference on Computer Vision.Aire-la-Ville:Eurographics Association Press,2016:75-91
    [20]Li Z X,Zhou F Q.FSSD:feature fusion single shot multibox detector[OL].[2018-06-17].http://cn.arxiv.org/abs/1712.00960
    [21]He K M,Zhang X Y,Ren S Q,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2016:770-778
    [22]Ioffe S,Szegedy C.Batch normalization:accelerating deep network training by reducing internal covariate shift[OL].[2018-06-17].https://arxiv.org/abs/1502.03167
    [23]Liu W,Rabinovich A,Berg A C.ParseNet:looking wider to see better[OL].[2018-06-17].https://arxiv.org/abs/1506.04579
    [24]Deng J,Dong W,Socher R,et al.ImageNet:a large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2009:248-255
    [25]Jia Y Q,Shelhamer E,Donahue J,et al.Caffe:convolutional architecture for fast feature embedding[OL].[2018-06-17].http://cn.arxiv.org/abs/1408.5093
    [26]Everingham M,Gool L,Williams C K,et al.The Pascal visual object classes(VOC)challenge[J].International Journal of Computer Vision,2010,88(2):303-338
