Vehicle and Pedestrian Detection Network Based on Lightweight SSD (基于轻量化SSD的车辆及行人检测网络)

English title: Vehicle and Pedestrian Detection Model Based on Lightweight SSD
Authors: Zheng Dong (郑冬); Li Xiangqun (李向群); Xu Xinzheng (许新征)
Keywords: object detection; convolutional neural network; lightweight neural network; SSD; MobileNetv2
Chinese journal name: NJSF
English journal name: Journal of Nanjing Normal University (Natural Science Edition)
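Affiliations: School of Computer Science and Technology, China University of Mining and Technology; Key Laboratory of Data Science and Intelligence Application, Fujian Province University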
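Publication date: 2019-03-20
Publisher: Journal of Nanjing Normal University (Natural Science Edition)
Year: 2019
Issue: v.42; No.157
Funding: National Natural Science Foundation of China (61672522); Open Project of the Key Laboratory of Data Science and Intelligence Application, Fujian Province University (D1804)
Language: Chinese
Page: NJSF201901013
Page count: 9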
CN：01
ISSN：32-1239/N
Classification number: 79-87

Abstract

Object detection algorithms based on deep learning have developed rapidly in recent years, but deep networks are too large to be widely deployed on embedded platforms. This paper optimizes the model size of SSD (Single Shot Multi-box Detector). It introduces the lightweight convolutional neural network MobileNetv2, analyzes the inverted residual and linear bottleneck structures in MobileNetv2, and compares the network structures of SSD and its lightweight version, SSDLite. On this basis, it proposes LVP-DN (Lightweight Vehicle and Pedestrian Detection Network), a vehicle and pedestrian detection model based on lightweight SSD. First, MobileNetv2 replaces VGG as the base network for feature extraction. Then, SSDLite replaces the original SSD structure to reduce the model size and speed up detection. In addition, optimizing the ratios of the default boxes improves the network's detection accuracy for pedestrians. Finally, the effects of three factors on network performance are compared on the KITTI and PASCAL VOC datasets: the base network, the input image size, and whether a pre-trained model is used. Experimental results show that, compared with other popular object detection models, the proposed vehicle and pedestrian detection model achieves good results in accuracy, speed, and model size.
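The size reduction attributed to SSDLite can be illustrated with simple parameter counting. SSDLite, as described in the MobileNetV2 paper (reference [9]), replaces the regular convolutions in the SSD prediction layers with depthwise separable convolutions. The sketch below compares the weight counts of the two layer types. The layer dimensions (512 input channels, 6 default boxes per location, 4 class scores plus 4 box offsets per box) are hypothetical values chosen for illustration and are not taken from this paper.

```python
def conv_params(kernel, c_in, c_out):
    """Weight count of a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weight count of a depthwise kernel x kernel convolution
    followed by a 1 x 1 pointwise convolution (bias ignored)."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical prediction layer, for illustration only (not from the paper):
# 512 input channels, 6 default boxes per location,
# each box predicting 4 class scores and 4 box offsets.
c_in = 512
c_out = 6 * (4 + 4)

standard = conv_params(3, c_in, c_out)
separable = separable_conv_params(3, c_in, c_out)
ratio = standard / separable

print(f"standard 3x3 conv weights:  {standard}")
print(f"separable 3x3 conv weights: {separable}")
print(f"reduction factor:           {ratio:.2f}")
```

Under these assumed dimensions the separable layer needs roughly one-seventh to one-eighth of the weights of the standard layer; the actual savings in LVP-DN depend on the layer shapes used in the paper.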

References

[1] SZEGEDY C,TOSHEV A,ERHAN D.Deep neural networks for object detection[C]//International Conference on Neural Information Processing Systems.USA:MIT Press,2013,26:2553-2561.
    [2] SERMANET P,EIGEN D,ZHANG X,et al.OverFeat:integrated recognition,localization and detection using convolutional networks[J].Eprint Arxiv,2013:1312.6229.
    [3] REN S,HE K,GIRSHICK R,et al.Faster R-CNN:towards real-time object detection with region proposal networks[C]//International Conference on Neural Information Processing Systems.Canada:MIT Press,2015:91-99.
    [4] DAI J,LI Y,HE K,et al.R-FCN:Object detection via region-based fully convolutional networks[J].Eprint Arxiv,2016:1605.06409.
    [5] HE K,GKIOXARI G,DOLLÁR P,et al.Mask R-CNN[C]//IEEE International Conference on Computer Vision.Italy:IEEE,2017:2980-2988.
    [6] LIU W,ANGUELOV D,ERHAN D,et al.SSD:single shot multibox detector[C]//Computer Vision-ECCV 2016.Amsterdam,the Netherlands:Springer International Publishing,2016:21-37.
    [7] REDMON J,DIVVALA S,GIRSHICK R,et al.You only look once:unified,real-time object detection[C]//Computer Vision and Pattern Recognition.USA:IEEE,2016:779-788.
    [8] GIRSHICK R,DONAHUE J,DARRELL T,et al.Rich feature hierarchies for accurate object detection and semantic segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition.USA:IEEE Computer Society,2014:580-587.
    [9] SANDLER M,HOWARD A,ZHU M,et al.MobileNetV2:inverted residuals and linear bottlenecks[J].Eprint Arxiv,2018.
    [10] GIRSHICK R.Fast R-CNN[C]//IEEE International Conference on Computer Vision.USA:IEEE Computer Society,2015:1440-1448.
    [11] REDMON J,FARHADI A.YOLO9000:better,faster,stronger[C]//IEEE Conference on Computer Vision and Pattern Recognition.Italy:IEEE Computer Society,2017:6517-6525.
    [12] REDMON J,FARHADI A.YOLOv3:an incremental improvement[J].Eprint Arxiv,2018:1804.02767.
    [13] HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition.USA:IEEE Computer Society,2016:770-778.
    [14] LIN T Y,DOLLAR P,GIRSHICK R,et al.Feature pyramid networks for object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition.Italy:IEEE Computer Society,2017:936-944.
    [15] WONG A,SHAFIEE M J,LI F,et al.Tiny SSD:a tiny single-shot detection deep convolutional neural network for real-time embedded object detection[C]//Conference on Computer and Robot Vision.Toronto:IEEE,2018(15):95-101.
    [16] IANDOLA F N,HAN S,MOSKEWICZ M W,et al.SqueezeNet:AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size[J].Eprint Arxiv,2016:1602.07360.
    [17] EVERINGHAM M.The PASCAL visual object classes challenge[J].Lecture notes in computer science,2005,111(1):98-136.
    [18] CHOLLET F.Xception:deep learning with depthwise separable convolutions[C]//IEEE Conference on Computer Vision and Pattern Recognition.Italy:IEEE Computer Society,2017:1800-1807.
    [19] HOWARD A G,ZHU M,CHEN B,et al.MobileNets:efficient convolutional neural networks for mobile vision applications[J].Eprint Arxiv,2017.
    [20] GEIGER A,LENZ P,URTASUN R.Are we ready for autonomous driving?The KITTI vision benchmark suite[C]//IEEE Conference on Computer Vision and Pattern Recognition.USA:IEEE Computer Society,2012:3354-3361.
    [21] GEIGER A,LENZ P,STILLER C,et al.Vision meets robotics:the KITTI dataset[J].International journal of robotics research,2013,32(11):1231-1237.
    [22] LECUN Y,BOTTOU L,BENGIO Y,et al.Gradient-based learning applied to document recognition[J].Proceedings of the IEEE,1998,86(11):2278-2324.
    [23] SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[C]//International Conference on Learning Representations.USA:IEEE,2014.
    [24] GIRSHICK R.Fast R-CNN[C]//IEEE International Conference on Computer Vision.USA:IEEE Computer Society,2015:1440-1448.
    [25] KIM H,LEE Y,YIM B,et al.On-road object detection using deep neural network[C]//IEEE International Conference on Consumer Electronics-Asia.Korea:IEEE,2016:1-4.
    [26] HUANG J,GUADARRAMA S,MURPHY K,et al.Speed/accuracy trade-offs for modern convolutional object detectors[C]//IEEE International Conference on Computer Vision.USA:IEEE Computer Society,2016:3296-3297.
