Fast Vehicle Detection Method Based on Improved YOLOv3

Authors: ZHANG Fukai; YANG Feng; LI Ce
Keywords: vehicle detection; feature fusion; convolutional neural network; real-time detection; YOLOv3
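Journal: Computer Engineering and Applications (JSGG)
Institution: School of Mechanical Electronic and Information Engineering, China University of Mining and Technology (Beijing)
Publication date: 2019-01-15
Publisher: Computer Engineering and Applications
Year: 2019
Issue: v.55; No.921
Funding: National Natural Science Foundation of China (No.61601466); State Key Laboratory of Coal Resources and Safe Mining project (No.SKLCRSM16KFD04); Fundamental Research Funds for the Central Universities (No.2016QJ04)
Language: Chinese
Page: JSGG201902003
Number of pages: 9
CN: 02
Classification number: 18-26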

Abstract

Detecting vehicles in image or video data is an important but challenging task in urban traffic surveillance. The difficulty lies in accurately locating and classifying relatively small vehicles in complex scenes. To address these problems, this paper proposes a single-stage deep neural network (DF-YOLOv3) for real-time detection of different types of vehicles in urban traffic surveillance. DF-YOLOv3 improves the conventional YOLOv3 algorithm: it first enhances the deep residual network used to extract vehicle features, then designs convolutional feature maps at 6 different scales and fuses them with the feature maps of corresponding scale in the residual network, forming the final feature pyramid used for vehicle prediction. Experiments on the KITTI dataset show that the proposed DF-YOLOv3 achieves high detection performance in both accuracy and speed. Specifically, for the model with 512×512 input resolution, running on an NVIDIA GTX 1080 Ti GPU, DF-YOLOv3 achieves 93.61% mAP (mean average precision) at a speed of 45.48 f/s (frames per second). In terms of accuracy, DF-YOLOv3 outperforms Fast R-CNN, Faster R-CNN, DAVE, YOLO, SSD, YOLOv2, YOLOv3, and SINet.
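
The multi-scale fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion operator, the channel counts, or the strides of the six scales, so everything concrete here is an assumption made for illustration only: nearest-neighbour 2x upsampling followed by channel concatenation (the scheme used in the original YOLOv3), six levels at strides 4 to 128 for a 512×512 input, and 8 channels per level. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_top_down(backbone_maps):
    """Fuse feature maps from the coarsest level to the finest.

    backbone_maps: list of (C, H, W) arrays ordered fine -> coarse, where
    each level has half the spatial size of the previous one (assumption).
    Returns the fused maps in the same fine -> coarse order.
    """
    fused = [backbone_maps[-1]]                  # coarsest level is kept as-is
    for feat in reversed(backbone_maps[:-1]):
        up = upsample2x(fused[0])                # bring the coarser map to this scale
        fused.insert(0, np.concatenate([feat, up], axis=0))  # concatenate channels
    return fused

# Hypothetical setup: 512x512 input, six levels, 8 channels each.
strides = [4, 8, 16, 32, 64, 128]                # assumed, not given in the abstract
maps = [np.random.rand(8, 512 // s, 512 // s).astype(np.float32) for s in strides]
pyramid = fuse_top_down(maps)
shapes = [p.shape for p in pyramid]
```

In this sketch the channel count grows at every finer level because nothing reduces it after concatenation; a real detector would normally apply convolutions after each fusion step to control the width.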

References

[1]Liu B Y,Cheng J R,Tang X Y,et al.Recognition method for moving vehicles in complex dynamic environments[J].Journal of Frontiers of Computer Science and Technology,2017,11(1):134-143.(in Chinese)
[2]Li H,Fu K,Yan M L,et al.Vehicle detection in remote sensing images using denoizing-based convolutional neural networks[J].Remote Sensing Letters,2017,8(3):262-270.
[3]Geiger A,Lenz P,Urtasun R.Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//CVPR,2012.
[4]Wen L Y,Du D W,Cai Z W,et al.UA-DETRAC:a new benchmark and protocol for multi-object detection and tracking[J].arXiv:1511.04136v3,2015.
[5]Lyu S W,Chang M C,Du D W,et al.UA-DETRAC 2017:report of AVSS2017&IWT4S challenge on advanced traffic monitoring[C]//AVSS,2017:1-7.
[6]Krizhevsky A,Sutskever I,Hinton G E.ImageNet classification with deep convolutional neural networks[C]//NIPS,2012.
[7]Simonyan K,Zisserman A.Very deep convolutional networks for large-scale image recognition[C]//NIPS,2015.
[8]Szegedy C,Liu W,Jia Y Q,et al.Going deeper with convolutions[J].arXiv:1409.4842v1,2014.
[9]Yao Q L,Hu X,Lei H.Research progress of deep convolutional neural networks in object detection[J].Computer Engineering and Applications,2018,54(17):1-9.(in Chinese)
[10]Xie L J,Ji G S,Peng Q,et al.Application of improved convolutional neural network in pedestrian detection[J].Journal of Frontiers of Computer Science and Technology,2018,12(5):708-718.(in Chinese)
[11]Girshick R,Donahue J,Darrell T,et al.Rich feature hierarchies for accurate object detection and semantic segmentation[C]//CVPR,2014:580-587.
[12]He K M,Zhang X Y,Ren S Q,et al.Spatial pyramid pooling in deep convolutional networks for visual recognition[J].arXiv:1406.4729v4,2015.
[13]Girshick R.Fast R-CNN[C]//ICCV,2015:1440-1448.
[14]Ren S Q,He K M,Girshick R,et al.Faster R-CNN:towards real-time object detection with region proposal networks[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(6):1137-1149.
[15]Dai J F,He K M,Sun J.R-FCN:object detection via region-based fully convolutional networks[J].arXiv:1605.06409,2016.
[16]Uijlings J R R,Sande K E A V D,Gevers T,et al.Selective search for object recognition[J].International Journal of Computer Vision,2013,104(2):154-171.
[17]Redmon J,Divvala S,Girshick R,et al.You only look once:unified,real-time object detection[C]//CVPR,2016.
[18]Liu W,Anguelov D,Erhan D.SSD:single shot multibox detector[C]//ECCV,2016:21-37.
[19]Redmon J,Farhadi A.YOLO9000:better,faster,stronger[J].arXiv:1612.08242v1,2016.
[20]Redmon J,Farhadi A.YOLOv3:an incremental improvement[J].arXiv:1804.02767v1,2018.
[21]Zhou Y,Liu L,Shao L,et al.Fast automatic vehicle annotation for urban traffic surveillance[J].IEEE Transactions on Intelligent Transportation Systems,2018,19(6):1973-1984.
[22]Yuan X,Su S,Chen H J.A graph-based vehicle proposal location and detection algorithm[J].IEEE Transactions on Intelligent Transportation Systems,2017,18(12):3282-3289.
[23]Min W D,Fan M D,Guo X G,et al.A new approach to track multiple vehicles with the combination of robust detection and two classifiers[J].IEEE Transactions on Intelligent Transportation Systems,2018,19(1):174-186.
[24]Cao W M,Yuan J H,He Z H,et al.Fast deep neural networks with knowledge guided training and predicted regions of interests for real-time video object detection[J].IEEE Access,2018,6:8990-8999.
[25]Luo Z M,Charron F B,Lemaire C,et al.MIO-TCD:a new benchmark dataset for vehicle classification and localization[J].IEEE Transactions on Image Processing,2018,27(10):5129-5141.
[26]Szegedy C,Vanhoucke V,Ioffe S,et al.Rethinking the inception architecture for computer vision[J].arXiv:1512.00567v3,2015.
[27]He K M,Zhang X Y,Ren S Q,et al.Deep residual learning for image recognition[C]//CVPR,2016.
[28]Chollet F.Xception:deep learning with depthwise separable convolutions[J].arXiv:1610.02357v3,2017.
[29]Huang G,Liu Z,Maaten L V D,et al.Densely connected convolutional networks[C]//CVPR,2017.
[30]Hu X W,Xu X M,Xiao Y J,et al.SINet:a scale-insensitive convolutional neural network for fast vehicle detection[J].arXiv:1804.00433v1,2018.
[31]Ioffe S,Szegedy C.Batch normalization:accelerating deep network training by reducing internal covariate shift[J].arXiv:1502.03167v3,2015.
[32]Lin T Y,Dollár P,Girshick R,et al.Feature pyramid networks for object detection[J].arXiv:1612.03144v2,2017.
[33]LeCun Y,Boser B,Denker J S,et al.Backpropagation applied to handwritten zip code recognition[J].Neural Computation,1989,1(4):541-551.
[34]Redmon J.Darknet:open source neural networks in C[DB/OL].http://pjreddie.com/darknet/.
[35]Everingham M,Eslami M A,Gool L V,et al.The pascal visual object classes challenge:a retrospective[J].International Journal of Computer Vision,2015,111(1):98-136.
