Abstract
A real-time object detection algorithm for embedded graphics processing units (GPUs) is proposed. To address the limited computing resources and low processing speed of embedded platforms, an improved lightweight detection model based on the YOLO-V3 (You Only Look Once, Version 3) architecture is presented. The model is first trained offline on vehicle targets and then deployed on the embedded GPU platform for online detection. Experimental results show that on the embedded platform, the proposed method achieves a detection speed exceeding 23 frame/s on 640 pixel × 480 pixel video.
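As background to the YOLO-V3 architecture the model builds on, each grid cell predicts raw offsets that are decoded into a box using sigmoid-bounded center shifts and exponentially scaled anchor priors. The sketch below shows this standard YOLO-V3 decoding for a single cell; the function name and argument layout are illustrative, not taken from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_cell(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Decode one YOLO-V3 cell prediction into an absolute box.

    (tx, ty, tw, th): raw network outputs for this cell/anchor
    (cx, cy): integer grid-cell coordinates
    (pw, ph): anchor (prior) width and height in pixels
    stride:   ratio of input image size to feature-map size
    """
    bx = (cx + sigmoid(tx)) * stride   # center x, bounded inside the cell
    by = (cy + sigmoid(ty)) * stride   # center y, bounded inside the cell
    bw = pw * math.exp(tw)             # width scales the anchor prior
    bh = ph * math.exp(th)             # height scales the anchor prior
    return bx, by, bw, bh

# With zero offsets, the box sits at the cell center with the anchor's size:
print(decode_cell(0.0, 0.0, 0.0, 0.0, cx=3, cy=2, pw=10.0, ph=13.0, stride=32))
```

Restricting the center to its own cell via the sigmoid is what stabilizes YOLO-V3 training relative to unconstrained offsets, and it is independent of the backbone, which is why lightweight variants such as the one proposed here can reuse the same detection head.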
References
[1] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[2] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]. IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[3] He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[4] Girshick R. Fast R-CNN[C]. IEEE International Conference on Computer Vision, 2015: 1440-1448.
[5] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[6] Dai J F, Li Y, He K M, et al. R-FCN: Object detection via region-based fully convolutional networks[C]. Advances in Neural Information Processing Systems, 2016: 379-387.
[7] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]. European Conference on Computer Vision, 2016: 21-37.
[8] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6517-6525.
[9] Redmon J, Farhadi A. Yolov3: An incremental improvement[EB/OL]. (2018-04-08)[2018-09-07]. https://arxiv.org/abs/1804.02767.
[10] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[11] Feng X Y, Mei W, Hu D S. Aerial target detection based on improved faster R-CNN[J]. Acta Optica Sinica, 2018, 38(6): 0615004.
[12] Xin P, Xu Y L, Tang H, et al. Fast airplane detection based on multi-layer feature fusion of fully convolutional networks[J]. Acta Optica Sinica, 2018, 38(3): 0315003.
[13] Iandola F N, Han S, Moskewicz M W, et al. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size[EB/OL]. (2016-11-04)[2018-09-07]. https://arxiv.org/abs/1602.07360.
[14] Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17)[2018-09-07]. https://arxiv.org/abs/ 1704.04861.
[15] Sandler M, Howard A, Zhu M, et al. Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation[EB/OL]. (2018-04-02)[2018-09-07]. https://arxiv.org/abs/1801.04381.
[16] Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2012: 3354-3361.
[17] Johnson-Roberson M, Barto C, Mehta R, et al. Driving in the Matrix: Can virtual worlds replace human-generated annotations for real world tasks?[C]. IEEE International Conference on Robotics and Automation, 2017: 746-753.
[18] Redmon J. YOLO-tiny[EB/OL]. (2018-08-16)[2018-09-07]. https://github.com/pjreddie/darknet/blob/master/cfg/yolov3-tiny.cfg.