Abstract
Based on depthwise separable convolution, we propose MTYOLO (MobileNet Tiny-Yolo), a compact object detection network for embedded platforms. MTYOLO divides the input image evenly into a grid of cells and replaces standard convolutions with depthwise separable convolutions, which reduces both the number of parameters and the computational cost. Pointwise convolution and feature map fusion are used to improve detection accuracy. Experimental results show that the MTYOLO model is 41 MB, about 67% of the size of the Tiny-Yolo model, and reaches a detection accuracy of 57.25% on the PASCAL VOC 2007 dataset, outperforming Tiny-Yolo. MTYOLO is therefore better suited to embedded systems.
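To make the parameter and computation savings concrete, the Python sketch below counts weights and multiply-adds for a standard k × k convolution and for a depthwise separable replacement (a k × k depthwise convolution followed by a 1 × 1 pointwise convolution), ignoring biases and batch normalization and assuming stride 1. The layer sizes (3 × 3 kernel, 256 input channels, 512 output channels, 13 × 13 output) are hypothetical values chosen for illustration, not MTYOLO's actual configuration, and the function names are likewise made up for this example.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Weights and multiply-adds of a k x k standard convolution
    (no bias, stride 1) producing an h x w x c_out output."""
    params = k * k * c_in * c_out
    # Every weight is applied once per output spatial position.
    mult_adds = params * h * w
    return params, mult_adds


def depthwise_separable_cost(k, c_in, c_out, h, w):
    """Weights and multiply-adds of a k x k depthwise convolution followed
    by a 1 x 1 pointwise convolution (no bias, stride 1)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 filters that mix channels
    params = depthwise + pointwise
    # Both stages run at every output spatial position.
    mult_adds = params * h * w
    return params, mult_adds


if __name__ == "__main__":
    # Hypothetical layer sizes, for illustration only (not from the paper).
    k, c_in, c_out, h, w = 3, 256, 512, 13, 13
    std_p, std_m = standard_conv_cost(k, c_in, c_out, h, w)
    sep_p, sep_m = depthwise_separable_cost(k, c_in, c_out, h, w)
    print(f"standard:  {std_p:,} params, {std_m:,} mult-adds")
    print(f"separable: {sep_p:,} params, {sep_m:,} mult-adds")
    # For stride 1 the cost ratio reduces to 1/c_out + 1/k**2.
    print(f"ratio: {sep_p / std_p:.4f} "
          f"(1/c_out + 1/k^2 = {1 / c_out + 1 / k**2:.4f})")
```

Because the ratio is 1/c_out + 1/k², a 3 × 3 layer becomes roughly 8 to 9 times cheaper as the number of output channels grows. The whole-network reduction reported above depends on which layers are replaced and on the layers that remain, which this per-layer sketch does not model.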