Light-Weight Object Detection Networks for Embedded Platform

Chinese title: 面向嵌入式平台的轻量级目标检测网络
Authors: Cui Jiahua (崔家华); Zhang Yunzhou (张云洲); Wang Zheng (王争); Liu Jiwei (刘及惟)
Keywords: machine vision; object detection; deep neural network; embedded system; real time
Journal code: GXXB
Journal: Acta Optica Sinica (光学学报)
Institutions: College of Information Science & Engineering, Northeastern University; Faculty of Robot Science and Engineering, Northeastern University
Publication date: 2018-12-27 11:14
Publisher: Acta Optica Sinica (光学学报)
Year: 2019
Volume/issue: v.39; No.445
Funding: National Natural Science Foundation of China (61471110); Fundamental Research Funds for the Central Universities (N172608005); Natural Science Foundation of Liaoning Province (20180520040); Shenyang High-Level Innovative Talent Support Program (RC170490)
Language: Chinese
Document ID: GXXB201904036
Page count: 7
Issue: 04
CN: 31-1252/O4
Pages: 307-313

Abstract

Building on depthwise separable convolution, this paper proposes MTYOLO (MobileNet Tiny-Yolo), a small object detection network for embedded platforms. The network divides the input image evenly into grid cells and replaces standard convolutions with depthwise separable convolutions, which reduces both the parameter count and the computational cost. Pointwise convolution and feature-map fusion are used to improve detection accuracy. Experimental results show that the MTYOLO model is 41 MB in size, approximately 67% of the size of the Tiny-Yolo model, and that its detection accuracy on the PASCAL VOC 2007 dataset reaches 57.25%, higher than that of Tiny-Yolo, which makes it better suited to embedded systems.
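
The parameter saving that the abstract attributes to depthwise separable convolution can be illustrated with a short weight count. The sketch below is not taken from the paper: the function names and the layer dimensions (a 3 x 3 kernel with 256 input and 512 output channels) are hypothetical and chosen only for illustration, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def reduction_ratio(k, c_in, c_out):
    """Separable weight count divided by standard weight count."""
    return depthwise_separable_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)


if __name__ == "__main__":
    # Hypothetical layer dimensions, not the MTYOLO configuration.
    k, c_in, c_out = 3, 256, 512
    print("standard: ", standard_conv_params(k, c_in, c_out))
    print("separable:", depthwise_separable_params(k, c_in, c_out))
    print("ratio:     %.4f" % reduction_ratio(k, c_in, c_out))
```

For a single layer the ratio simplifies to 1/c_out + 1/k^2, so with a 3 x 3 kernel the separable form needs roughly one ninth of the weights when the output channel count is large. A whole-model size reduction, such as the 67% figure reported above, depends on which layers are replaced and on what else the network adds, so it cannot be read off from this per-layer ratio.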

