Research on Progress of Object Detection Framework Based on Deep Learning

Chinese title: 基于深度学习的目标检测框架进展研究
Authors: KOU Dalei (寇大磊); QUAN Jichuan (权冀川); ZHANG Zhongwei (张仲伟)
Affiliations: Command & Control Engineering College, Army Engineering University of PLA; Unit 68023 of PLA; Unit 73671 of PLA
Keywords: deep learning; object detection; convolutional neural networks; computer vision
Journal code: JSGG
Journal: Computer Engineering and Applications (计算机工程与应用)
Publication date: 2019-03-26 08:40
Publisher: Computer Engineering and Applications
Year: 2019
Issue: v.55; No.930
Language: Chinese
Page: JSGG201911005
Page count: 10
CN: 11
Classification number: 30-39

Abstract

Since the R-CNN framework was proposed, object detection frameworks based on deep learning have gradually become mainstream. They fall into two categories: frameworks based on region proposals (two-stage) and frameworks based on regression (one-stage). In the past two years, a large number of excellent frameworks have emerged that build on classic deep learning detectors such as Faster R-CNN, YOLO, and SSD. This paper first sorts and summarizes the frameworks proposed in recent years according to their optimization methods. It then compares the performance of these object detection methods on mainstream test sets such as PASCAL VOC and MS COCO and analyzes their advantages and disadvantages. Finally, it discusses the difficulties and challenges the field currently faces and offers an outlook on possible directions of development.

