Pedestrian Detection Based on YOLOv2 with Pyramid Pooling Module in Underground Coal Mine

Authors: WANG Lin (王琳); WEI Chen (卫晨); LI Weishan (李伟山); ZHANG Yuliang (张钰良)
Keywords: object detection; pedestrian detection; YOLOv2; Pyramid Scene Parsing Network (PSPnet)
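Journal (Chinese): JSGG
Journal (English): Computer Engineering and Applications
Affiliations: College of Communication and Information Technology, Xi'an University of Posts and Telecommunications; College of Economics and Management, Xi'an University of Posts and Telecommunications
Publication date: 2018-04-10 11:47
Publisher: Computer Engineering and Applications (计算机工程与应用)
Year: 2019
Issue: v.55; No.922
Funding: Shaanxi Provincial Department of Science and Technology, Key Technologies (Chain) for Resource-Led Industries, Industrial Field Project (No.2015KTCXSF-10-13)
Language: Chinese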
Pages: JSGG201903022
Page count: 7
CN: 03
Classification number: 138-144

Abstract

Pedestrian detection is essential to the safety of workers in underground coal mines. Because the underground environment is poorly lit and dusty, applying YOLOv2 directly to pedestrian detection yields a low accuracy of only 54.3%. To address this problem, this paper proposes the YOLOv2_PPM network, which builds on the YOLOv2 network and incorporates the pyramid pooling module of the Pyramid Scene Parsing Network (PSPnet) to make full use of the context information in an image. In experiments on an underground pedestrian detection dataset, the YOLOv2_PPM network raises accuracy to 63.5%, which is 9.2 percentage points higher than the YOLOv2 network, while running at 39 frames per second (FPS). With an input image size of 480×480, detection accuracy rises to 71.6% at 28 FPS, which meets the requirement for real-time detection.
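As a rough illustration of the pyramid pooling idea mentioned in the abstract, the sketch below average-pools a single-channel feature map into several coarse grids, upsamples each pooled grid back to the original size, and stacks the results alongside the original map. The bin sizes (1, 2, 3, 6) follow the configuration described for PSPnet; the function names, the nearest-neighbor upsampling, and the omission of the 1×1 convolutions are simplifying assumptions made here, and nothing in the sketch is taken from the YOLOv2_PPM implementation itself.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2-D feature map (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])

    def span(index, size):
        # Floor of the start and ceiling of the end, so no bin is ever empty.
        start = (index * size) // bins
        end = -((-(index + 1) * size) // bins)
        return start, end

    pooled = []
    for i in range(bins):
        r0, r1 = span(i, h)
        row = []
        for j in range(bins):
            c0, c1 = span(j, w)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(grid, h, w):
    """Resize a square pooled grid to h x w by nearest-neighbor lookup.

    Assumption: PSPnet uses bilinear interpolation; nearest-neighbor keeps
    this sketch short.
    """
    bins = len(grid)
    return [[grid[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pooling(fmap, bin_sizes=(1, 2, 3, 6)):
    """Return the original map plus one upsampled context map per bin size."""
    h, w = len(fmap), len(fmap[0])
    channels = [[row[:] for row in fmap]]  # copy of the original features
    for bins in bin_sizes:
        pooled = adaptive_avg_pool(fmap, bins)
        channels.append(upsample_nearest(pooled, h, w))
    return channels


# Usage: a hypothetical 6x6 feature map yields 5 stacked 6x6 maps.
feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]
stacked = pyramid_pooling(feature_map)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))
```

The coarsest level (one bin) carries a global average of the whole map, while finer levels keep progressively more local context. Concatenating them gives each spatial position access to scene-level information, which is the property the abstract credits for the accuracy gain.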

