Object Recognition and Localization Based on Mask R-CNN (基于Mask R-CNN的物体识别和定位)

English title: Object recognition and localization based on Mask R-CNN
Authors: PENG Qiuchen (彭秋辰); SONG Yixu (宋亦旭)
Keywords: robot navigation; Mask R-CNN; feature matching; object recognition; binocular vision
Chinese journal title (abbreviation): QHXB
English journal title: Journal of Tsinghua University (Science and Technology)
Affiliation: State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University
Publication date: 2018-12-14 16:02
Publisher: Journal of Tsinghua University (Science and Technology) (清华大学学报(自然科学版))
Year: 2019
Volume: 59
Language: Chinese
Article ID: QHXB201902008
Page count: 7
Issue: 02
CN: 11-2223/N
Pages: 53-59

Abstract

To enable a robot to identify an object's category, detect its shape, and judge its distance, this paper presents a binocular-vision object recognition and localization method based on the Mask R-CNN model. Mask R-CNN processes the binocular image pair and performs bounding box selection, recognition, and shape segmentation on each image. Neural network features are then used to match the same object across the two images. Finally, taking the segmented object shape as the basis, a closest-point search, the iterative closest point (ICP) method, estimates the disparity, from which the distance is calculated. Experimental results show that the method recognizes and localizes objects at near real-time speed, and that it improves on both speed and accuracy compared with traditional methods that rely on computing a global disparity map.
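The last step of the abstract (estimating disparity from the object shape, then computing distance) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a rectified stereo pair, so the two mask contours differ only by a horizontal shift, and it reduces ICP to a translation-only search along x. The function names `estimate_disparity` and `depth_from_disparity`, the sample contour, and the focal length and baseline values are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def estimate_disparity(left_pts, right_pts, iterations=20, tol=1e-6):
    """Translation-only ICP along x between two mask contours (a simplification).

    left_pts / right_pts: lists of (x, y) pixel coordinates sampled from the
    boundary of the same object in the rectified left and right images.
    Returns d such that x_left is approximately x_right + d.
    """
    # Initial guess: offset between the mean x-coordinates of the two contours.
    d = mean(x for x, _ in left_pts) - mean(x for x, _ in right_pts)
    for _ in range(iterations):
        residuals = []
        for xl, yl in left_pts:
            # Closest right-image point after shifting it by the current estimate.
            xr, _ = min(
                right_pts,
                key=lambda p: (p[0] + d - xl) ** 2 + (p[1] - yl) ** 2,
            )
            residuals.append(xl - (xr + d))
        # Least-squares update for a pure x-translation is the mean residual.
        step = mean(residuals)
        d += step
        if abs(step) < tol:
            break
    return d


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d for a rectified camera pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical data: a square contour seen 12 px further right in the left image.
right_contour = [(100, 50), (110, 50), (110, 60), (100, 60)]
left_contour = [(x + 12, y) for x, y in right_contour]

disparity = estimate_disparity(left_contour, right_contour)
# Assumed camera parameters: 700 px focal length, 0.12 m baseline.
distance_m = depth_from_disparity(disparity, focal_px=700.0, baseline_m=0.12)
```

Because only the contour points of one object are compared, the search touches far fewer pixels than a dense disparity map, which is the kind of saving the abstract's speed claim refers to.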

References

[1] YANG R J, WANG F, QIN H. Research of pedestrian detection and location system based on stereo images[J]. Application Research of Computers, 2018, 35(5): 1591-1600.
[2] REDMON J, FARHADI A. YOLO9000: Better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA, 2017: 6517-6525.
[3] HE K M, GKIOXARI G, DOLLAR P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Venice, Italy, 2017: 2980-2988.
[4] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, USA, 2014: 580-587.
[5] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile, 2015: 1440-1448.
[6] REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149.
[7] LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA, 2017: 936-944.
[8] SHELHAMER E, LONG J, DARRELL T, et al. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
[9] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA, 2016: 770-778.
[10] XIE S, GIRSHICK R, DOLLAR P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA, 2017: 5987-5995.
[11] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]//Proceedings of the European Conference on Computer Vision. Zurich, Switzerland, 2014: 740-755.
[12] GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: The KITTI dataset[J]. The International Journal of Robotics Research, 2013, 32(11): 1231-1237.
