Abstract
To enable a robot to recognize object categories, detect object shapes, and judge object distances, this paper presents a binocular-vision object recognition and localization method based on the Mask R-CNN model. Mask R-CNN processes each image of the binocular pair, performing bounding-box selection, recognition, and shape segmentation. Neural network features are then used to match the same object across the two images. Finally, taking the segmented object shape as the basis, the iterative closest point (ICP) method estimates the disparity, from which the distance is computed. Experimental results show that the method recognizes and localizes objects at near real-time speed and improves on both speed and accuracy compared with traditional methods that rely on computing a global disparity map.
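The last step of the pipeline, going from two matched object shapes to a distance, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a rectified stereo pair, so the matched contours differ only by a horizontal shift, and it replaces full ICP with a simplified translation-only closest-point iteration. The function names, `focal_px`, and `baseline_m` are hypothetical, and the depth relation used is the standard pinhole stereo formula Z = f·B/d.

```python
import numpy as np

def estimate_disparity(left_pts, right_pts, iters=20, tol=1e-6):
    """Translation-only closest-point alignment (a simplified ICP variant).

    left_pts, right_pts: (N, 2) and (M, 2) arrays of (x, y) contour points
    of the same object in the left and right images. Assumes rectified
    images, so only a horizontal shift d is estimated: left ~ right + (d, 0).
    """
    left = np.asarray(left_pts, dtype=float)
    right = np.asarray(right_pts, dtype=float)
    # Initial guess: difference of the centroid x-coordinates.
    d = left[:, 0].mean() - right[:, 0].mean()
    for _ in range(iters):
        shifted = right + np.array([d, 0.0])
        # Brute-force nearest neighbour in `left` for each shifted point.
        dist2 = ((shifted[:, None, :] - left[None, :, :]) ** 2).sum(axis=2)
        nearest = left[dist2.argmin(axis=1)]
        # Move the shift by the mean horizontal residual.
        step = (nearest[:, 0] - shifted[:, 0]).mean()
        d += step
        if abs(step) < tol:
            break
    return d

def depth_from_disparity(d, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d (d in pixels, Z in metres)."""
    return focal_px * baseline_m / d

# Synthetic example: an elliptical contour seen 30 px apart in the two views.
t = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
contour = np.stack([40.0 * np.cos(t), 25.0 * np.sin(t)], axis=1)
left_contour = contour + np.array([100.0, 50.0])
right_contour = contour + np.array([70.0, 50.0])

disparity = estimate_disparity(left_contour, right_contour)
distance = depth_from_disparity(disparity, focal_px=700.0, baseline_m=0.12)
```

In this synthetic case the recovered disparity is 30 pixels, which with the assumed 700-pixel focal length and 0.12 m baseline corresponds to a distance of 2.8 m. Working only on contour points of one matched object, rather than on every pixel, is what lets this kind of approach avoid computing a global disparity map.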