Object Recognition Based on Tensor Decomposition Fusing RGB-D Image

Original title: 基于张量分解融合RGB-D图像的物体识别
Authors: YU Tingsong (余霆嵩); WEN Yuanmei (文元美); LING Yongquan (凌永权)
Affiliation: School of Information Engineering, Guangdong University of Technology (广东工业大学信息工程学院)
Keywords: RGB-D image fusion; convolutional neural network; tensor decomposition; Tucker decomposition; object recognition
Journal: Computer Engineering and Applications (计算机工程与应用)
Journal code: JSGG
Publication date: 2018-07-04 17:19
Publisher: Computer Engineering and Applications
Year: 2019
Issue: v.55; No.921
Funding: National Natural Science Foundation of China (No.61372173, No.61671163, No.11871168); Natural Science Foundation of Guangdong Province (No.2018A030310593); Guangdong University of Technology 2017 Central Government Special Fund Project for Supporting the Development of Local Universities (No.201707)
Language: Chinese
Page: JSGG201902028
Page count: 5
CN: 02
Classification number: 180-184

Abstract

To make full use of the depth information in RGB-D images, this paper proposes an object recognition method based on tensor decomposition. First, the RGB-D image is represented as a fourth-order tensor. The tensor is then decomposed into a core tensor and four factor matrices, and the original tensor is projected using the corresponding factor matrices to obtain the fused RGB-D data. Finally, the fused data are fed into a convolutional neural network for recognition. Experiments on three groups of similar objects from an RGB-D dataset show that fusing RGB-D images by tensor decomposition yields higher recognition accuracy than recognition without tensor decomposition, and that the accuracy on a single misclassified instance improves by up to 99%.
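The pipeline in the abstract (build a fourth-order tensor, decompose it into a core tensor and four factor matrices, project with a factor matrix) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The record does not say how the tensor is constructed or which factor matrices are used for the projection, so the following are assumptions made only for illustration: the tensor has shape (height, width, 3 channels, 2 modalities) with the depth map replicated to three channels; the Tucker decomposition is computed by a plain full-rank higher-order SVD (HOSVD); and fusion projects the modality mode onto its leading singular vector. The function names `build_rgbd_tensor` and `fuse_rgbd` are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """Mode-n product: apply `matrix` (J x I_n) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)          # shape (..., I_n)
    return np.moveaxis(moved @ matrix.T, -1, mode)  # shape (..., J) moved back


def hosvd(tensor):
    """Full-rank HOSVD: one simple way to obtain a Tucker decomposition.

    Returns a core tensor and one factor matrix per mode.
    """
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding form that mode's factor.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)
    return core, factors


def build_rgbd_tensor(rgb, depth):
    """ASSUMPTION: replicate depth to 3 channels and stack RGB and depth
    along a new fourth (modality) axis, giving shape (H, W, 3, 2)."""
    depth3 = np.repeat(depth[..., None], 3, axis=2)
    return np.stack([rgb, depth3], axis=-1)


def fuse_rgbd(rgb, depth):
    """ASSUMPTION: fuse by projecting the modality mode onto the leading
    column of its factor matrix, giving one (H, W, 3) image for a CNN."""
    tensor = build_rgbd_tensor(rgb, depth)
    core, factors = hosvd(tensor)
    leading = factors[3][:, :1]                  # shape (2, 1)
    fused = mode_dot(tensor, leading.T, 3)       # shape (H, W, 3, 1)
    return fused[..., 0], core, factors
```

Calling `fuse_rgbd(rgb, depth)` with an `(H, W, 3)` colour array and an `(H, W)` depth array returns a fused `(H, W, 3)` array together with the core tensor and the four factor matrices. The sign of a singular vector is arbitrary, so the fused image may come out negated and would typically be rescaled before being passed to a network.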

