Abstract
To make full use of the depth information in RGB-D images, this paper proposes an object recognition method based on tensor decomposition. First, the RGB-D image is represented as a fourth-order tensor. This tensor is then decomposed into a core tensor and four factor matrices. Next, the original tensor is projected with the corresponding factor matrices to obtain fused RGB-D data, which is finally fed into a convolutional neural network for recognition. Experiments on three groups of similar objects from an RGB-D dataset show that recognition on RGB-D images fused by tensor decomposition is more accurate than recognition without tensor decomposition, and that the accuracy on an individual misclassified instance can be improved by up to 99%.
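The pipeline described in the abstract can be sketched in NumPy. The abstract does not state how the fourth-order tensor is laid out, which algorithm computes the decomposition, or along which modes the projection is applied, so the following are assumptions: a batch of N images stored as an (N, H, W, 4) array with channels R, G, B, D; a Tucker decomposition X ≈ G ×1 U1 ×2 U2 ×3 U3 ×4 U4 computed by truncated higher-order SVD (HOSVD), which is one of several standard Tucker algorithms and not necessarily the one used in the paper; and a projection along the channel mode only, mapping the four RGB-D channels to three fused channels so that images in the batch are not mixed. All helper names (`unfold`, `mode_product`, `tucker_hosvd`, `reconstruct`, `fuse_rgbd`) are hypothetical, and the CNN stage is omitted.

```python
import numpy as np


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: bring axis `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    """n-mode product: multiply every mode-`mode` fiber of `tensor` by `matrix` (shape J x I_mode)."""
    moved = np.moveaxis(tensor, mode, 0)                  # (I_mode, ...)
    product = np.tensordot(matrix, moved, axes=(1, 0))    # (J, ...)
    return np.moveaxis(product, 0, mode)


def tucker_hosvd(tensor: np.ndarray, ranks):
    """Truncated HOSVD: a core tensor plus one orthonormal factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])                       # leading left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)              # G = X x1 U1^T x2 U2^T x3 U3^T x4 U4^T
    return core, factors


def reconstruct(core: np.ndarray, factors) -> np.ndarray:
    """Rebuild the approximation X ~= G x1 U1 x2 U2 x3 U3 x4 U4."""
    tensor = core
    for mode, u in enumerate(factors):
        tensor = mode_product(tensor, u, mode)
    return tensor


def fuse_rgbd(batch: np.ndarray, channel_rank: int = 3):
    """Hypothetical fusion step: project the (N, H, W, 4) RGB-D tensor along the
    channel mode so that R, G, B and D are mixed into `channel_rank` channels."""
    n, h, w, _ = batch.shape
    core, factors = tucker_hosvd(batch, (n, h, w, channel_rank))
    channel_factor = factors[3]                           # shape (4, channel_rank)
    fused = mode_product(batch, channel_factor.T, 3)      # shape (N, H, W, channel_rank)
    return fused, core, factors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgbd = rng.random((2, 8, 8, 4))                       # two toy RGB-D images, 8x8 pixels
    fused, core, factors = fuse_rgbd(rgbd)
    print(fused.shape)                                    # (2, 8, 8, 3)
```

Because the channel factor matrix holds the leading left singular vectors of the channel-mode unfolding, projecting onto it keeps the combined RGB-D directions with the most (uncentered) energy across the batch; with a rank below 4, the weakest mixed direction is discarded rather than any single modality.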