Performance evaluation of deep feature learning for RGB-D image/video classification
Abstract
Deep neural networks for image/video classification have achieved great success in various computer vision applications. Existing deep learning algorithms are widely applied to RGB image or video data. Meanwhile, with the advent of low-cost RGB-D sensors (such as Microsoft Kinect and Xtion Pro-Live), high-quality RGB-D data can be easily acquired and used to enhance computer vision algorithms [14]. It is therefore interesting to investigate how deep learning can be employed to extract and fuse features from RGB-D data. In this paper, after briefly reviewing the basic concepts of RGB-D information and four prevalent deep learning models, i.e., Deep Belief Networks (DBNs), Stacked Denoising Auto-Encoders (SDAE), Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) networks, we conduct extensive experiments on five popular RGB-D datasets, including three image datasets and two video datasets. We then present a detailed comparison of the feature representations learned by the four deep learning models. In addition, we offer suggestions on adjusting hyper-parameters when training deep neural networks. Based on the extensive experimental results, we believe this evaluation provides insights and a deeper understanding of the different deep learning algorithms for RGB-D feature extraction and fusion.
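To make the notion of RGB-D feature fusion concrete, the following is a minimal sketch (assuming PyTorch; the class and parameter names are hypothetical and not taken from the paper) of a two-stream CNN that extracts features from the RGB and depth channels separately and fuses them by concatenation before classification:

    # Minimal two-stream late-fusion sketch for RGB-D classification (illustrative only).
    import torch
    import torch.nn as nn

    class TwoStreamRGBDNet(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            # Small convolutional branch reused for each modality;
            # depth maps are treated as single-channel images.
            def branch(in_channels):
                return nn.Sequential(
                    nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                )
            self.rgb_branch = branch(3)
            self.depth_branch = branch(1)
            self.classifier = nn.Linear(64 + 64, num_classes)

        def forward(self, rgb, depth):
            # Late fusion: concatenate per-modality feature vectors, then classify.
            feats = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
            return self.classifier(feats)

    # Usage on a dummy batch of four 224x224 RGB-D frames.
    model = TwoStreamRGBDNet(num_classes=51)
    logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 1, 224, 224))
    print(logits.shape)  # torch.Size([4, 51])

This late-fusion design is only one of several strategies compared in the literature; early fusion (stacking depth as a fourth input channel) is an equally common baseline.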
