Image depth estimation model based on atrous convolutional neural network
  • Chinese title: 基于多孔卷积神经网络的图像深度估计模型
  • Authors: LIAO Bin (廖斌); LI Haowen (李浩文)
  • Keywords: atrous convolution; Convolutional Neural Network (CNN); Conditional Random Field (CRF); depth estimation; deep learning
  • Chinese journal title: 计算机应用
  • English journal title: Journal of Computer Applications
  • Affiliation: School of Computer Science and Information Engineering, Hubei University (湖北大学计算机与信息工程学院)
  • Date of publication: 2018-09-26
  • Year: 2019
  • Issue: v.39; No.341
  • Funding: National Natural Science Foundation of China (61300125)
  • Language: Chinese
  • Pages: 273-280 (8 pages)
  • CN: 51-1307/TP
  • Record ID: JSJY201901047
Abstract
To address the poor depth estimation quality and inaccurate depth values obtained by traditional machine-learning methods on single images, a depth estimation model based on an Atrous Convolutional Neural Network (ACNN) was proposed. First, a Convolutional Neural Network (CNN) was used to extract feature maps from the original image layer by layer. Second, an atrous convolution structure fused the spatial information of the original image with the extracted low-level image features to obtain an initial depth map. Finally, the initial depth map was fed into a Conditional Random Field (CRF), which jointly used the pixels' spatial positions, grayscale values, and gradient information to refine it into the final depth map. Model usability verification and error estimation were carried out on benchmark datasets. The experimental results show that the proposed algorithm achieves lower error and higher accuracy: its Root Mean Square Error (RMSE) is on average 30.86% lower than that of machine-learning-based algorithms, and its accuracy is 14.5% higher than that of deep-learning-based algorithms. The proposed algorithm shows clear improvements in both error metrics and visual quality, indicating that the model can obtain better results in image depth estimation.
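As a quick illustration of the pipeline described in the abstract, the minimal sketch below shows a dilated (atrous) convolution head that fuses multi-rate context from backbone features into a one-channel initial depth map, plus the RMSE metric used in the evaluation. It assumes PyTorch; the module name AtrousDepthHead, the dilation rates (2, 4, 8), channel widths, and feature-map size are illustrative assumptions rather than the authors' actual configuration, and the CRF refinement stage is omitted.

# Minimal sketch (PyTorch assumed) of a dilated-convolution depth head and the
# RMSE metric; names, dilation rates, and sizes are illustrative, not the paper's.
import torch
import torch.nn as nn

class AtrousDepthHead(nn.Module):
    """Fuse backbone features through parallel atrous (dilated) convolutions
    and regress a single-channel initial depth map."""
    def __init__(self, in_channels=256, rates=(2, 4, 8)):
        super().__init__()
        # Each branch keeps the spatial size (padding == dilation for 3x3 kernels)
        self.branches = nn.ModuleList(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(64 * len(rates), 1, kernel_size=1)  # 1-channel depth

    def forward(self, feats):
        multi_scale = [torch.relu(branch(feats)) for branch in self.branches]
        return self.fuse(torch.cat(multi_scale, dim=1))

def rmse(pred, target):
    """Root Mean Square Error between predicted and ground-truth depth maps."""
    return torch.sqrt(torch.mean((pred - target) ** 2))

if __name__ == "__main__":
    feats = torch.randn(1, 256, 60, 80)    # assumed backbone feature map
    depth = AtrousDepthHead()(feats)       # initial depth map: 1 x 1 x 60 x 80
    print(depth.shape, rmse(depth, torch.zeros_like(depth)).item())

The design intent of the parallel dilation rates follows the general motivation for atrous convolution: enlarging the receptive field to capture scene-level spatial context without downsampling the feature map, so that the regressed depth retains spatial detail.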
