Depth Map Generation Algorithm Based on Visual Dictionary

Original title (Chinese): 基于视觉词典的深度图生成算法
Authors: Liu Jieping (刘杰平); Zhou Huasheng (周华盛); Yu Langheng (余朗衡); Ding Shuhao (丁树浩); Liang Yaling (梁亚玲)
Affiliation: School of Electronic and Information Engineering, South China University of Technology
Keywords: machine vision; depth map; machine learning; visual word; visual dictionary; hard example mining
Journal: Acta Optica Sinica (光学学报; journal code GXXB)
Publication date: 2018-05-04 15:53
Publisher: Acta Optica Sinica
Year: 2018
Volume/issue: v.38; No.438
Funding: National Natural Science Foundation of China (61471173, 61701181); Natural Science Foundation of Guangdong Province (2016A030313455, 2017A030325430)
Language: Chinese
Record ID: GXXB201809036
Page count: 9
Issue number: 09
CN: 31-1252/O4
Pages: 276-284

Abstract

To recover depth information from a two-dimensional color image, a depth map generation algorithm based on a visual dictionary is proposed. A data-driven method searches a library of images with depth maps for the depth information corresponding to the various spatial structures in images, yielding initial visual words composed of image patches with similar spatial structure. Hard example mining is then used to find hard negative examples for each visual word, and the visual word classifiers are updated to obtain the best classification performance. The visual dictionary, composed of the visual word classifiers and the visual words, is used to detect the target image at multiple scales; the resulting depth map is then smoothed with an edge-preserving filter. Experimental results show that the depth maps generated by the algorithm match the depth variation of the target images and improve significantly in both subjective visual quality and a range of objective evaluation metrics.
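
The hard example mining step named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the features, the classifier, or the stopping rule, so everything below is an assumption for illustration only: a perceptron-style linear classifier stands in for the visual word classifier, feature vectors are plain tuples of floats, and the names train_linear, score and mine_hard_negatives are hypothetical.

```python
import random


def train_linear(pos, neg, epochs=50, lr=0.1):
    """Perceptron-style linear classifier (a stand-in, not the paper's classifier)."""
    dim = len(pos[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1) for x in pos] + [(x, -1) for x in neg]
    for _ in range(epochs):
        for x, y in data:
            # Update only on samples that are misclassified or lie on the boundary.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def score(model, x):
    """Signed decision value; positive means 'belongs to the visual word'."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mine_hard_negatives(pos, neg_pool, rounds=5, init_size=5, seed=0):
    """Repeatedly add negatives that the current model wrongly accepts, then retrain."""
    rng = random.Random(seed)
    active = rng.sample(neg_pool, init_size)  # start from a small random subset
    model = train_linear(pos, active)
    for _ in range(rounds):
        # Hard negatives: pool samples scored as positive and not yet in training.
        hard = [x for x in neg_pool if score(model, x) > 0 and x not in active]
        if not hard:
            break
        active.extend(hard)
        model = train_linear(pos, active)
    return model, active
```

In this sketch, pos would hold the feature vectors of the patches forming one visual word and neg_pool those of all other patches; the loop enlarges the negative training set only with samples the current classifier gets wrong, which is the general idea of hard example mining rather than the authors' exact procedure.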

