Point Normal Estimation from a Single Image Based on a Multi-Scale Convolutional Network

English title: Normal Estimation from Single Monocular Images based on Multi-Scale Convolution Network
Authors: 冼楚华; 刘欣; 李桂清; 金烁
English authors: XIAN Chuhua; LIU Xin; LI Guiqing; JIN Shuo. Affiliations: School of Computer Science and Engineering, South China University of Technology; Tricorn (Beijing) Technology Co., Ltd.
Keywords: normal vector prediction; single image; convolutional network
English keywords: normal estimation; monocular image; convolutional neural network
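Journal abbreviation (Chinese): HNLG
Journal name (English): Journal of South China University of Technology (Natural Science Edition)
Institutions: School of Computer Science and Engineering, South China University of Technology; Tricorn (Beijing) Technology Co., Ltd.
Publication date: 2018-12-15
Publisher: Journal of South China University of Technology (Natural Science Edition)
Year: 2018
Issue: v.46; No.387
Funding: National Natural Science Foundation of China (61572202); Natural Science Foundation of Guangdong Province (2015A030313220, 2017A030313347); Open Project of the State Key Laboratory of CAD&CG, Zhejiang University (A1715)
Language: Chinese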
File name: HNLG201812002
Page count: 9
Issue number: 12
CN: 44-1251/T
Pages: 7-15

Abstract

Estimating surface normals from a single image is an important problem in computer graphics and computer vision. Predicting the normals that correspond to a single image, without any other three-dimensional information, matters for 3D scene reconstruction, 3D model recognition, and 3D semantic segmentation. To address this problem, this paper uses a multi-scale convolutional network that predicts the output end to end from the image. The network has two levels. The first level uses DenseNet, a classification network chosen for its strong performance on ImageNet, to process the input globally. The second level uses a fully convolutional network to refine the output of the first level into a finer prediction. Experiments show that the proposed network achieves fairly good results on point normal prediction from a single image, even without any additional pre-processing or post-processing steps.
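
The two-level, coarse-to-fine data flow described in the abstract can be sketched as follows. This is only an illustration under loud assumptions: `global_stage` and `local_stage` are hypothetical hand-written placeholders standing in for the DenseNet level and the fully convolutional level, the intensity-to-normal mappings are invented for the demo, and nothing here is learned or taken from the paper. The sketch shows only the wiring: a coarse global prediction, upsampling to full resolution, a local refinement, and normalization of every output to unit length.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length (zero vectors fall back to +z)."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v] if n > 0 else [0.0, 0.0, 1.0]

def global_stage(image, factor=2):
    """Placeholder for the first level: one coarse normal per block."""
    h, w = len(image), len(image[0])
    coarse = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [image[y][x]
                     for y in range(i, min(i + factor, h))
                     for x in range(j, min(j + factor, w))]
            mean = sum(block) / len(block)
            # Hypothetical mapping from mean intensity to a tilted normal.
            row.append(normalize([mean - 0.5, 0.0, 1.0]))
        coarse.append(row)
    return coarse

def upsample(coarse, h, w, factor=2):
    """Nearest-neighbour upsampling of the coarse normal map to h x w."""
    return [[coarse[y // factor][x // factor] for x in range(w)]
            for y in range(h)]

def local_stage(image, coarse_up):
    """Placeholder for the second level: adjust each coarse normal
    using the local horizontal intensity gradient."""
    h, w = len(image), len(image[0])
    refined = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            n = coarse_up[y][x]
            row.append(normalize([n[0] - 0.1 * gx, n[1], n[2]]))
        refined.append(row)
    return refined

def predict_normals(image, factor=2):
    """Coarse global prediction, then full-resolution refinement."""
    h, w = len(image), len(image[0])
    coarse = global_stage(image, factor)
    return local_stage(image, upsample(coarse, h, w, factor))

image = [[0.0, 0.2, 0.4, 0.6],
         [0.1, 0.3, 0.5, 0.7],
         [0.2, 0.4, 0.6, 0.8],
         [0.3, 0.5, 0.7, 0.9]]
normals = predict_normals(image)
```

Here `normals` holds one unit-length 3-vector per input pixel. In a real system both placeholder stages would be trained networks, and the refinement level would see learned features rather than a hand-picked gradient.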

References

[1] HOIEM D, EFROS A A, HEBERT M. Automatic photo pop-up[J]. ACM Transactions on Graphics (TOG), 2005, 24(3): 577-584.
[2] FOUHEY D F, GUPTA A, HEBERT M. Data-driven 3D primitives for single image understanding[C]∥Proceedings of the IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2013: 3392-3399.
[3] FOUHEY D F, GUPTA A, HEBERT M. Unfolding an indoor origami world[C]∥European Conference on Computer Vision. Cham: Springer, 2014: 687-702.
[4] FOUHEY D F, HUSSAIN W, GUPTA A, et al. Single image 3D without a single 3D image[C]∥Proceedings of the IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 1053-1061.
[5] EIGEN D, FERGUS R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture[C]∥Proceedings of the IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 2650-2658.
[6] BANSAL A, RUSSELL B, GUPTA A. Marr revisited: 2d-3d alignment via surface normal prediction[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 5965-5974.
[7] WANG X, FOUHEY D, GUPTA A. Designing deep networks for surface normal estimation[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015: 539-547.
[8] EIGEN D, PUHRSCH C, FERGUS R. Depth map prediction from a single image using a multi-scale deep network[C]∥Advances in Neural Information Processing Systems. New York, NY: Curran Associates, 2014: 2366-2374.
[9] ZEISL B, POLLEFEYS M. Discriminatively trained dense surface normal estimation[C]∥European Conference on Computer Vision. Cham: Springer, 2014: 468-484.
[10] HOIEM D, EFROS A A, HEBERT M. Recovering surface layout from an image[J]. International Journal of Computer Vision, 2007, 75(1): 151-172.
[11] SCHWING A G, FIDLER S, POLLEFEYS M, et al. Box in the box: Joint 3d layout and object reasoning from single images[C]∥Proceedings of the IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2013: 353-360.
[12] SRAJER F, SCHWING A G, POLLEFEYS M, et al. Match box: Indoor image matching via box-like scene estimation[C]∥3D Vision (3DV), 2014 2nd International Conference on IEEE. Piscataway, NJ: IEEE, 2014: 705-712.
[13] WANG P, SHEN X, RUSSELL B, et al. SURGE: surface regularized geometry estimation from a single image[C]∥Advances in Neural Information Processing Systems. New York, NY: Curran Associates, 2016: 172-180.
[14] RUSSAKOVSKY O, DENG J, SU H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision. Hingham, Mass: Kluwer Academic Publishers, 2015, 115(3): 211-252.
[15] LIU B, GOULD S, KOLLER D. Single image depth estimation from predicted semantic labels[C]∥Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. Piscataway, NJ: IEEE, 2010: 1253-1260.
[16] XU D, RICCI E, OUYANG W, et al. Multi-scale continuous crfs as sequential deep networks for monocular depth estimation[C]∥Proceedings of CVPR. Piscataway, NJ: IEEE, 2017.
[17] ROY A, TODOROVIC S. Monocular depth estimation using neural regression forest[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 5506-5514.
[18] SILBERMAN N, HOIEM D, KOHLI P, et al. Indoor segmentation and support inference from rgbd images[C]∥European Conference on Computer Vision. Berlin, Heidelberg: Springer, 2012: 746-760.
[19] SONG S, LICHTENBERG S P, XIAO J. Sun rgb-d: A rgb-d scene understanding benchmark suite[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015: 567-576.
[20] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. Imagenet classification with deep convolutional neural networks[C]∥Advances in Neural Information Processing Systems. New York, NY: Curran Associates, 2012: 1097-1105.
[21] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. Computer Science, 2014(9): arXiv:1409.1556.
[22] HUANG G, LIU Z, WEINBERGER K Q, et al. Densely connected convolutional networks[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017, 1(2): 3.
