Abstract
Estimating surface normals from a single image is an important problem in computer graphics and computer vision. When no other 3D information is available, predicting the corresponding normals from a single image is valuable for 3D scene reconstruction, 3D model recognition, 3D semantic segmentation, and related tasks. To address this problem, this paper uses a multi-scale convolutional network that maps the input image to its normal map end to end. The network consists of two levels. The first level uses DenseNet, a classification network with top performance on ImageNet, to process the input globally. The second level uses a fully convolutional network to refine the output of the first level into a finer prediction. Experimental results show that, even without any additional pre-processing or post-processing steps, the proposed network achieves satisfactory results for per-pixel surface normal prediction from a single image.
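The abstract states only that the second level refines the output of the first. The following is a minimal pure-Python sketch of that refinement data flow under several assumptions that are not from the paper: the DenseNet first level is not implemented, and its coarse normal map is simply passed in; the coarse map is upsampled by nearest-neighbour interpolation and concatenated with the RGB image, as in common coarse-to-fine designs; a single 3x3 convolution stands in for the whole fully convolutional network; and each output vector is L2-normalized so that it is a valid unit normal. All function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]       # channel vector at one pixel
Grid = List[List[Vec]]  # H x W grid of channel vectors


def upsample_nearest(coarse: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling of an H x W grid by an integer factor."""
    return [
        [coarse[i // factor][j // factor][:] for j in range(len(coarse[0]) * factor)]
        for i in range(len(coarse) * factor)
    ]


def concat_channels(a: Grid, b: Grid) -> Grid:
    """Stack two grids of equal spatial size along the channel axis."""
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]


def conv3x3(x: Grid, weights: List[List[List[List[float]]]], bias: Vec) -> Grid:
    """Zero-padded 3x3 convolution; weights are indexed [out][in][dy][dx]."""
    h, w, c_in = len(x), len(x[0]), len(x[0][0])
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            vec = []
            for o, w_o in enumerate(weights):
                s = bias[o]
                for dy in range(3):
                    for dx in range(3):
                        y, xx = i + dy - 1, j + dx - 1
                        if 0 <= y < h and 0 <= xx < w:  # zero padding outside
                            for c in range(c_in):
                                s += w_o[c][dy][dx] * x[y][xx][c]
                vec.append(s)
            row.append(vec)
        out.append(row)
    return out


def normalize(x: Grid, eps: float = 1e-8) -> Grid:
    """Rescale each per-pixel vector to unit length; zero vectors stay zero."""
    return [
        [[c / max(math.sqrt(sum(v * v for v in vec)), eps) for c in vec] for vec in row]
        for row in x
    ]


def refine(image: Grid, coarse_normals: Grid, factor: int, weights, bias: Vec) -> Grid:
    """Hypothetical level 2: upsample level-1 output, fuse with image, convolve, normalize."""
    up = upsample_nearest(coarse_normals, factor)
    fused = concat_channels(image, up)  # 3 RGB channels + 3 normal channels
    return normalize(conv3x3(fused, weights, bias))


# Toy usage: a 4x4 grey image and a 2x2 coarse normal map from a stand-in level 1.
image = [[[0.5, 0.5, 0.5] for _ in range(4)] for _ in range(4)]
coarse = [[[0.0, 0.0, 1.0] for _ in range(2)] for _ in range(2)]
# Weights that copy the upsampled normal channels (inputs 3..5) through the centre tap.
weights = [
    [[[1.0 if (c == 3 + o and dy == 1 and dx == 1) else 0.0 for dx in range(3)]
      for dy in range(3)] for c in range(6)]
    for o in range(3)
]
normals = refine(image, coarse, 2, weights, [0.0, 0.0, 0.0])
print(normals[0][0])  # [0.0, 0.0, 1.0]
```

A practical implementation would use a deep-learning framework, stack several convolution layers with nonlinearities in the second level, and train both levels jointly end to end. The explicit loops here only make the per-pixel data flow visible.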