A Coarse-to-Fine Estimation Method for Spatial Layout of Indoor Scenes
Authors: LIU Tianliang; GU Yanqiu; CAO Dandan; DAI Xiubin; LUO Jiebo
Keywords: indoor scene; layout estimation; convolutional neural network; scene layout category
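Journal: Robot (机器人)
Journal code: JQRR
Affiliations: Jiangsu Provincial Key Lab of Image Processing and Image Communication, Nanjing University of Posts and Telecommunications; Department of Computer Science, University of Rochester
Publication date: 2018-12-12 11:43
Publisher: Robot (机器人)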
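Year: 2019
Volume: 41
Issue: 01
Funding: National Natural Science Foundation of China (61001152, 31200747, 61071091); Natural Science Foundation of Jiangsu Province (BK2012437); China Scholarship Council
Language: Chinese
Pages: 60-66
Page count: 7
File name: JQRR201901007
CN: 21-1137/TP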

Abstract

To label the layout of indoor scenes effectively, a coarse-to-fine method for estimating spatial layout is proposed. First, long straight line segments in the scene are detected using an adaptive threshold based on local discontinuities, and the segments are divided into vertical and horizontal groups according to their directions. The vertical and horizontal vanishing points are then estimated with a voting mechanism and an orthogonality criterion, and pairs of rays cast from these two vanishing points at equal angular intervals generate the candidate layouts of the scene. Second, a VGG-16 fully convolutional network estimates the geometric context and informative edges of the scene, and a softmax classifier applied to its fc7-layer features yields the layout category; the informative edges and the layout category are fused into a global feature that is used to coarsely select among the candidate layouts. Third, a VGG-based spatial multi-scale convolutional neural network estimates the surface normal map and depth map of the scene, from which normal and depth features are extracted. Fourth, the 3D box layout model is parameterized by the angles between the rays cast from the vanishing points; geometric integral images aggregate the region-level features of each candidate layout, namely line membership, geometric context, surface normals and depth; and the parameters of the structured model are learned with the cutting-plane method. Finally, the candidate layouts are ranked by their structured prediction scores, and the highest-scoring candidate is selected as the final spatial layout. Experiments on the Hedau and LSUN datasets show that the method recovers the number of layout faces accurately and localizes the layout boundaries precisely.
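The candidate-generation and final-selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`equal_angles`, `candidate_layouts`, `ray_intersection`, `select_layout`), the angular ranges, and the feature function are hypothetical. The linear score stands in for the structured model, whose real features (line membership, geometric context, normals, depth) and learned weights are not reproduced here.

```python
import math
from itertools import combinations


def equal_angles(lo, hi, n):
    """n angles (radians) spaced at equal intervals over [lo, hi]."""
    if n < 2:
        raise ValueError("need at least two rays")
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def candidate_layouts(v_angles, h_angles):
    """Pair up rays: each candidate takes two rays from the vertical
    vanishing point and two from the horizontal one (four angles)."""
    return [
        (a1, a2, b1, b2)
        for a1, a2 in combinations(v_angles, 2)
        for b1, b2 in combinations(h_angles, 2)
    ]


def ray_intersection(p, a, q, b):
    """Point where the ray from p at angle a meets the ray from q at angle b,
    or None if they are parallel or meet behind either origin."""
    ux, uy = math.cos(a), math.sin(a)
    vx, vy = math.cos(b), math.sin(b)
    denom = ux * vy - uy * vx          # 2D cross product u x v
    if abs(denom) < 1e-12:
        return None
    rx, ry = q[0] - p[0], q[1] - p[1]
    t = (rx * vy - ry * vx) / denom    # distance along the first ray
    s = (rx * uy - ry * ux) / denom    # distance along the second ray
    if t < 0 or s < 0:
        return None
    return (p[0] + t * ux, p[1] + t * uy)


def select_layout(candidates, weights, feature_fn):
    """Return the candidate with the highest linear score w . phi(y).
    feature_fn is a hypothetical stand-in for the region-level features."""
    def score(y):
        return sum(w * f for w, f in zip(weights, feature_fn(y)))
    return max(candidates, key=score)
```

With n rays per vanishing point, this pairing yields C(n, 2)^2 candidates, for example 36 when n = 4. Intersecting a ray from one vanishing point with a ray from the other gives a corner of the box layout's middle wall.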

