Integrating Geometrical Context for Semantic Labeling of Indoor Scenes using RGBD Images
  • Authors: Salman H. Khan ; Mohammed Bennamoun…
  • Keywords: Scene parsing ; Graphical models ; Geometric reasoning ; Structured learning
  • Journal: International Journal of Computer Vision
  • Publication year: 2016
  • Publication date: March 2016
  • Volume: 117
  • Issue: 1
  • Pages: 1-20
  • Full-text size: 2,509 KB
  • Author affiliations: Salman H. Khan (1)
    Mohammed Bennamoun (1)
    Ferdous Sohel (1)
    Roberto Togneri (2)
    Imran Naseem (3)

    1. School of CSSE, The University of Western Australia, 35 Stirling Highway, Crawley, WA, 6009, Australia
    2. School of EECE, The University of Western Australia, 35 Stirling Highway, Crawley, WA, 6009, Australia
    3. Department of Engineering, Karachi Institute of Economics and Technology, Karachi, 75190, Pakistan
  • Journal category: Computer Science
  • Journal subjects: Computer Imaging, Vision, Pattern Recognition and Graphics
    Artificial Intelligence and Robotics
    Image Processing and Computer Vision
    Pattern Recognition
  • Publisher: Springer Netherlands
  • ISSN: 1573-1405
Abstract
Inexpensive structured light sensors can capture rich information from indoor scenes, and scene labeling problems provide a compelling opportunity to make use of this information. In this paper we present a novel conditional random field (CRF) model to effectively utilize depth information for semantic labeling of indoor scenes. At the core of the model, we propose a novel and efficient plane detection algorithm which is robust to erroneous depth maps. Our CRF formulation defines local, pairwise and higher order interactions between image pixels. At the local level, we propose a novel scheme to combine energies derived from appearance, depth and geometry-based cues. The proposed local energy also encodes the location of each object class by considering the approximate geometry of a scene. For the pairwise interactions, we learn a boundary measure which defines the spatial discontinuity of object classes across an image. To model higher-order interactions, the proposed energy treats smooth surfaces as cliques and encourages all the pixels on a surface to take the same label. We show that the proposed higher-order energies can be decomposed into pairwise sub-modular energies and efficient inference can be made using the graph-cuts algorithm. We follow a systematic approach which uses structured learning to fine-tune the model parameters. We rigorously test our approach on SUN3D and both versions of the NYU-Depth database. Experimental results show that our work achieves superior performance to state-of-the-art scene labeling techniques.
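
The abstract describes an energy with local, pairwise and higher-order terms but does not give the actual potentials. As a rough illustration of that three-part structure only, the sketch below scores a labeling under a toy energy. Everything in it is an assumption made for illustration and not the authors' formulation: the function name `crf_energy`, the weighted Potts pairwise term, and the per-pixel disagreement penalty for surface cliques. It evaluates an energy and performs no graph-cuts inference.

```python
import numpy as np

def crf_energy(labels, unary, edges, edge_weights, cliques, clique_penalty):
    """Energy of a labeling under a toy CRF (hypothetical, illustrative only).

    labels:         (N,) integer label per pixel
    unary:          (N, K) cost of giving each of K labels to each pixel
    edges:          list of (i, j) neighbouring pixel pairs
    edge_weights:   one weight per edge (small where a boundary is likely)
    cliques:        list of pixel-index lists, e.g. one per smooth surface
    clique_penalty: cost per pixel that disagrees with its clique's majority
    """
    labels = np.asarray(labels)
    unary = np.asarray(unary, dtype=float)

    # Local term: cost of the chosen label at every pixel.
    local = unary[np.arange(len(labels)), labels].sum()

    # Pairwise term: weighted Potts, paid only when neighbours disagree.
    pairwise = sum(w for (i, j), w in zip(edges, edge_weights)
                   if labels[i] != labels[j])

    # Higher-order term: pixels on one surface are encouraged to agree;
    # each pixel that deviates from the majority label is charged a penalty.
    higher = 0.0
    for clique in cliques:
        counts = np.bincount(labels[clique])
        higher += clique_penalty * (len(clique) - counts.max())

    return float(local + pairwise + higher)


# Four pixels in a row, two labels; pixels 0-2 are assumed to lie on one surface.
unary = [[0, 1], [0, 1], [1, 0], [1, 0]]
edges = [(0, 1), (1, 2), (2, 3)]
edge_weights = [1.0, 0.2, 1.0]
cliques = [[0, 1, 2]]

e_split = crf_energy([0, 0, 1, 1], unary, edges, edge_weights, cliques, 0.5)
e_surface = crf_energy([0, 0, 0, 1], unary, edges, edge_weights, cliques, 0.5)
```

With these made-up numbers the labeling that follows the unary costs has the lower energy; raising `clique_penalty` shifts the balance toward labelings that keep a surface uniform.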
