The Shape Boltzmann Machine: A Strong Model of Object Shape


Authors: S. M. Ali Eslami (1)
Nicolas Heess (2)
Christopher K. I. Williams (1)
John Winn (3)
Keywords: Shape; Generative; Deep Boltzmann machine; Sampling
Journal: International Journal of Computer Vision
Publication year: 2014
Publication date: April 2014
Volume: 107
Issue: 2
Pages: 155-176
Full text size: 3,611 KB
References:
1. Ackley, D., Hinton, G., & Sejnowski, T. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147–169.
2. Alexe, B., Deselaers, T., & Ferrari, V. (2010a). ClassCut for unsupervised class segmentation. In European Conference on Computer Vision (pp. 380–393).
3. Alexe, B., Deselaers, T., & Ferrari, V. (2010b). What is an object? In IEEE Conference on Computer Vision and Pattern Recognition (pp. 73–80).
4. Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., & Davis, J. (2005). SCAPE: Shape completion and animation of people. ACM Transactions on Graphics (SIGGRAPH), 24(3), 408–416.
5. Bertozzi, A., Esedoglu, S., & Gillette, A. (2007). Inpainting of binary images using the Cahn–Hilliard equation. IEEE Transactions on Image Processing, 16(1), 285–291.
6. Bo, Y., & Fowlkes, C. (2011). Shape-based pedestrian parsing. In IEEE Conference on Computer Vision and Pattern Recognition 2011.
7. Borenstein, E., Sharon, E., & Ullman, S. (2004). Combining top-down and bottom-up segmentation. In CVPR Workshop on Perceptual Organization in Computer Vision.
8. Boykov, Y., & Jolly, M. P. (2001). Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In International Conference on Computer Vision 2001 (pp. 105–112).
9. Bridle, J. S. (1990). Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters. In Advances in Neural Information Processing Systems (Vol. 2, pp. 211–217).
10. Cemgil, T., Zajdel, W., & Krose, B. (2005). A hybrid graphical model for robust feature extraction from video. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1158–1165).
11. Chan, T. F., & Shen, J. (2001). Nontexture inpainting by curvature-driven diffusions. Journal of Visual Communication and Image Representation, 12(4), 436–449.
12. Chen, F., Yu, H., Hu, R., & Zeng, X. (2013). Deep learning shape priors for object segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1870–1877).
13. Cootes, T., Taylor, C., Cooper, D. H., & Graham, J. (1995). Active shape models - Their training and application. Computer Vision and Image Understanding, 61, 38–59.
14. Desjardins, G., & Bengio, Y. (2008). Empirical evaluation of convolutional RBMs for vision. Tech. Rep. 1327, Département d'Informatique et de Recherche Opérationnelle, Université de Montréal.
15. Eslami, S. M. A., & Williams, C. K. I. (2011). Factored shapes and appearances for parts-based object understanding. In British Machine Vision Conference 2011 (pp. 18.1–18.12).
16. Eslami, S. M. A., & Williams, C. K. I. (2012). A generative model for parts-based object segmentation. In P. Bartlett, F. Pereira, C. Burges, L. Bottou, & K. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 25, pp. 100–107). Red Hook, NY: Curran Associates, Inc.
17. Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In IEEE Conference on Computer Vision and Pattern Recognition 2004, Workshop on Generative-Model Based Vision.
18. Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2009). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99, 1–19.
19. Freund, Y., & Haussler, D. (1994). Unsupervised learning of distributions on binary vectors using two layer networks. Tech. Rep. UCSC-CRL-94-25. Santa Cruz: University of California.
20. Frey, B., Jojic, N., & Kannan, A. (2003). Learning appearance and transparency manifolds of occluded objects in layer. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 45–52).
21. Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
22. Gavrila, D. M. (2007). A Bayesian, exemplar-based approach to hierarchical shape matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 1408–1421.
23. Harzallah, H., Jurie, F., & Schmid, C. (2009). Combining efficient object localization and image classification. In International Conference on Computer Vision.
24. Heess, N., Roux, N. L., & Winn, J. M. (2011). Weakly supervised learning of foreground-background segmentation using masked RBMs. In International Conference on Artificial Neural Networks (Vol. 2, pp. 9–16).
25. Hinton, G. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771–1800.
26. Jojic, N., & Caspi, Y. (2004). Capturing image structure with probabilistic index maps. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 212–219).
27. Jojic, N., Perina, A., Cristani, M., Murino, V., & Frey, B. (2009). Stel component analysis: Modeling spatial correlations in image class structure. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2044–2051).
28. Kapoor, A., & Winn, J. (2006). Located hidden random fields: Learning discriminative parts for object detection. In European Conference on Computer Vision (pp. 302–315).
29. Kohli, P., Kumar, M. P., & Torr, P. H. S. (2007). P3 & beyond: Solving energies with higher order cliques. In IEEE Conference on Computer Vision and Pattern Recognition.
30. Kohli, P., Ladicky, L., & Torr, P. H. S. (2009). Robust higher order potentials for enforcing label consistency. International Journal of Computer Vision, 82(3), 302–324.
31. Komodakis, N., & Paragios, N. (2009). Beyond pairwise energies: Efficient optimization for higher-order MRFs. In IEEE Conference on Computer Vision and Pattern Recognition 2007 (pp. 2985–2992).
32. Kumar, P., Torr, P., & Zisserman, A. (2005). OBJ CUT. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 18–25).
33. Lampert, C. H., Blaschko, M., & Hofmann, T. (2008). Beyond sliding windows: Object localization by efficient subwindow search. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–8).
34. Le Roux, N., & Bengio, Y. (2008). Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation, 20(6), 1631–1649.
35. Le Roux, N., Heess, N., Shotton, J., & Winn, J. (2011). Learning a generative model of images by factoring appearance and shape. Neural Computation, 23(3), 593–650.
36. Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In International Conference on Machine Learning (pp. 609–616).
37. Morris, R. D., Descombes, X., & Zerubia, J. (1996). The Ising/Potts model is not well suited to segmentation tasks. In Proceedings of the IEEE Digital Signal Processing Workshop.
38. Murray, I., & Salakhutdinov, R. (2009). Evaluating probabilities under high-dimensional latent variable models. In Advances in Neural Information Processing Systems (Vol. 21).
39. Neal, R. M. (1992). Connectionist learning of belief networks. Artificial Intelligence, 56, 71–113.
40. Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing, 11(2), 125–139.
41. Norouzi, M., Ranjbar, M., & Mori, G. (2009). Stacks of convolutional restricted Boltzmann machines for shift-invariant feature learning. In CVPR (pp. 2735–2742).
42. Nowozin, S., & Lampert, C. H. (2009). Global connectivity potentials for random field models. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 818–825).
43. Raina, R., Madhavan, A., & Ng, A. Y. (2009). Large-scale deep unsupervised learning using graphics processors. In International Conference on Machine Learning (pp. 873–880).
44. Ranzato, M., Mnih, V., & Hinton, G. E. (2010). How to generate realistic images using gated MRFs. In J. Lafferty, C. K. I. Williams, R. Zemel, J. Shawe-Taylor, & A. Culotta (Eds.), Advances in Neural Information Processing Systems (Vol. 23). Cambridge: MIT Press.
45. Ranzato, M., Susskind, J., Mnih, V., & Hinton, G. E. (2011). On deep generative models with applications to recognition. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2857–2864).
46. Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400–407.
47. Roth, S., & Black, M. J. (2005). Fields of experts: A framework for learning image priors. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 860–867).
48. Rother, C., Kolmogorov, V., & Blake, A. (2004). "GrabCut": Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics (SIGGRAPH), 23, 309–314.
49. Rother, C., Kohli, P., Feng, W., & Jia, J. (2009). Minimizing sparse higher order energy functions of discrete variables. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1382–1389).
50. Rowley, H., Baluja, S., & Kanade, T. (1998). Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1), 23–38.
51. Russell, B., Torralba, A., Murphy, K., & Freeman, W. (2008). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157–173.
52. Salakhutdinov, R., & Hinton, G. (2009). Deep Boltzmann machines. In International Conference on Artificial Intelligence and Statistics 2009 (Vol. 5, pp. 448–455).
53. Salakhutdinov, R., & Murray, I. (2008). On the quantitative analysis of deep belief networks. In International Conference on Machine Learning 2008.
54. Schneiderman, H. (2000). A statistical approach to 3D object detection applied to faces and cars. PhD Thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.
55. Shekhovtsov, A., Kohli, P., & Rother, C. (2012). Curvature prior for MRF-based segmentation and shape inpainting. In DAGM/OAGM Symposium (pp. 41–51).
56. Sigal, L., Balan, A., & Black, M. (2010). HumanEva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. International Journal of Computer Vision, 87(1–2), 4–27.
57. Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., & Gool, L. V. (2009). Using multi-view recognition and meta-data annotation to guide a robot's attention. International Journal of Robotics Research, 28(8), 976–998.
58. Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. In International Conference on Machine Learning 2008 (pp. 1064–1071).
59. Tjelmeland, H., & Besag, J. (1998). Markov random fields with higher-order interactions. Scandinavian Journal of Statistics, 25(3), 415–433.
60. Williams, C. K. I., & Titsias, M. (2004). Greedy learning of multiple objects in images using robust statistics and factorial learning. Neural Computation, 16(5), 1039–1062.
61. Winn, J., & Jojic, N. (2005). LOCUS: Learning object classes with unsupervised segmentation. In International Conference on Computer Vision (pp. 756–763).
62. Younes, L. (1999). On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. In Stochastics and Stochastics Reports (Vol. 65, pp. 177–228).
63. Younes, L., & Sud, P. (1989). Parametric inference for imperfectly observed Gibbsian fields. Probability Theory and Related Fields, 82, 625–645.
Author affiliations:
1. School of Informatics, University of Edinburgh, Edinburgh, UK
2. Gatsby Computational Neuroscience Unit, University College London, London, UK
3. Microsoft Research, Cambridge, UK
ISSN: 1573-1405

Abstract

A good model of object shape is essential in applications such as segmentation, detection, inpainting and graphics. For example, when performing segmentation, local constraints on the shapes can help where object boundaries are noisy or unclear, and global constraints can resolve ambiguities where background clutter looks similar to parts of the objects. In general, the stronger the model of shape, the more performance is improved. In this paper, we use a type of deep Boltzmann machine (Salakhutdinov and Hinton, International Conference on Artificial Intelligence and Statistics, 2009) that we call a Shape Boltzmann Machine (SBM) for the task of modeling foreground/background (binary) and parts-based (categorical) shape images. We show that the SBM characterizes a strong model of shape, in that samples from the model look realistic and it can generalize to generate samples that differ from training examples. We find that the SBM learns distributions that are qualitatively and quantitatively better than existing models for this task.
