A Computational Learning Theory of Active Object Recognition Under Uncertainty
  • Authors: Alexander Andreopoulos (1)
    John K. Tsotsos (2)
  • Keywords: Object recognition; Visual search; Active vision; Attention; Computational complexity of vision
  • Journal: International Journal of Computer Vision
  • Year: 2013
  • Publication date: January 2013
  • Volume: 101
  • Issue: 1
  • Pages: 95-142
  • Full-text size: 2044 KB
  • Affiliations: Alexander Andreopoulos (1)
    John K. Tsotsos (2)

    1. IBM Research-Almaden, 650 Harry Road, San Jose, CA, 95120-6099, USA
    2. Dept. of Computer Science and Engineering, Centre for Vision Research, York University, Toronto, Ontario, Canada
  • ISSN: 1573-1405
Abstract
We present some theoretical results related to the problem of actively searching a 3D scene to determine the positions of one or more pre-specified objects. We investigate the effects that input noise, occlusion, and the VC-dimensions of the related representation classes have in terms of localizing all objects present in the search region, under finite computational resources and a search cost constraint. We present a number of bounds relating the noise-rate of low level feature detection to the VC-dimension of an object representable by an architecture satisfying the given computational constraints. We prove that under certain conditions, the corresponding classes of object localization and recognition problems are efficiently learnable in the presence of noise and under a purposive learning strategy, as there exists a polynomial upper bound on the minimum number of examples necessary to correctly localize the targets under the given models of uncertainty. We also use these arguments to show that passive approaches to the same problem do not necessarily guarantee that the problem is efficiently learnable. Under this formulation, we prove the existence of a number of emergent relations between the object detection noise-rate, the scene representation length, the object class complexity, and the representation class complexity, which demonstrate that selective attention is not only necessary due to computational complexity constraints, but it is also necessary as a noise-suppression mechanism and as a mechanism for efficient object class learning. These results concretely demonstrate the advantages of active, purposive and attentive approaches for solving complex vision problems.
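To give a feel for what a "polynomial upper bound on the number of examples" under noise looks like, the following is a minimal sketch of a classical textbook bound, not one of the bounds proved in this paper. It assumes the Angluin and Laird (1988) setting: a finite hypothesis class of size N, random classification noise with rate at most eta_b < 1/2, and a learner that returns a hypothesis minimizing disagreements with the sample. The function name and parameter names are illustrative only.

```python
import math


def noisy_pac_sample_bound(num_hypotheses, epsilon, delta, noise_bound):
    """Sample size sufficient for PAC learning a finite class under
    random classification noise (Angluin & Laird, 1988):

        m >= 2 / (epsilon^2 * (1 - 2*eta_b)^2) * ln(2N / delta)

    This is an illustrative stand-in, NOT a bound from the paper above.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    if not (0.0 <= noise_bound < 0.5):
        raise ValueError("noise_bound must lie in [0, 0.5)")

    # The (1 - 2*eta_b) factor shrinks toward 0 as noise approaches 1/2,
    # so the required sample size grows without limit.
    margin = 1.0 - 2.0 * noise_bound
    m = 2.0 / (epsilon ** 2 * margin ** 2) * math.log(2.0 * num_hypotheses / delta)
    return math.ceil(m)


if __name__ == "__main__":
    # Compare the sufficient sample size across a few noise rates.
    for eta in (0.0, 0.1, 0.25, 0.4):
        print(eta, noisy_pac_sample_bound(1000, 0.1, 0.05, eta))
```

The bound is polynomial in 1/epsilon, ln(1/delta), ln N, and 1/(1 - 2*eta_b), which illustrates the kind of dependence between noise rate and class complexity that the abstract refers to. The paper's own bounds are stated in terms of VC-dimension and its specific models of uncertainty.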
