Activity representation with motion hierarchies
  • Authors: Adrien Gaidon (1)
    Zaid Harchaoui (2)
    Cordelia Schmid (2)
  • Keywords: Action recognition; Video analysis; Motion decomposition; Spectral clustering; Kernel methods
  • Journal: International Journal of Computer Vision
  • Publication year: 2014
  • Publication date: May 2014
  • Volume: 107
  • Issue: 3
  • Pages: 219-238
  • References: 1. Bilen, H., Namboodiri, V.P., & Van Gool, L.J. (2011). Object and action classification with latent variables. In BMVC, Bristol.
    2. Bradski, G., & Kaehler, A. (2008). Learning OpenCV: Computer vision with the OpenCV library. Sebastopol: O’Reilly Media.
    3. Brendel, W., & Todorovic, S. (2011). Learning spatiotemporal graphs of human activities. In ICCV, Boston.
    4. Brox, T., & Malik, J. (2010). Object segmentation by long term analysis of point trajectories. In ECCV, Carlton.
    5. Dalal, N., Triggs, B., & Schmid, C. (2006). Human detection using oriented histograms of flow and appearance. In ECCV, Berlin.
    6. De Castro, E., & Morandi, C. (1987). Registration of translated and rotated images using finite Fourier transforms. In PAMI, Portage la Prairie.
    7. Diestel, R. (2005). Graph theory. Heidelberg: Springer.
    8. Duda, R., Hart, P., & Stork, D. (2001). Pattern classification. New York: Wiley.
    9. Dupé, F., & Brun, L. (2008). Hierarchical bag of paths for kernel based shape classification. In Structural, Syntactic, and Statistical Pattern Recognition, Windsor.
    10. Farnebäck, G. (2003). Two-frame motion estimation based on polynomial expansion. In Image Analysis, Lowa.
    11. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. In PAMI, Portage la Prairie.
    12. Foster, L., Waagen, A., Aijaz, N., Hurley, M., Luis, A., Rinsky, J., Satyavolu, C., Way, M.J., Gazis, P., & Srivastava, A. (2009). Stable and efficient gaussian process calculations. In JMLR, Las Vegas.
    13. Fowlkes, C., Belongie, S., Chung, F., & Malik, J. (2004). Spectral grouping using the Nyström method. In PAMI, London.
    14. Fradet, M., Robert, P., & Pérez, P. (2009). Clustering point trajectories with various life-spans. In CVMP, London.
    15. Gaidon, A., Harchaoui, Z., & Schmid, C. (2011). Actom sequence models for efficient action detection. In CVPR, Providence.
    16. Gaidon, A., Harchaoui, Z., & Schmid, C. (2012). Recognizing activities with cluster-trees of tracklets. In BMVC, Bristol.
    17. Gilbert, A., Illingworth, J., & Bowden, R. (2010). Action recognition using mined hierarchical compound features. In PAMI, Portage la Prairie.
    18. Grundmann, M., Meier, F., & Essa, I. (2008). 3D shape context and distance transform for action recognition. In ICPR, Delhi.
    19. Hastie, T., Tibshirani, R., & Friedman, J. (2008). The elements of statistical learning (2nd edition). New York: Springer.
    20. Hongeng, S., & Nevatia, R. (2003). Large-scale event detection using semi-hidden markov models. In ICCV, Boston.
    21. Ikizler-Cinbis, N., & Sclaroff, S. (2010). Object, scene and actions: Combining multiple features for human action recognition. In ECCV, Carlton.
    22. Jiang, Y., Dai, Q., Xue, X., Liu, W., & Ngo, C. (2012a). Trajectory-based modeling of human actions with motion reference points. In ECCV, Carlton.
    23. Jiang, Z., Lin, Z., & Davis, L. (2012b). Recognizing human actions by learning and matching shape-motion prototype trees. In PAMI, Portage la Prairie.
    24. Kliper-Gross, O., Gurovich, Y., Hassner, T., & Wolf, L. (2012). Motion interchange patterns for action recognition in unconstrained videos. In ECCV, Carlton.
    25. Kovashka, A., & Grauman, K. (2010). Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. In CVPR, Providence.
    26. Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., & Serre, T. (2011). HMDB: a large video database for human motion recognition. In ICCV, New York.
    27. Laptev, I. (2005). On space-time interest points. In IJCV, Rosario.
    28. Laptev, I., Marszalek, M., Schmid, C., & Rozenfeld, B. (2008). Learning realistic human actions from movies. In CVPR, Providence.
    29. Laxton, B., Lim, J., & Kriegman, D. (2007). Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video. In CVPR, Providence.
    30. Lezama, J., Alahari, K., Sivic, J., & Laptev, I. (2011). Track to the future: Spatio-temporal video segmentation with long-range motion cues. In CVPR, Providence.
    31. Liu, J., Kuipers, B., & Savarese, S. (2011). Recognizing human actions by attributes. In CVPR, Providence.
    32. Lowe, D.G. (2004). Distinctive image features from scale-invariant keypoints. In IJCV, Ho Chi Minh.
    33. Maji, S., Berg, A., & Malik, J. (2008). Classification using intersection kernel support vector machines is efficient. In CVPR, Providence.
    34. Marszalek, M., Laptev, I., & Schmid, C. (2009). Actions in context. In CVPR, Providence.
    35. Matikainen, P., Hebert, M., & Sukthankar, R. (2010). Representing pairwise spatial and temporal relations for action recognition. In ECCV, Carlton.
    36. Mikolajczyk, K., & Uemura, H. (2008). Action recognition with motion-appearance vocabulary forest. In CVPR, Providence.
    37. Niebles, J.C., & Fei-Fei, L. (2007). Hierarchical model of shape and appearance for human action classification. In CVPR, Providence.
    38. Niebles, J.C., Chen, C., & Fei-Fei, L. (2010). Modeling temporal structure of decomposable motion segments for activity classification. In ECCV, Carlton.
    39. Oliver, N.M., Rosario, B., & Pentland, A.P. (2000). A Bayesian computer vision system for modeling human interactions. In PAMI, Portage la Prairie.
    40. Pablo, A., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. In PAMI, Portage la Prairie.
    41. Patron-Perez, A., Marszalek, M., Zisserman, A., & Reid, I.D. (2010). High five: Recognising human interactions in TV shows. In BMVC, Bristol.
    42. Prest, A., Ferrari, V., & Schmid, C. (2012). Explicit modeling of human-object interactions in realistic videos. In PAMI, Portage la Prairie.
    43. Raptis, M., Kokkinos, I., & Soatto, S. (2012). Discovering discriminative action parts from mid-level video representations. In CVPR, Providence.
    44. Reddy, K. K., Liu, J., & Shah, M. (2009). Incremental action recognition using feature-tree. In CVPR, Providence.
    45. Sadanand, S., & Corso, J. J. (2012). Action bank: A high-level representation of activity in video. In CVPR, New York.
    46. Sapienza, M., Cuzzolin, F., & Torr, P. (2012). Learning discriminative space-time actions from weakly labelled videos. In BMVC, Bristol.
    47. Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels. Mexico: MIT Press.
    48. Schölkopf, B., Smola, A., & Müller, K. R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299-1319.
    49. Sculley, D. (2010). Web-scale k-means clustering. In WWW, New York.
    50. Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge Univ Press.
    51. Shi, J., & Malik, J. (1998). Motion segmentation and tracking using normalized cuts. In ICCV, IEEE, Beijing.
    52. Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. In PAMI, London.
    53. Shi, J., & Tomasi, C. (1994). Good features to track. In CVPR, Providence.
    54. Suard, F., Rakotomamonjy, A., & Bensrhair, A. (2007). Kernel on bag of paths for measuring similarity of shapes. In European Symposium on Artificial Neural Networks, pp 1-.
    55. Szeliski, R. (2010). Computer vision: Algorithms and applications. New York: Springer.
    56. Tang, K., Fei-Fei, L., & Koller, D. (2012). Learning latent temporal structure for complex event detection. In CVPR, Providence.
    57. Todorovic, S. (2012). Human activities as stochastic kronecker graphs. In ECCV, Carlton.
    58. Vig, E., Dorr, M., & Cox, D. (2012). Space-variant descriptor sampling for action recognition based on saliency and eye movements. In ECCV, Carlton.
    59. Wang, H., Kläser, A., Schmid, C., & Cheng-Lin, L. (2013). Dense trajectories and motion boundary descriptors for action recognition. In IJCV, Dublin.
    60. Wang, Y., & Mori, G. (2011). Hidden part models for human action recognition: Probabilistic versus max-margin. In PAMI.
    61. Williams, C., & Seeger, M. (2001). Using the Nyström method to speed up kernel machines. In NIPS, Allahabad.
    62. Yu, G., Yuan, J., & Liu, Z. (2012). Propagative hough voting for human activity recognition. In ECCV, New York.
  • Author affiliations: Adrien Gaidon (1)
    Zaid Harchaoui (2)
    Cordelia Schmid (2)

    1. Xerox Research Center Europe, Meylan, France
    2. LEAR Team, INRIA Grenoble Rhône-Alpes, 655 Avenue de l’Europe, Montbonnot, 38330, France
  • ISSN: 1573-1405
Abstract
Complex activities, e.g. pole vaulting, are composed of a variable number of sub-events connected by complex spatio-temporal relations, whereas simple actions can be represented as sequences of short temporal parts. In this paper, we learn hierarchical representations of activity videos in an unsupervised manner. These hierarchies of mid-level motion components are data-driven decompositions specific to each video. We introduce a spectral divisive clustering algorithm to efficiently extract a hierarchy over a large number of tracklets (i.e. local trajectories). We use this structure to represent a video as an unordered binary tree. We model this tree using nested histograms of local motion features. We provide an efficient positive definite kernel that computes the structural and visual similarity of two hierarchical decompositions by relying on models of their parent–child relations. We present experimental results on four recent challenging benchmarks: the High Five dataset (Patron-Perez et al., High five: recognising human interactions in TV shows, 2010), the Olympics Sports dataset (Niebles et al., Modeling temporal structure of decomposable motion segments for activity classification, 2010), the Hollywood 2 dataset (Marszalek et al., Actions in context, 2009), and the HMDB dataset (Kuehne et al., HMDB: A large video database for human motion recognition, 2011). We show that per-video hierarchies provide additional information for activity recognition. Our approach improves over unstructured activity models, baselines using other motion decomposition algorithms, and the state of the art.
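
The pipeline summarized in the abstract (divisive spectral clustering of tracklets, an unordered binary tree, nested histograms, and a kernel over parent-child relations) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record gives no algorithmic detail, so the Gaussian affinity, the sign-based split of the second Laplacian eigenvector, the stopping rule `min_size`, the dot-product base kernel, and every function name below are assumptions chosen for illustration. Tracklets are assumed to be given as points (for example mean x, y, t) together with one quantized feature word each.

```python
import numpy as np

def spectral_bipartition(points, sigma=1.0):
    """Boolean mask splitting points in two, using the sign of the second
    eigenvector of the symmetric normalized graph Laplacian (assumed rule)."""
    diff = points[:, None, :] - points[None, :, :]
    affinity = np.exp(-(diff ** 2).sum(-1) / (2.0 * sigma ** 2))
    inv_sqrt_deg = 1.0 / np.sqrt(affinity.sum(axis=1))
    laplacian = (np.eye(len(points))
                 - inv_sqrt_deg[:, None] * affinity * inv_sqrt_deg[None, :])
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return vecs[:, 1] * inv_sqrt_deg >= 0.0

def build_tree(points, words, n_words, idx=None, min_size=4):
    """Recursively split tracklets; each node stores the L1-normalized
    histogram of feature words over all tracklets below it (nested)."""
    if idx is None:
        idx = np.arange(len(points))
    hist = np.bincount(words[idx], minlength=n_words).astype(float)
    node = {"size": len(idx), "hist": hist / hist.sum(), "children": []}
    if len(idx) >= 2 * min_size:
        mask = spectral_bipartition(points[idx])
        parts = [idx[mask], idx[~mask]]
        # Keep the split only if both halves are large enough.
        if min(len(p) for p in parts) >= min_size:
            node["children"] = [
                build_tree(points, words, n_words, p, min_size) for p in parts
            ]
    return node

def edges(node):
    """List of (parent histogram, child histogram) pairs in the tree."""
    out = []
    for child in node["children"]:
        out.append((node["hist"], child["hist"]))
        out.extend(edges(child))
    return out

def edge_kernel(tree_a, tree_b):
    """Mean over all edge pairs of <parent_a, parent_b> * <child_a, child_b>.
    Sums and products of dot products stay positive semi-definite."""
    ea, eb = edges(tree_a), edges(tree_b)
    if not ea or not eb:
        return 0.0
    total = sum(float(pa @ pb) * float(ca @ cb)
                for pa, ca in ea for pb, cb in eb)
    return total / (len(ea) * len(eb))
```

Calling `build_tree(points, words, n_words)` on an `(n, d)` float array and an `(n,)` integer array of word indices in `[0, n_words)` returns a nested dictionary, and `edge_kernel` compares two such trees without requiring any ordering of the children, which matches the unordered-tree setting the abstract describes.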
