Action recognition by hidden temporal models
Details
  • Authors: Jianzhai Wu (1)
    Dewen Hu (1)
    Fanglin Chen (1)
  • Keywords: Human action recognition; Temporal pyramid model (TPM); Multi-model representation; Latent SVM
  • Journal: The Visual Computer
  • Year: 2014
  • Published: December 2014
  • Volume: 30
  • Issue: 12
  • Pages: 1395–1404
  • Full-text size: 2,045 KB
  • Affiliations: 1. Department of Automatic Control, College of Mechatronics and Automation, National University of Defense Technology, Changsha 410073, Hunan, China
  • ISSN: 1432-2315
Abstract
We focus on the recognition of human actions in uncontrolled videos that may contain complex temporal structures. This is a difficult problem because of the large intra-class variations in viewpoint, video length, motion pattern, and other factors. To address these difficulties, we propose a novel system that represents each action class by hidden temporal models. In this system, the crucial action event of each category is represented by a video segment that covers a fixed number of frames and can move temporally within the sequence. To capture temporal structure, the video segment is described by a temporal pyramid model. To capture large intra-class variations, multiple models are combined with an OR operation to represent alternative structures. The model index and the start frame of the segment are both treated as hidden variables. We implement a learning procedure based on the latent SVM method. The proposed approach is tested on two difficult benchmarks, the Olympic Sports and HMDB51 data sets, and the experimental results show that our system is comparable to state-of-the-art methods in the literature.
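The abstract implies a concrete inference step: a video is scored by maximizing over two hidden variables, the model index within the OR-combined mixture and the start frame of the fixed-length segment, with the chosen segment described by temporal-pyramid features. The sketch below illustrates only that step; it is not the authors' implementation, and the per-frame features, the mean-pooling inside the pyramid, and all identifiers (`pyramid_features`, `score_video`, `seg_len`, `models`) are assumptions made for illustration.

```python
import numpy as np

def pyramid_features(frame_feats, start, seg_len, levels=2):
    """Temporal-pyramid descriptor for frame_feats[start:start+seg_len].

    Level l splits the segment into 2**l equal sub-segments; each sub-segment
    is mean-pooled (a hypothetical pooling choice) and the pooled vectors are
    concatenated. Assumes seg_len >= 2**levels so no sub-segment is empty.
    """
    seg = frame_feats[start:start + seg_len]
    parts = []
    for level in range(levels + 1):
        for chunk in np.array_split(seg, 2 ** level):
            parts.append(chunk.mean(axis=0))
    return np.concatenate(parts)

def score_video(frame_feats, models, seg_len):
    """Latent inference: maximize over the hidden variables.

    frame_feats: (num_frames, d) array of per-frame features (assumed given).
    models: list of weight vectors w_m, one per alternative structure in the
            OR mixture. Returns the best score and the maximizing (m, t).
    """
    best_score, best_m, best_t = -np.inf, None, None
    for t in range(len(frame_feats) - seg_len + 1):   # hidden start frame
        phi = pyramid_features(frame_feats, t, seg_len)
        for m, w in enumerate(models):                 # hidden model index
            s = float(w @ phi)
            if s > best_score:
                best_score, best_m, best_t = s, m, t
    return best_score, best_m, best_t
```

In latent SVM training, this same maximization would alternate with convex weight updates in a CCCP-style scheme, fixing the hidden-variable assignments of the training examples at each round before re-estimating the weights.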
