Realistic human action recognition by Fast HOG3D and self-organization feature map
  • Authors: Nijun Li ; Xu Cheng ; Suofei Zhang ; Zhenyang Wu
  • Keywords: Spatio-temporal interest points (STIPs) ; HOG3D/Fast HOG3D ; Self-organization feature map (SOM) ; Support vector machine (SVM) ; Bag-of-words (BoW)
  • Journal: Machine Vision and Applications
  • Published: October 2014
  • Year: 2014
  • Volume: 25
  • Issue: 7
  • Pages: 1793–1812
  • Full-text size: 3,724 KB
  • References: 1. Poppe, R.: A survey on vision-based human action recognition. Image Vis. Comput. 28, 976–990 (2010)
    2. Turaga, P., Chellappa, R.: Machine recognition of human activities: a survey. IEEE Trans. Circuits Syst. Video Technol. 18(11), 1473–1488 (2008)
    3. Chaquet, J.M., Carmona, E.J., Caballero, A.F.: A survey of video datasets for human action and activity recognition. Comput. Vis. Image Underst. (CVIU) 117(6), 633–659 (2013)
    4. Ryoo, M.S., Aggarwal, J.K.: Recognition of composite human activities through context-free grammar based representation. Proc. CVPR 2, 1709–1718 (2006)
    5. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)
    6. Dollar, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: Proceedings of the IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, pp. 65–72 (2005)
    7. Nowak, E., Jurie, F., Triggs, B.: Sampling strategies for bag-of-features image classification. Proc. ECCV 3954, 490–503 (2006)
    8. Yao, A., Gall, J., Van Gool, L.: A Hough transform-based voting framework for action recognition. In: Proceedings of CVPR, pp. 2061–2068 (2010)
    9. Klaser, A., Marszałek, M., Schmid, C.: A spatio-temporal descriptor based on 3D-gradients. In: Proceedings of British Machine Vision Conference (BMVC), pp. 995–1004 (2008)
    10. Ji, Y., Shimada, A., Taniguchi, R.: Human action recognition by SOM considering the probability of spatio-temporal features. Neural Inf. Process. Models Appl. 6444, 391–398 (2010)
    11. Ilonen, J., Kamarainen, J.K.: Object categorization using self-organization over visual appearance. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), pp. 4549–4553 (2006)
    12. Huang, W., Wu, Q.M.J.: Human action recognition based on self-organizing map. In: Proceedings of ICASSP, pp. 2130–2133 (2010)
    13. Jin, S., Li, Y., Lu, G.-M., et al.: SOM-based hand gesture recognition for virtual interactions. In: Proceedings of the IEEE International Symposium on Virtual Reality Innovation (ISVRI), pp. 317–322 (2011)
    14. Shimada, A., Taniguchi, R.: Gesture recognition using sparse code of hierarchical SOM. In: Proceedings of ICPR (2008)
    15. Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. Proc. ICCV 2, 1395–1402 (2005)
    16. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. Proc. ICPR 3, 32–36 (2004)
    17. Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos “in the wild”. In: Proceedings of CVPR, pp. 1996–2003 (2009)
    18. Ryoo, M.S., Aggarwal, J.K.: Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. In: Proceedings of ICCV, pp. 1593–1600 (2009)
    19. Cohen, I., Li, H.: Inference of human postures by classification of 3D human body shape. In: Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG), pp. 74–81 (2003)
    20. Sheikh, Y., Sheikh, M., Shah, M.: Exploring the space of a human action. Proc. CVPR 1, 144–149 (2005)
    21. Kellokumpu, V., Pietikäinen, M., Heikkilä, J.: Human activity recognition using sequences of postures. In: Proceedings of IAPR Conference on Machine Vision Applications, pp. 570–573 (2005)
    22. Lv, F., Nevatia, R.: Single view human action recognition using key pose matching and viterbi path searching. In: Proceedings of CVPR (2007)
    23. Wang, L., Suter, D.: Recognizing human activities from silhouettes: motion subspace and factorial discriminative graphical model. In: Proceedings of CVPR (2007)
    24. Abdelkader, M.F., Almageed, W.A., Srivastava, A., Chellappa, R.: Silhouette-based gesture and action recognition via modeling trajectories on Riemannian shape manifolds. Comput. Vis. Image Underst. (CVIU) 115(3), 439–455 (2011)
    25. Yilmaz, A., Shah, M.: Actions sketch: a novel action representation. Proc. CVPR 1, 984–989 (2005)
    26. Achard, C., Qu, X., Mokhber, A., Milgram, M.: A novel approach for recognition of human actions with semi-global features. Mach. Vis. Appl. 19, 27–34 (2008)
    27. Grundmann, M., Meier, F., Essa, I.: 3D shape context and distance transform for action recognition. In: Proceedings of ICPR (2008)
    28. Laptev, I.: On space-time interest points. IJCV 64(2/3), 107–123 (2005)
    29. Matikainen, P., Hebert, M., Sukthankar, R.: Trajectons: action recognition through the motion analysis of tracked features. In: Proceedings of ICCV Workshops, pp. 514–521 (2009)
    30. Messing, R., Pal, C., Kautz, H.: Activity recognition using the velocity histories of tracked keypoints. In: Proceedings of ICCV, pp. 104–111 (2009)
    31. Raptis, M., Soatto, S.: Tracklet descriptors for action modeling and video analysis. In: Proceedings of ECCV, pp. 577–590 (2010)
    32. Wang, H., Klaser, A., Schmid, C.: Dense trajectories and motion boundary descriptors for action recognition. IJCV 103(1), 60–79 (2013)
    33. Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. IJCV 79(3), 299–318 (2008)
    34. Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: Proceedings of CVPR (2008)
    35. Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. In: Proceedings of ACM International Conference on Multimedia, pp. 357–360 (2007)
    36. Matikainen, P., Hebert, M., Sukthankar, R.: Representing pairwise spatial and temporal relations for action recognition. In: Proceedings of ECCV, pp. 508–521 (2010)
    37. Zhang, Y., Liu, X., Chang, M.C., et al.: Spatio-temporal phrases for activity recognition. Proc. ECCV 7574, 707–721 (2012)
    38. Schindler, K., van Gool, L.: Action snippets: how many frames does human action recognition require? In: Proceedings of CVPR (2008)
    39. Etemad, S.A., Arya, A.: 3D human action recognition and style transformation using resilient backpropagation neural networks. Proc. Intell. Comput. Intell. Syst. (ICIS) 4, 296–301 (2009)
    40. Li, N., Cheng, X., Zhang, S., Wu, Z.: Recognizing human actions by BP-AdaBoost algorithm under a hierarchical framework. In: Proceedings of ICASSP, pp. 3407–3411 (2013)
    41. Lafferty, J., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML, pp. 282–289 (2001)
    42. Wang, Y., Mori, G.: Max-margin hidden conditional random fields for human action recognition. In: Proceedings of CVPR, pp. 872–879 (2009)
    43. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)
    44. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57 (1999)
    45. Gong, S., Xiang, T.: Recognition of group activities using dynamic probabilistic networks. Proc. ICCV 2, 742–749 (2003)
    46. Ryoo, M.S., Chen, C.C., Aggarwal, J.K., et al.: An overview of contest on semantic description of human activities (SDHA) 2010. In: Proceedings of the Recognizing Patterns in Signals, Speech, Images and Videos, pp. 270–285 (2010)
    47. Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. Elsevier Pte Ltd., Singapore (2010)
    48. Boberg, J., Salakoski, T.: General formulation and evaluation of agglomerative clustering methods with metric and non-metric distances. Pattern Recognit. 26(9), 1395–1406 (1993)
    49. Liu, J., Yang, Y., Shah, M.: Learning semantic visual vocabularies using diffusion distance. In: Proceedings of CVPR, pp. 461–468 (2009)
    50. Fathi, A., Mori, G.: Action recognition by learning mid-level motion features. In: Proceedings of CVPR (2008)
    51. Yeffet, L., Wolf, L.: Local trinary patterns for human action recognition. In: Proceedings of ICCV, pp. 492–497 (2009)
    52. Imtiaz, H., Mahbub, U., Ahad, M.A.R.: Action recognition algorithm based on optical flow and RANSAC in frequency domain. In: Proceedings of SICE Annual Conference, pp. 1627–1631 (2011)
    53. Waltisberg, D., Yao, A., Gall, J., Van Gool, L.: Variations of a Hough-voting action recognition system. In: Proceedings of the Recognizing Patterns in Signals, Speech, Images and Videos, pp. 306–312 (2010)
    54. Zhen, X., Shao, L.: A local descriptor based on Laplacian pyramid coding for action recognition. Pattern Recognit. Lett. (PRL) 34(15), 1899–1905 (2013)
    55. Mukherjee, S., Biswas, S.K., Mukherjee, D.P.: Recognizing interactions between human performers by ‘dominating pose doublet’. Mach. Vis. Appl., pp. 1–10 (2013)
  • Author affiliations: Nijun Li (1)
    Xu Cheng (1)
    Suofei Zhang (1)
    Zhenyang Wu (1)

    1. School of Information Science and Engineering, Southeast University, Room 205 of Jianxiong Building, Sipailou #2, Xuanwu District, Nanjing, 210096, People’s Republic of China
  • ISSN:1432-1769
Abstract
Nowadays, local features are very popular in vision-based human action recognition, especially in “wild” or unconstrained videos. This paper proposes a novel framework that combines Fast HOG3D and a self-organization feature map (SOM) network for action recognition from unconstrained videos, bypassing demanding preprocessing steps such as human detection, tracking, or contour extraction. The contributions of our work lie not only in creating a more compact and computationally efficient local feature descriptor than the original HOG3D, but also in being the first to successfully apply SOM to a realistic action recognition task and in studying the influence of its training parameters. We mainly test our approach on the UCF-YouTube dataset with 11 realistic sport actions, achieving promising results that outperform a local feature-based support vector machine and are comparable with bag-of-words. Experiments are also carried out on the KTH and UT-Interaction datasets for comparison. Results on all three datasets confirm that our approach performs comparably with, if not better than, the state of the art.
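The SOM stage described in the abstract can be illustrated with a minimal sketch: local descriptors (here, stand-in vectors for Fast HOG3D features) train a Kohonen map, and a video is then summarized by the histogram of best-matching units its descriptors hit. This is not the authors' implementation; the grid size, decay schedules, and histogram pooling are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def train_som(features, grid=(8, 8), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal self-organization feature map: each descriptor pulls its
    best-matching unit (BMU) and that unit's grid neighbours toward itself."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h * w, features.shape[1]))
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
    n_steps = epochs * len(features)
    step = 0
    for _ in range(epochs):
        for x in features:
            t = step / n_steps
            lr = lr0 * (1.0 - t)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - t) + 1e-3    # shrinking neighbourhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            g = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood kernel
            weights += lr * g[:, None] * (x - weights)
            step += 1
    return weights

def bmu_histogram(features, weights):
    """Summarize one video: count how often each SOM unit is the BMU."""
    bmus = np.argmin(((features[:, None, :] - weights[None]) ** 2).sum(-1), axis=1)
    return np.bincount(bmus, minlength=len(weights))
```

Such a histogram plays the same role as a bag-of-words signature, with the SOM grid acting as a topology-preserving codebook; a classifier (e.g., an SVM, as in the paper) can then be trained on these per-video histograms.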