Realistic human action recognition by Fast HOG3D and self-organization feature map
  • Authors: Nijun Li ; Xu Cheng ; Suofei Zhang ; Zhenyang Wu
  • Keywords: Spatio-temporal interest points (STIPs) ; HOG3D/Fast HOG3D ; Self-organization feature map (SOM) ; Support vector machine (SVM) ; Bag-of-words (BoW)
  • Journal: Machine Vision and Applications
  • Published: October 2014
  • Year: 2014
  • Volume: 25
  • Issue: 7
  • Pages: 1793–1812
  • Full-text size: 3,724 KB
  • References: 1. Poppe, R.: A survey on vision-based human action recognition. Image Vis. Comput. 28, 976–990 (2010)
    2. Turaga, P., Chellappa, R.: Machine recognition of human activities: a survey. IEEE Trans. Circuits Syst. Video Technol. 18(11), 1473–1488 (2008)
    3. Chaquet, J.M., Carmona, E.J., Caballero, A.F.: A survey of video datasets for human action and activity recognition. Comput. Vis. Image Underst. (CVIU) 117(6), 633–659 (2013)
    4. Ryoo, M.S., Aggarwal, J.K.: Recognition of composite human activities through context-free grammar based representation. Proc. CVPR 2, 1709–1718 (2006)
    5. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)
    6. Dollar, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: Proceedings of the IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, pp. 65–72 (2005)
    7. Nowak, E., Jurie, F., Triggs, B.: Sampling strategies for bag-of-features image classification. Proc. ECCV 3954, 490–503 (2006)
    8. Yao, A., Gall, J., Van Gool, L.: A Hough transform-based voting framework for action recognition. In: Proceedings of CVPR, pp. 2061–2068 (2010)
    9. Klaser, A., Marszałek, M., Schmid, C.: A spatio-temporal descriptor based on 3D-gradients. In: Proceedings of British Machine Vision Conference (BMVC), pp. 995–1004 (2008)
    10. Ji, Y., Shimada, A., Taniguchi, R.: Human action recognition by SOM considering the probability of spatio-temporal features. Neural Inf. Process. Models Appl. 6444, 391–398 (2010)
    11. Ilonen, J., Kamarainen, J.K.: Object categorization using self-organization over visual appearance. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), pp. 4549–4553 (2006)
    12. Huang, W., Wu, Q.M.J.: Human action recognition based on self-organizing map. In: Proceedings of ICASSP, pp. 2130–2133 (2010)
    13. Jin, S., Li, Y., Lu, G.-M., et al.: SOM-based hand gesture recognition for virtual interactions. In: Proceedings of the IEEE International Symposium on Virtual Reality Innovation (ISVRI), pp. 317–322 (2011)
    14. Shimada, A., Taniguchi, R.: Gesture recognition using sparse code of hierarchical SOM. In: Proceedings of ICPR (2008)
    15. Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. Proc. ICCV 2, 1395–1402 (2005)
    16. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. Proc. ICPR 3, 32–36 (2004)
    17. Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos “in the wild”. In: Proceedings of CVPR, pp. 1996–2003 (2009)
    18. Ryoo, M.S., Aggarwal, J.K.: Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. In: Proceedings of ICCV, pp. 1593–1600 (2009)
    19. Cohen, I., Li, H.: Inference of human postures by classification of 3D human body shape. In: Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG), pp. 74–81 (2003)
    20. Sheikh, Y., Sheikh, M., Shah, M.: Exploring the space of a human action. Proc. CVPR 1, 144–149 (2005)
    21. Kellokumpu, V., Pietikäinen, M., Heikkilä, J.: Human activity recognition using sequences of postures. In: Proceedings of IAPR Conference on Machine Vision Applications, pp. 570–573 (2005)
    22. Lv, F., Nevatia, R.: Single view human action recognition using key pose matching and viterbi path searching. In: Proceedings of CVPR (2007)
    23. Wang, L., Suter, D.: Recognizing human activities from silhouettes: motion subspace and factorial discriminative graphical model. In: Proceedings of CVPR (2007)
    24. Abdelkader, M.F., Almageed, W.A., Srivastava, A., Chellappa, R.: Silhouette-based gesture and action recognition via modeling trajectories on Riemannian shape manifolds. Comput. Vis. Image Underst. (CVIU) 115(3), 439–455 (2011)
    25. Yilmaz, A., Shah, M.: Actions sketch: a novel action representation. Proc. CVPR 1, 984–989 (2005)
    26. Achard, C., Qu, X., Mokhber, A., Milgram, M.: A novel approach for recognition of human actions with semi-global features. Mach. Vis. Appl. 19, 27–34 (2008)
    27. Grundmann, M., Meier, F., Essa, I.: 3D shape context and distance transform for action recognition. In: Proceedings of ICPR (2008)
    28. Laptev, I.: On space-time interest points. IJCV 64(2/3), 107–123 (2005)
    29. Matikainen, P., Hebert, M., Sukthankar, R.: Trajectons: action recognition through the motion analysis of tracked features. In: Proceedings of ICCV Workshops, pp. 514–521 (2009)
    30. Messing, R., Pal, C., Kautz, H.: Activity recognition using the velocity histories of tracked keypoints. In: Proceedings of ICCV, pp. 104–111 (2009)
    31. Raptis, M., Soatto, S.: Tracklet descriptors for action modeling and video analysis. In: Proceedings of ECCV, pp. 577–590 (2010)
    32. Wang, H., Klaser, A., Schmid, C.: Dense trajectories and motion boundary descriptors for action recognition. IJCV 103(1), 60–79 (2013)
    33. Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. IJCV 79(3), 299–318 (2008)
    34. Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: Proceedings of CVPR (2008)
    35. Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. In: Proceedings of ACM International Conference on Multimedia, pp. 357–360 (2007)
    36. Matikainen, P., Hebert, M., Sukthankar, R.: Representing pairwise spatial and temporal relations for action recognition. In: Proceedings of ECCV, pp. 508–521 (2010)
    37. Zhang, Y., Liu, X., Chang, M.C., et al.: Spatio-temporal phrases for activity recognition. Proc. ECCV 7574, 707–721 (2012)
    38. Schindler, K., van Gool, L.: Action snippets: how many frames does human action recognition require? In: Proceedings of CVPR (2008)
    39. Etemad, S.A., Arya, A.: 3D human action recognition and style transformation using resilient backpropagation neural networks. Proc. Intell. Comput. Intell. Syst. (ICIS) 4, 296–301 (2009)
    40. Li, N., Cheng, X., Zhang, S., Wu, Z.: Recognizing human actions by BP-AdaBoost algorithm under a hierarchical framework. In: Proceedings of ICASSP, pp. 3407–3411 (2013)
    41. Lafferty, J., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML, pp. 282–289 (2001)
    42. Wang, Y., Mori, G.: Max-margin hidden conditional random fields for human action recognition. In: Proceedings of CVPR, pp. 872–879 (2009)
    43. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)
    44. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57 (1999)
    45. Gong, S., Xiang, T.: Recognition of group activities using dynamic probabilistic networks. Proc. ICCV 2, 742–749 (2003)
    46. Ryoo, M.S., Chen, C.C., Aggarwal, J.K., et al.: An overview of contest on semantic description of human activities (SDHA) 2010. In: Proceedings of the Recognizing Patterns in Signals, Speech, Images and Videos, pp. 270–285 (2010)
    47. Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. Elsevier Pte Ltd., Singapore (2010)
    48. Boberg, J., Salakoski, T.: General formulation and evaluation of agglomerative clustering methods with metric and non-metric distances. Pattern Recognit. 26(9), 1395–1406 (1993)
    49. Liu, J., Yang, Y., Shah, M.: Learning semantic visual vocabularies using diffusion distance. In: Proceedings of CVPR, pp. 461–468 (2009)
    50. Fathi, A., Mori, G.: Action recognition by learning mid-level motion features. In: Proceedings of CVPR (2008)
    51. Yeffet, L., Wolf, L.: Local trinary patterns for human action recognition. In: Proceedings of ICCV, pp. 492–497 (2009)
    52. Imtiaz, H., Mahbub, U., Ahad, M.A.R.: Action recognition algorithm based on optical flow and RANSAC in frequency domain. In: Proceedings of SICE Annual Conference, pp. 1627–1631 (2011)
    53. Waltisberg, D., Yao, A., Gall, J., Van Gool, L.: Variations of a Hough-voting action recognition system. In: Proceedings of the Recognizing Patterns in Signals, Speech, Images and Videos, pp. 306–312 (2010)
    54. Zhen, X., Shao, L.: A local descriptor based on Laplacian pyramid coding for action recognition. Pattern Recognit. Lett. (PRL) 34(15), 1899–1905 (2013)
    55. Mukherjee, S., Biswas, S.K., Mukherjee, D.P.: Recognizing interactions between human performers by ‘dominating pose doublet’. Mach. Vis. Appl., pp. 1–10 (2013)
  • Author affiliations: Nijun Li (1)
    Xu Cheng (1)
    Suofei Zhang (1)
    Zhenyang Wu (1)

    1. School of Information Science and Engineering, Southeast University, Room 205 of Jianxiong Building, Sipailou #2, Xuanwu District, Nanjing, 210096, People’s Republic of China
  • ISSN:1432-1769
Abstract
Nowadays, local features are very popular in vision-based human action recognition, especially in “wild” or unconstrained videos. This paper proposes a novel framework that combines Fast HOG3D and a self-organization feature map (SOM) network for action recognition from unconstrained videos, bypassing demanding preprocessing steps such as human detection, tracking, or contour extraction. The contributions of our work lie not only in creating a more compact and computationally efficient local feature descriptor than the original HOG3D, but also in being the first to successfully apply SOM to a realistic action recognition task and in studying the influence of its training parameters. We mainly test our approach on the UCF-YouTube dataset with 11 realistic sport actions, achieving promising results that outperform a local feature-based support vector machine and are comparable with bag-of-words. Experiments are also carried out on the KTH and UT-Interaction datasets for comparison. Results on all three datasets confirm that our approach performs comparably with, if not better than, the state of the art.
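The SOM stage described in the abstract can be illustrated with a minimal sketch: local descriptors (here, stand-in vectors for Fast HOG3D features) train a Kohonen map, and a video is then summarized by the histogram of best-matching units its descriptors hit. This is not the authors' implementation; the grid size, decay schedules, and histogram pooling are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def train_som(features, grid=(8, 8), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal self-organization feature map: each descriptor pulls its
    best-matching unit (BMU) and that unit's grid neighbours toward itself."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h * w, features.shape[1]))
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
    n_steps = epochs * len(features)
    step = 0
    for _ in range(epochs):
        for x in features:
            t = step / n_steps
            lr = lr0 * (1.0 - t)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - t) + 1e-3    # shrinking neighbourhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            g = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood kernel
            weights += lr * g[:, None] * (x - weights)
            step += 1
    return weights

def bmu_histogram(features, weights):
    """Summarize one video: count how often each SOM unit is the BMU."""
    bmus = np.argmin(((features[:, None, :] - weights[None]) ** 2).sum(-1), axis=1)
    return np.bincount(bmus, minlength=len(weights))
```

Such a histogram plays the same role as a bag-of-words signature, with the SOM grid acting as a topology-preserving codebook; a classifier (e.g., an SVM, as in the paper) can then be trained on these per-video histograms.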