Recognizing human interactions by genetic algorithm-based random forest spatio-temporal correlation
  • Authors: Nijun Li ; Xu Cheng ; Haiyan Guo ; Zhenyang Wu
  • Keywords: Motion context (MC) ; Spatio-temporal interest points (STIPs) ; Random forest ; Genetic algorithm (GA) ; Spatio-temporal (S-T) correlation
  • Journal: Pattern Analysis & Applications
  • Publication year: 2016
  • Publication date: February 2016
  • Volume: 19
  • Issue: 1
  • Pages: 267-282
  • Full-text size: 2,597 KB
  • Author affiliations: Nijun Li (1)
    Xu Cheng (1)
    Haiyan Guo (1)
    Zhenyang Wu (1)

    1. School of Information Science and Engineering, Southeast University, Room 205 of Jianxiong Building, Sipailou #2, Xuanwu District, Nanjing, 210096, People’s Republic of China
  • Journal category: Computer Science
  • Journal subject: Pattern Recognition
  • Publisher: Springer London
  • ISSN: 1433-755X
Abstract
Recognizing human interactions is a more challenging task than recognizing single-person activities and has attracted much attention from the computer vision community. This paper proposes a new and effective method for recognizing human interactions that combines the advantages of the global motion context (MC) feature and the spatio-temporal (S-T) correlation of local spatio-temporal interest point (STIP) features. The MC feature is used to train a random forest, and a genetic algorithm (GA) is applied during the training phase to achieve a good compromise between reliability and efficiency. In addition, we propose an S-T correlation-based matching method, in which the structure of the MC and the Needleman–Wunsch algorithm are used to compute the spatial and temporal correlation scores of two videos, respectively. Experiments on the UT-Interaction dataset show that our approaches outperform other prevalent machine learning methods, and that the combination of the GA search-based random forest and S-T correlation achieves state-of-the-art performance.
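The abstract states that the temporal correlation score of two videos is computed with the Needleman–Wunsch algorithm. The Python below is a minimal sketch of that global-alignment step only, not the authors' implementation. It assumes each video has already been reduced to a temporally ordered sequence of discrete labels (for example, hypothetical visual-word indices of its STIPs) and uses an assumed scoring scheme of +1 for a match, -1 for a mismatch and -1 per gap. The function name `needleman_wunsch_score` is hypothetical, and the paper's actual sequence representation, scores and any normalization are not given in this record.

```python
from typing import Hashable, List, Sequence


def needleman_wunsch_score(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: int = 1,      # assumed reward for aligning two equal labels
    mismatch: int = -1,  # assumed penalty for aligning two different labels
    gap: int = -1,       # assumed penalty for skipping a label in one sequence
) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against an empty sequence costs one gap per label.
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either pair a[i-1] with b[j-1], or insert a gap in one sequence.
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = dp[i - 1][j] + gap
            left = dp[i][j - 1] + gap
            dp[i][j] = max(diag, up, left)
    return dp[n][m]


if __name__ == "__main__":
    # Hypothetical label sequences for two videos.
    video_a = [3, 7, 7, 2, 5]
    video_b = [3, 7, 2, 5]
    print(needleman_wunsch_score(video_a, video_b))
```

A higher score means the two label sequences can be aligned with more matches and fewer gaps, which is why a global alignment of this kind tolerates the two videos running at slightly different speeds. The dynamic-programming table makes the cost proportional to the product of the two sequence lengths.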