Abstract
This paper presents a novel Local Surface Geometric Feature (LSGF) for human action recognition from video sequences captured by a depth camera. The LSGF is extracted at each skeleton joint in point cloud space to capture static appearance and pose cues, including joint position, surface normal, and local curvature. A temporal pyramid of covariance matrices is used to model both the pairwise relations among features, rather than the features themselves, and their temporal evolution. Finally, Fisher vector encoding is used to build a global representation of each video sequence, and an SVM classifier performs the classification. In extensive experiments, our classification results are superior to most previously published results on three public benchmark datasets: MSR-Action3D, MSR DailyActivity3D, and UTKinect Action.
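To make the temporal pyramid of covariance matrices concrete, the following is a minimal sketch and not the implementation used in this paper. It assumes that each frame is already described by a fixed-length feature vector, that pyramid level l splits the sequence into 2^l equal, non-overlapping segments, and that the unbiased sample covariance is used. The function names `covariance` and `temporal_pyramid_covariances` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def covariance(frames: Matrix) -> Matrix:
    """Sample covariance of per-frame feature vectors (rows are frames)."""
    n = len(frames)
    d = len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    denom = max(n - 1, 1)  # avoid division by zero for a one-frame segment
    return [
        [
            sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / denom
            for j in range(d)
        ]
        for i in range(d)
    ]


def temporal_pyramid_covariances(frames: Matrix, levels: int = 2) -> List[Matrix]:
    """One covariance matrix per segment; level l has 2**l segments.

    Assumption (illustrative only): equal, non-overlapping segments.
    """
    n = len(frames)
    if n < 2 ** (levels - 1):
        raise ValueError("sequence too short for the requested number of levels")
    matrices = []
    for level in range(levels):
        parts = 2 ** level
        for k in range(parts):
            start = k * n // parts
            end = (k + 1) * n // parts
            matrices.append(covariance(frames[start:end]))
    return matrices


# Four frames with two features each; two levels give 1 + 2 = 3 matrices.
toy_frames = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
toy_pyramid = temporal_pyramid_covariances(toy_frames, levels=2)
```

Each matrix describes how the features co-vary within one time segment, so the coarse level summarizes the whole action while finer levels retain the ordering of its phases.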