We present a vision-based, view-invariant system for recognizing actions in RGB-D videos.
We use the Euclidean distance between spatio-temporal feature vectors that are represented in a Spatio-Temporal Matrix (STM).
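
As a rough illustration of the distance computation, the sketch below builds a matrix of pairwise Euclidean distances between feature vectors. It assumes one feature vector per frame and that entry (i, j) holds the distance between frames i and j; the exact layout of the STM is not specified here, so the function name `spatio_temporal_matrix` and the sample data are hypothetical.

```python
import math


def spatio_temporal_matrix(features):
    """Return a symmetric matrix of pairwise Euclidean distances.

    Assumption: `features` holds one equal-length feature vector per frame.
    """
    n = len(features)
    stm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])  # Euclidean distance
            stm[i][j] = d
            stm[j][i] = d  # distances are symmetric
    return stm


# Hypothetical per-frame feature vectors, for illustration only.
frames = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
stm = spatio_temporal_matrix(frames)
```

The diagonal is zero by construction, and the matrix is symmetric, so only the upper triangle needs to be computed.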
We describe the local tendency of the STM using a pyramid-structured bag-of-words (BoW-Pyramid) representation and train an SVM classifier on it.
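
One possible reading of a pyramid-structured bag-of-words descriptor is sketched below. It assumes that local patches of the matrix have already been quantized into codeword indices, and that the pyramid splits the matrix into 1x1, 2x2, ... grids in the style of a spatial pyramid; both are assumptions, and the name `bow_pyramid` and its parameters are hypothetical. The resulting vector is the kind of fixed-length descriptor an SVM could be trained on (classifier training is not shown).

```python
def bow_pyramid(labels, vocab_size, levels=2):
    """Concatenate normalized codeword histograms over a grid pyramid.

    Assumption: `labels` is a 2-D list of codeword indices in
    range(vocab_size), one per local patch of the matrix.
    """
    rows, cols = len(labels), len(labels[0])
    descriptor = []
    for level in range(levels):
        cells = 2 ** level  # level 0: 1x1 grid, level 1: 2x2 grid, ...
        for r in range(cells):
            r0, r1 = r * rows // cells, (r + 1) * rows // cells
            for c in range(cells):
                c0, c1 = c * cols // cells, (c + 1) * cols // cells
                hist = [0] * vocab_size
                for i in range(r0, r1):
                    for j in range(c0, c1):
                        hist[labels[i][j]] += 1
                total = sum(hist)
                # Normalize each cell so histograms are comparable across levels.
                descriptor.extend(h / total if total else 0.0 for h in hist)
    return descriptor


# Hypothetical 4x4 grid of codeword indices with a vocabulary of 3 words.
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
descriptor = bow_pyramid(labels, vocab_size=3, levels=2)
```

With two levels and a three-word vocabulary, the descriptor has 3 + 4 × 3 = 15 entries: the coarse level captures the global codeword distribution, while the finer level records which codewords dominate each quadrant.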