Abstract
This paper describes our method for both track 2 and track 3 of the Looking at People (LAP) challenge [1]. We propose an action and gesture spotting system composed of three steps: (i) temporal segmentation, (ii) clip classification, and (iii) post-processing. For track 2, we resort to a simple sliding-window method to divide each video sequence into clips, while for track 3, we design a segmentation method based on the motion analysis of human hands. Then, for each clip, we adopt a super vector representation built on dense features. Based on this representation, we train a linear SVM to perform action and gesture recognition. Finally, we apply post-processing techniques to remove false positives. We demonstrate the effectiveness of the proposed method by participating in the contests of both track 2 and track 3: we obtain the best performance on track 2 and rank \(4^{th}\) on track 3, which indicates that the designed system is effective for action and gesture recognition.
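The sliding-window segmentation used for track 2 can be sketched as follows; the window size and stride values here are illustrative assumptions, as the abstract does not specify the parameters used.

```python
def sliding_window_clips(num_frames, window_size=32, stride=16):
    """Divide a video of num_frames frames into overlapping clips.

    Returns a list of (start, end) frame ranges. window_size and stride
    are placeholder values, not the paper's actual settings.
    """
    clips = []
    start = 0
    while start + window_size <= num_frames:
        clips.append((start, start + window_size))
        start += stride
    return clips

# A 100-frame video with these settings yields five overlapping clips.
print(sliding_window_clips(100))
```

Each resulting clip would then be encoded with the dense-feature super vector representation and scored by the linear SVM.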