Abstract
This paper describes our method for both track 2 and track 3 of the Looking at People (LAP) challenge [1]. We propose an action and gesture spotting system composed of three steps: (i) temporal segmentation, (ii) clip classification, and (iii) post-processing. For track 2, we resort to a simple sliding-window method to divide each video sequence into clips, while for track 3, we design a segmentation method based on the motion analysis of human hands. Then, for each clip, we adopt a super vector representation built on dense features. Based on this representation, we train a linear SVM to perform action and gesture recognition. Finally, we apply post-processing techniques to remove false positives. We demonstrate the effectiveness of the proposed method by participating in the contests of both track 2 and track 3: we obtain the best performance on track 2 and rank \(4^{th}\) on track 3, which indicates that the designed system is effective for action and gesture recognition.
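The sliding-window segmentation used for track 2 can be sketched as follows; the window size and stride values here are illustrative assumptions, as the abstract does not specify the parameters used.

```python
def sliding_window_clips(num_frames, window_size=32, stride=16):
    """Divide a video of num_frames frames into overlapping clips.

    Returns a list of (start, end) frame ranges. window_size and stride
    are placeholder values, not the paper's actual settings.
    """
    clips = []
    start = 0
    while start + window_size <= num_frames:
        clips.append((start, start + window_size))
        start += stride
    return clips

# A 100-frame video with these settings yields five overlapping clips.
print(sliding_window_clips(100))
```

Each resulting clip would then be encoded with the dense-feature super vector representation and scored by the linear SVM.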