Multiple feature fusion in convolutional neural networks for action recognition

Authors: Hongyang Li; Jun Chen; Ruimin Hu
Keywords: action recognition; video deep-learned representation; convolutional neural network; feature fusion
Journal: Wuhan University Journal of Natural Sciences
Publication year: 2017
Publication date: February 2017
Volume: 22
Issue: 1
Pages: 73-78
Journal category: Mathematics and Statistics
Journal subjects: Life Sciences, general; Materials Science, general; Computer Science, general
Publisher: Wuhan University
ISSN: 1993-4998

Abstract

Action recognition is important for understanding human behavior in video, and the video representation is the basis for action recognition. This paper proposes a new video representation based on convolutional neural networks (CNNs). To capture human motion information in one CNN, we take both optical flow maps and gray images as input and combine multiple convolutional features by max pooling across frames. In another CNN, we input a single color frame to capture context information. Finally, we take the top fully connected layer vectors as the video representation and train the classifiers with a linear support vector machine. The experimental results show that the representation that integrates optical flow maps and gray images is more discriminative than representations that depend on only one of them. On HMDB51 and UCF101, two of the most challenging data sets, this video representation achieves competitive performance.
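The pooling step described in the abstract can be illustrated with a minimal sketch. It assumes that each frame has already been turned into a fixed-length feature vector by a CNN, and it takes "max pooling across frames" to mean an element-wise maximum over those vectors. The function names `max_pool_across_frames` and `fuse` are hypothetical, and joining the motion and context vectors by concatenation is an assumption: the abstract does not say how the two networks' outputs are combined. Training the linear support vector machine is not shown.

```python
from typing import List, Sequence


def max_pool_across_frames(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over per-frame feature vectors.

    frame_features holds one feature vector per frame; all vectors
    must have the same length. The result has that same length.
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame features must have the same length")
    # For each feature dimension, keep the strongest response seen in any frame.
    return [max(f[i] for f in frame_features) for i in range(dim)]


def fuse(motion_vec: Sequence[float], context_vec: Sequence[float]) -> List[float]:
    """Hypothetical fusion step: concatenate motion and context vectors.

    Concatenation is an assumption made for illustration only.
    """
    return list(motion_vec) + list(context_vec)


# Toy example: two frames, three feature dimensions each.
frames = [[0.1, 0.9, 0.0],
          [0.4, 0.2, 0.3]]
pooled = max_pool_across_frames(frames)   # [0.4, 0.9, 0.3]
video_repr = fuse(pooled, [1.0, 0.5])     # motion part followed by context part
```

Because the maximum is taken per dimension, the pooled vector has the same length regardless of how many frames the video contains, which is what makes it usable as input to a fixed-size classifier.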
