Multiple feature fusion in convolutional neural networks for action recognition
详细信息    查看全文
文摘
Action recognition is important for understanding the human behaviors in the video, and the video representation is the basis for action recognition. This paper provides a new video representation based on convolution neural networks (CNN). For capturing human motion information in one CNN, we take both the optical flow maps and gray images as input, and combine multiple convolutional features by max pooling across frames. In another CNN, we input single color frame to capture context information. Finally, we take the top full connected layer vectors as video representation and train the classifiers by linear support vector machine. The experimental results show that the representation which integrates the optical flow maps and gray images obtains more discriminative properties than those which depend on only one element. On the most challenging data sets HMDB51 and UCF101, this video representation obtains competitive performance.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700