Human action recognition model based on scene understanding
  • English Title: Human action recognition model based on scene understanding
  • Authors: Zhang Jiaqi; Zhao Xiaoli; Zhang Xiang
  • Keywords: dual-stream network structure; scene recognition; human action recognition
  • Journal: Computer Measurement & Control (计算机测量与控制)
  • Journal code: JZCK
  • Affiliation: School of Electronic and Electrical Engineering, Shanghai University of Engineering Science
  • Publication date: 2019-03-25
  • Year: 2019
  • Issue: v.27; No.246
  • Funding: National Natural Science Foundation of China (61461021); Shanghai Municipal Commission of Science and Education project (15590501300)
  • Language: Chinese
  • Record ID: JZCK201903031
  • Pages: 161-164+169 (5 pages)
  • CN: 11-4762/TP
Abstract
To meet the demands of human action recognition in complex environments, a dual-stream network recognition structure based on scene understanding is proposed. Scene information is added to the human action recognition network as auxiliary information to improve recognition accuracy. Different ways of fusing the scene recognition network with the human action recognition network are studied, and the optimal network structure is determined. By analyzing how different parameters affect recognition accuracy, all structural parameters of the dual-stream network are fixed, and the network is designed and trained. Experiments on the public UCF50 and UCF101 datasets achieve accuracies of 95% and 93% respectively, higher than the results of typical recognition networks. Adding the same scene information to several other typical recognition networks is also investigated, and the experimental results confirm that the method effectively improves recognition accuracy.
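The core idea of the abstract, combining an action-recognition stream with an auxiliary scene-recognition stream, can be sketched as a late fusion of per-stream class scores. This is a minimal illustration, not the paper's actual network: the function names, the weighted-average fusion rule, the example logits, and the 0.7 stream weight are all assumptions for demonstration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def fuse_streams(action_logits, scene_logits, w_action=0.7):
    """Late-fuse two streams by a weighted average of their softmax scores.

    The action stream dominates (w_action), while the scene stream
    contributes contextual evidence (1 - w_action).
    """
    p_action = softmax(np.asarray(action_logits, dtype=float))
    p_scene = softmax(np.asarray(scene_logits, dtype=float))
    return w_action * p_action + (1.0 - w_action) * p_scene

# Hypothetical 3-class example: the action stream favors class 0,
# the scene stream favors class 1; fusion keeps class 0 on top.
fused = fuse_streams([2.0, 1.0, 0.1], [0.5, 2.5, 0.2])
pred = int(np.argmax(fused))  # → 0
```

In practice the paper studies several fusion points and weights; a score-level weighted average like the one above is only the simplest variant of such a dual-stream combination.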
References
[1] Yu X, Xi P, Huang J. Design and implementation of distributed retrieval for early-warning video in surveillance systems [J]. Computer Measurement & Control, 2015, 23(7): 2511-2514.
    [2] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition [A]. IEEE Conference on Computer Vision and Pattern Recognition [C]. IEEE, 2016: 770-778.
    [3] Du T, Bourdev L, Fergus R, et al. Learning spatiotemporal features with 3D convolutional networks [A]. IEEE International Conference on Computer Vision [C]. IEEE, 2015: 4489-4497.
    [4] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos [A]. Advances in Neural Information Processing Systems [C]. 2014: 568-576.
    [5] Diba A, Vivek S, Luc V. Deep temporal linear encoding networks [A]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition [C]. 2017(1).
    [6] Feichtenhofer C, Pinz A, Zisserman A. Convolutional two-stream network fusion for video action recognition [A]. IEEE Conference on Computer Vision and Pattern Recognition [C]. IEEE, 2016: 1933-1941.
    [7] Donahue J, Hendricks L A, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description [A]. IEEE Conference on Computer Vision and Pattern Recognition [C]. IEEE, 2015.
    [8] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions [A]. IEEE Conference on Computer Vision and Pattern Recognition [C]. IEEE, 2015: 1-9.
    [9] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift [A]. International Conference on Machine Learning [C]. 2015: 448-456.
    [10] Bengio Y, Boulanger-Lewandowski N, Pascanu R. Advances in optimizing recurrent networks [A]. IEEE International Conference on Acoustics, Speech and Signal Processing [C]. IEEE, 2013: 8624-8628.
    [11] Sak H, Senior A, Beaufays F. Long short-term memory recurrent neural network architectures for large scale acoustic modeling [J]. Computer Science, 2014: 338-342.
    [12] Cho K, Van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation [J]. Computer Science, 2014.
