Human Behavior Recognition in Dynamic Perspective (动态视角下人体行为识别研究)
  • Authors: JI Liang-liang (纪亮亮); ZHAO Min (赵敏)
  • Affiliation: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology
  • Keywords: human activity recognition; 3D dataset; CRFasRNN
  • Journal: Software Guide (软件导刊; journal code RJDK)
  • Publication date: 2019-01-04
  • Year: 2019
  • Volume/Issue: Vol.18, No.197 (2019, Issue 03)
  • Language: Chinese
  • Pages: 184-188 (5 pages)
  • CN: 42-1671/TP
  • Record ID: RJDK201903040
Abstract
The development of 3D human action databases has made it easier for researchers to study human activity recognition, but existing databases suffer from limitations such as fixed camera viewpoints, which restrict the range over which a robot can move. To study human activity recognition in real environments, we build DMV Action3D, a dynamic multi-view human action database captured with an RGB-D camera; it contains more than 600 action videos from 20 subjects, totalling roughly 600,000 frames of color and depth images. On top of DMV Action3D, the CRFasRNN image segmentation technique is used to segment the human figure, Harris3D (and HOG3D) features are extracted from the segmented regions, and a hidden Markov model is used to recognize human actions under a dynamic viewpoint. Experimental results show that, under a dynamic viewpoint, CRFasRNN segmentation produces clean human silhouettes that closely match the true body contour and are robust to changes in environment, scene, and lighting. The DMV Action3D dataset therefore offers clear advantages for studying human behavior in real environments and provides a valuable resource for service robots that must recognize human actions in the real world.
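To make the recognition stage of the pipeline concrete, the sketch below shows a per-class hidden Markov model classifier over per-frame feature sequences, in the spirit of the segmentation-features-HMM approach described in the abstract. It is a minimal illustration only, not the authors' code: the frame_features function is a stand-in placeholder rather than the paper's Harris3D/HOG3D extraction, and the hmmlearn dependency and all parameter values are assumptions.

```python
# Illustrative sketch: one Gaussian HMM per action class over per-frame
# descriptors of the segmented person region; classification is by maximum
# log-likelihood. hmmlearn is an assumed third-party dependency.
import numpy as np
from hmmlearn import hmm


def frame_features(depth_frame, mask):
    """Placeholder per-frame descriptor: simple statistics of the segmented
    person region. A real system would pool Harris3D/HOG3D descriptors here."""
    person = depth_frame[mask > 0]
    if person.size == 0:
        return np.zeros(4)
    return np.array([person.mean(), person.std(),
                     mask.sum() / mask.size,   # segmented-area ratio
                     np.median(person)])


def video_to_sequence(depth_frames, masks):
    """Stack per-frame descriptors into a (T, D) observation sequence."""
    return np.vstack([frame_features(f, m) for f, m in zip(depth_frames, masks)])


class HMMActionClassifier:
    """One Gaussian HMM per action class; predict the class whose model
    assigns the highest log-likelihood to a new sequence."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.models = {}

    def fit(self, sequences_by_label):
        # sequences_by_label: {label: [array of shape (T_i, D), ...]}
        for label, seqs in sequences_by_label.items():
            X = np.vstack(seqs)                 # concatenated observations
            lengths = [len(s) for s in seqs]    # per-sequence lengths
            model = hmm.GaussianHMM(n_components=self.n_states,
                                    covariance_type="diag", n_iter=50)
            model.fit(X, lengths)
            self.models[label] = model

    def predict(self, seq):
        scores = {label: m.score(seq) for label, m in self.models.items()}
        return max(scores, key=scores.get)
```

In such a setup, training would group DMV Action3D sequences by action label and fit one model per class; at test time the predicted label is simply the class whose HMM scores the new observation sequence highest.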
References
[1] LI W, ZHANG Z, LIU Z. Action recognition based on a bag of 3D points[C]. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2010: 9-14.
[2] WANG J, LIU Z, WU Y, et al. Mining actionlet ensemble for action recognition with depth cameras[C]. Computer Vision and Pattern Recognition (CVPR), 2012: 1290-1297.
[3] SUNG J, PONCE C, SELMAN B, et al. Unstructured human activity detection from RGBD images[C]. IEEE International Conference on Robotics & Automation, 2011: 47-55.
[4] KOPPULA H S, GUPTA R, SAXENA A. Learning human activities and object affordances from RGB-D videos[J]. International Journal of Robotics Research, 2013, 32(8): 951-970.
[5] CHEN C, JAFARI R, KEHTARNAVAZ N. UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor[C]. IEEE International Conference on Image Processing, 2015: 168-172.
[6] RAHMANI H, MAHMOOD A, DU H, et al. Histogram of oriented principal components for cross-view action recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2016, 38(12): 2430-2443.
[7] SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+D: a large scale benchmark for 3D human activity analysis[C]. Computer Vision & Pattern Recognition, 2016: 1010-1019.
[8] ZHENG S, JAYASUMANA S, ROMERA-PAREDES B, et al. Conditional random fields as recurrent neural networks[C]. IEEE International Conference on Computer Vision, 2015: 1529-1537.
[9] YAMADA T, HAYAMIZU Y, YAMAMOTO Y, et al. A stretchable carbon nanotube strain sensor for human-motion detection[J]. Nature Nanotechnology, 2011, 6(5): 296-301.
[10] TAO M, BAI J, KOHLI P, et al. SimpleFlow: a non-iterative, sublinear optical flow algorithm[J]. Computer Graphics Forum, 2012, 31(2pt1): 345-353.
[11] LI N, CHENG X, ZHANG S, et al. Realistic human action recognition by fast HOG3D and self-organization feature map[J]. Machine Vision & Applications, 2014, 25(7): 1793-1812.
[12] TOMPSON J, JAIN A, LECUN Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation[C]. Advances in Neural Information Processing Systems, 2014: 1799-1807.
[13] LEUTENEGGER S, CHLI M, SIEGWART R Y. BRISK: binary robust invariant scalable keypoints[C]. International Conference on Computer Vision (ICCV), 2011: 2548-2555.
[14] JI S, XU W, YANG M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 35(1): 221-231.
[15] DO T M T, ARTIERES T. Neural conditional random fields[C]. Thirteenth International Conference on Artificial Intelligence & Statistics, 2010: 177-184.
[16] BELL S, UPCHURCH P, SNAVELY N, et al. Material recognition in the wild with the Materials in Context Database[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 3479-3487.
[17] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2018, 40(4): 834-848.
[18] YAO K, PENG B, ZWEIG G, et al. Recurrent conditional random field for language understanding[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, 2014: 4077-4081.
[19] GIRSHICK R, IANDOLA F, DARRELL T, et al. Deformable part models are convolutional neural networks[C]. Computer Vision and Pattern Recognition, 2015: 437-446.
[20] KNISS J, JIN K, IVANS R, et al. Robotics research with TurtleBot 2016[D]. Idaho: Boise State University ScholarWorks, 2016.
[21] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[22] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2015: 3431-3440.
[23] KRÄHENBÜHL P, KOLTUN V. Efficient inference in fully connected CRFs with Gaussian edge potentials[C]. International Conference on Neural Information Processing Systems, 2011: 109-117.
[24] SIPIRAN I, BUSTOS B. Harris 3D: a robust extension of the Harris operator for interest point detection on 3D meshes[J]. Visual Computer, 2011, 27(11): 963.
