Dynamic and Multi-view Complicated 3D Database of Human Activity and Activity Recognition
  • Chinese title: 动态多视角复杂3D人体行为数据库及行为识别
  • Authors: Wang Yongxiong (王永雄); Li Xuan (李璇); Li Lianghua (李梁华)
  • Affiliation: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology (上海理工大学光电信息与计算机工程学院)
  • Keywords: human activity recognition; 3D dataset; multi-view
  • Journal: Journal of Data Acquisition and Processing (数据采集与处理, SJCJ)
  • Publication date: 2019-01-15
  • Year: 2019
  • Issue: v.34; No.153
  • Funding: National Natural Science Foundation of China (61673276, 61603255); Shanghai Sailing Program for Young Scientific and Technological Talents (17YF1427000)
  • Language: Chinese
  • Record code: SJCJ201901008
  • CN: 32-1367/TN
  • Pages: 72-83 (12 pages)
Abstract
Existing 3D action databases have few behavior categories, little interaction with scenes, and single, fixed viewpoints. This paper presents DMV (Dynamic and multi-view) action3D, a relatively large-scale database of complex human activities recorded with RGB-D cameras from two fixed viewpoints and the dynamic viewpoint of a mobile robot. The database currently contains 31 activity classes in three broad groups — daily activities, interactive activities, and abnormal activities — and collects more than 620 activity videos, about 600,000 frames of color and depth images, providing a verifiable benchmark for robots searching for the best viewing angle. To verify the reliability and practicality of the dataset, four recognition methods are evaluated: features based on skeleton joint-point information, and HOG3D features extracted from color images segmented by CRFasRNN, which combines convolutional neural networks (CNN) with a conditional random field (CRF), both classified with a support vector machine (SVM); and spatio-temporal features extracted by a 3D convolutional network (C3D) and by a 3D densely connected residual network, with action labels predicted by a softmax layer. Experimental results show that the varied scenes and complex actions of DMV action3D make recognition substantially harder. The dataset therefore offers clear advantages for studying human activity in real environments and provides a strong resource for service robots recognizing human behavior in the real world.
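The recognition pipelines themselves are not reproduced in this record, but the core idea behind the HOG3D features mentioned above — histograms of spatio-temporal gradient orientations pooled over a grid of cells, then fed to an SVM — can be sketched in a few lines of NumPy. This is an illustrative simplification under assumed parameters (the function name `hog3d_descriptor`, the 2×2×2 cell grid, and the 8 orientation bins are all hypothetical choices), not the full HOG3D descriptor or the paper's implementation:

```python
import numpy as np

def hog3d_descriptor(clip, cells=(2, 2, 2), bins=8):
    """Simplified HOG3D-style descriptor for a grayscale video clip
    of shape (T, H, W): per-cell histograms of spatial gradient
    orientation, weighted by spatio-temporal gradient magnitude."""
    gt, gy, gx = np.gradient(clip.astype(float))   # gradients along t, y, x
    mag = np.sqrt(gt**2 + gy**2 + gx**2)           # spatio-temporal magnitude
    ang = np.arctan2(gy, gx)                       # spatial orientation in [-pi, pi]
    ori = np.floor((ang + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    T, H, W = clip.shape
    ct, cy, cx = cells
    desc = []
    for i in range(ct):                            # pool a histogram per cell
        for j in range(cy):
            for k in range(cx):
                sl = (slice(i * T // ct, (i + 1) * T // ct),
                      slice(j * H // cy, (j + 1) * H // cy),
                      slice(k * W // cx, (k + 1) * W // cx))
                hist = np.bincount(ori[sl].ravel(),
                                   weights=mag[sl].ravel(),
                                   minlength=bins)
                desc.append(hist / (hist.sum() + 1e-8))  # L1-normalize per cell
    return np.concatenate(desc)                    # length = ct*cy*cx*bins
```

A linear SVM would then be trained on these fixed-length descriptors, one per video clip; the per-cell L1 normalization here stands in for the block normalization used by the full descriptor.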
