Human action recognition algorithm based on adaptive skeleton center (自适应骨骼中心的人体行为识别算法)

Authors: Ran Xianyu (冉宪宇); Liu Kai (刘凯); Li Guang (李光); Ding Wenwen (丁文文); Chen Bin (陈斌)
Affiliation: School of Computer Science and Technology, Xidian University (西安电子科技大学计算机学院)
Keywords: human action recognition; skeleton sequence; feature extraction; adaptive; renormalization
Journal code: ZGTB
Journal: Journal of Image and Graphics
Publication date: 2018-04-16
Publisher: 中国图象图形学报 (Journal of Image and Graphics)
Year: 2018
Issue: v.23; No.264
Funding: National Natural Science Foundation of China, General Program (61571345, 91538101, 61550110247)
Language: Chinese
Page: ZGTB201804006
Page count: 7
CN: 04
ISSN: 11-3758/TB
Classification number: 57-63

Abstract

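Objective Action recognition based on 3D skeletons has long been an active topic in computer vision and has produced many results in surveillance, video games, robotics, human-computer interaction, and health care. Most current action recognition algorithms use a fixed joint as the coordinate center, which lowers the recognition rate. To address this low accuracy, an adaptive skeleton center algorithm for human action recognition is proposed. Method The algorithm first obtains 3D skeleton sequences from a skeleton dataset and preprocesses them to obtain the original coordinate matrix of each action. It then extracts features from the original coordinate matrix, adaptively selects the coordinate center according to changes in the feature values, and renormalizes the original coordinate matrix. Finally, the action coordinate matrix is denoised with dynamic time warping (DTW), a Fourier temporal pyramid representation is used to reduce temporal misalignment and noise in the action coordinate matrix, and a support vector machine classifies the result. The algorithm is validated on two widely used datasets, UTKinect-Action and MSRAction3D. Result On the UTKinect-Action dataset, the recognition rate of the algorithm is 4.28% higher than that of the HOJ3D algorithm and 3.48% higher than that of the CRF algorithm. On the MSRAction3D dataset, it is 9.57% higher than that of the HOJ3D algorithm, 2.07% higher than that of the Profile HMM algorithm, and 6.17% higher than that of the EigenJoints algorithm. Conclusion This paper addresses the low recognition rate of current action recognition algorithms, identifies the use of a fixed joint as the coordinate center as the cause, and proposes an adaptive skeleton center action recognition algorithm. Simulation results show that the algorithm effectively improves the accuracy of human action recognition.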
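The renormalization step described in the Method can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the exact adaptive rule, so `choose_center` uses a hypothetical stand-in criterion (pick the candidate joint with the least frame-to-frame motion). All function names and the toy data are assumptions made for this example.

```python
from math import dist

def motion_energy(sequence, joint):
    """Total frame-to-frame displacement of one joint over the sequence."""
    return sum(dist(prev[joint], cur[joint])
               for prev, cur in zip(sequence, sequence[1:]))

def choose_center(sequence, candidates):
    """Hypothetical criterion: the candidate joint that moves least."""
    return min(candidates, key=lambda j: motion_energy(sequence, j))

def recenter(sequence, center):
    """Express every joint relative to the center joint of the same frame."""
    return [
        [tuple(c - o for c, o in zip(joint, frame[center])) for joint in frame]
        for frame in sequence
    ]

# Toy data: 2 frames, 3 joints, (x, y, z) each. Joint 0 is static.
sequence = [
    [(1.0, 1.0, 1.0), (2.0, 1.0, 1.0), (1.0, 3.0, 1.0)],
    [(1.0, 1.0, 1.0), (2.0, 2.0, 1.0), (1.0, 3.0, 2.0)],
]
center = choose_center(sequence, candidates=[0, 1, 2])
normalized = recenter(sequence, center)
```

With a fixed center, `recenter` would always receive the same joint index; the adaptive variant only changes how `center` is chosen for each action.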
Objective Human action recognition based on 3D skeletons has been a popular topic in computer vision. Its goal is to automatically segment, capture, and recognize human actions. Researchers have explored it since the 1960s, and over the past several decades it has been applied in surveillance, video games, robotics, human-human interaction, human-computer interaction, and health care. 3D data can be obtained in four ways: with a marker-based motion capture system, by reconstructing 3D information from 2D image sequences taken from multiple views, with range sensors, and from RGB videos. However, extracting data with a motion capture system or by reconstruction is inconvenient. Range sensors are expensive and difficult to use in a human environment, and they obtain data slowly and provide poor distance estimates. RGB images usually provide only the appearance of the objects in the scene. Given this limited information, certain problems, such as separating a foreground and background with similar colors and textures, are difficult if not impossible to solve. Moreover, RGB data are highly sensitive to factors such as illumination, viewpoint, occlusion, clutter, and dataset diversity, so RGB video data alone cannot capture all the information that is needed. The rapid development in recent years of depth sensors, such as the Microsoft Kinect, has provided not only color image data but also 3D depth image information. 3D depth images record the distance between object and body and therefore carry considerable information. Real-time skeletal tracking techniques and support vector machines can recognize various postures and extract key information. Computer vision algorithms based on 3D skeletons have therefore attracted significant attention in the last few years.
Many researchers have studied skeleton-based algorithms and have made numerous contributions. Current action recognition algorithms select a fixed joint as the coordinate center, which leads to a low recognition rate. An adaptive skeleton center algorithm for human action recognition is proposed to solve this low-accuracy problem. Method In the algorithm, frames of skeleton action sequences are loaded from a human action dataset, redundant frames are removed, and the original coordinate matrix is obtained by preprocessing the sequences. Rigid vector and joint angle features are extracted from the original coordinate matrix. An adaptive value is determined from the changes in the rigid vector and joint angle values. The coordinate center is selected adaptively according to this value and is used to renormalize the original matrix. The action coordinate matrix is denoised by using dynamic time warping (DTW). The Fourier temporal pyramid method is used to reduce temporal misalignment and noise in the action coordinate matrix. The matrix is then classified by using a support vector machine. Result Compared with existing algorithms, such as histogram of 3D joints (HOJ3D), conditional random field (CRF), EigenJoints, profile hidden Markov model (HMM), relation matrix of 3D rigid bodies + principal geodesic distance, and actionlet algorithms, the proposed algorithm performs better on different datasets. On the UTKinect dataset, the action recognition rate of the proposed algorithm is 4.28% higher than that of the HOJ3D algorithm and 3.48% higher than that of the CRF algorithm. On the MSRAction3D dataset, the action recognition rate of the proposed algorithm is 9.57% higher than that of the HOJ3D algorithm, 2.07% higher than that of the profile HMM algorithm, and 6.17% higher than that of the EigenJoints algorithm.
Action Set (AS) 1, AS2, and AS3 are subsets of the MSRAction3D dataset. The action recognition rate of the proposed algorithm is lower than that of the other algorithms on AS2 but high on AS1 and AS3. Conclusion The proposed algorithm addresses the low accuracy of existing action recognition algorithms, whose cause is identified as the use of a fixed joint as the coordinate center. Simulation results show that the proposed algorithm can effectively improve the accuracy of human action recognition, and its action recognition rate is higher than those of existing algorithms. On the UTKinect dataset, the recognition rate of the proposed algorithm is at least 3% higher than those of other algorithms, and the single-action recognition rate is as high as 90%. On the MSRAction3D dataset, the proposed algorithm shows advantages on AS1 and AS3, but its recognition rate on AS2 is not ideal, particularly for upper-limb actions. The algorithm therefore needs further improvement. It is generally effective for single-action recognition. The next research direction is complex action recognition.
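The temporal alignment step named in the Method can be illustrated with the textbook dynamic time warping recurrence. This sketch assumes that the step refers to standard DTW with Euclidean frame distance; it is not the authors' code, and `dtw_distance` and the toy sequences are hypothetical names and data chosen for the example.

```python
from math import dist, inf

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW cost between two sequences of vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# The same 1D motion performed slowly and quickly, plus a shifted variant.
slow = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
fast = [(0.0,), (1.0,), (2.0,)]
shifted = [(1.0,), (2.0,), (3.0,)]

same_motion_cost = dtw_distance(slow, fast)
shifted_cost = dtw_distance(fast, shifted)
```

Here `slow` and `fast` differ only in execution speed, so their DTW cost is zero, while `shifted` differs in value and keeps a nonzero cost. This is the property that makes DTW useful for comparing actions performed at different rates.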

References

[1] Rahmani H, Mian A, Shah M. Learning a deep model for human action recognition from novel viewpoints[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017. [DOI:10.1109/TPAMI.2017.2691768] (in press)
[2] Lu T W, Peng L, Miao S J. Human action recognition of hidden Markov model based on depth information[C]//The 15th International Symposium on Parallel and Distributed Computing. Piscataway: IEEE, 2016: 354-357. [DOI:10.1109/ISPDC.2016.58]
[3] Xia L, Chen C C, Aggarwal J K. View invariant human action recognition using histograms of 3D joints[C]//Proceedings of 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Providence, RI: IEEE, 2012: 20-27. [DOI:10.1109/CVPRW.2012.6239233]
[4] Han L, Xu X X, Liang W, et al. Discriminative human action recognition in the learned hierarchical manifold space[J]. Image and Vision Computing, 2010, 28(5): 836-849. [DOI:10.1016/j.imavis.2009.08.003]
[5] Yang X D, Tian Y L. Effective 3D action recognition using EigenJoints[J]. Journal of Visual Communication and Image Representation, 2014, 25(1): 2-11. [DOI:10.1016/j.jvcir.2013.03.001]
[6] Vemulapalli R, Arrate F, Chellappa R. Human action recognition by representing 3D skeletons as points in a Lie group[C]//Proceedings of the Conference on Computer Vision and Pattern Recognition. Columbus, OH: IEEE, 2014: 588-595. [DOI:10.1109/CVPR.2014.82]
[7] Ding W W, Liu K, Li G, et al. Human action recognition using spectral embedding to similarity degree between postures[C]//Proceedings of 2016 Visual Communications and Image Processing. Chengdu: IEEE, 2016: 1-4. [DOI:10.1109/VCIP.2016.7805441]
[8] Müller M. Information Retrieval for Music and Motion[M]. Berlin, Heidelberg: Springer-Verlag, 2007: 2-5.
[9] Wang J, Liu Z C, Wu Y, et al. Mining actionlet ensemble for action recognition with depth cameras[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI: IEEE, 2012: 1290-1297. [DOI:10.1109/CVPR.2012.6247813]
[10] Ding W W, Liu K, Fu X J, et al. Profile HMMs for skeleton-based human action recognition[J]. Signal Processing: Image Communication, 2016, 42: 109-119. [DOI:10.1016/j.image.2016.01.010]
