Research on Multimedia Teaching Gesture Recognition Based on Long Short-Term Memory Networks

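English title: Multimedia teaching gesture recognition based on long short-time memory network
Authors: 秦敏莹; 肖秦琨
Authors (English): Qin Minying; Xiao Qinkun; Xi'an University of Technology, Electronic and Information Engineering
Keywords: gesture recognition; recurrent neural network; long short-term memory network; trajectory features
Keywords (English): sign language recognition; recurrent neural network; long short-time memory; skeleton joint trajectory
Journal title (Chinese): GWCL
Journal title (English): Foreign Electronic Measurement Technology
Institution: 西安工业大学未央校区 (Xi'an Technological University, Weiyang Campus)
Publication date: 2019-06-15
Publisher: 国外电子测量技术 (Foreign Electronic Measurement Technology)
Year: 2019
Issue: v.38; No.295
Funding: National Natural Science Foundation of China (61671362, 61271362); Natural Science Foundation of Shaanxi Province (2017JM6041)
Language: Chinese
Pages: GWCL201906016
Page count: 6
CN: 06
ISSN: 11-2268/TN
Classification number: 86-91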

Abstract

Gesture recognition research aims to explore more harmonious and natural modes of human-computer interaction and to promote the development and application of artificial intelligence technology; it has significant research value and broad social impact. Existing gesture recognition algorithms suffer from weak robustness, poor real-time performance, and low recognition rates, and traditional approaches rely on hand-designed hand-shape features that are limited in variety and on cumbersome temporal modeling. Observing that most gesture vocabulary is expressed through motion trajectories, this paper uses the skeleton-joint stream coordinates of the motion trajectory as training-set and test-set features and proposes a sequence-to-sequence multimedia teaching gesture recognition algorithm based on the long short-term memory network (LSTM), a variant of the recurrent neural network (RNN). A Kinect 2.0 sensor was used to collect skeleton-stream coordinate data for six common multimedia teaching gestures to build a small dataset, on which the experiments were run. The experiments compare the method's average recognition rate on the six gestures against the traditional dynamic time warping (DTW) algorithm and demonstrate its effectiveness; the method can be applied to multimedia teaching and related fields.
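For orientation, the DTW baseline that the abstract compares against can be sketched as nearest-template classification over joint trajectories. This is a minimal illustration and not the paper's implementation: the function names, the gesture labels, and the 2-D toy coordinates are all assumptions made for the example (a Kinect 2.0 skeleton stream would supply 3-D coordinates for many joints per frame), and the proposed LSTM method itself is not reproduced here.

```python
import math


def point_dist(p, q):
    """Euclidean distance between two joint-coordinate vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two trajectories (lists of points)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = point_dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(trajectory, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(trajectory, templates[label]))


# Hypothetical single-joint 2-D templates; labels are illustrative only.
templates = {
    "swipe_right": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    "swipe_up": [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)],
}

# A slower, slightly noisy rightward swipe with repeated frames.
query = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.1), (1.0, 0.1), (2.0, 0.0), (3.0, 0.0)]

print(classify(query, templates))
```

Because DTW warps the time axis, the repeated frames in `query` do not penalize the match. The cost is that every query must be aligned against every stored template, which is one reason a learned sequence model such as an LSTM can be attractive for real-time use.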

