Video-based facial expression recognition using multimodal deep convolutional neural networks
  • Authors: PAN Xian-zhang (潘仙张); ZHANG Shi-qing (张石清); GUO Wen-ping (郭文平)
  • Affiliation: Institute of Intelligent Information Processing, Taizhou University
  • Keywords: deep convolutional neural network; multimodal deep learning; facial expression recognition; temporal-spatial features; Deep Belief Network (DBN)
  • Journal: 光学精密工程 (Optics and Precision Engineering)
  • Publication date: 2019-04-15
  • Year: 2019
  • Volume: 27
  • Issue: 04
  • Pages: 230-237 (8 pages)
  • CN: 22-1198/TH
  • Record ID: GXJM201904023
  • Funding: Zhejiang Provincial Public Welfare Technology Research Program (No. LGF19F020009); Zhejiang Provincial Natural Science Foundation (No. LY14F020036, No. LY16F020011); National Natural Science Foundation of China (No. 61203257)
  • Language: Chinese
Abstract
Recognizing facial expressions in video sequences is challenging because hand-crafted features correlate only weakly with subjective emotions. To overcome this limitation and improve the performance of facial expression recognition in videos, the proposed method employs two deep convolutional neural networks (DCNNs), a spatial CNN and a temporal CNN, to learn temporal-spatial expression features from videos. The spatial CNN extracts spatial features from the static expression image in each video frame, whereas the temporal CNN extracts dynamic features from the optical flow computed across multiple frames of the video. The temporal-spatial features learned by the two networks are then fused at the feature level with a Deep Belief Network (DBN), and the fused representation is fed to a support vector machine (SVM) for facial expression classification. On the public RML and BAUM-1s video emotion datasets, the method achieves recognition accuracies of 71.06% and 52.18%, respectively, clearly outperforming results reported in the existing literature. The multimodal DCNN approach therefore improves the performance of facial expression recognition in videos.
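To make the pipeline described in the abstract concrete, the following is a minimal sketch of a two-stream (spatial + temporal) feature extractor with feature-level fusion and an SVM classifier. It assumes PyTorch, torchvision, and scikit-learn; the ResNet-18 backbones, feature dimensions, flow-stack depth, and the small fully connected fusion block (standing in for the paper's Deep Belief Network) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative two-stream pipeline: spatial CNN on RGB frames, temporal CNN on
# stacked optical flow, feature-level fusion, then SVM classification.
# Layer sizes, backbones, and helper names are assumptions for illustration only.
import torch
import torch.nn as nn
import torchvision.models as models
from sklearn.svm import SVC

NUM_CLASSES = 6          # e.g., six basic emotion categories (assumption)
FLOW_STACK = 10          # number of stacked optical-flow frames (assumption)

class SpatialStream(nn.Module):
    """CNN over a single RGB frame; outputs a fixed-length spatial feature."""
    def __init__(self, feat_dim=256):
        super().__init__()
        backbone = models.resnet18(weights=None)       # stand-in backbone
        backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)
        self.net = backbone

    def forward(self, rgb):                            # rgb: (N, 3, 224, 224)
        return self.net(rgb)

class TemporalStream(nn.Module):
    """CNN over stacked optical-flow fields (2 channels per flow frame)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Replace the first conv so it accepts 2 * FLOW_STACK flow channels.
        backbone.conv1 = nn.Conv2d(2 * FLOW_STACK, 64, kernel_size=7,
                                   stride=2, padding=3, bias=False)
        backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)
        self.net = backbone

    def forward(self, flow):                           # flow: (N, 2*FLOW_STACK, 224, 224)
        return self.net(flow)

class FusionNet(nn.Module):
    """Feature-level fusion of the two streams.  The paper fuses with a Deep
    Belief Network; a small fully connected block is used here as a stand-in."""
    def __init__(self, feat_dim=256, fused_dim=128):
        super().__init__()
        self.spatial = SpatialStream(feat_dim)
        self.temporal = TemporalStream(feat_dim)
        self.fuse = nn.Sequential(
            nn.Linear(2 * feat_dim, fused_dim), nn.ReLU(),
            nn.Linear(fused_dim, fused_dim), nn.ReLU(),
        )

    def forward(self, rgb, flow):
        feats = torch.cat([self.spatial(rgb), self.temporal(flow)], dim=1)
        return self.fuse(feats)                        # fused feature for the SVM

if __name__ == "__main__":
    model = FusionNet().eval()
    # Dummy batch standing in for preprocessed video frames and flow fields.
    rgb = torch.randn(8, 3, 224, 224)
    flow = torch.randn(8, 2 * FLOW_STACK, 224, 224)
    labels = torch.arange(8) % NUM_CLASSES             # dummy emotion labels
    with torch.no_grad():
        fused = model(rgb, flow).numpy()
    # Final classification with a linear SVM on the fused features.
    svm = SVC(kernel="linear").fit(fused, labels.numpy())
    print("Predicted classes:", svm.predict(fused))
```

In practice the two streams would be trained (or fine-tuned) on the expression data first, and the SVM fitted on the fused features extracted from the training videos.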