Speech emotion recognition based on variational mode decomposition
  • Authors: WANG Weiwei; ZHANG Xiuzai
  • Affiliations: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology; Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET)
  • Keywords: variational mode decomposition; Mel frequency cepstral coefficients; Hilbert marginal spectrum; extreme learning machine
  • Journal: Journal of Applied Acoustics (应用声学; CNKI code YYSN)
  • Publication date: 2019-03-12
  • Year: 2019; Volume: 38; Issue: 02; Pages: 91-98 (8 pages)
  • CN: 11-2121/O4
  • Record number: YYSN201902014
  • Funding: Natural Science Foundation of Jiangsu Province Youth Program (BK20141004); National Natural Science Foundation of China Youth Program (11504176, 61601230); Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions
  • Language: Chinese
Abstract
To address the poor performance of traditional speech emotion feature parameters in emotion classification, this paper proposes a speech emotion recognition method based on variational mode decomposition (VMD). The emotional speech signal is first decomposed by VMD into intrinsic mode functions (IMFs), the selected dominant IMFs are then re-aggregated, and the Mel frequency cepstral coefficients (MFCC) and the Hilbert marginal spectrum of each IMF are extracted. To verify the performance of the proposed features, two speech databases (EMODB and RAVDESS) are used in the experiments: after the features are extracted with the proposed method, an extreme learning machine (ELM) performs speech emotion classification and recognition. The experimental results show that, compared with emotion features based on empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD), the proposed features achieve better recognition performance, which verifies the practicability of the method.
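The pipeline summarized in the abstract can be illustrated with a short sketch. The Python code below is a minimal, illustrative reconstruction only: it assumes the third-party vmdpy package for VMD, librosa for MFCC extraction, and scipy for the Hilbert transform, and it uses an energy-based rule to pick the dominant IMFs together with placeholder parameter values (number of modes, bandwidth penalty alpha, MFCC order, spectrum resolution), none of which are taken from the paper.

```python
# Minimal sketch of the VMD-based feature pipeline (assumptions noted above).
import numpy as np
import librosa
from scipy.signal import hilbert
from vmdpy import VMD  # third-party VMD implementation (assumed)


def vmd_features(signal, sr, n_modes=5, alpha=2000, n_mfcc=13, n_bins=64):
    """Return an MFCC + Hilbert-marginal-spectrum feature vector."""
    # 1) Variational mode decomposition into K intrinsic mode functions.
    #    tau=0 (no noise slack), DC=0, init=1, tol=1e-7 are common defaults.
    imfs, _, _ = VMD(signal, alpha, 0, n_modes, 0, 1, 1e-7)

    # 2) Select dominant IMFs (here: the modes carrying most of the energy;
    #    the abstract does not state the actual selection rule).
    energy = np.sum(imfs ** 2, axis=1)
    dominant = imfs[energy >= 0.1 * energy.max()]

    # 3) Re-aggregate the dominant IMFs and extract MFCCs from the result.
    reconstructed = dominant.sum(axis=0).astype(np.float32)
    mfcc = librosa.feature.mfcc(y=reconstructed, sr=sr, n_mfcc=n_mfcc)
    mfcc_feat = mfcc.mean(axis=1)  # average over frames

    # 4) Hilbert marginal spectrum of each IMF: accumulate the instantaneous
    #    amplitude over time on a fixed instantaneous-frequency grid.
    hms = []
    for imf in imfs:
        analytic = hilbert(imf)
        amp = np.abs(analytic)[:-1]
        inst_freq = np.diff(np.unwrap(np.angle(analytic))) * sr / (2 * np.pi)
        spec, _ = np.histogram(inst_freq, bins=n_bins,
                               range=(0, sr / 2), weights=amp)
        hms.append(spec)
    hms_feat = np.concatenate(hms)

    return np.concatenate([mfcc_feat, hms_feat])
```

The feature vectors produced this way would then be fed to an extreme learning machine. Since no standard scikit-learn estimator provides ELM, a compact version (random hidden layer, sigmoid activation, least-squares output weights) is sketched below; the hidden-layer size is again an arbitrary placeholder rather than the paper's setting.

```python
# Compact extreme learning machine classifier: fixed random hidden layer,
# output weights solved by a Moore-Penrose pseudo-inverse.
import numpy as np


class ELMClassifier:
    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # sigmoid

    def fit(self, X, y):
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        T = np.eye(len(self.classes_))[y_idx]           # one-hot targets
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T               # least-squares readout
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]
```

A typical use would stack the vectors from vmd_features into a matrix X with emotion labels y, then call ELMClassifier().fit(X_train, y_train).predict(X_test); the single least-squares solve is what makes ELM training fast compared with gradient-based classifiers.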
