Research on Speech Emotion Feature Extraction Methods and Emotion Recognition
Abstract
In current research on speech emotion recognition, a wide variety of emotion features and recognition methods are in use, and because different studies employ different emotional speech databases, their results are not directly comparable. This makes it difficult to judge features and modeling methods objectively, in particular the relative merits of static models built on global features versus dynamic models built on short-time features. For speech signals conveying four emotions (happiness, anger, sadness, and neutrality), this thesis analyzes and selects speech features that reflect emotional variation, and conducts emotion recognition experiments on an emotional speech database recorded by our project group. The main contributions are as follows:
    1. An emotional speech database was recorded. The recording scripts were selected from the standard TIMIT English speech corpus. Each speaker read 25 sentences in each of the four emotions (happiness, anger, sadness, and neutrality), yielding 4600 utterances from 46 speakers (46 × 4 × 25). Through a subjective emotion perception experiment, the 800 utterances of the 8 speakers with the best emotional expression were selected for the emotion analysis and recognition experiments.
    2. Based on this database, the variation patterns of the fundamental frequency (pitch), spectral information, speech rate, and other features were observed and analyzed across the four emotional states. A 23-dimensional set of global features with emotional discriminative power was selected and defined, including pitch statistics, formants, speech rate, and average energy; in addition to the usual global pitch statistics, features related to the rising and falling slopes at the start of the pitch contour were defined.
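The global pitch statistics above can be sketched as follows. This is an illustrative reconstruction in Python/NumPy (the thesis does not publish its code); it assumes an F0 contour has already been extracted, with unvoiced frames marked by zeros, and the slope features are only loosely analogous to the thesis's rising/falling-slope features:

```python
import numpy as np

def global_pitch_features(f0):
    """Illustrative global pitch statistics for one utterance.

    f0: 1-D array of frame-wise fundamental frequency in Hz,
        with 0.0 marking unvoiced frames (an assumed convention).
    """
    voiced = f0[f0 > 0]
    diffs = np.diff(voiced)            # frame-to-frame F0 change (Hz/frame)
    rising = diffs[diffs > 0]
    falling = diffs[diffs < 0]
    return {
        "f0_mean": voiced.mean(),
        "f0_max": voiced.max(),
        "f0_min": voiced.min(),
        "f0_range": voiced.max() - voiced.min(),
        "f0_std": voiced.std(),
        # crude stand-ins for the thesis's rising/falling slope features
        "rise_slope_mean": rising.mean() if rising.size else 0.0,
        "fall_slope_mean": falling.mean() if falling.size else 0.0,
    }

contour = np.array([0, 0, 180, 190, 205, 200, 185, 0, 170, 175, 0], float)
feats = global_pitch_features(contour)
```

Note that concatenating voiced frames across an unvoiced gap, as done here, introduces one spurious difference at the gap; a fuller implementation would compute slopes per voiced segment.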
    3. The parameter training and recognition algorithms of the Gaussian mixture model (GMM) were studied, and GMM-based speech emotion recognition experiments were conducted with the global emotion features. The results show that when only the 12 pitch-related features are used, sadness and neutrality are recognized with high accuracy, while happiness and anger are easily confused with each other. After formants, speech rate, and average energy are added, the recognition rates of all four emotions improve, because speech rate and average energy are discriminative across the four emotions, while formants help distinguish happiness from anger.
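The GMM scheme in point 3 amounts to training one mixture model per emotion and classifying an utterance's global feature vector by maximum likelihood. A minimal sketch with scikit-learn, not the thesis's implementation: synthetic feature vectors stand in for the 23-dimensional global features, and the class means, dimensions, and mixture sizes here are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
emotions = ["happy", "angry", "sad", "neutral"]

# Synthetic stand-ins for per-emotion global feature vectors
# (the thesis uses 23 dims; 4 dims keep the demo small).
train = {e: rng.normal(loc=i * 3.0, scale=1.0, size=(200, 4))
         for i, e in enumerate(emotions)}

# One GMM per emotion, trained only on that emotion's utterances.
models = {e: GaussianMixture(n_components=2, random_state=0).fit(x)
          for e, x in train.items()}

def recognize(feature_vec):
    """Pick the emotion whose GMM assigns the highest log-likelihood."""
    scores = {e: m.score(feature_vec.reshape(1, -1))
              for e, m in models.items()}
    return max(scores, key=scores.get)
```

At test time, `recognize` scores an unseen global feature vector under all four models and returns the maximum-likelihood emotion.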
    4. The parameter training and recognition algorithms of the hidden Markov model (HMM) were studied. HMM-based emotion recognition experiments were conducted with Mel-filterbank cepstral coefficients (MFCC) and with a set of short-time features including short-time energy, formants, and sub-band energies. The results show that MFCCs are not well suited to speech emotion recognition, whereas adding sub-band energy, pitch, and related features raised the average recognition rate by 29.55%.
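HMM-based recognition, as in point 4, scores a frame sequence under each emotion's model and picks the highest likelihood. A minimal NumPy sketch of the scaled forward algorithm for a discrete-observation HMM follows; the thesis models continuous short-time feature vectors, but discrete symbols keep the likelihood computation visible, and all parameter values below are made up for illustration.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of an observation sequence under a discrete HMM.

    obs: sequence of symbol indices; pi: initial state probs (N,);
    A: state transition matrix (N, N); B: emission matrix (N, M).
    Uses per-step scaling to avoid numerical underflow.
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    log_p = np.log(c)
    alpha /= c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # predict, then weight by emission
        c = alpha.sum()
        log_p += np.log(c)
        alpha /= c
    return log_p

# Two toy 2-state models: "sad" favors symbol 0, "happy" favors symbol 1.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B_sad = np.array([[0.9, 0.1], [0.8, 0.2]])
B_happy = np.array([[0.2, 0.8], [0.1, 0.9]])

obs = [0, 0, 1, 0]                      # mostly symbol 0
models = {"sad": B_sad, "happy": B_happy}
best = max(models, key=lambda e: forward_loglik(obs, pi, A, models[e]))
```

A real system would instead use Gaussian-mixture emissions over the short-time feature vectors and train the per-emotion HMMs with Baum-Welch, but the maximum-likelihood decision rule is the same.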
    5. The GMM-based and HMM-based speech emotion recognition results were compared. The analysis shows that, for speech emotion recognition, a static model built on global features and a dynamic model of the emotional variation process built on short-time features achieve essentially comparable recognition rates; what matters is the physical meaning of the features used.
