Research on Trainable Speech Synthesis Based on the PAD Emotional Model
  • English title:Trainable Emotional Speech Synthesis Based on PAD
  • Authors:陈雁翔; 龙润田
  • Authors (English):CHEN Yan-Xiang; LONG Run-Tian; School of Computer & Information, Hefei University of Technology; Computer Science and Technology Postdoctoral Research Station, Hefei University of Technology; Institute of Linguistics, Shanghai Normal University
  • Keywords:PAD emotional model; trainable speech synthesis; emotion quantification; parameter correction; emotional features
  • English keywords:PAD Emotional Model; Trainable Speech Synthesis; Emotional Quantification; Parameter Calibration; Emotional Characteristic
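  • Chinese journal name:MSSB
  • English journal name:Pattern Recognition and Artificial Intelligence
  • Affiliations:School of Computer and Information, Hefei University of Technology; Postdoctoral Research Station of Computer Science and Technology, Hefei University of Technology; Institute of Linguistics, Shanghai Normal University
  • Publication date:2013-11-15
  • Publisher:模式识别与人工智能 (Pattern Recognition and Artificial Intelligence)
  • Year:2013
  • Issue:v.26; No.125
  • Funding:Supported by the National Natural Science Foundation of China (No.61105076); the 51st batch of the China Postdoctoral Science Foundation (No.2012M511402); the Natural Science Foundation of Anhui Province (No.11040606M127); the Anhui Province Speech Industry Science and Technology Innovation Special Project (No.11010202192)
  • Language:Chinese
  • Pages:MSSB201311004
  • Page count:7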
  • CN:11
  • ISSN:34-1089/TP
  • Classification code:29-35
Abstract
Emotional speech synthesis is an active research topic in affective computing and speech signal processing, and accurate analysis of speech emotion is a prerequisite for synthesizing high-quality emotional speech. This paper uses the PAD emotional model as a quantitative model for emotion analysis, building a three-dimensional emotional space in which the speech in an emotional corpus is analyzed and clustered to obtain a PAD parameter model for each emotion. Emotional speech synthesized by an HMM speech synthesis system is then corrected using the PAD model so that the emotional parameters of the synthesized speech are more accurate, which improves the quality of emotional speech synthesis. Experiments show that the method improves the naturalness and emotional clarity of the synthesized speech, and that it also performs well across different speakers of the same gender (male speakers).
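The idea of describing each emotion by a point in PAD (pleasure, arousal, dominance) space can be illustrated with a minimal sketch. This record does not give the paper's clustering or correction procedure, so everything below is an assumption made for illustration: the function names `pad_centroids` and `nearest_emotion`, the use of a per-emotion mean as the "PAD parameter model", the Euclidean distance, the [-1, 1] rating scale, and the sample ratings are all hypothetical and are not taken from the paper.

```python
import math


def pad_centroids(rated):
    """Average the (P, A, D) ratings of each emotion label into one centroid."""
    centroids = {}
    for label, points in rated.items():
        n = len(points)
        centroids[label] = tuple(sum(p[i] for p in points) / n for i in range(3))
    return centroids


def nearest_emotion(pad, centroids):
    """Return the label whose centroid is closest to `pad` (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(pad, centroids[label]))


# Hypothetical ratings on an assumed [-1, 1] scale; not data from the paper.
rated = {
    "happy": [(0.6, 0.4, 0.2), (0.8, 0.6, 0.4)],
    "sad": [(-0.6, -0.4, -0.4), (-0.4, -0.2, -0.2)],
}
centroids = pad_centroids(rated)
label = nearest_emotion((0.5, 0.3, 0.1), centroids)
```

In a sketch like this, a synthesized utterance whose PAD estimate falls nearest to an unintended emotion's centroid would be a candidate for parameter correction; how the paper performs that correction is not stated here.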
