Research on the Naturalness of Speech Synthesis (语音合成自然度的研究)

English title: The Research of Speech Synthesis Naturalness Based on Computer
Author: Lü Peng (吕鹏)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords: speech synthesis; naturalness; PSOLA algorithm; voiced/unvoiced decision; short-time autocorrelation function
Degree year: 2010
Advisor: Liu Qiyue (刘齐跃)
Discipline code: 081001
Degree-granting institution: Hebei University of Science and Technology (河北科技大学)
Thesis submission date: 2010-03-10

Abstract

Research on speech processing has produced many results as society has advanced. The intelligibility of synthesized speech in particular has reached a high level, but its naturalness still falls short of what listeners expect, and this gap seriously hinders the further development of speech synthesis technology.
     This thesis addresses the low naturalness of current synthesized speech and proposes an improvement method. Taking synthesis from a self-recorded speech corpus as the example, it uses waveform concatenation to improve naturalness and verifies the improvement with subjective and objective evaluation. The main contents are as follows:
     1) Starting from the basic elements of phonetics, the thesis analyzes the basic elements of speech synthesis, studies issues that affect the naturalness of synthesized speech, and from this analyzes the relationship between speech synthesis and related fields such as speech recognition.
     2) A speech corpus is built with the syllable as the unit. Silent segments are processed to remove the overly long pauses that impair the joining of speech signals, and the parts not needed for synthesis are identified. The TD-PSOLA and FD-PSOLA waveform-concatenation methods are then used to adjust the duration and the frequency of the speech, respectively, so that prosody control comes closer to natural pronunciation. Comparing the prosodic parameters, both by listening and through plots, shows the differences before and after synthesis and relative to natural speech, from which the degree of improvement in naturalness is assessed.
     3) Finally, a simulation experiment is carried out on the speech synthesis naturalness system. After simulation the naturalness of the speech improves to a certain degree, and the synthesis results are assessed with subjective and objective methods, with results the author describes as very satisfactory. The research provides a good foundation and approach for further work on the naturalness of synthesized speech.
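
The abstract and keywords name silent-segment handling, voiced/unvoiced decision, and the short-time autocorrelation function, but this record gives no algorithmic detail. As a rough illustration only, the following Python sketch labels a frame as silence, voiced, or unvoiced using short-time energy and the normalized short-time autocorrelation. The function names, the thresholds (`energy_floor`, `voicing_threshold`), and the F0 search range are all assumptions made for this example and are not taken from the thesis.

```python
import math
import random


def frame_signal(x, frame_len, hop):
    """Split x into frames of frame_len samples (a trailing partial frame is dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def autocorr_peak(frame, min_lag, max_lag):
    """Return (lag, value) of the largest normalized autocorrelation in [min_lag, max_lag]."""
    r0 = sum(s * s for s in frame)
    if r0 == 0.0:
        return 0, 0.0
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


def classify_frame(frame, fs, energy_floor=1e-4, voicing_threshold=0.5,
                   f0_min=60.0, f0_max=400.0):
    """Label a frame 'silence', 'voiced', or 'unvoiced'; also return F0 in Hz if voiced.

    All default values are illustrative assumptions, not values from the thesis.
    """
    if short_time_energy(frame) < energy_floor:
        return "silence", None
    min_lag = max(1, int(fs / f0_max))
    max_lag = min(int(fs / f0_min), len(frame) - 1)
    lag, r = autocorr_peak(frame, min_lag, max_lag)
    if r >= voicing_threshold:
        return "voiced", fs / lag
    return "unvoiced", None


if __name__ == "__main__":
    fs = 8000
    rng = random.Random(0)
    silence = [0.0] * 320
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(320)]
    noise = [rng.gauss(0.0, 0.1) for _ in range(320)]
    for frame in frame_signal(silence + tone + noise, 320, 320):
        print(classify_frame(frame, fs))
```

In a pipeline like the one the abstract outlines, frames labeled as silence could be shortened before concatenation, and the F0 estimates of voiced frames could guide the placement of pitch marks for PSOLA. How the thesis itself performs these steps is not stated in this record.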

