Research on Speech Problems Based on the Prototype Waveform Interpolation Algorithm (基于原型波形内插算法的语音问题的研究)

English title: Study of Speech Based on Prototype Waveform Interpolation
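Author: Shi Xuejing (史学晶)
Thesis level: Master's
Discipline: Detection Technology and Automatic Equipment
Chinese keywords: 原型波形内插 ; 语音编码 ; 语音合成 ; 声调调整
English keywords: prototype waveform interpolation ; speech coding ; speech synthesis ; tone modification
Degree year: 2004
Advisor: Zhao Shuqing (赵淑清)
Discipline code: 081102
Degree-granting institution: Beijing University of Chemical Technology
Thesis submission date: 2004-04-15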

Abstract

This thesis studies speech coding based on the prototype waveform interpolation (PWI) algorithm and the application of the same algorithm to tone modification in Chinese speech synthesis. PWI was first proposed by Dr. W. B. Kleijn of AT&T Bell Laboratories in the United States. The algorithm exploits the periodicity of voiced speech: voiced speech is treated as a concatenation of slowly evolving pitch-cycle waveforms. A single pitch-cycle waveform, referred to as the prototype waveform, is extracted at regular intervals of 20 to 30 ms, and the speech signal is reconstructed by interpolating between these update points.
    The thesis systematically presents the basic principle of PWI and its implementation. Then, building on a study of the Regular Pulse Excitation-Long Term Prediction (RPE-LTP) speech coding scheme (13 kb/s), it uses prototype waveform interpolation to propose a 4.8 kb/s coding scheme for voiced speech, which greatly reduces the coding rate. Computer simulations show that the quality of the coded speech is comparable to that of the GSM coding scheme.
    In addition, the thesis studies the application of PWI to speech synthesis, in particular to tone modification. The traditional time-domain pitch-synchronous overlap-add (PSOLA) algorithm has good prosody modification capability, but it also has a shortcoming: when the pitch frequency is modified by a large amount, severe spectral envelope distortion can occur, that is, the formant characteristics undergo unacceptable variation. The thesis combines PWI with PSOLA to mitigate this defect and improve the result of tone modification.
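
The interpolation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the coder developed in the thesis: it assumes two prototype cycles have already been extracted at consecutive update points, interpolates both the waveform shape and the pitch period linearly, and leaves out the linear prediction, prototype alignment, and quantization stages that practical PWI coders include. The names `resample_cycle`, `pwi_reconstruct`, and `n_phase` are hypothetical and chosen only for this example.

```python
import numpy as np


def resample_cycle(proto, n):
    """Resample one pitch cycle to n samples, treating it as periodic."""
    proto = np.asarray(proto, dtype=float)
    src = np.arange(len(proto)) / len(proto)  # phase of each source sample in [0, 1)
    dst = np.arange(n) / n                    # phase of each target sample in [0, 1)
    return np.interp(dst, src, proto, period=1.0)


def pwi_reconstruct(proto_a, proto_b, n_samples, n_phase=256):
    """Rebuild n_samples of signal between two update points.

    proto_a and proto_b are single pitch cycles (possibly of different
    lengths) taken at the start and end of the update interval.
    """
    # Put both prototypes on a common phase grid so they can be blended.
    a = resample_cycle(proto_a, n_phase)
    b = resample_cycle(proto_b, n_phase)
    grid = np.arange(n_phase) / n_phase

    # Blend weight: 0 at the first update point, approaching 1 at the second.
    alpha = np.arange(n_samples) / n_samples

    # Linearly interpolated pitch period (in samples) and accumulated phase.
    period = (1.0 - alpha) * len(proto_a) + alpha * len(proto_b)
    phase = np.concatenate(([0.0], np.cumsum(1.0 / period)[:-1])) % 1.0

    # Read both prototypes at the current phase and cross-fade between them.
    wave_a = np.interp(phase, grid, a, period=1.0)
    wave_b = np.interp(phase, grid, b, period=1.0)
    return (1.0 - alpha) * wave_a + alpha * wave_b


if __name__ == "__main__":
    # Assumed 8 kHz sampling: a 20 ms update interval is 160 samples.
    start = np.sin(2 * np.pi * np.arange(80) / 80)  # 80-sample cycle (100 Hz)
    end = np.sin(2 * np.pi * np.arange(64) / 64)    # 64-sample cycle (125 Hz)
    segment = pwi_reconstruct(start, end, 160)
    print(segment.shape)
```

Reading both prototypes at a shared, accumulated phase is what lets the pitch period drift smoothly across the interval instead of jumping at the update point.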

