Research on Pitch Period Detection Algorithms and Their Application in Speech Synthesis

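English title: Study of Speech Pitch Period Detection Algorithm and the Application in Speech Synthesis System
Author: Li Juan (李娟)
Degree level: Master's
Discipline: Signal and Information Processing
Chinese keywords: 基音周期检测; 小波变换; Hilbert-Huang变换; 语音合成
English keywords: pitch period detection; wavelet transform; Hilbert-Huang transform; speech synthesis
Degree year: 2008
Supervisor: Zhang Xueying (张雪英)
Discipline code: 081002
Degree-granting institution: Taiyuan University of Technology
Thesis submission date: 2008-05-01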

Abstract

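The pitch period of a speech signal is one of the key parameters describing the excitation source. Detecting it accurately is important for high-quality speech analysis and synthesis, speech compression coding, and speech recognition. This thesis discusses several common pitch period detection methods together with the wavelet transform and the Hilbert-Huang transform, proposes a noise-robust pitch period detection algorithm that combines an autocorrelation energy function with a magnitude difference energy function, and applies the Hilbert-Huang transform to pitch marking in a TD-PSOLA speech synthesis system.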
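     The thesis first reviews several common pitch period detection methods: the autocorrelation function (ACF) method, the average magnitude difference function (AMDF) method, and the cepstrum method. The ACF method suits noisy conditions, but used alone it often yields a fundamental-frequency estimate that is twice or half the true pitch frequency. The AMDF and cepstrum methods give good results in quiet or low-noise conditions, but their accuracy drops quickly when the acoustic environment is poor and the signal-to-noise ratio is low. To address this, the thesis proposes a noise-robust algorithm that combines an autocorrelation energy function (ACEF) with a magnitude difference energy function (MDEF). It suppresses spurious peaks of the autocorrelation function, improves noise robustness, and compensates for the weaknesses of the traditional algorithms.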
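     The thesis then presents wavelet transform theory, including the continuous wavelet transform, the discrete wavelet transform, multiresolution analysis, and the Mallat algorithm. It analyzes experimentally a Mallat-based pitch period detection method, wavelet decomposition and reconstruction with the high-frequency coefficients set to zero, as well as the à trous algorithm derived from the Mallat algorithm. Decomposing a speech signal directly with the Mallat algorithm requires downsampling, so the components at each level are half as long as those at the previous level. The à trous algorithm instead interpolates the filter coefficients, so the components at every level are as long as the original signal, which helps pitch period extraction.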
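     The thesis presents Hilbert-Huang transform theory and applies it to pitch period detection. Compared with traditional methods, the Hilbert-Huang transform does not require a short-time stationarity assumption for the speech signal; it offers high detection accuracy and wide applicability, and it allows much longer frames. Compared with the wavelet transform, it decomposes the signal according to information in the signal itself, adapts as the signal changes, and reflects the physical information the signal actually contains, which makes it more adaptive.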
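     The thesis applies the Hilbert-Huang transform to pitch marking in a TD-PSOLA speech synthesis system, which broadens the transform's range of application. Experiments show that the commonly used autocorrelation method obtains only an average pitch period for each frame and then places pitch marks within the frame by interpolation, which limits accuracy. Pitch marking with the Hilbert-Huang transform detects essentially all pitch peaks in a stretch of speech and captures the small period variations within each frame, so it is more accurate than the autocorrelation method.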

English Abstract

The pitch period of a speech signal is an important characteristic parameter describing the excitation source. Detecting it accurately matters for speech analysis and synthesis, speech compression and coding, and speech recognition. This paper discusses several common pitch period detection methods, the wavelet transform, and the Hilbert-Huang transform. It proposes an algorithm that combines the Autocorrelation Energy Function (ACEF) with the Magnitude Difference Energy Function (MDEF) and performs well in noise, and it applies the Hilbert-Huang transform to pitch-synchronous marking in a TD-PSOLA speech synthesis system.
     This paper first introduces several commonly used pitch period detection methods: the Autocorrelation Function (ACF), the Average Magnitude Difference Function (AMDF), and the cepstrum. ACF is suitable for noisy environments, but its estimate can come out at double or half the actual value. AMDF and the cepstrum give good results in quiet or low-noise environments, but their results degrade quickly in adverse, low-SNR conditions and become unsatisfactory. We therefore propose a method with good noise robustness, ACEF combined with MDEF, which improves noise performance and effectively compensates for the shortcomings of traditional pitch period detection methods.
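As a rough illustration of the two classical detectors named above, the following Python sketch estimates a pitch lag with a plain autocorrelation function and with an average magnitude difference function. It does not implement the proposed ACEF/MDEF combination, whose definitions the abstract does not give. The function names, the synthetic 200 Hz test frame, the 8 kHz sampling rate, and the 20 to 60 sample search range are all assumptions made for the example.

```python
import math

def acf_pitch_lag(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    n = len(frame)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag

def amdf_pitch_lag(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the smallest AMDF value."""
    n = len(frame)
    best_lag, best_val = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        d = sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag)) / (n - lag)
        if d < best_val:
            best_lag, best_val = lag, d
    return best_lag

# Synthetic voiced frame (assumed values): 200 Hz fundamental plus one harmonic.
FS, F0 = 8000, 200
frame = [
    math.sin(2 * math.pi * F0 * n / FS) + 0.5 * math.sin(2 * math.pi * 2 * F0 * n / FS)
    for n in range(400)
]

acf_lag = acf_pitch_lag(frame, 20, 60)
amdf_lag = amdf_pitch_lag(frame, 20, 60)
print(acf_lag, amdf_lag, FS / acf_lag)  # lag in samples, and the implied pitch in Hz
```

Restricting the search to a plausible lag range is what keeps this clean example away from the double and half period errors described above; on noisy speech, a peak at a multiple of the true lag can still win.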
     Next, the paper introduces wavelet transform theory, including the continuous wavelet transform, the discrete wavelet transform, multiresolution analysis, and the Mallat algorithm. It analyzes experimentally a pitch period detection method based on the Mallat algorithm, wavelet decomposition and reconstruction with the high-frequency components set to zero, and the à trous algorithm derived from the Mallat algorithm. When the Mallat algorithm decomposes a speech signal directly, it has to downsample, so each decomposition level is half as long as the previous one. The à trous algorithm instead interpolates the filter coefficients, so every decomposition level is as long as the original signal, which is helpful for pitch period extraction.
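The length difference between the two schemes can be seen in a small Python sketch. It assumes Haar filters and periodic edge handling, neither of which is stated in the abstract, and the function names are hypothetical. The decimated step keeps every second output, while the undecimated step spreads the filter taps apart instead of discarding samples.

```python
import math

SQRT2 = math.sqrt(2.0)

def mallat_step(x):
    """One decimated Haar analysis step: filter, then keep every second output."""
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail

def a_trous_step(x, level):
    """One undecimated Haar step with taps 2**(level-1) samples apart (periodic edges)."""
    n = len(x)
    gap = 2 ** (level - 1)  # tap spacing doubles at each level; no samples are dropped
    approx = [(x[i] + x[(i + gap) % n]) / SQRT2 for i in range(n)]
    detail = [(x[i] - x[(i + gap) % n]) / SQRT2 for i in range(n)]
    return approx, detail

def detail_lengths(x, levels, undecimated):
    """Run `levels` analysis steps and return the detail length at each level."""
    approx, lengths = list(x), []
    for level in range(1, levels + 1):
        if undecimated:
            approx, detail = a_trous_step(approx, level)
        else:
            approx, detail = mallat_step(approx)
        lengths.append(len(detail))
    return lengths

signal = [math.sin(2 * math.pi * n / 16) for n in range(64)]
print(detail_lengths(signal, 3, undecimated=False))  # [32, 16, 8]
print(detail_lengths(signal, 3, undecimated=True))   # [64, 64, 64]
```

Because every undecimated level keeps one coefficient per input sample, a peak found at any level maps directly to a sample position in the original signal.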
     This paper introduces the Hilbert-Huang transform and applies it to pitch period detection. Compared with traditional methods, the Hilbert-Huang transform does not need to assume short-term stationarity of the speech signal; it has high detection accuracy and a wide range of application, and the frame length can be greatly increased. Compared with the wavelet transform, the Hilbert-Huang transform decomposes a signal according to the signal's own information and changes with the signal itself, so it reflects the real physical information in the signal and is more adaptive.
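The Hilbert step of the transform can be sketched in Python as follows. This is only the second half of the method: the empirical mode decomposition that would first split speech into intrinsic mode functions is not shown, and a single cosine stands in for one such component. The function names, the 800 Hz sampling rate, and the 100 Hz test tone are assumptions chosen so that the tone fits a whole number of cycles into the block.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal via a naive DFT: drop negative frequencies, double positive ones.

    len(x) must be even.
    """
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or k == n // 2:
            weight = 1.0  # DC and Nyquist bins are kept unchanged
        elif k < n // 2:
            weight = 2.0  # positive frequencies are doubled
        else:
            weight = 0.0  # negative frequencies are removed
        spectrum[k] *= weight
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def instantaneous_frequency(x, fs):
    """Frequency in Hz from the phase increment between consecutive analytic samples."""
    z = analytic_signal(x)
    return [cmath.phase(b * a.conjugate()) * fs / (2 * math.pi) for a, b in zip(z, z[1:])]

FS, F0, N = 800, 100, 64  # assumed values: 8 full cycles in 64 samples
tone = [math.cos(2 * math.pi * F0 * t / FS) for t in range(N)]
inst_freq = instantaneous_frequency(tone, FS)
print(min(inst_freq), max(inst_freq))
```

The frequency is obtained sample by sample rather than once per frame, which is the property the paragraph above relies on when it says no short-term stationarity assumption is needed.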
     Finally, the paper applies the Hilbert-Huang transform to pitch marking in a TD-PSOLA speech synthesis system, which expands the application scope of the transform. The experiments show that the commonly used autocorrelation method obtains only an average pitch period for each frame and then places the pitch marks by interpolation, so its accuracy is limited. Pitch marking with the Hilbert-Huang transform detects almost all pitch peaks and reflects small period changes within a frame, so it is more accurate than the ACF method.
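The gap between frame-average marking and peak-level marking can be illustrated with a toy Python example. Simple peak tracking stands in here for the Hilbert-Huang based detection, which is not reproduced, and evenly spaced marks stand in for the interpolation step. The synthetic pulse positions, the function names, and the tolerance value are assumptions.

```python
def uniform_marks(first_mark, avg_period, length):
    """Place marks at a constant spacing equal to the average period."""
    marks, pos = [], float(first_mark)
    while pos < length:
        marks.append(int(pos + 0.5))  # round to the nearest sample
        pos += avg_period
    return marks

def peak_marks(signal, first_mark, period, tolerance):
    """Jump one expected period ahead, then snap to the largest sample nearby."""
    marks = [first_mark]
    while marks[-1] + period + tolerance < len(signal):
        center = marks[-1] + period
        window = range(center - tolerance, center + tolerance + 1)
        marks.append(max(window, key=lambda i: signal[i]))
    return marks

# Synthetic excitation (assumed): pulses whose spacing grows from 40 to 43 samples.
true_marks = [10, 50, 91, 133, 176]
signal = [0.0] * 200
for m in true_marks:
    signal[m] = 1.0
    signal[m + 1] = 0.5   # short decaying tail after each pulse
    signal[m + 2] = 0.25

avg_period = (true_marks[-1] - true_marks[0]) / (len(true_marks) - 1)  # 41.5 samples
uniform = uniform_marks(true_marks[0], avg_period, len(signal))
tracked = peak_marks(signal, true_marks[0], period=41, tolerance=5)
print(uniform)  # [10, 52, 93, 135, 176]
print(tracked)  # [10, 50, 91, 133, 176]
```

The evenly spaced marks agree with the true pulses only at the two ends, while the tracked marks follow each period change, which matters for TD-PSOLA because its analysis windows are centered on the marks.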
