Research on Linear Predictive Speech Coding Based on a Sinusoidal Model at Low Bit Rates

Original title (Chinese): 基于正弦模型的线性预测低速率语音编码算法研究
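Author: Huang He (黄鹤)
Degree level: Master's
Discipline: Circuits and Systems
Keywords (Chinese): 语音编码; 线性预测; 正弦模型; 浊音度; 谐波相位恢复
Keywords (English): Speech Coding; Linear Prediction; Sinusoidal Model; Voicing; Harmonic Phase Reconstruction
Degree year: 2002
Advisor: Bao Changchun (鲍长春)
Discipline code: 080902
Degree-granting institution: Beijing Polytechnic University (北京工业大学)
Thesis submission date: 2002-05-01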

Abstract

With the continuing development of communication networks, high-quality speech coding at low bit rates has become an active research topic. Building on the 2.4 kb/s Harmonic Excited Linear Predictive (HELP) low-bit-rate speech coding algorithm developed by the Telecommunication and Signal Processing Laboratory at Beijing Polytechnic University, this thesis investigates a series of improvements aimed at further raising speech quality.
First, following an in-depth analysis of the HELP algorithm, and exploiting the property that the degree of harmonic correlation in a speech signal reflects the strength of voicing, a method for extracting a harmonic-correlation voicing parameter based on the minimum mean-square-error (MSE) criterion is developed. Second, an efficient frequency-domain pitch detection method is developed that resolves the pitch to fractional-sample accuracy. Finally, a harmonic phase reconstruction method based on the minimum-phase assumption is studied. Computer simulations and subjective listening tests show that the proposed improvements effectively improve the quality of the reconstructed speech of the 2.4 kb/s HELP coder.
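The abstract names the minimum-phase assumption but does not give the reconstruction procedure. As a hedged illustration only, the sketch below shows one standard way to obtain a minimum-phase phase response from a magnitude spectrum, by folding the real cepstrum. It is not taken from the thesis; the function names `dft`, `minimum_phase_from_magnitude`, and `demo`, the uniform N-point frequency grid, and the naive DFT are all assumptions made for this example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for a small demo)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def minimum_phase_from_magnitude(mag):
    """Phase (radians) of the minimum-phase system with magnitude `mag`.

    `mag` is a full N-point magnitude spectrum sampled at w_k = 2*pi*k/N,
    with N even and every value strictly positive (assumptions of this sketch).
    """
    n = len(mag)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of bins")
    # Real cepstrum: inverse DFT of the log magnitude.
    cep = [c.real for c in dft([math.log(m) for m in mag], inverse=True)]
    # Fold the cepstrum onto non-negative quefrencies (causal cepstrum).
    folded = [0.0] * n
    folded[0] = cep[0]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cep[i]
    folded[n // 2] = cep[n // 2]
    # The DFT of the causal cepstrum is log|H| + j*phase; keep the phase.
    return [v.imag for v in dft(folded)]


def demo(a=0.5, n=64):
    """Largest phase error for the minimum-phase filter H(z) = 1 - a*z^-1."""
    h = [1 - a * cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    est = minimum_phase_from_magnitude([abs(v) for v in h])
    return max(abs(e - cmath.phase(v)) for e, v in zip(est, h))
```

In a harmonic coder, the magnitude samples would typically come from the spectral envelope (for example, the LPC envelope), and the returned phase would be read off at the bins nearest the pitch harmonics; that mapping is likewise an assumption here, not a description of the HELP coder.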

