Research on Chinese Speech Coding Technology Based on Nonlinear Theory
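Abstract
Digital analysis and processing of speech underpin the digital transmission and storage of speech signals. As speech communication technology has developed, high speech quality at low bandwidth has remained a constant goal, and speech compression coding plays a central role in achieving it.
     At present, speech analysis and compression coding rely on linear theory and linear predictive coding. The speech production system, however, is a complex nonlinear time-varying system with chaotic and fractal characteristics, so linear methods cannot fundamentally improve the performance of speech transmission and storage. Building on an in-depth study of the nonlinear characteristics of speech signals, this thesis therefore uses a radial basis function (RBF) neural network to construct a nonlinear prediction model for speech signals and designs a nonlinear predictive coding system based on that model. The main work and contributions are as follows: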
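     (1) Chaos detection and fractal characteristics of speech signals
     Based on nonlinear theory, algorithms for computing the nonlinear characteristic parameters of Chinese phonemes are studied. The Wolf algorithm is used to compute the largest Lyapunov exponent of 33 Chinese phonemes, and the results show that Chinese speech signals are chaotic. The GP (Grassberger-Procaccia) algorithm is then used to compute the correlation dimension of the 33 phonemes; the results indicate that voiced sounds are produced by a low-dimensional system, whereas some unvoiced sounds are produced by a high-dimensional system.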
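     (2) Phase space reconstruction of speech signals and determination of its parameters
     The theoretical basis and tools for nonlinear prediction of speech signals are analyzed, and methods for determining the phase space reconstruction parameters, namely the delay time and the embedding dimension, are studied. To address the limitations of the C-C algorithm, the delay time and embedding dimension of the Chinese phonemes are obtained with the autocorrelation method and the false nearest neighbors method, respectively. The choice of sampling rate and speech source in the experiments is examined by statistical analysis, and the results show that the computed delay time and embedding dimension are robust to different sampling rates and speech sources.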
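     (3) Nonlinear prediction model for Chinese speech based on an RBF neural network
     The nonlinear characteristic parameters of Chinese phonemes are combined with RBF neural network analysis: the delay times and embedding dimensions computed for the 33 Chinese phonemes are used to set the number of neurons in the three layers of the RBF network, yielding an RBF-based nonlinear prediction model for Chinese speech signals. The model is compared with the existing ADPCM linear prediction model, and simulations show that the nonlinear model has a smaller prediction error, indicating better prediction performance.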
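     (4) Speech enhancement based on the wavelet transform
     Because predictive coding performance degrades rapidly in noisy environments, speech enhancement based on the wavelet transform is studied, with emphasis on threshold denoising. First, since conventional threshold selection adapts poorly to non-stationary noise, the MCRA (minima controlled recursive averaging) algorithm is applied in the wavelet domain to compute the noise variance, giving a noise estimate that tracks changes in real time, and the threshold is adjusted adaptively using spectral flatness. Second, to address the shortcomings of the conventional soft and hard threshold functions, an improved threshold function is designed on the basis of the non-negative dead-zone threshold function proposed by Breiman, and its soundness is verified by analyzing properties such as continuity and monotonicity.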
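     (5) Design of the E-CENP speech coding system
     Using the nonlinear prediction model together with the enhancement processing and the CELP speech coding algorithm, a nonlinear predictive coding system, E-CENP, is designed. Its preprocessing stage incorporates the proposed wavelet-based speech enhancement, and its predictor uses the proposed RBF neural network nonlinear prediction model. Simulations show that, compared with the CELP linear predictive coding system, the nonlinear system offers higher coded speech quality and better robustness.
     Applying nonlinear theory and methods, the thesis constructs the E-CENP speech coding system. Compared with the CELP coding system, the speech recovered after encoding and decoding has higher quality and the system is more robust, which indicates that the proposed nonlinear approach suits speech, a signal with nonlinear characteristics, and offers new ideas and methods for speech signal processing.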
Digital analysis and processing of speech are essential steps in the digital transmission and storage of speech signals. As speech communication technology has developed, high quality at low bandwidth has been a constant pursuit, and speech coding plays a significant role in achieving it.
     At present, the analysis and coding of speech signals rely on linear theory and linear prediction, but the speech production system is a complicated nonlinear system with chaotic and fractal properties, so linear methods cannot fundamentally improve the performance of speech transmission and storage. Therefore, the nonlinear characteristics of Chinese speech are studied in depth and, using a radial basis function (RBF) network, a nonlinear predictor is designed. A nonlinear predictive coding system is then built on this predictor. The main work and results are as follows:
     (1) Chaos detection and fractal features of speech signals
     Based on nonlinear theory, the nonlinear characteristic parameters of Chinese speech phonemes are studied. The largest Lyapunov exponents of 33 Chinese speech phonemes are computed with the Wolf algorithm, and the results indicate that Chinese speech is chaotic. The correlation dimensions of the 33 phonemes are computed with the GP algorithm; the results show that voiced sounds are produced by a low-dimensional system, while some unvoiced sounds are produced by a high-dimensional system.
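     To make the correlation-dimension estimate concrete, the Python sketch below computes the correlation sum used by the GP (Grassberger-Procaccia) approach and a crude two-radius slope estimate. It is a minimal illustration, not the procedure used in the thesis: the function names, the max-norm distance, and the two-radius slope are assumptions made for this example.

```python
import math


def correlation_sum(points, r):
    """Fraction of distinct point pairs whose max-norm distance is below r."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = max(abs(a - b) for a, b in zip(points[i], points[j]))
            if dist < r:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points, r_small, r_large):
    """Slope of log C(r) against log r between two radii (assumed estimator)."""
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    if c_small <= 0.0 or c_large <= 0.0:
        raise ValueError("radii too small: no point pairs fall inside them")
    return (math.log(c_large) - math.log(c_small)) / (
        math.log(r_large) - math.log(r_small)
    )
```

     In practice the slope is usually read from the linear region of the log-log curve over many radii and for increasing embedding dimensions; the two-radius version here only shows the idea. Points on a line, for example, give an estimate close to 1.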
     (2) Phase space reconstruction of speech signals
     The theoretical basis and tools of nonlinear speech prediction are analyzed, and methods for determining the phase space reconstruction parameters, the delay time and the embedding dimension, are studied. The parameters are first computed with the C-C algorithm; because of the limitations of its results, the delay time and embedding dimension are then determined with the autocorrelation algorithm and the FNN (False Nearest Neighbors) algorithm, respectively. The choice of sampling rate and speech source in the experiments is examined statistically. The results show that sampling rate and speech source have little influence on the delay time and embedding dimension, which are therefore strongly robust.
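     The delay-time step can be sketched as follows. This Python example picks the first lag at which the normalized autocorrelation falls to a threshold and then builds delay vectors. The 1 - 1/e threshold is one commonly used criterion and is an assumption here, as are all function names; the thesis does not state which criterion it applies.

```python
import math


def autocorrelation(x, lag):
    """Normalized (biased) autocorrelation of sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def choose_delay(x, max_lag, threshold=1.0 - 1.0 / math.e):
    """First lag whose autocorrelation is at or below threshold, else None."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(x, lag) <= threshold:
            return lag
    return None


def delay_embed(x, dim, tau):
    """Delay vectors (x[i], x[i + tau], ..., x[i + (dim - 1) * tau])."""
    n = len(x) - (dim - 1) * tau
    return [tuple(x[i + k * tau] for k in range(dim)) for i in range(n)]
```

     The embedding dimension passed to `delay_embed` would come from a separate test such as false nearest neighbors, which is not shown here.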
     (3) Nonlinear prediction model based on an RBF network
     The nonlinear characteristics of Chinese speech signals are combined with radial basis function (RBF) network analysis: the averages of the delay time and embedding dimension of the 33 Chinese speech phonemes determine the number of neurons in the three layers of the RBF neural network, and a nonlinear prediction model based on the RBF network is designed. Simulations indicate that, compared with the ADPCM linear predictor, the RBF-based nonlinear predictor has a significantly lower prediction error and hence higher prediction accuracy.
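     A one-step RBF predictor on delay vectors might look like the sketch below. The input size equals the embedding dimension, the hidden layer holds one Gaussian unit per center, and the output weights are fitted by ridge-regularized least squares. The Gaussian width, the choice of centers, the ridge term, and every function name are assumptions for illustration; the thesis's actual network sizes and training method are not given in this abstract.

```python
import math


def gaussian(u, c, width):
    """Gaussian radial basis value for input vector u and center c."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, c))
    return math.exp(-d2 / (2.0 * width ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def make_pairs(x, dim, tau):
    """Delay vectors and the sample that follows each one (one-step targets)."""
    span = (dim - 1) * tau
    count = len(x) - span - 1
    inputs = [tuple(x[i + k * tau] for k in range(dim)) for i in range(count)]
    targets = [x[i + span + 1] for i in range(count)]
    return inputs, targets


def fit_rbf(inputs, targets, centers, width, ridge=1e-8):
    """Output weights from the regularized normal equations."""
    phi = [[gaussian(u, c, width) for c in centers] for u in inputs]
    h = len(centers)
    a = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
          for j in range(h)] for i in range(h)]
    b = [sum(row[i] * t for row, t in zip(phi, targets)) for i in range(h)]
    return solve(a, b)


def predict_rbf(u, centers, width, weights):
    """Network output: weighted sum of the hidden-unit activations."""
    return sum(w * gaussian(u, c, width) for w, c in zip(weights, centers))
```

     A typical use is to call `make_pairs` on a frame of samples, fit the weights on the first part of the pairs, and measure the prediction error on the rest.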
     (4) Speech enhancement based on the wavelet transform
     Because the predictive coding performance of speech signals can drop rapidly in noisy conditions, speech enhancement techniques based on the wavelet transform are studied, focusing on threshold design in wavelet threshold denoising. On the one hand, to overcome the difficulty that traditional threshold selection has in adapting to non-stationary noise, the MCRA algorithm is applied in the wavelet domain to compute the noise variance, which yields a noise estimate that follows real-time changes, and the threshold is adjusted adaptively using spectral flatness. On the other hand, an improved threshold function is designed on the basis of the non-negative dead-zone threshold function. It has good continuity, avoids the fixed deviation of the soft threshold function, and takes into account the exponential decay of the wavelet modulus values of noise.
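     For reference, the classical threshold rules that the improved function builds on can be written as below. This sketch assumes that the "non-negative dead-zone" function refers to the non-negative garrote rule associated with Breiman; it shows only the standard hard, soft, and garrote rules, not the improved function of the thesis, whose formula is not given here. Function names are illustrative.

```python
def hard_threshold(w, lam):
    """Keep the coefficient unchanged if it exceeds the threshold, else zero."""
    return w if abs(w) > lam else 0.0


def soft_threshold(w, lam):
    """Shrink surviving coefficients toward zero by lam (constant bias)."""
    if abs(w) <= lam:
        return 0.0
    return w - lam if w > 0 else w + lam


def garrote_threshold(w, lam):
    """Non-negative garrote: continuous at lam, bias vanishes for large |w|."""
    if abs(w) <= lam:
        return 0.0
    return w - lam * lam / w


def shrink(coeffs, lam, rule):
    """Apply a threshold rule to every wavelet detail coefficient."""
    return [rule(w, lam) for w in coeffs]
```

     The hard rule is discontinuous at the threshold and the soft rule subtracts a fixed amount from every surviving coefficient; the garrote rule lies between the two, which is the kind of trade-off the improved function is meant to refine.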
     (5) Design of the E-CENP speech coding system
     Based on the nonlinear prediction model, the CELP speech coding algorithm and the enhancement processing are combined to design a nonlinear predictive coding system, E-CENP, whose preprocessing stage includes the enhancement processing. The linear predictor of CELP is replaced with the nonlinear prediction model. Simulations indicate that, compared with the linear predictive coding system, the nonlinear predictive coding system delivers higher speech quality and better robustness.
     Based on the theory of nonlinear dynamics, a nonlinear predictive coding system, E-CENP, is designed. Compared with the CELP coding system, the decoded speech has higher quality and the system is more robust. The results show that the methods and theory of nonlinear dynamics are well suited to speech, providing new ideas and solutions for research on speech processing techniques.