Study on Matching Pursuit Based Low-bit Rate Speech Coding

Original Chinese title: 基于匹配跟踪的低位率语音编码研究
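Author: Zhang Wenyao (张文耀)
Degree level: Doctoral
Discipline: Computer Application Technology
Keywords: Matching Pursuit; Speech Coding; Speech Enhancement; Sinusoidal Modeling; Pitch Estimation; Psychoacoustic Model
Degree year: 2002
Advisor: Wang Yuguo (王裕国)
Discipline code: 081203
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (Institute of Software)
Thesis submission date: 2002-10-01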

Abstract

Speech coding already delivers very high-quality reconstructed speech at high and medium bit rates. High-quality coding at low and very low bit rates, however, remains a challenging research problem of both theoretical significance and potential practical value, and it has driven many researchers to explore new techniques, such as new sinusoidal modeling methods and new parameter quantization methods. Following the direction of sinusoidal modeling and sinusoidal analysis, this thesis adopts the matching pursuit technique in combination with a psychoacoustic model, studies new modeling methods and the quantization and coding of the model parameters, and explores low-bit-rate speech coding and related problems. Its main contributions are as follows:
    1. Matching pursuit is applied to speech signal enhancement, and a method is given for determining the coherent-ratio threshold used in the matching pursuit enhancement procedure. With this method, a noisy signal can be clearly enhanced over a fairly wide range without knowing the statistical properties of the signal or the noise.
    2. Sinusoidal modeling based on matching pursuit is studied. The concepts of dynamic masking threshold and perceptual gradient are proposed, together with a perceptual-gradient sinusoidal modeling algorithm. The algorithm makes good use of the psychoacoustic model: during modeling it adds as much perceptual information as possible to the synthesized signal, which improves modeling efficiency. Even when the model precision is low, the synthesized speech is of fairly good quality.
    3. For quantizing and coding the sinusoidal model parameters, vector quantization of the amplitude parameters and differential quantization of the frequency parameters are proposed, and a frequency-bin quantization model, a random-phase model, and a zero-phase model are also examined. These methods effectively reduce the coding bit rate.
    4. With the aims of lowering the bit rate and improving speech quality, a series of compression coding schemes is studied by stepwise refinement, leading to an integrated coding scheme at 1.5 to 2.4 kbps. Across the different modeling methods and quantization techniques, the thesis examines compression based on ordinary matching pursuit sinusoidal modeling, compression based on perceptual-gradient sinusoidal modeling, compression based on dynamic-dictionary matching pursuit, compression with classified dynamic dictionaries, and an integrated scheme that combines perceptual-gradient sinusoidal modeling with classified dynamic dictionaries. The results show that matching pursuit sinusoidal modeling has great potential for low-bit-rate speech coding and opens a new technical route toward high-quality coding at low bit rates. The final integrated scheme takes more psychoacoustic factors into account and combines classified processing, dynamic dictionaries, and perceptual-gradient modeling; the thesis reports that it outperforms some existing international coding methods and standards in both bit rate and synthesized speech quality.
    5. The CAMDF function is proposed, together with CAMDF-based algorithms for speech classification and pitch estimation, which are used in the coding schemes of this thesis. Because CAMDF overcomes the shortcomings of the traditional AMDF, the new pitch detection algorithm lowers the rate of erroneous decisions, simplifies the detection process, and improves the precision of the estimate. Speech classification with CAMDF also gives satisfactory results.
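As a rough illustration of the pitch-estimation idea in item 5, the sketch below assumes that the C in CAMDF stands for "circular": a variant of the average magnitude difference function in which the lagged sample index wraps around the frame, so every lag is compared over the same number of samples. The abstract does not give the definition, so the formula, the function names `camdf` and `estimate_pitch_period`, the lag range, and the synthetic test frame are all assumptions for illustration, not the algorithm of the thesis.

```python
import math


def camdf(frame):
    """Assumed circular AMDF: D(tau) = sum_i |x[(i + tau) mod N] - x[i]|."""
    n = len(frame)
    return [
        sum(abs(frame[(i + tau) % n] - frame[i]) for i in range(n))
        for tau in range(n)
    ]


def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] where the circular AMDF is smallest."""
    d = camdf(frame)
    return min(range(min_lag, max_lag + 1), key=lambda tau: d[tau])


# Synthetic voiced frame (hypothetical test data): a 100 Hz fundamental plus
# its second harmonic, sampled at 8 kHz. The 240-sample frame holds exactly
# three pitch periods of 80 samples, so the circular wrap is seamless.
fs = 8000
f0 = 100.0
frame = [
    math.sin(2.0 * math.pi * f0 * t / fs)
    + 0.5 * math.sin(2.0 * math.pi * 2.0 * f0 * t / fs)
    for t in range(240)
]

period = estimate_pitch_period(frame, min_lag=20, max_lag=120)
pitch_hz = fs / period
print(period, pitch_hz)
```

The lag search range (20 to 120 samples, i.e. 400 Hz down to about 67 Hz at 8 kHz) is only a plausible choice for speech. A real frame rarely contains a whole number of pitch periods, so a practical detector would need additional logic that this sketch omits.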
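    Finally, the thesis summarizes its main points, analyzes the aspects of the current work that need further improvement, and points out directions for the next stage of research along with some prospects for the field.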
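The matching pursuit decomposition that the enhancement and sinusoidal modeling work builds on can be sketched as follows. This is a minimal illustration of plain matching pursuit over a dictionary of unit-norm sinusoids, not the perceptual-gradient algorithm of the thesis. The dictionary layout, the function names, and the `min_coherence` stopping parameter are assumptions. `min_coherence` stands in for a coherent-ratio threshold and is taken here as the largest inner-product magnitude divided by the residual norm.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def build_sinusoid_dictionary(n, num_bins):
    """Unit-norm cosine and sine atoms with k cycles per n-sample frame.

    k runs from 1 to num_bins; num_bins must stay below n / 2 so that no
    sine atom is identically zero.
    """
    atoms = []
    for k in range(1, num_bins + 1):
        for kind, fn in (("cos", math.cos), ("sin", math.sin)):
            raw = [fn(2.0 * math.pi * k * i / n) for i in range(n)]
            norm = math.sqrt(dot(raw, raw))
            atoms.append((k, kind, [v / norm for v in raw]))
    return atoms


def matching_pursuit(signal, atoms, max_iters, min_coherence=0.0):
    """Greedy decomposition; returns (picks, residual).

    Each pick is a tuple (k, kind, coefficient).
    """
    residual = list(signal)
    picks = []
    for _ in range(max_iters):
        res_norm = math.sqrt(dot(residual, residual))
        if res_norm < 1e-12:  # nothing left to model
            break
        # Greedy step: choose the atom most correlated with the residual.
        coeff, k, kind, atom = max(
            ((dot(residual, g), k, kind, g) for k, kind, g in atoms),
            key=lambda item: abs(item[0]),
        )
        # Hypothetical stopping rule: stop once the best atom explains too
        # small a share of the residual (low coherence suggests noise).
        if abs(coeff) / res_norm < min_coherence:
            break
        residual = [r - coeff * g for r, g in zip(residual, atom)]
        picks.append((k, kind, coeff))
    return picks, residual


n = 64
atoms = build_sinusoid_dictionary(n, num_bins=20)
signal = [
    math.cos(2.0 * math.pi * 5 * i / n) + 0.5 * math.sin(2.0 * math.pi * 12 * i / n)
    for i in range(n)
]
picks, residual = matching_pursuit(signal, atoms, max_iters=10)

# Convert unit-norm coefficients back to sinusoid amplitudes.
amplitudes = [(k, kind, coeff / math.sqrt(n / 2.0)) for k, kind, coeff in picks]
print(amplitudes)
```

On this noise-free test signal both components lie exactly on the dictionary grid, so two iterations leave a residual at the level of rounding error. For noisy speech, the choice of `min_coherence` decides where the decomposition stops; how the thesis determines that threshold is not stated in the abstract.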

