Research on Speech Recognition Methods in the In-Vehicle Environment
Abstract
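Speech recognition has been a major research focus for more than half a century. Speech is the most common means of human communication, so devices that use speech recognition as their human-machine interface are far more convenient to use. In China, cars have become an increasingly common part of everyday life over the past decade and bring conveniences of many kinds. Drivers increasingly favor feature-rich cars, which means more and more kinds of in-car electronic equipment, and operating it becomes correspondingly more complex. Taking one's hands off the steering wheel to operate these devices while driving is dangerous, so equipping in-vehicle electronics with a speech-controlled human-machine interface is the best option. Because comparable systems do not yet exist in China, research in this area fills a domestic gap.
     First, this work examines endpoint detection, one of the technical difficulties of in-car speech recognition, and studies the popular endpoint detection methods in detail. Noise in the operating environment reduces the detection accuracy of these algorithms inside a car. In response, this work proposes an endpoint detection method based on subband entropy with an adaptive single-well function, which detects speech endpoints well under in-car noise. In some situations the sound of a car horn interferes with recognition; this work proposes a solution based on changes in frequency-band features, which resolves the problem.
     Second, in-car noise is unavoidable in real use. This work studies the two main noise-removal methods, spectral subtraction and power spectral subtraction, along with the issues to watch for when applying them in practice. It adopts a noise-removal technique based on spectral subtraction and achieves speech enhancement.
     Third, this work studies the speech feature parameters commonly used in speech recognition, chiefly linear prediction coefficients and mel-frequency cepstral coefficients (MFCCs). The part of the noise that is masked by speech cannot be heard by the human ear, yet it still alters the speech feature parameters and so lowers the recognition rate. Removing this part can raise the recognition rate. Based on the actual characteristics of in-car noise, this work proposes MFCCs improved using the masking effect from psychoacoustics, and experiments show that they improve the recognition rate to some extent under in-car noise.
     Next, this work studies recognition methods such as dynamic time warping (DTW) and hidden Markov models (HMMs) in detail, including the DTW algorithm and its improvements, the hidden Markov model, the problems to be solved in implementation, and a fast clustering-based HMM algorithm. This work was decisive for the final choice of recognition method and speech feature parameters in the experiments.
     Finally, the experimental part presents the methods, procedures, and speech corpus used. Two speech recognition experiments were carried out: one based on the DTW algorithm and one based on hidden Markov models, for which a new method is proposed that increases computation speed while still meeting the recognition-rate requirement. The experiments show that HMMs are somewhat more efficient at recognition than DTW, can handle systems with larger vocabularies, and reach a recognition rate as high as 98%. The HMM-based in-car speech recognition system designed in this work can therefore serve as the speech-controlled human-machine interface for in-vehicle electronics. It fills a domestic gap in this area and offers a new route to safer driving.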
The study of speech recognition has been under way for well over half a century. Speech recognition offers great convenience in people's lives. In China, cars have played an increasingly important role over the past ten years. Cars have changed people's lives in many respects, but people like cars with many functions, so cars carry more and more electronic devices. More devices mean more complex operation, and it is very dangerous for drivers to take their hands off the steering wheel to operate them. In-car electronic devices with a speech-controlled human-machine interface may be the best solution to this problem. Because comparable speech recognition systems do not yet exist in China, research in this area can fill that gap.
     First, we analyze the technical difficulties of speech recognition in car noise environments and give solutions to these problems. Speech endpoint detection in car noise is more difficult than in clean speech. We investigate several popular endpoint detection techniques and identify their weaknesses. A new method, adaptive subband entropy based on a single-well function, is chosen to solve the problem, and it works better than other methods in car noise environments. In this particular noise background, a car horn looks very similar to speech in a spectrogram, so the horn can be confused with speech. A new method based on the variation across frequency subbands is adopted to compensate for this, and it works well.
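As a rough illustration of the entropy idea behind this kind of endpoint detector, the sketch below computes a plain subband spectral entropy per frame and flags low-entropy frames as speech. It is not the adaptive single-well-function method of this work, whose details the abstract does not give; the function names, the number of bands, and the fixed threshold are all assumptions made for the example.

```python
import math


def subband_entropy(power_spectrum, num_bands=4):
    """Shannon entropy (nats) of the energy distribution across subbands.

    Illustrative only: equal-width bands, no adaptive weighting.
    Bins left over after dividing into num_bands equal bands are ignored.
    """
    band_size = len(power_spectrum) // num_bands
    if band_size == 0:
        raise ValueError("need at least one spectral bin per band")
    band_energy = [
        sum(power_spectrum[b * band_size:(b + 1) * band_size])
        for b in range(num_bands)
    ]
    total = sum(band_energy)
    if total <= 0.0:
        # A silent frame is treated as maximally flat (highest entropy).
        return math.log(num_bands)
    entropy = 0.0
    for energy in band_energy:
        p = energy / total
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy


def detect_speech_frames(frames, threshold, num_bands=4):
    """Mark a frame as speech when its subband entropy falls below threshold.

    The fixed threshold is a hypothetical simplification; a practical
    detector would adapt it to the background noise.
    """
    return [subband_entropy(f, num_bands) < threshold for f in frames]
```

Broadband noise spreads energy evenly over the bands and gives an entropy near `log(num_bands)`, while voiced speech concentrates energy in a few bands and gives a lower value, which is why a threshold on entropy can separate the two.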
     Second, speech contaminated by car noise has a low recognition rate. To overcome this drawback, we study two popular noise cancellation techniques, spectral subtraction and power spectral subtraction, together with the practical details of applying them. Our system adopts spectral subtraction to achieve speech enhancement.
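The difference between the two subtraction variants can be sketched on a single frame of spectral magnitudes. This is a minimal textbook-style sketch, not the configuration used in this work: the over-subtraction factor `alpha`, the spectral `floor`, and the function names are assumptions, and a full system would also need framing, an FFT, and resynthesis with the noisy phase.

```python
def estimate_noise(noise_frames):
    """Average the magnitude spectra of frames assumed to contain only noise."""
    count = len(noise_frames)
    bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / count for k in range(bins)]


def spectral_subtract(magnitude, noise, power=1.0, alpha=1.0, floor=0.01):
    """Subtract a noise estimate from one frame of spectral magnitudes.

    power=1.0 gives magnitude spectral subtraction,
    power=2.0 gives power spectral subtraction.
    Each bin is floored at floor * |Y|**power so it never goes negative.
    """
    cleaned = []
    for y, n in zip(magnitude, noise):
        diff = y ** power - alpha * n ** power
        floored = max(diff, floor * y ** power)
        cleaned.append(floored ** (1.0 / power))
    return cleaned


noise = estimate_noise([[1.0, 2.0], [3.0, 2.0]])      # per-bin average
magnitude_result = spectral_subtract([5.0, 1.0], noise, power=1.0)
power_result = spectral_subtract([5.0, 1.0], noise, power=2.0)
```

The floor is one of the practical details that matters: without it, bins where the noise estimate exceeds the observed magnitude would go negative, and clamping them to zero tends to produce the isolated spectral peaks heard as musical noise.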
     Third, we study the speech features used in speech recognition, mainly Linear Predictive Coefficients and Mel-Frequency Cepstral Coefficients. In a noisy environment, the noise masked by speech may not be audible, but it still affects the recognition rate, so it should be removed. In this paper, we use psychoacoustics to modify the Mel-Frequency Cepstral Coefficients, and experiments show that this can improve the recognition rate.
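For readers unfamiliar with the mel scale that underlies these coefficients, the sketch below shows the commonly used Hz-to-mel mapping and how filter center frequencies are spaced evenly in mel. It covers only this standard warping step; the masking-based modification proposed in this work is not described in the abstract and is not reproduced here. The function names and the filter count are illustrative assumptions.

```python
import math


def hz_to_mel(hz):
    """Common mel-scale formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, low_hz, high_hz):
    """Center frequencies (Hz) of filters spaced uniformly on the mel scale."""
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]


centers = mel_center_frequencies(5, 0.0, 4000.0)
```

Because the spacing is uniform in mel, the centers are packed closely at low frequencies and spread out at high frequencies, roughly mirroring the frequency resolution of human hearing.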
     After that, we study Dynamic Time Warping and Hidden Markov Models carefully, and investigate how to modify them to achieve good performance. We adopt a clustering-based Hidden Markov Model; this work lays the foundation for the choice of classifier, experimental setup, and speech features.
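The template-matching idea behind a Dynamic Time Warping classifier can be sketched as follows. This is the basic unconstrained algorithm on scalar features, not the improved variant studied in this work; real systems compare feature vectors per frame and usually add path constraints. The template words and values are made up for the example.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two sequences of scalar features.

    Uses absolute difference as the local distance and allows
    insertion, deletion, and match steps with equal weight.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical one-dimensional templates, for illustration only.
templates = {"up": [1, 2, 3], "down": [3, 2, 1]}
best = recognize([1, 1, 2, 3, 3], templates)
```

Every input must be aligned against every stored template, so the cost grows with vocabulary size; this is one reason statistical models such as HMMs are generally preferred for larger vocabularies.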
     Lastly, we describe our experimental setup and speech database. Our experiments use Dynamic Time Warping and Hidden Markov Models as classifiers. In the Hidden Markov Model experiments we use a new method to speed up the calculation without decreasing the recognition rate. The experiments show that the Hidden Markov Model classifier suits large-vocabulary systems and achieves a better recognition rate than the Dynamic Time Warping classifier. The results show that the recognition rate can reach 98%, so the system can serve as the human-machine interface for in-car electronic devices; it offers a new way to improve driving safety and fills the gap in this area.
