Research on the Application of Speech Recognition Technology in Language Teaching Software

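Author: 熊飞丽 (Xiong Feili)
Degree level: Master's
Discipline: Precision Instruments and Machinery
Keywords (Chinese): speech recognition; linear predictive coding; cepstral features; dynamic matching
Keywords (English): Voice recognition; Linear prediction coding (LPC); Cepstrum characteristic; Dynamic mapping
Degree year: 2002
Advisor: 张玘 (Zhang Qi)
Discipline code: 080401
Degree-granting institution: 国防科学技术大学 (National University of Defense Technology)
Thesis submission date: 2002-01-01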

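Abstract

With the development of computer technology, computer-assisted instruction has become an important aspect of applying modern educational technology in education. More and more learning software is helping people learn languages. The rich graphics, image, and sound processing capabilities of computers have strongly improved the effectiveness of language learning. Exploring effective language learning methods, developing teaching software that can recognize and discriminate speech, and combining speech recognition with multimedia technology have therefore become a focus of this kind of language teaching.
This thesis gives a fairly comprehensive summary and analysis of the state of speech recognition technology in China and abroad. It presents in-depth theoretical analysis and simulation studies of the key technical stages of speech recognition: the speech signal production model, linear predictive coding (LPC), the Durbin recursion for solving the LPC normal equations, homomorphic processing of speech signals, computation of LPC cepstral features, and dynamic feature matching. Software for speech signal filtering, framing, feature computation, and matching was written in Matlab, and simulation results are given. The experiments show that, compared with LPC features, LPC cepstral features combined with a dynamic matching algorithm give a higher recognition rate in short-time speech recognition. For different speech signals the cepstral features are widely dispersed in feature space, which makes a decision threshold easy to set. The recursive algorithms needed for feature computation are also easy to implement on a DSP. The LPC cepstral feature method therefore has good application prospects in future intelligent multimedia language teaching systems. The work provides a basis in theoretical analysis and simulation experiments for developing speech recognition algorithms on a DSP.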
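The feature computation named in the abstract (Durbin recursion for the LPC normal equations, followed by the LPC cepstrum recursion) can be illustrated with a minimal sketch. This is not the thesis's Matlab code. The function names, the predictor sign convention s[n] ≈ Σ a[k]·s[n−k], and the plain autocorrelation without windowing are all assumptions made for illustration.

```python
def autocorrelation(frame, order):
    """Short-time autocorrelation r[0..order] of one frame (no window applied)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations with Durbin's recursion.

    Assumed convention: s[n] is predicted as sum_{k=1..p} a[k] * s[n-k].
    Returns (a, err): a[0] is unused (0.0), err is the final prediction error.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. digital silence): stop early
        # Reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """LPC cepstrum c[1..n_ceps] of the all-pole model 1 / (1 - sum a[k] z^-k)."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


# Autocorrelation of an ideal first-order model with pole at 0.5
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
cepstrum = lpc_to_cepstrum(a, n_ceps=3)
```

Both routines are short recursions over scalars with no matrix inversion, which is consistent with the abstract's remark that the recursive feature computation is easy to implement on a DSP.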
With the rapid development and growing popularity of computer technology, Computer-Assisted Instruction (CAI) has come into wide use in teaching. More and more educational software is helping people acquire knowledge, especially in language study. The multimedia capabilities of computers, with vivid sound and pictures and large information storage, play an important role in learning. In language-learning research, much attention is now paid to developing effective voice recognition (VR) strategies that make language study easier. Few language education packages, however, offer effective voice recognition. This thesis therefore develops the theory and algorithms of VR.
Several key problems in the VR process are discussed in both theory and application: pre-processing and framing of the raw voice signal, feature selection and calculation, and dynamic mapping of features. The linear prediction model, its coefficients (LPC), and the cepstrum coefficients are analyzed in terms of both theory and calculation. The dynamic mapping algorithm is also described in detail. Real short-time voice signal samples were processed in computer simulations written in Matlab. The results show that recognition using cepstrum-coefficient mapping performs better than recognition using LPC mapping. This conclusion is attractive for developing language education systems on a Digital Signal Processor (DSP).
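The dynamic mapping step can be sketched as classic dynamic time warping (DTW) over sequences of per-frame feature vectors. The abstract does not give the thesis's exact local path constraints, distance measure, or normalization, so the symmetric three-way step, Euclidean frame distance, division by the summed lengths, and the `recognize` helper with its `threshold` parameter are illustrative assumptions.

```python
import math


def frame_distance(x, y):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def dtw_distance(ref, test):
    """Length-normalized DTW distance between two feature sequences."""
    n, m = len(ref), len(test)
    inf = float("inf")
    # d[i][j]: cheapest cost of aligning ref[:i] with test[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(ref[i - 1], test[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def recognize(templates, test, threshold):
    """Return (label, score) of the closest template, or (None, score)
    when even the best score exceeds the decision threshold."""
    best_label, best_score = None, float("inf")
    for label, ref in templates.items():
        score = dtw_distance(ref, test)
        if score < best_score:
            best_label, best_score = label, score
    if best_score > threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical one-dimensional "features", purely for illustration
templates = {"a": [[0.0], [1.0], [2.0]], "b": [[5.0], [5.0], [5.0]]}
label, score = recognize(templates, [[0.0], [1.0], [1.0], [2.0]], threshold=0.1)
```

In a real system each frame would carry a cepstral vector rather than a single number, and the threshold would be chosen from the spread of distances observed between different utterances.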
