Research and Implementation of a Connected Digit Speech Recognition System

Chinese title: 连续数字语音识别系统的研究与实现
Author: Zhang Xueyong (章学勇)
Degree level: Master's
Discipline: Computer Software and Theory
Keywords: speech recognition; speech feature extraction; hidden Markov model (HMM); HTK; automatic plotting
Degree year: 2006
Advisor: He Pilian (何丕廉)
Discipline code: 081202
Degree-granting institution: Tianjin University
Submission date: 2006-01-01

Abstract

With the development of computer and information technology, speech has become an essential means of human-computer interaction. Speech recognition is an important direction in computing: it has a well-developed theoretical foundation, and research on PC-based recognition systems has produced some technical results. Although the underlying theory is mature and the technology has entered commercial use, the diversity of speech means that no single general-purpose platform suits every application, so a given domain often requires a purpose-built system.
     This thesis first reviews the state of speech recognition research in China and abroad and analyzes the difficulties of recognizing Chinese connected digits, which establishes the background and significance of the work. It then introduces the digital model of speech, endpoint detection, and feature extraction, and selects the algorithms and models used in the system.
     Recognition is based on the hidden Markov model (HMM). The system is designed and implemented on top of HTK (Hidden Markov Model ToolKit) and tailored to the characteristics of remotely broadcast speech signals. The thesis describes the technical difficulties and solutions in three stages: speech acquisition, speech recognition, and automatic plotting. The system uses an automatic speech overlapping technique to reduce errors introduced by speech segmentation and improve recognition accuracy. It models and recognizes the two broadcast formats, digits and codes, separately. For track plotting, the thesis discusses in detail how recognized digit strings are split, how track point data are stored, and how the flight route is fitted with cubic splines during plotting.
     Finally, the thesis summarizes the speech recognition and air route simulation system and outlines future work.
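
The abstract does not say how the speech overlapping technique works. As a rough illustration only, the sketch below assumes it means cutting the incoming sample stream into fixed-length segments that share a margin with their neighbours, so that a digit cut at one boundary appears whole in the next segment. The function name `overlapping_segments` and its parameters are hypothetical and are not taken from the thesis or from HTK.

```python
def overlapping_segments(samples, segment_len, overlap):
    """Split `samples` into segments of up to `segment_len` items.

    Consecutive segments share `overlap` items (an assumed reading of the
    "speech overlapping" idea; the thesis may do this differently).
    """
    if segment_len <= 0 or not 0 <= overlap < segment_len:
        raise ValueError("need segment_len > 0 and 0 <= overlap < segment_len")

    step = segment_len - overlap  # how far the window advances each time
    segments = []
    start = 0
    while start < len(samples):
        segments.append(samples[start:start + segment_len])
        # Stop once the current segment has reached the end of the input.
        if start + segment_len >= len(samples):
            break
        start += step
    return segments
```

With ten samples, a segment length of 4, and an overlap of 2, each segment repeats the last two samples of the previous one. A real system would pick the overlap from the longest expected digit duration.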

