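Research and Implementation of Speech Recognition Algorithms

English title: Speech Recognition Algorithm and Implementation
Author: Tu Junhui (涂俊辉)
Thesis level: Master's
Discipline: Computer Application Technology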
Keywords: speech recognition; HMM; hidden Markov model; HTK; TIMIT
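Degree year: 2010
Supervisor: Xu Jinhua (续晋华)
Discipline code: 081203
Degree-granting institution: East China Normal University (华东师范大学)
Thesis submission date: 2010-04-01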

Abstract

In the usual sense, speech recognition is the process of converting a speech signal into text. It is an important research direction within pattern recognition, and advances in it make interaction between people and computers or other devices more convenient. Its most basic application is voice input, which can replace the keyboard, raise input speed, and save time. Speech recognition can also be used to control machines such as automobiles, airplanes, and mobile phones.
This thesis studies and experiments with some of the basic theory and algorithms of speech recognition. Chapter 2 introduces speech signal processing and feature extraction, briefly describes two common feature extraction methods, and compares the performance of the two features in isolated-word recognition. The thesis then discusses speech recognition algorithms based on the hidden Markov model (HMM). Building on HMM-based isolated-word recognition, it attempts to apply the model to continuous speech recognition of English. This part describes the components of a continuous speech recognition system and discusses the choice of acoustic modeling unit, the improvement of model parameters, the recognition algorithm, and the use of a statistical language model. It also introduces the speech recognition toolkit HTK, which is used to run the related experiments on TIMIT, a large-vocabulary, speaker-independent continuous speech database.

