Research and Implementation of Isolated-Word Mandarin Speech Recognition

English title (as published): Research and Implement on Isolated Mandarin Speech Recognition
Author: Li Jianning (李建宁)
Degree level: Master's
Major: Computer Software and Theory
Keywords: Speech Recognition; End-point Detection; Mel Frequency Cepstrum Coefficient (MFCC); LPC Mel Cepstrum Coefficient (LPCMCC); Continuous Density Hidden Markov Model (CDHMM)
Degree year: 2007
Supervisor: Feng Hongwei (冯宏伟)
Discipline code: 081202
Degree-granting institution: Northwest University (西北大学)
Submission date: 2007-06-01

Abstract

Isolated-word speech recognition is simple to implement and technically mature. It has a wide range of applications and is a foundation for further research in speech recognition. The Hidden Markov Model (HMM) is currently the most popular speech recognition technique, and many successful speech recognition systems are built on it. This thesis studies isolated-word speech recognition through a small-vocabulary, speaker-independent, isolated-word Mandarin speech recognition system based on the Continuous Density Hidden Markov Model (CDHMM), implemented in VC++ on the Windows platform.
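In a continuous density HMM, each state's output distribution b_j(o) is usually a mixture of Gaussians over the feature vector. The C++ sketch below (chosen to match the VC++ setting) evaluates log b_j(o) for one state. It assumes diagonal covariance matrices, and the names DiagGaussian, logGaussian and logMixtureDensity are illustrative, not taken from the thesis. The log-sum-exp step keeps the sum over mixture components from underflowing when individual densities are very small.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One diagonal-covariance Gaussian component of a state's mixture (illustrative layout).
struct DiagGaussian {
    double weight;              // mixture weight c_m (weights of one state sum to 1)
    std::vector<double> mean;   // mean vector mu_m
    std::vector<double> var;    // diagonal of the covariance matrix, sigma_m^2
};

// log N(o; mu, diag(var)) for a D-dimensional observation o.
double logGaussian(const std::vector<double>& o, const DiagGaussian& g) {
    assert(g.mean.size() == o.size() && g.var.size() == o.size());
    const double log2pi = std::log(2.0 * std::acos(-1.0));
    double s = 0.0;
    for (std::size_t d = 0; d < o.size(); ++d) {
        const double diff = o[d] - g.mean[d];
        s += log2pi + std::log(g.var[d]) + diff * diff / g.var[d];
    }
    return -0.5 * s;
}

// log b_j(o) = log sum_m c_m N(o; mu_m, Sigma_m), evaluated with log-sum-exp.
double logMixtureDensity(const std::vector<double>& o,
                         const std::vector<DiagGaussian>& mixture) {
    assert(!mixture.empty());
    std::vector<double> terms;
    terms.reserve(mixture.size());
    for (const DiagGaussian& g : mixture)
        terms.push_back(std::log(g.weight) + logGaussian(o, g));
    const double mx = *std::max_element(terms.begin(), terms.end());
    if (mx == -std::numeric_limits<double>::infinity()) return mx;  // all weights zero
    double sum = 0.0;
    for (double t : terms) sum += std::exp(t - mx);  // factor out the largest term
    return mx + std::log(sum);
}
```

For example, a single standard-normal component evaluated at o = 0 gives -0.5 log(2π), and splitting it into two identical components of weight 0.5 gives the same value.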
     The thesis first covers the fundamentals of speech recognition: its principles, the basics of speech signal processing, and the main training and recognition methods. It then studies the theory of the Hidden Markov Model and its application to speech recognition.
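Recognition with HMMs scores the input against each word model with the forward algorithm. The sketch below assumes a left-to-right topology without state skips, which is common for isolated words: the model starts in the first state, must end in the last, and state j is entered only from j or j-1, so each frame costs O(N) instead of O(N^2). Emission log-likelihoods logB[t][j] are assumed to be precomputed (for instance with a Gaussian mixture density), and the per-frame rescaling is the usual scaling technique against underflow. The function name and parameter layout are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical left-to-right HMM without skips: stay[j] = a_{j,j}, 1 - stay[j] = a_{j,j+1}.
// The model starts in state 0 and must end in state N-1; logB[t][j] = log b_j(o_t).
// Returns log P(O | model) via a scaled forward pass costing O(N) per frame.
double logLikelihoodLeftRight(const std::vector<double>& stay,
                              const std::vector<std::vector<double>>& logB) {
    const double negInf = -std::numeric_limits<double>::infinity();
    const std::size_t N = stay.size();
    assert(N > 0);
    if (logB.size() < N) return negInf;  // too few frames to reach the last state
    std::vector<double> alpha(N, 0.0), next(N, 0.0);
    double logP = 0.0;                    // accumulated log scale factors
    for (std::size_t t = 0; t < logB.size(); ++t) {
        assert(logB[t].size() == N);
        // Subtract the per-frame maximum so exp() does not underflow.
        const double m = *std::max_element(logB[t].begin(), logB[t].end());
        if (m == negInf) return negInf;
        for (std::size_t j = 0; j < N; ++j) {
            double pred;
            if (t == 0) {
                pred = (j == 0) ? 1.0 : 0.0;                            // initial distribution
            } else {
                pred = alpha[j] * stay[j];                              // self-loop
                if (j > 0) pred += alpha[j - 1] * (1.0 - stay[j - 1]);  // advance one state
            }
            next[j] = pred * std::exp(logB[t][j] - m);
        }
        double c = 0.0;
        for (double v : next) c += v;
        if (c == 0.0) return negInf;
        for (std::size_t j = 0; j < N; ++j) alpha[j] = next[j] / c;  // normalize
        logP += m + std::log(c);
    }
    return alpha[N - 1] > 0.0 ? logP + std::log(alpha[N - 1]) : negInf;
}
```

An isolated-word recognizer would call this once per vocabulary word, with that word's model and emission scores, and output the word with the highest log-likelihood. The same forward variables are also the first half of each Baum-Welch training iteration.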
     Building on this foundation, the main contributions of the thesis are as follows:
     1) Designs and implements a small-vocabulary, speaker-independent, isolated-word Mandarin speech recognition system based on the Continuous Density Hidden Markov Model, and evaluates it experimentally. Because signal processing is relatively complex to implement in VC++, the system does not use Mel Frequency Cepstrum Coefficients (MFCC) as feature parameters; it uses LPC Mel Cepstrum Coefficients (LPCMCC) instead, which approximate MFCC but are simpler to compute.
     2) Experiments showed that the system's double-threshold end-point detection is sensitive to noise: when the speech signal contains noise, the detected end points become inaccurate. To address this, the thesis studies end-point detection and proposes a method based on variable frame length and self-adaptive thresholds.
     3) Analyzes the contribution of each dimension of the feature parameters to recognition, and gives methods for improving the noise robustness of the feature parameters.
     4) Finally, because re-estimating HMM parameters with the Baum-Welch algorithm is slow and inefficient, which makes training the recognition system slow, the thesis gives methods for improving the speed and efficiency of Baum-Welch training.
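Item 1 replaces MFCC with LPCMCC. A common way to compute LPCMCC, sketched below in C++ under stated assumptions, is: autocorrelation of a pre-emphasized, windowed frame; the Levinson-Durbin recursion for the predictor coefficients a_k; the LPC-to-cepstrum recursion c_n = a_n + Σ_{k=1}^{n-1} (k/n) c_k a_{n-k} (with a_n = 0 for n > p); and finally a bilinear (all-pass) frequency warping of the cepstrum onto a mel-like scale with the Oppenheim-Johnson recursion. The all-pass constant alpha depends on the sampling rate and is left as a parameter. Ignoring the gain term c_0 and computing the LPC cepstrum to twice the output length before warping are assumptions of this sketch. For item 3, applyLifter shows one well-known way of reweighting cepstral dimensions, the raised-sine (bandpass) lifter w(n) = 1 + (L/2) sin(πn/L) of Juang, Rabiner and Wilpon (1987); it is not necessarily the weighting used in the thesis. All function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Autocorrelation r[0..p] of one pre-emphasized, windowed frame.
std::vector<double> autocorrelation(const std::vector<double>& frame, int p) {
    std::vector<double> r(p + 1, 0.0);
    for (int k = 0; k <= p; ++k)
        for (std::size_t n = static_cast<std::size_t>(k); n < frame.size(); ++n)
            r[k] += frame[n] * frame[n - k];
    return r;
}

// Levinson-Durbin recursion: predictor coefficients a[1..p] such that
// x(n) is approximated by sum_{k=1..p} a[k] x(n-k); a[0] is unused.
std::vector<double> levinsonDurbin(const std::vector<double>& r, int p) {
    assert(static_cast<int>(r.size()) > p && r[0] > 0.0);
    std::vector<double> a(p + 1, 0.0);
    double err = r[0];                           // prediction error energy
    for (int i = 1; i <= p; ++i) {
        double acc = r[i];
        for (int j = 1; j < i; ++j) acc -= a[j] * r[i - j];
        const double k = acc / err;              // i-th reflection coefficient
        const std::vector<double> prev = a;
        a[i] = k;
        for (int j = 1; j < i; ++j) a[j] = prev[j] - k * prev[i - j];
        err *= 1.0 - k * k;
    }
    return a;
}

// LPC cepstrum c[1..numCeps]; c[0] (the gain term) is left at 0.
std::vector<double> lpcCepstrum(const std::vector<double>& a, int numCeps) {
    const int p = static_cast<int>(a.size()) - 1;
    std::vector<double> c(numCeps + 1, 0.0);
    for (int n = 1; n <= numCeps; ++n) {
        double sum = (n <= p) ? a[n] : 0.0;
        for (int k = std::max(1, n - p); k < n; ++k)  // only terms with a_{n-k} != 0
            sum += (static_cast<double>(k) / n) * c[k] * a[n - k];
        c[n] = sum;
    }
    return c;
}

// Warp cepstrum c[0..m1] with all-pass constant alpha (Oppenheim-Johnson recursion),
// returning m2 + 1 coefficients on the warped (mel-like) frequency scale.
std::vector<double> melWarp(const std::vector<double>& c, int m2, double alpha) {
    const double b = 1.0 - alpha * alpha;
    std::vector<double> g(m2 + 1, 0.0), d(m2 + 1, 0.0);
    for (int i = static_cast<int>(c.size()) - 1; i >= 0; --i) {  // feed c[m1], ..., c[0]
        d = g;                                                   // previous step's values
        g[0] = c[i] + alpha * d[0];
        if (m2 >= 1) g[1] = b * d[0] + alpha * d[1];
        for (int j = 2; j <= m2; ++j) g[j] = d[j - 1] + alpha * (d[j] - g[j - 1]);
    }
    return g;
}

// LPCMCC of one frame: LPC order p, returns numCeps + 1 values (index 0 included).
std::vector<double> lpcmcc(const std::vector<double>& frame, int p, int numCeps, double alpha) {
    const std::vector<double> a = levinsonDurbin(autocorrelation(frame, p), p);
    return melWarp(lpcCepstrum(a, 2 * numCeps), numCeps, alpha);
}

// Raised-sine (bandpass) lifter w(n) = 1 + (L/2) sin(pi n / L), applied to c[1..].
void applyLifter(std::vector<double>& c, int L) {
    assert(L > 0);
    const double pi = std::acos(-1.0);
    for (std::size_t n = 1; n < c.size(); ++n)
        c[n] *= 1.0 + 0.5 * L * std::sin(pi * static_cast<double>(n) / L);
}
```

With alpha = 0 the warping step reduces to the identity, so the same code also yields the plain LPC cepstrum; the lifter de-emphasizes the lowest and highest cepstral indices, which tend to be the most sensitive to channel and noise variation.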

