The Research of Small Vocabulary Speaker-Independent Continuous Speech Recognition System

Original title (Chinese): 小词汇量非特定人连续语音识别系统的研究
Author: Fan Changqing (范长青)
Degree level: Master's
Discipline: Detection Technology and Automatic Equipment
Keywords: virtual instrument technology; LabVIEW; continuous speech recognition; Mel-frequency cepstral coefficients (MFCC) and their difference coefficients; VQ-HMM algorithm
Degree year: 2008
Advisor: Hua Yuning (华宇宁)
Discipline code: 081102
Degree-granting institution: Shenyang Ligong University (沈阳理工大学)
Thesis submission date: 2008-01-03

Abstract

Speech recognition enables a computer to understand human speech and to execute commands or tasks according to what was said. It is widely used in telephone dialing, remote control of household appliances, industrial control, information retrieval, and similar fields. Speech recognition comprises isolated-word recognition, connected-word recognition, and continuous speech recognition; this thesis studies the development and implementation of a small-vocabulary, speaker-independent continuous speech recognition system.
     The thesis explains the basic principles of continuous speech recognition in detail and examines the key techniques of the recognition process: feature extraction, model selection, and decision rules. Building on the idea of "turning hardware into software" and on signal analysis and processing, it designs and develops a continuous speech recognition system based on virtual instrument technology, combining the LabVIEW and MATLAB languages. Applying virtual instrument technology to speech recognition implements the instrument in software and embodies the idea that "the software is the instrument".
     Starting from real-time acquisition of the speech signal, the system applies pre-emphasis, wavelet denoising, and endpoint detection. These steps remove the silent and noise segments and leave valid speech segments for feature extraction. The features are Mel-frequency cepstral coefficients (MFCC) combined with their difference coefficients. Training and recognition are carried out with vector quantization (VQ) and hidden Markov models (HMM), yielding a speaker-independent continuous speech recognition system on the LabVIEW platform.
     Experimental analysis and run results show that the speaker-independent continuous speech recognition system built on the LabVIEW platform needs no retraining for speech content in a given environment, is easy to port, relies on speech samples that are easy to collect, and is relatively inexpensive. Recognition accuracy is about 90%, which largely meets the requirements of practical use and indicates that the system has practical value.
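
The VQ-HMM recognition step named in the abstract can be illustrated with a small sketch. This is not the thesis's LabVIEW/MATLAB implementation. It is a minimal pure-Python illustration that assumes a discrete HMM per vocabulary item, a shared VQ codebook, and Viterbi scoring. The function names (`quantize`, `viterbi_log_score`, `recognize`), the two-codeword codebook, and the toy models "low" and "high" are all hypothetical and chosen only for the example.

```python
import math

def quantize(frames, codebook):
    """VQ step: map each feature vector to the index of its nearest codeword."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist2(f, codebook[k]))
            for f in frames]

def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the best state path of a discrete HMM.

    obs is a non-empty list of codeword indices; start[s], trans[r][s], and
    emit[s][o] are ordinary probabilities.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")
    n = len(start)
    delta = [log(start[s]) + log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        delta = [max(delta[r] + log(trans[r][s]) for r in range(n))
                 + log(emit[s][o]) for s in range(n)]
    return max(delta)

def recognize(frames, codebook, models):
    """Quantize the frames, then return the name of the best-scoring model."""
    symbols = quantize(frames, codebook)
    return max(models, key=lambda name: viterbi_log_score(symbols, *models[name]))

# Hypothetical toy data: a 2-codeword codebook and two 2-state left-to-right models.
codebook = [[0.0, 0.0], [1.0, 1.0]]
models = {
    # name: (start probabilities, transition matrix, emission matrix)
    "low":  ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.8, 0.2]]),
    "high": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.2, 0.8]]),
}
frames = [[0.1, 0.0], [0.2, 0.1], [0.0, -0.1]]  # stand-ins for MFCC vectors
best = recognize(frames, codebook, models)
```

In a real system the frames would be MFCC-plus-difference vectors, the codebook would be trained from data (for example by clustering), and the model parameters would be estimated in a training stage. The sketch covers only the recognition-time flow of quantizing and then scoring each model.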

