Research and Implementation of Feature Extraction Algorithms for Speech Recognition

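English title: The Research of Feature Extraction Algorithm for Speech Recognition and the Realization
Author: Hui Bo (惠博)
Degree level: Master's
Discipline: Computer Software and Theory
Keywords: speech recognition; endpoint detection; Mel-frequency cepstral coefficients (MFCC); dynamic time warping (DTW)
Degree year: 2008
Advisor: Feng Hongwei (冯宏伟)
Discipline code: 081202
Degree-granting institution: Northwest University (西北大学)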

Abstract

Speech signals are strongly time-varying, but over a sufficiently short interval their characteristics can be treated as essentially constant. This observation is a key starting point for speech signal processing. Recognition accuracy depends largely on how accurate and robust the extracted features are, so feature extraction plays a pivotal role in speech signal processing applications.
     The thesis first reviews the fundamentals of speech recognition: its principles, the basics of speech signal processing, and the main recognition and training methods. On that basis, it makes the following contributions:
     1. It focuses on the widely used Mel-frequency cepstral coefficients (MFCC). Taking 24-dimensional MFCC as an example, it adds and removes components to analyze how missing high-order coefficients affect the recognition rate, identifies the high-order MFCC coefficients that are insensitive to noise, and selects an optimized combination of the 24 coefficients that changes the recognition rate only slightly.
     2. It implements a connected-digit-string speech recognition system in VC++ (Visual C++ 6.0) based on the dynamic time warping (DTW) model and evaluates it experimentally. The system's modules follow the basic structural model of a speech recognition system, and MFCC are used as the features.
     3. The experiments showed that spoken Mandarin digits are easily confused with one another. The thesis improves both the template training method and the reference templates: it proposes robust training with multiple pairs of feature vector sequences, and constructing reference templates by segmenting each syllable into its initial and final.
     4. Finally, it studies acoustic modeling methods for continuous Mandarin speech recognition and gives a method for recognizing easily confused Chinese words.
     Through experiments on and study of each part of a working speech recognition system, the thesis lays groundwork for the further development of practical speech recognition systems.
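
     The DTW template matching and the MFCC component experiments described above can be illustrated with a minimal Python sketch. It is not the thesis's VC++ implementation, whose details the abstract does not give. The Euclidean frame distance, the three-way local path constraint, and the names `frame_distance`, `dtw_distance`, `recognize`, and `dims` are all assumptions made for illustration. The `dims` argument restricts matching to a chosen subset of coefficients, which is one simple way to emulate adding or removing MFCC components.

```python
import math


def frame_distance(a, b, dims=None):
    """Euclidean distance between two feature vectors.

    dims (hypothetical parameter) lists the coefficient indices to compare;
    None means all coefficients are used.
    """
    if dims is None:
        dims = range(len(a))
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in dims))


def dtw_distance(seq_a, seq_b, dims=None):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumes the basic local constraint: each cell is reached from its
    left, lower, or lower-left neighbour.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1], dims)
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b frame
                                 cost[i][j - 1],      # stretch seq_a frame
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(utterance, templates, dims=None):
    """Return the label of the reference template with the lowest DTW cost."""
    return min(templates,
               key=lambda label: dtw_distance(utterance, templates[label], dims))


# Toy 2-dimensional "features" standing in for real MFCC frames.
templates = {
    "one": [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
    "two": [[5.0, 5.0], [6.0, 5.0]],
}
utterance = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
print(recognize(utterance, templates))            # all coefficients
print(recognize(utterance, templates, dims=[1]))  # second coefficient only
```

     In this toy example the utterance is a time-stretched copy of the "one" template, so both calls select "one". With real data, each frame would hold the MFCC vector of one short analysis window, and comparing recognition rates across different `dims` subsets would mirror the add-and-remove analysis of components.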

