Research on the Extraction of Personal Feature Parameters in Speech Recognition

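English title: Research of the Characteristics Parameters Extraction in the Personal of Speech Recognition
Author: 张志霞 (Zhang Zhixia)
Degree level: Master's
Discipline: Signal and Information Processing
Chinese keywords (translated): speech recognition; speech signal; feature extraction; endpoint detection; dynamic time warping
English keywords: Speech Recognition; Signal Processing; Feature Extraction; Endpoint Detection; Dynamic Time Warping
Degree year: 2009
Advisor: 韩慧莲 (Han Huilian)
Discipline code: 081002
Degree-granting institution: 中北大学 (North University of China)
Thesis submission date: 2009-06-03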

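Abstract

With the continuing development of computers, speech recognition has considerable application prospects. It refers to the process by which a machine learns to map speech signals to written symbols, and as an interdisciplinary field it also has far-reaching theoretical research value.
     Speech recognition is essentially a process of speech training and pattern recognition, but good recognition performance depends on the effective extraction of feature parameters from the speech signal. The main purpose of feature extraction is to obtain the information that represents the characteristics of the speech, reducing the amount of data to be processed during recognition while representing the speech signal as completely and accurately as possible. Guided by the overall framework and techniques of speech recognition, this thesis studies algorithms for extracting speech-signal feature parameters, which is of theoretical and practical significance for speech recognition.
     First, the thesis introduces the fundamentals of speech recognition and studies speech-signal preprocessing, personal feature parameter extraction algorithms, and the model matching and training techniques of speech recognition, namely the principles of the dynamic time warping algorithm and the hidden Markov model. It analyzes in detail the dynamic time warping algorithm used in this work and presents an overall scheme for extracting speech-signal feature parameters.
     Second, speech signals are collected in an office environment. Samples that are clearly disturbed by incidental factors or made irregular by the speaker are discarded directly, and the collected speech signals are displayed.
     Next, the collected speech signals are preprocessed, including pre-emphasis, framing and windowing, and endpoint detection. On this basis, feature parameters are extracted, with emphasis on implementing the extraction of linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC), and the effect of the feature parameters extracted in the office environment on recognizing the speech of individual speakers is analyzed.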
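     The preprocessing chain named above (pre-emphasis, framing, windowing, endpoint detection) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the pre-emphasis coefficient 0.97, the Hamming window, the single energy threshold at 10% of the peak, and all function names are choices made here for the example, since the abstract does not specify them.

```python
import math

def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def hamming(frame_len):
    """Hamming window coefficients (assumed window type)."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]

def apply_window(frames, window):
    """Multiply every frame sample-by-sample with the window."""
    return [[x * w for x, w in zip(frame, window)] for frame in frames]

def detect_endpoints(frames, ratio=0.1):
    """Return (first, last) indices of frames whose short-time energy exceeds
    ratio * peak energy, or None if all frames are silent.
    A single energy threshold is an assumption made for this sketch."""
    energies = [sum(x * x for x in frame) for frame in frames]
    if not energies or max(energies) == 0:
        return None
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0], active[-1]
```

     A typical call order would be pre_emphasize, then split_frames, then apply_window with hamming(frame_len), and finally detect_endpoints on the windowed frames, so that only the frames between the two endpoints are passed on to LPCC or MFCC extraction.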
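     Finally, using the Mel-frequency cepstral coefficients, the dynamic time warping algorithm is applied to recognize the preprocessed, speaker-specific utterances of individual speakers in simulation experiments, and the experimental results are analyzed. An improvement is proposed to address the shortcomings of the dynamic time warping algorithm.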
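     For orientation, template matching with dynamic time warping can be sketched as below. It shows the textbook form of the algorithm only. The Euclidean frame distance, the unconstrained three-way step pattern, and the names dtw_distance and recognize are assumptions made for this example, and the improved variant proposed in the thesis is not reproduced here. Each sequence is assumed to be a list of per-frame feature vectors such as MFCCs.

```python
def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    def frame_dist(u, v):
        # Euclidean distance between two frames (assumed local distance).
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # seq_a advances alone
                                 cost[i][j - 1],      # seq_b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognize(test_features, templates):
    """Return the label of the reference template with the smallest DTW cost."""
    return min(templates,
               key=lambda label: dtw_distance(test_features, templates[label]))
```

     Because the warping path may repeat frames of either sequence, an utterance spoken more slowly than its template can still align with low cost, which is the reason DTW suits isolated-word matching with a small set of stored templates.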
With the continuing development of computer technology, speech recognition shows great promise in applications. As an interdisciplinary field, it is also of considerable theoretical value.
     In essence, speech recognition is a process of pattern recognition. Good recognition performance, however, depends closely on the effective extraction of characteristic parameters from the voice signal. The main purpose of extracting these parameters is to obtain the information that represents the voice characteristics and to reduce the amount of data handled during recognition, while expressing the voice signal as accurately as possible. This thesis analyzes the overall structure and technology of speech recognition systems and studies speech-signal feature extraction, which is of theoretical and practical significance for speech recognition.
     First, the thesis introduces the basic knowledge of speech recognition. It studies voice-signal preprocessing, feature parameter extraction algorithms, and the model matching and training techniques of speech recognition, including Dynamic Time Warping and Hidden Markov Models. It focuses on analyzing the Dynamic Time Warping algorithm used in this work and gives an overall scheme for extracting speech-signal feature parameters.
     Second, voice signals are gathered in an office environment, directly excluding samples that are obviously affected by accidental interference or made irregular by the speaker. The collected voice signals are then displayed.
     Third, the speech signals are preprocessed. On this basis, feature parameters are extracted from the voice signal, focusing on the implementation of linear prediction cepstrum coefficients and Mel frequency cepstrum coefficients. Their effect on individual speech recognition in the office environment is then analyzed.
     Finally, on the basis of the Mel frequency cepstrum coefficients, individual speech recognition is realized using the dynamic time warping algorithm. The experimental results are then analyzed, and an improvement to the dynamic time warping algorithm is put forward.

