An Experimental Study of Speech Signals with a Joint VQ and HMM Model

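Author: 李国峰 (Li Guofeng)
Degree level: Master's
Discipline: Acoustics
Keywords: speech signal recognition; Hidden Markov Model (HMM); vector quantization (VQ) model; Mel parameters (MFCC)
Degree year: 2010
Advisor: 吴胜举 (Wu Shengju)
Discipline code: 070206
Degree-granting institution: 陕西师范大学 (Shaanxi Normal University)
Thesis submission date: 2010-05-01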

Abstract

People have long dreamed of directing machines by spoken language so that work in special environments could be carried out. For a long time this dream could not be realized. Only in today's information age has the rapid development of computer science and related disciplines provided efficient means of realizing it, making it possible for machines to understand human language. The technology that lets machines understand language is speech signal analysis and recognition.
     Over the past twenty years, the rapid development of society has placed ever higher demands on speech signal analysis and recognition, while advances in science and technology have supplied theoretical and technical support in many areas. As a result the technology has made marked progress and has begun to move from the laboratory to the market. In the near future it will enter industry, home appliances, communications, automotive electronics, medicine, home services, consumer electronics and other areas of production and daily life.
     Building on the basic principles and techniques of speech signal analysis and recognition, this thesis studies and analyzes the hidden Markov model (HMM) and the vector quantization (VQ) model. The HMM has strong modeling ability, but its recognition ability is strongly affected by the environment. The VQ model has weaker modeling ability, but it recognizes well because of the similarity between vectors. After analyzing the strengths and weaknesses of both, the thesis proposes a new model and algorithm. In line with the experimental conditions, Mel parameters are chosen as the recognition features. A speech analysis and recognition system is built on the new model and used to extract features from the selected speech signals and to recognize them. Under the same conditions, the results for the same speech signals are compared with those of the HMM model. The comparison shows that the recognition results of the joint model are generally higher than those of the single HMM model, and that the joint model performs better than the HMM model. The joint model was further used in stability experiments under three conditions: specified samples, specified speech signals, and unspecified samples with unspecified semantic signals. These experiments indicate that the joint model's performance is fairly reliable and that it operates well.
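The abstract does not say how the VQ and HMM stages are coupled. One common arrangement, assumed here purely for illustration, is to use a VQ codebook to turn each feature vector into a discrete symbol and then score the symbol sequence with one discrete HMM per word. The sketch below follows that assumption. The function names (`quantize`, `train_codebook`, `log_likelihood`, `recognize`) are hypothetical, Mel feature extraction is not shown, and the HMM parameters are taken as given rather than trained.

```python
import numpy as np

def quantize(features, codebook):
    """Map each frame to the index of its nearest codeword (Euclidean distance)."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def train_codebook(features, size, iters=20, seed=0):
    """Lloyd (k-means) codebook training; features has shape (n_frames, dim)."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(features), size, replace=False)
    codebook = features[picks].astype(float)
    for _ in range(iters):
        labels = quantize(features, codebook)
        for k in range(size):
            members = features[labels == k]
            if len(members):  # an empty cell keeps its previous codeword
                codebook[k] = members.mean(axis=0)
    return codebook

def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm for a discrete HMM.

    start: (n_states,), trans: (n_states, n_states), emit: (n_states, n_symbols).
    Assumes every observed symbol has non-zero probability under the model.
    """
    alpha = start * emit[:, symbols[0]]
    log_p = 0.0
    for s in symbols[1:]:
        scale = alpha.sum()
        log_p += np.log(scale)
        alpha = (alpha / scale) @ trans * emit[:, s]
    return log_p + np.log(alpha.sum())

def recognize(features, codebook, models):
    """Return the label whose HMM gives the quantized sequence the highest score.

    models maps a label to a (start, trans, emit) tuple (hypothetical layout).
    """
    symbols = quantize(features, codebook)
    scores = {label: log_likelihood(symbols, *hmm) for label, hmm in models.items()}
    return max(scores, key=scores.get)
```

In this arrangement the codebook absorbs the frame-level similarity matching that the abstract credits to VQ, while the HMM models how the symbols evolve over time. Whether the thesis combines the two in this way is not stated in the abstract.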
