Research and Design of a Small-Vocabulary, Speaker-Independent Isolated-Word Speech Recognition System
Abstract
Speech recognition is a branch of speech signal processing: it is the technology that lets a machine convert a speech signal into the corresponding text or command through a process of recognition and understanding. It is an interdisciplinary field that draws on artificial intelligence, pattern recognition, digital signal processing, computer science, language acoustics, psychology, physiology, cognitive science, and many other disciplines, and it has far-reaching research value. Speech recognition and speech synthesis have become a hallmark of modern technology and an important area of research and development in computer technology. Although speech recognition has made some progress and some products have reached the market, most speech recognition systems remain confined to the laboratory and fall far short of practical requirements. A current research focus is how to achieve online unsupervised learning and adaptive learning algorithms that combine multiple methods. The fundamental obstacles to practical use fall into two categories: recognition accuracy and system complexity.
     By task, speech recognition can be divided into four areas: speaker recognition, keyword detection, language identification, and continuous speech recognition. This thesis studies algorithms for small-vocabulary, speaker-independent isolated-word speech recognition. The main processing stages are speech signal preprocessing, endpoint detection, feature extraction, construction of the speech template library, and pattern matching.
     The thesis first discusses the basic principles of speech recognition and the characteristics of various recognition algorithms, compares them and selects effective algorithms for speaker-independent isolated-word recognition, analyzes their implementation in depth, and finally develops the system in Visual C++ (VC) on a personal computer. The classic recognition algorithm built on the dynamic time warping (DTW) model is commonly used in speaker-independent small-vocabulary systems. The thesis proposes an endpoint detection technique with a degree of robustness: it improves the traditional dual-threshold endpoint detection method, which is based on zero-crossing rate and short-time energy, into a variable-threshold method that adjusts its thresholds automatically according to the data in the speech file. The algorithm was simulated and tested in Matlab; the experiments show some improvement in endpoint detection accuracy, and the algorithm was then implemented in VC. During speech signal acquisition, calling the low-level Windows API reduces the effect of noise on the speech data to some extent. Linear prediction cepstral coefficients (LPCC) are extracted as features from the speech waveform, and the template library is built by matching and clustering templates with DTW. Finally, the experimental results of the algorithm are tested and analyzed.
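To make two of the stages named in the abstract concrete, the following Python sketch shows a dual-threshold endpoint detector whose thresholds are derived from the recording itself, followed by DTW template matching. It is a minimal sketch and not the thesis's implementation: the abstract does not give the threshold rule, so the rule used here (a noise floor estimated from the first few frames, which are assumed to be silent, plus fixed fractions of the peak energy) is an assumption, as are all function names, parameters, and constants. LPCC extraction is omitted; the matcher accepts any sequence of feature vectors.

```python
import math
import statistics


def frame_features(samples, frame_len=200, hop=80):
    """Return one (short-time energy, zero-crossing count) pair per frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((energy, crossings))
    return feats


def detect_endpoints(samples, frame_len=200, hop=80, noise_frames=5):
    """Return (first_frame, last_frame) of the detected word, or None.

    Assumption: the first `noise_frames` frames contain background noise only.
    """
    feats = frame_features(samples, frame_len, hop)
    if len(feats) <= noise_frames:
        return None
    energy = [e for e, _ in feats]
    zcr = [z for _, z in feats]
    noise_e = statistics.mean(energy[:noise_frames])
    peak = max(energy)
    # Thresholds follow this recording's own levels (hypothetical constants).
    high = noise_e + 0.25 * (peak - noise_e)
    low = noise_e + 0.05 * (peak - noise_e)
    zcr_thr = (statistics.mean(zcr[:noise_frames])
               + 2 * statistics.pstdev(zcr[:noise_frames]))
    loud = [i for i, e in enumerate(energy) if e > high]
    if not loud:
        return None
    start, end = loud[0], loud[-1]
    last = len(feats) - 1
    # Widen the segment while neighbours exceed the low energy threshold
    # or the zero-crossing threshold (weak or unvoiced word edges).
    while start > 0 and (energy[start - 1] > low or zcr[start - 1] > zcr_thr):
        start -= 1
    while end < last and (energy[end + 1] > low or zcr[end + 1] > zcr_thr):
        end += 1
    return start, end


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)          # row 0 of the cost matrix
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = math.dist(x, y)         # Euclidean local distance
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[len(b)]


def recognize(features, templates):
    """Return the label of the template closest to `features` under DTW."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

In use, `detect_endpoints` would trim a recording to the spoken word, a feature extractor would turn the trimmed frames into vectors, and `recognize` would compare them against a dictionary that maps each vocabulary word to a stored template. The symmetric step pattern in `dtw_distance` (horizontal, vertical, diagonal) is the simplest common choice; slope constraints or path-length normalization are frequent refinements that this sketch leaves out.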
