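The Research of Speech Recognition Technology Based on FPGA

Original title (Chinese): 基于FPGA的语音识别技术研究
Author: 谢秋云
Degree level: Master's
Discipline: Computer Application Technology
Keywords: speech recognition ; FPGA ; hidden Markov model (HMM) ; MFCC ; Viterbi
Degree year: 2007
Advisor: 肖铁军
Discipline code: 081203
Degree-granting institution: 江苏大学 (Jiangsu University)
Thesis submission date: 2007-10-01
Chair of the defense committee: 鞠时光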

Abstract

Many existing speech recognition systems are implemented in computer software, but a growing number of applications require compact size, portability, and low power consumption. Dedicated speech recognition chips built on integrated circuits therefore have broad room for development. Current speech chips are built around a DSP core, which makes them costly and inflexible to design and leaves little room to raise processing performance further. An FPGA (Field-Programmable Gate Array) offers low power consumption, small size, high integration, high speed, a short development cycle, low cost, user-definable functions, and repeated programming and erasing, and it can implement high-performance parallel algorithms.
     This thesis studies how to implement speech recognition algorithms on an FPGA. The main work is as follows:
     Several FPGA design methods for digital processing algorithms were studied and put into practice: the VLSI architecture design method, the Matlab modeling design method for hardware DSP, and the IP core design method. Using these methods, hardware units for some basic arithmetic functions were designed and implemented, and then used in the speech recognition algorithm.
     Front-end processing for speech recognition and its hardware implementation, including pre-emphasis, framing, windowing, and endpoint detection. An endpoint detection method based on energy transitions is adopted and then improved with real-time framing, so that it detects endpoints in real time and also has some noise robustness.
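The front-end steps named above can be sketched in software as follows. This is a minimal illustration only, not the thesis's hardware design: the pre-emphasis coefficient 0.97, the frame length and hop, the Hamming window, and the plain energy-threshold rule in `detect_endpoints` are all assumptions chosen for the example, and the thesis's own energy-transition method may differ.

```python
import math

def pre_emphasis(samples, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed typical value."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]

def split_frames(samples, frame_len=256, hop=128):
    """Split into overlapping frames; an incomplete tail frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(frame):
    """Apply a Hamming window (frame must hold at least two samples)."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)

def detect_endpoints(energies, threshold, min_frames=3):
    """Return (start, end) frame indices of the first run of at least
    min_frames consecutive frames whose energy exceeds threshold, or None.
    Hypothetical rule for illustration, not the thesis's method."""
    start = None
    for i, e in enumerate(energies):
        if e > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                return (start, i - 1)
            start = None  # run too short: treat it as noise
    if start is not None and len(energies) - start >= min_frames:
        return (start, len(energies) - 1)
    return None
```

Because `detect_endpoints` consumes one frame energy at a time and keeps only `start` as state, the same rule could run as frames arrive, which is the property the real-time framing improvement aims for.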
     Speech feature extraction and its hardware design. Mel Frequency Cepstrum Coefficients (MFCC) are adopted because they closely model the characteristics of human hearing and give high recognition performance and noise robustness. Computing them involves the Fast Fourier Transform (FFT), triangular filtering, the logarithm, and the Discrete Cosine Transform (DCT). The hardware structure of each stage is designed to improve speed and efficiency. For the FFT, the hardware structure is modified for real-valued input so that fewer FFT points are needed, which raises speed by about 40%. For triangular filtering, the filter center frequencies are converted to the corresponding points of the spectrum, which improves computational efficiency. For the logarithm, a look-up table is combined with linear interpolation to improve precision. Finally, after analysis of the MFCC process, a three-stage pipelined hardware structure is proposed; it performs triangular filtering, the logarithm, and the DCT almost in parallel, which further speeds up MFCC extraction. In the vector quantization (VQ) hardware design, codebook search is accelerated by comparing results with the minimum.
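Two of the MFCC steps described above lend themselves to a short sketch: mapping triangular-filter center frequencies to FFT bin indices, and approximating the logarithm with a look-up table plus linear interpolation. The code below is an illustration under stated assumptions, not the thesis's circuit: the mel formula 2595·log10(1 + f/700) is the common convention, the 64-entry table size and the choice of base-2 logarithm are hypothetical, and all function names are invented for the example.

```python
import math

def hz_to_mel(f):
    """Common mel-scale convention (assumed; the thesis may use another)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_bins(num_filters, fft_size, sample_rate):
    """Nearest FFT bin for each of num_filters + 2 points spaced evenly on
    the mel scale from 0 Hz to Nyquist. Filter k spans bins[k]..bins[k + 2]
    with its peak at bins[k + 1]."""
    top = hz_to_mel(sample_rate / 2.0)
    bins = []
    for k in range(num_filters + 2):
        hz = mel_to_hz(top * k / (num_filters + 1))
        bins.append(int(round(fft_size * hz / sample_rate)))
    return bins

LUT_BITS = 6  # 64 intervals: an assumed table size
LOG2_LUT = [math.log2(1.0 + i / (1 << LUT_BITS))
            for i in range((1 << LUT_BITS) + 1)]

def log2_lut(x):
    """Approximate log2 of a positive integer: the leading-one position
    gives the integer part, and the table plus linear interpolation gives
    the fractional part."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    exponent = x.bit_length() - 1            # position of the leading one
    mantissa = x / (1 << exponent) - 1.0     # fraction in [0, 1)
    pos = mantissa * (1 << LUT_BITS)
    idx = int(pos)                           # table entry below the value
    frac = pos - idx                         # distance to the next entry
    lo, hi = LOG2_LUT[idx], LOG2_LUT[idx + 1]
    return exponent + lo + (hi - lo) * frac
```

Precomputing the bin indices means the filter bank needs no frequency arithmetic at run time, only indexed accumulation over the spectrum. With 64 intervals the interpolated logarithm stays within about 1e-4 of `math.log2`; a natural logarithm follows by multiplying the result by ln 2.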
     The Viterbi recognition algorithm and its hardware implementation. A Hidden Markov Model (HMM) is used for acoustic modeling and matching; HMMs are regarded as the most effective approach in terms of computation and storage requirements. Based on the HMM structure, the formula of the traditional Viterbi algorithm is modified and pruning is applied, which greatly increases search speed. Four add-compare-select (ACS) units process in parallel, which simplifies the circuit and improves recognition speed.
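The add-compare-select step and pruning can be illustrated with a log-domain Viterbi scorer for a left-to-right HMM. This is a sketch under assumptions rather than the thesis's modified formula, which the abstract does not spell out: the no-skip left-to-right topology, the discrete (VQ symbol) observations, the beam-style pruning rule, and the function name `viterbi_left_to_right` are all chosen for the example.

```python
NEG_INF = float("-inf")

def viterbi_left_to_right(log_a_stay, log_a_next, log_b, obs, beam=None):
    """Log score of the best path that starts in state 0 and ends in the
    last state of a left-to-right HMM without skips.

    log_a_stay[j]: log P(j -> j)
    log_a_next[j]: log P(j -> j + 1) (the last entry is unused)
    log_b[j][o]:   log P(symbol o | state j)
    obs:           sequence of VQ symbol indices
    beam:          optional pruning width; states scoring more than `beam`
                   below the best state of the frame are dropped
    """
    n = len(log_a_stay)
    delta = [NEG_INF] * n
    delta[0] = log_b[0][obs[0]]
    for o in obs[1:]:
        new = [NEG_INF] * n
        # Each iteration of this loop is one ACS operation; hardware can
        # run several of them side by side.
        for j in range(n):
            stay = delta[j] + log_a_stay[j]                      # add
            move = (delta[j - 1] + log_a_next[j - 1]) if j > 0 else NEG_INF
            best = stay if stay >= move else move                # compare, select
            new[j] = best + log_b[j][o]
        if beam is not None:
            top = max(new)
            new = [s if s >= top - beam else NEG_INF for s in new]
        delta = new
    return delta[-1]
```

Working in the log domain turns the probability products into additions, so each state update needs only adders and a comparator. Because the updates for different states within a frame are independent, the inner loop is the part that parallel ACS units would share.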

