Speech Recognition System Based on the TMS320C5409
Abstract
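As speech recognition technology has matured, the field has accumulated many well-established algorithms and successful applications. At the same time, rapid progress in DSP technology has steadily strengthened system functionality, data processing capability, and communication with external devices, making real-time speech recognition on a DSP feasible. This paper describes the design and implementation of a small-vocabulary, real-time speech recognition system built around the TMS320C5409 DSP.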
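     Because both the computing speed and the memory of a DSP are limited, implementing a speech recognition system on a DSP involves, beyond hardware design, mostly work on algorithm selection and software programming. The paper concentrates on the software design of the system, gives the program flowcharts, and notes the points that need attention when programming. To suit Chinese digit speech and the characteristics of the TMS320C5409 DSP, the system detects endpoints with the common method that combines energy and zero-crossing rate, and it uses three kinds of speech recognition feature parameters: LPC cepstral coefficients, Mel-frequency cepstral coefficients (MFCC), and zero-crossings with peak amplitudes (ZCPA). An RBF neural network serves as the recognition back end.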
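As an illustration of endpoint detection that combines energy and zero-crossing rate, the following is a minimal Python sketch, not the system's DSP code. The frame length, the three thresholds, and the function names `frame_features` and `detect_endpoints` are all assumptions made for this example; a real system would normally derive the thresholds from a stretch of background noise and limit how far the boundaries may be extended.

```python
import math


def frame_features(samples, frame_len):
    """Return (energy, zero_crossing_count) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        # Count sign changes between neighbouring samples.
        zc = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((energy, zc))
    return feats


def detect_endpoints(samples, frame_len=80, e_high=1.0, e_low=0.2, zc_thresh=20):
    """Return (first_frame, last_frame) of the utterance, or None if no speech.

    All threshold values are illustrative assumptions.
    """
    feats = frame_features(samples, frame_len)
    # Step 1: frames above the high energy threshold are certainly speech.
    voiced = [i for i, (e, _) in enumerate(feats) if e > e_high]
    if not voiced:
        return None
    start, end = voiced[0], voiced[-1]

    def maybe_speech(i):
        # Step 2: weaker frames still count if the energy is moderate or
        # the zero-crossing count is high (typical of unvoiced consonants).
        e, zc = feats[i]
        return e > e_low or zc > zc_thresh

    while start > 0 and maybe_speech(start - 1):
        start -= 1
    while end < len(feats) - 1 and maybe_speech(end + 1):
        end += 1
    return start, end
```

The high energy threshold finds the voiced core of a word, and the zero-crossing test then pulls in low-energy, noise-like segments at its edges that an energy test alone would cut off.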
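     Fuzzy information theory is an information science built on fuzzy set theory, and it is a powerful, broadly applicable tool for guiding engineering practice. This paper introduces fuzzy theory into the RBF neural network to improve it. The hidden layer of a conventional RBF network usually uses Gaussian functions, but the degree to which an arbitrary input belongs to a center does not always follow a Gaussian distribution. The paper therefore replaces the output of the original radial basis function with the membership degree of the input pattern with respect to each class center. Experimental results show that the recognition rate improves.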
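The difference between the two hidden layers can be sketched in Python as follows. The abstract does not state which membership function is used, so this example assumes the standard fuzzy c-means membership formula with fuzzifier `m`; the function names, `sigma`, and `m` are likewise hypothetical choices for illustration only.

```python
import math


def sq_dist(x, c):
    """Squared Euclidean distance between input x and center c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def gaussian_hidden(x, centers, sigma=1.0):
    """Conventional RBF hidden layer: one Gaussian response per center."""
    return [math.exp(-sq_dist(x, c) / (2.0 * sigma ** 2)) for c in centers]


def fuzzy_hidden(x, centers, m=2.0):
    """Hidden layer that outputs fuzzy memberships instead of Gaussians.

    Assumption: fuzzy c-means style membership,
    u_i = (1/d_i^2)^(1/(m-1)) / sum_j (1/d_j^2)^(1/(m-1)).
    """
    d2 = [sq_dist(x, c) for c in centers]
    if any(d == 0.0 for d in d2):
        # The input coincides with a center: full membership goes there.
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        total = sum(hits)
        return [h / total for h in hits]
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** power for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def network_output(hidden, weights):
    """Linear output layer: one weighted sum of hidden outputs per class."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]
```

Unlike the Gaussian responses, the memberships always sum to 1 across centers, so each hidden output expresses how strongly the input belongs to one center relative to all the others.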
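     The system's algorithms are written in C and assembly language, and the speech recognition algorithms were debugged and tested on a TMS320C5409 board. Isolated Chinese digit words are the recognition targets, and the recognition results are sent through an asynchronous serial port to a PC for real-time display. The experimental results demonstrate that the system is effective.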
With the development of speech recognition technology, many mature algorithms have emerged in the field and have been applied successfully. With the rapid development of DSPs, whose system functions, data processing ability, and communication with external equipment keep improving, a real-time speech recognition system can now be realized on a DSP. This paper introduces the design and realization of a real-time speech recognition system built around the TMS320C5409 DSP.
     Because the operating speed and memory of a DSP are limited, realizing a speech recognition system on a DSP requires, in addition to hardware design, even more work on algorithm selection and software programming. This paper introduces the software design process of the system, presents the program flow diagrams, and explains the programming considerations. Based on the characteristics of Chinese speech and of the TMS320C5409 DSP, the system performs speech endpoint detection using the average instantaneous energy and the average instantaneous zero-crossing rate, and uses Linear Prediction Cepstrum Coefficients, Mel Frequency Cepstrum Coefficients, and Zero-Crossings with Peak Amplitudes as features. A Radial Basis Function (RBF) neural network is used in the back end of the system.
     Fuzzy information theory is an information science based on fuzzy set theory, and it is a powerful tool for guiding engineering practice. In this paper, fuzzy theory is introduced into the Radial Basis Function neural network. The hidden layer of a conventional RBF network generally adopts the Gaussian function; however, the membership of an input to a center does not always follow a Gaussian distribution. This paper uses the membership of the input to each class center in place of the output of the radial basis function. The experimental results show that the recognition rate increases.
     All algorithms in the system are implemented in C and assembly language and run on a TMS320C5409 board. The validity of the system is demonstrated by real-time recognition experiments on isolated Chinese digit words. The recognition results are sent to a PC through an asynchronous serial port.