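Design and Implementation of a Speech Recognition System Based on DSP and HMM

English title: Design and Realization of Speech Recognition System Based on HMM and DSP
Author: 项勇 (Xiang Yong)
Thesis level: Master's
Discipline: Control Theory and Control Engineering
Keywords: Speech Recognition ; LPCC ; HMM (Hidden Markov Model) ; TMS320VC5402
Degree year: 2008
Advisor: 吴谨 (Wu Jin)
Discipline code: 081101
Degree-granting institution: 武汉科技大学 (Wuhan University of Science and Technology)
Thesis submission date: 2008-05-12
Defense committee chair: 程耕国 (Cheng Gengguo)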

Abstract

We live in an era of digital information: credit card numbers, voice-dialed telephone numbers, identity card numbers, and electronic passwords are all strings of digits. Advances in speech recognition technology have made it feasible to recognize spoken digits. A digit speech recognition system recognizes the digits a user speaks and offers a natural, flexible, and economical human-machine interface, which helps with the large volume of data entry required in both military and civilian applications. With the growing reach of the telephone network, automatic digit recognition can also be applied to telephone-based population surveys and to remote authentication of numbers in remote stock trading. Digit speech recognition therefore has considerable practical value.
Against this background, this thesis describes in detail the design of a DSP-based, speaker-independent recognition system for isolated Chinese digits. The system uses the AD50 chip to acquire the analog speech signal into the DSP, processes the acquired signal with a speech recognition algorithm, and outputs the recognition result on an LED.
First, the thesis introduces the basic theory of speech signals, examines four basic problems of speech recognition (signal preprocessing, feature extraction, training, and decoding), and briefly introduces the acoustic models used in speech recognition.
Second, it focuses on the hardware circuit design of the DSP-based speech recognition system. The hardware platform is built around the TMS320VC5402, with a TLC320AD50 acquiring the speech signal. The circuit also includes a memory expansion module, an LED display, a JTAG circuit, and a power supply circuit.
Finally, it discusses in detail the software design of the TMS320VC5402-based speech recognition system. To improve recognition accuracy, the system uses the VUS algorithm for endpoint detection. The feature vector consists of 12th-order LPCC coefficients, 12-point first-order differential cepstral coefficients, and 12-point first-order differential energy coefficients. Hidden Markov models (HMMs) are used for training and recognition. The hardware drivers are designed last.
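
The abstract names the feature set (12th-order LPCC plus first-order differences) but does not give the formulas. The sketch below shows one common way such features are computed, and it is an illustration under assumptions rather than the thesis's implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion for the LPC step, the standard LPC-to-cepstrum recursion, and a simple frame-to-frame difference for the differential coefficients. All function names are hypothetical, and the code uses floating-point Python for readability, whereas code for the TMS320VC5402 would normally be fixed-point.

```python
def autocorr(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns predictor coefficients a[1..order] as a list, using the
    convention s[n] ~ sum_k a[k] * s[n - k].
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_lpcc(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c[1..n_ceps].

    Recursion: c[n] = a[n] + sum_{k=1}^{n-1} (k / n) * c[k] * a[n - k],
    where a[n] is taken as 0 for n greater than the LPC order.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc_frame(frame, order=12):
    """12th-order LPCC vector for one frame (order is an assumption
    taken from the abstract's '12th-order LPCC')."""
    r = autocorr(frame, order)
    return lpc_to_lpcc(levinson_durbin(r, order), order)


def delta(vectors):
    """First-order difference between consecutive frame vectors.

    Assumed definition: delta[t] = v[t] - v[t - 1], with a zero
    vector for the first frame.
    """
    out = [[0.0] * len(vectors[0])]
    for prev, cur in zip(vectors, vectors[1:]):
        out.append([x - y for x, y in zip(cur, prev)])
    return out
```

Given a list of frames, `[lpcc_frame(f) for f in frames]` yields the static cepstral vectors and `delta(...)` applied to that list yields the differential cepstral coefficients. The differential energy component mentioned in the abstract is left out here because the source does not say how its 12 points are formed.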

