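Design and Implementation of Pinyin Learning System Based on Speech Recognition

Original title (Chinese): 基于语音识别的拼音学习系统设计与实现
Author: Li Lu (李璐)
Degree level: Master's
Discipline: Software Engineering
Keywords: HSK; speech recognition; SAPI; multimedia
Degree year: 2010
Advisor: Zhang Xiaoyan (张笑燕)
Discipline code: 081203
Degree-granting institution: Beijing University of Posts and Telecommunications (北京邮电大学)
Thesis submission date: 2010-11-01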

Abstract

Advances in speech recognition have made spoken interaction between people and computers possible. Learners who study Hanyu Pinyin on their own currently receive no corrective feedback on their pronunciation. To address this, the thesis takes the elementary vocabulary of the HSK syllabus as its recognition content and, drawing on the principles of speech recognition, designs a Pinyin learning system in the VC7.0 integrated development environment. The system is organized into three functions: learning ("follow me"), practice, and testing ("assignment"). Speech recognition is embedded in the practice function so that the system can interact with the user and return an evaluation of, and feedback on, the user's pronunciation.
     For course organization, course content is stored and classified in Access 2000 data tables, and data is exchanged with the Visual C++ application through the DAO database interface. Visual C++ 7.0 is used to build the framework of the learning system, to implement its functions in detail, and to store and manage user information and learning records.
     In the learning and testing functions, Flash and WAVE multimedia are combined to give a dynamic demonstration of how Pinyin is pronounced. Flash playback is handled by the third-party Shockwave control, and WAVE files are played through the Windows MCI API functions.
     In the practice function, the user's pronunciation is captured with DirectSound and saved as a WAVE file; syllables and tones are then recognized using the COM interfaces and grammar rules provided by SAPI 5.1. The thesis introduces the dynamic time warping algorithm used in recognition, the construction of the corpus, the segmentation of initials and finals, and the grading criteria for pronunciation, and it describes the structure, functions, and workflow of the recognition subsystem. Finally, it describes and tests the functions of each user interface.
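
The abstract names dynamic time warping (DTW) as part of the recognition process but does not give its details. As a rough illustration only, the textbook form of DTW might be sketched in C++ as follows. The feature representation (one numeric vector per frame), the Euclidean frame distance, the symmetric step pattern, and the names `Frame`, `frameDistance`, and `dtwDistance` are all assumptions made for this sketch, not details taken from the thesis.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One feature vector per analysis frame. The feature type (for example
// cepstral coefficients) is an assumption; the abstract does not state it.
using Frame = std::vector<double>;

// Euclidean distance between two frames of equal dimension.
double frameDistance(const Frame& x, const Frame& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        const double d = x[k] - y[k];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Textbook DTW: the minimum accumulated frame distance over all monotonic
// alignments of `ref` and `test`, allowing the steps (i-1, j), (i, j-1)
// and (i-1, j-1). Returns +infinity if either sequence is empty.
double dtwDistance(const std::vector<Frame>& ref,
                   const std::vector<Frame>& test) {
    const double inf = std::numeric_limits<double>::infinity();
    if (ref.empty() || test.empty()) return inf;

    const std::size_t n = ref.size();
    const std::size_t m = test.size();

    // cost[i][j] is the best accumulated distance for aligning the first
    // i frames of ref with the first j frames of test.
    std::vector<std::vector<double>> cost(n + 1,
                                          std::vector<double>(m + 1, inf));
    cost[0][0] = 0.0;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const double best = std::min({cost[i - 1][j],
                                          cost[i][j - 1],
                                          cost[i - 1][j - 1]});
            cost[i][j] = frameDistance(ref[i - 1], test[j - 1]) + best;
        }
    }
    return cost[n][m];
}
```

Under these assumptions, a learner's utterance that is simply a slower or faster rendering of the reference template yields a small distance, because repeated frames are absorbed by the warping path. How such a distance would map onto the grading criteria mentioned in the abstract is not stated there and is left open here.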

