Connected Digit Speech Recognition Based on Mutual Information Estimation

Original title: 基于互信息估计的连续数字语音识别
Author: Xu Hua (徐华)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords: Mutual Information Estimation ; Connected Digit Speech Recognition ; MI_OneStage algorithm ; MIM_LB algorithm
Degree year: 2003
Advisor: Yu Yibiao (俞一彪)
Discipline code: 081001
Degree-granting institution: Soochow University (苏州大学)
Thesis submission date: 2003-04-01

Abstract

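Connected digit speech recognition has broad prospects for practical application, with significant value in areas such as telephone voice dialing, automatic data entry, and verification of identity card numbers.
     Mandarin connected digit speech recognition is an important branch of speech recognition, and its performance still lags behind that achieved for English. The main difficulties are as follows. First, Mandarin digits are monosyllabic, and the fewer the syllables, the more confusable the utterances and the harder the recognition. Second, Mandarin connected digits are pronounced with a high degree of continuity, mainly because zero-initial syllables occur frequently among Mandarin digits. In addition, coarticulation between the digits in a Mandarin digit string is severe, which makes recognition harder still.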
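     Based on mutual information theory, and from the viewpoint of the mutual information between speech patterns, this thesis studies each component of a complete speech recognition system: preprocessing, training of reference patterns, the connected recognition algorithm, and postprocessing. The preprocessing part focuses on endpoint detection of speech signals, covering not only detection in quiet environments but also accurate detection at low signal-to-noise ratio (SNR). An endpoint detection algorithm based on cepstral parameters is studied; experiments show that at low SNR the performance of traditional detection algorithms degrades markedly, while the cepstrum-based algorithm performs well. For reference pattern training, the thesis adopts MKM, a modified K-means algorithm. The number of clusters is specified in advance, and the speech parameter files obtained from training are clustered into a small number of classes that serve as reference patterns. Note that the clustering here operates on speech patterns of different frame lengths. Two methods of computing the cluster center are studied: one maps all speech patterns onto the average frame length and computes the center there, and the other selects the single most representative pattern as the center. The connected recognition algorithm is the focus of the thesis. Building on existing results in mutual information estimation, two algorithms are proposed, MI_OneStage and MIM_LB. Experiments show that both are effective on the connected digit recognition task: using only second- and third-order LPCC feature parameters, their recognition performance is comparable to that of traditional algorithms using twelfth-order feature parameters. In the postprocessing module, tone models and initial/final segmentation models are given a preliminary study. Mandarin digits include several confusable pairs, "6 [liu4]" and "9 [jiu3]", "2 [er4]" and "8 [ba1]", and "1 [yi1]" and "7 [qi1]", which differ mainly in tone and in the pronunciation of the initial consonant; tone information and the segmented initial can therefore improve the correct recognition rate for these pairs. Chapter 7 presents some detailed experimental results. Finally, with respect to raising the recognition rate and improving noise robustness, the thesis proposes ideas and methods such as speaker adaptation and multi-layer Bayesian models to raise the recognition rate, and a cascade of a speech enhancer and the recognizer to improve noise robustness.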
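The abstract relies on low-order LPCC (linear prediction cepstral coefficient) features. As a rough illustration only, the sketch below converts LPC predictor coefficients into cepstral coefficients with the standard recursion c_n = a_n + sum over k of (k/n) c_k a_(n-k). The all-pole sign convention H(z) = 1 / (1 - sum a_k z^-k), the function name `lpc_to_lpcc`, and the example coefficients are assumptions made for this sketch and are not taken from the thesis.

```python
def lpc_to_lpcc(a, n_ceps):
    """Convert LPC predictor coefficients to LPC cepstral coefficients.

    Assumes the all-pole model H(z) = 1 / (1 - sum_k a[k-1] * z**-k)
    and ignores the gain term c_0.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # The direct term a_n exists only while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k / n) * c_k * a_{n-k}.
        for k in range(1, n):
            if 1 <= n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


# Hypothetical second-order predictor, expanded to three cepstral coefficients.
lpcc = lpc_to_lpcc([0.9, -0.2], 3)
```

For a single-pole predictor with coefficient a, the recursion reproduces the series a^n / n, which is a convenient sanity check on the sign convention.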
Connected digit speech recognition is a crucial branch of continuous speech recognition. It has long attracted the attention of many researchers because of its wide range of applications, such as voice autodialing and personal number verification.
     Because of its pronunciation characteristics, Mandarin connected digit speech recognition is more difficult than English digit recognition.
     Each unit of a speech recognition system is studied in this thesis: preprocessing, training of reference patterns, the connected digit speech recognition algorithm, and postprocessing of confusable pairs. The study of endpoint detection covers not only quiet environments but also low signal-to-noise ratio (SNR) conditions. For the training of reference patterns, a modified K-means algorithm is adopted. The most important part of the thesis is the study of connected speech recognition algorithms. Two algorithms are proposed, building on existing results of mutual-information matching theory, and experimental results show that they are effective. A tone model and the segmentation of consonants and vowels are adopted in the postprocessing unit to reduce the error rates of the confusable pairs of Mandarin digits. Some ideas for improving the system are put forward in the last chapter.
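The reference-pattern training described in the abstract clusters patterns of different frame lengths, and one option it names is to take the most representative pattern as the cluster center. A minimal sketch of that idea follows. It is not the thesis's method: the thesis matches patterns through mutual information, whereas this sketch substitutes plain dynamic time warping (DTW) with Euclidean frame distance, and the names `dtw_distance`, `most_representative`, and the toy sequences are all hypothetical.

```python
import math


def dtw_distance(x, y):
    """DTW distance between two sequences of feature vectors (lists of lists)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # d[i][j] = minimal accumulated cost of aligning x[:i] with y[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1])  # Euclidean frame distance
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def most_representative(patterns):
    """Index of the pattern with the smallest total DTW distance to all patterns."""
    totals = [sum(dtw_distance(p, q) for q in patterns) for p in patterns]
    return min(range(len(patterns)), key=totals.__getitem__)


# Hypothetical one-dimensional feature sequences of different lengths.
patterns = [
    [[0.0], [1.0], [2.0]],
    [[0.0], [1.0], [1.0], [2.0]],
    [[5.0], [6.0]],
]
center_index = most_representative(patterns)
```

Choosing an existing pattern as the center avoids having to average sequences of unequal length, at the cost of computing all pairwise distances within a cluster.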

