Study on the Key Techniques of Speaker-Independent Isolated Words Speech Recognition System (非特定人孤立词语音识别系统若干关键技术的研究)

English title: Study on the Key Techniques of Speaker-Independent Isolated Words Speech Recognition System
Author: Bian Jie (卞洁)
Degree level: Master's
Discipline: Signal and Information Processing
Keywords: speech recognition; Markov model; Mel cepstrum; feature extraction; endpoint detection
Degree year: 2005
Advisor: Yin Fuliang (殷福亮)
Discipline code: 081002
Degree-granting institution: Dalian University of Technology
Thesis submission date: 2005-03-01

Abstract

In recent years, as speech recognition technology has developed, a number of mature algorithms have emerged for small-vocabulary speech recognition, along with successful applications. FPGA technology has also advanced rapidly, and chip computing speed has risen sharply with it. Choosing a scheme that is suitable for implementation on an FPGA has therefore become one of the important topics in this research field. This thesis systematically studies the techniques that make up a small-vocabulary speech recognition system and, on that basis, proposes a small-vocabulary, speaker-independent, isolated-word speech recognition system suitable for implementation on an FPGA.
     The thesis covers the following five areas:
     1. A personal-name speech database was designed, comprising speech files, recording information, and a management system, and a body of speech data was collected for it.
     2. The dynamic time warping (DTW) and hidden Markov model (HMM) algorithms are introduced. Each is used to implement a personal-name speech recognition system with a capacity of 48 people, and the performance of these systems is analyzed in detail.
     3. To cope with possible noisy environments, a state transition diagram is used to assist smoothing and denoising. The contribution of each order of MFCC component to speech recognition is discussed. The HMM-based method for speaker gender identification is improved so that separate model parameters can be built for male and female speakers.
     4. Several key techniques for real-time implementation are improved: storage-space reuse and logarithm computation in real-time hardware data processing and endpoint detection, and, based on the characteristics of the MFCC algorithm, a fast fixed-point MFCC algorithm.
     5. Experiments and analysis are carried out for each key system parameter, and an implementation scheme for a small-vocabulary speech recognition system is given that achieves a reasonable balance between recognition rate and complexity.
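To illustrate the template-matching idea behind DTW-based isolated-word recognition mentioned in the abstract, the following is a minimal sketch in Python. It is not the thesis's implementation: the function names `dtw_distance` and `recognize`, the Euclidean frame distance, the unconstrained three-way step pattern, and the toy one-dimensional "features" are all assumptions made for illustration. A real system would compare sequences of MFCC vectors extracted after endpoint detection.

```python
import math


def dtw_distance(a, b):
    """Return the cumulative DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimum cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            # Allowed steps: advance in a only, in b only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW cost (hypothetical helper)."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Toy data: one-dimensional "feature vectors" standing in for MFCC frames.
templates = {
    "name_a": [(0.0,), (1.0,), (2.0,)],
    "name_b": [(5.0,), (5.0,), (5.0,)],
}
utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]  # a time-stretched "name_a"
best = recognize(utterance, templates)
```

Because DTW lets one frame align with several frames of the other sequence, the time-stretched utterance above matches the `name_a` template at zero cost even though the two sequences have different lengths. This is the property that makes DTW suited to isolated words spoken at different rates.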

