Research on Isolated-Word Speech Recognition Algorithms and Their SOPC-Based Hardware System Implementation
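Abstract
With the rapid development of information technology in recent years, and of computer technology in particular, speech recognition has made considerable progress in both theory and practical application. Some of the more mature techniques have gradually been implemented in hardware, and small-scale speech recognition chips based on embedded systems or integrated circuits have begun to see initial use in everyday life. However, these common speech chips generally adopt a fixed architecture built around a DSP, which is costly, inflexible in design, and difficult to scale to higher processing capability.
     Building on an in-depth study of isolated-word speech recognition algorithms, this thesis addresses shortcomings of the traditional algorithms by improving two key stages: speech endpoint detection and feature extraction. For endpoint detection, it adopts a noise-resistant threshold decision on the zero-crossing-energy product together with a state-machine design; for feature extraction, it uses the Schur recursion, which is better suited to hardware design, to solve the autocorrelation equations. These improvements raise computational efficiency and make the algorithms easier to implement in hardware, and the experimental results are fairly satisfactory.
     In addition, to address the shortcomings of current speech chips, an SOPC-based FPGA embedded system was designed to implement the complete software algorithm. In the hardware design, methods such as Matlab-based modeling of the hardware DSP blocks tie the traditional Matlab algorithms to the actual low-level hardware design, and the design takes full advantage of the customizable user peripherals and flexible functionality of the Nios II soft-core processor. The I2C master configuration module, speech data preprocessing module, speech data acquisition module, and speech recognition software algorithm module of the core hardware system were each implemented, with good results.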
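The endpoint-detection idea described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the per-frame feature is assumed to be short-time energy multiplied by the zero-crossing count, and the function names, the three states, and the parameters `threshold`, `min_speech`, and `max_gap` are all hypothetical choices made for illustration.

```python
def zero_energy_products(samples, frame_len):
    """Per-frame product of short-time energy and zero-crossing count.

    Assumption: non-overlapping frames; the thesis may frame differently.
    """
    products = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        products.append(energy * crossings)
    return products


def detect_endpoints(products, threshold, min_speech=3, max_gap=2):
    """Hypothetical three-state machine: SILENCE -> CANDIDATE -> SPEECH.

    Returns (first_frame, last_frame) of the first detected word,
    or None if no run of `min_speech` frames exceeds `threshold`.
    """
    state, start, run, gap = "SILENCE", None, 0, 0
    for i, p in enumerate(products):
        above = p > threshold
        if state == "SILENCE":
            if above:
                state, start, run = "CANDIDATE", i, 1
        elif state == "CANDIDATE":
            if above:
                run += 1
                if run >= min_speech:
                    state, gap = "SPEECH", 0
            else:
                # Short burst (for example a click): treat it as noise.
                state, start = "SILENCE", None
        else:  # SPEECH
            if above:
                gap = 0
            else:
                gap += 1
                if gap > max_gap:
                    # Report the last frame that was above threshold.
                    return start, i - gap
    if state == "SPEECH":
        return start, len(products) - 1 - gap
    return None
```

The CANDIDATE state is what gives the sketch some resistance to short interference: a single loud frame never reaches SPEECH, and brief dips inside a word (up to `max_gap` frames) do not end it. A state machine of this shape also maps naturally onto registers and comparators in hardware.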
With the rapid development of information technology, especially computer science, speech recognition has improved greatly in both theory and practice. Some mature techniques have been realized in hardware, including several speech recognition chips based on integrated circuits or embedded systems. However, because these chips are built around a fixed DSP core, they are expensive and lack design flexibility, and their performance is difficult to improve further.
     Based on thorough research on algorithms for isolated word recognition, this thesis improves endpoint detection and feature extraction. For endpoint detection, we adopt a threshold decision algorithm based on the zero-crossing-energy product. For feature extraction, the Schur algorithm, which is more appropriate for hardware implementation, is used to solve the autocorrelation equations. These changes not only increase computational efficiency but also make the hardware implementation of the algorithms much easier, as supported by experiments.
     At the same time, we implement the whole isolated word recognition algorithm on an FPGA (Field Programmable Gate Array) embedded hardware system based on the concept of SOPC (System on a Programmable Chip). The abstract algorithm described in Matlab is connected to the detailed hardware design using the DSP Builder tool. The implementation of the I2C configuration, speech data preprocessing, speech data collection, and speech recognition modules demonstrates the advantages of the Nios II soft-core CPU in FPGA design, such as user-definable functions.
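To make the feature-extraction step more concrete, here is a minimal Python sketch of a Schur recursion that derives reflection coefficients directly from a frame's autocorrelation values. It is an illustration under stated assumptions, not the thesis's code: the function name `schur_reflection` is hypothetical, the sign convention assumes a prediction-error filter of the form A(z) = 1 + a_1 z^-1 + ..., and floating-point arithmetic stands in for whatever fixed-point format the hardware uses.

```python
def schur_reflection(r, order):
    """Reflection coefficients k_1..k_order from autocorrelation r[0..order].

    gp[k] and gm[k] hold the cross-correlations of the forward and backward
    prediction errors with the input signal. Each stage uses only
    multiply-adds and a single division, with no inner products.
    """
    gp = list(r[:order + 1])
    gm = list(r[:order + 1])
    ks = []
    for m in range(1, order + 1):
        if gm[m - 1] <= 0.0:
            # Silent or degenerate frame: pad with zeros (an assumption).
            ks.extend([0.0] * (order - len(ks)))
            break
        k = -gp[m] / gm[m - 1]
        ks.append(k)
        # Update from the highest lag down so gm[i - 1] is still the
        # value from the previous stage when it is read.
        for i in range(order, m - 1, -1):
            new_gp = gp[i] + k * gm[i - 1]
            new_gm = gm[i - 1] + k * gp[i]
            gp[i], gm[i] = new_gp, new_gm
    return ks
```

For a valid autocorrelation sequence every intermediate value is bounded in magnitude by `r[0]`, which is one commonly cited reason the Schur recursion is considered friendlier to fixed-point hardware than the Levinson-Durbin recursion; both yield the same reflection coefficients.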
