Research on Speech Recognition Algorithms Based on WD/HMM
Abstract
Speech recognition is an important direction in information technology, and one of its main challenges is raising the recognition rate in noisy environments. Feature extraction is the first step of speech recognition, and its quality has a decisive influence on the recognition rate of the whole system. This thesis therefore aims to improve the recognition rate under noise, concentrates on extracting noise-robust speech feature parameters, and studies a speech recognition system that remains robust in noisy environments.
    Building on the basic principles of speech recognition, the thesis first introduces several widely used methods for extracting speech feature parameters. Second, it examines in detail the Wigner distribution (WD), a time-frequency analysis method for nonstationary random signals. Starting from the time-varying nature of speech, it exploits the properties of the WD for speech feature extraction and combines it with homomorphic processing of the speech signal to obtain two new sets of feature parameters: WD-MFCC, cepstral coefficients based on the Wigner distribution, and WV-MFCC, cepstral coefficients based on the symmetric correlation function. A WD-based spectrogram is also obtained. Finally, the thesis studies the application of the hidden Markov model (HMM) to speech recognition. The two proposed feature sets and the previously introduced ones are each applied to a recognition system that uses an HMM as the classifier, and the recognition performance of that system in noise is simulated and analyzed for each feature type. The simulation results show that the two proposed feature sets effectively improve system performance.
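The abstract does not define how WD-MFCC is computed, so the following is only a minimal sketch of one plausible reading: a discrete pseudo Wigner distribution of a speech frame is averaged over time, passed through a mel filter bank, and turned into cepstral coefficients by a logarithm followed by a DCT (the homomorphic step). All function names, the Hamming lag taper, the analytic-signal step, and the parameter values are assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero the negative-frequency half."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def pseudo_wigner(frame, n_freq=256, half_window=63):
    """Discrete pseudo Wigner distribution, shape (len(frame), n_freq).

    Bin k corresponds to k * fs / (2 * n_freq) Hz because the kernel
    z[t+m] * conj(z[t-m]) oscillates at twice the signal frequency.
    Requires 2 * half_window + 1 <= n_freq.
    """
    z = analytic_signal(frame)
    n = len(z)
    lags = np.arange(-half_window, half_window + 1)
    taper = np.hamming(len(lags))  # assumed lag window to damp cross-terms
    wd = np.empty((n, n_freq))
    for t in range(n):
        # Keep only lags for which both t+m and t-m fall inside the frame.
        valid = (t + lags >= 0) & (t + lags < n) & (t - lags >= 0) & (t - lags < n)
        m = lags[valid]
        kernel = np.zeros(n_freq, dtype=complex)
        kernel[m % n_freq] = taper[valid] * z[t + m] * np.conj(z[t - m])
        # The kernel is conjugate-symmetric in m, so its FFT is real.
        wd[t] = np.fft.fft(kernel).real
    return wd

def mel_filterbank(n_filters, n_freq, fs):
    """Triangular mel filters over bins spanning 0 .. fs/2."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda mel: 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.arange(n_freq) * fs / (2.0 * n_freq)
    bank = np.zeros((n_filters, n_freq))
    for i in range(n_filters):
        lo, mid, hi = edges[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct2(x, n_out):
    """First n_out coefficients of an unnormalized DCT-II."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * k * (i + 0.5) / n) @ x

def wd_cepstrum(frame, fs, n_filters=24, n_ceps=12):
    """Mel cepstrum of the time-averaged pseudo Wigner distribution."""
    wd = pseudo_wigner(frame)
    # The WD can go negative because of cross-terms; floor before the log.
    spectrum = np.maximum(wd.mean(axis=0), 1e-10)
    energies = mel_filterbank(n_filters, wd.shape[1], fs) @ spectrum
    log_energies = np.log(np.maximum(energies, 1e-10))
    return dct2(log_energies, n_ceps + 1)[1:]  # drop the 0th (energy) term
```

Calling `wd_cepstrum(frame, fs)` on a 256-sample frame returns 12 coefficients under these defaults. Averaging the distribution over time discards the time resolution that motivates the WD in the first place, so a fuller treatment would more likely keep per-instant slices; the average is used here only to keep the sketch short.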
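    Research Tasks Undertaken and Main Achievements During the Master's Program
    The author's research topic is "Research on Speech Recognition Algorithms Based on WD/HMM". Papers published during the program are as follows:
    1 Yang Dingcai, Xiu Guohao, Jiang Xia. Application of the Bluetooth Security Mechanism in Wireless Headsets. 现代电子技术 (Modern Electronics Technique), 2003, 15: 105-107
    2 Xiu Guohao, Yang Dingcai, Jiang Xia. A Study of a Speech Feature Parameter Extraction Method for Noisy Environments. 燕山大学学报 (Journal of Yanshan University), (submitted)
    3 Xiu Guohao, Jian Wei, Yang Dingcai. A Speech Feature Parameter Extraction Method Based on the Wigner Distribution. 军事通信技术 (Journal of Military Communications Technology), (revised)
    4 Xiu Guohao, Jian Wei, Yang Dingcai. An Improved Algorithm for Extracting Feature Parameters from Noisy Speech. 军事通信技术 (Journal of Military Communications Technology), (revised)
    5 Song Guosen, Jiang Xia, Xiu Guohao. Application and Performance Analysis of AFH Technology in Bluetooth. 北京电子科技学院学报 (Journal of Beijing Electronic Science and Technology Institute), 2003, 11(2): 35-37
    6 Song Guosen, Jiang Xia, Xiu Guohao. A Comparison of Bluetooth with Other Wireless Communication Technologies. 无线电工程 (Radio Engineering), (accepted)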
