Research on Speech Recognition Algorithms and Application Technology

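English title: Research on Speech Recognition Algorithm and Application
Author: Li Xiuzhen (李秀珍)
Degree level: Master's
Discipline: Circuits and Systems
Keywords (translated from Chinese): speech recognition; continuous hidden Markov model; robustness; home appliance control
Keywords (English): Speech Recognition; Continuous Density Hidden Markov Models; Robust; Appliance Control
Degree year: 2010
Advisor: Zhong Yuanchang (仲元昌)
Discipline code: 080902
Degree-granting institution: Chongqing University
Thesis submission date: 2010-04-01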

Abstract

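With the advance of technology and rising living standards, a wide variety of household appliances have appeared, enriching people's cultural lives and easing their daily burdens. How to manage these appliances more effectively, so that they serve daily life better, has become an active research topic. To address this problem, this thesis proposes applying speech recognition technology to appliance control and building a centralized appliance control system with speech recognition capability, thereby achieving centralized management of household appliances.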
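     This thesis studies in depth the basic theory of speech recognition: pre-processing, feature parameter extraction, and training and recognition algorithms based on the continuous hidden Markov model (CHMM). Pre-processing of the raw speech signal includes pre-filtering, pre-emphasis, short-time windowing, and endpoint detection. Feature extraction computes Mel-frequency cepstral coefficients (MFCC) from the pre-processed signal. The training and recognition algorithms use CHMM for acoustic modeling, yielding a CHMM-based isolated-word speech recognition algorithm.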
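     The study shows that the recognition accuracy of the CHMM-based algorithm drops markedly under environmental noise. To overcome this weakness, speech compensation is applied at three levels (signal space, feature space, and model space) to construct a new speech recognition algorithm. The algorithm combines the strengths of Wiener filtering, histogram equalization, and vector Taylor series, and is more robust; this thesis refers to it as the "hybrid robust speech recognition algorithm".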
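     Using the hybrid robust speech recognition algorithm, a centralized appliance control platform based on speech recognition was built. The platform takes TI's TMS320VC5402 as its core chip and adds external memory circuits, a speech signal acquisition circuit, an LCD display circuit, and a wireless communication module. A television, DVD player, refrigerator, air conditioner, washing machine, and electric lights were chosen as the controlled appliances, and voice control of these appliances was ultimately achieved.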
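     System tests show that, in an indoor noise environment, the centralized appliance control system using the hybrid robust speech recognition algorithm achieves a recognition rate of 98.00%, a significant improvement over the CHMM-based algorithm that does not account for noise, which meets the intended goal of this project.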
With the development of technology, a variety of appliances have entered our homes, enriching people's lives and reducing their daily burdens. How to manage these appliances more effectively, so that they serve us better, has become a hot issue. To address this problem, the paper proposes controlling appliances with speech recognition technology and constructs an Appliance Control System.
     This paper thoroughly studies the basic principles of speech recognition, which consist of pre-processing, feature extraction, training, and recognition. In pre-processing, the original input speech signal undergoes pre-filtering, pre-emphasis, windowing, endpoint detection, and so on. To reduce information redundancy, features are then extracted from the speech signal, with Mel frequency cepstral coefficients (MFCC) chosen as the feature parameters. Continuous Density Hidden Markov Models (CHMM) were chosen as the acoustic model of the acoustic unit, and a CHMM-based isolated word speech recognition algorithm was developed.
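The pre-processing stages named above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the pre-emphasis coefficient (0.97), the frame length and hop (256 and 128 samples), the Hamming window, and the simple energy-threshold endpoint detector are all assumptions chosen for illustration, and the function names are hypothetical. Pre-filtering and MFCC extraction are left out.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, commonly used value.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_and_window(signal, frame_len=256, hop=128):
    """Split the signal into overlapping frames and apply a Hamming window.

    frame_len and hop are assumed values; trailing samples that do not
    fill a whole frame are dropped.
    """
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames


def detect_endpoints(frames, ratio=0.1):
    """Return (first, last) indices of frames whose short-time energy
    exceeds ratio * (maximum frame energy), or None if there is no energy.

    This energy threshold is an assumed stand-in for the endpoint
    detector used in the thesis, which the abstract does not specify.
    """
    energies = [sum(s * s for s in frame) for frame in frames]
    if not energies or max(energies) == 0.0:
        return None
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0], active[-1]
```

A typical call chain would be `detect_endpoints(frame_and_window(pre_emphasize(samples)))`, after which only the frames between the two returned indices would be passed on to feature extraction.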
     Studies show that the recognition accuracy of the CHMM-based speech recognition algorithm decreases significantly in noisy environments. To remedy this defect, the paper proposes a new speech recognition algorithm that applies compensation at three levels: signal space, feature space, and model space. The algorithm combines the advantages of Wiener Filtering, Histogram Equalization, and Vector Taylor Series and has better robustness, so it is called "the hybrid robust speech recognition algorithm".
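As an illustration of the feature-space level only, the following Python sketch shows one common form of histogram equalization: each value in a feature track is replaced by the reference-distribution quantile that matches its rank. The abstract does not say which variant the thesis uses, so the rank-based estimate, the standard normal reference, and the function name `histogram_equalize` are assumptions. The Wiener filtering and vector Taylor series stages are not shown.

```python
from statistics import NormalDist


def histogram_equalize(values, reference=NormalDist(0.0, 1.0)):
    """Map one feature track onto a reference distribution by rank.

    The value with rank r (0-based) among n values gets the empirical
    CDF estimate (r + 0.5) / n, which is passed through the inverse CDF
    of the reference. The standard normal reference is an assumption;
    ties are broken by original order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    equalized = [0.0] * n
    for rank, idx in enumerate(order):
        equalized[idx] = reference.inv_cdf((rank + 0.5) / n)
    return equalized


# One hypothetical cepstral coefficient tracked over five frames,
# and the same track after an offset and gain change.
clean = [0.2, -1.3, 0.9, 0.1, 2.4]
distorted = [3.0 * v + 7.0 for v in clean]
same = histogram_equalize(clean) == histogram_equalize(distorted)
```

Because the mapping depends only on rank order, any monotonically increasing distortion of a feature track yields the same equalized values, which is why this kind of transform is used to reduce the mismatch between noisy and clean features. In practice it would be applied to each MFCC dimension separately.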
     Taking the TMS320VC5402 as the core chip, the author constructed the memory circuits, speech signal acquisition circuits, LCD display circuits, and wireless transceiver circuits of the Appliance Control System, which manages TVs, DVD players, refrigerators, air conditioners, washing machines, lights, and other appliances.
     The results show that, in an indoor noise environment, the appliance control system achieves a recognition rate of 98.00% with the hybrid robust speech recognition algorithm, which meets the goal of this project.

