Improved speech recognition technology based on multi-scale entropy and genetic algorithm (基于多尺度熵和遗传算法改进的语音识别技术)

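English title: Improved speech recognition technology based on multi-scale entropy and genetic algorithm
Authors: FAN Haihua (樊海花); MU Chunyang (穆春阳); MA Xing (马行)
Affiliations: Institute of Information and Communication Technology, North Minzu University; College of Mechatronic Engineering, North Minzu University
Keywords: modal decomposition; speech recognition; local convergence; multi-scale entropy; hidden Markov model; genetic algorithm
Journal: Modern Electronics Technique (现代电子技术; journal code XDDJ)
Publication date: 2019-03-13 07:01
Publisher: Modern Electronics Technique (现代电子技术)
Year: 2019
Volume and serial number: v.42; No.533
Issue: 06
Pages: 134-139
Page count: 6
Funding: National Natural Science Foundation of China (61163002); Young and Middle-aged Talent Training Program of the State Ethnic Affairs Commission (2016GQR10); Natural Science Foundation of Ningxia (NZ16086); Science and Technology Innovation Platform for Higher Education Institutions of Ningxia Hui Autonomous Region: Industry-University-Research Cooperation Base for Key Components and Systems of Advanced Equipment; Graduate Innovation Project of North Minzu University (YCX1771)
Language: Chinese
CN: 61-1224/TN
Record ID: XDDJ201906032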

Abstract

Most modern speech recognition systems operate in complex environments, so speech feature extraction is inevitably affected by noise. In low signal-to-noise ratio environments, the training parameters of the hidden Markov model (HMM) tend to converge to local minima, which lowers the recognition rate. To address this, the collected speech signal is first tested for randomness using complementary ensemble empirical mode decomposition (CEEMD) together with a multi-scale entropy algorithm; once the abnormal components of the CEEMD decomposition have been detected, empirical mode decomposition (EMD) is carried out. The nearly pure speech signal obtained from the decomposition is then used as the input to an HMM improved by a genetic algorithm. Experimental results show that combining multi-scale entropy with the genetic-algorithm-improved HMM gives better convergence speed and optimization performance, and raises the recognition rate by at least 1.23%.
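
The multi-scale entropy step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition (coarse-grain the series at each scale, then compute sample entropy); the abstract does not state which entropy variant or parameters the authors used, so the function names `coarse_grain`, `sample_entropy`, `multiscale_entropy` and the defaults `m=2`, `r_factor=0.15` are illustrative assumptions, and the CEEMD decomposition itself is not shown.

```python
import math
import random


def coarse_grain(signal, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(signal) // scale
    return [sum(signal[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy with template length m and absolute tolerance r."""
    n = len(series)

    def count_matches(length):
        # Count template pairs (i < j) whose Chebyshev distance is <= r.
        # Both lengths use the same n - m start positions so the counts are comparable.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(series[i + k] - series[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)


def multiscale_entropy(signal, max_scale=5, m=2, r_factor=0.15):
    """Sample entropy of the coarse-grained signal at scales 1..max_scale.

    The tolerance is fixed from the standard deviation of the original
    signal (an assumption; this is the usual choice, not one taken from
    the paper).
    """
    mean = sum(signal) / len(signal)
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / len(signal))
    r = r_factor * std
    return [sample_entropy(coarse_grain(signal, s), m, r)
            for s in range(1, max_scale + 1)]
```

Under these assumptions, a component that behaves like noise would be expected to show a higher entropy at scale 1 than a regular, tone-like component, which is the kind of contrast a randomness check on decomposed components could rely on. For example, `multiscale_entropy(component, max_scale=3)` returns one entropy value per scale for a hypothetical list `component` of samples.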

