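Vocal effort detection based on ESN-RBF framework

Original title: 基于ESN-RBF框架的声效模式检测
Authors: 晁浩; 董亮
Authors (English): CHAO Hao; DONG Liang (College of Computer Science and Technology, Henan Polytechnic University)
Keywords: vocal effort detection; echo state network; reservoir; radial basis function; support vector machine
Journal code: JGXB
Journal: Journal of Henan Polytechnic University (Natural Science)
Affiliation: College of Computer Science and Technology, Henan Polytechnic University
Publication date: 2019-05-13 10:26
Publisher: Journal of Henan Polytechnic University (Natural Science) (河南理工大学学报(自然科学版))
Year: 2019
Volume and issue: v.38; No.189
Funding: National Natural Science Foundation of China (61502150, 61403128); Key Scientific Research Project of Higher Education Institutions of Henan Province (19A520004); Research Project for Young Backbone Teachers of Higher Education Institutions of Henan Province (2015GGJS-068); Fundamental Research Funds for the Universities of Henan Province (NSFRF1616)
Language: Chinese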
File name: JGXB201904016
Page count: 6
Issue: 04
CN: 41-1384/N
Pages: 119-124

Abstract

Frame-based spectral features cannot describe the temporal correlation and dynamic change information inherent in speech, which limits vocal effort (VE) detection. To address this, a VE detection method combining an echo state network (ESN) and a radial basis function (RBF) network is proposed. First, the acoustic observation sequence is fed to the ESN, and the node states of its reservoir are used to encode the sequence, mapping the frame-based observation vector sequence into a high-dimensional coding space. Then, an RBF network fits the probability density function of each VE mode from the encoded vectors. Finally, the minimum error rate Bayes decision determines the VE mode. In experiments on a test set of 5,000 isolated words, the proposed method achieved an average recognition accuracy of 79.5%. The results show that the method effectively captures correlation information between speech frames and overcomes the limitation of the inter-frame independence assumption.
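
The three stages named in the abstract (reservoir encoding, RBF density fitting, Bayes decision) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say how reservoir states are turned into a single code, so the sketch assumes the mean reservoir state. It also stands in for the trained RBF network with an equal-weight mixture of Gaussian kernels centered on the training codes. The reservoir size, spectral radius, kernel width, the labels `low` and `high`, and the synthetic input data are all hypothetical choices made for illustration.

```python
import numpy as np


def make_reservoir(n_inputs, n_nodes=50, spectral_radius=0.9, seed=0):
    """Build fixed random input and recurrent weights for an echo state network."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_nodes, n_inputs))
    w_res = rng.uniform(-0.5, 0.5, size=(n_nodes, n_nodes))
    # Rescale so the largest absolute eigenvalue equals spectral_radius.
    w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
    return w_in, w_res


def encode(frames, w_in, w_res):
    """Map a (T, n_inputs) frame sequence to one vector.

    Assumption: the code is the mean reservoir state over all frames.
    """
    state = np.zeros(w_res.shape[0])
    total = np.zeros_like(state)
    for frame in frames:
        state = np.tanh(w_in @ frame + w_res @ state)
        total += state
    return total / len(frames)


def rbf_log_density(x, centers, width):
    """Log density of an equal-weight mixture of isotropic Gaussian RBFs at x."""
    dim = centers.shape[1]
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    log_kernels = (-sq_dist / (2 * width**2)
                   - 0.5 * dim * np.log(2 * np.pi * width**2))
    # Log-mean-exp for numerical stability in high dimensions.
    peak = log_kernels.max()
    return peak + np.log(np.mean(np.exp(log_kernels - peak)))


def classify(x, class_centers, priors, width=1.0):
    """Minimum error rate Bayes decision: choose the largest log posterior."""
    scores = {
        label: np.log(priors[label]) + rbf_log_density(x, centers, width)
        for label, centers in class_centers.items()
    }
    return max(scores, key=scores.get)


# Synthetic demo data (not real speech): two hypothetical modes that differ
# only in the mean level of a 3-dimensional feature sequence.
rng = np.random.default_rng(1)


def synth(level, n_frames=20, n_feats=3):
    return level + 0.1 * rng.standard_normal((n_frames, n_feats))


w_in, w_res = make_reservoir(n_inputs=3)
levels = {"low": -1.0, "high": 1.0}
class_centers = {
    label: np.stack([encode(synth(level), w_in, w_res) for _ in range(10)])
    for label, level in levels.items()
}
priors = {"low": 0.5, "high": 0.5}
predictions = {
    label: classify(encode(synth(level), w_in, w_res), class_centers, priors)
    for label, level in levels.items()
}
print(predictions)
```

Because the reservoir update depends on the previous state, each code reflects the order and dynamics of the frames rather than treating them independently, which is the property the abstract highlights.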

