Wake-Up-Word Speech Recognition System Based on State Posterior Probabilities

English title: Wake-Up-Word speech recognition system based on the posterior of states
Authors: 李文凤 ; 徐及 ; 张鹏远
Authors (English): LI Wen-feng ; XU Ji ; ZHANG Peng-yuan
Year: 2016
Affiliation: Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences
Keywords: voice wake-up ; deep neural network ; word lattice ; posterior probability
Keywords (English): Wake-Up-Word Speech ; Deep Neural Network ; Lattice ; Posterior
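Conference date: 2016-10-28
Proceedings: Proceedings of the 2016 National Conference on Acoustics
Language: Chinese
Classification code: TN912.34
Society code: OGSM
Conference name: 2016 National Conference on Acoustics
Conference location: Wuhan, Hubei, China
Organizer: Acoustical Society of China
Society name: Acoustical Society of China
Pages: 4
File size: 565k
Original format: D
Conference level: National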

Abstract

A wake-up-word system is a highly personalized speech recognition application, and the amount of data accumulated for any particular application is usually limited. As a result, the acoustic model of a wake-up-word system is relatively weak, and wake-up decisions made by analyzing the decoding output of continuous speech recognition tend to have a high error rate. This paper proposes a wake-up decision method based on the state posterior probabilities along the N-best paths. The method effectively overcomes the decoding errors caused by a weak acoustic model and substantially raises the wake-up rate while keeping the false alarm rate low. The paper describes the framework of the wake-up-word system and the steps of the proposed method in detail, and verifies the effectiveness of the method experimentally.
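
The abstract does not spell out the scoring rule, so the following is only a minimal sketch of how a decision from state posteriors along N-best paths might look. It assumes a frame-by-state posterior matrix (for example, DNN softmax outputs) and N-best paths given as one state index per frame. The function names `path_confidence` and `wake_decision`, the geometric-mean score, the max over paths, and the threshold value are all illustrative assumptions, not the authors' method.

```python
import math
from typing import Sequence, Tuple


def path_confidence(posteriors: Sequence[Sequence[float]],
                    path: Sequence[int],
                    floor: float = 1e-10) -> float:
    """Geometric mean of the state posteriors along one path.

    posteriors[t][s] is the assumed posterior of state s at frame t;
    path[t] is the state this path occupies at frame t.
    """
    if not path or len(path) != len(posteriors):
        raise ValueError("path must assign exactly one state to every frame")
    # Sum log posteriors (floored to avoid log(0)), then normalize by length.
    log_sum = sum(math.log(max(posteriors[t][s], floor))
                  for t, s in enumerate(path))
    return math.exp(log_sum / len(path))


def wake_decision(posteriors: Sequence[Sequence[float]],
                  nbest_paths: Sequence[Sequence[int]],
                  threshold: float = 0.5) -> Tuple[bool, float]:
    """Wake up if the best-scoring N-best path reaches the threshold."""
    if not nbest_paths:
        return False, 0.0
    best = max(path_confidence(posteriors, p) for p in nbest_paths)
    return best >= threshold, best


# Toy input: 4 frames, 3 states, 2 hypothetical N-best paths.
posteriors = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
nbest_paths = [[0, 0, 1, 2], [0, 1, 1, 2]]
woke, score = wake_decision(posteriors, nbest_paths)
print(woke, round(score, 3))
```

Scoring several paths rather than only the single best decoding result is one way to tolerate errors in the 1-best output of a weak acoustic model; in practice the threshold would be tuned on held-out data to trade wake-up rate against false alarms.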

References

[1] Lee H, Chang S, Yook D, et al. A voice trigger system using keyword and speaker recognition for mobile devices[J]. IEEE Transactions on Consumer Electronics, 2009, 55(4): 2377-2384.
[2] Këpuska V Z, Klein T B. A novel wake-up-word speech recognition system, wake-up-word recognition task, technology and evaluation[J]. Nonlinear Analysis: Theory, Methods & Applications, 2009, 71(12): e2772-e2789.
[3] Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups[J]. IEEE Signal Processing Magazine, 2012, 29(6): 82-97.
[4] Pan J, Liu C, Wang Z, et al. Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMs in acoustic modeling[C]//Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on. IEEE, 2012: 301-305.
[5] Seide F, Li G, Chen X, et al. Feature engineering in context-dependent deep neural networks for conversational speech transcription[C]//Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2011: 24-29.
[6] Shih C T. Investigation of Prosodic Features for Wake-Up-Word Speech Recognition Task[D]. Florida Institute of Technology, 2009.
[7] Seltzer M L, Yu D, Wang Y. An investigation of deep neural networks for noise robust speech recognition[C]//2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013: 7398-7402.
[8] Kilian J, Siegelmann H T. On the power of sigmoid neural networks[C]//Proceedings of the Sixth Annual Conference on Computational Learning Theory. ACM, 1993: 137-143.
[9] Li X, Wu X. Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition[C]//2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015: 4520-4524.
[10] Miao Y, Metze F, Rawat S. Deep maxout networks for low-resource speech recognition[C]//Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013: 398-403.
[11] Swietojanski P, Li J, Huang J T. Investigation of maxout networks for speech recognition[C]//2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014: 7649-7653.
[12] Birkhoff G. Lattice theory[M]. New York: American Mathematical Society, 1948.
