Wake-Up-Word Speech Recognition System Based on State Posterior Probability
Abstract
A wake-up-word system is a highly personalized speech recognition application, and the amount of data accumulated for any specific application is usually limited. As a result, the acoustic model of a wake-up-word system tends to be weak, and a decision based on continuous speech recognition decoding results typically yields a high error rate. This paper proposes a wake-up decision method based on the state posterior probabilities along the N-best paths. The method effectively overcomes the decoding errors caused by the weak acoustic model and substantially increases the wake-up rate while keeping the false alarm rate low. The paper describes the framework of the wake-up-word system and the individual steps of the proposed method in detail, and experimental results are given to verify the effectiveness of the approach.
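The decision rule summarized above can be illustrated with a minimal Python sketch. The abstract does not give the paper's scoring formula, so everything here is an assumption: each N-best path is assumed to carry a frame-level state alignment and the frame span of the wake-up word, the keyword score is taken to be the geometric mean of the aligned state posteriors over that span, and the system wakes up if any N-best path containing the keyword scores at or above a threshold. The names `Hypothesis`, `keyword_confidence`, `wake_up` and the keyword `"hello"` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


# Hypothetical representation of one N-best path (not the paper's data structure).
@dataclass
class Hypothesis:
    words: List[str]                         # recognized word sequence
    state_ids: List[int]                     # aligned HMM state index for every frame
    keyword_span: Optional[Tuple[int, int]]  # [start, end) frames of the wake-up word, or None


def keyword_confidence(posteriors: Sequence[Sequence[float]], hyp: Hypothesis) -> float:
    """Geometric mean of the state posteriors aligned to the keyword frames (assumed score)."""
    start, end = hyp.keyword_span
    if end <= start:
        return 0.0
    log_sum = 0.0
    for t in range(start, end):
        # posteriors[t][s] = P(state s | frame t) from the acoustic model; clamp to avoid log(0)
        log_sum += math.log(max(posteriors[t][hyp.state_ids[t]], 1e-10))
    return math.exp(log_sum / (end - start))


def wake_up(posteriors: Sequence[Sequence[float]], nbest: Sequence[Hypothesis],
            keyword: str, threshold: float) -> bool:
    """Wake up if any N-best path containing the keyword has confidence >= threshold."""
    scores = [
        keyword_confidence(posteriors, h)
        for h in nbest
        if keyword in h.words and h.keyword_span is not None
    ]
    return bool(scores) and max(scores) >= threshold


# Toy data: 4 frames x 3 states; "hello" stands in for a hypothetical wake-up word.
EXAMPLE_POSTERIORS = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
]
EXAMPLE_NBEST = [
    Hypothesis(words=["<filler>"], state_ids=[0, 0, 0, 0], keyword_span=None),  # 1-best misses it
    Hypothesis(words=["hello"], state_ids=[0, 1, 2, 0], keyword_span=(0, 3)),
]

if __name__ == "__main__":
    print(keyword_confidence(EXAMPLE_POSTERIORS, EXAMPLE_NBEST[1]))  # about 0.765
    print(wake_up(EXAMPLE_POSTERIORS, EXAMPLE_NBEST, "hello", 0.6))   # True
```

Scanning every N-best path rather than only the 1-best lets a keyword that a weak acoustic model ranked second still trigger a wake-up, while the posterior threshold keeps false alarms in check. In the toy data the 1-best path is a filler, yet the second path scores about 0.765 and triggers at a threshold of 0.6.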