Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory

Authors: Martin Wöllmer (woellmer@tum.de); Felix Weninger; Jürgen Geiger; Björn Schuller; Gerhard Rigoll
Keywords: Automatic speech recognition; Long Short-Term Memory; Non-negative matrix factorization; Tandem feature extraction
Journal: Computer Speech & Language
Publication year: 2013
Publication date: May 2013
Volume: 27
Issue: 3
Pages: 780-797
Full-text size: 668 K

Abstract

This article proposes and evaluates various methods to integrate bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for context-sensitive Tandem feature extraction and show how the Connectionist Temporal Classification approach can be used as a BLSTM-based back-end, as an alternative to Hidden Markov Models (HMM). We combine context-sensitive BLSTM-based feature generation and speech decoding techniques with source separation by convolutive non-negative matrix factorization (NMF). Applying our speaker-adapted multi-stream HMM framework, which processes MFCC features from NMF-enhanced speech as well as word predictions obtained via BLSTM networks and non-negative sparse classification (NSC), we obtain an average accuracy of 91.86% on the PASCAL CHiME Challenge task at signal-to-noise ratios ranging from −6 to 9 dB. To our knowledge, this is the best result ever reported for the CHiME Challenge task.
