Modeling State-Conditional Observation Distribution Using Weighted Stereo Samples for Factorial Speech Processing Models


Authors: Mahdi Khademian; Mohammad Mehdi Homayounpour
Keywords: Factorial speech processing models; State-conditional observation distribution; Stereo samples; Noise-robust automatic speech recognition
Journal: Circuits, Systems, and Signal Processing
Publication year: 2017
Publication date: January 2017
Volume: 36
Issue: 1
Pages: 339-357
Journal category: Engineering
Journal subjects: Circuits and Systems; Electrical Engineering; Signal, Image and Speech Processing; Electronics and Microelectronics, Instrumentation
Publisher: Springer US
ISSN: 1531-5878
Volume sort order: 36

Abstract

This paper investigates the effectiveness of factorial speech processing models in noise-robust automatic speech recognition tasks. To this end, it proposes an idealized approach for modeling the state-conditional observation distribution of factorial models based on weighted stereo samples. The approach extends the earlier single-pass retraining method for ideal model compensation so that it supports multiple audio sources. Non-stationary noise can then be treated as one of these audio sources, with multiple states of its own. Experiments on set A of the Aurora 2 dataset show that treating noise this way improves recognition performance. The improvement is significant under low signal-to-noise conditions, reaching up to 4% absolute word recognition accuracy. Besides representing the state-conditional observation distribution accurately, the proposed method has an important advantage over previous methods: the feature spaces for the source features and the corrupted features can be selected independently. This makes it possible to search for feature spaces better suited to noisy speech, independently of the features used for clean speech.
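
The abstract does not spell out the estimation procedure, so the following is only a rough sketch of how weighted stereo samples might be used. It assumes (this is not stated in the source) that every noisy frame comes with posterior weights over the clean-speech states and over the noise states, obtained from the time-aligned clean and noise-only channels, and that a diagonal Gaussian is fitted per state pair from weighted moments of the noisy frames. The function name, its arguments, and the product weighting are hypothetical illustrations, not the authors' algorithm.

```python
def weighted_pair_gaussians(noisy_frames, speech_post, noise_post, floor=1e-8):
    """Fit one diagonal Gaussian per (speech state, noise state) pair.

    noisy_frames: list of T feature vectors taken from the corrupted channel.
    speech_post:  list of T rows; row t holds weights over speech states (assumed given).
    noise_post:   list of T rows; row t holds weights over noise states (assumed given).
    Returns {(i, j): (mean, variance, total_weight)}.
    """
    n_speech = len(speech_post[0])
    n_noise = len(noise_post[0])
    dim = len(noisy_frames[0])
    models = {}
    for i in range(n_speech):
        for j in range(n_noise):
            total = 0.0
            first = [0.0] * dim    # weighted sum of y
            second = [0.0] * dim   # weighted sum of y squared
            for y, gs, gn in zip(noisy_frames, speech_post, noise_post):
                w = gs[i] * gn[j]  # assumed joint weight of the state pair
                total += w
                for d in range(dim):
                    first[d] += w * y[d]
                    second[d] += w * y[d] * y[d]
            if total <= floor:
                continue           # pair never observed; leave it unmodeled
            mean = [v / total for v in first]
            var = [max(second[d] / total - mean[d] ** 2, floor)
                   for d in range(dim)]
            models[(i, j)] = (mean, var, total)
    return models


# Toy usage: three one-dimensional noisy frames, two speech states, one noise state.
frames = [[1.0], [3.0], [10.0]]
speech = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
noise = [[1.0], [1.0], [1.0]]
models = weighted_pair_gaussians(frames, speech, noise)
```

Because the statistics are accumulated only on the corrupted-channel frames while the weights come from the source channels, the two sides need not share a feature space in this sketch, which mirrors the independence the abstract highlights.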
