Use of bimodal coherence to resolve the permutation problem in convolutive BSS

Authors: Qingju Liu (Q.Liu@surrey.ac.uk); Wenwu Wang (W.Wang@surrey.ac.uk); Philip Jackson (P.Jackson@surrey.ac.uk)
Keywords: Convolutive blind source separation (BSS); Audio–visual coherence; Gaussian mixture model (GMM); Feature selection and fusion; Adapted expectation maximization (AEM); Indeterminacy
Journal: Signal Processing
Year: 2012
Journal code: 162_01651684
Category: et
Publication date: August 2012
Volume: 92
Issue: 8
Pages: 1916-1927
File size: 1249 K

Abstract

Recent studies show that facial information contained in visual speech can be helpful for the performance enhancement of audio-only blind source separation (BSS) algorithms. Such information is exploited through the statistical characterization of the coherence between the audio and visual speech using, e.g., a Gaussian mixture model (GMM). In this paper, we present three contributions. With the synchronized features, we propose an adapted expectation maximization (AEM) algorithm to model the audio-visual coherence in the off-line training process. To improve the accuracy of this coherence model, we use a frame selection scheme to discard nonstationary features. Then with the coherence maximization technique, we develop a new sorting method to solve the permutation problem in the frequency domain. We test our algorithm on a multimodal speech database composed of different combinations of vowels and consonants. The experimental results show that our proposed algorithm outperforms traditional audio-only BSS, which confirms the benefit of using visual speech to assist in separation of the audio.
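
The sorting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a coherence score has already been computed for every pairing of a separated output with a target source in each frequency bin (for example, from a trained audio-visual GMM), and it simply picks, bin by bin, the permutation with the highest total score. The function name `align_permutations` and the array `bin_scores` are hypothetical names introduced for illustration only.

```python
from itertools import permutations

import numpy as np


def align_permutations(bin_scores):
    """Pick, per frequency bin, the output-to-source assignment with the
    highest total coherence score.

    bin_scores[f, i, j] is an assumed, precomputed coherence score between
    separated output i in frequency bin f and target source j.
    Returns an integer array perms with perms[f, j] = index of the output
    assigned to source j in bin f.
    """
    n_bins, n_src, _ = bin_scores.shape
    perms = np.empty((n_bins, n_src), dtype=int)
    for f in range(n_bins):
        # Exhaustive search over all permutations; practical only for a
        # small number of sources.
        best = max(
            permutations(range(n_src)),
            key=lambda p: sum(bin_scores[f, p[j], j] for j in range(n_src)),
        )
        perms[f] = best
    return perms


# Toy scores for two sources and two frequency bins (made-up numbers).
example_scores = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # bin 0: outputs are already in order
    [[0.1, 0.7], [0.6, 0.3]],  # bin 1: outputs are swapped
])
example_perms = align_permutations(example_scores)
print(example_perms.tolist())  # [[0, 1], [1, 0]]
```

In the toy data the second bin is assigned the swapped order, which is the kind of per-bin inconsistency that the permutation problem refers to. The exhaustive search is a design choice for clarity; how the scores are obtained and combined in the paper is not specified in this record.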
