Decision Level Fusion for Audio-Visual Speech Recognition in Noisy Conditions
  • Keywords: Speech recognition; Audio-visual speech; Decision level fusion
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2017
  • Volume: 10125
  • Issue: 1
  • Pages: 360-367
  • Series title: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications
  • ISBN: 978-3-319-52277-7
Abstract
This paper proposes a decision level fusion strategy for audio-visual speech recognition in noisy conditions. The method aims to improve recognition across different noise conditions by fusing the scores obtained from classifiers trained on different feature sets. It is evaluated here with three modalities (audio, visual, and audio-visual), but it could be employed with as many modalities as needed. The scores are combined by taking into account the reliability of each modality under each noise condition. The performance of the proposed recognition system is evaluated on two isolated-word audio-visual databases: a public one and one compiled by the authors of this paper. The fusion strategy is also evaluated with different kinds of classifiers. Experimental results show that the proposed system achieves good performance, improving recognition rates over a wide range of signal-to-noise ratios.
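
The abstract does not give the fusion rule itself, so the following is only a minimal sketch of one common form of decision level fusion: a weighted sum of per-class scores, with one reliability weight per modality. The function names (fuse_scores, decide), the weighted-sum rule, and all numeric scores and weights are illustrative assumptions, not the authors' method or data.

```python
from typing import Dict, List


def fuse_scores(scores: Dict[str, List[float]],
                reliability: Dict[str, float]) -> List[float]:
    """Combine per-class scores from several modalities.

    Assumption: fusion is a weighted sum, with reliability weights
    normalized to sum to 1. The paper may use a different rule.
    """
    total = sum(reliability[m] for m in scores)
    if total <= 0:
        raise ValueError("at least one modality needs a positive reliability")
    n_classes = len(next(iter(scores.values())))
    fused = [0.0] * n_classes
    for modality, class_scores in scores.items():
        weight = reliability[modality] / total
        for k, score in enumerate(class_scores):
            fused[k] += weight * score
    return fused


def decide(scores: Dict[str, List[float]],
           reliability: Dict[str, float]) -> int:
    """Return the index of the class (word) with the highest fused score."""
    fused = fuse_scores(scores, reliability)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical classifier scores for a three-word vocabulary.
scores = {
    "audio": [0.2, 0.5, 0.3],
    "visual": [0.6, 0.3, 0.1],
    "audio_visual": [0.5, 0.4, 0.1],
}

# Hypothetical reliabilities: audio is trusted less as noise increases.
low_snr = {"audio": 0.1, "visual": 0.5, "audio_visual": 0.4}
high_snr = {"audio": 0.8, "visual": 0.1, "audio_visual": 0.1}

word_low = decide(scores, low_snr)
word_high = decide(scores, high_snr)
```

With these made-up numbers, the decision follows the visual classifier when the audio weight is small and follows the audio classifier when it is large, which is the behavior the abstract attributes to reliability-aware combination.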
