Variational conditional random fields for online speaker detection and tracking


Authors: M.H. Moattar (moattar@aut.ac.ir); M.M. Homayounpour (homayoun@aut.ac.ir)
Keywords: Conditional random fields; Gaussian mixture model; Variational approximation; Speaker verification; Speaker diarization; Speaker tracking
Journal: Speech Communication
Publication year: 2012
Journal code: 112_01676393
Category: cp
Publication date: July 2012
Volume: 54
Issue: 6
Pages: 763-780
File size: 1193 KB

Abstract

Many studies address specific aspects of speaker tracking. This paper focuses on speaker modeling and proposes conditional random fields (CRFs) for this purpose. A CRF is a class of undirected graphical models for classifying sequential data. CRFs have several attractive characteristics that encouraged us to use them for speaker modeling and tracking. The main difficulty with CRF models is training: known CRF training approaches are prone to overfitting and unreliable convergence. To address this problem, this paper proposes variational approaches; its main novelty is the adaptation of the variational framework to CRF training. The resulting approach is evaluated in three settings. First, the best CRF model configuration for speaker modeling is selected through text-independent speaker verification experiments. Next, the selected model is applied to speaker detection, in which the models of the speakers present in the conversation are known a priori. Finally, the proposed CRF approach is compared with a Gaussian mixture model (GMM) in an online speaker tracking framework. The results show that the proposed CRF model outperforms the GMM in speaker detection and tracking, owing to its capability for sequence modeling and segmentation.
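The abstract does not give the model's equations. As a point of reference only, the following minimal sketch shows the standard (non-variational) conditional log-likelihood of a linear-chain CRF, log p(y|x) = score(y, x) - log Z(x), with the partition function computed by the forward algorithm. This is the kind of objective that conventional CRF training maximizes; it is not the authors' variational training procedure. The per-frame emission scores, the speaker-to-speaker transition scores, and the function names `log_partition`, `path_score`, and `log_likelihood` are illustrative assumptions, not the paper's feature functions.

```python
import math
from itertools import product
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emit: List[List[float]], trans: List[List[float]]) -> float:
    """log Z(x) for a linear-chain CRF, computed with the forward algorithm.

    Hypothetical inputs (assumptions for illustration):
      emit[t][k]  -- score of label (speaker) k at frame t
      trans[j][k] -- score of moving from label j to label k between frames
    """
    n_labels = len(emit[0])
    # alpha[k]: log of the summed exp-scores of all partial paths ending in label k
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][k] + logsumexp([alpha[j] + trans[j][k] for j in range(n_labels)])
            for k in range(n_labels)
        ]
    return logsumexp(alpha)


def path_score(labels: Sequence[int], emit: List[List[float]],
               trans: List[List[float]]) -> float:
    """Unnormalized log-score of one complete label sequence."""
    score = emit[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return score


def log_likelihood(labels: Sequence[int], emit: List[List[float]],
                   trans: List[List[float]]) -> float:
    """log p(y | x) = score(y, x) - log Z(x)."""
    return path_score(labels, emit, trans) - log_partition(emit, trans)


if __name__ == "__main__":
    # Toy example: 2 speakers (labels 0 and 1), 3 frames.
    emit = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.9]]
    trans = [[1.0, -1.0], [-1.0, 1.0]]  # favours staying with the same speaker
    # The conditional probabilities of all label sequences should sum to 1.
    total = sum(math.exp(log_likelihood(y, emit, trans))
                for y in product(range(2), repeat=3))
    print(f"sum of p(y|x) over all label sequences: {total:.6f}")
```

Running the script should print 1.000000, which confirms that the forward-algorithm value of log Z(x) normalizes the sequence scores exactly. The brute-force enumeration is used only as a check; the forward algorithm itself is linear in the number of frames, which is what makes sequence-level training and online segmentation practical.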
