A monotonic statistical machine translation approach to speaking style transformation


Authors: Graham Neubig (neubig@ar.media.kyoto-u.ac.jp); Yuya Akita (akita@ar.media.kyoto-u.ac.jp); Shinsuke Mori (forest@i.kyoto-u.ac.jp); Tatsuya Kawahara (kawahara@i.kyoto-u.ac.jp)
Keywords: Rich transcription; Speaking style transformation; Disfluency detection; Weighted finite state transducers; Monotonic machine translation
Journal: Computer Speech & Language
Publication year: 2012
Journal code: 124_08852308
Category: cp
Publication date: October 2012
Volume: 26
Issue: 5
Pages: 349-370
File size: 615 K

Abstract

This paper presents a method for automatically transforming faithful transcripts or ASR results into clean transcripts for human consumption using a framework we label speaking style transformation (SST). We perform a detailed analysis of the types of corrections performed by human stenographers when creating clean transcripts, and propose a model that is able to handle the majority of the most common corrections. In particular, the proposed model uses a framework of monotonic statistical machine translation to perform not only the deletion of disfluencies and insertion of punctuation, but also correction of colloquial expressions, insertions of omitted words, and other transformations. We provide a detailed description of the model implementation in the weighted finite state transducer (WFST) framework. An evaluation of the proposed model on both faithful transcripts and speech recognition results of parliamentary and lecture speech demonstrates the effectiveness of the proposed model in performing the wide variety of corrections necessary for creating clean transcripts.
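
The abstract frames SST as monotonic translation. The clean transcript is produced left to right with word order preserved, and words are deleted, substituted or inserted along the way. The sketch below illustrates that idea under stated assumptions and is not the authors' WFST implementation. It assumes a noisy-channel style decoder that combines three components: a per-word translation table, a model for inserting tokens between words, and a bigram language model. The best output is then found with a monotonic Viterbi search in plain Python. The tables `TM`, `INS` and `LM`, the floor probability, and every function name are hypothetical toy values chosen only for illustration. In a WFST toolkit, a comparable search would typically be expressed as transducer composition followed by a shortest-path search.

```python
import math
from typing import Dict, List, Tuple

Tokens = Tuple[str, ...]

# Hypothetical toy translation model P(output | verbatim word).
# An empty tuple means the word is deleted (e.g. a filler such as "uh").
TM: Dict[str, Dict[Tokens, float]] = {
    "uh": {(): 0.8, ("uh",): 0.2},
}

# Hypothetical insertion model: token sequences that may be inserted between words.
INS: Dict[Tokens, float] = {(".",): 0.1}
P_NO_INS = 1.0 - sum(INS.values())

# Hypothetical bigram language model P(w | prev) with a crude floor for unseen pairs.
LM: Dict[Tuple[str, str], float] = {
    ("<s>", "i"): 0.5, ("i", "think"): 0.5, ("think", "so"): 0.5,
    ("so", "."): 0.5, (".", "</s>"): 0.9,
}
LM_FLOOR = 1e-3


def lm_logprob(prev: str, words: Tokens) -> Tuple[float, str]:
    """Score a run of output words under the bigram LM; return (logprob, last word)."""
    total = 0.0
    for w in words:
        total += math.log(LM.get((prev, w), LM_FLOOR))
        prev = w
    return total, prev


def tm_options(word: str) -> Dict[Tokens, float]:
    # Words missing from the toy table are copied through unchanged.
    return TM.get(word, {(word,): 1.0})


# A layer maps the last emitted word (the LM state) to (best log score, output so far).
Layer = Dict[str, Tuple[float, List[str]]]


def relax(layer: Layer, prev: str, score: float, out: List[str]) -> None:
    # Keep only the best hypothesis per LM state (Viterbi recombination).
    if prev not in layer or score > layer[prev][0]:
        layer[prev] = (score, out)


def insert_step(layer: Layer) -> Layer:
    """Optionally insert one token sequence at the current position."""
    nxt: Layer = {}
    for prev, (score, out) in layer.items():
        relax(nxt, prev, score + math.log(P_NO_INS), out)
        for ins, p in INS.items():
            lm, last = lm_logprob(prev, ins)
            relax(nxt, last, score + math.log(p) + lm, out + list(ins))
    return nxt


def transform(verbatim: List[str]) -> List[str]:
    """Monotonic Viterbi search: consume input words left to right, never reorder."""
    layer: Layer = {"<s>": (0.0, [])}
    for word in verbatim:
        layer = insert_step(layer)
        nxt: Layer = {}
        for prev, (score, out) in layer.items():
            for tgt, p in tm_options(word).items():
                lm, last = lm_logprob(prev, tgt)
                relax(nxt, last, score + math.log(p) + lm, out + list(tgt))
        layer = nxt
    layer = insert_step(layer)  # allow trailing punctuation
    # Add the end-of-sentence LM probability and pick the best final state.
    best = max(
        layer.items(),
        key=lambda kv: kv[1][0] + math.log(LM.get((kv[0], "</s>"), LM_FLOOR)),
    )
    return best[1][1]


print(transform(["uh", "i", "think", "so"]))  # ['i', 'think', 'so', '.']
```

The search recombines hypotheses by their last emitted word. Because the toy language model is a bigram, the future score depends only on that word, so recombination keeps the search exact. The example shows a filler being deleted and a period being inserted in one pass. A higher-order language model or phrase-level transformations would simply enlarge the state that `relax` keys on.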
