Building Statistical Parametric Multi-speaker Synthesis for Bangladeshi Bangla

Authors: Alexander Gutkin (agutkin@google.com); Linne Ha (linne@google.com); Martin Jansche (mjansche@google.com); Oddur Kjartansson (oddur@google.com); Knot Pipatsrisawat (thammaknot@google.com); Richard Sproat (rws@google.com)
Keywords: TTS; Bangladesh; HMM; LSTM-RNN; acoustic modeling
Journal: Procedia Computer Science
Publication year: 2016
Volume: 81
Issue: Complete
Pages: 194-200
Full-text size: 133 K

Abstract

We present a text-to-speech (TTS) system designed for the dialect of Bengali spoken in Bangladesh. This work is part of an ongoing effort to address the needs of under-resourced languages. We propose a process for streamlining the bootstrapping of TTS systems for such languages. First, we use crowdsourcing to collect data from multiple ordinary speakers, each recording a small number of sentences. Second, we leverage an existing text normalization system for a related language (Hindi) to bootstrap a linguistic front-end for Bangla. Third, we employ statistical techniques to construct multi-speaker acoustic models using Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) and Hidden Markov Model (HMM) approaches. We then describe experiments showing that the resulting TTS voices score well in perceived quality, as measured by Mean Opinion Score (MOS) evaluations.