Learning HMM State Sequences from Phonemes for Speech Synthesis


Authors: Giorgio Biagetti; Paolo Crippa (corresponding author, p.crippa@univpm.it); Laura Falaschetti; Simone Orcioni; Claudio Turchetti
Keywords: Learning; HMM; Speech synthesis; EM estimation; MDCT; MFCC
Journal: Procedia Computer Science
Year: 2016
Volume: 96
Issue: Complete
Pages: 1589-1596
Full text size: 1205 K

Abstract

This paper presents a technique for learning hidden Markov model (HMM) state sequences from phonemes which, combined with the modified discrete cosine transform (MDCT), is useful for speech synthesis. Mel-cepstral spectral parameters, which conventional methods adopt as features for HMM acoustic modeling, do not ensure direct reconstruction of speech waveforms. In contrast to these approaches, we use an analysis/synthesis technique based on the MDCT that guarantees perfect reconstruction of the signal frame feature vectors and allows for a 50% overlap between frames without increasing the data rate. Experimental results show that the spectrograms obtained with the proposed technique closely match the original spectrograms; the quality of the synthesized speech is evaluated using the well-known Itakura-Saito measure.
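
The two signal-processing ideas named in the abstract, MDCT analysis/synthesis with 50% frame overlap and the Itakura-Saito measure, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sine window, the 2/N synthesis scaling, the zero padding at the signal edges, and all function names (`mdct_analysis`, `mdct_synthesis`, `itakura_saito`) are assumptions chosen for illustration, and the HMM learning step is not shown.

```python
import numpy as np

def sine_window(N):
    # Sine window of length 2N (assumed choice); it satisfies the
    # Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1.
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct_basis(N):
    # Cosine basis of shape (2N, N): each 2N-sample frame maps to N coefficients.
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(n + 0.5 + N / 2, k + 0.5))

def mdct_analysis(x, N):
    # Frames of length 2N with hop N (50% overlap), N coefficients per frame.
    x = np.asarray(x, dtype=float)
    pad = (-len(x)) % N
    # Zero-pad N samples on each side so every sample is covered by two frames.
    padded = np.concatenate([np.zeros(N), x, np.zeros(pad + N)])
    n_frames = len(padded) // N - 1
    frames = np.stack([padded[i * N:i * N + 2 * N] for i in range(n_frames)])
    return (frames * sine_window(N)) @ mdct_basis(N)

def mdct_synthesis(X, N, length):
    # Inverse MDCT, synthesis windowing, and overlap-add; the time-domain
    # aliasing of adjacent frames cancels in the overlap.
    frames = (2.0 / N) * (X @ mdct_basis(N).T) * sine_window(N)
    out = np.zeros((X.shape[0] + 1) * N)
    for i, frame in enumerate(frames):
        out[i * N:i * N + 2 * N] += frame
    return out[N:N + length]

def itakura_saito(p_ref, p_est, eps=1e-12):
    # Discrete Itakura-Saito divergence between two power spectra,
    # averaged over frequency bins; eps is an assumed guard against zeros.
    ratio = (np.asarray(p_ref, dtype=float) + eps) / (np.asarray(p_est, dtype=float) + eps)
    return float(np.mean(ratio - np.log(ratio) - 1.0))

# Round trip on a random test signal (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)
coeffs = mdct_analysis(signal, N=64)
rebuilt = mdct_synthesis(coeffs, N=64, length=len(signal))
print("max reconstruction error:", np.max(np.abs(signal - rebuilt)))
```

Apart from one extra edge frame, the analysis produces N coefficients for every N new input samples, which is why the 50% overlap does not increase the data rate. The Itakura-Saito value is zero for identical spectra and grows as the estimated spectrum departs from the reference.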
