Phoneme-level articulatory animation in pronunciation training

Authors: Lan Wang^a (lan.wang@siat.ac.cn); Hui Chen^c (chenhui@iscas.ac.cn); Sheng Li^a (sheng.li@siat.ac.cn); Helen M. Meng^{a,b} (hmmeng@se.cuhk.edu.hk)
Keywords: Phoneme-based articulatory models; HMM-based visual synthesis; 3D articulatory animation
Journal: Speech Communication
Publication year: 2012
Publication date: September 2012
Volume: 54
Issue: 7
Pages: 845-856
Full-text size: 1311 K

Abstract

Speech visualization is extended to use animated talking heads for computer-assisted pronunciation training. In this paper, we design a data-driven 3D talking head system for articulatory animations with synthesized articulator dynamics at the phoneme level. A database of AG500 EMA recordings of three-dimensional articulatory movements is proposed for exploring the distinctions in how sounds are produced. Visual synthesis methods are then investigated, including a phoneme-based articulatory model with a modified blending method. A commonly used HMM-based synthesis is also performed, with a Maximum Likelihood Parameter Generation algorithm for smoothing. The 3D articulators are then controlled by the synthesized articulatory movements to illustrate both internal and external motions. Experimental results report the performance of the visual synthesis methods, measured by root mean square error. A perception test is then presented to evaluate the 3D animations, in which the word identification accuracy is 91.6% over 286 tests and the average realism score is 3.5 (on a scale from 1 = bad to 5 = excellent).
