Abstract
This paper proposes a system that generates speech-synchronized tongue animation from text or speech. First, an anatomically accurate physiological tongue model is built and used to generate a large number of tongue deformation samples from randomly sampled muscle activations. Second, these input and output samples are used to train a neural network that captures the relationship between muscle activation and tongue contour deformation. Third, after the rigid tongue movement is removed, the neural network is used to estimate the non-rigid tongue movement parameters, namely the tongue muscle activations, from a collected database of X-ray tongue movement images of Mandarin Chinese phonemes. The estimates are then used to construct a database of tongue physemes (sequences of tongue muscle activations together with the rigid movement) corresponding to the Mandarin Chinese phoneme database. Finally, the physemes corresponding to the phonemes extracted from the input text or speech are blended according to the phoneme durations to drive the physiological tongue model, producing speech-synchronized tongue animation. Simulation results demonstrate that the synthesized tongue animations are visually realistic and closely approximate the medical tongue data.
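To make the final blending step concrete, the following is a minimal sketch of how physemes might be blended according to phoneme durations. It is an illustration under simplifying assumptions and not the method of the paper: each physeme is reduced to a single target vector of muscle activations (the paper uses sequences plus rigid movement), each target is placed at the midpoint of its phoneme, and neighbouring targets are linearly interpolated. The function name blend_physemes and its parameters are hypothetical.

```python
from bisect import bisect_right


def blend_physemes(targets, durations, t):
    """Return a blended muscle activation vector at time t (seconds).

    targets   -- one activation vector per phoneme (hypothetical, simplified
                 stand-in for a physeme, which in the paper is a sequence)
    durations -- duration of each phoneme in seconds, same order as targets
    """
    if not targets or len(targets) != len(durations):
        raise ValueError("targets and durations must be non-empty and of equal length")

    # Assumption: each target is reached at the midpoint of its phoneme.
    centers = []
    start = 0.0
    for d in durations:
        centers.append(start + d / 2.0)
        start += d

    # Hold the first or last target outside the range of midpoints.
    if t <= centers[0]:
        return list(targets[0])
    if t >= centers[-1]:
        return list(targets[-1])

    # Locate the two neighbouring midpoints and interpolate linearly.
    i = bisect_right(centers, t) - 1
    w = (t - centers[i]) / (centers[i + 1] - centers[i])
    return [(1.0 - w) * a + w * b for a, b in zip(targets[i], targets[i + 1])]
```

Evaluating this function at each animation frame time would yield the activation vector passed to the tongue model for that frame. Linear interpolation is chosen here only for brevity; a smoother blending function could be substituted without changing the interface.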