Realistic speech-synchronized facial animation is an active research topic in computer graphics, with many applications in human-computer interfaces, entertainment, film and television production, and virtual reality. Speech animation has advanced considerably over the past 30 years, yet many problems remain open. Producing realistic, speech-driven, synchronized facial animation is therefore a challenging problem. It involves the kinematic and dynamic modeling and representation of individualized faces, the mechanism of co-articulation, and the acoustic and perceptual evaluation of the resulting animation.
     In this paper, we study synchronized speech facial animation from four aspects.
     First, building on Waters' muscle model, this paper proposes a novel lip muscle model. Muscle models are a simple and useful approach to human facial animation, but an overly simple model such as Waters' cannot naturally describe some complex facial movements. The proposed lip muscle model therefore refines the description of complex lip muscle movements that Waters' model captures inaccurately. Following facial anatomy, the global lip movement is decomposed into a few sub-movements, which serve as the basic units for describing it. The global lip movement is then reconstructed as a linear combination of these sub-movements. To model a talking face, several feature points are marked to obtain a set of lip parameters, and lip shapes of all kinds are synthesized with the proposed lip muscle model together with the adjacent linear muscle model. Experimental results show that the model is practical: its computational cost is low and it can produce a wide range of realistic lip shapes.
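A minimal sketch of the reconstruction step follows. It assumes that each sub-movement is stored as a per-vertex displacement field and that the lip parameters obtained from the marked feature points act directly as the combination weights. The sub-movement names (`jaw_open`, `lip_round`), the function `reconstruct_lips`, and the three-vertex mesh are hypothetical. The code illustrates linear blending in general and is not the thesis's own implementation.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def reconstruct_lips(
    neutral: List[Vec3],
    sub_movements: Dict[str, List[Vec3]],
    weights: Dict[str, float],
) -> List[Vec3]:
    """Return lip vertices as neutral + sum_i w_i * D_i.

    neutral:       rest-pose vertex positions of the lip region.
    sub_movements: per-vertex displacement field D_i for each sub-movement.
    weights:       activation w_i of each sub-movement (absent names count as 0).
    """
    result = [list(v) for v in neutral]
    for name, field in sub_movements.items():
        if len(field) != len(neutral):
            raise ValueError(f"displacement field {name!r} has the wrong length")
        w = weights.get(name, 0.0)
        for vertex, d in zip(result, field):
            for axis in range(3):
                vertex[axis] += w * d[axis]
    return [(v[0], v[1], v[2]) for v in result]


# Hypothetical toy data: three lip vertices and two sub-movements.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
sub_movements = {
    "jaw_open": [(0.0, -1.0, 0.0), (0.0, -2.0, 0.0), (0.0, -1.0, 0.0)],
    "lip_round": [(0.5, 0.0, 0.2), (0.0, 0.0, 0.4), (-0.5, 0.0, 0.2)],
}
shape = reconstruct_lips(neutral, sub_movements, {"jaw_open": 0.5, "lip_round": 1.0})
print(shape)
```

Setting every weight to zero returns the neutral pose. Because the combination is linear, each frame costs one multiply-add per vertex, axis, and sub-movement, which is in line with the low computational cost claimed for the model.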
     Second, building on previous research on the Chinese Mandarin triphone model and co-articulation, this paper proposes a context-dependent visual speech co-articulation model that combines the advantages of rule-based and learning-based methods to produce realistic speech animation. The model focuses on the visual effect of Mandarin co-articulation. To obtain the key lip shapes in continuous speech, a rule set for visual speech co-articulation is constructed, and the viseme weights corresponding to each phone are computed from the quantized rule set. The lip shapes for a sequence of phones are then synthesized with our muscle-based facial model. To make the animation realistic, a learning-based approach selects the optimal transition lip shape between two adjacent phones from all candidate shapes.
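The two stages described above can be sketched as follows. First, rule-derived viseme weights blend visemes into key lip shapes. Second, the best transition shape is selected from a set of candidates. Everything concrete in the sketch is an assumption: the three-component lip parameter vector, the viseme values, and the weights (which in the thesis come from the quantized rule set). The function `transition_cost` in particular is a hand-written stand-in for whatever learned criterion the thesis actually uses to rank transitions.

```python
from typing import Dict, List, Sequence

# Hypothetical lip parameter vector: [mouth opening, mouth width, lip protrusion].
LipParams = List[float]


def blend_visemes(visemes: Dict[str, LipParams], weights: Dict[str, float]) -> LipParams:
    """Key lip shape as the normalized weighted sum of viseme parameter vectors."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("viseme weights must sum to a positive value")
    dim = len(next(iter(visemes.values())))
    shape = [0.0] * dim
    for name, w in weights.items():
        for i, p in enumerate(visemes[name]):
            shape[i] += (w / total) * p
    return shape


def transition_cost(candidate: Sequence[float], prev_key: Sequence[float],
                    next_key: Sequence[float]) -> float:
    """Stand-in cost: squared distance from the candidate to the midpoint of two key shapes."""
    mid = [(a + b) / 2 for a, b in zip(prev_key, next_key)]
    return sum((c - m) ** 2 for c, m in zip(candidate, mid))


def pick_transition(candidates: Sequence[Sequence[float]], prev_key: Sequence[float],
                    next_key: Sequence[float]) -> Sequence[float]:
    """Select the candidate transition shape with the lowest cost."""
    return min(candidates, key=lambda c: transition_cost(c, prev_key, next_key))


visemes = {
    "a": [1.0, 0.6, 0.0],
    "u": [0.3, 0.2, 1.0],
    "b": [0.0, 0.5, 0.1],
}
# Key shape for /b/ in the hypothetical context a-b-u: its own viseme dominates,
# the neighbouring visemes contribute through co-articulation weights.
key_b = blend_visemes(visemes, {"b": 0.5, "a": 0.25, "u": 0.25})

# Choose the transition shape between the /a/ key shape and the /b/ key shape.
candidates = [[0.9, 0.6, 0.0], [0.6, 0.5, 0.2], [0.3, 0.4, 0.3]]
best = pick_transition(candidates, visemes["a"], key_b)
print(key_b, best)
```

Note that the key shape for /b/ is no longer a closed, neutral-width mouth: the neighbouring /u/ already adds protrusion. This is the kind of context effect a co-articulation rule set is meant to capture.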
     Third, this paper proposes a novel lip movement model that accounts for speech rate. In continuous speech, speech rate strongly affects the velocity and amplitude of lip movement, and different speakers adopt different lip movement strategies as the rate changes. As rate increases, some speakers decrease amplitude but maintain velocity, others increase velocity while maintaining amplitude, and others adjust both. Against this background, we propose a rate-dependent lip movement model with a high degree of individuality and naturalness. Previous studies report a close relation between EMG signals and speech rate, as well as between EMG signals and muscle force. In addition, the region covering the lip muscles can be treated as an independent viscoelastic system. The model is therefore built on research results on the viscoelasticity of skin-muscle tissue and on the quantitative relationship between lip muscle force and speech rate. To demonstrate its validity, we applied it to our Chinese speech animation system.
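The sketch below shows how a viscoelastic lip model can reproduce the speaker strategies above. It swaps in the simplest viscoelastic element, a Kelvin-Voigt model c·dx/dt + k·x = F(t), driven by a constant force for the duration of one gesture. This is an assumption and not the thesis's formulation. The stiffness `K`, the damping `C`, the force `F_BASE`, and the rule that gesture duration equals 1/speech rate are all hypothetical, and the EMG-based force/rate relationship is not modeled.

```python
import math

# Assumed Kelvin-Voigt element for the lip region: c * dx/dt + k * x = F(t).
K = 200.0  # hypothetical stiffness, N/m
C = 20.0   # hypothetical damping, N*s/m (time constant C/K = 0.1 s)


def amplitude(force: float, duration: float, k: float = K, c: float = C) -> float:
    """Displacement reached when a constant force acts for `duration` s, starting from rest."""
    return (force / k) * (1.0 - math.exp(-k * duration / c))


def peak_velocity(force: float, c: float = C) -> float:
    """Velocity is highest at force onset, where x = 0 gives dx/dt = F / c."""
    return force / c


def force_for_amplitude(target: float, duration: float, k: float = K, c: float = C) -> float:
    """Invert `amplitude`: the force needed to reach `target` within `duration` s."""
    return target * k / (1.0 - math.exp(-k * duration / c))


F_BASE = 2.0                   # hypothetical muscle force, N
slow_T, fast_T = 1 / 4, 1 / 8  # gesture duration at 4 and 8 syllables/s (one gesture per syllable)

amp_slow = amplitude(F_BASE, slow_T)

# Strategy A: keep the force (hence the peak velocity); amplitude drops at the faster rate.
amp_fast_a = amplitude(F_BASE, fast_T)
vel_fast_a = peak_velocity(F_BASE)

# Strategy B: raise the force so that amplitude is maintained; peak velocity rises.
f_fast_b = force_for_amplitude(amp_slow, fast_T)
amp_fast_b = amplitude(f_fast_b, fast_T)
vel_fast_b = peak_velocity(f_fast_b)

print(f"slow:   {amp_slow * 1000:.2f} mm, peak {peak_velocity(F_BASE):.3f} m/s")
print(f"fast A: {amp_fast_a * 1000:.2f} mm, peak {vel_fast_a:.3f} m/s")
print(f"fast B: {amp_fast_b * 1000:.2f} mm, peak {vel_fast_b:.3f} m/s")
```

If the force is held fixed, the peak velocity F/c is unchanged while the amplitude shrinks at the faster rate, which matches the first strategy in the text. Solving for the force that preserves amplitude instead raises the peak velocity, which matches the second. Interpolating between the two forces gives the third.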
     Finally, to evaluate the quality of the synthesized speech animation, this paper proposes a systematic evaluation approach for visual Chinese speech animation. The approach consists of two main tests: an acceptability test and an intelligibility test. The acceptability test uses the Diagnostic Acceptability Measure (DAM) approach, supplemented with an objective evaluation component. For the intelligibility test, this paper proposes a novel approach called the Visual Chinese Modified Rhyme Test. It is based on the earlier Chinese Modified Rhyme Test for synthesized speech evaluation and is adapted to focus on Chinese speech animation. In addition, "punishment" and "forgiveness" factors are introduced to simulate human perception. The paper concludes with the overall evaluation results for the 3D speech animation system.
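This abstract does not specify how the "punishment" and "forgiveness" factors enter the score, so the sketch below is one hypothetical reading. It starts from the usual guessing correction for closed-set rhyme tests, 100 * (R - W/(n - 1)) / T, with n = 6 as in the classic Modified Rhyme Test. The Chinese variant may use a different n. Confusions between items in the same viseme class are treated as partially forgiven, and the remaining penalty is scaled by a punishment weight. The function `mrt_score`, the class labels, and the default factor values are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def mrt_score(
    responses: List[Tuple[str, str]],
    viseme_class: Dict[str, str],
    n_alternatives: int = 6,
    forgiveness: float = 0.5,
    punishment: float = 1.0,
) -> float:
    """Guessing-corrected rhyme-test score in percent (hypothetical weighting).

    responses:    (target, answer) pairs from a closed-set test.
    viseme_class: maps each test item to a visual class; confusions inside one
                  class are "forgiven" with partial credit `forgiveness`.
    punishment:   weight on the guessing penalty for the remaining errors.
    With forgiveness=0 and punishment=1 this reduces to the usual
    correction 100 * (R - W / (n - 1)) / T.
    """
    if not responses:
        raise ValueError("no responses to score")
    right = 0.0
    wrong = 0.0
    for target, answer in responses:
        if answer == target:
            right += 1.0
        elif viseme_class.get(answer) is not None and viseme_class.get(answer) == viseme_class.get(target):
            right += forgiveness  # visually indistinguishable: partial credit
            wrong += 1.0 - forgiveness
        else:
            wrong += 1.0
    return 100.0 * (right - punishment * wrong / (n_alternatives - 1)) / len(responses)


classes = {"ba": "bilabial", "pa": "bilabial", "ma": "bilabial", "da": "alveolar", "ta": "alveolar"}
responses = [("ba", "ba"), ("pa", "ba"), ("da", "ma"), ("ta", "ta"), ("ma", "ma"), ("da", "da")]
plain = mrt_score(responses, classes, forgiveness=0.0, punishment=1.0)
forgiving = mrt_score(responses, classes)
print(f"{plain:.1f} {forgiving:.1f}")
```

Confusing /pa/ with /ba/ is almost unavoidable when only the lips are visible. Under this reading, that confusion costs less than confusing /da/ with /ma/, which a viewer could in principle tell apart.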
     Based on the research above, a Chinese Synchronized Speech Animation Demo System was built, in which a natural and realistic talking head is synthesized.
References
    [Albrecht 2002] Irene Albrecht, Jörg Haber, Kolja Kähler, Marc Schröder, and Hans-Peter Seidel. 'May I talk to you? :-)' - Facial Animation from Text [C]. Proc. Pacific Graphics, Beijing, China, 2002: 77-86.
    [Basmajian 1985] J. V. Basmajian, C. J. De Luca. Muscles Alive: Their Functions Revealed by Electromyography (5th ed.) [M]. Baltimore: Williams & Wilkins, 1985.
    [Bergeron 1985] P. Bergeron and P. Lachapelle. Controlling facial expressions and body movements. In Advanced Computer Animation [C]. SIGGRAPH '85 Tutorials, ACM, New York, Volume 2, 1985: 61-79.
    [Bernstein 2000] L. E. Bernstein, M. E. Demorest, and P. E. Tucker. Speech perception without hearing [J]. Perception & Psychophysics, 2000, 62(2): 233-252.
    [Bernstein 2003] L. E. Bernstein. Visual speech perception [M]. In Audiovisual Speech Processing, E. Vatikiotis-Bateson, G. Bailly & P. Perrier (Eds.), 2003.
    [Blanz 1999] V. Blanz, T. Vetter. A Morphable Model for the Synthesis of 3D Faces [C]. SIGGRAPH '99 Conf. Proc., Los Angeles, USA, 1999: 187-194.
    [Blanz 2003] Volker Blanz, Curzio Basso, Tomaso Poggio, Thomas Vetter. Reanimating Faces in Images and Video [C]. Comput. Graph. Forum, 2003, 22(3): 641-650.
    [Bourne 1973] G. H. Bourne. Structure and function of muscle. In Physiology and Biochemistry [M]. Second edition, Volume III. Academic Press, New York, 1973.
    [Brand 1999] M. Brand. Voice puppetry. In Proceedings of ACM SIGGRAPH 1999. ACM Press/Addison-Wesley Publishing Co. 1999: 21-28.
    [Bregler 1997] C. Bregler, M. Covell, and M. Slaney. Video Rewrite: Driving Visual Speech with Audio[C]. Proc. SIGGRAPH 97, Los Angeles, CA, 1997: 353-360.
    [Cassell 1994] J. Cassell, C. Pelachaud, N. Badler, M. Steedman, B. Achorn, W. Becket, B. Douville, S. Prevost, and M. Stone. Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents [C]. In Proceedings of ACM SIGGRAPH, 1994: 413-420.
    [Chai 2003] J. X. Chai, J. Xiao, J. Hodgins. Vision-based Control of 3D Facial Animation [C]. Eurographics/SIGGRAPH Symposium on Computer Animation, 2003: 193-206.
    [Chernoff 1971] H. Chernoff. The use of faces to represent points in n-dimensional space graphically [R]. Technical Report Project NR-042-993, Office of Naval Research, Washington DC, December 1971.
    [Cohen 1990] M. Cohen and D. Massaro. Synthesis of Visible Speech [J]. Behavioral Research Methods and Instrumentation, 1990, 22(2): 260-263.
    [Cohen 1993] M. Cohen and D. Massaro. Modeling Coarticulation in Synthetic Visual Speech [M]. In N. M. Thalmann and D. Thalmann, editors, Models and Techniques in Computer Animation. Springer-Verlag, 1993.
    [Cohen 1994] M. Cohen and D. Massaro. Development and experimentation with synthetic visual speech [J]. Behavioral Research Methods, Instrumentation, and Computers, 1994, 26: 260-265.
    [Cohen 1996] M. M. Cohen, R. L. Walker, and D. W. Massaro. Perception of synthetic visual speech [M]. In: Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke (Eds.), New York: Springer, 1996: 153-168.
    [De Luca 1997] C. J. De Luca. The use of surface electromyography in biomechanics [J]. J Appl Biomech, 1997, 13: 135-163.
    [Ekman 1978] P. Ekman and W. V. Friesen. Manual for the Facial Action Coding System [M]. Consulting Psychologists Press, Inc., Palo Alto, CA, 1978.
    [Epstein 2002] M. Epstein, N. Hacopian, P. Ladefoged. Dissection of the Speech Production Mechanism [M]. Los Angeles: The UCLA Phonetics Laboratory, 2002: 12-15.
    [Ezzat 1998] T. Ezzat and T. Poggio. MikeTalk: A talking facial display based on morphing visemes [C]. In Proc. Computer Animation Conference, Philadelphia, USA, 1998: 456-459.
    [Ezzat 2000] T. Ezzat and T. Poggio. Visual speech synthesis by morphing visemes [J]. International Journal of Computer Vision, 2000, 38: 45-57.
    [Ezzat 2002] T. Ezzat, G. Geiger, and T. Poggio. Trainable videorealistic speech animation [J]. ACM Transactions on Graphics, 2002, 21(3): 388-398.
    [Fung 1993] Y. Fung. Biomechanics: Mechanical Properties of Living Tissues [M]. Springer-Verlag, 1993.
    [Gillenson 1974] M. L. Gillenson. The Interactive Generation of Facial Images on a CRT Using a Heuristic Strategy [D]. PhD thesis, Ohio State University, Computer Graphics Research Group, Columbus, OH, March 1974.
    [Guiard-Marigny 1994] T. Guiard-Marigny, A. Adjoudani, and C. Benoit. A 3D model of the lips for visual speech synthesis [C]. In Proc. 2nd ETRW on Speech Synthesis, New Paltz, New York, 1994: 49-52.
    [Hill 1988] D. R. Hill, A. Pearce, and B. Wyvill. Animating speech: an automated approach using speech synthesis by rules [J]. The Visual Computer, 1988, 3: 277-289.
    [Horn 1981] B. K. P. Horn and B. G. Schunck. Determining optical flow [J]. Artificial Intelligence, 1981, 17: 185-203.
    [Jia 2000] Jia Yunde (贾云得). Machine Vision [M]. Beijing: Science Press, 2000: 235-239.
    [Kalberer 2002] G. A. Kalberer, P. Mueller, and L. Van Gool. Speech animation using viseme space [C]. In Vision, Modeling, and Visualization VMV 2002. Akademische Verlagsgesellschaft Aka GmbH, Berlin, Germany, 2002: 463-470.
    [Kalra 1991] P. Kalra, A. Mangili, N. Magnenat-Thalmann, and D. Thalmann. SMILE: a multi layered facial animation system [C]. In IFIP WG, Tokyo, 1991: 189-198.
    [Kang 2003] Kang Heng (康恒), Liu Wenju (刘文举). Automatic corpus selection for a Chinese continuous speech database based on comprehensive factors [J]. Journal of Chinese Information Processing, 2003, 17(4): 27-32.
    [Kent 1977] R. D. Kent and F. D. Minifie. Coarticulation in recent speech production models [J]. Journal of Phonetics, 1977, 5: 115-135.
    [Kirby 1990] M. Kirby, L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990, 12(1): 103-108.
    [Kshirsagar 2000] S. Kshirsagar and N. Magnenat-Thalmann. Lip Synchronization Using Linear Predictive Analysis [C]. Proceedings of IEEE International Conference on Multimedia and Expo, New York, USA, 2000: 1077-1080.
    [Kuehn 1976] D. P. Kuehn, K. L. Moll. A cineradiographic study of VC and CV articulatory velocities [J]. Journal of Phonetics, 1976, 4: 303-320.
    [Lee 1995] Y. C. Lee, D. Terzopoulos, and K. Waters. Realistic modeling for facial animation [C]. In Proceedings of SIGGRAPH '95, 1995: 55-62.
    [Le Goff 1994] B. Le Goff, T. Guiard-Marigny, M. Cohen, and C. Benoit. Real-time analysis-synthesis and intelligibility of talking faces [C]. In Proc. 2nd ETRW on Speech Synthesis, New Paltz, New York, 1994: 53-56.
    [Lewis 1987] J. P. Lewis and F. I. Parke. Automatic lip-synch and speech synthesis for character animation [C]. In Proc. Graphics Interface '87 CHI+CG '87, Canadian Information Processing Society, Calgary, 1987: 143-147.
    [Li 2000] Z. Li, E. C. Tan, I. McLoughlin, T. T. Teo. Proposal of standards for intelligibility test of Chinese speech [J]. IEE Proc.-Vis. Image Signal Process., 2000, 147(3): 254-260.
    [Lin 1999] Lin Tao (林焘), Wang Lijia (王理嘉). A Course in Phonetics [M]. Beijing: Peking University Press, 1999.
    [Liu 2002] Liu Guansong (刘关松), Lu Zongqi (陆宗骐), Xu Jianguo (徐建国), et al. Stability analysis of several color models under different illumination conditions [J]. Journal of Chinese Computer Systems, 2002, 23(7): 882-885.
    [Liu 2003] Liu Wentao, Yin Baocai, Jia Xibin, Kong Dehui. A Realistic Chinese Talking Face [C]. 1st Indian International Conference on Artificial Intelligence (IICAI-03), 2003: 1244-1254.
    [Lofqvist 1990] A. Lofqvist. Speech as audible gestures [M]. In W. J. Hardcastle and A. Marchal, editors, Speech Production and Speech Modeling. Kluwer Academic Publishers, Dordrecht, 1990: 289-322.
    [Magnenat-Thalmann 1988] N. Magnenat-Thalmann, N. E. Primeau, and D. Thalmann. Abstract Muscle Action Procedures for Human Face Animation [J]. The Visual Computer, 3(5): 290-297, March 1988.
    [McGurk 1976] Harry McGurk and John MacDonald. Hearing lips and seeing voices [J]. Nature, 1976, 264: 746-748.
    [Mei 2000] Mei Li (梅丽), Bao Hujun (鲍虎军), Zheng Wenting (郑文庭), Peng Qunsheng (彭群生). Realistic face reconstruction from photographs [J]. Chinese Journal of Computers, 2000, 23(9): 996-1002.
    [Mei 2001] Mei Li (梅丽), Bao Hujun (鲍虎军), Peng Qunsheng (彭群生). Fast customization of individual faces and muscle-driven expression animation [J]. Journal of Computer-Aided Design & Computer Graphics, 2001, 13(12): 1077-1082.
    [Morishima 1993] S. Morishima and H. Harashima. Facial animation synthesis for human-machine communication system [C]. In Proc. 5th International Conf. on Human-Computer Interaction, ACM, New York, Volume II, 1993: 1085-1090.
    [Moubaraki 1996] L. Moubaraki, J. Ohya. Realistic 3D Mouth Animation Using a Minimal Number of Parameters [C]. IEEE International Workshop on Robot and Human Communication, Tsukuba, Japan, 1996: 201-206.
    [Ohta 1990] OHTA Naoya. Optical flow detection by color images[J]. NEC Research and Development, 1990,97: 78-84.
    [Ohta 1996] OHTA Naoya. Uncertainty models of the gradient constraint for optical flow computation[J]. IEICE Transactions on Information and Systems, 1996, E79-D(7): 958-964.
    [Ostry 1985] D. J. Ostry, K. G. Munhall. Control of rate and duration of speech movements[J]. Journal of the Acoustical Society of America, 1985, 77: 640-648.
    [Pandzic 1999] Pandzic, I.S., Ostermann, J. and Millen, D. User evaluation: Synthetic talking faces for interactive services[J]. The Visual Computer, 1999, 15: 330-340.
    [Papamichalis 1987] P. E. Papamichalis. Practical approaches to speech coding[M]. Prentice Hall, Englewood Cliffs. NJ, 1987.
    [Parke 1972] F. I. Parke. Computer generated animation of faces[D]. Master's thesis, University of Utah, Salt Lake City, UT, June 1972. UTEC-CSc-72-120.
    [Parke 1974] F. I. Parke. A Parametric Model for Human Faces [D]. PhD thesis, University of Utah, Salt Lake City, UT, December 1974, UTEC-CSc-75-047.
    [Parke 1975] F.I.Parke. A model for human faces that allows speech synchronized animation[J]. Journal of Computers and Graphics, 1975, 1(1):1-4.
    [Parke 1982] F. I. Parke. Parameterized models for facial animation [J]. IEEE Computer Graphics, 1982, 2(9): 61-68.
    [Parke 1990] F. I. Parke, editor. State of the Art in Facial Animation[C], SIGGRAPH '90, Course Notes #26. ACM, New York, August 1990.
    [Parke 1991] F. I. Parke. Control Parameterization for facial animation[M]. Computer Animation, Tokyo: Springer-Verlag, 1991: 3-13.
    [Parke 1996] F. I. Parke, K. Waters. Computer Facial Animation [M]. Wellesley, MA: A. K. Peters, 1996: 1-365.
    [Pearce 1986] A. Pearce, B. Wyvill, G. Wyvill, and D. Hill. Speech and expression: A computer solution to face animation[C]. In Proc. Graphics Interface'86, Canadian Information Processing Society, Calgary, 1986: 136-140.
    [Pelachaud 1991] C. Pelachaud. Communication and Coarticulation in Facial Animation [D]. PhD thesis, University of Pennsylvania, Philadelphia, October 1991. Technical Report MS-CIS-91-77.
    [Pelachaud 1996] C. Pelachaud, N. Badler, and M. Steedman. Generating Facial Expressions for Speech [J]. Cognitive Science, 1996, 20(1): 1-46.
    [Pi 1987] Pi Xin (皮昕) et al. Oral Anatomy and Physiology [M]. Beijing: People's Medical Publishing House, 1987: 111-115.
    [Platt 1980] S. M. Platt. A system for computer simulation of the human face [D]. Master's thesis, The Moore School, University of Pennsylvania, Philadelphia, 1980.
    [Platt 1981] S. M. Platt and N. I. Badler. Animating facial expressions [J]. Computer Graphics, 1981, 15(3): 245-252.
    [Quackenbush 1988] S. R. Quackenbush, T. P. Barnwell III, M. A. Clements. Objective measures of speech quality [M]. Prentice Hall, Englewood Cliffs, 1988.
    [Shan 2002] Shan Wei (单卫), Yao Hongxun (姚鸿勋), Gao Wen (高文). Classification of mouth-shape sequences in lipreading [J]. Journal of Chinese Information Processing, 2002, 16(1): 31-36.
    [Sirovich 1987] L. Sirovich, M. Kirby. Low-dimensional procedure for the characterization of human faces [J]. J. Opt. Soc. Am., 1987, 4: 519-524.
    [Song 2003] Mingli Song, Chun Chen, Jiajun Bu, and Ronghua Liang. 3D Realistic Talking Face Co-driven by Text and Speech [C]. IEEE International Conference on Systems, Man and Cybernetics, Washington, D.C., USA, 2003: 2175-2186.
    [Steeneken 1992] H. J. M. Steeneken. Quality evaluation of speech processing systems [M]. In A. N. Ince (Ed.): Digital Speech Processing: Speech Coding, Synthesis and Recognition. Kluwer Academic Publishers, 1992.
    [Terzopoulos 1990] D. Terzopoulos, K. Waters. Physically-based Facial Modeling, Analysis, and Animation [J]. Journal of Visualization and Computer Animation, 1990, 1(4): 73-80.
    [Terzopoulos 1991] D. Terzopoulos, K. Waters. Techniques for Realistic Facial Modeling and Animation [C]. In Proceedings of Computer Animation, Geneva, Switzerland, Springer-Verlag, Tokyo, 1991: 59-74.
    [Terzopoulos 1993] D. Terzopoulos and K. Waters. Analysis and Synthesis of Facial Image Sequences Using Physical and Anatomical Models [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993, 15(6): 569-579.
    [Vatikiotis-Bateson 1996] E. Vatikiotis-Bateson, K. G. Munhall, M. Hirayama, Y. C. Lee, and D. Terzopoulos. The dynamics of audiovisual behavior in speech [M]. In: Speechreading by Humans and Machines (NATO-ASI Series F, 150), D. Stork and M. Hennecke (Eds.), Berlin: Springer-Verlag, 1996: 221-232.
    [Voiers 1983] W. D. Voiers. Evaluating processed speech using the diagnostic rhyme test [J]. Speech Technology, 1983: 30-39.
    [Wang 2000] Wang Zhiming (王志明), Cai Lianhong (蔡莲红). A study of the relationship between Chinese syllables and mouth shapes [C]. 9th National Conference on Multimedia Technology (NCMT'2000), Beijing, Dec. 2000.
    [Wang 2005] Wang Zhiming (王志明), Cai Lianhong (蔡莲红), Ai Haizhou (艾海舟). Text-To-Visual Speech in Chinese Based on Data-Driven Approach [J]. Journal of Software, 2005, 16(6): 1054-1063.
    [Waters 1987] K. Waters. A Muscle Model for Animating Three-Dimensional Facial Expression [J]. Computer Graphics (SIGGRAPH '87), 1987, 21(4): 17-24.
    [Waters 1993] K. Waters, T. M. Levergood. DECface: An automatic lip-synchronization algorithm for synthetic faces [R]. DEC Cambridge Research Laboratory, 1993.
    [Web 2007]http://www.dynastat.com/
    [Web 2008]http://hwr.nici.kun.nl/~miami/taxonomy/node120.html
    [Williams 1990] L. Williams. Performance-Driven Facial Animation [J]. Computer Graphics (ACM SIGGRAPH '90), 1990, 24(4): 235-242.
    [Wohlert 2000] A. B. Wohlert, V. L. Hammen. Lip muscle activity related to speech rate and loudness [J]. Journal of Speech, Language, and Hearing Research, 2000, 43: 1229-1239.
    [Wu 1994] Y. Wu, N. Magnenat-Thalmann, and D. Thalmann. A Plastic-Visco-Elastic Model for Wrinkles in Facial Animation and Skin Aging [C]. In Proc. Pacific Graphics '94, 1994: 201-214.
    [Wyvill 1988] B. Wyvill, D. R. Hill, and A. Pearce. Animating speech: An automated approach using speech synthesized by rules [J]. The Visual Computer, 1988, 3(5): 277-289.
    [Xu 1980] Xu Shirong (徐世荣). Fundamentals of Mandarin Phonetics [M]. Beijing: Wenzi Gaige Press, 1980.
    [Xu 2004] Xu Chenghua (徐成华), Wang Yunhong (王蕴红), Tan Tieniu (谭铁牛). 3D face modeling and its applications [J]. Journal of Image and Graphics, 2004, 9(8): 893-903.
    [Yan 1998] Yan Jie (晏洁). A text-driven lip movement synthesis system [J]. Computer Engineering and Design, 1998, 19(1): 31-34.
    [Yan 1999a] Yan Jie (晏洁). Research and practice on realistic 3D face synthesis methods [D]. Harbin: Harbin Institute of Technology, 1999.
    [Yin 1997] B. C. Yin, W. Gao. Radial Basis Function Interpolation on Space Mesh [C]. Virtual Proceedings of ACM SIGGRAPH 97, 1997: 150.
    [Zhang 2004] Y. Zhang, E. C. Prakash, E. Sung. A new physical model with multilayer architecture for facial expression animation using dynamic adaptive mesh [J]. IEEE Trans. on Visualization and Computer Graphics, 2004, 10(3): 339-352.
