Research and System Implementation of Music-Driven Dance Editing Technology
Abstract
The strength of computer-aided animation is that design software can take over much of the most demanding animation work, especially for exaggerated human characters whose motions require careful handling of collisions, conflicts, and sound synchronization. In the real world, sound is generally tied to specific motion events, and the two are naturally and closely linked. Producing animation with real impact and expressive power therefore requires synchronizing audio with motion. Judging from the current state of research, this remains an exceptionally important yet complex topic in animation design, and one whose foundations are still weak.
With the emergence of the Digital Human concept and the steady maturing of audio analysis theory, the demand for music-driven audio/video synchronization systems has been raised ever more urgently on the agenda of digital art design. Ethnic dances with distinctive character, represented by the Bianzhong (chime-bells) dance, inject brand-new concepts and material into the frontier technology of music-driven motion editing. Combining the two not only provides a practical and artistically valuable platform for realizing music-driven motion editing, but is also an important part of the digital reconstruction of the Bianzhong dance through virtual reality technology, promising creative contributions both to digital art design and to the innovation of ethnic dance.
The main goal of the system is to combine the analysis of MIDI and audio signals into high-level audio information for motion selection and editing, thereby simplifying and automating the generation of motion sequences synchronized to a specific piece of music. This unified framework lets character animators try out different combinations of music/sound analysis tools with a uniform motion editing pipeline, without needing a deep understanding of either audio analysis or motion editing techniques.
The thesis first surveys music-driven theory and its technical characteristics, describing each stage of audio analysis and its concrete realization, which lays the theoretical groundwork for analyzing MIDI source files. Building on an analysis of audio analysis principles and application examples, and targeting the MIDI-format music the system uses, it proposes a three-layer audio analysis framework consisting of MIDI analysis, audio analysis, and musical-section analysis, with a music/motion script file as the final analysis result. The thesis then details the characteristics of the Bianzhong dance and its digitization method, and, based on a grammatical analysis of the dance, introduces the concept of motion relatedness (Motion-Relation). A Motion-Relation module evaluates the transitions between different motions to find the best match. After fully describing the pipelines and intermediate results for audio (MIDI music) and video (Bianzhong dance motions authored in Poser), the thesis turns to how the two are synchronized. It introduces an emotion model based on Hevner's adjective circle to map the audio stream onto the video, and improves the existing algorithm to achieve synchronization between them. Finally, applying these techniques, the thesis designs a prototype music-driven dance editing system for selecting and arranging characteristic dances from Bianzhong music, and briefly describes its architecture, modules, and design flow.
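To make the three-layer framework above concrete, the following minimal Python sketch shows one way such an analysis pass could be organized; the event fields, the thresholds, and the script format are illustrative assumptions rather than the thesis's actual implementation.

    # Minimal sketch of the three-layer analysis over MIDI-style note events.
    # Event fields, thresholds, and the script format are assumptions.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Note:
        onset: float     # note start time in seconds
        duration: float  # note length in seconds
        pitch: int       # MIDI note number (0-127)
        velocity: int    # MIDI velocity (0-127)

    def midi_layer(notes: List[Note]) -> float:
        """Layer 1 (MIDI analysis): estimate the beat period from the
        median inter-onset interval of the note stream."""
        onsets = sorted(n.onset for n in notes)
        iois = sorted(b - a for a, b in zip(onsets, onsets[1:]) if b - a > 1e-3)
        return iois[len(iois) // 2] if iois else 0.5

    def audio_layer(notes: List[Note], beat: float) -> List[float]:
        """Layer 2 (audio analysis): mean velocity, a stand-in for loudness,
        computed over each beat-length window."""
        if not notes:
            return []
        end = max(n.onset + n.duration for n in notes)
        frames, t = [], 0.0
        while t < end:
            win = [n.velocity for n in notes if t <= n.onset < t + beat]
            frames.append(sum(win) / len(win) if win else 0.0)
            t += beat
        return frames

    def section_layer(frames: List[float], jump: float = 25.0) -> List[int]:
        """Layer 3 (section analysis): start a new musical section wherever
        the loudness profile jumps sharply."""
        return [0] + [i for i in range(1, len(frames))
                      if abs(frames[i] - frames[i - 1]) > jump]

    def write_script(notes: List[Note]) -> List[dict]:
        """Combine the three layers into a music/motion script:
        one entry per detected section."""
        beat = midi_layer(notes)
        frames = audio_layer(notes, beat)
        bounds = section_layer(frames) + [len(frames)]
        return [{"start_beat": a, "end_beat": b,
                 "tempo_bpm": round(60.0 / beat),
                 "mean_intensity": sum(frames[a:b]) / max(b - a, 1)}
                for a, b in zip(bounds, bounds[1:])]

A real implementation would replace the median-interval heuristic with a proper beat-induction algorithm and would segment on harmonic and melodic cues as well as loudness; the point of the sketch is only the layered hand-off from raw events to a section-level script.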
Creating animation, with or without a computer, has always been a very time-consuming process. The advent of computer-aided animation has allowed computers to take over a great deal of this work, especially for tasks such as animating human characters, handling collisions, and synchronizing sound. In the real world, sounds are generally associated with motion events, so there is an intimate link between the two. Producing effective animation therefore requires synchronizing sound and motion, which remains an essential yet difficult task in animation.
With the emergence of the Digital Human concept and the maturing of audio analysis theory, the need for music-driven audio/motion synchronization systems has been raised ever more urgently within digital art design research. Ethnic dances such as the Bianzhong dance bring brand-new ideas and material to music-driven motion editing. Combining the two not only gives music-driven motion editing a practical and artistically valuable platform, but also forms an important part of the digital reconstruction of the Bianzhong dance through virtual reality technology, contributing both to digital art design and to the innovation of ethnic dance.
The goal of this work is to combine the power of advanced music analysis, performed on both MIDI and audio signals, with motion editing, in order to simplify and automate the generation of synchronized musical animations. This unified framework will enable animators to try out different combinations of music/sound analysis techniques with familiar motion editing methods, without having to acquire a deep understanding of either.
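Read as a software architecture, the "unified framework" amounts to a plug-in composition of analysis and editing stages; the tiny sketch below illustrates the idea (all names here are hypothetical, not APIs from the thesis).

    # Sketch of the plug-in idea: any music analyser that yields segment
    # descriptors can be paired with any motion-selection routine, so an
    # animator can swap either side independently. Names are hypothetical.
    from typing import Callable, Dict, Iterable, List

    MusicAnalyzer = Callable[[bytes], Iterable[Dict]]  # raw MIDI/audio -> segments
    MotionEditor = Callable[[Dict], str]               # one segment -> a clip name

    def drive(analyze: MusicAnalyzer, edit: MotionEditor, data: bytes) -> List[str]:
        """Run one analyser/editor pairing over a piece of music."""
        return [edit(segment) for segment in analyze(data)]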
The thesis first summarizes the characteristics of music-driven technology, describes each phase of audio analysis and its concrete process, and lays the foundation for implementing audio analysis on MIDI sources. After examining audio analysis techniques and their applications, and targeting the MIDI format, it proposes a three-layer audio analysis framework based on MIDI analysis, audio analysis, and musical-section analysis, producing a music/motion script file as the final result. It then describes the characteristics of the Bianzhong dance and its digitization method, and, building on the grammar of the Bianzhong dance, puts forward the concept of Motion-Relation. A Motion-Relation module evaluates the transitions between different motions and derives the optimal matching. After describing the pipelines and intermediate results of audio analysis (MIDI) and Motion-Relation (based on Poser), the thesis turns to how the audio and video streams are combined and synchronized. It introduces a mechanism that maps the audio stream to the video stream via Hevner's adjective circle, and improves the existing algorithm to synchronize the two. Finally, applying the techniques above, the thesis builds a prototype music-driven dance editing system that selects and arranges dance motions according to Bianzhong music, and briefly introduces the system's architecture, modules, and design flow.
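As a concrete reading of the Motion-Relation score and the Hevner-circle mapping described above, here is a minimal sketch that reuses the segment dictionaries from the earlier analysis sketch; the joint-angle metric, the tempo/intensity mood rule, and the group labels are illustrative assumptions, not the thesis's actual algorithm.

    # Sketch: choose the next dance clip for a music segment by combining an
    # emotion match (one of Hevner's eight adjective groups) with a
    # Motion-Relation transition score. Pose format, mood rule, and weights
    # are assumptions for illustration only.
    from dataclasses import dataclass
    from typing import Dict, List

    HEVNER_GROUPS = ["solemn", "sad", "dreamy", "serene",
                     "graceful", "happy", "exciting", "vigorous"]

    @dataclass
    class Clip:
        name: str
        mood: str                # one of HEVNER_GROUPS, tagged by the choreographer
        start_pose: List[float]  # joint angles at the first frame
        end_pose: List[float]    # joint angles at the last frame

    def segment_mood(tempo_bpm: float, intensity: float) -> str:
        """Crude, assumed mapping from tempo and loudness to a Hevner group."""
        if tempo_bpm > 120:
            return "vigorous" if intensity > 80 else "happy"
        if tempo_bpm > 80:
            return "graceful" if intensity > 60 else "serene"
        return "sad" if intensity < 40 else "solemn"

    def motion_relation(prev: Clip, nxt: Clip) -> float:
        """Transition cost: mean squared joint-angle gap between the end of
        one clip and the start of the next (lower is smoother)."""
        gaps = [(a - b) ** 2 for a, b in zip(prev.end_pose, nxt.start_pose)]
        return sum(gaps) / max(len(gaps), 1)

    def pick_next(prev: Clip, library: List[Clip], segment: Dict) -> Clip:
        """Prefer clips whose mood matches the segment; among those, pick the
        smoothest transition from the previous clip."""
        mood = segment_mood(segment["tempo_bpm"], segment["mean_intensity"])
        candidates = [c for c in library if c.mood == mood] or library
        return min(candidates, key=lambda c: motion_relation(prev, c))

Iterating pick_next over the sections of the music/motion script yields a dance sequence whose clips both fit the mood of each passage and chain together with small pose discontinuities, which is the essence of the synchronization scheme the abstract describes.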
