Research and Practice of SIMD Compilation Optimization Techniques in a Media Processor Compiler
Abstract
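A compiler is an indispensable part of any computer system. It translates programs written in a high-level language into assembly language, which is then converted into executable target machine code. As languages and target architectures evolve, compiler design faces growing challenges: a compiler must not only accommodate new high-level programming languages but also support new hardware features of the target machine. Media processors are widely used in digital signal processing and multimedia processing. To improve multimedia processing capability, a common hardware feature of media processors is the introduction of SIMD instructions, which make fast processing of media data possible. However, current compilation techniques do not support SIMD instructions well, so this thesis studies several key techniques in compilers for media processors.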
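     At present, SIMD instructions are mainly used by programmers who write assembly code directly, with only limited support from the compiler. This is time-consuming and laborious, yields low code reuse, and seriously hampers software development efficiency. To make full use of SIMD instructions, the compiler needs to generate the SIMD instructions of the media processor automatically from high-level language code (referred to as SIMD compilation optimization).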
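As a rough illustration of what SIMD compilation optimization is meant to achieve, the sketch below contrasts an element-by-element addition with a packed addition that handles four 8-bit lanes held in one 32-bit word. The lane width, the helper names (`pack4`, `unpack4`, `packed_add_u8`), and the bit-masking emulation are assumptions chosen for illustration. They are not MDS instructions, whose exact semantics are not given here.

```python
# Illustrative only: emulates a wraparound packed add of four 8-bit lanes
# inside a 32-bit word. All names here are hypothetical.

MASK_LOW7 = 0x7F7F7F7F  # low 7 bits of every 8-bit lane
MASK_TOP = 0x80808080   # top bit of every 8-bit lane


def scalar_add_u8(a, b):
    """Reference version: add one pair of bytes at a time, wrapping at 256."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def pack4(lanes):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def packed_add_u8(wa, wb):
    """Add all four lanes in one step without carries crossing lane borders."""
    # Adding only the low 7 bits keeps each lane's carry inside that lane.
    low7 = (wa & MASK_LOW7) + (wb & MASK_LOW7)
    # The top bit of each lane is a ^ b ^ carry-in, and the carry-in is
    # already sitting in the top bit of low7.
    return (low7 ^ ((wa ^ wb) & MASK_TOP)) & 0xFFFFFFFF
```

For example, `unpack4(packed_add_u8(pack4([1, 200, 255, 16]), pack4([2, 100, 1, 240])))` gives the same lanes as `scalar_add_u8` on the two lists. A compiler performing this optimization would aim to emit the single packed operation in place of the four scalar ones.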
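     MD32 is a 32-bit media processor developed in-house by the Institute of Information and Communication Engineering of Zhejiang University. It combines features of RISC and DSP, mainly targets media processing applications, and provides a set of SIMD instructions for multimedia operations based on MMX technology, called MDS instructions.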
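     This thesis first analyzes why traditional compilers cannot generate SIMD instructions, from two perspectives: the characteristics of the SIMD instructions in the MD32 media processor, and code generation in compilers. Drawing on the characteristics of code generation based on tree pattern matching, it then designs an improved instruction selector that supports SIMD instruction generation.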
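     MD32 is still evolving, and its capabilities keep growing. To accommodate such changes in the target machine, SIMD support needs to be implemented in a retargetable compiler. This thesis proposes a SIMD compilation optimization method based on merging intermediate representation trees. It improves code quality while preserving the retargetability of the original compiler, so that the compiler can adapt quickly to changes in the target machine, such as changes to its instruction set. In addition, the improved compiler can easily be retargeted to other media processors.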
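One way to picture merging intermediate representation trees is sketched below. It is a toy under stated assumptions, not the algorithm of the thesis: the tuple-based tree encoding, the node names (`STORE`, `LOAD`, `ADD`, `ADDR`, `PACK`), the element-unit offsets, and the conservative dependence check are all hypothetical. The idea shown is that consecutive statement trees that differ only by a fixed step in their address offsets are folded into one packed tree, which a tree-pattern-matching instruction selector could then cover with a SIMD rule.

```python
# Hypothetical IR: a tree is a tuple (op, child, ...); an address leaf is
# ("ADDR", base_name, element_offset). None of this comes from the thesis.


def shift(tree, k):
    """Return a copy of `tree` with every ADDR offset increased by k."""
    if not isinstance(tree, tuple):
        return tree  # plain leaf value such as a constant
    if tree[0] == "ADDR":
        return ("ADDR", tree[1], tree[2] + k)
    return (tree[0],) + tuple(shift(child, k) for child in tree[1:])


def addresses(tree, op):
    """Collect (base, offset) pairs addressed directly by `op` nodes."""
    if not isinstance(tree, tuple) or tree[0] == "ADDR":
        return set()
    found = set()
    if tree[0] == op and isinstance(tree[1], tuple) and tree[1][0] == "ADDR":
        found.add((tree[1][1], tree[1][2]))
    for child in tree[1:]:
        found |= addresses(child, op)
    return found


def merge_trees(stmts, width):
    """Fold runs of `width` isomorphic, independent trees into PACK nodes."""
    out, i = [], 0
    while i < len(stmts):
        group = stmts[i:i + width]
        # Tree j must equal the first tree with all offsets moved by j.
        isomorphic = len(group) == width and all(
            group[j] == shift(group[0], j) for j in range(1, width))
        # Conservative check: nothing stored in the group is also loaded in it.
        stores = set().union(*(addresses(t, "STORE") for t in group))
        loads = set().union(*(addresses(t, "LOAD") for t in group))
        if isomorphic and not (stores & loads):
            out.append(("PACK", width, group[0]))
            i += width
        else:
            out.append(stmts[i])
            i += 1
    return out
```

Because the merge in this sketch happens on the machine-independent trees, only the rule that matches `PACK` nodes would need to change for a different target, which is in the spirit of keeping the compiler retargetable.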
A compiler is an important part of a computer system. It translates source code written in a high-level programming language into assembly code or machine code. With the development of high-level languages and processor architectures, compiler design is becoming more complex and difficult. Media processors are widely used in digital signal processing and multimedia processing. To meet the special needs of media processing, they share a common characteristic in hardware design and implementation: SIMD instructions. Several compilers have been developed to generate SIMD instructions, but their support for SIMD instructions still falls short of expectations. This thesis investigates compilation techniques for media processors with SIMD instructions.
     Today many multimedia applications are written by hand in assembly code. Although this can take full advantage of the processor's capabilities, including SIMD, it leads to poor readability, portability problems, and high software development and maintenance costs. To utilize SIMD instructions fully, the compiler needs to translate high-level language code into the SIMD instructions of media processors automatically. This is called SIMD compilation optimization.
     MD32 is a 32-bit media processor designed by the Institute of Information and Communication Engineering of Zhejiang University, mainly targeting media processing. It has characteristics of both RISC and DSP and provides SIMD instructions, called the MDS instruction set, which are based on MMX technology.
     Traditional compilers cannot generate SIMD instructions well because of the characteristics of SIMD instructions and of code generation in compilers. An improved instruction selection approach has been designed and implemented in this thesis to support SIMD instruction generation.
     MD32 is still under active development, and its media processing capability keeps improving. To adapt to changes in the target machine, a retargetable compiler that supports SIMD instructions is needed. This thesis proposes a SIMD compilation optimization approach based on the intermediate representation. The approach improves code performance effectively while preserving the retargetability of the original compiler and adapting to changes in the instruction set of the target machine.
