SIMD编译优化技术研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
多媒体应用是近年来计算机领域的研究热点。多媒体应用的代码往往具有较高的并行度。为了获得更高的性能,几乎所有的处理器厂商都为其处理器增加了多媒体扩展,以充分利用处理器的计算能力,并提供了具有单指令多数据特点的指令集,简称为SIMD指令。SIMD指令能够对一组数据执行向量形式的运算,每组数据被划分为几个子字,并对所有的子字都执行相同的操作。
     SIMD指令能够带来较高的执行效果、较低的功耗和较好的资源利用率。充分利用SIMD指令可以有效提高应用程序的性能。但是目前编译器对SIMD指令的支持并没有达到足够令人满意的程度。程序员往往需要通过手写汇编代码、内嵌汇编、或者通过使用编译器认识的内部函数等手段在代码中显示的使用SIMD指令。这要求程序员对SIMD指令集有深入了解,提高了多媒体程序的开发难度。此外,由于不同处理器的SIMD指令集之间差异较大,从而使代码的可移植性降低。因此我们希望编译器能够自动的从高级语言生成SIMD指令。称为SIMD编译优化。
     这种优化和传统的针对向量处理器的自动向量化非常类似,因此又称为SIMD向量化。但体系结构和主要应用领域的不同使多媒体扩展和向量处理器之间存在较人差异。两者在向量长度、指令集特点以及存储操作等方面存在着关键区别。妨碍SIMD编译优化的主要有以下几个问题:多媒体程序中复杂的代码形式影响了向量化效果,各种多媒体典型操作变化较多不易识别,存储操作功能较弱制约了优化性能的提升。
     本文提出了一种面向由C语言编写的多媒体应用程序、与目标平台无关的SIMD向量化方法,是对原有的向量化方法的扩展与改进,解决了现有技术中含有指针形式的C程序难以向量化、多媒体典型操作无法有效识别、SIMD寄存器数据利用率不高等问题,能够有效生成SIMD指令。该方法主要在语法树中间表示上进行,不改变编译后端的结构,具有较好的可移植性和可扩展性。
     该方法对传统向量化算法的主要改进之处在于:对多媒体C代码广泛存在的指针访问进行了分析处理;采用模式匹配方法有效识别复杂的多媒体典型操作;采用循环分布、循环交换等方法提高了循环中潜在的并行性;本文还提出了一种SIMD寄存器分配方法减少冗余存储操作,降低因体系结构限制而产生的数据存储操作开销。
The past decade has witnessed multimedia processing become one of the most important computing workloads. To respond to the ever-growing performance demand of multimedia workloads, multimedia extensions (MME) have been added to most existing processors. The multimedia extensions of most processors exploit single instruction multiple data (SIMD) instruction, which provides a form of vectorization where a large machine word is viewed as a vector of subwords and the same operation is performed on all subwords in parallel.
     SIMD instructions offer higher performance, lower energy and better resource utilization. Systematic usage of SIMD instructions can significantly improve program performance. However, compilers still do not have good support for SIMD instructions, and often the code has to be written manually in assembly language or using compiler known functions (CKFs), which will lead to poor readability, portability problem and high cost of software development and maintainance. In order to ultilize SIMD instruction fully, we need compiler to translate the high-level languages to SIMD instructions of media processors automatically. This is called SIMD compilation optimization.
     Because of the similarity between multimedia extensions and vector processors, one may naturally consider applying traditional vectorization techniques to multimedia applications. However, satisfactory result are yet to be obtained for the vectorization of realistic multimedia programs on MME. The gap between MME vectorization and traditional vectorization is the natural result of both the architectural differences multimedia extensions and traditional vector processors and the differences between multimedia applications and numerical applications. The obstacles of SIMD optimization lie in the following aspects: Complex code form in source code of multimedia applications, the multimedia typical operations and its variations which can not be recognized easily, and the constraints in memory operations.
     In this thesis, the author carry on a research to develop efficient SIMD optimization techniques, and present a SIMD compilation optimization approach that is integrated into LCC compiler. The approach can efficiently generate SIMD instructions, as well as to cope with the use of pointer in C code, and identify the typical oprations in the multimedia application. Becides, The author designed and implemented an approach of SIMD register allocation, improved the performance of register allocation effectively.
引文
[1] Ren G, Wu P, Padua D, "A preliminary study on the vectorization of multimedia applications", 16th International Workshop of languages and Compilers for Parallel Comuting. 2003
    
    [2] Lappalainen V, Hamalainen T D, Liuha P, "Overview of research efforts on media ISA extensions and their usage in video coding", IEEE Transactions on Circuits and Systems for Video Technology, 2002,12(8): 660-670.
    
    [3] Lappalainen V, Liuha P, Hamalainen T D, "Current research efforts in media ISA development", 4~(th) Workshop on Media and Stream Processors. 2002
    
    [4] John L. Hennessy, David A.Patterson, "Computer architecture, a quantitative approach(3~(rd) edition)", Elsevier Science, 2003.
    
    [5] M. Ferretti, "Multi-media extensions in super-pipelined micro-architectures. A new case for SIMD processing? ", Proceedings of the Fifth IEEE International Workshop on Computer Architectures for Machine Perception (CAMP'00). IEEE Computer Society, 249,2000.
    [6] Dorit Nuzman , Richard Henderson, "Multi-platform auto-vectorization" , Code Generation and Optimization, Proceedings of the International Symposium on Code Generation and Optimization . Pages 281-294, 2006.
    [7] Thakkar S T, Huff T, "Internet Streaming SIMD Extensions", Computer, 1999,32(12):26-34.
    [8] Inter Corporation, "IA32 Intel Architecture Software Developer's Manual", Intel Press.2004
    
    [9] Intel Corporation, "Intel SSE4 Programming Reference", Intel Press, 2007
    [10] Oberman S, Favor G, Weber F, "AMD 3DNow! technology: Architecture and implementations", IEEE Micro, 1999,19(2):37-48
    
    [11] Kokn L, Maturana G, Tremblay M, Prabhu A, Zyner G, "The Visual Instruction Set (VIS) in UltraSPARC", IEEE micro, August, 1996
    
    [12] Diefendorff K, Dubey P K, Hochsprung R, Scales H, "AltiVec extension to PowerPC accelerates media processing", IEEE Micro,2000,20(2):85-95
    [13] http://gcc.gnu.org
    
    [14] Dick Grune, Henri E. Bal, Ceriel J.H. Jacobs, Koen G Langendoen, "Modern compiler design", John Wiley & Sons, 2000.
    [15] John R. Levine, Tony Mason, Doug Brown. "Lex & YACC(2~(nd) edition)", O'Reilly, 2003.
    
    [16] Chiristopher W. Fraser, David R. Hanson, "Engineering a simple, efficient code generator generator", ACM Letters on Programming Languages and Systems, 1(3), p213-226, September 1992.
    [17] Rainer Leupers, Peter Marwedel, "Retargetable compiler technology for embedded systems:tools and applications", Kluwer Academic Publishers, October 2001
    [18] Ivan Pryanishnikov, Andress Krall, Nigel Horspool, "Compiler optimizations for processors with SIMD instructions", Software: practice and Experience. Volume 37, Issue 1, Pages 93-113,2006
    [19] Manuel Hohenauer, Christoph Schumacher, Rainer Leupers, et al, "Retargetable Code Optimization with SIMD Instructions", Hardware/software codesign and system synthesis,2006. CODES+ISSS '06. Proceedings of the 4th international conference.
    [20] P. Wu, A.Eichenberger, K. O'Brien: Efficient SIMD Code Generation for Runtime Alignment and Length Conversion, Proc. of the International Symposium on Code Generation and Optimization, 2005
    [21] D.Nuzman, I. Rosen, A.Zaks: Auto-Vectorization of Interleaved Data for SIMD, IBM Research Report No. H-0235, 2005
    [22] A.Kudriavtsev, P.Kogge: Generation of Permutations for SIMD Processors. Proc. Languages,Compilers and Tools for Embedded Systems, 2005
    [23] Rainer Leupers, "Code selection for media processors with SIMD instructions", Design,Automation and Test in Europe Conference and Exhibition 2000, Proceedings, p4-8, 2000
    [24] Franz Franchetti, Markus Puschel, "A SIMD vectorizing compiler for digital signal processing algorithms", Proceedings of the 16th International Symposium on Parallel and Distributed Processing. IEEE Computer Society, 20.2, 2002.
    [25] Zhou J, Ross K A. Implement database operations using SIMD instructions[C]. Proceedings of the 2002 ACM SIGMOD international conference on Management of data. ACM Press,2002 145-156.
    
    [26] Randy Allen, Ken Kennedy, "Optimizing compiler for modern architectures, a dependence-based approach", Elsevier Science, 2001
    
    [27] Kuck D L, "Structure of Computers and Computations", John Wiley & Sons, Inc, 1978
    
    [28] Allen Randy, Ken Kennedy, "Automatic translation of Fortran programs to vector form",ACM Transactions on Programming Languages and Systems, 9(4):491-542, 1987
    
    [29] Banerjee U K, "Dependence Analysis for Supercomputing", Kluwer Academic Publishers,Norwell, MA,USA,1988
    
    [30] Pugh W, "The omega test: a fast and practical integer programming algorithm for dependence analysis", Proceeding of the 1991 ACM/IEEE conference on Supercomputing.ACM Press, New York, USA,1991
    
    [31] Krall A, Lelait S, "Compilation techniques for multimedia processors", International Journal of Parallel Programming, 2000, 28(4): 347-361
    
    [32] Free Software Foundation, "Auto-Vectorization in GCC", http://gcc.gnu.org/projects/treessa/vectorization.html.
    [33]Cheong G,Lam M S,"An optimizer for multimedia instruction sets",Proceedings of 2nd SUIF Compiler Workshop,Stanford University,2002
    [34]Juurlink B,Vassiliadis S,"Implementation and evaluation of the complex streamed instruction set",Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques,IEEE Computer Society,2001
    [35]Talla D,John L K,Burger D,"Bottlenecks in multimedia processing with SIMD style extensions and architectural enhancements",IEEE Trans.Comput,2003,52(8):1015-1031
    [36]Slingerland N,Smith A J,"Measuring the performance of multimedia instruction sets",IEEE transactions on computers,2002,51(11):1317-1332
    [37]Weinhardt M,Luk W,"Pipeline vectodzation",The 7~(th) Annual IEEE Symposium on Field-Programmable Custom Computing Machines FCCM'99,1999
    [38]Larsen S,Amarasinghe S.Exploiting superword level parallelism with multimedia instruction sets[J].ACM SIGPLAN Notices,2000,35(5):145-156
    [39]姜伟华,“针对实际多媒体程序和多媒体扩展指令集的SIMD编译优化”,复旦大学,2006年5月
    [40]Bulic P,Gustin V,"Data dependence analysis for intra-register vectorization",Second International Symposium on Parallel and Distributed Computing,2003,50-56
    [41]Keith Cooper A D,Kennedy K.Vizer:A system to vectorize intel x86 binaries[C].Proceedings of the Third Annukal Symposium of the Los Alamos Computer Science Institute.2002
    [42]Bulic P,Gusin V,"An extended ANSI C for processors with a multimedia extension",Int.J.Parallel Program,2003,31(2):107-136
    [43]Gustin V,Bulic P,"Extracting SIMD parallelism from 'for' loops",Proceedings of the 2001International Conference on Parallel Processing Workshops.IEEE Computer Society,2001
    [44]Franchetti F,Puschel M,"A SIMD vectorizing compiler for digital signal processing algorithms",Proceedings of the 16~(th) International Symposium on Parallel and Distributed Processing.IEEE Computer Society,2002
    [45]Alfred V.Aho,Ravi Sethi,Jeffrey D.Ullman,"Compilers:Principles,Techniques,and Tools",Addison Wesley,1986.
    [46]Steven S.Muchnick,"Advanced Compiler Design and Implementation",Morgan Kaufmann,1997.
    [47]Dinesh Rathi;Nitin Rajput.Retargetable Code Generation,1999
    [48]Wen-Tsong Shiue.Energy-Efficient Backend Compiler Design for Embedded Systems.Electrical and Electronic Technology,2001.
    [49]Leupers R,Schenk W,Marwedel P.Retargetable Assembly Code Generation by Booststrapping.High-Level Synthesis,1994.Proceedings of the Seventh Internationl Symposium on,18-20 May 1994,Pages:88-93
    [50]http://www.cs.princeton.edu/software/lcc/
    [51]http://www.cs.virginia.edu/nci/
    [52]Robert Morgan,"Building an optimizing compiler",Digital Press,1997.
    [53]Keith D.Cooper,Linda Torczon,"Engineering a compiler",Morgan Kaufmann,2002.
    [54]Randy Alien,Ken Knenedy,"Optimizing compiler for modern architectures,dependence-based approach",Elsevier Science,2001.
    [55]石教英等编,“计算机体系结构”,浙江大学出版社,1998年10月第1版.
    [56]沈志宇等著,“并行编译方法”,国防工业出版社,2000年

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700