面向SIMD的编译指导与条件分支的编译优化技术
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
随着SIMD扩展技术的不断发展,几乎所有的主流处理器厂商都为自己的微处理器提供SIMD扩展部件与相关SIMD指令集。然而应用程序一般很难直接适用于SIMD扩展,需要对其进行必要的向量化处理。本文针对SIMD编译优化技术研究的一些常见难题而提出了两种SIMD编译优化技术:条件分支语句的SIMD编译优化技术和基于编译指导的SIMD编译优化技术。
     在条件分支语句的SIMD编译优化技术方面,根据SIMD扩展和条件分支语句的特征,提出了一种在一个向量因子内的条件分支语句变换方法,能够有效地将条件分支语句所引发的控制依赖转换为数据依赖,避免控制依赖对编译器优化分析的影响;在得到向量化循环之后,通过条件分支重建恢复循环的控制流语义,最终达到在保证控制流语义正确的前提下,充分发掘循环数据流中的并行性的目的。
     在基于编译指导的SIMD编译优化技术方面,借鉴并行化编译研究中比较流行的并行编程语言思想,根据程序员对应用程序的了解,利用编译指导设定编译器的状态或者引导编译器完成一些特定的动作,以满足生成所需代码的需求,有效解决编译器因程序分析能力不足而无法自动实现循环向量化或者盲目自动向量化的问题。
     本文所述的两种SIMD编译优化技术已在项目研发的SW-VEC向量化编译系统上得到实现。测试结果表明,这两种SIMD编译优化技术能够有效提高SW-VEC向量化编译系统的向量化识别和生成向量化目标代码的能力,实现应用程序在SIMD扩展上加速运行的目的。
With the development of SIMD Extension, most of microprocessor manufacturers have provided SIMD Extension and related SIMD Instruction Set for their own microprocessors. However, applications can be hardly adapt to SIMD Extension, vectorizing them before running is essential. This thesis introduces two kinds of SIMD Compilation Optimization Technology for the problem in the research on SIMD Compilation Optimization Technology. The first one is SIMD Compilation Optimization Technology of Conditional Branch Statement, and the second one is SIMD Compilation Optimization Technology based on Compiler Directive.
     For Conditional Branch Statement in SIMD Compiler Optimization, the thesis proposes a method to transform Conditional Branch Statement within a Vector Length, according to the characteristics of SIMD Extension and Conditional Branch Statement. Control dependence triggered by Conditional Branch Statement can be effectively transformed into data dependence, and the control dependence?s impact of compiler?s optimization analysis can be avoided. With the transformed vector loop, the control flow?s semantics of the original loop will be rebuilt by rebuilding conditional branch. Finally, under the precondition of the correctness of control flow semantics, the parallelism of the data flow can be fully exploited.
     On the other hand, the idea of popular parallel programming language in parallel compilation is used for reference by SIMD Compilation Optimization Technology based on Compiler Directive. According to the programmer?s understanding of the application, Compiler Directive is used to set the state of the compiler or guide the compiler to perform some special actions in order to generate the required code. Thus, the problems that compiler can not perform auto-vectorization or does blind auto-vectorization because of the low capability of code analysis can be effectively solved.
     The SIMD Compilation Optimization Technologies in this thesis have been achieved in SW-VEC vectorization compiler. Test result shows that the performance of SW-VEC vectorization compiler to identify the statements which can be vectorized and generate the vectorization code can be effectively improved, and application can run faster in SIMD Extension.
引文
[1]朱嘉华. SIMD编译优化技术研究[D].上海:复旦大学, 2005.
    [2]张为华,臧斌宇. SIMD编译优化技术研究概述[J].中国计算机学会通讯, 2007, 3(2):27-36.
    [3]姜伟华,梅超,郭一.一种针对多媒体扩展指令集和实际多媒体程序的自动向量化方法[J].计算机学报, 2005, 28(8):1254-1266.
    [4] Randy Allen, Ken Kennedy.现代体系结构的优化编译器[M].张兆庆等译.北京:机械工业出版社, 2004.
    [5] Larsen S, Amarasinghe S. Exploiting superword level parallelism with multimedia instruction sets[J]. ACM SIGPLAN Notices, 2000, 35(5): 145-156.
    [6]李玉祥.面向非多媒体程序的SIMD向量化方法及优化技术研究[D].合肥:中国科学技术大学, 2008.
    [7] David Kuck. Structure of Computers and Computations[M]. New Jersey: John Wiley & Sons, 1978.
    [8] Randy Allen, Ken Kennedy. Automatic Translation of Fortran Programs to Vector Form[J]. ACM Trans, On Programming Languages and Systems, 1987, 9(4): 491-542.
    [9] Rainer Leupers. Code selection for media processors with SIMD instructions[J]. Design, Automation and Test in Europe Conference and Exhibition, 2000, Proceedings:4-8.
    [10]姜伟华,针对实际多媒体程序和多媒体扩展指令集的SIMD编译优化[D],上海:复旦人学, 2006.
    [11] Keith D. Cooper, Linda Torczon. Engineering a compiler[M]. San Fransisco: Morgan Kaufmann, 2002.
    [12] Keith D. Cooper, Anshuman Dasgupta, and Ken Kennedy. Vizer:A system to vectorize intel x86 binaries[D]. Houston: Rice University, 2002.
    [13] Bulic P, Gusin V. An extended ANSI C for Processors with a multimedia extension[J]. Int. J. Parallel Program, 2003, 31(2):107-136.
    [14] Ivan Pryanishnikov, Andress Krall, Nigel Horspool. Compiler optimizations for processors with SIMD instructions[J]. Software: practice and Experience, 2006, 37(1):93-113.
    [15] Konrad Trifunovic, Dorit Nuzman, Albert Cohen, Ayal Zaks and Ira Rosen. Polyhedral-Model Guided Loop-Nest Auto-Vectorization[J]. Parallel Architectures and Compilation Techniques, 2009, Proceedings:327-337.
    [16] Samuel Larsen, Rodric Rabbah and Saman Amarasinghe. Exploiting Vector Parallelism in Software Pipelined Loops[J]. IEEE/ACM International Symposium on Microarchitecture, 2005, Proceedings:11-129.
    [17]李玉祥,施慧,陈莉.面向向量化的局部数据重组[J].小型微型计算机系统, 2009, 30(8):1528-1534.
    [18]李学干.计算系统结构(第四版)[M].西安:西安电子科技大学出版社, 2006.
    [19] A. Peleg and U. Weiser. MMX Technology Extension to the Intel Architecture[J].IEEE/ACM International Symposium on Microarchitecture, 1996, 16(4):42-50.
    [20] Intel Corporation. Intel? 64 and IA-32 Architectures Software Developer's Manual[E]. http://www.intel.com/Assets/PDF/manual/252046.pdf, 2011.
    [21] K. Diefendorff, P.K. Dubey, R. Hochsprung, H. Scale. Altivec extension to PowerPC accelerates media processing[J]. IEEE/ACM International Symposium on Microarchitecture, 2000, 20(2):85-95.
    [22] M. Tremblay, J.M. O'Connor, V. Narayanan, Liang He. VIS Speeds New Media Processing[J]. IEEE/ACM International Symposium on Microarchitecture, 1996, 16(4):10-20.
    [23] R. B. Lee. Subword Parallelism with MAX-2[J]. IEEE/ACM International Symposium on Microarchitecture, 1996, 16(4):51-59.
    [24] MIPS Technologies. The MIPS-3D?Application-Specific Extension to the MIPS64? Architecture[E]. https://www.mips.com/application/login/login.dot?product_name=/auth/MD00099-2B-MIPS3D64-AFP-02.61.pdf, 2011.
    [25] Alfed V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman.编译原理(第二版)[M].赵建华等译.北京:机械工业出版社, 2009.
    [26]王迪. SIMD编译优化技术研究[D].杭州:浙江大学, 2008.
    [27]陈火旺,刘春林,谭庆平等.程序设计语言编译原理(第三版)[M].北京:国防工业出版社. 2001.
    [28]周建鹏.媒体处理器编译器中的SIMD编译优化技术的研究与实践[D].杭州:浙江大学, 2007.
    [29] William Pugh, Univ. of Maryland, College Park. A practical algorithm for exact array dependence analysis[J]. Communications of the ACM, 1992, 35(8):102-114.
    [30] Carl W. Lee. Linear Programming Notes[D]. Lexington: University of Kentucky, 1996.
    [31] D. E. Maydan, J. L. Hennessy, M. S. Lam. Efficient and exact data dependence analysis[J]. ACM SIGPLAN Notices, 1991, 26(6):1-14.
    [32]张兆庆,乔如良.先进编译技术研究与发展[J].中国科学院计算技术研究所信息技术快报, 2003, 1(8): 1-5.
    [33] U. Banerjee. Unimodular transformation of double loops[J]. Computer Science & Engineering, 1990, Proceedings:192-219.
    [34] D. R. Wallace. Dependence of multi-dimensional array references[J]. International Conference on Supercomputing, 1988, Proceedings:418-428.
    [35] M. Wolfe and U. Banerjee. Data dependence and its application to parallel processing[J]. International Journal of Parallel Processing, 1987, 16(2):137-178.
    [36] Peng Wu,Alexandre E. Eichenberger, Amy Wang. Efficient SIMD Code Generation for Runtime Alignment and Length Conversion[J]. Code Generation and Optimization, 2005, Proceedings:153-164
    [37]吴圣宁,李思昆.多媒体处理器的SIMD代码生成[J].计算机科学, 2007, 34(7):268-270.
    [38] A. E. Eichenberger, P. Wu, and K. O?Brien. Vectorization for SIMD Architectures withAlignment Constraints[J]. ACM SIGPLAN Notices, 2004, 39(6):82–93.
    [39] Silicon Graphics International. WHIRL Intermediate Language Specification[E]. http://cdnetworks-kr-1.dl.sourceforge.net/project/open64/open64/Documentation/whirl.pdf, 2010.
    [40]田祖伟. IF转换实现方法分析[J].湖南第一师范学报, 2005, 5(2):74-77.
    [41] Rastislav Bodík, Rajiv Gupta, Mary Lou Soffa. Interprocedural conditional branch elimination[J]. ACM SIGPLAN Notices, 1997, 32(5):146-158.
    [42] University of Houston. Overview of the open64 Compiler Infrastructure[E]. http://www2.cs.uh.edu/~dragon/Documents/open64-doc.pdf, 2010.
    [43]汪小飞.数据流分析技术研究与实例分析[D].长沙:国防科学技术大学, 2007.
    [44] Fan Zhihua. The vectorization of IF and GOTO statements[J]. Science in China, Ser.A, 1984, 27(1):86-97.
    [45] J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence[J]. The 10th ACM SIGACT-SIGPLAN Symposium, 1983, Proceedings:177-189.
    [46]张宏江.针对多媒体应用的SIMD编译优化技术研究[D].上海:复旦大学, 2006.
    [47]吕志宏,刘迎捷.有向连接图中节点可达的矩阵行取1算法[J].浙江工业大学学报, 2000, 28(3):280-282.
    [48]顾丽红,吴少剐,章隆兵,蔡飞.针对非规则应用的OpenMP制导扩展[J].小型微型计算机系统, 2005, 26(1):124-128.
    [49]马红途. OpenMP程序分析及优化技术研究[D].郑州:解放军信息工程大学, 2009.
    [50] Samuel Larsen, Emmett Witchel, Saman Amarasinghe. Increasing and Detecting Memoy Address Congruence[J]. IEEE International Symposium on Parallel Architectures and Compilation Techniques, 2002, Proceedings:18-29.
    [51] Yutao Zhong, Maksim Orlovich, Xipeng Shen, Chen Ding. Array regrouping and structure splitting using wholeprogram reference affinity[J]. ACM SIGPLAN Notices, 2004, 39(6):255-266.
    [52]付雄,王汝传.一种基于局部性的数据重组框架[J].计算机科学, 2009, 36(2):146-151.
    [53] Hagog M, Tice C. Cache Aware Data Layout Reorganization Optimization in GCC[J]. GCC Developers?Summit, 2005, Proceedings:69-92.
    [54]尉红梅,姚建华.并行语言及编译技术现状和发展趋势[J].计算机工程, 2004, 30(S1):97-98.
    [55] Charles F. Goldfarb, Paul Prescod. XML Handbook?(Fifth Edition)[M]. New Jersey: Prentice Hall PTR, 2003.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700