支持SIMD的DSP编译优化技术的研究与实现
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
与传统的DSPs相比,现代DSPs采用了更多的ILP技术以提高性能。同时,为了适应媒体处理的特殊需要,现代的DSPs在硬件的设计方面的一个普遍的特征就是引入了SIMD指令。这就使得低精度的媒体数据的快速处理成为可能。一个真正的高性能、低功耗的系统,不仅要有良好的硬件支持而且更重要的是要有一个能够充分利用这些硬件特征的优化的软件系统。编译器作为这个软件系统中最重要的一环,其性能的优劣将直接影响系统的整体性能。但目前的编译技术却不能很好的提供对SIMD指令的支持,所以本文就深入的研究了支持SIMD指令的编译优化技术,取得了如下的一些研究成果。
     1.提出了一套完整的支持SIMD指令的代码选择技术的实现框架。该框架使得代码选择面向基本块,而不再是语句,扩大了指令注释的搜索空间,产生了SIMD指令。同时采用机器描述来刻画目标机器指令集,使得代码选择中与机器相关的部分局限于机器描述中,提高了代码选择的灵活性和可移植性。
     2.设计并实现了一种基本块的数据流图(DFG)表示,这种表示既能充分表达Lcode语义又比较适合模式匹配算法。传统的代码选择技术往往基于的是数据流树(DFTs)的中间表示,这也正是其不能很好的支持SIMD指令的一个重要原因。同时为了借鉴模式匹配的思想,在这里还实现了从DFG到DFTs的转化。
     3.采用一种树文法描述了目标机器的指令集,并对其进行预处理,使其转化为相应的指令模板。树文法为目标机器指令集的描述提供了一种易读、易写、易修改的方式,同时通过预处理隐藏了机器描述的细节,为编译的树匹配过程提供了统一的接口。
     4.改进了传统的树匹配和动态规划算法。使得其在匹配每棵输入树时,不再是产生唯一的最优覆盖,而是产生多个可选的最优覆盖。为整数线形规划从全局角度最终确定最优覆盖做准备。
     5.采用了整数线形规划的方法解决了最终的最优覆盖的选择问题。在充分考虑最终覆盖的正确性和有效性的情况下,提取出其所应满足的一系列约束及相应的目标方程,该方程的最大解即表示最终将产生的SIMD指令的数目。
Compared with traditional DSPs, modern DSPs use more ILP technologies to improve their performance. Meanwhile, in order to meet the special need of media processing, modern DSPs have a common characteristic in the hardware designing and implementation - SIMD. So it is possible that the lower precision media data is processed quickly, a true system which is higher performance and lower power consumption needs not only good hardware supporting but also optimized software that may make full use of the hardware. Beside others , compiler is the most important element. Its performance will influent the total performance of the entire system directly. But current compiler doesn't have the ability to support these SIMD instructions smoothly. So in this thesis, we do a deep research on compiler optimizing techenology which is support of SIMD, making main contributions as follows.
    1.A whole implementation framework of code selection which is support of SIMD has been advanced. The framework makes code selection not face single statement but a basic block., expanded the searching space of code selection generated the instructions of SIMD. Meanwhile,the instruction set of target machine is characterized by machine description. It makes the part that is dependence of machine be constrained to machine description and makes the code selector more flexibility and retargetability.
    2.A DFG representation of basic block is designed and implemented, the representation not only expresses the semantic of Lcode sufficiently but also adapts to the need of tree matching algorithm. The traditional code selection thechnology is always based on the middle representation of DFT. This is one of main reasons that it can not suppot of SIMD.
    3.A tree grammar is used to describe the instruction set of target machine,the description is preprocessed to generate the corresponding instruction template. The tree grammer provides a way that is easy to read, modify and write for the description of target instruction set.meanwhile,hiding the detail of the description through preprocess module,so providing the general interface for process of tree matching.
    4.Improved on the traditional tree matching with dynamic programming algorithm. For each subject tree,it generates not a single optimizing cover but many alternative covers. This step makes a good foundation for integer linear programming which is
引文
[1] Alfred V.Aho, Mahadevan Ganapathi, Steven W.K. Jiang. Code Generation Using Tree Matching and Dynamic Programming[C]. ACM Transactions on Programming Languages and System, vol 11, NO.4, October 1989, Pages 491-516.
    [2] Rainer Leupers, Code Selection for Media Processors with SIMD Instructions[C]. Design, Automation, and Test in Europe 2000, pp. 4-8.
    [3] Rainer Leupers, Steven Bashford. Graph-based Code Selection Techniques for Embedded Processors[C]. ACM Transaction on Design Automation of Electronic Systems, Vol.5, No.4, October 2000, pp.794-814.
    [4] Rainer Leupers, Code Generation for Embedded Processors[C]. 2000 IEEE 1080-1082/00.
    [5] Steven Bashford, Rainer Leupers, Constraint Driven Code Selection for Fixed-Point DSPs[C]. 36th Design Automation Conference (DAC), 1999.
    [6] Rainer Leupers, Peter Marwedel, Instruction Selection for embedded DSPs with Complex Instructions[C]. In European Design Automation Conference(EURO-DAC).
    [7] M. Lorenz, L.Wehmeyer, T.Drager. Energy aware Compilation for DSP swith SIMD Instructions[C]. In Proc. Of LCTES/SCOPES, 2002.
    [8] Christopher W.Fraser, Engineering a simple, Efficient Code-Generator Generator[C], ACM Letters on Programming Language and Systems, vol.1, NO.3, September 1992, pp213-226.
    [9] 胡定磊,陈书明,刘春林.VLIW DSP编译器的构造[J].高技术通讯,2002年增刊.
    [10] 赵常智,刘春林,胡定磊,陈书明.一种支持SIMD的表驱动的代码选择技术。计算机应用研究,2006年5月拟刊出.
    [11] J. Glossner, J. Moreno, M. Moudgill, J. Derby, E. Hokenek, D. Meltzer, U. Shvadron, and M. Ware, "Trends in compilable DSP architecture," in proceedings of 2000 Workshop on Signal Processing Systems, pp. 181-199, October 2000.
    [12] B. R. Rau and J. A. Fisher, "Instruction-Level Parallel Processing: History, Overview and Perspective," The Journal of Supercomputing, vol. 7, pp. 9-50, January 1993.
    [13] P. P. Chang, S. A. Mahlke, W. Y. Chen, N. J. Warter, and W. W. Hwu, "IMPACT: An Architectural Framework for Multiple-Instruction-Issue Processors," in Proceedings of the 18th International Symposium on Computer Architecture, pp. 266-275, May 1991.
    [14] R. M. Graham, Principles of Systems Programming, John Wiley & Sons, 1975.
    [15] D. Salomon, Assemblers and Loaders, Ellis Horwood, 1993.
    [16] D. M. Dhamdhere, Systems Programming and Operating Systems, Second Revised Edition, 北京: 清华大学出版社, 2001.[17] J.R.Levine,T.Mason,and D.Brow.杨作梅,张旭东 等译.lex与yacc.第二版,北京:机械工业出版社,2003.
    [18] R. A. Bringmann, "Template for code generation development using the IMPACT-I C compiler," M.S. thesis, Department of Computer Science, University of Illinois, Urbana, IL, 1992.
    [19] A. Aho, R. Sethi, and J. Ullman, Compilers: Principles, Techniques, and Tools. Reading, MA: Addison-Wesley, 1986.
    [20] S. A. Mahlke, "Design and Implementation of a Portable Global Code Optimizer," M.S. thesis, Department of Computer Science, University of Illinois, Urbana, IL, 1992.
    [21] W. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and D. M. Lavery, "The superblock: An effective technique for VLIW and superscalar compilation," The Journal of Supercomputing, vol. 7, pp. 229-248, January 1993.
    [22] W. Y. Chen, "An optimizing compiler code generator: A platform for RISC performance analysis," M.S. thesis, Department of Computer Science, University of Illinois, Urbana, IL, 1991.
    [23] B. T. Sander, "Performance Optimization and Evaluation for the IMPACT X86 Compiler," M.S. thesis, Department of Computer Science, University of Illinois, Urbana, IL, 1995.
    [24] J. Gyllenhaal, "A Machine Description Language for Compilation," M.S. thesis, Department of Electrical and Computer Engineering, University of Illinois, Urbana, IL, 1994.
    [25] J. C. Gyllenhaal, B. R. Rau, and W. W. Hwu, "Hmdes version 2.0 specification," Tech. Rep. IMPACT-96-3, The IMPACT Research Group, University of Illinois, Urbana, IL, 1996.
    [26] S. Aditya, V. Kathail, and B. R. Rau, "Elcor's machine description system: versin 3.0." Tech. Rep. HPL-98-128, Hewlett-Packard Laboratories, 1998.
    [27] B. R. Rau, V. Kathail, and S. Aditya, "Machine-description driven compilers for epic processors," Tech. Rep. HPL-98-40, Hewlett-Packard Laboratories, 1998
    [28] R. E. Hank, "Machine Independent Register Allocation for the IMPACT-I C Compiler", MS thesis, Department of Electrical and Computer Engineering, University of Illinois, Urbana IL, 1993.
    [29] P. Faraboschi, G. Desoli, J. A. Fisher, "Clustered Instruction-Level Parallel Processors," Tech. Rep. HPL-98-204, Hewlett-Packard Laboratories, 1998.
    [30] Rainer Leupers, Instruction Scheduling for Clustered VLIW DSPs, IEEE PACT, 2000, pp. 291-300.
    [31] Scott A. Mahlke, William Y. Chen, Pohua P. Chang, and Wen-mei W. Hwu. Scalar Program Performance on Muliple-Instruction-Issue Processors with a Limited Number of Registers. Proceedings of the 25th Annual Hawaii Int'l Conference on System Sciences, Jan. 6-9, 1992, pp. 34-44.
    [32] 袁正才,刘春林,胡定磊,陈书明.分簇VLIW DSP调度技术.计算机应用研究,2004,21(8):80-82.[33] 袁正才,刘春林,胡定磊.一种基于机器描述的VLIW DSP编译技术.计算机工程,2004,30(22):79-81.
    [34] 陈惠斌,刘春林,胡定磊,陈书明.YHFT-D4汇编器的设计与实现.电脑与信息技术.2005.1.
    [35] M.Lorenz, T.drager, Rainer leupers, compiler based Energy Savings by SIMD Operations. LCTES'02-SCOPES'02, June 19-21, 2002, Berlin, Germany.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700