Design and Implementation of a VLIW DSP Assembler and Code Generator

Original title: VLIW DSP汇编器与代码生成器的设计与实现
Author: Chen Huibin (陈惠斌)
Degree level: Master's
Discipline: Computer Science and Technology
Keywords: assembler; code generator; VLIW; DSP; cluster assignment; scheduling; UAS
Degree year: 2005
Advisor: Liu Chunlin (刘春林)
Discipline code: 081201
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Submission date: 2005-06-01

Abstract

Compared with traditional DSPs, modern DSPs rely more heavily on instruction-level parallelism (ILP) techniques to improve performance. At the same time, they present regular, compilable architectures, which makes it possible to build efficient optimizing compilers for them. This thesis discusses one such DSP, which uses a clustered VLIW architecture and can execute multiple operations in a single clock cycle. We describe the construction of the assembler and the code generator for this VLIW DSP.
To ease the handling of forward references, the VLIW DSP assembler is organized as a two-pass structure. The first pass only records information about the symbols (labels) in the source file. The second pass scans the source file again and uses the information collected earlier to generate the object file. The assembler has three notable features: its lexer and parser are generated with lex and yacc; each assembly statement is held in the assembler as an internal representation; and instruction encoding information is stored in data tables. To encode an instruction, a generic procedure searches these tables to determine the encoding format and opcode, and then calls the corresponding encoding function to produce the machine code.
The VLIW DSP code generator is built on the IMPACT C compiler framework. We wrote a machine specification and a machine description for the VLIW DSP, and constructed its code generator from the template provided by IMPACT. A prominent feature of the VLIW DSP architecture is clustering: a large, centralized register file is split into several smaller blocks, and each block together with its associated functional units forms a cluster. Accordingly, an important phase of code generation is cluster assignment, which maps each operation and its operands to an appropriate cluster. Cluster assignment should keep the functional units of all clusters well utilized and reduce data transfers between clusters.
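
The two-pass organization and table-driven encoding described for the assembler can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy instruction set, the 16-bit word layout, the opcode values, and the names `ENCODING_TABLE`, `ENCODERS`, `parse`, and `assemble` are hypothetical and are not taken from the thesis, whose assembler uses lex and yacc rather than a hand-written parser.

```python
# Hypothetical toy ISA: every instruction occupies one 16-bit word.
# Encoding table: mnemonic -> (opcode, encoding format). All values are made up.
ENCODING_TABLE = {
    "NOP": (0x0, "none"),
    "ADD": (0x1, "reg3"),    # ADD rd, rs1, rs2
    "B":   (0x2, "target"),  # B label
}

def encode_none(opcode, operands, symbols):
    return opcode << 12

def encode_reg3(opcode, operands, symbols):
    rd, rs1, rs2 = (int(reg[1:]) for reg in operands)
    return (opcode << 12) | (rd << 8) | (rs1 << 4) | rs2

def encode_target(opcode, operands, symbols):
    # The label address comes from the symbol table built in pass 1.
    return (opcode << 12) | symbols[operands[0]]

# One encoding function per format, selected through the table lookup.
ENCODERS = {"none": encode_none, "reg3": encode_reg3, "target": encode_target}

def parse(line):
    """Return (label or None, mnemonic or None, operand list) for one line."""
    line = line.split(";")[0].strip()  # drop comments
    label = None
    if ":" in line:
        label, line = (part.strip() for part in line.split(":", 1))
    if not line:
        return label, None, []
    mnemonic, _, rest = line.partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return label, mnemonic.upper(), operands

def assemble(source):
    lines = source.splitlines()
    # Pass 1: only record the address of every label.
    symbols, address = {}, 0
    for line in lines:
        label, mnemonic, _ = parse(line)
        if label:
            symbols[label] = address
        if mnemonic:
            address += 1
    # Pass 2: rescan the source and encode, now that all labels are known.
    words = []
    for line in lines:
        _, mnemonic, operands = parse(line)
        if mnemonic:
            opcode, fmt = ENCODING_TABLE[mnemonic]
            words.append(ENCODERS[fmt](opcode, operands, symbols))
    return words
```

In this sketch a branch to a label defined later in the file (a forward reference) is resolved without backpatching, because pass 2 starts only after pass 1 has seen every label. Adding an instruction means adding a table row and, if its format is new, one encoding function.
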
We implemented the Unified Assign and Schedule (UAS) algorithm for the VLIW DSP. Its distinguishing feature is that cluster assignment and scheduling are performed together: when an operation is scheduled, the operation and its operands are assigned to appropriate clusters at the same time.
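
The idea of assigning a cluster at the moment an operation is scheduled can be sketched as a small cycle-by-cycle list scheduler. This is a simplified illustration, not the thesis's implementation: it assumes unit-latency operations, identical functional units, an inter-cluster copy modeled only as extra latency (no bus contention), and a cluster priority that prefers the cluster already holding the most operands. The function name `uas_schedule` and its parameters are hypothetical.

```python
def uas_schedule(ops, deps, num_clusters=2, units_per_cluster=1, copy_latency=1):
    """Schedule and cluster-assign operations in one pass.

    ops:  operation names in priority order (assumed to form a DAG with deps).
    deps: maps an operation to the list of operations producing its operands.
    Returns a dict: operation -> (cycle, cluster).
    """
    known = set(ops)
    for op, preds in deps.items():
        if op not in known or any(p not in known for p in preds):
            raise ValueError("deps refers to an unknown operation")

    placed = {}  # operation -> (cycle, cluster)
    cycle = 0
    while len(placed) < len(ops):
        used = [0] * num_clusters  # functional units taken in this cycle
        for op in ops:
            if op in placed:
                continue
            preds = deps.get(op, [])
            if any(p not in placed for p in preds):
                continue  # some operand has not been produced yet

            def local_operands(c):
                return sum(1 for p in preds if placed[p][1] == c)

            # Assumed priority: most local operands, then least busy cluster.
            order = sorted(range(num_clusters),
                           key=lambda c: (-local_operands(c), used[c], c))
            for c in order:
                if used[c] >= units_per_cluster:
                    continue  # no free functional unit in this cluster
                # Earliest cycle at which every operand is available in cluster c;
                # operands living in another cluster pay the copy latency.
                ready = max((placed[p][0] + 1 +
                             (copy_latency if placed[p][1] != c else 0)
                             for p in preds), default=0)
                if ready <= cycle:
                    placed[op] = (cycle, c)  # schedule and assign together
                    used[c] += 1
                    break
        cycle += 1
    return placed
```

For example, with `ops = ["a", "b", "c", "d"]` and `deps = {"c": ["a", "b"], "d": ["c"]}`, the two independent operations are issued in cycle 0 on different clusters, and `c` waits one extra cycle because one of its operands must be copied across clusters. A fuller model would also reserve bus slots for each copy.
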

