Research and Implementation of the Virtual Register Technology in the X Microprocessor Based on IA-64

Original title: 基于IA-64的X微处理器虚拟寄存器技术的研究与实现
Author: Huang Caixia (黄彩霞)
Degree level: Master's
Discipline: Electronic Science and Technology
Keywords: virtual register; register renaming; register stack engine; software pipelining; mapping table
Degree year: 2004
Advisor: Zhang Minxuan (张民选)
Discipline code: 080901
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Submission date: 2004-11-01

Abstract

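With the rapid development of microprocessor technology, the management and use of registers is trending toward virtualization. The X microprocessor is a high-performance microprocessor developed in-house that is fully compatible with the IA-64 architecture. Starting from an analysis of the IA-64 architecture, this thesis studies and implements the virtual register technology of the X high-performance microprocessor, and proposes a novel mapping-table-based implementation of the Register Stack Engine (RSE).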
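     The virtual register technology in the X high-performance microprocessor consists mainly of register renaming and the RSE. This thesis describes in detail the register renaming implemented in the X microprocessor through register rotation and the register stack, illustrates with an example how software pipelining supported by register rotation overcomes the drawbacks that traditional loop unrolling brings to code optimization, and presents the concrete implementation of the renaming logic for the general, floating-point, and predicate registers. For the RSE, the thesis analyzes in depth the conditions and working process for transferring data between the physical registers and memory, and gives the design and implementation of the RSE state machine and functional units.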
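     To make rotation-based renaming concrete, the following Python sketch models a small block of rotating registers and a two-stage software-pipelined loop. It is an illustrative toy, not the renaming logic of the X microprocessor: the class `RotatingRegisterFile`, the function `pipelined_increment`, and the region size of 8 are assumptions introduced here, the register stack frame base is ignored, and the predicate-controlled prologue and epilogue are replaced by plain `if` tests.

```python
class RotatingRegisterFile:
    """Toy model of register rotation (hypothetical, for illustration only)."""

    FIRST_ROTATING = 32  # IA-64 general registers rotate starting at r32

    def __init__(self, rotating_size=8):
        self.size = rotating_size        # size of the rotating region (assumed)
        self.rrb = 0                     # rename base, kept in [0, size)
        self.phys = [0] * rotating_size  # backing physical registers

    def _index(self, virtual_reg):
        offset = virtual_reg - self.FIRST_ROTATING
        if not 0 <= offset < self.size:
            raise ValueError("register is outside the rotating region")
        # Renaming: the virtual number plus the rename base, wrapped around.
        return (offset + self.rrb) % self.size

    def write(self, virtual_reg, value):
        self.phys[self._index(virtual_reg)] = value

    def read(self, virtual_reg):
        return self.phys[self._index(virtual_reg)]

    def rotate(self):
        # Effect of a taken loop branch: the rename base is decremented,
        # so the value written to r32 is visible as r33 afterwards.
        self.rrb = (self.rrb - 1) % self.size


def pipelined_increment(data):
    """Two-stage pipeline: stage 1 loads into r32, stage 2 adds 1 to r33."""
    regs = RotatingRegisterFile()
    out = []
    n = len(data)
    for k in range(n + 1):               # one extra pass drains the pipeline
        if k >= 1:                       # stage 2: value loaded one pass ago
            out.append(regs.read(33) + 1)
        if k < n:                        # stage 1: load the next element
            regs.write(32, data[k])
        regs.rotate()
    return out


print(pipelined_increment([1, 2, 3]))
```

     Because every iteration sees fresh physical registers under the same virtual names, the loop body is written once; this is the property that lets rotation avoid the code growth of loop unrolling.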
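     Design verification is an important means of ensuring functional and timing correctness. After an overview of common verification methods, the thesis verifies the RSE and register renaming functional units at two levels: at the module level, where each unit is tested on its own, and at the system level, where the units are integrated with the CPU core and tested as part of the complete X high-performance microprocessor. System-level verification is done by comparing the results produced by the same test code on the Ski IA-64 hardware simulator and under Verilog-XL logic simulation.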
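     The RSE is implemented effectively in the Itanium microprocessor, but a physical register that is no longer used within a procedure cannot be released until that procedure finishes executing. This thesis proposes a novel mapping-table-based implementation of the RSE that maps consecutively numbered virtual registers onto non-contiguous physical registers, so that any physical register in a procedure can be released as soon as it is no longer needed, making more efficient use of register resources. The method is fully compatible with the IA-64 architecture and supports key techniques such as register rotation and software pipelining.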
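     The idea of a mapping table with early release can be sketched as follows. The abstract does not describe the actual table organization, so everything here is an assumption made for illustration: the class `MappingTableRegisterStack`, its method names, the free-list allocation policy, and the use of a frame identifier as part of the table key. Spilling to memory and register rotation are not modeled.

```python
class MappingTableRegisterStack:
    """Hypothetical mapping table: (frame, virtual reg) -> physical reg."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # free physical registers
        self.table = {}                        # (frame_id, virtual) -> physical

    def alloc(self, frame_id, count, first_virtual=32):
        """Map `count` consecutive virtual registers to any free physical ones."""
        if count > len(self.free):
            raise RuntimeError("not enough free registers (spill not modeled)")
        for i in range(count):
            self.table[(frame_id, first_virtual + i)] = self.free.pop(0)

    def release(self, frame_id, virtual_reg):
        """Free a single register as soon as the procedure is done with it."""
        self.free.append(self.table.pop((frame_id, virtual_reg)))

    def release_frame(self, frame_id):
        """Free whatever the frame still holds when the procedure returns."""
        for key in [k for k in self.table if k[0] == frame_id]:
            self.free.append(self.table.pop(key))

    def physical(self, frame_id, virtual_reg):
        return self.table[(frame_id, virtual_reg)]


rs = MappingTableRegisterStack(num_physical=4)
rs.alloc("A", 3)          # caller A: r32..r34 -> physical 0, 1, 2
rs.release("A", 33)       # A no longer needs r33; physical 1 is freed early
rs.alloc("B", 2)          # callee B: r32, r33 -> physical 3 and 1
print(rs.physical("B", 32), rs.physical("B", 33))
```

     In this toy run the callee's consecutive virtual registers land on non-contiguous physical registers, and the call fits in four physical registers only because the caller released one register before returning.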

