Design and Optimization of the Instruction Cache for a RISC Processor
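Abstract
With the development of information technology, circuit systems built around a microprocessor as the control core are increasingly widely used to meet the demands of high-speed information processing and complex intelligent control. Research and design work on microprocessor architecture can advance China's integrated-circuit industry and meet the needs of the information industry.
     The work in this thesis is part of "Research on Design Techniques for a 32-bit Microprocessor (LongTeng R1) for New-Generation Fighter Aircraft", one of the Tenth Five-Year Plan pre-research projects undertaken by the Aviation Microelectronics Center of Northwestern Polytechnical University.
     The "LongTeng R1" microprocessor system consists of four parts: a fixed-point execution unit, a floating-point unit, a memory subsystem unit (instruction cache and memory management unit), and a bus interface unit. It executes instructions in a pipelined, superscalar fashion. This thesis covers the design and implementation of the memory subsystem unit, the integration of the "LongTeng R1" system, the verification of the memory subsystem unit, and a study of the trace cache built on the "LongTeng R1" memory subsystem. The emphasis is on the design and implementation of the memory subsystem.
     The "LongTeng R1" microprocessor chip designed by our group is instruction-set compatible with Motorola's PowerPC 603e, while its architecture is an independent design. The whole chip is implemented with the TSMC 0.25 µm 1P5M CMOS standard-cell library and macro cells; sample chips have been fabricated and have passed testing. The core voltage is 2.5 V, the I/O voltage is 3.3 V, the internal frequency is 66 MHz, the die area is under 25 mm², the transistor count is 3.36 million, the average power consumption is 1.7 W (at 66 MHz), the package is a 240-pin CQFP, and the design uses full scan.
     The research work of this thesis includes:
     A systematic study and design of the memory subsystem of a 32-bit RISC processor. The memory subsystem was partitioned into modules by function and then designed and implemented top-down. The resulting memory subsystem consists mainly of an instruction memory management unit, a data memory management unit, and an instruction cache.
     Functional simulation of the memory subsystem. The "LongTeng R1" memory subsystem was simulated at three levels: module, subsystem, and chip.
     Coding optimization during synthesis of the memory subsystem, aimed mainly at increasing the speed of the design so that the memory subsystem meets its timing requirements.
     Timing simulation of the memory subsystem.
     A study of trace cache technology. A trace cache was designed, implemented, and evaluated for performance on top of the existing "LongTeng R1" memory subsystem.
     The research in this thesis has built up experience for designing embedded microprocessors with independent intellectual property rights.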
The rapid development of information processing technology and complex intelligent control methods poses a challenge to computer architecture designers. To meet these requirements, circuits with a microprocessor at their core are used more and more widely. Research and design in microprocessor architecture can promote the development of the national IC industry and satisfy market demand.
    The work in this thesis was part of a national Tenth Five-Year Plan pre-research project whose task was to design the "LongTeng R1" microprocessor.
    The "LongTeng R1" microprocessor has four parts: the Integer Execution Unit (IEU), the Floating Point Unit (FPU), the Memory Subsystem Unit (MSU), and the Bus Interface Unit (BIU). Instructions are executed in a pipeline. This thesis discusses the design, implementation, and verification of the MSU, carries out the integration of the "LongTeng R1" system, and studies the optimization of the instruction cache. The trace cache, an important technique for enhancing the instruction cache, is also studied.
    The research work of this thesis mainly includes:
    1. Analysis of the "LongTeng R1" architecture;
    2. Design and implementation of the Memory Subsystem Unit;
    3. Functional simulation at three levels (module, subsystem, and chip);
    4. Coding optimization to improve the speed of the MSU;
    5. Timing simulation to verify setup and hold times;
    6. Study of trace cache technology.
    "LongTeng R1" is a complex microprocessor system. This thesis contributes to the design of embedded microprocessors with independent intellectual property rights. It also provides an optional microprocessor core to meet an urgent need in the aviation field.
