Research on the Design of a 32-bit RISC Microprocessor
Abstract
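With advances in VLSI process technology, an entire electronic system can now be integrated on one or a few chips (SoC). SoC improves system performance while reducing system power consumption, size, and cost. The key to a successful SoC design is the design of the RISC microprocessor inside it. Meanwhile, as semiconductor process technology improves, architecture techniques keep developing, and application requirements keep rising, demand for high-performance embedded microprocessor products is also growing.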
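    After introducing the technical features of the mainstream commercial RISC microprocessors, this paper discusses a design methodology for a 32-bit high-performance RISC microprocessor. The emphasis is on its logic design, which covers instruction set architecture design, RISC CPU design, hierarchical memory system design, and the design of other functional units. We then performed functional verification of the RISC microprocessor in two parts: system-level simulation and FPGA hardware verification. Experiments show that the designed circuit meets its intended goals and performs well in metrics such as speed and area.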
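    This paper proposes a method for partitioning the centralized controller unit and determines the architecture of the RISC CPU according to that method. The resulting structure is easy to debug and extend, and pipeline stall information is not propagated across multiple pipeline stages, so it adds no extra penalty to pipeline speed.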
    This paper proposes a method that completely eliminates the pipeline "bubbles" caused by RAW hazards.
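    This paper proposes a method that significantly shortens program execution time. While a branch instruction is still in the instruction decode stage, the processor can already determine whether the branch is taken and the address of the next instruction to fetch, so only one null instruction needs to be inserted after the branch instruction.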
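    This paper proposes two principles for stalling the pipeline and generates the stall signal of each pipeline stage according to them. Simulation waveforms show that these signals stall and resume the pipeline correctly.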
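    This paper generates the cacheable flag in different ways depending on whether it is sent from the instruction MMU to the instruction cache or from the data MMU to the data cache. The address space corresponding to instruction memory is cacheable in all cases. This is functionally correct, reduces the number of instruction memory accesses, and removes an asynchronous loop, which improves the timing of the whole system.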
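    This paper studies low-power design techniques for RISC microprocessors and presents a design method for a power management unit that supports both dynamic and static power management.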
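    This paper studies RISC microprocessor support for the WISHBONE SoC interface and presents a design method for a bus interface unit that uses the WISHBONE protocol.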
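    This paper describes two configuration management methods used in system-level simulation. With these two methods, the user of the simulation environment can decide as quickly as possible which models to use in a particular simulation, which improves simulation efficiency.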
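    Finally, an FPGA hardware verification scheme for the design is given. The throughput achievable with system-level software simulation is compared with that of FPGA hardware verification, and the necessity of FPGA hardware verification is demonstrated.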
    Overall, the microprocessor performs well in applications, is simple to implement, scales well, and has an open SoC interface.
Advances in technology make it possible to integrate an entire electronic system into a single chip or a chipset. The SoC (system-on-a-chip) paradigm reduces system power, size, and cost while improving system performance. The key to the success of an SoC design is the RISC microprocessor inside it. Moreover, with progress in semiconductor technology and in architecture techniques, new applications demand more and more high-performance embedded microprocessors.
    This paper discusses the design methodology of a 32-bit high-performance RISC microprocessor after introducing various mainstream commercial RISC machines. It covers the design of the instruction set architecture and the design and implementation of the RISC CPU, the hierarchical memory system, and other functional units, with the emphasis on logic design. Functional verification, consisting of system-level simulation and FPGA hardware prototyping, is then carried out. The results show that the design goals are achieved and that the design performs well in characteristics such as speed and area.
    This paper presents a new approach to partitioning the central controller unit, and the architecture of the RISC CPU is determined based on this approach. Compared with the traditional architecture, this architecture is easier to debug and extend. Moreover, pipeline stall information does not traverse multiple pipeline stages, so it has no negative impact on pipeline speed.
    This paper presents an approach to completely remove the bubbles in the pipeline caused by the RAW hazard.
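The abstract does not say how the bubbles are removed. One standard technique for RAW hazards between ALU instructions is operand forwarding (bypassing), and the sketch below counts the bubbles with and without it. It is a minimal model under stated assumptions, not the design described here: the five-stage pipeline, the `Instr` tuple, the `raw_bubbles` function, and the register numbers are all hypothetical, and load-use hazards are not modeled.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[int]      # destination register, or None if nothing is written
    srcs: Tuple[int, ...]    # source registers


def raw_bubbles(program: List[Instr], forwarding: bool) -> int:
    """Count bubbles caused by RAW hazards between ALU-type instructions.

    Assumed model (illustrative only): IF/ID/EX/MEM/WB, one instruction
    issued per cycle, register file written in the first half of a cycle
    and read in the second half. Without forwarding a consumer must issue
    at least 3 slots after its producer; with EX-to-EX forwarding the
    very next slot is enough.
    """
    min_gap = 1 if forwarding else 3
    ready: Dict[int, int] = {}   # register -> earliest slot a consumer may issue
    next_slot = 0
    bubbles = 0
    for ins in program:
        slot = max([next_slot] + [ready.get(r, 0) for r in ins.srcs])
        bubbles += slot - next_slot          # empty slots inserted before this one
        if ins.dest is not None:
            ready[ins.dest] = slot + min_gap
        next_slot = slot + 1
    return bubbles


program = [
    Instr(dest=1, srcs=(2, 3)),   # r1 = r2 op r3
    Instr(dest=4, srcs=(1, 5)),   # r4 = r1 op r5  (needs r1)
    Instr(dest=6, srcs=(4, 1)),   # r6 = r4 op r1  (needs r4)
]
print(raw_bubbles(program, forwarding=False))  # 4
print(raw_bubbles(program, forwarding=True))   # 0
```

In this toy model, forwarding removes every bubble only because all three instructions are ALU operations whose result is ready at the end of EX.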
    This paper presents an approach to shortening program execution time. Whether a branch is taken, and which instruction is to be fetched next, can be determined while the branch instruction is still in the instruction decode stage. Thus only one NULL operation needs to be inserted after the branch instruction, and program execution time is significantly reduced.
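The effect of resolving branches early can be illustrated with simple arithmetic. The sketch below assumes a five-stage pipeline that fetches one instruction per cycle and fills every slot after a branch with a NULL operation. The function names and the instruction counts in the usage lines are made up for illustration and are not results from this work.

```python
PIPELINE_DEPTH = 5  # assumed stages: IF, ID, EX, MEM, WB


def branch_slots(resolve_stage: int) -> int:
    """Instructions fetched after a branch before its outcome is known.

    resolve_stage is the 0-based index of the stage that resolves the
    branch (IF = 0, ID = 1, EX = 2, ...). One instruction enters the
    pipeline per cycle, so the count equals the stage index.
    """
    return resolve_stage


def total_cycles(n_instr: int, n_branches: int, resolve_stage: int) -> int:
    """Cycles for n_instr instructions when each branch slot holds a NULL operation."""
    fill = PIPELINE_DEPTH - 1   # cycles needed to fill the pipeline once
    return n_instr + fill + n_branches * branch_slots(resolve_stage)


# Hypothetical workload: 1000 instructions, 200 of them branches.
print(total_cycles(1000, 200, resolve_stage=1))  # resolved in ID: 1204
print(total_cycles(1000, 200, resolve_stage=2))  # resolved in EX: 1404
```

Under these assumptions, moving branch resolution from the execute stage to the decode stage halves the number of wasted slots per branch.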
    This paper also presents two principles for stalling the pipeline, and the freeze signals for the individual pipeline stages are generated accordingly. Simulation waveforms show that these signals stall and resume the pipeline correctly.
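The two stall principles are not spelled out in this abstract, so they cannot be reproduced here. As an illustration only, the sketch below uses one common convention: a stage is frozen when it or any later stage requests a stall, while the stages after the requester keep draining. The stage names and the `freeze_signals` function are hypothetical.

```python
from typing import Dict

STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # assumed five-stage pipeline


def freeze_signals(stall_requests: Dict[str, bool]) -> Dict[str, bool]:
    """Derive one freeze flag per stage from the per-stage stall requests.

    Assumed convention (not taken from the source): a stage freezes if it
    or any stage further down the pipeline requests a stall. All flags are
    computed from the requests in one place, so no flag depends on
    another stage's flag.
    """
    freeze: Dict[str, bool] = {}
    downstream_stalled = False
    for stage in reversed(STAGES):           # walk from WB back to IF
        downstream_stalled = downstream_stalled or stall_requests.get(stage, False)
        freeze[stage] = downstream_stalled
    return freeze


# A stall requested by EX freezes IF, ID and EX; MEM and WB keep running.
print(freeze_signals({"EX": True}))
```

When no stage requests a stall, every flag is false and the pipeline advances normally.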
    Different methods are used to generate the cache-inhibit flag depending on whether it is sent from the instruction MMU to the instruction cache or from the data MMU to the data cache. The address space corresponding to instruction memory is always marked cacheable, which is functionally correct. This reduces the number of instruction memory accesses and removes an asynchronous loop, improving the timing of the whole system.
    Low-power design techniques for microprocessors are studied, and a design methodology for a power management unit that supports dynamic and static power management is proposed.
    The WISHBONE SoC interface is studied, and the design methodology of a bus interface unit that is fully compatible with the WISHBONE bus protocol is provided.
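As an illustration of the handshake a WISHBONE bus interface unit has to perform, here is a behavioral sketch of a classic single read/write cycle: the master holds CYC and STB high until the slave answers with ACK. The signal names follow the WISHBONE specification. The `ToySlave` class, its one-wait-state timing, and the `single_cycle` helper are assumptions made for this example and say nothing about the design described here.

```python
from typing import Dict, Tuple


class ToySlave:
    """Toy memory slave that asserts ACK one clock after seeing CYC and STB."""

    def __init__(self) -> None:
        self.mem: Dict[int, int] = {}
        self.pending = False   # a request was seen on the previous clock

    def tick(self, cyc: bool, stb: bool, we: bool, adr: int, dat_w: int) -> Tuple[bool, int]:
        """Advance one clock and return (ACK, read data)."""
        if self.pending and cyc and stb:
            self.pending = False
            if we:
                self.mem[adr] = dat_w
                return True, 0
            return True, self.mem.get(adr, 0)
        self.pending = cyc and stb
        return False, 0


def single_cycle(slave: ToySlave, adr: int, we: bool, dat_w: int = 0,
                 timeout: int = 16) -> Tuple[int, int]:
    """Master side: keep CYC/STB asserted until ACK, then release the bus.

    Returns (read data, number of wait clocks before ACK).
    """
    for waited in range(timeout):
        ack, dat_r = slave.tick(cyc=True, stb=True, we=we, adr=adr, dat_w=dat_w)
        if ack:
            slave.tick(cyc=False, stb=False, we=False, adr=0, dat_w=0)  # idle clock
            return dat_r, waited
    raise TimeoutError("slave did not acknowledge")


slave = ToySlave()
single_cycle(slave, adr=0x10, we=True, dat_w=0xBEEF)   # write
print(single_cycle(slave, adr=0x10, we=False))          # read back: (48879, 1)
```

A real bus interface unit would also handle the ERR and RTY responses and the SEL byte-select lines, which this sketch leaves out.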
    This paper presents two approaches to configuration management in system-level simulation. With them, the user of the simulation environment can determine in the least time which models are needed for a particular simulation, which improves simulation efficiency.
    An FPGA hardware verification scheme is given at the end of this paper. The throughputs of software simulation and hardware emulation are compared, and the necessity of FPGA hardware prototyping is demonstrated.
    Overall, this RISC microprocessor achieves high performance at low hardware cost. It scales well, has an open SoC interface, and can be easily integrated into an electronic system.