Optimized Design of a GHz-Class 64-bit Integer Arithmetic Logic Unit
Abstract
Microelectronics technology is advancing rapidly: process feature sizes have shrunk below 130 nm, and the 65 nm process is already mature and in practical use. Building on these advances in integrated circuit technology, microprocessors are renewed generation after generation and their performance improves quickly. For a microprocessor to reach high speed, its arithmetic logic unit (ALU) must be fast enough.
     The 64-bit 1 GHz integer ALU designed and implemented in this thesis is one of the key execution units of stream processor X. The main body is implemented with a standard-cell-based semi-custom methodology, while the critical components on the critical path are designed full-custom. Without adding much design time or effort, this approach raises the performance of the design from 500 MHz to 1 GHz, and it eases the conflict between large design scale and limited design performance, which gives the work broad applicability and practical value. The main contributions of the thesis are:
     1. Optimized design and implementation of a 64-bit GHz-class integer ALU in a 130 nm process, using a mixed semi-custom and full-custom flow. In the typical case, the synthesized combinational logic of the semi-custom part has a delay below 550 ps. The full-custom 64-bit adder, built in static complementary CMOS, has a post-layout simulated critical-path delay of about 730 ps, and the full-custom 64-bit funnel shift network, built as a static transmission-gate array, has a post-layout simulated worst-case delay of 270 ps. All of these meet the design requirements.
     2. A study of high-speed logic optimization methodology. The high-speed logic optimization flow is described, and supplementary recommendations are given on determining the number of logic levels, choosing circuit structures, front-end and back-end design interaction (feeding parasitics back into tuning), and full-custom design. Problems that need attention during design are summarized and solutions are given. The methodology is put into practice in the optimized design of the 64-bit GHz integer ALU.
     3. An in-depth study of hierarchical full-custom design and verification. Full-custom modules are handled hierarchically in three aspects: design, optimization, and verification. Functional verification uses static formal verification to confirm that the circuit function agrees with the design requirements as captured in the RTL description, and post-layout timing verification uses static timing analysis to help identify the critical path of the full-custom design. The hierarchical full-custom flow was applied in an engineering project, where it improved verification efficiency and shortened the full-custom design cycle.
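The abstract names a 64-bit funnel shift network but does not describe its function. As a minimal behavioral sketch, assuming the usual definition of a funnel shifter (select a 64-bit window out of a 128-bit concatenation of two operands), the Python model below shows how logical shifts, arithmetic shifts, and rotates can all be expressed through one such network. The function names are illustrative and are not taken from the thesis; the model says nothing about the transmission-gate circuit itself.

```python
MASK64 = (1 << 64) - 1  # keeps Python's unbounded integers within 64 bits


def funnel_shift(hi, lo, amount):
    """Return the 64-bit window of the 128-bit value {hi, lo} that starts at bit `amount`."""
    if not 0 <= amount < 64:
        raise ValueError("amount must be in the range 0..63")
    wide = ((hi & MASK64) << 64) | (lo & MASK64)
    return (wide >> amount) & MASK64


def shift_right_logical(x, n):
    # Upper half is zero, so zeros are shifted in from the left.
    return funnel_shift(0, x, n)


def shift_right_arithmetic(x, n):
    # Upper half replicates the sign bit of x.
    sign_fill = MASK64 if (x >> 63) & 1 else 0
    return funnel_shift(sign_fill, x, n)


def rotate_right(x, n):
    # Both halves hold x, so bits leaving on the right re-enter on the left.
    return funnel_shift(x, x, n)


def shift_left_logical(x, n):
    # A left shift by n is the window at offset 64 - n of {x, 0}.
    # n == 0 would need offset 64, which is outside the network, so pass x through.
    if n == 0:
        return x & MASK64
    return funnel_shift(x, 0, 64 - n)
```

In this model the operation type only changes what is fed to the two 64-bit inputs; the window-selection step is shared by every shift and rotate.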
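The abstract gives the delay of the full-custom 64-bit adder but not its carry architecture. Purely as an illustration of why carry structure determines the number of logic levels, the sketch below models a Kogge-Stone style parallel-prefix adder in Python. The choice of Kogge-Stone is an assumption made for this example and is not claimed to be the structure used in the thesis; the function name is likewise hypothetical.

```python
WIDTH = 64
MASK64 = (1 << WIDTH) - 1


def prefix_add64(a, b, carry_in=0):
    """Add two 64-bit values with a Kogge-Stone style prefix network.

    Returns (sum, carry_out). Illustrative model only.
    """
    a &= MASK64
    b &= MASK64
    carry_in &= 1
    # Bitwise generate and propagate signals.
    generate = [(a >> i) & (b >> i) & 1 for i in range(WIDTH)]
    propagate = [((a >> i) ^ (b >> i)) & 1 for i in range(WIDTH)]
    # Fold the carry-in into the generate signal of bit 0.
    generate[0] |= propagate[0] & carry_in

    group_g, group_p = generate[:], propagate[:]
    distance = 1
    # The loop body runs log2(64) = 6 times: one pass per prefix level.
    while distance < WIDTH:
        next_g, next_p = group_g[:], group_p[:]
        for i in range(distance, WIDTH):
            next_g[i] = group_g[i] | (group_p[i] & group_g[i - distance])
            next_p[i] = group_p[i] & group_p[i - distance]
        group_g, group_p = next_g, next_p
        distance *= 2

    # group_g[i] is now the carry out of bit i; the carry into bit i is group_g[i - 1].
    carries = [carry_in] + group_g[:-1]
    total = 0
    for i in range(WIDTH):
        total |= (propagate[i] ^ carries[i]) << i
    return total, group_g[WIDTH - 1]
```

A ripple-carry adder would need a carry chain 64 bits long, whereas the prefix network above resolves all carries in six levels, at the cost of more wiring and more cells per level.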
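To put the reported delays in context, a 1 GHz clock has a period of 1000 ps. The short calculation below compares each delay quoted in the abstract with that period. It assumes, for illustration only, that each block is evaluated on its own within one clock cycle, and it ignores register setup time, clock skew, and wiring between blocks, none of which the abstract quantifies.

```python
CLOCK_HZ = 1_000_000_000            # 1 GHz target from the abstract
PERIOD_PS = 10**12 // CLOCK_HZ      # clock period in picoseconds

# Delays as reported in the abstract (the semi-custom figure is an upper bound).
REPORTED_DELAYS_PS = {
    "semi-custom combinational logic": 550,
    "full-custom 64-bit adder": 730,
    "full-custom 64-bit funnel shifter": 270,
}


def slack_table(period_ps, delays_ps):
    """Return the remaining time in the cycle for each block, in picoseconds."""
    return {name: period_ps - delay for name, delay in delays_ps.items()}


if __name__ == "__main__":
    for name, slack in slack_table(PERIOD_PS, REPORTED_DELAYS_PS).items():
        share = 100 * REPORTED_DELAYS_PS[name] / PERIOD_PS
        print(f"{name}: uses {share:.0f}% of the cycle, {slack} ps left")
```

Under these assumptions the adder is the tightest block, which is consistent with the abstract treating it as a critical-path component that warranted full-custom design.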