Full-Custom Design Optimization of the ALU Unit in the "YHFT" DSP
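Abstract
Arithmetic units are an important part of the datapath and the core of a digital signal processor. They strongly affect a chip's performance, area, and power consumption. The main purpose of this thesis is to explore how to optimize the design of arithmetic units.
     In the "YHFT" DSP chip, based on how the ALU unit executes fixed-point and floating-point instructions, the following optimization is proposed: a 41-bit adder in the first pipeline stage, a 56-bit shifter in the second stage, and a 56-bit shifter in the third stage are designed with a full-custom method.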
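     In the 41-bit adder design, various fast-adder algorithms were studied, and the fastest of them, the K-S (Kogge-Stone) algorithm, was adopted for full-custom logic design and layout design. SPICE simulation of the netlist with parasitics extracted from the layout shows that, under typical conditions, the critical-path delay is 0.955 ns and the area is 12280 μm².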
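The carry computation of a Kogge-Stone adder can be illustrated with a small bit-level model. The following is a minimal Python sketch, not the thesis's circuit: the function name `kogge_stone_add`, the carry-in handling, and the list-based representation are assumptions made for illustration. Only the 41-bit width comes from the abstract.

```python
WIDTH = 41  # adder width stated in the abstract


def kogge_stone_add(a, b, cin=0, width=WIDTH):
    """Return (sum, carry_out) using a Kogge-Stone parallel-prefix carry tree.

    Illustrative behavioural model only; names and structure are hypothetical.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Per-bit generate (g) and propagate (p) signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    half_sum = p[:]  # the original propagate bits are needed for the final XOR

    # Fold the carry-in into bit 0 so the prefix tree sees it as a generate.
    g[0] |= p[0] & cin

    # Prefix levels: at distance d, every bit i >= d combines with bit i - d.
    dist = 1
    while dist < width:
        new_g = g[:]
        new_p = p[:]
        for i in range(dist, width):
            new_g[i] = g[i] | (p[i] & g[i - dist])
            new_p[i] = p[i] & p[i - dist]
        g, p = new_g, new_p
        dist *= 2

    # After the tree, g[i] is the carry out of bit i.
    carries = [cin] + g[:-1]
    total = 0
    for i in range(width):
        total |= (half_sum[i] ^ carries[i]) << i
    return total, g[width - 1]
```

For example, `kogge_stone_add(5, 7)` returns `(12, 0)`. For a 41-bit operand the loop runs six prefix levels (distances 1, 2, 4, 8, 16, 32), which is the logarithmic depth that makes this structure fast at the cost of wiring and area.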
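     In the 56-bit barrel shifter design, after comparing various decoded-shift schemes, a two-level 3-to-8 hybrid decoding structure was adopted for its relatively high speed and stable signal transmission. Single pass transistors form the shift array, and this passive array greatly reduces power consumption. SPICE simulation of the netlist with parasitics extracted from the layout shows that, under typical conditions, the critical-path delay (from decoding to data output) is 0.734 ns and the area is 8152 μm².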
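One way to read "two-level 3-to-8 decoding" is that a 6-bit shift amount is split into two 3-bit fields, each decoded into eight one-hot select lines that drive a pass-transistor array. The Python sketch below models that reading. The split into a coarse stage (multiples of 8) and a fine stage (0 to 7), the left-shift direction, the zero fill, and all function names are assumptions for illustration; the abstract does not give these details.

```python
WIDTH = 56  # shifter width stated in the abstract


def decode_3to8(code):
    """3-to-8 decoder: return eight one-hot select lines for a 3-bit code."""
    return [1 if code == k else 0 for k in range(8)]


def pass_array(bits, select, step):
    """One shift stage modelled as a pass-transistor array.

    `select` is one-hot; the active line k connects input bit i - k*step
    to output bit i. Outputs with no source bit stay 0 (assumed zero fill).
    """
    out = [0] * len(bits)
    for k, on in enumerate(select):
        if not on:
            continue
        for i in range(len(bits)):
            src = i - k * step
            if 0 <= src < len(bits):
                out[i] = bits[src]
    return out


def barrel_shift_left(value, amount):
    """Logical left shift of a 56-bit value by a 6-bit amount (0..63)."""
    bits = [(value >> i) & 1 for i in range(WIDTH)]
    coarse = decode_3to8((amount >> 3) & 7)  # selects a shift of 0, 8, ..., 56
    fine = decode_3to8(amount & 7)           # selects a shift of 0, 1, ..., 7
    bits = pass_array(bits, coarse, 8)
    bits = pass_array(bits, fine, 1)
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `barrel_shift_left(1, 10)` returns `1024`: the coarse stage shifts by 8 and the fine stage by 2. Because exactly one select line per stage is active, each data bit passes through only two pass devices regardless of the shift amount.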
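     This thesis also proposes a complete post-tape-out test scheme for the two full-custom modules: using scan testing with the help of an FPGA-PCB board, function and performance are tested at minimal hardware cost. In addition, views of the full-custom modules were built, and the modules were embedded in the ALU unit for synthesis and physical design. With the full-custom modules, timing in each of the three pipeline stages improved by about 0.3 ns, which meets the optimization goal well. The design process shows that combining full-custom and semi-custom design gives clear improvements in timing, area, and power consumption over a purely semi-custom design.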
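The idea behind scan testing, as mentioned above, is that operands are shifted serially into a chain of flip-flops, the module output is captured in parallel, and the result is shifted out serially, so only a few pins are needed. The sketch below is a generic Python model of that sequence, not the thesis's test scheme: the `ScanChain` class, the `scan_test` function, the chain lengths, and the stand-in device under test are all hypothetical.

```python
WIDTH = 41  # operand width of the adder in the abstract


class ScanChain:
    """Serial shift register standing in for a chain of scan flip-flops."""

    def __init__(self, length):
        self.cells = [0] * length

    def shift(self, bit_in):
        """One scan clock: push bit_in into cell 0, return the bit leaving the last cell."""
        bit_out = self.cells[-1]
        self.cells = [bit_in] + self.cells[:-1]
        return bit_out

    def load(self, value):
        """Shift a value in MSB first, so that bit i ends up in cell i."""
        for i in reversed(range(len(self.cells))):
            self.shift((value >> i) & 1)

    def capture(self, value):
        """Parallel capture of the module outputs into the chain."""
        self.cells = [(value >> i) & 1 for i in range(len(self.cells))]

    def parallel_value(self):
        """Value seen on the parallel outputs of the chain."""
        return sum(bit << i for i, bit in enumerate(self.cells))

    def unload(self):
        """Shift the whole chain out, MSB first, and rebuild the integer."""
        value = 0
        for _ in range(len(self.cells)):
            value = (value << 1) | self.shift(0)
        return value


def scan_test(dut, a, b, width=WIDTH):
    """Apply one (a, b) pattern to a two-operand module through scan chains.

    `dut` is any callable taking two integers; it stands in for the module.
    """
    mask = (1 << width) - 1
    stimulus = ScanChain(2 * width)   # holds both operands
    response = ScanChain(width + 1)   # holds sum plus carry-out

    stimulus.load(((a & mask) << width) | (b & mask))  # serial shift-in
    word = stimulus.parallel_value()
    response.capture(dut(word >> width, word & mask))  # parallel capture
    return response.unload()                           # serial shift-out
```

With a plain addition as the stand-in module, `scan_test(lambda x, y: x + y, 5, 7)` returns `12`. In a hardware setup the shift and capture clocks would come from the tester, here assumed to be the FPGA board.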
As an important part of the datapath, the operation unit is the core of a DSP and has a great influence on a chip's performance, area, and power consumption. The aim of this dissertation is to discuss how to optimize operation units.
     Based on how the L unit of the "YHFT" DSP executes fixed-point and floating-point operations, a full-custom design method is introduced to optimize the ALU units. The optimized parts are the first stage's 41-bit adder, the second stage's 56-bit shifter, and the third stage's 56-bit shifter.
     In the adder design, after studying many fast adders, we adopt the K-S tree, the fastest of the adder algorithms considered. After logic design and layout design, the RC-extracted netlist is simulated with HSPICE. Under typical conditions, the longest delay is 0.955 ns and the area is 12280 μm².
     In the barrel shifter design, after comparing several decode-and-shift structures, we adopt a two-stage hybrid decoding method, which is fast and stable. An NMOS-only shifter array reduces power consumption significantly. The RC-extracted netlist is simulated with HSPICE. Under typical conditions, the longest delay (from decode to output) is 0.734 ns and the area is 8152 μm².
     A complete post-fabrication test process is also discussed in the dissertation. With the help of an FPGA, the chip's function and speed can be tested easily using the scan test method. Furthermore, views of the full-custom modules are created, and the modules are embedded in the full semi-custom design flow, from synthesis to detailed routing. As a result, each of the three stages gains about 0.3 ns in timing. The combination of full-custom and semi-custom design clearly outperforms a pure semi-custom design in performance, area, and power consumption.