Research on Key Circuits and EDA Techniques for High-Performance DSPs
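Abstract
A digital signal processor (DSP) is an embedded processor dedicated to digital signal processing, with strong computational capability. DSPs are widely used in wireless communication, multimedia, portable digital terminals, medical equipment, computer networks, radar, and precision-guided weapons. Improving circuit design is an effective way to raise DSP performance. This includes adopting both advanced circuit techniques and advanced circuit design methods, and EDA technology is a key element of design methodology. Taking the design of the 600MHz YHFT-DSP/MHM datapath as its background, this thesis studies circuit design techniques for high-performance DSPs from two angles, key circuits and EDA techniques, and obtains the following results:
     1. To reduce the power of dynamic circuits and to avoid or reduce synchronization overhead, a limited dynamic circuit design method is proposed. The basic idea of the method is explained and, using the design of a 32-bit adder loop, its key techniques are presented systematically: selection and design of dynamic circuits, clock design, delayed precharge, dual-mode circuits, and noise-tolerant design methods. Experimental results show that limited dynamic circuits are slightly faster than fully dynamic circuits, while power is reduced by 52.78%.
     2. For the design of a register file with 13 read ports and 9 write ports, a port multiplexing technique is proposed that reduces both the number of register cell ports and the number of decoders by seven, and the circuits related to port multiplexing are designed. A channel-enhanced Dual-Vt bitline technique is also proposed: increasing the gate length of the high-threshold devices in the Dual-Vt bitline structure yields higher speed and better noise characteristics. Experimental results show that, in a 90nm process, the channel-enhanced Dual-Vt bitline structure is better than the pseudostatic bitline structure on all main metrics. Compared with the LBSF bitline structure, power is reduced by 28.5% and leakage current by 99.78%, while area increases by 9.5%.
     3. An algorithm for a 16-bit hybrid multiplier is proposed, based on radix-4 Booth multiplication. Compared with similar work, the number of partial products is reduced by six, and area, delay, and power each improve by more than 20%. The full-custom design optimization of the multiplier and the design of a test chip were completed in a 180nm process, and a general, flexible, low-cost module-level circuit test scheme is proposed and implemented. Test results show that the chip operates above 475.2MHz in SIMD mode and between 404.8MHz and 475.2MHz in normal mode.
     4. Key algorithms for functional model extraction from full-custom circuits are studied, and a functional model extraction tool, TranSpirit, is implemented. Experimental results show that TranSpirit is highly efficient and meets the functional verification requirements of module-level full-custom design.
     5. The basic idea and flow of the transistor-level hybrid timing analysis method are described, maximum-delay and minimum-delay test waveform generation algorithms that account for the multiple-input switching (MIS) effect are proposed, and a transistor-level hybrid timing analysis tool, SpiceTime, is implemented. Compared with Hspice, SpiceTime is more efficient, with a maximum-delay error of no more than 2.89% and a minimum-delay error of no more than 7%.
     6. Timing verification methods for limited dynamic circuits are studied. Based on the four-event periodic model, the timing constraints that HI-CMOS, LO-CMOS, and NTP dynamic gates and N-C²MOS latches must satisfy to operate correctly are studied and summarized. The hybrid timing analysis method is applied to delay calculation of dynamic circuits for the first time, and an algorithm for generating delay test waveforms for dynamic gates is proposed. The timing verification method for limited dynamic circuits has been implemented in SpiceTime and applied to the design verification of the 32-bit adder loop. The method improved design efficiency and helped uncover problems in the design. If the effect of false paths is ignored, the maximum errors of the evaluation-direction and precharge-direction delays are 3.62% and 8.26%, respectively.
     This research provides a feasible design scheme for the YHFT-DSP/MHM datapath and lays a solid foundation for further work on improving DSP circuit design techniques.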
Digital signal processors (DSPs), a class of embedded microprocessors optimized for digital signal processing, have become a key component in many multimedia appliances, communication devices, medical instruments, and industrial products. Improving circuit design, by employing advanced circuits and new circuit design methodologies, is vital to enhancing DSP performance, and EDA techniques play an important role in design methodology. This thesis studies circuit design techniques for high-performance DSPs, with an emphasis on key circuits and EDA techniques. The main contributions are as follows:
     1. To reduce the power of dynamic circuits, a limited dynamic circuit design methodology is proposed. Using the design of a 32-bit adder loop as an example, the key techniques of this methodology are introduced: selection and design of dynamic circuits, clock design, delayed precharge, dual-mode circuits, and noise control methods. Simulation results show that dynamic power is reduced by 52.78% compared with fully dynamic circuits.
     2. A port multiplexing technique is proposed for the design of a register file with 13 read ports and 9 write ports. The number of decoders and the number of register cell ports are each reduced by seven. A channel-enhanced Dual-Vt bitline is proposed to improve the noise robustness and reduce the leakage power of the dynamic read bitlines. In a 90nm CMOS process, the proposed bitline has a clear advantage over the pseudostatic bitline. Compared with the LBSF bitline, dynamic power is reduced by 28.5% and leakage power is reduced by 2-3 orders of magnitude, with an area overhead of 9.5%.
     3. An algorithm for a 16-bit hybrid multiplier is proposed, based on the radix-4 modified Booth algorithm. The algorithm generates ten partial products and one modifier, which is six fewer than in comparable algorithms. Delay, power, and area are all reduced by more than 20%. The multiplier was implemented as a full-custom design in a 180nm CMOS process, and a test chip was fabricated. Test results show that the multiplier works correctly at 404.8MHz in normal mode and at 475.2MHz in SIMD mode.
     4. Key algorithms for functional model extraction from transistor-level circuits are proposed, and a transistor-level functional model extractor named TranSpirit has been implemented in C/C++. Experimental results show that TranSpirit is fast and meets the functional verification requirements of unit-level full-custom designs.
     5. Key techniques of transistor-level hybrid timing analysis are studied. Test waveform generation algorithms for maximum and minimum delay that account for multiple-input simultaneous switching are proposed, and a transistor-level hybrid timing analysis tool named SpiceTime has been implemented in C/C++. Compared with Hspice, the maximum-delay and minimum-delay errors of SpiceTime are less than 2.89% and 7%, respectively, and experimental results show that SpiceTime is more efficient than Hspice. The effect of multiple-input simultaneous switching on path delay is also studied.
     6. Timing verification of limited dynamic circuits is studied. Based on the four-event periodic waveform model, timing constraints are derived for HI-CMOS, LO-CMOS, and NTP dynamic circuits and for N-C²MOS latches. The hybrid timing analysis method is applied to delay calculation of dynamic circuits for the first time, and delay test waveform generation algorithms for dynamic circuits are proposed. The algorithms have been implemented in SpiceTime and applied to the design of a 32-bit adder loop based on limited dynamic circuits. This improved design efficiency and helped uncover several design problems. When false paths are ignored, the errors in evaluation delay and precharge delay are within 3.62% and 8.26%, respectively.
     The research in this dissertation provides a practical solution for implementing the datapath of YHFT-DSP/MHM and lays a solid foundation for further work on high-performance DSP design.
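
For readers unfamiliar with the radix-4 modified Booth recoding on which the hybrid multiplier in contribution 3 is based, the following is a minimal sketch of the generic textbook scheme for signed two's-complement operands. It is not the thesis's hybrid algorithm (which produces ten partial products plus a modifier and supports a SIMD mode); the function names `booth_radix4_digits` and `booth_multiply` are hypothetical and chosen only for illustration.

```python
def booth_radix4_digits(multiplier: int, bits: int = 16) -> list[int]:
    """Recode a signed multiplier into radix-4 Booth digits in {-2, -1, 0, 1, 2}.

    Generic textbook recoding (an assumption for illustration, not the
    thesis's hybrid scheme). Digit i is -2*y[2i+1] + y[2i] + y[2i-1],
    with y[-1] defined as 0.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even for radix-4 recoding")
    # Two's-complement bit pattern, shifted left to append the implicit y[-1] = 0.
    y = (multiplier & ((1 << bits) - 1)) << 1
    digits = []
    for i in range(bits // 2):
        triplet = (y >> (2 * i)) & 0b111  # bits y[2i+1], y[2i], y[2i-1]
        hi, mid, lo = (triplet >> 2) & 1, (triplet >> 1) & 1, triplet & 1
        digits.append(-2 * hi + mid + lo)
    return digits


def booth_multiply(a: int, b: int, bits: int = 16) -> int:
    """Multiply two signed integers by summing the Booth partial products."""
    # Each digit selects 0, +/-a, or +/-2a, weighted by 4**i.
    partial_products = [d * a * 4**i for i, d in enumerate(booth_radix4_digits(b, bits))]
    return sum(partial_products)
```

With this plain recoding a signed 16-bit multiplier yields eight digits, and therefore eight partial products, each of which is 0, ±a, or ±2a and so needs only a shift and an optional negation in hardware.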
