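Research on Key EDA Technologies for FPGA Design and Application (面向FPGA设计及应用的EDA关键技术研究)

English title: EDA Key Technology for the Design and Application of FPGA
Author: Chen Xun (陈迅)
Thesis level: Doctoral
Discipline: Electronic Science and Technology
Chinese keywords: 现场可编程门阵列 ; 版图设计自动化 ; 可制造性设计 ; 规则化版图 ; 并行布线算法
English keywords: FPGA ; layout design automation ; design for manufacturability ; regular layout ; parallel routing algorithm
Degree year: 2011
Advisor: Zhou Xingming (周兴铭)
Discipline code: 0809
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Submission date: 2011-09-01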

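Abstract

Over the past half century, field-programmable gate arrays (FPGAs) have gradually become a mainstream approach to implementing digital circuits. Unlike application-specific integrated circuit (ASIC) design, FPGA design avoids non-recurring engineering (NRE) costs and shortens time to market, but the scale, speed, and power of the implemented circuit are limited by the FPGA chip itself. The gap with ASIC design is therefore usually narrowed by adopting advanced process technology and increasing device capacity. While these measures improve performance, they also pose new challenges for the design, manufacturing, and application of FPGA chips. How to implement FPGA chips efficiently, improve their manufacturability, and improve the user experience of design tools has thus become a research focus in both industry and academia.
     This thesis studies three aspects of FPGA chips: layout design, design for manufacturability, and parallel routing. By improving key EDA techniques, it improves layout design speed, manufacturability, and the user experience of design tools. The improvements are evaluated through experiments on test circuits and through quantitative modeling. The contributions are as follows:
     1. For FPGA chip implementation, this thesis improves the existing automatic FPGA layout generation flow, shortening design time and reducing design cost. In the classic flow, the basic cells used to build the FPGA layout are designed by hand. This thesis automates layout generation for these basic cells and improves the chaining algorithm used in automatic layout generation for small transistor groups, proposing a chaining algorithm based on sub-network permutation that optimizes cell-level chaining results.
     2. For FPGA design for manufacturability, this thesis constrains the FPGA layout style to improve layout printability, and thereby controls process variation and failure rate during manufacturing. Quantitative analysis shows that, at a 9% area overhead, the regular layout style adopted here achieves a 33% improvement in process variation and a 21.2% improvement in failure rate; if a further 9% area overhead is allowed, the regular layout achieves a 93.8% reduction in process variation and a 16.2% improvement in failure rate.
     3. For FPGA routing algorithm optimization, this thesis speeds up routing through parallelization and proposes a parallel routing method based on geometric partitioning. The main idea is to partition the FPGA routing region: nets in different partitions have no data dependencies and are assigned to different processor cores for routing. Existing parallel routing algorithms sacrifice routing quality for speedup, whereas the proposed algorithm achieves high speedup without loss of routing quality.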

English abstract

During the past 50 years, Field-Programmable Gate Arrays (FPGAs) have become one of the most popular implementation media for digital circuits. Compared with Application-Specific Integrated Circuits (ASICs), FPGAs allow designers to achieve lower Non-Recurring Engineering (NRE) costs and shorter time to market for their designs. However, the scale, speed, and power of designs implemented in an FPGA are limited by the device itself, so FPGAs compete with ASICs by adopting newly developed process technology and increasing device size. These methods enhance performance, but they also increase the design, manufacturing, and application challenges for FPGAs. How to implement FPGAs efficiently, improve their manufacturability, and improve the design-tool experience has therefore become a hot topic in industry and academia.
     In this thesis, our research focuses on FPGA layout design, manufacturability, and design tools. By improving Electronic Design Automation (EDA) algorithms, we optimized the FPGA implementation method and design tools. To verify our improvements, we used both test-circuit analysis and modeling. This thesis makes three contributions:
     1. For FPGA chip implementation, we reduced the time needed for FPGA layout design. The classic FPGA layout design automation flow is based on manually designed building cells; we automated their layout design, with the key improvement being the chaining algorithm used for layout automation of small transistor groups.
     2. For FPGA chip manufacturability, we improved the printability of the FPGA chip by restricting the layout style, which also improves process variation and Probability of Failure (PoF). Quantitative analysis shows that our new layout style achieves a 33% variation improvement and a 21.2% PoF improvement with only a 9% area penalty, which could potentially be recovered by process window optimization thanks to the style's superior printability.
     3. For FPGA design tool optimization, we improved the designer's experience by shortening the compile time of FPGA designs, which in turn reduces application cost. After analyzing the compile tool chain, we found that compile time could be shortened by parallelizing the routing algorithm. We proposed a parallel routing algorithm based on geometric partitioning. The algorithm partitions the routing region into several parts, and nets that belong to different parts can be routed concurrently because there is no data dependency between them. As a result, our parallelization method achieves good speedup without compromising the quality of the routing result.
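
The geometric-partitioning idea in the third contribution can be illustrated with a minimal Python sketch. The abstract does not describe the actual partitioning scheme, so everything here is an assumption made for illustration: a single vertical cut line, classification of nets by bounding box, serial handling of nets that straddle the cut, and a stand-in `route_region` that only reports half-perimeter wirelength instead of performing real routing. The names `partition_nets`, `route_region`, and `route_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Nets = Dict[str, List[Point]]


def bounding_box(terminals: List[Point]) -> Tuple[int, int, int, int]:
    """Return (xmin, ymin, xmax, ymax) of a net's terminals."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return min(xs), min(ys), max(xs), max(ys)


def partition_nets(nets: Nets, cut_x: int) -> Tuple[Nets, Nets, Nets]:
    """Split nets by a vertical cut line at x = cut_x (assumed scheme).

    A net whose bounding box lies entirely at x < cut_x goes left, one that
    lies entirely at x >= cut_x goes right, and a net that straddles the cut
    is returned separately.
    """
    left: Nets = {}
    right: Nets = {}
    crossing: Nets = {}
    for name, terminals in nets.items():
        xmin, _, xmax, _ = bounding_box(terminals)
        if xmax < cut_x:
            left[name] = terminals
        elif xmin >= cut_x:
            right[name] = terminals
        else:
            crossing[name] = terminals
    return left, right, crossing


def route_region(region_nets: Nets) -> Dict[str, int]:
    """Stand-in for a router: half-perimeter wirelength of each net.

    A real router confined to each net's bounding box would touch only the
    routing resources inside its own partition.
    """
    result = {}
    for name, terminals in region_nets.items():
        xmin, ymin, xmax, ymax = bounding_box(terminals)
        result[name] = (xmax - xmin) + (ymax - ymin)
    return result


def route_all(nets: Nets, cut_x: int) -> Dict[str, int]:
    """Route crossing nets serially, then the two partitions concurrently."""
    left, right, crossing = partition_nets(nets, cut_x)
    routed = route_region(crossing)
    with ThreadPoolExecutor(max_workers=2) as pool:
        for partial in pool.map(route_region, [left, right]):
            routed.update(partial)
    return routed
```

Calling `route_all({"a": [(0, 0), (2, 3)], "b": [(6, 1), (9, 4)], "c": [(3, 2), (7, 2)]}, cut_x=5)` places net `a` in the left partition, net `b` in the right partition, and net `c` in the crossing set. The thread pool only shows the structure of the approach; in CPython a real speedup would require processes or native code, and a production router would partition recursively to match the number of available cores.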

