Research on the Key Technology of Multi-core Chip Design Based on High Density Computing

Original title (Chinese): 基于高密度计算的多核芯片设计关键技术研究
Author: Li Dongsheng (李东生)
Thesis level: Doctoral
Discipline: Precision Instruments and Machinery
Keywords: High Density Computing ; Multi-core Computing ; Multi-core Chip ; 3D Gray Code ; 3D Network on Chip ; CMP ; NOC ; 3D SC
Degree year: 2012
Advisor: Gao Minglun (高明伦)
Discipline code: 080401
Degree-granting institution: Hefei University of Technology (合肥工业大学)
Thesis submission date: 2011-11-01
Chair of the defense committee: Chen Junning (陈军宁)

Abstract

Highly integrated structures and unified functionality pose challenges for electronic system design, and constraints on volume, weight, and power consumption make miniaturization more challenging still. CMP (Chip Multiprocessor), SiP (System in Package), MEMS (Micro-Electro-Mechanical Systems), NEMS (Nano-Electro-Mechanical Systems), and 3D (three-dimensional) integration have become important research directions in integrated circuit design.
     Driven by the pursuit of convenience and low carbon emissions, even High Performance Computing (HPC), which used to disregard cost, has begun to pay attention to metrics such as computing speed per unit of space, per unit of power, and even per unit of investment. This has made High Density Computing (HDC) another important research direction after HPC.
     Judging from current developments, CMP has become the core technology driving further progress in HPC and HDC. Based on the application requirements of HDC, this thesis studies CMP design, evaluation, and computation optimization. The main contents are as follows:
     (1) The concepts of parallel computing, high performance computing, and multi-core computing are discussed first. With the multi-core system as the supporting platform, the HDC problem is posed, its main features and primary research questions are given, and the problem of evaluating multi-core systems is raised. Matrix tasks, the most representative class of numerical computation, are taken as the object of study: basic issues in matrix operations are discussed, especially block (partitioned) matrix operations and complexity analysis, and multi-core computation methods for parallel matrix tasks are analyzed, providing a research approach for combining multi-core systems with matrix tasks.
     (2) Based on an analysis of the structure of single-chip multi-core processors and the current state of research, the thesis presents the development trend and typical structures of CMP, namely large numbers of small cores, arrays, and hierarchical clusters. It describes the main types of CMP architecture and the basic design principles, and gives a theoretical analysis of parallel computation models, structure extension, and evaluation methods. It proposes the concept of the 3D Neighbor Code and two coding methods for the extended 3D Gray code, the coordinate expansion method and the layer expansion method, and analyzes the characteristics and patterns of the 3D Neighbor Code. It also proposes a 3D NOC architecture with a genuinely three-dimensional cluster structure, the 3D Star Cluster (3D SC) structure. The NOC architecture, node coding, and model description method of 3D SC are of theoretical significance and practical value for the quantitative study and analysis of 3D NOC.
     (3) A single-core RISC processor compatible with the MIPS instruction set is designed, and on this basis a four-core processor FPGA prototype built on a hierarchical AHB bus is implemented. The thesis explores the extensibility of the processor structure and design methods for improving overall processor performance, and uses multi-dimensional matrix multiplication to verify system performance and computing capability. The experimental results show that the speedup of the multi-core processor rises as the matrix order increases, which corroborates the multi-core computing principle that “growth in the number of processors must be balanced with growth in problem size.”
     (4) As the number of cores increases, task scheduling and mapping for multi-core systems become increasingly important. Based on an analysis of parallel scheduling and task mapping techniques for multi-core systems, the thesis gives a hierarchical task decomposition method, focuses on block computation of matrix tasks with equal time complexity, and proposes a dynamic task scheduling technique based on an embedded monitoring subnet. The monitoring subnet provides data-flow information for intelligent monitoring and processing of the data flow, which makes it possible to improve the visualization and automation of the control process.
     (5) In 3D NOC design, routing design and low-power design are two key techniques as the number of nodes increases. Based on an analysis of the basic issues of NOC and 3D interconnect integration, the thesis outlines a basic approach to low-power research, focuses on 3D NOC routing and a low-power model for 3D NOC, and gives methods for 3D NOC routing analysis and low-power analysis.
     (6) Performing complex signal processing on a multi-core platform is an engineering problem that touches on many areas. In view of the characteristics of parallel computing and multi-core platforms, the thesis analyzes the complexity of synthetic aperture radar (SAR) signal processing, task decomposition methods, and algorithm flow. It mainly analyzes the SAR echo signal model and signal simulation algorithms, analyzes the computation and storage required for numerical echo simulation, and gives several parallel computing and task decomposition methods for SAR echo simulation, together with a task and flow analysis for computation on a cluster-structured multi-core platform.
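     As an illustration of the equal-time-complexity block computation of matrix tasks mentioned in items (1) and (4), the following minimal Python sketch splits an n x n product into output tiles of identical cost and maps them to cores. It is not the thesis's scheduler: the function names, the round-robin mapping, and the assumption that the block size divides n are all hypothetical choices made for this example, and the "cores" run one after another rather than in parallel.

```python
from itertools import product


def naive_matmul(a, b):
    """Reference product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def output_tiles(n, block):
    """Top-left offsets of the block x block tiles of the n x n result."""
    if n % block != 0:
        raise ValueError("this sketch assumes block divides n")
    return list(product(range(0, n, block), repeat=2))


def assign_round_robin(tiles, cores):
    """Hypothetical static mapping: tile number t goes to core t mod cores."""
    plan = [[] for _ in range(cores)]
    for t, tile in enumerate(tiles):
        plan[t % cores].append(tile)
    return plan


def compute_tile(a, b, c, i0, j0, block):
    """Fill one output tile; every tile costs n * block**2 multiply-adds."""
    n = len(a)
    for i in range(i0, i0 + block):
        for j in range(j0, j0 + block):
            c[i][j] = sum(a[i][k] * b[k][j] for k in range(n))


def blocked_matmul(a, b, block, cores):
    """Process each core's tile list in turn (sequentially, for illustration)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for tile_list in assign_round_robin(output_tiles(n, block), cores):
        for i0, j0 in tile_list:
            compute_tile(a, b, c, i0, j0, block)
    return c
```

     Because every tile has the same cost, an even split of tiles gives every core the same amount of work. With a fixed block size, a larger matrix order also produces more tiles to spread over a fixed number of cores.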
     The main innovations of this thesis and their significance are:
     (1) Research field innovation. The concept of high density computing is proposed, and high density computing together with multi-core computing is treated as an important research field. Taking this as the starting point, the thesis studies several important problems: multi-core computing, multi-core system evaluation, and multi-core system design. The significance is that research on multi-core system design and high-speed computing becomes more targeted.
     (2) Theoretical innovation. A genuinely three-dimensional cluster NOC structure, the 3D Star Cluster (3D SC) structure, is proposed. For this structure, the concept of the 3D Neighbor Code and two extended 3D Gray code methods are proposed for the first time. The significance is that the geometric interconnection relationships among the large number of 3D NOC nodes can be converted into array relationships, which facilitates numerical abstraction of the geometric model.
     (3) Method innovation. A dynamic task scheduling method based on an embedded monitoring subnet is proposed. This approach is expected to serve as a hardware foundation for intelligent scheduling and automated control in multi-core computing. The significance is that it facilitates building a closed-loop system for dynamic data-flow acquisition and control, making it possible to apply modern control theory to multi-core data-flow analysis and control.
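     To make the idea of turning geometric adjacency into a code relationship concrete, here is a minimal Python sketch. It does not reproduce the thesis's coordinate expansion or layer expansion methods, which this record does not specify. It only assumes a plain 3D mesh and labels each node by concatenating the standard binary-reflected Gray code of its x, y, and z coordinates. All function names are hypothetical.

```python
def gray(n):
    """Standard binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def node_code(x, y, z, bits):
    """Concatenate per-axis Gray codes (z | y | x) into one integer label."""
    return (gray(z) << (2 * bits)) | (gray(y) << bits) | gray(x)


def hamming(a, b):
    """Number of bit positions in which two labels differ."""
    return bin(a ^ b).count("1")


def mesh_neighbors(x, y, z, size):
    """Yield the coordinates of the up-to-six mesh neighbors of a node."""
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1))
    for dx, dy, dz in steps:
        nx, ny, nz = x + dx, y + dy, z + dz
        if 0 <= nx < size and 0 <= ny < size and 0 <= nz < size:
            yield nx, ny, nz


def neighbors_differ_by_one_bit(size, bits):
    """Check that every pair of mesh neighbors has labels one bit apart.

    Assumes size <= 2**bits so each coordinate fits in its bit field.
    """
    for x in range(size):
        for y in range(size):
            for z in range(size):
                here = node_code(x, y, z, bits)
                for nx, ny, nz in mesh_neighbors(x, y, z, size):
                    if hamming(here, node_code(nx, ny, nz, bits)) != 1:
                        return False
    return True
```

     With such a labeling, physical adjacency can be tested by comparing labels alone: two mesh neighbors always differ in exactly one bit. Plain binary coordinates lack this property, since 1 (01) and 2 (10) differ in two bits.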
