Research on Resources Management of Operating Systems for Dynamic Reconfigurable Platforms

Author: Zhang Junneng (张军能)
Degree level: Doctoral
Major: Computer Science and Technology
Keywords: Reconfigurable platforms; Operating systems; Task scheduling; Resource management; Programming model; Task parallelization
Degree year: 2014
Supervisor: Zhou Xuehai (周学海)
Discipline code: 0812
Degree-granting institution: University of Science and Technology of China
Submission date: 2014-04-25

Abstract

With the rapid development of integrated circuit technology, systems on chip (SoCs) that integrate multiple general-purpose processors and accelerators are increasingly widely used. Meanwhile, the maturing of reconfigurable technology makes it possible to build a complete reconfigurable platform on a single chip. However, the diverse and variable computing resources on heterogeneous reconfigurable platforms pose significant challenges for task scheduling and resource management in operating system design and implementation.
     This dissertation focuses on the key techniques of resource management in operating systems for dynamic reconfigurable platforms. The study mainly covers the following aspects:
     (1) Reconfigurable hardware platform and operating system architecture
     Reconfigurable platforms share several common characteristics: they integrate reconfigurable FPGA resources, accelerate program execution by placing IP cores on the FPGA, and support static or dynamic reconfiguration. Based on these common characteristics, this dissertation proposes design specifications for the hardware platform, including IP core design specifications and communication specifications between heterogeneous computing units. Building on this reconfigurable hardware platform, the dissertation constructs a hierarchical model of the operating system for reconfigurable platforms and establishes a unified task model, resource model, and driver model. On this basis, we design the task scheduling and resource management algorithms of the operating system using service-oriented concepts.
     (2) Task scheduling and resource management in operating systems for reconfigurable platforms
     Tasks on reconfigurable platforms include both hardware tasks and software tasks. In addition to general-purpose processors, peripherals, and memory, the resources of reconfigurable platforms also include reconfigurable logic resources. Based on these task and resource characteristics, we bring the management of hardware tasks and reconfigurable resources into the scope of the operating system, and add a hardware function library and a custom software function library to the system to assist task scheduling and resource management.
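The abstract does not describe the interfaces of the hardware and software function libraries. As a rough illustration only, the sketch below assumes that each library maps a function name to a callable, and that the operating system runs a task on an IP core when the matching core is already configured, falling back to the software implementation otherwise. The names `dispatch`, `HW_LIBRARY`, `SW_LIBRARY`, and `configured_cores` are hypothetical, and the hardware entry is a plain Python stand-in for a driver call into an IP core.

```python
from typing import Callable, Dict, Set, Tuple


def dispatch(func: str,
             configured_cores: Set[str],
             hw_library: Dict[str, Callable],
             sw_library: Dict[str, Callable],
             *args) -> Tuple[str, object]:
    """Run `func` on an IP core if one is configured, else in software.

    Hypothetical interface: both libraries map function names to callables.
    """
    # Prefer hardware when the matching IP core is already on the FPGA.
    if func in configured_cores and func in hw_library:
        return "hw", hw_library[func](*args)
    # Otherwise fall back to the software implementation, if one exists.
    if func in sw_library:
        return "sw", sw_library[func](*args)
    raise KeyError(f"no implementation of {func!r}")


def vadd(a, b):
    """Element-wise vector addition used by both stand-in libraries."""
    return [x + y for x, y in zip(a, b)]


# Stand-ins: in a real system the hardware entry would drive an IP core.
HW_LIBRARY = {"vadd": vadd}
SW_LIBRARY = {"vadd": vadd}

if __name__ == "__main__":
    print(dispatch("vadd", {"vadd"}, HW_LIBRARY, SW_LIBRARY, [1, 2], [3, 4]))  # ('hw', [4, 6])
    print(dispatch("vadd", set(), HW_LIBRARY, SW_LIBRARY, [1, 2], [3, 4]))     # ('sw', [4, 6])
```

Keeping the choice in one place lets the scheduler treat hardware and software implementations of the same function interchangeably, which is one way to read the role the two libraries play in the abstract.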
     For dynamic global reconfiguration, we design and implement variable-window-based task scheduling and resource management algorithms. The two algorithms are kept separate and communicate by sharing certain global information of the system.
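The abstract only states that the two algorithms are separate and communicate through shared global information; it does not define the window policy. The sketch below is a hypothetical model of that decoupled structure under global reconfiguration: the scheduler inspects a window of ready tasks and records the full-chip configuration it wants in a shared `GlobalState`, and the resource manager reads only that shared state and reloads the whole FPGA when the request differs from what is loaded. The bitstream table, the `window` parameter, and the coverage heuristic are assumptions for illustration; the dissertation's rule for varying the window size is not reproduced here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple

# Hypothetical full-chip bitstreams and the hardware functions each provides.
BITSTREAMS = {"cfgA": {"fft", "fir"}, "cfgB": {"aes", "sha"}}


@dataclass
class GlobalState:
    """Information shared by the scheduler and the resource manager."""
    loaded_config: Optional[str] = None     # bitstream currently on the FPGA
    requested_config: Optional[str] = None  # bitstream the scheduler wants next
    reconfig_count: int = 0


def schedule(ready: Deque[str], state: GlobalState, window: int) -> List[str]:
    """Scheduler: request the bitstream covering most tasks in the window."""
    view = list(ready)[:window]
    state.requested_config = max(
        BITSTREAMS, key=lambda cfg: sum(t in BITSTREAMS[cfg] for t in view))
    return view


def manage_resources(state: GlobalState) -> None:
    """Resource manager: sees only the shared state, reloads the whole chip."""
    if state.requested_config != state.loaded_config:
        state.loaded_config = state.requested_config
        state.reconfig_count += 1


def run(tasks: List[str], window: int) -> Tuple[List[Tuple[str, str]], int]:
    """Alternate scheduling and resource management until all tasks finish."""
    assert window >= 1
    ready, state, trace = deque(tasks), GlobalState(), []
    while ready:
        view = schedule(ready, state, window)
        manage_resources(state)
        for _ in view:
            task = ready.popleft()
            # Tasks without a core in the loaded bitstream run in software.
            target = "hw" if task in BITSTREAMS[state.loaded_config] else "sw"
            trace.append((task, target))
    return trace, state.reconfig_count
```

With the task sequence `["fft", "fir", "aes", "sha", "fft"]`, a window of 2 runs every task in hardware at the cost of three global reconfigurations, while a window of 5 needs one reconfiguration but sends `aes` and `sha` to software. This trade-off is one plausible reason to let the window size vary, though the abstract does not say how the dissertation chooses it.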
     For dynamic partial reconfiguration, we design separate-window-based task scheduling and resource management algorithms in which scheduling and resource management are tightly coupled. This approach reduces the processor time spent on task scheduling and guarantees that every reconfiguration request is valid and receives a response. Compared with the variable-window-based algorithms, the separate-window-based algorithms use reconfigurable resources more effectively without introducing additional overhead.
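The abstract gives no details of the separate-window algorithm beyond the tight coupling and the guarantee that each reconfiguration request is valid. One hypothetical way to obtain that property is sketched below: the FPGA is modeled as a fixed set of independent reconfigurable regions, and a single `admit` step both schedules a hardware task and allocates its region, so a reconfiguration is issued only when an idle region is known to be available. The `Region` and `PartialScheduler` names and the reuse-first policy are assumptions, not the dissertation's design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Region:
    """One independently reconfigurable region of the FPGA (hypothetical model)."""
    core: Optional[str] = None  # IP core currently loaded, if any
    busy: bool = False


@dataclass
class PartialScheduler:
    regions: List[Region] = field(default_factory=list)
    reconfigs: int = 0

    def admit(self, core: str) -> Optional[int]:
        """Schedule a hardware task and allocate its region in one step.

        Returns the region index, or None if the task must wait.
        """
        # 1) Reuse an idle region that already holds the core: no reconfiguration.
        for i, region in enumerate(self.regions):
            if not region.busy and region.core == core:
                region.busy = True
                return i
        # 2) Reconfigure an idle region. The request is issued only when such a
        #    region exists, so every issued request can be served at once.
        for i, region in enumerate(self.regions):
            if not region.busy:
                region.core, region.busy = core, True
                self.reconfigs += 1
                return i
        return None

    def release(self, index: int) -> None:
        """Mark a region idle; its core stays loaded for possible reuse."""
        self.regions[index].busy = False
```

Reusing an idle region that already holds the requested core avoids a reconfiguration entirely; when `admit` returns `None`, the task simply stays queued (or could run in software), so no request is ever issued that the hardware cannot serve.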
     (3) Task-level automatic parallelization for reconfigurable platforms
     The growing number of computing resources integrated on heterogeneous multicore reconfigurable platforms poses a serious challenge to exploiting them effectively in parallel and aggravates the "programming wall" problem. To address this, we first extend the task-level Tomasulo algorithm to a more general setting and then implement it in hardware, testing both the functional correctness and the scheduling performance of the hardware module. The hardware Tomasulo module acts as a hardware scheduler that works together with the software scheduler of the operating system: the software scheduler resolves task-level control hazards and sends the task sequence to the hardware scheduler, which detects task-level RAW hazards and eliminates task-level WAW and WAR hazards. As a result, tasks are parallelized automatically.
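The dependency handling described above can be modeled in software. The sketch below is not the hardware design from the dissertation; it only illustrates, under stated assumptions, how a task-level Tomasulo-style scheduler treats data hazards: each task declares the variables it reads and writes (a hypothetical `Task` record), every write is renamed to a fresh buffer so WAW and WAR hazards disappear, and only RAW dependencies on the latest producer remain. The wave numbers assume unit-length tasks and unlimited execution units, and reservation stations and tag broadcasting are abstracted away.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Task:
    """A task with its input and output variables (hypothetical record)."""
    name: str
    reads: Tuple[str, ...]
    writes: Tuple[str, ...]


def rename_and_schedule(seq: List[Task]):
    """Rename outputs, keep only RAW dependencies, and compute start waves."""
    latest: Dict[str, Tuple[str, int]] = {}  # variable -> (buffer, producer index)
    version: Dict[str, int] = {}
    deps: List[Set[int]] = []
    renamed: List[Tuple[str, List[str], List[str]]] = []
    for i, task in enumerate(seq):
        # RAW: depend on the most recent producer of each input, if any.
        deps.append({latest[v][1] for v in task.reads if v in latest})
        srcs = [latest[v][0] if v in latest else v for v in task.reads]
        # WAW/WAR: each write goes to a fresh buffer, so no earlier reader
        # or writer of the same variable has to be waited for.
        dsts = []
        for v in task.writes:
            version[v] = version.get(v, 0) + 1
            buf = f"{v}{version[v]}"
            latest[v] = (buf, i)
            dsts.append(buf)
        renamed.append((task.name, srcs, dsts))
    # Earliest wave per task: unit-length tasks, unlimited execution units.
    wave: List[int] = []
    for d in deps:
        wave.append(1 + max((wave[j] for j in d), default=-1))
    return renamed, deps, wave


EXAMPLE = [
    Task("T1", reads=("a",), writes=("b",)),
    Task("T2", reads=("b",), writes=("c",)),  # RAW on T1
    Task("T3", reads=("d",), writes=("b",)),  # WAW with T1, WAR with T2
    Task("T4", reads=("b",), writes=("e",)),  # RAW on T3
]
```

In `EXAMPLE`, T3 overwrites `b`, which T1 also writes (WAW) and T2 reads (WAR). Because T3's output is renamed to `b2`, T3 can start in wave 0 alongside T1, and only T2 and T4 wait for their RAW producers, giving waves `[0, 1, 0, 1]`.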
