A Study of Multicore-Based Virtualization Technique

Original title (Chinese): 基于多核的虚拟化技术研究
Author: Ma Ruhui
Degree level: Doctoral (Ph.D.)
Discipline: Computer Science and Technology
Keywords: Virtualization; Multicore; System virtual machine; Process virtual machine; KVM; Real time; Embedded; Cache; CrossBit
Degree year: 2011
Supervisor: Guan Haibing
Discipline code: 081201
Degree-granting institution: Shanghai Jiao Tong University
Submission date: 2011-11-01

Abstract

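In recent years, virtualization has become one of the most active research topics. Several forces drive this: computer hardware has advanced rapidly while software computing models have lagged behind, large information resources need controlled management, servers need to be consolidated, and cloud computing has recently emerged. Virtualization combines software and hardware techniques to either partition the underlying computing resources into multiple execution environments or aggregate them into a single one, so as to meet the requirements of different applications. It has high practical value in many important areas, such as service consolidation, kernel development, kernel debugging, secure computing, parallel execution of multiple operating systems, and system migration. In addition, the hardware-assisted virtualization technologies of Intel, AMD, and other vendors have removed much of the large performance loss of pure software virtualization, further advancing the field.
     The emergence of multicore technology has created new opportunities for virtualization. Multicore processors make virtualization comparatively easier to implement, because each core can run a different process. Virtualization is not limited to one virtual server per core, however: each core can run several virtual machines concurrently. Multicore virtualization offers centralized computing, dynamic resource allocation, and full utilization of system resources. These advantages let enterprises and ordinary users accomplish more work with less hardware while obtaining better performance.
     This dissertation applies multicore ideas to further optimize virtualization. Its specific research contributions are as follows: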
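     1. A qualitative and quantitative analysis of the execution overheads of a dynamic binary translation (DBT) system. Based on the results, the translation, execution, and optimization components are each moved into separate threads to exploit multicore processors. In addition, this dissertation proposes a code cache replacement policy based on transitions of the dynamic working set. Compared with previous work, this policy better matches program behavior and reflects program locality.
     2. MTCrossBit, a multithreaded DBT system in which translation/execution and optimization run in separate threads. The system introduces a new superblock-construction thread (the optimization thread) and gains speedup from multicore processors and multithreaded execution. To handle inter-thread communication, a lock-free communication mechanism (ASLC) is proposed. It avoids lock/unlock control entirely and prevents blind waiting. A per-thread private code cache policy is also proposed. It keeps the threads from polluting each other's code cache, which gives the multithreaded system a high degree of parallelism.
     3. MTEE CrossBit, a multithreaded DBT system in which translation/optimization and execution run in separate threads. Driven by the needs of the execution component, translation and superblock optimization are moved into threads, and additional translation threads are added to translate in parallel. This removes the context switches between translation and execution found in conventional DBT systems. To coordinate the threads, this dissertation proposes the BranchTree module. It manages parallel translation across threads and also coordinates the work of the execution and optimization threads.
     4. Two software tuning methods for a KVM-based embedded virtualization system. The first reduces the influence of the general-purpose (GP) guest on the real-time (RT) guest. It is a scheduling policy that raises the priority of real-time tasks, and it greatly reduces the impact of GP tasks on real-time performance. The second is a multicore-based dedicated-core binding policy. Redirectable interrupts and GP tasks are bound to one dedicated core through hard affinity, while real-time tasks are assigned to another core, so other tasks cannot interfere with the RT tasks.
     5. Two hardware-cache tuning methods for a KVM-based embedded virtualization system. Combining page table prefetching, the cache architecture, and the idea of page coloring, this dissertation proposes a hardware-cache prefetching policy and a cache partitioning policy. Unlike previous work, which was typically evaluated in simulation, these policies are implemented on a real physical system. Moreover, the work does not focus solely on system throughput. It prioritizes real-time performance while still taking throughput into account, which brings the approach closer to real-world use.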
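Item 1 above does not spell out the replacement algorithm. The following is therefore only a minimal Python sketch of the general idea, built on assumptions of our own:

- Execution is divided into fixed-length intervals.
- The blocks executed in two consecutive intervals are compared by set overlap (Jaccard similarity).
- A drop below a threshold counts as a working-set transition. It evicts every translated block outside the new working set.

The class name `WorkingSetCodeCache`, the interval length, the 0.5 threshold, and the LRU fallback when the cache is full are all hypothetical. They are not the dissertation's DCC policy.

```python
from collections import OrderedDict


class WorkingSetCodeCache:
    """Hypothetical code cache that flushes on working-set transitions.

    Execution is split into fixed-length intervals. At the end of each
    interval, the blocks executed in it are compared with those of the
    previous interval. Low overlap is treated as a phase change, and
    translated blocks outside the new working set are evicted.
    """

    def __init__(self, capacity, interval=1000, threshold=0.5):
        self.capacity = capacity      # max translated blocks held (assumed unit)
        self.interval = interval      # block executions per interval (assumed)
        self.threshold = threshold    # overlap below this means a phase change (assumed)
        self.blocks = OrderedDict()   # guest PC -> translated code, oldest first
        self.current_ws = set()       # blocks executed in the current interval
        self.previous_ws = set()      # blocks executed in the previous interval
        self.ticks = 0
        self.translations = 0

    def execute(self, pc):
        """Look up (or translate) the block at guest address `pc` and 'run' it."""
        if pc in self.blocks:
            self.blocks.move_to_end(pc)               # refresh LRU position
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)       # fallback: evict the LRU block
            self.blocks[pc] = f"host code for {pc:#x}"  # stand-in for real translation
            self.translations += 1
        self.current_ws.add(pc)
        self.ticks += 1
        if self.ticks % self.interval == 0:
            self._end_interval()

    def _end_interval(self):
        union = self.current_ws | self.previous_ws
        overlap = len(self.current_ws & self.previous_ws) / len(union) if union else 1.0
        if overlap < self.threshold:
            # Working-set transition: keep only blocks in the new working set.
            for pc in [pc for pc in self.blocks if pc not in self.current_ws]:
                del self.blocks[pc]
        self.previous_ws, self.current_ws = self.current_ws, set()
```

The design choice this illustrates is the following. During a stable phase nothing is flushed, so hot code stays resident. When the working set shifts, stale blocks are dropped in one step instead of aging out one by one under LRU.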
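Item 2 relies on lock-free communication between the translation/execution thread and the optimization thread. ASLC itself is implemented at the assembly level, and its details are not given here. As a stand-in, the sketch below shows the common single-producer/single-consumer ring buffer. Only the producer writes `tail` and only the consumer writes `head`, so neither side ever takes a lock.

In a hypothetical MTCrossBit-style setup, the execution thread would push the addresses of hot blocks as superblock requests, and the optimization thread would pop them. The class name and the request strings are illustrative. In C or assembly, the payload store must be ordered before the `tail` update (a release store paired with an acquire load). In standard CPython the interpreter lock provides that ordering, so this code shows the shape of the algorithm rather than its performance.

```python
class SpscRing:
    """Lock-free single-producer/single-consumer ring buffer (sketch).

    Only the producer writes `tail` and only the consumer writes `head`,
    so no lock is needed. One slot is kept empty to distinguish "full"
    from "empty". Items must not be None, because pop() returns None
    to signal an empty queue.
    """

    def __init__(self, capacity):
        self.size = capacity + 1
        self.slots = [None] * self.size
        self.head = 0  # next slot to read (written only by the consumer)
        self.tail = 0  # next slot to write (written only by the producer)

    def push(self, item):
        """Producer side: returns False instead of blocking when full."""
        nxt = (self.tail + 1) % self.size
        if nxt == self.head:
            return False
        self.slots[self.tail] = item  # write the payload first...
        self.tail = nxt               # ...then publish it to the consumer
        return True

    def pop(self):
        """Consumer side: returns None instead of blocking when empty."""
        if self.head == self.tail:
            return None
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % self.size
        return item
```

Because `push` and `pop` return immediately when the queue is full or empty, each thread can decide to do other useful work instead of waiting blindly.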
Several forces have made virtualization an active research topic: computer hardware has advanced rapidly relative to slower-evolving software computing models, huge information resources need to be managed, services need to be consolidated, and cloud computing has recently emerged. Using software and hardware techniques, virtualization either divides the underlying resources among several execution environments or combines them into a single one to meet application requirements. It is now widely applied in many important domains, such as service consolidation, kernel development, kernel debugging, secure computing, parallel execution of multiple operating systems, and system migration. To reduce the performance overhead of software-only virtualization, Intel and AMD have introduced hardware-assisted virtualization, which has further advanced the field.
     Multicore processors create new opportunities for virtualization. On chip multiprocessors, virtualization is easier to implement than before, since different processes can be assigned to different cores. Moreover, each core can run several virtual machines rather than only one. Multicore-based virtualization offers centralized computing, dynamic resource allocation, and efficient use of system resources. Many enterprises and ordinary users have recognized these advantages, because multicore virtualization lets them accomplish more work with less hardware while achieving better performance.
     Combined with multicore techniques, virtualization can be improved further. The research in this dissertation is detailed as follows:
     1. This dissertation presents a qualitative and quantitative analysis of the performance of a dynamic binary translation (DBT) system. Based on the results, the translation, execution, and optimization parts are split into separate threads to exploit multicore processors. It then presents a dynamic code cache (DCC) replacement policy based on the program's working set. Compared with previous work, this policy better fits program behavior and reflects program locality.
     2. This dissertation proposes MTCrossBit, a multithreaded DBT system in which a translation/execution thread and an optimization thread run concurrently. The optimization thread builds superblocks, which makes efficient use of multicore resources to improve system performance. To address inter-thread communication, a novel lock-free mechanism called Assembly Language Level Communication (ASLC) is presented, which avoids lock/unlock operations entirely. In addition, each thread is given a private code cache, which prevents the threads from polluting one another's code cache.
     3. This dissertation proposes MTEE CrossBit, a multithreaded DBT system in which translation/optimization threads and an execution thread run concurrently. To keep the execution thread supplied with target code blocks, several translation threads (which also perform optimization) translate basic blocks and superblocks in parallel. Unlike the original CrossBit, this design needs no context switches between translation and execution. The dissertation also presents BranchTree, a key technique for managing the translation threads and the execution thread.
     4. This dissertation presents two software tuning methods for a KVM-based embedded virtualization system. The first addresses interference of the general-purpose (GP) guest with the real-time (RT) guest. It is a prioritization policy that raises the priority of the RT guest so that it holds the CPU as long as possible, which prevents the GP guest from frequently disturbing it. The second is CPU shielding, which protects the RT guest from harmful workloads such as interrupt-off regions and cache pollution, thereby improving real-time performance.
     5. This dissertation presents two cache-based tuning methods for a KVM-based embedded virtualization system: page table prefetch (PTP), based on prefetching, and cache partitioning (CAP), based on page coloring. Previous cache partitioning solutions were typically evaluated in simulation, whereas this work applies partitioning to a physical cache. Compared with previous work, it targets not only better real-time response but also high system throughput.
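Item 4 combines priority boosting with CPU shielding. Each KVM vCPU runs as an ordinary thread of the host's user-space QEMU process, so both techniques can be approximated with standard Linux scheduling calls. The sketch below works under stated assumptions:

- It reserves one core for the RT guest.
- It confines the GP guest's threads and the movable interrupts to the remaining cores through hard affinity.
- It raises the RT vCPU threads to `SCHED_FIFO`.

The helper names and the priority value 80 are assumptions, and so is the way thread IDs are obtained (for example, by listing `/proc/<pid>/task` of the QEMU process). None of this is the dissertation's implementation.

```python
import os


def plan_shielding(all_cpus, rt_cpu):
    """Reserve `rt_cpu` for real-time work; the other CPUs do housekeeping."""
    cpus = set(all_cpus)
    if rt_cpu not in cpus:
        raise ValueError("rt_cpu must be one of all_cpus")
    housekeeping = cpus - {rt_cpu}
    if not housekeeping:
        raise ValueError("need at least one CPU besides the shielded one")
    return {"rt": {rt_cpu}, "gp": housekeeping}


def irq_affinity_mask(cpus):
    """Hex CPU bitmask in the format accepted by /proc/irq/<N>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def apply_shielding(plan, rt_tids, gp_tids, rt_priority=80):
    """Pin threads with hard affinity and boost RT threads to SCHED_FIFO.

    Linux only. Switching a thread to SCHED_FIFO normally requires root
    or CAP_SYS_NICE.
    """
    for tid in gp_tids:
        os.sched_setaffinity(tid, plan["gp"])
    for tid in rt_tids:
        os.sched_setaffinity(tid, plan["rt"])
        os.sched_setscheduler(tid, os.SCHED_FIFO, os.sched_param(rt_priority))
```

For a four-core host with core 0 shielded, `irq_affinity_mask(plan_shielding([0, 1, 2, 3], 0)["gp"])` gives `"e"`. That string would be written, as root, to `/proc/irq/<N>/smp_affinity` for each interrupt that may be moved; `<N>` depends on the machine.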
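Item 5's cache partitioning builds on page coloring. In a physically indexed, set-associative cache, the set-index bits above the page offset give each physical page a "color". Pages of different colors never compete for the same cache sets. The number of colors is the cache size divided by (associativity × page size). The minimal sketch below computes colors and splits them between an RT guest and a GP guest.

It assumes 4 KiB pages and plain modulo set indexing. The function names, the 25% RT share, and the example cache size are hypothetical. In a KVM host, such a filter would have to sit in the allocator that backs guest memory, which is outside this sketch.

```python
def num_colors(cache_size, ways, page_size=4096):
    """Colors = bytes covered by one cache way / page size."""
    return max(1, (cache_size // ways) // page_size)


def page_color(phys_addr, colors, page_size=4096):
    """Color of the physical page that contains `phys_addr`."""
    return (phys_addr // page_size) % colors


def partition_colors(colors, rt_share):
    """Give the RT guest a contiguous slice of colors and the GP guest the rest."""
    if colors < 2:
        raise ValueError("need at least two colors to partition")
    k = min(colors - 1, max(1, round(colors * rt_share)))
    return set(range(k)), set(range(k, colors))


def pages_for(frames, allowed, colors, page_size=4096):
    """Keep only the page frames whose color is in `allowed`."""
    return [f for f in frames if page_color(f, colors, page_size) in allowed]
```

For example, a hypothetical 2 MiB, 8-way cache with 4 KiB pages has 2097152 / (8 × 4096) = 64 colors. Giving the RT guest 25% of them reserves colors 0 to 15. Every run of 64 consecutive physical pages then contributes exactly 16 pages to the RT guest.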

引文

[1] Moore G. Electronics magazine. April 19th, 1965.
    [2] Schaller R. Moore’s law: past, present and future. IEEE spectrum, 1997, 34(6):52–59.
    [3] Hofstee H.P. Power Efficient Processor Design and the Cell Processor. In: Proceedings of the 11th International Symposium on High-Performance Computer Architecture (HPCA 05), San Francisco, USA, Feb. 2005, 258-262.
    [4] Nickolls J. and Buck I. NVIDIA CUDA software and GPU parallel computing architecture. In Microprocessor Forum, May 2007, 103-104.
    [5] Kongetira P., Aingaran K., and Olukotun K. Niagara: A 32-way multithreaded SPARC processor. IEEE Micro, February 2005, 25(2):21-29.
    [6] Juan del Cuvillo, Weirong Zhu, ZiangHu, et al. Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture. In: Proceedings of the 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment (HPCS), Quebec City, Canada, 2006, 9-15.
    [7] Guangming Tan, Dongrui Fan, Junchao Zhang, et al. Experience on Optimizing Irregular Computation for Memory Hierarchy in Many-core Architecture. In:Proceedings of 13th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), Salt Lake City, USA, 2008, 279-280.
    [8] Smith J.E., and Nair R. Virtual Machines: Versatile Platforms for Systems and Process. Morgan Kaufman, 2005.
    [9] Seawright L. H.,MacKinnon R. A. VM/370-a study of multiplicity and usefulness,IBM System Journal,1979,18(1):4-17.
    [10] Goldberg R.P. Survey of Virtual Machine Research. IEEE Computer,June 1974,34-45.
    [11] Keith Adams,Ole Agesen. A Comparison of Software and Hardware Techniques for x86 Virtualization,Operating Systems Review,2006,40(5):2-13.
    [12]金海等.计算系统虚拟化--原理与应用.北京:清华大学出版社, 2008.
    [13] Rich Ublig, Gil Neiger, et al. Intel Virtualization Technology. IEEE ComputerMagazine, 2005, 38(5):48-56.
    [14] Andrew S. Tanenbaum.现代操作系统(第三版),机械工业出版社,2009.
    [15] Susanta N. A Survey on Virtualization Technologies,State University of New York,Stony Brook,Feb 2005.
    [16] Gil Neiger, Amy Santoni, Felix Leung, Dion Rodgers, Rich Uhlig. Intel Virtualization Technology: Hardware Support for Efficient Processor Virtualization,Intel Technology Journal, August 2006, 10(3):167–177.
    [17] Intel Corporation, Intel Virtualization Technology Specification for the IA-32 Intel Architecture, April 2005.
    [18] Barham P. Xen and the Art of Virtualization. In: Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2003.
    [19] The Xen project. http://www.cl.cam.ac.uk/Research/SRG/netos/xen/, 2011.
    [20] The Xensource Company. http://www.xensource.com/, 2011.
    [21] Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, Anthony Liguori. KVM: the Linux Virtual Machine Monitor, Linux Symposium, 2007, 225-230.
    [22] VMware, Inc. VMware virtual machine technology. http://www.vmware.com, 2010.
    [23] Jay Munro. Virtual Machines and VMware. http://www.extremetech.com/article2/0, 1697,1156372,00.asp , 2011.
    [24] Carl Waldspurger. Memory Resource Management in VMware ESX Server. In: Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), ACM SIGOPS Operating Systems Review, 2002, Winter 2002 Special Issue: 181-194.
    [25] Microsoft. Microsoft virtual PC. http://www.microsoft.com/windows/virtual-pc/, 2011.
    [26] Dike J. A user-mode port of the linux kernel. In: Proceedings of the 4th annual Linux Showcase & Conference , October 2000.
    [27] Lawton K., Denney B., Guarneri N.D., Ruppert V., Bothamy C. Bochs x86 PC Emulator User Manual. http://bochs.sourceforge.net/, 2011.
    [28] Bao Yuncheng. Building process virtual machine via dynamic binary translation. Master thesis, Shanghai Jiao Tong University, China, January 2007.
    [29] The CrossBit Developers. CrossBit. http://www.crossbit.org/, 2011.
    [30] OpenVZ. http://en.wikipedia.org/wiki/OpenVZ, 2011.
    [31] Server virtualization open source project-OpenVZ. http://openvz.org/, 2011.
    [32] Kamp P.H., Watson R.N.M. Jails: Confining the Omnipotent root. In: Proceedings of the Second International SANME Conference, May 2000,1-15.
    [33] Wine Project, Wine user guide, http://www.winehq.com/site/docs/wine-user/index. 2011.
    [34] Noer G. J. Cygwin: A free Win32 Porting Layer for UNIX applications. In: Proceedings of the 2nd USENIX Windows NT Symposium. Seattle, Washington, USA, August 1998.
    [35] Venners B. The lean, mean, virtual machine An introduction to the basic structure and functionality of the Java Virtual Machine. Java World, 1996.
    [36] Nidhi Aggarwal, Parthasarathy Ranganathan, Norman P. Jouppi, James E. Smith. Configurable isolation: building high availability systems with commodity multi-core processors. In: Proceedings of the 34th annual international symposium on Computer architecture (ISCA). ACM SIGARCH Computer Architecture News, 2007, 35(2): 470-481.
    [37] Herb Shutter. The Free Lunch is Over: A Fundamental Turn Toward Concurrency in Software. Dr. Dobb's Journal 30 (3); http://www.gotw.ca/publications/concurrency-ddj.htm, 2005.
    [38] Tullsen D.M., Eggers S.J., and Levy H.M. Simultaneous Multithreading: Maximizing On-Chip Parallelism. In: Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA). S. Margherita Ligure, Italy, June 22-24, 1995, 392-403.
    [39] Fabrizio Petrini, Gordon Fossum, Juan Fern′andez, Ana Lucia Varbanescu, Mike Kistler and Michael Perrone. Multicore Surprises: Lessons Learned from Optimizing Sweep3D on the Cell Broadband Engine. IBM TJ Watson ResearchCenter, Yorktown Heights, NY 10598, USA, 2007.
    [40] Agarwal A. Performance Tradeoffs in Multithreaded Processors. IEEE Transactions on Parallel and Distributed Systems, 1992, 3(5):525-539.
    [41] Bellard F. QEMU, a fast and portable dynamic translator. In: Proceedings of USENIX Annual Technical Conference, Anaheim, USA, April 10-15, 2005, 41-41.
    [42] Ung D., Cifuentes C. Machine-adaptable dynamic binary translation. In: Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization, Boston, USA, January 18, 2000, 41-51.
    [43] Wang C., Ying V., Wu Y. Supporting legacy binary code in a software transactioncompiler with dynamic binary translation and optimization. In Proceedings of the 17th international conference on Compiler construction, Budapest, Hungary, March 29-April 6, 2008, 291-306.
    [44] Ma Ruhui, Guan Haibing, Zhu Erzhou, Yang Hongbo, Yang Yindong and Liang Alei. Partitioning the Conventional DBT System for Multiprocessors. Journal of Computer Science and Technology, 2011, 26 (3):474-490.
    [45] Edson Borin, Youfeng Wu. Characterization of Dynamic Binary Translation Overhead. In: Proceedings of the 1st Workshop on Architectural and Microarchitectural Support for Binary Translation, Beijing, Intel Corporation 2200 Mission College Blvd. Santa Clara, 2008.
    [46] Sorav Bansal, Alex Aiken. Binary Translation Using Peephole Superoptimizers. In: Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Computer Systems Lab, Stanford University, 2008.
    [47] Jiwei Lu, Howard Chen, Pen-Chung Yew, Wei-Chung Hsu. Design and Implementation of a Lightweight Dynamic Optimization System. Journal of Instruction-Level Parallelism, 2004, 6:1-24.
    [48] Jiwei Lu, Howard Chen, Rao Fu, Wei-Chung Hsu, Dong-Yuan Chen. The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System. In: Proceedings of the 36th International Symposium on Micro-architecture (MICRO), 2003.
    [49] Weifeng Zhang, Brad Calder, Dean M. Tullsen. An Event-Driven Multithreaded Dynamic Optimization Framework. In: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), San Diego, 2005.
    [50] Weifeng Zhang, Steve Checkoway, Brad Calder, Dean M. Tullsen. Dynamic Code Value Specialization Using the Trace Cache Fill Unit. In: Proceedings of International Conference on Computer Design (ICCD), San Diego, 2006.
    [51] Intel? Atom? Processor D510. http://ark.intel.com/Product.aspx?id=43098, 2011.
    [52] An RTOS for an SMP Multi-core Processor. http://rtcmagazine.com/ 2011.
    [53] Multi-Core with virtualization, a solution for future smart phones. http://www.alphagalileo.org, 2011.
    [54] RTS Hypervisor. www.real-time-systems.com, 2011.
    [55] VirtualLogix Real-Time Virtualization and VLX. http://www.osware.com, 2011.
    [56] WindRiver. http://www.windriver.com/products/platforms/real-time_core/, 2011.
    [57] Bao Yunchen. Building Process Virtual Machine via Dynamic Binary Translation. Shanghai: School of Software, Jan. 2007.
    [58] Chernoff A., Herdeg, M., Hookway R., Reeve C., Rubin N., Tye T., Bharadwaj Yadavalli S. and Yates J. FX!32 A Profile-Directed Binary Translator. IEEE Micro, 1998, 18(2): 56-64.
    [59] Cifuentes C. and Van EmmeriK M. UQBT: Adaptable Binary Translation at Low Cost. Computer, 2000, 33(3): 60-66.
    [60] Cindy Zheng and Carol Thompson. PA-RISC to IA-64: Transparent Execution, No Recompilation. Computer, 2000, 33(3): 47-52.
    [61] Bonnie 1.4. http://wiki.linuxquestions.org/wiki/Bonnie, 2011.
    [62] Leonid Baraz, Tevi Devor, Orna Etzion, Shalom Goldenberg, Alex Skaletsky, Yun Wang and Yigal Zemach. IA-32 Execution Layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium?-based systems. In: Proceedings of the 36th International Symposium on Microarchitecture (MICRO), Intel Corporation, 2003.
    [63] Lance Hammond, Mark Willey and Kunle Olukotun. Data Speculation Support for a Chip Multiprocessor. In: Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1998.
    [64]中科院计算所,动态二进制翻译中的代码cache管理策略,计算机工程, 2005, 31(10): 97-99.
    [65] Alexander Klaiber, The Technology Behind Crusoe Processors. Transmeta technology report, Jan.2000.
    [66] Ebcioglu K. and Altman E. DAISY: Dynamic Compilation for 100 Percent Architectural Compatibility. In: Proceedings of the 24nd Annual International Symposium on Computer Architecture (ISCA), New York, 1997, 26-37.
    [67] Michael Gschwind, and Erik Altman. Inherently Lower Complexity Architectures using Dynamic Optimization. In: Proceedings of Workshop on Complexity Effective Design in conjunction with ISCA, Anchorage, AK, May 2002.
    [68] Cyclictest. https://rt.wiki.kernel.org/index.php/Cyclictest, 2011.
    [69] Ung D. and Cifuentes C. Optimising Hot Paths in a Dynamic Binary Translator. In: Proceedings of the 2nd Workshop on Binary Translation, Philadelphia, Pennsylvania, October 19, 2000.
    [70] Mark Probst. Fast machine-adaptable dynamic binary translation. In: Proceedings ofthe Workshop on Binary Translation 2001, September, 2001.
    [71] Probst M. K. and Scholz B. A. Register liveness analysis for optimizing dynamic binary translation. In: Proceedings of the 9th Working Conference on Reverse Engineering (WCRE) , 2002, 35-44.
    [72] Scott K. and Davidson J. Strata: A Software Dynamic Translation Infrastructure. In: Proceedings of IEEE Workshop on Binary Translation , Technical Report, 2001.
    [73] Cifuentes C., Lewis B. and Ung D. Walkabout-a retargetable dynamic binary translation framework. Technique Report TR2002-106, January. Sun Microsystems Laboratory, Palo Alto, 2002.
    [74] Wang C., Hu S., Kim H., Nair S.R., Breternitz M., Ying Z. and Wu Y. StarDBT: an efficient multi-platform dynamic binary translation system. In: Proceedings of the Asia-Pacific Computer Systems Architecture Conference, 2007, 4-15.
    [75] Deutsch P. and Schiffman A.M. Efficient Implementation of the Smalltalk-80 System. In Proceedings of the 11th ACM Symposium on Principles of Programming Languages (POPL), 1984, 297-302.
    [76] John L.H. and David A.P. Computer Architecture: A Quantitative Approach, Morgan Kaufman, 2002, 397-402.
    [77] Kim H. and Smith M.D. Code cache management schemes for dynamic optimizers. In: Proceedings of the 6th Workshop on Interaction between Compilers and Computer Architecture, 2002, 102-110.
    [78] Desoli G., Mateev N., Duesterwald E., Faraboschi P. and Fisher J.A. Deli: A new run-time control point. In: Proceedings of the 35th International Symposium on Microarchitecture (MICRO), 2002, 257-268.
    [79] Witchel E. and Rosenblum M. Embra: Fast and flexible machine simulation. Measurement and Modeling of Computer Systems, 1996, 68-79.
    [80] Chen W. K., Lerner S., Chaiken R. and Gilles D.M. Mojo: A dynamic optimization system. In: Proceedings of the 3rd ACM Workshop on Feedback-Directed and Dynamic Optimization (FDDO), Monterey, USA, December 10, 2000.
    [81] Kim H. and Smith J. E. Exploring code cache eviction granularities in dynamic optimization systems. In: Proceedings of the 2nd International Symposium on Code Generation and Optimization, Palo Alto, CA, 2004, 89–99.
    [82] Bruening D., Garnett T. and Amarasinghe S. An infrastructure for adaptive dynamicoptimization. In: Proceedings of the 1st Annual International Symposium on Code Generation and Optimization (CGO), March, 2003, 265–275.
    [83] London K., Dongarra J., Moore S., Mucci P., Seymour K. and Spencer T. End-user tools for application performance analysis using hardware counters. In: Proceedings of the 14th Conference on Parallel and Distributed Computing Systems (ICPDCS), August, 2001.
    [84] Bala V., Duesterwald E. and Banerjia S. Dynamo: A transparent runtime optimization system. In: Proceedings of the 21th ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI), Vancouver, Canada, Jun. 18-21, 2000, 1-12.
    [85] Randal E.B. and David R.O. Computer Systems: A Programmer’s Perspective. Pearson Education Asia Limited and Publishing House of Electronics Industry, 2004, 478-481.
    [86] Stallings W. Operating Systems: Internals and Design Principles, Four Edition. Prentice Hall, 2000.
    [87] Banerjia S., Bala V. and Duesterwald E. Preemptive replacement strategy for a caching dynamic translator. USA Patent, No.US6,237,065 B1, 2001.
    [88] Li Zengxiang, Guan Haibing, and Li Xiaoyong. Optimization for Dynamic Binary Translation. Computer Application and Software, 2007, 24(7): 12-14.
    [89] Shi Huihui, Wang Yi, Guan Haibing and Liang Alei. An Intermediate Language Level Optimization Framework for Dynamic Binary Translation. ACM SIG/PLAN Notice, 2007, 42(5): 3-9.
    [90] Robert F.C. and David K. Shade: A Fast Instruction Set Simulator for Execution Profiling. Sun Microsystems, CA, USA, 1994.
    [91] Young C. and Smith M.D. Static correlated branch prediction. ACM Transactions on Programming Languages and Systems, 1999, 21: 111-159.
    [92] Scott K., Kumar N., Childers B.R., Davidson J.W. and Soffa M.L. Overhead Reduction Techniques for Software Dynamic Translation. In: Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS), 2004.
    [93] Bach D. Bui, Marco Caccamo, Lui Sha and Joseph Martinez. Impact of Cache Partitioning on Multi-tasking Real Time Embedded Systems. In: Proceedings of the 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, August 25-27, 2008, 101-110.
    [94] Sites R.L., Chernoff A., Kirk M.B., Marks M.P. and Robinson S.G. Binary Translation. Communications of the ACM. 1993, 36(2): 69-81.
    [95] Thomas Ball and James R. Larus. Efficient Path Profiling. In: Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Paris, France, December 2-4, 1996, 46-57.
    [96] Thomas Ball, Peter Mataga and Mooly Sagiy. Edge profiling versus path profiling: the slowdown. In: Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL), 1998, 134-148.
    [97] Dhodapkar A.S., Smith J.E. Comparing Program Phase Detection Techniques. In: Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), December 3-5, 2003, 217-217.
    [98] Evelyn Duesterwald and Vasanth Bala. Software Profiling for Hot Path Prediction: Less is More. ACM SIGOPS Operating System Review, 2000, 34(5): 202-211.
    [99] Wen-mei W.H., Scot A.M., William Y.C., Pohua P.C., Nancy J.W., Roger A.B., Roland G.O., Richard E.H., Tokuzo K., Grant E.H., John G.H. and Daniel M.L. The Superblock: An Effective Technique for VLIW and Superscalar Compilation. Journal of Supercomputing, 1993, 229-248.
    [100] Massimiliano Poletto and Vivek Sarkar. Linear scan register allocation. ACM Transactions on Programming Languages and Systems (TOPLAS), 1999, 21(5): 895-913.
    [101] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann publications, 2008.
    [102] Shameem Akhter and Jason Roberts. Multi-core Programming: Increasing Performance through Software Multi- threading. Publishing House of Electronics Industry. 2007.
    [103] Patterson D.A., and Hennessy J.L. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufman, 1998.
    [104] Abraham Silberschatz. Operating System Concepts, 7th edition. State University of New York at Stony Brook, 2009.
    [105] Stallings W. Operating Systems: Internals and Design Principles. Prentice Hall, Sixth Edition, 2008
    [106] Sorav Bansal and Alex Aiken. Binary Translation Using Peephole Superoptimizers. In: Proceedings of the 8th USENIX Symposium on Operating Systems Design andImplementation (OSDI), December, 2008.
    [107] Smith J.E. An overview of virtual machine architectures. http://www.ece.wisc.edu/ jes/papers/vms.pdf, 2011.
    [108] Goldberg R.P. Survey of Virtual Machine Research. Computer, June 1974, 34-45.
    [109] Rosenblum M. and Garfinkel T. Virtual Machine Monitor: Current Technology and Future Trends. Computer, 2005, 38(5): 39-47.
    [110] Smith J.E., Uhlig R. Virtual Machines: Architectures, Implementations and Applications. http://www.hotchips.org/archives/hc17/1_Sun/HC17.T1P2.pdf, 2011.
    [111]英特尔开源软件技术中心,复旦大学并行处理研究所,系统虚拟化——原理与实现,北京,清华大学出版社,2008.
    [112] Ongaro D., Cox A. L. and Rixner S. Scheduling I/O in virtual machine monitors. In: Proceedings of the 4th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), 2008, 1-10.
    [113] Robin J.S. and Irvine C.E. Analysis of the Intel Pentium’s Ability to Support a Secure Virtual Machine Monitor. In: Proceedings of the 9th conference on USENIX Security Symposium (SSYM), Denver, CO, USA, Auguest 2000, 129-144.
    [114] Gerald J. P. Formal requirements for virtualizable third generation architectures,Communications of the ACM. 1974, 17(7): 412-421.
    [115] Tam D., Azimi R., Soares L. and Stumm M. Managing shared L2 caches on multicore systems in software. In: Proceedings of the Workshop on the Interaction between Operating Systems and Computer Architecture, June, 2007.
    [116] Jeanna N.M., Eli M.D., et al. Running Xen: A Hands-On Guide to the Art of Virtualization. Prentice Hall Inc., 2008.
    [117] Intel Corp. Intel 64 and IA-32 Architectures Software Developer’s Manual-Volume 3B: System Programming Guide Part 2, 2003.
    [118] Chisnall D. The Definitive Guide to the Xen Hypervisor, Prentice.Hall., United States, 2008.
    [119] Qumranet: KVM Whitepaper. Qumranet 2006.
    [120] Sugerman J., Venkitachalam G. and Beng-Hong Lim. Virtualizing I/O Devices on VMware Workstation’s Hosted Virtual Machine Monitor. In: Proceedings of USENIX Annual Technical Conference, Boston, Massachusetts, USA, June 25–30, 2001.
    [121] Ahmad I., Anderson J., Holler A., Kambo R. and Makhija V. An Analysis of DiskPerformance in VMware ESX Server Virtual Machines. In: Proceedings of the 6th Workshop on Workload Characterization (WWC), October, 2003.
    [122] Lin J., Lu Q., Ding X., Zhang Z., Zhang X. and Sadayappan P. Gaining insights into multi-core cache partitioning: Bridging the gap between simulation and real systems. In: Proceedings of the 14th International Symposium on High-Performance Computer Architecture (HPCA), 2008, 367-378.
    [123] Intel Corporation. IA-32 Intel Architecture Software Developer's Manual, 2003.
    [124] Programmable Interrupt Controller. http://en.wikipedia.org/wiki/PIC_microcontroller, 2011.
    [125] Programmable Interval Timer. http://en.wikipedia.org/wiki/Programmable_interval _timer, 2011.
    [126] The High-Resolution Timer API. http://lwn.net/Articles/167897/, 2011.
    [127] Qing Li.嵌入式系统的实时概念,北京航空航天大学出版社,2004.
    [128] Real-time Computing FAQ. http://www.faqs.org/faqs/realtime-computing/faq/, 2011.
    [129]黄武陵,何小庆,艾云峰.嵌入式Linux实时化技术,电子产品世界,2009, (8): 57-60.
    [130] RTLinux. http://en.wikipedia.org/wiki/RTLinux, 2011.
    [131] Roberto Bucher and Silvano Balemi. Scilab/Scicos and Linux RTAI - A unified approach. Control Applications, 2005, 1121-1126.
    [132] L4Linux. http://os.inf.tu-dresden.de/L4/LinuxOnL4, 2011.
    [133] Gerum P. The Xenomai project. http://www.xenomai.org, 2011.
    [134] Preempt_RT Patch. http://www.kernel.org/pub/linux/kernel/projects/rt/, 2011.
    [135] Real-Time Linux. http://www.mvista.com/real_time_linux.php, 2011.
    [136] TimeSys. http://www.timesys.com/company, 2011.
    [137] Wind River Linux. http://www.bsdi.com/products/linux, 2011.
    [138] Fran?ois Armand, Michel Gien, Gilles Maignéand Gregory Mardinian. Shared Device Driver Model for Virtualized Mobile Handsets. In: Proceedings of the 1st International conference on Mobile System Virtualization, 2008.
    [139] Brosky S. and Rotolo S. Shielded processors: Guaranteeing sub-millisecond response in standard Linux. In: Proceedings of the 17th International Parallel and Distributed Processing Symposium, Nice, France, April 2003.
    [140] Heursch A.C., Grambow D., Hosrtkotte A. and Rzehak H. Steps towards a fullypreemptable Linux kernel. In: Proceedings of the 27th IFAC/IFIP/IEEE Workshop on Real-Time Programming, Lagow, Poland, May, 2003.
    [141] Love R. Linux Kernel Development, 2nd Edition. Novell Press, January, 2005.
    [142] Wong C.S., Tan I., Kumari R.D. and Wey F. Towards achieving fairness in the linux scheduler. ACM SIGOPS Operating Systems Review, 2008, 42(5): 34-43.
    [143] Henning Schild, Adam Lackorzynski and Alexander Warg. Faithful Virtualization on a Real-Time Operating System. In: Proceedings of the 11th Real-Time Linux Workshop, September, 2009.
    [144] Kaiser R. Alternatives for Scheduling Virtual Machines in Real-Time Embedded Systems. In: Proceedings of the 1st workshop on Isolation and integration in embedded systems (IIES), April, 2008, 5-10.
    [145] Intel Virtualization Technology. Applying Virtualization to Embedded Devices http://www.intel.com/technology/advanced_comm/322288.pdf, 2011.
    [146] Jonas Eriksson. Virtualization, Isolation and Emulation in a Linux Environment. Master's Thesis in Computing Science, 2009
    [147] Kaseridis D. et al. A Bandwidth-Aware Memory Subsytem Resource Management Using Non-Invasive Resource Profilers for Large CMP Systems. In: Proceedings of the 16th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Bangalore, India, January 9-14, 2010.
    [148] Liu Fang et al. Understanding How Off-Chip Memory Bandwidth Partitioning in Chip Multiprocessors Affects System Performance. In: Proceedings of the 16th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Bangalore, India, January 9-14, 2010.
    [149] Kim Yoongu et al. ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers. In: Proceedings of the 16th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Bangalore, India, January 9-14, 2010.
    [150] Sergey Zhuravlev et al. Addressing Shared Resource Contention in Multicore Processors via Scheduling. In: Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, March 13-17, 2010.
    [151] Andrew Baumann et al. The Multikernel A New OS Architecture for ScalableMulticore Systems. In: Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), Big Sky, MT, October 11-14, 2009.
    [152] Edmund B. Nightingale et al. Helios Heterogeneous Multiprocessing with Satellite Kernels. In: Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), Big Sky, MT, October 11-14, 2009.
    [153] Seehwan Yoo, Yunxin Liu, Cheol-Ho Hong, Chuck Yoo and Yongguang Zhang. MobiVMM: aVirtual Machine Monitor for Mobile Phones. In: Proceedings of the 1st Workshop on Virtualization in Mobile Computing, June 17, 2008, Breckenridge, Colorado.
    [154] Lu Q., Lin J., Ding X., Zhang Z., X Zhang. and Sadayappan P. Soft-OLP: improving hardware cache performance through software-controlled object-level partitioning. In: Proceedings of the International Conference on Parallel Architecture and Compilation Techniques (PACT), 2009, 246-257.
    [155] Jan Kiszka. Towards Linux as a Real-Time Hypervisor. In: Proceedings of the 11th Real-Time Linux Workshop, 2009.
    [156] Wei Jiang et al. CFS Optimizations to KVM Threads on Multi-Core Environment. In: Proceedings of the 15th International Conference on Parallel and Distributed Systems (ICPADS), Shenzhen, China, December 8-11, 2009.
    [157] Min Lee et al. Supporting Soft Real-Time Tasks in the Xen Hypervisor. In: Proceedings of the 6th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Pittsburgh, PA, March 17-19, 2010.
    [158] Seehwan Yoo, Miri Park and Chuck Yoo. A Step to Support Real-time in Virtual Machine. In: Proceedings of the 6th IEEE Conference on Consumer Communications and Networking Conference, 2009.
    [159] Jin Xinxin, Chen Haogang, Wang Xiaolin, Wang Zhenlin, Wen Xiang, Luo Yingwei and Li Xiaoming. A Simple Cache Partitioning Approach in a Virtualized Environment. In: Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), Chengdu, China, August 10-12, 2009.
    [160] Kemal Ebcioglu, Erik Altman, Michael Gschwind and Sumedh Sathaye. Dynamic Binary Translation and Optimization. IEEE Transactions on Computers, 2001, 50(6): 529-548.
    [161] Cao Hongjia, Yu Lei, Deng Kun and Zhou Xingming. Design and Implementation of a Dynamic User-Level Binary Translation System. Computer Engineering & Science, 2004, 26(8): 79-82, 99.
    [162] Chen Yu, Ren Jie, Zhu Hui and Shi Yuan Chun. Dynamic Binary Translation and Optimization in a Whole-System Emulator: SkyEye. In: Proceedings of the International Conference on Parallel Processing Workshops (ICPPW), 2006.
    [163] King S.T., Dunlap G.W. and Chen P.M. Debugging Operating Systems with Time-Traveling Virtual Machines. In: Proceedings of the USENIX Annual Technical Conference, Anaheim, CA, April, 2005, 1-15.
    [164] Dike J. A user-mode port of the Linux kernel. In: Proceedings of the 2000 Linux Showcase and Conference, October, 2000.
    [165] Hennessy J.L., Patterson D.A. and Goldberg D. Computer Architecture: A Quantitative Approach, 4th Edition. Morgan Kaufmann, 2007.
    [166] Jouppi N.P. Cache Write Policies and Performance. In: Proceedings of the 20th Annual International Symposium on Computer Architecture (ISCA), May, 1993, 191-201.
    [167] Intel Corporation. An Overview of Cache. Embedded Intel Architecture Papers, 2003.
    [168] Richard Uhlig, David Nagle, Tim Stanley, Trevor Mudge, Stuart Sechrest and Richard Brown. Design tradeoffs for software-managed TLBs. ACM Transactions on Computer Systems (TOCS), 1994, 12(3): 175-205.
    [169] Saulsbury A., Dahlgren F. and Stenström P. Recency-Based TLB Preloading. In: Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA), 2000.
    [170] Vanderwiel S.P. and Lilja D.J. Data Prefetch Mechanisms. ACM Computing Surveys (CSUR), 2000, 32(2): 174-199.
    [171] Brown D., Mowry T.C. and Krieger O. Compiler-based I/O prefetching for out-of-core applications. ACM Transactions on Computer Systems, 2001, 19(2): 111-170.
    [172] Smith A.J. Cache Memories. ACM Computing Surveys (CSUR), 1982, 14(3): 473-530.
    [173] Myoung Kwon Tcheun, Hyunsoo Yoon and Seung Ryoul Maeng. An Adaptive Sequential Prefetching Scheme in Shared-Memory Multiprocessors. In: Proceedings of the International Conference on Parallel Processing (ICPP), Washington, DC, USA, 1997: 306-313.
    [174] Gill B. and Modha D. SARC: Sequential Prefetching in Adaptive Replacement Cache. In: Proceedings of the USENIX Annual Technical Conference, 2005: 293-308.
    [175] Gill B. and Bathen L. AMP: Adaptive Multi-stream Prefetching in a Shared Cache. In: Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST), 2007: 185-198.
    [176] Chang J. and Sohi G.S. Cooperative cache partitioning for chip multiprocessors. In: Proceedings of the 21st annual international conference on Supercomputing (ICS), 2007.
    [177] Zhou X., Chen W. and Zheng W. Cache Sharing Management for Performance Fairness in Chip Multiprocessors. In: Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2009, 384-393.
    [178] Xie Y. and Loh G.H. PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-core Shared Caches. In: Proceedings of the 36th International Symposium on Computer Architecture (ISCA), June, 2009, 174-183.
    [179] Wang H., Koren I. and Krishna C.M. An Adaptive Resource Partitioning Algorithm for SMT Processors. In: Proceedings of the 17th International Conference on Parallel Architectures & Compilation Techniques (PACT), October, 2008, 230-239.
    [180] Zhang X., Dwarkadas S. and Shen K. Towards practical page coloring-based multicore cache management. In: Proceedings of the 4th ACM European conference on Computer systems, 2009, 89-102.
    [181] Ding X.N., Wang K.B. and Zhang X.D. ULCC: A User-Level Facility for Optimizing Shared Cache Performance on Multicores. In: Proceedings of the 16th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), San Antonio, TX, February 12-16, 2011.
