用户名: 密码: 验证码:
非对称多核处理器的若干调度问题研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
随着芯片集成规模极限的逼近以及能耗和成本等因素,多核处理器逐渐占据了市场。相对于对称多核处理器,非对称多核处理器在效能、芯片面积、适用范围等方面有着巨大的优势,将成为未来的主流体系结构。现有调度算法从单核处理器发展而来,并为对称多处理器做了相应扩展,不能利用非对称多核处理器的特性和优势。本文致力于研究非对称多核处理器的调度问题,以提高系统的效能、性能和公平性。
     具体来说,本文从以下4个方面进行了深入研究:
     (1)针对非对称多核处理器上操作系统的单线程任务调度问题,本文建模分析各种因素,提出了一种综合性调度算法。该算法采用行为匹配、减少迁移和负载均衡的调度策略,包括两个部分:1)集成负载表征,提出集成行为的概念,全面衡量任务的整体性和阶段性行为;2)基于集成行为的调度算法,有效开发非对称多核处理器的特性,能够保证各核心负载均衡,同时可以避免不必要的任务迁移。另外,该算法通过参数调整机制实现了算法的通用性。该算法是一种综合处理任务的整体性和阶段性行为,并具备通用性的调度算法。实验结果表明:该算法可通用于多种环境,且性能比其他同类算法提高6%~22%。
     (2)针对非对称多核处理器上操作系统的多线程任务调度问题,本文建模分析各种因素,提出了一个集成调度算法。该算法具有以下特性:1)全面考虑多线程任务同步特性、核心非对称性以及核心负载;2)通过集成线程调度和动态电压频率调整来提高效能;3)通过参数调整机制实现了算法的通用性。该算法是第一个在非对称多核处理器上结合线程调度和动态电压频率调整的调度算法。实验结果表明:该算法可适用于多种环境,且效能比其他同类算法高24%~50%。
     (3)针对非对称多核处理器上的虚拟处理器公平调度问题,本文建模分析各种因素,提出了一个组合调度算法。该算法具有以下特性:1)全面考虑虚拟处理器同步特性、核心非对称性以及核心负载;2)定义了效用因子、比例系数、比例资源的概念,结合虚拟处理器的同步特性和核心的非对称性对资源和负载进行全面度量;3)通过运行队列分解降低调度开销。实验结果表明:该算法实现了公平调度,并且性能比其他同类算法提高19%~48%。
     (4)针对非对称多核处理器上的虚拟处理器高效能调度问题,本文提出一个并行度感知调度器,该调度器综合利用了虚拟处理器调度和动态电压频率调整。并行度感知调度器用一种非入侵的方法动态监测虚拟机的并行度,然后选择并调度相关的虚拟处理器同时执行。提出的推迟协同调度算法使多个并行的虚拟机可以同时进行协同调度,而不会导致冲突。实际平台上的实验表明,并行度感知调度器的性能和效能优势明显,分别达到26%和65%。此外,并行度感知调度器的开销接近默认调度器,低于其他非对称多核处理器上的虚拟机调度器。
Owing to the limit of chip integration scale, power consumption and cost, multi-coreprocessor has gradually become the mainstream of the market. Asymmetric multi-coreprocessor has been proposed as a more performance, power and area efficient alternative to itssymmetric counterpart, and will be the mainstream of computer architecture in the near future.The state of the art scheduling algorithms cannot exploit the asymmeitic multi-core processors,since they have originated from the single-core processors and only been reinforced for thesymmetric multi-processors. This dissertation investigates scheduling problems onasymmetric multi-core processors to improve performance, power efficiency and fairness.
     Specifically, this dissertation focuses on4important scheduling problems as follows:
     (1) To solve the problem of signle-threaded task scheduling in Operating Systems (OS) onasymmetric multi-core processors, this dissertation models and analyzes the problem,taking various factors into account. Moreover, this dissertation proposes a comprehensivescheduling algorithm, which adopts the scheduling policies of behavior matching,migration avoiding, and load balancin. The algorithm has two parts: an integratedworkload characterization, which proposes integrated behavior to measure the global andlocal behaviors of tasks comprehensively, and an integrated behavior-based schedulingalgorithm, which efficiently utilizes the asymmetric multi-core processors withoutfrequent task migration. This guarantees the load balance between cores. In addition, thealgorithm achieves universality with a flexible parameter adjustment mechanism. It is analgorithm to achieve universality as well as the first to handle the global and localbehaviors of tasks comprehensively. The evaluation on real platform demonstrates thatthe algorithm is universal for different conditions, and it always outperforms otherscheduling algorithms on asymmetric multi-core processors (by6%~22%).
     (2) To solve the problem of multi-threaded task scheduling in OS on asymmetric multi-coreprocessors, this dissertation models and analyzes the problem, taking various factors intoaccount. Moreover, this dissertation proposes an integrated scheduling algorithm, with three features:1) comprehensively considering synchronization characteristics ofmulti-threaded tasks, asymmetry and load of cores;2) integrating thread scheduling anddynamic voltage and frequency scaling (DVFS) in OS to improve energy efficiency;3)achieving universality with a flexible parameter adjustment mechanism. It is the firstalgorithm to exploit thread scheduling and DVFS on AMP simultaneously. Theevaluation on real platform demonstrates that the algorithm is universal for differentconditions and it always outperforms other scheduling algorithms on asymmetricmulti-core processors (by24%~50%).
     (3) To solve the problem of Virtual CPU (VCPU) fair scheduling on asymmetric multi-coreprocessors, this dissertation models and analyzes the problem, taking various factors intoaccount. Moreover, this dissertation proposes a composite scheduling algorithm, withthree features:1) comprehensively considering synchronization characteristics of VCPUs,asymmetry and load of cores;2) defining the concepts of utility factor, scaled factor andscaled resource to measure the load and resource comprehensively, in whichsynchronization characteristics of VCPUs and asymmetry of cores are taken into account;3) decomposing the run queues of cores to reduce overhead of scheduling. It is the firstalgorithm to exploit the synchronization characteristics of VCPUs on AMP. Theevaluation on real platform demonstrates that the algorithm achieves fair scheduling andit always outperforms other scheduling algorithms on asymmetric multi-core processors(by19%~48%).
     (4) To solve the problem of VCPU high energy efficient scheduling on asymmetricmulti-core processors, this dissertation proposes a Parallelism-aware Scheduler (PS),which integrates VCPU scheduling and DVFS. PS dynamically detects parallelism of thedomain with a non-invasive approach, then determines and executes co-scheduling ofVCPUs on asymmetric cores. With a novel delay co-scheduling algorithm, differentparallel domains are enabled to perform co-scheduling simultaneously without conflictsin PS. The evaluation on real platform demonstrates that the performance and energyefficiency improvement achieved by PS are significant, reaching up to26%and65% respectively. Furthermore, the overheads of PS are comparable to those of the nativescheduler, and lower than those of state-of-the-art asymmetry-aware hypervisorschedulers.
引文
[1]陈国良,孙广中,徐云,等.并行计算的一体化研究现状与发展趋势[J].科学通报,2009,54(8):1043-1049
    [2] Asanovic K., Bodik R., Demmel J., et al. A View of the Parallel Computing Landscape [J].Communications of the ACM,2009,52(10):56-67
    [3] Manferdelli J.L., Govindaraju N.K., Crall C. Challenges and Opportunities in Many-coreComputing [J]. Proceedings of the IEEE,2008,96(5):808-815
    [4] Kongetira P., Aingaran K., Olukotun K. Niagara: A32-Way Multithreaded Sparc Processor[J]. IEEE Micro,2005,25(2):21-29
    [5] Li T., Brett P., Knauerhase R., et al. Operating System Support for Overlapping-ISAHeterogeneous Multi-core Architectures[A]. In: Matthew T. Jacob C R D, editor. IEEE16th International Symposium on High Performance Computer Architecture[C],Bangalore: IEEE Computer Society,2010:1-12
    [6] Hill M.D., Marty M.R. Amdahl's Law in the Multicore Era [J]. IEEE Computer,2008,41(7):33-38
    [7] Sun Xian-He, Chen Yong. Reevaluating Amdahl's Law in the Multicore Era [J]. Journal ofParallel and Distributed Computing,2010,70(2):183-188
    [8] Fedorova A., Saez J.C., Shelepov D., et al. Maximizing Power Efficiency withAsymmetric Multicore Systems [J]. Communications of the ACM,2009,52(12):48-57
    [9] Blake G., Dreslinski R.G., Mudge T. A Survey of Multicore Processors [J]. IEEE SignalProcessing Magazine,2009,26(6):26-37
    [10] Annavaram M., Grochowski E., Shen J. Mitigating Amdahl's law through epithrottling[A]. Proceedings of the32st International Symposium on ComputerArchitecture[C], Madison: IEEE Computer Society,2005:298-309
    [11] Kumar R., Tullsen D.M., Ranganathan P., et al. Single-ISA heterogeneous multi-corearchitectures for multithreaded workload performance[A]. In: Dubois M B A, editor.Proceedings of the31st Annual International Symposium on Computer Architecture[C],Munich: IEEE Computer Society,2004:64-75
    [12] Kumar R., Farkas K.I., Jouppi N.P., et al. Single-isa heterogeneous multi-corearchitectures: the potential for processor power reduction[A]. Proceedings of the36thannual IEEE/ACM International Symposium on Microarchitecture[C], San Diego: IEEEComputer Society,2003:81-92
    [13] Li T., Baumberger D., Koufaty D.A., et al. Efficient operating system scheduling forperformance-asymmetric multi-core architectures[A]. In: B V, editor. Proceedings of theConference on High Performance Computing Networking, Storage and Analysis[C], Reno:ACM,2007:1-11
    [14] Mogul J.C., Mudigonda J., Binkert N., et al. Using asymmetric single-isa cmps to saveenergy on operating systems [J]. IEEE Micro,2008,28(3):26-41
    [15] Kahle J. A., Day M.N., Hofstee H.P., et al. Introduction to the Cell Multiprocessor [J].IBM Journal Research And Development,2005,49(4/5):589-604
    [16] Saez J.C., Fedorova A., Koufaty D., et al. Leveraging Core Specialization via OSScheduling to Improve Performance on Asymmetric Multicore Systems [J]. ACMTransactions on Computer Systems,2012,30(2):6:1-6:38
    [17] J.C. Saez. Thread Scheduling on Asymmetric Multicore Systems [D]: ComplutenseUniversity of Madrid,2011
    [18] Li T., Baumberger D., Hahn S. Efficient and Scalable Multiprocessor Fair SchedulingUsing Distributed Weighted Round-robin[A]. In: Reed D, editor. Proceedings of the14thACM SIGPLAN symposium on Principles and practice of parallel programming[C], NewYork: ACM,2009:65-74
    [19] Hofmeyr S., Iancu C., Blagojevi'c F. Load Balancing on Speed[A]. In: R. GovindarajanD P, editor. Proceedings of the15th ACM SIGPLAN symposium on Principles andpractice of parallel programming[C], Raleigh: ACM,2010:147-157
    [20] Hofmeyr S., Colmenares J.A., Iancu C., et al. Juggle: Proactive Load Balancing onMulticore Computers[A]. In: Maccabe B, editor. Proceedings of the20th InternationalACM Symposium on High-Performance Parallel and Distributed Computing[C], San Jose:ACM,2011:3-14
    [21] Saez J.C., Shelepov D., Fedorova A., et al. Leveraging workload diversity through OSscheduling to maximize performance on single-ISA heterogeneous multicore systems [J].Journal of Parallel and Distributed Computing,2011,71(1):114-131
    [22] Suleman M.A., Mutlu O., Qureshi M.K., et al. Accelerating Critical Section Executionwith AsymmetricMulti-Core Architectures[A]. In: Soffa M L, editor. Proceedings of the14th international conference on Architectural support for programming languages andoperating systems[C], Washington, DC: ACM,2009:253-264
    [23] M. Gschwind. The Cell Broadband Engine: exploiting multiple levels of parallelism in achip multiprocessor [J]. International Journal of Parallel Programming,2007,35(3):233-262
    [24] Perez J.P., Bellens P., Badia R.M., et al. CellSs: Making it Easier to Program The CellBroadband Engine Processor [J]. IBM Journal of Research and Development,2007,51(5):593-604
    [25] Koufaty D., Reddy D., Hahn S., et al. Bias scheduling in heterogeneous multi-corearchitectures[A]. In: Morin C M G, editor. Proc of the5th European Conf on ComputerSystems[C], New York: ACM,2010:125-138
    [26] Becchi M., Crowley P. Dynamic thread assignment on heterogeneous multiprocessorarchitectures[A]. In: U B, editor. Proceedings of the3rd Conference on ComputingFrontiers[C], Ischia: ACM,2006:29-40
    [27] Kwon Y., Kim C., Maeng S., et al. Virtualizing performance asymmetric multi-coresystems[A]. In: Ravi Iyer Q Y, Antonio González, editor. Proceedings of the38th annualinternational symposium on Computer architecture[C], San Jose: ACM,2011:45-56
    [28] Weng Chuliang, Wang Zhigang, Li Minglu, et al. The hybrid scheduling framework forvirtual machine systems[A]. In: Antony L. Hosking D F B, Orran Krieger, editor.Proceedings of the2009ACM SIGPLAN/SIGOPS international conference on Virtualexecution environments[C], Washington, DC: ACM,2009:111-120
    [29] Sukwong O., Kim H.S. Is co-scheduling too expensive for SMP VMs?[A]. In: ChristophM. Kirsch G H, editor. Proceedings of the sixth conference on Computer systems[C],Salzburg: ACM,2011:257-272
    [30] Weng Chuliang, Liu Qian, Yu Lei, et al. Dynamic adaptive scheduling for virtualmachines[A]. In: Arthur B. Maccabe D T, editor. Proceedings of the20th internationalsymposium on High performance distributed computing[C], San Jose: ACM,2011:239-250
    [31] Uhlig V., LeVasseur J., Skoglund E., et al. Towards scalable multiprocessor virtualmachines[A]. Proceedings of the3rd conference on Virtual Machine Research andTechnology Symposium-Volume3[C], San Jose: USENIX Association,2004:4-4
    [32] Tam D.K., Azimi R., Soares L.B., et al. RapidMRC: Approximating L2miss rate curveson commodity systems for online optimizations[A]. In: Soffa ML I M, editor. Proceedingsof the14th international conference on Architectural support for programming languagesand operating systems[C], Washington, DC: ACM,2009:
    [33] Sherwood T., Perelman E., Hamerly G., et al. Discovering and exploiting program phases[J]. IEEE Micro,2003,23(6):84-93
    [34] Dhodapkar A.S., Smith J.E. Comparing program phase detection techniques[A]. In: BMS, editor. Proceedings of the36th Annual IEEE/ACM International Symposium onMicroarchitecture[C], San Diego: IEEE Computer Society,2003:217-227
    [35] Jiang Wei, Zhou Yisu, Cui Yan, et al. Cfs optimizations to kvm threads on multi-coreenvironment[A]. Proceedings of the200915th International Conference on Parallel andDistributed Systems[C], Shenzhen: ACM,2009:248-354
    [36] Feitelson D.G., Rudolph L., Schwiegelshohn U. Parallel job scheduling-a statusreport[A]. Proceedings of the10th international conference on Job SchedulingStrategies for Parallel Processing[C], New York: Springer-Verlag,2004:1-16
    [37] J.K. Ousterhout. Scheduling techniques for concurrent systems[A]. Proceedings of the3rd International Conference on Distributed Computing Systems[C], Miami/Ft.Lauderdale: IEEE Computer Society,1982:22-30
    [38]邓亚丹,景宁,熊伟.多核处理器中基于Radix-Join的嵌套循环连接优化[J].计算机研究与发展,2010,47(6):1079-1087
    [39] Kumar R., Tullsen D.M., Jouppi N.P., et al. Heterogeneous chip multiprocesors [J]. IEEEComputer,2005,38(11):32-38
    [40] W. Mauerer. Professional Linux Kernel Architecture[M]. Hoboken: John Wiley&Sons,2008
    [41] D. Chisnall. The Definitive Guide to the Xen Hypervisor[M]. Upper Saddle River:Prentice hall,2007
    [42] Zhong Yutao, Shen Xipeng, Ding Chen. Program locality analysis using reuse distance[J]. ACM Transactions on Programming Languages and Systems,2009,31(6):21-59
    [43] Azimi R., Tam D.K., Soares L., et al. Enhancing operating system support for multicoreprocessors by using hardware performance monitoring [J]. ACM SIGOPS OperatingSystems Review,2009,43(2):56-65
    [44] Shelepov D., Alcaide J.C.S., Jeffery S., et al. HASS: A scheduler for heterogeneousmulticore systems [J]. ACM SIGOPS Operating Systems Review,2009,43(2):66-75
    [45] Chen J., John L.K. Efficient program scheduling for heterogeneous multi-coreprocessors[A]. In: AB K, editor. Proceedings of the46th Annual Design AutomationConference[C], San Francisco: ACM,2009:927-930
    [46] Chen J., John L.K. Energy-Aware application scheduling on a heterogeneous multi-coresystem[A]. In: Christie D L A, Mutlu O, Zorn BG, editor. Proceedings of the2008IEEEInternational Symposium on Workload Characterization[C], Seattle: IEEE ComputerSociety,2008:5-13
    [47]蒋建春,汪同庆.异构多核处理器的任务调度算法[J].计算机工程与应用,2009,45(33):52-56
    [48]彭蔓蔓,徐立超,王颖.异构多核处理器的任务分配及能耗的研究[J].计算机应用研究,2010,57(5):1729-1731
    [49] Teodorescu R., Torrellas J. Variation-aware application scheduling and powermanagement for chip multiprocessors[A]. Proceedings of the35th InternationalSymposium on Computer Architecture[C], Beijing: IEEE Computer Society,2008:363-374
    [50] Winter J.A., Albonesi D.H., Shoemaker C.A. Scalable thread scheduling and globalpower management for heterogeneous many-core architectures[A]. In: Salapura V G M,Knoop J, editor. Proceedings of the19th International Conference on ParallelArchitectures and Compilation Techniques[C], Vienna: ACM,2010:29-40
    [51] Saez J.C., Fedorova A., Prieto M., et al. Operating system support for mitigating softwarescalability bottlenecks on asymmetric multicore processors[A]. In: N A, editor.Proceedings of the7th ACM International Conference on Computing Frontiers[C],Bertinoro: ACM,2010:31-40
    [52] Lakshminarayana N.B., Lee J., Kim H. Age based scheduling for asymmetricmultiprocessors[A]. In: W P, editor. Proceedings of the Conference on High PerformanceComputing Networking, Storage and Analysis[C], Portland: ACM,2009:1-12
    [53] Saez J.C., Prieto M., Fedorova A., et al. A comprehensive scheduler for asymmetricmulticore systems[A]. In: Morin C M G, editor. Proceedings of the5th EuropeanConference on Computer Systems[C], New York: ACM,2010:139-152
    [54]林闯,田源,姚敏.绿色网络和绿色评价:节能机制、模型和评价[J].计算机学报,2011,34(4):593-612
    [55] D. Geer. Industry Trends: Chip Makers Turn to Multicore Processors [J]. Computer,2005,38(5):11-13
    [56] Gonzalez R., Horowitz M. Energy dissipation in general purpose microprocessors [J].IEEE Journal of Solid-State Circuits,1996,31(9):1277-1284
    [57] Rangan K.K., Wei Gu-Yeon, Brooks D. Thread motion: fine-grained power managementfor multi-core systems[A]. In: Stephen W. Keckler L A B, editor. Proceedings of the36stAnnual International Symposium on Computer Architecture[C], Austin: ACM,2009:302-313
    [58] Lee W., Frank M., Lee V., et al. Implications of I/O for gang scheduled workloads[A].Proceedings of the Job Scheduling Strategies for Parallel Processing[C], Geneva:Springer-Verlag,1997:215-237
    [59] C. Bienia. Benchmarking modern multiprocessors [D]. Princeton: Princeton University,2011
    [60] Kazempour V., Kamali A., Fedorova A. AASH: an asymmetry-aware scheduler forhypervisors[A]. The2010ACM SIGPLAN/SIGOPS International Conference onVirtual Execution Environments[C], Pittsburgh: ACM,2010:85-96
    [61]冯登国,张敏,张妍,等.云计算安全研究[J].软件学报,2011,22(1):71-83
    [62] Buyya R., Yeo C.S., Venugopal S., et al. Cloud computing and emerging IT platforms:vision, hype, and reality for delivering computing as the5th utility [J]. Future GenerationComputer Systems,2009,25(6):599-616
    [63] Bai Yuebin, Xu Cong, Li Zhi. Task-aware based co-scheduling for virtual machinesystem[A]. In: Sung Y. Shin S O, Michael Schumacher, Mathew J. Palakal, Chih-ChengHung, editor. Proceedings of the2010ACM Symposium on Applied Computing[C],Sierre: ACM,2010:181-188
    [64] Kumar V., Fedorova A. Towards better performance per watt in virtual environments onasymmetric single-ISA multi-core systems [J]. ACM SIGOPS Operating Systems Review,2009,43(3):105-109
    [65] Rosenblum M., Garfinkel T. Virtual machine monitors: current technology and futuretrends [J]. Computer,2005,38(5):39-47
    [66] Tickoo O., Iyer R., Illikkal R. Modeling virtual machine performance: challenges andapproaches [J]. ACM SIGMETRICS Performance Evaluation Review,2009,37(3):55-60
    [67] Cherkasova L., Gupta D., Vahdat A. Comparison of the three CPU schedulers in Xen [J].ACM SIGMETRICS Performance Evaluation Review,2007,35(2):42-51
    [68] Shue D., Freedman M.J., Shaikh A. Performance isolation and fairness for multi-tenantcloud storage[A]. Proceedings of the10th USENIX conference on Operating SystemsDesign and Implementation[C], Hollywood: USENIX Association,2012:349-362
    [69] Lama P., Zhou X. NINEPIN: Non-invasive and energy efficient performance isolation invirtualized servers[A]. In: Swarz R.S. K P, Cukier M., editor.201242nd AnnualIEEE/IFIP International Conference on Dependable Systems and Networks[C], Boston:IEEE Computer Society,2012:1-12
    [70] Beloglazov A., Buyya R. Energy Efficient Resource Management in Virtualized CloudData Centers[A]. Proceedings of the201010th IEEE/ACM International Conference onCluster, Cloud and Grid Computing[C], Melbourne: IEEE Computer Society,2010:826-831
    [71] Brown D.J., Reams C. Toward energy-efficient computing [J]. Communications of theACM,2010,53(3):50-58
    [72] Gupta V., Nathuji R., Schwan K., et al. An analysis of power reduction in datacentersusing heterogeneous chip multiprocessors [J]. ACM SIGMETRICS PerformanceEvaluation Review,2011,39(3):87-91
    [73] Kivity A., Kamay Y., Laor D., et al. KVM: the Linux virtual machine monitor[A].Proceedings of the Linux Symposium2007[C], Ottawa: Linux Symposium,2007:255-230
    [74] Bertran R., Becerra Y., Carrera D., et al. Energy accounting for shared virtualizedenvironments under dvfs using pmc-based power models [J]. Future GenerationComputer Systems,2012,28(2):457-468
    [75] Meng K., Joseph R., Dick R.P., et al. Multi-optimization power management for chipmultiprocessors[A]. In: Andreas Moshovos D T, Kunle Olukotun, editor. Proceedings ofthe17th international conference on Parallel architectures and compilation techniques[C],Toronto: ACM,2008:177-186
    [76] Albayraktaroglu K., Jaleel A., Wu X., et al. Biobench: A benchmark suite ofbioinformatics applications[A]. IEEE International Symposium on PerformanceAnalysis of Systems and Software[C], Austin: IEEE Computer Society,2005:2-9
    [77] Ozisikyilmaz B., Narayanan R., Zambreno J., et al. An architectural characterizationstudy of data mining and bioinformatics workloads[A]. Proceedings of the2006IEEEInternational Symposium on Workload Characterization[C], San Jose: IEEE ComputerSociety,2006:61-70
    [78] Hennessy J., Patterson D. Computer Architecture: A Quantitative Approach4thedition[M]. San Francisco: Morgan Kauffman,2007

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700