Abstract
In scientific experiments and engineering, many pieces of software often have to cooperate to achieve a particular goal. Motivated by the usability of supercomputing resources and by the need to integrate and coordinate multiple kinds of software, this paper describes a scientific workflow application platform for the HPC (supercomputing) environment. The platform contains an asynchronous, highly concurrent engine that executes workflow applications. The scheduler, the planner and the engine are decoupled from one another so that the three components can evolve independently. The scheduler implements the scheduling algorithms, the planner collects the information needed for scheduling, and the engine drives the workflow. The scheduler and planner use a local resource pooling mechanism and an advance resource reservation algorithm to improve workflow execution performance. The paper also implements five commonly used scheduling algorithms, compares their performance on a set of test graphs, and offers recommendations on algorithm selection. Real applications show that the engine supports flexible execution strategies for complex workflows. They also show that the resource scheduling scheme enables efficient workflow execution in the supercomputing environment.
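The separation of scheduler, planner and engine can be illustrated with a minimal Python sketch. It is built on several assumptions, and none of it is the platform's actual code:

- **Hypothetical names.** `Plan`, `Scheduler`, `EftScheduler`, `Engine` and `upward_rank` are illustrative only.
- **Local resource pool.** The pool is modelled as a fixed set of pre-acquired nodes. "Advance reservation" is reduced to tracking the end of the latest booked interval on each node.
- **Scheduling algorithm.** The pluggable algorithm is a swapped-in, simplified (non-insertion) variant of HEFT-style earliest-finish-time list scheduling. The paper's own reservation algorithm and its five evaluated algorithms are not reproduced here.
- **Engine behaviour.** The engine honours only dependencies and node assignment, not the planned start times.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List, Protocol, Tuple

Placement = Dict[str, Tuple[str, float, float]]  # task -> (node, start, finish)


@dataclass
class Plan:
    """Hypothetical planner output: everything the scheduler needs to decide."""
    cost: Dict[str, Dict[str, float]]  # cost[task][node]: run time on a pooled node
    succ: Dict[str, List[str]]         # DAG edges: task -> successor tasks
    comm: Dict[Tuple[str, str], float] = field(default_factory=dict)  # cross-node transfer time

    def preds(self, task: str) -> List[str]:
        return [p for p, ss in self.succ.items() if task in ss]


class Scheduler(Protocol):
    """Any scheduling algorithm plugs in through this single method."""
    def schedule(self, plan: Plan) -> Placement: ...


def upward_rank(plan: Plan) -> Dict[str, float]:
    """Mean task cost plus the longest (transfer + rank) path to an exit task."""
    rank: Dict[str, float] = {}

    def r(t: str) -> float:
        if t not in rank:
            mean = sum(plan.cost[t].values()) / len(plan.cost[t])
            rank[t] = mean + max((plan.comm.get((t, s), 0.0) + r(s)
                                  for s in plan.succ.get(t, [])), default=0.0)
        return rank[t]

    for t in plan.cost:
        r(t)
    return rank


class EftScheduler:
    """Simplified HEFT-style list scheduling (no insertion into idle gaps).
    With strictly positive costs, decreasing upward rank is a topological order."""
    def schedule(self, plan: Plan) -> Placement:
        rank = upward_rank(plan)
        nodes = list(next(iter(plan.cost.values())))  # the locally pooled nodes
        free_at = {n: 0.0 for n in nodes}  # end of the latest reservation per node
        placed: Placement = {}
        for t in sorted(plan.cost, key=rank.__getitem__, reverse=True):
            best = None
            for n in nodes:
                # Output of a predecessor on another node pays the transfer cost.
                ready = max((placed[p][2] + (0.0 if placed[p][0] == n
                                             else plan.comm.get((p, t), 0.0))
                             for p in plan.preds(t)), default=0.0)
                start = max(ready, free_at[n])
                finish = start + plan.cost[t][n]
                if best is None or finish < best[2]:
                    best = (n, start, finish)
            placed[t] = best
            free_at[best[0]] = best[2]  # reserve [start, finish) on the chosen node
        return placed


class Engine:
    """Drives the DAG asynchronously; it never decides where tasks run."""
    def __init__(self, scheduler: Scheduler) -> None:
        self.scheduler = scheduler

    async def execute(self, plan: Plan,
                      run: Callable[[str, str], Awaitable[None]]) -> Placement:
        placement = self.scheduler.schedule(plan)
        done = {t: asyncio.Event() for t in plan.cost}

        async def drive(t: str) -> None:
            for p in plan.preds(t):
                await done[p].wait()  # start only after every predecessor finished
            await run(t, placement[t][0])
            done[t].set()

        await asyncio.gather(*(drive(t) for t in plan.cost))
        return placement


if __name__ == "__main__":
    plan = Plan(cost={"A": {"n1": 2, "n2": 3}, "B": {"n1": 3, "n2": 3},
                      "C": {"n1": 4, "n2": 2}, "D": {"n1": 1, "n2": 2}},
                succ={"A": ["B", "C"], "B": ["D"], "C": ["D"]},
                comm={("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1})

    async def run(task: str, node: str) -> None:
        print(f"run {task} on {node}")  # stand-in for submitting a job to the node
        await asyncio.sleep(0)

    print(asyncio.run(Engine(EftScheduler()).execute(plan, run)))
```

On this four-task diamond, the sketch places A, B and D on `n1` and C on `n2`, giving a planned makespan of 7. `Engine` depends only on the `Scheduler` protocol. Comparing another algorithm therefore means passing a different object with a `schedule` method, while the engine's dependency handling stays unchanged.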