The Design and Implementation of Cross-Domain Parallel Jobs' Resource Co-Allocation in Grid Environment

Original title (Chinese): 网格环境下跨域并行作业资源协同分配的设计和实现
Author: Xing Shaocheng (邢少程)
Degree level: Master's
Discipline: Computer Architecture
Keywords: Grid Computing; Virtual Job; Cross-Domain; Resource Co-allocation
Degree year: 2009
Advisor: Wei Xiaohui (魏晓辉)
Discipline code: 081201
Degree-granting institution: Jilin University
Thesis submission date: 2009-04-01

Abstract

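As grid computing technology continues to develop, more and more large-scale computing problems are expected to be handled in grid environments in order to achieve high running efficiency. However, the grid environment is inherently heterogeneous and autonomous, so parallel programs ported to it run into problems such as synchronization waiting and resource deadlock, which place many restrictions on executing parallel programs on the grid. Existing resource co-allocation mechanisms for grid environments, such as the DUROC protocol used by MPICH-G2 and resource reservation, cannot solve these problems effectively. To address this, this paper proposes a model for cross-domain resource co-allocation in grid environments, the Virtual Job Model (VJM). The model dispatches virtual jobs (VJobs) to seize resources in advance on behalf of the real jobs, which separates the dispatch of parallel jobs from the acquisition of resources. It also provides a mechanism for unified management of the required resources, and uses that mechanism to address load imbalance, synchronization waiting, and deadlock. Through the resource selection algorithm, VJM allocates resources dynamically according to the load of the different clusters, which optimizes the outcome of VJob dispatch and improves the execution efficiency of parallel jobs. Through the life-cycle mechanism, VJM avoids deadlock caused by resource competition among multiple parallel jobs, and when deadlock does occur, resource reorganization mitigates the waste caused by releasing resources and improves resource utilization. Moreover, VJM does not rely on a resource reservation mechanism, so it can cooperate with the vast majority of existing local schedulers through the standard GRAM protocol. Finally, we validated the soundness of the model with MPICH-G2 parallel applications.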
Research on grid computing has become increasingly popular, with applications in fields such as the life sciences, high-energy physics, and aerospace. The word "Grid" is derived from the power grid, and its aim is to let users use computing resources as easily as possible. The model brings together network resources such as computing, storage, data, communication, and software resources and shares them securely and transparently to form a large, high-performance global computer system, thereby eliminating isolated islands of information and resources.
     Before grid computing was widely used, parallel programs were usually executed on a single cluster to obtain high computing speed. With the development of grid computing, more and more parallel applications are expected to run in the grid environment to achieve higher performance. However, these applications used to run on a single cluster whose nodes shared the same machine architecture, so migrating them to the autonomous, heterogeneous, distributed grid environment causes problems such as synchronization waiting, deadlock, and load imbalance. The problems become more serious under cross-domain resource co-allocation and greatly reduce the efficiency of parallel jobs.
     For resource co-allocation of cross-domain parallel jobs, existing approaches such as DUROC and resource reservation cannot solve these problems effectively. The main reason is the lack of a unified cross-domain resource manager. This paper therefore proposes the Virtual Job Model (VJM), which manages heterogeneous resources and parallel jobs in the grid environment. The model sits in the meta-scheduling layer; it co-allocates grid resources for parallel jobs and avoids deadlock. VJM does not depend on resource reservation, so it can work through the GRAM protocol with the major existing local schedulers, including OpenPBS and SGE, which do not support resource reservation. The Resource Selection Algorithm selects an optimized set of clusters by computing the minimal waiting time of each job so as to minimize the overall co-allocation time, which effectively addresses long synchronization waits. In addition, the Resource Reorganization Algorithm reduces the resource waste caused by deadlock and improves resource utilization.
     In VJM, a parallel job's resource request is not satisfied immediately, nor are its sub-jobs submitted immediately; both are handled after VJM has scheduled the resources. Whenever a parallel job's resource request arrives, VJM first calculates the load of all candidate clusters from the VJobs' log file and determines each VJob's target cluster using the formula of the Resource Selection Algorithm. VJM then distributes the VJobs according to that result. The main function of a VJob is to occupy a resource and feed information about it back to the Virtual Job Control Center (VJC). The VJC co-allocates resources synchronously on the basis of this information and finally dispatches the real jobs to the virtual jobs that have occupied resources.
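     The dispatch flow described above can be illustrated with a minimal sketch. It is not the thesis implementation: the class names (ControlCenter, ParallelJob, VJob), the three VJob states, and the rule "dispatch once every required VJob holds a resource" are assumptions chosen only to show how job dispatch is separated from resource acquisition.

```python
from dataclasses import dataclass, field
from enum import Enum


class VJobState(Enum):
    PENDING = "pending"   # queued at a local scheduler, no resource yet
    HOLDING = "holding"   # running on a node and holding the resource
    BOUND = "bound"       # a real sub-job has been handed to this VJob


@dataclass
class VJob:
    vjob_id: str
    cluster: str
    state: VJobState = VJobState.PENDING


@dataclass
class ParallelJob:
    job_id: str
    needed: int                          # sub-jobs that must start together
    vjobs: list = field(default_factory=list)

    def holding(self):
        return [v for v in self.vjobs if v.state is VJobState.HOLDING]


class ControlCenter:
    """Toy stand-in for the VJC (hypothetical API, for illustration only)."""

    def __init__(self):
        self.jobs = {}

    def submit(self, job, target_clusters):
        # One VJob per target; the targets would come from resource selection.
        for i, cluster in enumerate(target_clusters):
            job.vjobs.append(VJob(f"{job.job_id}-v{i}", cluster))
        self.jobs[job.job_id] = job

    def on_feedback(self, job_id, vjob_id):
        """Handle a VJob's report that it now occupies a resource.

        Returns the VJob ids that receive real sub-jobs, or an empty list
        while the co-allocation is still incomplete.
        """
        job = self.jobs[job_id]
        for v in job.vjobs:
            if v.vjob_id == vjob_id:
                v.state = VJobState.HOLDING
        ready = job.holding()
        if len(ready) < job.needed:
            return []                    # keep waiting, nothing dispatched
        chosen = ready[: job.needed]
        for v in chosen:
            v.state = VJobState.BOUND
        return [v.vjob_id for v in chosen]
```

     In this sketch the real sub-jobs start only when the last required VJob reports in, so no sub-job runs while its peers are still queued.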
     Grid resources are distributed across many administrative domains, each of which enforces its own local policies for external jobs. These domains are therefore treated as black boxes: detailed load information is generally unknown to grid users and hard to measure, which makes reasonable resource co-allocation difficult. Because job waiting time reflects a cluster's load, we measure the load by calculating each cluster's average waiting time for external jobs, which can be obtained from the VJobs' feedback information. Another important evaluation parameter is the number of clusters involved. While a parallel application runs, its sub-jobs must communicate with each other, so the more clusters are involved, the higher the cross-domain communication overhead. VJM's resource selection algorithm uses the log information together with the number of candidate clusters to measure the dispatch cost of each VJob, thereby improving the efficiency of resource co-allocation.
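     The abstract does not give the selection formula, so the following is only a hedged sketch of the trade-off it describes. The cost function (slowest expected wait in the chosen set plus a fixed penalty per additional cluster), the `capacity` limits, and the `comm_penalty` value are all assumptions introduced for illustration, not the thesis's algorithm.

```python
from itertools import combinations
from statistics import mean


def select_clusters(log, capacity, needed, comm_penalty=30.0):
    """Choose a cluster set for `needed` VJobs.

    log:       cluster -> list of past external-job waits (seconds),
               as would be gathered from VJob feedback
    capacity:  cluster -> assumed maximum number of VJobs it may host
    Cost of a set = largest average wait in the set
                    + comm_penalty * (number of clusters - 1).
    Exhaustive search; only sensible for a handful of clusters.
    """
    waits = {c: mean(w) for c, w in log.items() if w}
    best_cost, best_set = None, None
    for size in range(1, len(waits) + 1):
        for subset in combinations(sorted(waits), size):
            if sum(capacity[c] for c in subset) < needed:
                continue                  # cannot host every VJob
            cost = max(waits[c] for c in subset) + comm_penalty * (size - 1)
            if best_cost is None or cost < best_cost:
                best_cost, best_set = cost, subset
    if best_set is None:
        raise ValueError("not enough capacity across candidate clusters")
    return list(best_set), best_cost


log = {"A": [10, 20], "B": [40, 60], "C": [100]}
capacity = {"A": 4, "B": 4, "C": 8}
spread, spread_cost = select_clusters(log, capacity, needed=8)
single, single_cost = select_clusters(log, capacity, needed=8, comm_penalty=60.0)
```

     With the default penalty, two lightly loaded clusters are preferred; raising the penalty makes the single, more heavily loaded cluster the cheaper choice, which is the balance between waiting time and cross-domain communication that the text describes.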
     VJM avoids resource allocation deadlock through unified resource management. However, concurrent submission of parallel jobs and the independence of grid nodes can still cause deadlock between VJM instances, or between VJM and other resource co-allocators. We therefore propose a life-cycle-based management mechanism to detect deadlock. A VJob should not hold resources permanently: when a VJob reaches the maximum of its life cycle and resource co-allocation is still incomplete, we consider that deadlock has occurred among several parallel jobs, and resources must be released to break it.
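     A minimal sketch of the life-cycle check, assuming each VJob records when it acquired its resource and carries a maximum lifetime. The names HeldVJob and expired_vjobs and the timestamp representation are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeldVJob:
    vjob_id: str
    job_id: str
    acquired_at: float    # time the resource was occupied (seconds)
    max_lifetime: float   # assumed per-VJob life-cycle limit (seconds)


def expired_vjobs(held, completed_jobs, now):
    """Return VJobs whose life cycle ended before co-allocation completed.

    A non-empty result is treated as a suspected deadlock: these VJobs
    are the candidates for releasing their resources.
    """
    return [
        v for v in held
        if v.job_id not in completed_jobs
        and now - v.acquired_at >= v.max_lifetime
    ]


held = [
    HeldVJob("v1", "j1", acquired_at=0.0, max_lifetime=300.0),
    HeldVJob("v2", "j2", acquired_at=200.0, max_lifetime=300.0),
    HeldVJob("v3", "j3", acquired_at=0.0, max_lifetime=300.0),
]
suspects = expired_vjobs(held, completed_jobs={"j3"}, now=400.0)
```

     Passing `now` explicitly keeps the check deterministic; a timeout of this kind detects suspected deadlock rather than proving it, so the lifetime has to be chosen with normal queue waits in mind.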
     However, when resources are released, other parallel jobs may not yet have obtained enough resources, so not all resources need to be released. The Resource Reorganization Algorithm exchanges resources among different parallel jobs of the same submitter. Exchanging between a VJob that must release its resource and a VJob that has not yet obtained one reduces the waste of releasing resources and also lets VJobs submitted later obtain resources ahead of time, so the algorithm improves resource utilization to a certain extent.
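     The reorganization idea can be sketched as follows. This is a simplification under stated assumptions: an expiring VJob's resource is handed to another job of the same submitter that is still short of resources, and is released only if no such job exists. The function name and data shapes are hypothetical, and the thesis's actual exchange procedure may differ.

```python
def reorganize(expiring, shortfall):
    """Reassign expiring VJobs to jobs that still lack resources.

    expiring:  list of (vjob_id, owner_job_id) pairs due for release
    shortfall: job_id -> number of resources the job is still missing
               (all jobs assumed to belong to the same submitter)
    Returns (reassigned, released): `reassigned` maps vjob_id to its new
    job, `released` lists VJobs that no other job could use.
    """
    remaining = dict(shortfall)          # do not mutate the caller's dict
    reassigned, released = {}, []
    for vjob_id, owner in expiring:
        taker = next(
            (j for j, n in remaining.items() if n > 0 and j != owner),
            None,
        )
        if taker is None:
            released.append(vjob_id)     # nobody needs it: really release
        else:
            reassigned[vjob_id] = taker  # hand the held resource over
            remaining[taker] -= 1
    return reassigned, released


moved, freed = reorganize(
    expiring=[("v1", "j1"), ("v2", "j1"), ("v3", "j1")],
    shortfall={"j1": 2, "j2": 2},
)
```

     Here two of the three expiring resources go to job j2 and only one is returned to its cluster, which is the reduction in waste that the paragraph above refers to.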
     The implementation can be used independently or as the resource module of a meta-scheduler. The VJC consists of the following components: RequestHandler, CerManager, VJobManager, VJobPool, VJobDeliver and VJobControllor. RequestHandler receives resource requests and user control instructions and forwards them to the relevant internal components. CerManager is in charge of authenticating resource users and resource requesters. VJobManager is the core manager of VJobs; the Resource Selection Algorithm and the Resource Reorganization Algorithm are implemented in this component. VJobPool is the container that stores VJob information. A VJobProxy is the information unit that corresponds to a VJob and contains its resource information and status. VJobDeliver distributes VJobs to their target clusters through the GRAM protocol.
     VJobControllor handles communication between the VJC and the VJobs, such as control information and VJob status information. This paper analyses the problems that arise when parallel jobs run in a grid environment and designs the Virtual Job Model (VJM), which supports synchronous cross-domain resource allocation. VJM sits in the scheduling layer and manages grid parallel jobs and heterogeneous resources. By using virtual jobs, VJM can run synchronous parallel jobs on cross-domain, heterogeneous resources and avoid deadlock, while the Resource Selection Algorithm and the Resource Reorganization Algorithm reduce resource waste and improve resource utilization. Moreover, because it does not depend on resource reservation, VJM can work with almost all kinds of local schedulers via the standard Grid Resource Allocation and Management (GRAM) protocol. We have validated the soundness of VJM with MPICH-G2 parallel applications.
     In future research, we will study parallel applications in various scientific fields in depth, summarize their general features, and develop VJM's scheduling algorithm based on those features to improve application performance.
