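Job Management Research in Grid Interoperability System (网格互操作系统的作业管理研究)

English title: Job Management Research in Grid Interoperability System
Author: Xu Zhongqing (许中清)
Degree level: Master's
Discipline: Computer Software and Theory
Chinese keywords: 网格 ; 作业调度 ; 作业管理 ; 互操作
English keywords: grid ; job scheduling ; job management ; interoperability
Degree year: 2007
Advisor: Li Shengli (李胜利)
Discipline code: 081202
Degree-granting institution: Huazhong University of Science and Technology
Submission date: 2007-02-01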

Abstract

As science and technology advance, many fields demand ever more computing resources, and a single computer can hardly meet such requirements. Meanwhile, the Internet holds large numbers of dispersed, heterogeneous, and autonomous resources. Supply and demand are hard to balance, so resources are wasted. Grid technology has made substantial progress in recent years, and scientists at universities and research centers worldwide are working on the development of grid computing. Although their projects differ, they share one principal goal: to give users access to distributed resources. Grid middleware eliminates isolated information islands on the Internet. However, each grid system focuses on a different application area, and the overall frameworks and development engines of these systems are heterogeneous. These factors create new information islands among grid systems.
     To address this heterogeneity, the CSGrid (China e-Science Grid) grid interoperability system has successfully integrated CGSP (ChinaGrid Support Platform) and VEGA (Vega Grid Operating System, VEGA GOS) through mechanisms such as job management interoperability, information service interoperability, and security policy interoperability. As the core module of CSGrid, the job management interoperability system manages the whole process of using computing resources, hides the heterogeneity of the target systems, and provides users with a uniform interface and application programming interface (API). In addition, CSGrid adopts a plugin-based interoperability technique. The plugin layer sits between CSGrid and the target grid systems: it receives client requests expressed in a uniform parameter model, converts them into a parameter model compatible with the target system, and passes them to the target job management system.
     CSGrid demonstrates that interoperation among heterogeneous grid systems is feasible without any changes to those systems. It implements the basic functions of grid interoperability and gives grid users a transparent interface for accessing heterogeneous grid systems. Experimental results show that the adaptive resource scheduling algorithm achieves 13.4% lower access latency than the random scheduling algorithm.
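
The plugin layer described in the abstract can be pictured as an adapter pattern. The following is only a minimal sketch under stated assumptions, not the thesis implementation: the class names (`UniformJobRequest`, `JobPlugin`, `CGSPPlugin`, `VEGAPlugin`), the `dispatch` helper, and every field name in the converted dictionaries are hypothetical, since the record does not give the actual CSGrid, CGSP, or VEGA interfaces.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UniformJobRequest:
    """Hypothetical uniform parameter model sent by the client."""
    executable: str
    arguments: List[str] = field(default_factory=list)
    cpu_count: int = 1


class JobPlugin(ABC):
    """Adapter between the interoperability layer and one target grid system."""

    @abstractmethod
    def convert(self, request: UniformJobRequest) -> dict:
        """Translate the uniform model into the target's parameter model."""

    def submit(self, request: UniformJobRequest) -> dict:
        native = self.convert(request)
        # A real plugin would pass `native` to the target job management
        # system here; this sketch returns it so the conversion is visible.
        return native


class CGSPPlugin(JobPlugin):
    def convert(self, request: UniformJobRequest) -> dict:
        # Field names are invented for illustration only.
        return {
            "target": "CGSP",
            "program": request.executable,
            "args": " ".join(request.arguments),
            "cpus": request.cpu_count,
        }


class VEGAPlugin(JobPlugin):
    def convert(self, request: UniformJobRequest) -> dict:
        # Field names are invented for illustration only.
        return {
            "target": "VEGA",
            "command": [request.executable, *request.arguments],
            "nodes": request.cpu_count,
        }


# Registry of plugins, keyed by target system name.
PLUGINS: Dict[str, JobPlugin] = {"CGSP": CGSPPlugin(), "VEGA": VEGAPlugin()}


def dispatch(target: str, request: UniformJobRequest) -> dict:
    """Route a uniform request to the plugin for the chosen target system."""
    return PLUGINS[target].submit(request)
```

Under these assumptions, the client builds one `UniformJobRequest` and calls `dispatch("CGSP", request)` or `dispatch("VEGA", request)`. Only the plugin knows the target's parameter model, which is how the target systems can remain unchanged.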
