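The Design and Implementation of a Job Management and Scheduling System in a Special Computing Cluster Group

Original title: 专用计算集群组环境中作业管理调度系统的设计与实现
Author: Zhang Yi (张毅)
Degree level: Master's
Discipline: Computer Science and Technology
Chinese keywords (translated): cluster ; scheduling ; web ; file system
English keywords: cluster ; schedule ; migration ; file system
Degree year: 2005
Advisor: Gong Zhenghu (龚正虎)
Discipline code: 081203
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Thesis submission date: 2005-12-01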

Abstract

High-performance computing (HPC) cluster systems offer powerful parallel computing and large-scale processing capability and can meet a wide range of application requirements. Large-scale parallel computation is the main usage model of cluster systems, but submitting and processing large numbers of batch jobs is also a widespread one. Under this batch workload, targeted resource allocation and scheduling policies are needed to optimize the utilization of cluster resources; this is the focus of the thesis.
     The thesis studies job scheduling and running-efficiency optimization in a cluster group environment that the author's organization dedicates to batch computational fluid dynamics (CFD) jobs. It covers four topics: job scheduling within the special computing cluster group, job migration, rapid system backup and restoration, and job submission management. The organization's computer system consists of several computing clusters dedicated to CFD, hence the name "special computing cluster group".
     To address the resource reservation plan and the efficiency of job memory use, the author designed and implemented a dedicated job scheduling system, formulated a scheduling algorithm tailored to this workload, and proposed an algorithm for estimating the memory usage of running jobs whose actual memory use changes continuously.
     To address load balancing between clusters, the author and the research team developed a job migration management system that works with the dedicated job scheduling system of each cluster. It implements a resource reservation plan across the cluster group and a mechanism that lets jobs use idle resources anywhere in the group by migrating, and it includes an algorithm for comparing candidate migration targets.
     The system uses the PVFS parallel file system to improve the I/O performance of large-scale clusters. To address the key problems that reduce the availability of PVFS, the author proposed and implemented a technique for rapidly backing up and restoring the file system.
     The thesis also discusses the design of a web-based system for cluster job submission and management.
     These results have been put into practical use with good effect. During busy periods, system utilization rose from about 80% when the clusters were first built to 95% or more, and jobs no longer wait in the queue whenever any CPU in the cluster system is idle.
