Abstract
Large scientific simulations are increasingly composed of multiple parallel computational models that together form coherent multiphysics applications. Coupling parallel codes makes it challenging to execute them efficiently on existing high-performance computing (HPC) systems. Three common execution models for multiphysics applications are: (1) creating a single executable made up of tightly coupled models, (2) using distributed computing workflows and grid technologies to execute models across multiple machines, and (3) running loosely coupled, multiple executables within a single batch allocation. In each case, the computational characteristics of the models and the frequency, volume and medium of data communication determine the appropriate execution model. This dissertation examines how resource management within the batch allocation facilitates the third execution model: multiple-component, multiple-data (MCMD). The inherent concurrency of computational models is leveraged to provide a flexible, efficient and powerful execution environment for MCMD applications.
This dissertation presents an MCMD framework created for a fusion energy project, and a coupled simulation modeling tool to demonstrate the efficiency gains that are possible using sub-batch allocation resource management (SBA-RM) on HPC systems. The design, implementation and resource utilization analysis of a fusion energy simulation framework provide a case study of the requirements that drive the development of an MCMD framework and some of the constraints that lead to less-than-desirable resource utilization. To explore the resource utilization problem for other types of workloads, the Resource Usage Simulator (RUS) was developed to experimentally examine the efficiency of MCMD applications with SBA-RM. Finally, this dissertation presents the characteristics of workloads that benefit most from SBA-RM as determined by studies performed with RUS.