Program title: HASEonGPU
Catalogue identifier: AFAM_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFAM_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: GNU General Public License, version 3
No. of lines in distributed program, including test data, etc.: 84610
No. of bytes in distributed program, including test data, etc.: 3791861
Distribution format: tar.gz
Programming language: C++, MATLAB.
Computer: GPU cluster or workstation with CUDA-capable GPUs (compute capability ≥ 2.0).
Operating system: Linux.
Has the code been vectorized or parallelized?: Yes, can utilize 1 CPU core per compatible GPU.
RAM: Several GB, depending on input size and number of GPUs. 4 GB (4,000,000,000 bytes) per GPU is recommended.
Classification: 4.13, 6.5, 15.
External routines: CUDA, Boost Program Options, OpenMPI
Nature of problem:
The algorithm described by D. Albach in [1, 2] uses ray-tracing techniques and Monte Carlo integration to calculate Amplified Spontaneous Emission (ASE) with high precision. It requires a high number of sampling points as well as a high number of rays to reach the desired results. Additionally, reflections on the upper and lower surfaces of the medium increase the workload by an order of magnitude. On traditional CPU-based systems the computation is time-consuming, which limits the number of simulations that can be performed.
Solution method:
HASEonGPU uses a non-uniform distribution of sampling points within the gain medium to focus computation on areas of interest. This is further improved by combining the Monte Carlo integration with importance sampling [3]. To reduce execution time further, the algorithm is highly parallelized to run on a GPU and supports adaptive sampling resolutions and random restarts. It can also be executed on a GPU cluster, where linear scaling is achieved by coarse-grained load balancing that distributes the workload among all GPUs in a master-worker scheme over MPI.
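To illustrate the general idea of Monte Carlo integration with importance sampling, the following is a minimal, CPU-only sketch and not the HASEonGPU implementation. The function names (importance_sampled_integral, toy_estimate) and the toy integrand are assumptions chosen for illustration: the integral of f(x) = 3x^2 over [0, 1] (exact value 1) is estimated by drawing samples from the density p(x) = 2x and weighting each sample by f(x)/p(x).

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>

// Importance-sampled Monte Carlo estimate of the integral of f over [0, 1].
// Samples are drawn from the density p through its inverse CDF, and each
// sample is weighted by f(x) / p(x) so that the estimator stays unbiased.
double importance_sampled_integral(
    const std::function<double(double)>& f,
    const std::function<double(double)>& p,
    const std::function<double(double)>& inverse_cdf,
    std::size_t n_samples,
    unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n_samples; ++i) {
        const double x = inverse_cdf(uniform(rng));
        const double density = p(x);
        if (density > 0.0) {  // skip the measure-zero point where p vanishes
            sum += f(x) / density;
        }
    }
    return sum / static_cast<double>(n_samples);
}

// Hypothetical toy problem: f(x) = 3x^2, sampled with p(x) = 2x.
// The CDF of p is x^2, so the inverse CDF is sqrt(u).
double toy_estimate(std::size_t n_samples, unsigned seed) {
    auto f = [](double x) { return 3.0 * x * x; };
    auto p = [](double x) { return 2.0 * x; };
    auto inverse_cdf = [](double u) { return std::sqrt(u); };
    return importance_sampled_integral(f, p, inverse_cdf, n_samples, seed);
}
```

Because p places more samples where f is large, the weighted samples have a variance of 0.125 here, compared with 0.8 for uniform sampling of the same integrand, so fewer samples are needed for the same precision.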
Restrictions:
Presently, the number of rays used for the Monte Carlo integration of a single sampling point within the gain medium is limited by the available memory on the GPU (about 10^8 rays per GB of GPU memory). Furthermore, when using MPI as a workload distribution mechanism, one of the MPI processes will act as a scheduling master and its GPU cannot participate in the computation.
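As a rough planning aid, both restrictions can be turned into simple arithmetic. The sketch below assumes a budget of about 10^8 rays per GB of GPU memory and one GPU per MPI process; the helper names (approx_max_rays, compute_gpus_under_mpi) are hypothetical and are not part of HASEonGPU.

```cpp
#include <cstdint>

// Approximate ray budget per GB of GPU memory (assumed order of magnitude).
constexpr std::uint64_t kRaysPerGB = 100000000ULL;

// Rough upper bound on the rays available for one sampling point.
std::uint64_t approx_max_rays(double gpu_memory_gb) {
    if (gpu_memory_gb <= 0.0) {
        return 0;
    }
    return static_cast<std::uint64_t>(gpu_memory_gb *
                                      static_cast<double>(kRaysPerGB));
}

// Number of GPUs that actually compute when one MPI process is reserved as
// the scheduling master (assumption: exactly one GPU per MPI process).
int compute_gpus_under_mpi(int mpi_processes) {
    return mpi_processes > 1 ? mpi_processes - 1 : 0;
}
```

Under these assumptions, a GPU with the recommended 4 GB allows on the order of 4 × 10^8 rays per sampling point, and an MPI run with 8 processes computes on 7 GPUs.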
Unusual features:
The software can run on a workstation (threaded) as well as on a large-scale GPU cluster (MPI) that provides the required GPU hardware. The simulation parameters include polychromatic laser pulses as well as surface coatings, cladding, and refractive indices of the gain medium. This also allows the simulation of reflections on the upper and lower surfaces of the medium. If a desired mean square error metric is not met with a set number of rays, the algorithm can automatically increase the number of rays to improve the results.
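The automatic increase of the ray count can be pictured as a loop that re-runs the estimate with more rays until an error target is met. The following is a minimal sketch under stated assumptions, not the HASEonGPU implementation: the doubling strategy, the use of the squared standard error of the mean as the error metric, and all names (AdaptiveResult, adaptive_estimate, uniform_ray) are hypothetical, and each pass draws fresh samples rather than reusing earlier ones.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>

struct AdaptiveResult {
    double estimate;        // mean contribution per ray
    double mse;             // squared standard error of that mean
    std::size_t rays_used;  // ray count of the last pass (0 if no pass ran)
};

// Doubles the ray count until the error metric drops to mse_threshold or the
// next count would exceed max_rays. trace_ray stands in for one ray's result.
AdaptiveResult adaptive_estimate(
    const std::function<double(std::mt19937&)>& trace_ray,
    std::size_t min_rays, std::size_t max_rays,
    double mse_threshold, unsigned seed) {
    std::mt19937 rng(seed);
    AdaptiveResult result{0.0, 0.0, 0};
    if (min_rays == 0) {
        min_rays = 1;  // avoid a loop that never grows
    }
    for (std::size_t rays = min_rays; rays <= max_rays; rays *= 2) {
        double sum = 0.0;
        double sum_sq = 0.0;
        for (std::size_t i = 0; i < rays; ++i) {
            const double v = trace_ray(rng);
            sum += v;
            sum_sq += v * v;
        }
        const double n = static_cast<double>(rays);
        const double mean = sum / n;
        double variance = sum_sq / n - mean * mean;
        if (variance < 0.0) {
            variance = 0.0;  // guard against rounding
        }
        result = {mean, variance / n, rays};
        if (result.mse <= mse_threshold) {
            break;  // target precision reached
        }
    }
    return result;
}

// Toy stand-in for a ray: a uniform random value in [0, 1).
double uniform_ray(std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return u(rng);
}
```

With the toy ray, whose variance is 1/12, calling adaptive_estimate(uniform_ray, 1000, 64000, 1.5e-5, 7u) needs three doublings before the error falls below the threshold. The max_rays cap plays the role of the memory limit described under Restrictions.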
Additional comments:
The source code also includes a MATLAB script that can be used to call HASEonGPU directly from MATLAB code to integrate it into existing simulation setups. There are also examples included on how to execute HASEonGPU from the command line as well as an example experiment that uses MATLAB and the provided script. More detailed information can be found in the README file.
Running time:
The execution time varies strongly with the number of sampling points, the desired sampling resolution for each point, and the number of GPUs. A typical cylindrical gain medium of 6 cm diameter with 4210 non-uniformly distributed sampling points can be simulated with sufficient precision in 1 min on a single NVIDIA Tesla K20m GPU. Running time as well as precision can be further optimized through various parameters.
References:
[1] D. Albach, J.-C. Chanteloup, G. Le Touzé, Influence of ASE on the gain distribution in large size, high gain Yb3+:YAG slabs, Opt. Express 17 (5) (2009) 3792-3801.
[2] D. Albach, Amplified spontaneous emission and thermal management on a high average-power diode-pumped solid-state laser - the Lucia laser system, Ph.D. thesis, Palaiseau, Ecole polytechnique (2010).
[3] E. C. Anderson, Monte Carlo methods and importance sampling, 1999.