HASEonGPU—An adaptive, load-balanced MPI/GPU-code for calculating the amplified spontaneous emission in high power laser media

Authors: C.H.J. Eckert¹ (c.eckert@hzdr.de, carlchristian.eckert@gmx.de) ; E. Zenker¹ (e.zenker@hzdr.de) ; M. Bussmann (m.bussmann@hzdr.de) ; D. Albach (d.albach@hzdr.de)
Keywords: Amplified spontaneous emission ; CUDA ; GPU cluster ; Massively parallel ; Monte Carlo integration ; High power laser
Journal: Computer Physics Communications
Year of publication: 2016
Publication date: October 2016
Volume: 207
Issue: Complete
Pages: 362-374
Full-text size: 2244 K

Abstract

We present an adaptive Monte Carlo algorithm for computing the amplified spontaneous emission (ASE) flux in laser gain media pumped by pulsed lasers. With the design of high power lasers in mind, which require large gain media, we have developed the open source code HASEonGPU, which can utilize multiple graphics processing units (GPUs). With HASEonGPU, time to solution is reduced to minutes on a medium-size GPU cluster of 64 NVIDIA Tesla K20m GPUs, and excellent speedup is achieved when scaling to multiple GPUs. Comparison of simulation results to measurements of ASE in $\mathrm{Yb}^{3+}$:YAG ceramics shows perfect agreement.

Program summary

Program title: HASEonGPU

Catalogue identifier: AFAM_v1_0

Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFAM_v1_0.html

Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland

Licensing provisions: GNU General Public License, version 3

No. of lines in distributed program, including test data, etc.: 84610

No. of bytes in distributed program, including test data, etc.: 3791861

Distribution format: tar.gz

Programming language: C++, Matlab.

Computer: GPU cluster or workstation with CUDA-capable GPUs (compute capability $\geq 2.0$).

Operating system: Linux.

Has the code been vectorized or parallelized?: Yes, can utilize 1 CPU core per compatible GPU.

RAM: Several GB, depending on input size and number of GPUs. 4000000000 bytes (4 GB) per GPU is recommended.

Classification: 4.13, 6.5, 15.

External routines: CUDA, Boost Program Options, OpenMPI

Nature of problem:

The algorithm described by D. Albach in [1, 2] uses ray-tracing techniques and Monte Carlo integration to calculate Amplified Spontaneous Emission (ASE) with high precision. It requires a high number of sampling points as well as a high number of rays to reach the desired results. Additionally, reflections on the upper and lower surfaces of the medium increase the workload by an order of magnitude. On traditional CPU-based systems the computation is time-consuming, which limits the number of simulations that can be performed.
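
The need for a high number of rays follows from the way plain Monte Carlo integration converges: the standard error of the estimate shrinks only with the square root of the number of samples. The sketch below illustrates this on a toy one-dimensional integral. The integrand `toy_gain` is a hypothetical stand-in chosen for illustration, not the ASE integrand of [1, 2], and none of the names are taken from HASEonGPU.

```python
import math
import random


def mc_estimate(f, n_samples, rng):
    """Plain Monte Carlo estimate of the integral of f over [0, 1].

    Returns (estimate, standard_error).
    """
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        y = f(rng.random())
        total += y
        total_sq += y * y
    mean = total / n_samples
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    return mean, math.sqrt(variance / n_samples)


def toy_gain(x):
    # Hypothetical stand-in for a per-ray contribution; NOT the ASE integrand.
    return math.exp(3.0 * x)


EXACT = (math.exp(3.0) - 1.0) / 3.0  # analytic value of the toy integral

rng = random.Random(1234)
for n in (100, 10_000, 100_000):
    est, err = mc_estimate(toy_gain, n, rng)
    print(f"n={n:>7}  estimate={est:.4f}  std.err={err:.4f}  exact={EXACT:.4f}")
```

In this toy setting, reducing the standard error by a factor of 10 takes roughly 100 times as many samples, which is why a per-point ray count can become the dominant cost.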

Solution method:

HASEonGPU uses a non-uniform distribution of sampling points within the gain medium to focus computation on areas of interest. This is further improved by combining the Monte Carlo integration with importance sampling [3]. To reduce execution time further, the algorithm is highly parallelized to run on a GPU and supports adaptive sampling resolutions and random restarts. It can also be executed on a GPU cluster, where linear scaling is achieved by coarse-grained load balancing that distributes the workload among all GPUs in a master–worker scheme over MPI.
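
As a rough illustration of why importance sampling helps, the sketch below compares a uniform Monte Carlo estimate with an importance-sampled one on a toy integrand that grows exponentially. Everything here is an assumption made for the example: the integrand, the proposal density, and the constants `RATE_F` and `RATE_P` are hypothetical and do not reproduce the sampling scheme used in HASEonGPU.

```python
import math
import random

RATE_F = 5.0  # growth rate of the toy integrand (assumption)
RATE_P = 4.0  # growth rate of the proposal density (assumption)


def f(x):
    # Toy integrand on [0, 1]; most of its mass sits near x = 1.
    return math.exp(RATE_F * x)


def proposal_pdf(x):
    # Normalised density p(x) proportional to exp(RATE_P * x) on [0, 1].
    return RATE_P * math.exp(RATE_P * x) / (math.exp(RATE_P) - 1.0)


def proposal_sample(rng):
    # Inverse-CDF sampling from proposal_pdf.
    u = rng.random()
    return math.log(1.0 + u * (math.exp(RATE_P) - 1.0)) / RATE_P


def mean_and_stderr(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


def uniform_estimate(n, rng):
    # Draw x uniformly and average f(x).
    return mean_and_stderr([f(rng.random()) for _ in range(n)])


def importance_estimate(n, rng):
    # Draw x from the proposal and average the weights f(x) / p(x).
    weights = []
    for _ in range(n):
        x = proposal_sample(rng)
        weights.append(f(x) / proposal_pdf(x))
    return mean_and_stderr(weights)


EXACT = (math.exp(RATE_F) - 1.0) / RATE_F

rng = random.Random(42)
print("uniform    :", uniform_estimate(20_000, rng))
print("importance :", importance_estimate(20_000, rng))
print("exact      :", EXACT)
```

Both estimators target the same integral, but the importance-sampled one draws more samples where the integrand is large, so its standard error is smaller for the same number of samples.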

Restrictions:

Presently, the number of rays used for the Monte Carlo integration of a single sampling point within the gain medium is limited by the available memory on the GPU (about 10⁸ rays per GB of GPU memory). Furthermore, when using MPI as a workload distribution mechanism, one of the MPI processes will act as a scheduling master and its GPU cannot participate in the computation.
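
The two restrictions translate into simple back-of-the-envelope figures. The sketch below uses the approximate value of 10⁸ rays per GB stated above; the 4 GB memory size is an illustrative input rather than a hardware specification, and the helper names are hypothetical.

```python
RAYS_PER_GB = 10**8  # approximate figure quoted in the restrictions


def max_rays_per_sample_point(gpu_memory_gb):
    """Rough upper bound on rays for one sampling point on one GPU."""
    return int(RAYS_PER_GB * gpu_memory_gb)


def computing_fraction(n_mpi_processes):
    """Fraction of GPUs that compute when one MPI process acts as master."""
    if n_mpi_processes < 2:
        raise ValueError("MPI mode needs a master and at least one worker")
    return (n_mpi_processes - 1) / n_mpi_processes


print(max_rays_per_sample_point(4.0))  # 400000000
print(computing_fraction(64))          # 0.984375
print(computing_fraction(2))           # 0.5
```

The cost of the idle master GPU is large on very small clusters (half of the GPUs with two processes) and becomes negligible as the process count grows.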

Unusual features:

The software can run on a workstation (threaded) as well as on a large-scale GPU cluster (MPI) that provides the required GPU hardware. The simulation parameters include polychromatic laser pulses as well as surface coatings, cladding, and refractive indices of the gain medium. This also allows the simulation of reflections on the upper and lower surfaces of the medium. If a desired mean square error metric is not met with a set number of rays, the algorithm can automatically increase the number of rays to improve the results.
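
The automatic increase of the ray count can be pictured as a simple loop: estimate, check the error metric, and retry with more rays if the target is missed. The sketch below is a minimal illustration under stated assumptions. The error metric (estimated variance of the mean), the growth factor, the ray limits, and the toy `draw` function are all hypothetical and are not taken from the HASEonGPU source.

```python
import math
import random


def sample_mean_and_mse(draw, n_rays, rng):
    """Average n_rays draws and estimate the mean square error of that average."""
    values = [draw(rng) for _ in range(n_rays)]
    mean = sum(values) / n_rays
    var = sum((v - mean) ** 2 for v in values) / (n_rays - 1)
    return mean, var / n_rays


def adaptive_estimate(draw, rng, mse_target,
                      n_start=1_000, n_max=1_000_000, growth=10):
    """Repeat the estimate with more rays until the MSE target is met
    or the next increase would exceed n_max."""
    n_rays = n_start
    while True:
        mean, mse = sample_mean_and_mse(draw, n_rays, rng)
        if mse <= mse_target or n_rays * growth > n_max:
            return mean, mse, n_rays
        n_rays *= growth


def toy_ray(rng):
    # Hypothetical per-ray contribution; NOT the ASE physics.
    return math.exp(3.0 * rng.random())


rng = random.Random(2016)
mean, mse, n_used = adaptive_estimate(toy_ray, rng, mse_target=1e-3)
print(f"mean={mean:.4f}  mse={mse:.2e}  rays used={n_used}")
```

Note that this sketch discards the earlier, smaller runs and starts over with a larger ray count; whether results are reused between attempts is a design choice the summary does not specify.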

Additional comments:

The source code also includes a MATLAB script that can be used to call HASEonGPU directly from MATLAB code to integrate it into existing simulation setups. There are also examples included on how to execute HASEonGPU from the command line as well as an example experiment that uses MATLAB and the provided script. More detailed information can be found in the README file.

Running time:

Execution time varies strongly with the number of sampling points, the desired sampling resolution for each point, and the number of GPUs. A typical cylindrical gain medium of 6 cm diameter with 4210 non-uniformly distributed sampling points can be simulated with sufficient precision in 1 min on a single NVIDIA Tesla K20m GPU. Running time and precision can be further tuned through various parameters.

References:

[1] D. Albach, J.-C. Chanteloup, G. Le Touzé, Influence of ASE on the gain distribution in large size, high gain Yb³⁺:YAG slabs, Opt. Express 17 (5) (2009) 3792–3801.
[2] D. Albach, Amplified spontaneous emission and thermal management on a high average-power diode-pumped solid-state laser - the Lucia laser system, Ph.D. thesis, Palaiseau, Ecole polytechnique (2010).
[3] E. C. Anderson, Monte Carlo methods and importance sampling, 1999.
