Improved CUDA programs for GPU computing of Swendsen-Wang multi-cluster spin flip algorithm: 2D and 3D Ising, Potts, and XY models


Authors: Yukihiro Komura^a (corresponding author, yukihiro.komura@riken.jp); Yutaka Okabe^b
Keywords: Monte Carlo simulation; Cluster algorithm; Ising model; XY model; Parallel computing; GPU
Journal: Computer Physics Communications
Publication year: 2016
Publication date: March 2016
Volume: 200
Issue: Complete
Pages: 400-401
Full-text size: 298 K

Abstract

We present new versions of the sample CUDA programs for GPU computing of the Swendsen–Wang multi-cluster spin flip algorithm. In this update, we add the GPU-based cluster-labeling algorithm that does not use conventional iteration (Komura, 2015). For high-precision calculations, we also add a random-number generator from the cuRAND library. Moreover, we fix several bugs and remove unnecessary use of shared memory in the kernel functions.

New version program summary

Program title: SWspin_v2_0

Catalogue identifier: AERM_v2_0

Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERM_v2_0.html

Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland

Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html

No. of lines in distributed program, including test data, etc.: 6337

No. of bytes in distributed program, including test data, etc.: 26316

Distribution format: tar.gz

Programming language: C, CUDA.

Computer: System with an NVIDIA CUDA enabled GPU.

Operating system: No limits (tested on Linux).

RAM: About 2 MiB for the parameters used in the sample programs.

Classification: 23.

Catalogue identifier of previous version: AERM_v1_0

Journal reference of previous version: Comput. Phys. Comm. 185 (2014) 1038

Does the new version supersede the previous version?: No

Nature of problem: Monte Carlo simulation of classical spin systems. The Ising model, the $q$-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices.

Solution method: GPU-based Swendsen–Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation of the cluster labeling is based on the work by Hawick et al. [K.A. Hawick, A. Leist, and D.P. Playne, Parallel Computing 36 (2010) 655–678], that by Kalentev et al. [O. Kalentev, A. Rai, S. Kemnitz, and R. Schneider, J. Parallel Distrib. Comput. 71 (2011) 615–620], and that by Komura [Y. Komura, Comput. Phys. Comm. 194 (2015) 54–58].

Reasons for new version:

1. Adding the GPU-based cluster-labeling algorithm that does not use conventional iteration [1].
2. Adding a random-number generator from the cuRAND library [2] for high-precision calculations.
3. Fixing several bugs and removing unnecessary use of shared memory in the kernel functions.

Summary of revisions:

1.

Recently, we proposed a GPU-based cluster-labeling algorithm that does not use conventional iteration [1]. This algorithm does not require iterative comparison with the nearest-neighbor sites. The number of comparisons with nearest-neighbor sites is one for a two-dimensional system and two for a three-dimensional system if periodic boundary conditions are not employed. The algorithm relies on atomic functions, which execute without interference from any other thread.
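The idea of labeling without repeated neighbor comparisons can be sketched on the CPU. The following C code is a sequential illustration reconstructed from the description above, not the authors' kernels: the function names, the bond arrays `bond_x` and `bond_y`, and the choice of the upper neighbor as the first link are assumptions made for this example. It treats a two-dimensional lattice with open boundaries, where a single extra neighbor comparison per site is enough. On a GPU, the merge in step (iii) would have to be done with an atomic operation, because several threads may update the same root at once.

```c
#include <assert.h>

/* Follow label[i] -> label[label[i]] -> ... until a site that labels itself. */
int find_root(const int *label, int i)
{
    while (label[i] != i)
        i = label[i];
    return i;
}

/*
 * Sequential sketch of direct-type labeling on an L x L lattice with open
 * boundaries. Site index i = y * L + x.
 *   bond_x[i] != 0 : active bond between site i and its right neighbor i + 1
 *   bond_y[i] != 0 : active bond between site i and its lower neighbor i + L
 * On return, label[i] is the smallest site index in the cluster of site i.
 */
void label_clusters_2d(int L, const unsigned char *bond_x,
                       const unsigned char *bond_y, int *label)
{
    int n = L * L;
    assert(L > 0);

    /* (i) initialization: link each site to one bonded neighbor with a
     * smaller index (upper neighbor first, else left neighbor, else itself). */
    for (int i = 0; i < n; ++i) {
        int x = i % L, y = i / L;
        if (y > 0 && bond_y[i - L])
            label[i] = i - L;
        else if (x > 0 && bond_x[i - 1])
            label[i] = i - 1;
        else
            label[i] = i;
    }

    /* (ii) analysis: replace every link by the root of its chain. */
    for (int i = 0; i < n; ++i)
        label[i] = find_root(label, i);

    /* (iii) label reduction: a site bonded to BOTH its upper and left
     * neighbors used only the upper bond in step (i); merge the two trees
     * through the left bond. This is the single neighbor comparison in 2D. */
    for (int i = 0; i < n; ++i) {
        int x = i % L, y = i / L;
        if (y > 0 && bond_y[i - L] && x > 0 && bond_x[i - 1]) {
            int a = find_root(label, i);
            int b = find_root(label, i - 1);
            if (a < b)
                label[b] = a;   /* keep the smaller root */
            else if (b < a)
                label[a] = b;
        }
    }

    /* (iv) analysis again: flatten the merged trees. */
    for (int i = 0; i < n; ++i)
        label[i] = find_root(label, i);
}
```

Because every link points to a smaller site index, the chains cannot form cycles, and the root of each tree is the smallest index in its cluster. In three dimensions a site has three backward neighbors, so step (iii) would need up to two comparisons, in line with the count quoted above.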

We now describe the parts added to the programs. In this update, we add the GPU-based cluster-labeling algorithm without conventional iteration, a direct-type algorithm [1], to the present programs. It consists of four steps: (i) initialization, (ii) analysis, (iii) label reduction, and (iv) analysis. For the cluster labeling, we can now choose the algorithm of Hawick et al. [3] (algorithm = 0), the algorithm of Kalentev et al. [4] (algorithm = 1), or the algorithm of Komura [1] (algorithm = 2). The kernel function

device_function_init_YK;

is the function for the active-bond generation step, which corresponds to the initialization step of the algorithm of Komura. Three kernel functions

device_function_analysis_YK;

device_ReduceLabels;

device_function_analysis_YK;

are used in the cluster-labeling step of the algorithm of Komura. They correspond to the steps of analysis, label reduction, and analysis, respectively. Unlike the cluster-labeling algorithms of Hawick et al. and Kalentev et al., the algorithm of Komura does not require iteration. The kernel functions

device_function_spin_select;

device_function_spin_flip_YK;

are used in the step of spin flip.
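For orientation, the two non-labeling steps of a Swendsen–Wang update can be sketched in plain C for the Ising model. This is a minimal sequential sketch under stated assumptions, not the code of the kernels named above: the function names `sw_activate_bond` and `sw_flip_clusters` are hypothetical, spins are taken as +1 or -1, the Hamiltonian is assumed to be H = -J sum s_i s_j, and each cluster is assumed to draw its new spin from the random number stored at its root site.

```c
#include <assert.h>

/*
 * Active-bond generation for one pair of neighboring Ising spins.
 * p is the bond activation probability; for H = -J sum s_i s_j it is
 * p = 1 - exp(-2 J / (k_B T)). r is a uniform random number.
 * A bond can be active only between parallel spins.
 */
int sw_activate_bond(int s_i, int s_j, double p, double r)
{
    assert(p >= 0.0 && p <= 1.0);
    return (s_i == s_j) && (r < p);
}

/*
 * Spin flip: give every cluster a new spin, +1 or -1 with equal probability.
 * label[i] must already be the fully resolved root index of site i.
 * r[k] is a uniform random number for site k; only the entries at root
 * sites are used, so all sites of a cluster receive the same spin.
 */
void sw_flip_clusters(int n, const int *label, const double *r, int *spin)
{
    for (int i = 0; i < n; ++i) {
        int root = label[i];
        assert(root >= 0 && root < n);
        spin[i] = (r[root] < 0.5) ? 1 : -1;
    }
}
```

Reading the random number at the root index is one simple way to make the decision per cluster rather than per site; it maps naturally onto one GPU thread per site because no communication between threads is needed once the labels are final.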

2.

In the previous programs, we used the linear congruential random-number generator proposed by Preis et al. [5]. Its computational cost and memory usage are low. However, a higher-quality random-number generator is needed for high-precision Monte Carlo simulations. CUDA provides the cuRAND library [2], which focuses on the simple and efficient generation of high-quality pseudorandom and quasirandom numbers. In this update, we replace the linear congruential random-number generator with a random-number generator from the cuRAND library, using the cuRAND host API. The host API function

curandCreateGenerator;

is used to create a random-number generator. In the sample programs, we use the XORWOW pseudorandom generator [6]. However, other random-number generators can be chosen by changing the generator-type argument of this function. The host API function

curandGenerateUniformDouble;

is used to generate random numbers. The generated random numbers are stored in d_random_data. For high-precision Monte Carlo simulations, we use double-precision random numbers.
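To give a feeling for what an XORWOW generator computes, the recurrence published by Marsaglia [6] can be written in a few lines of C. This is only an illustrative sketch: the names `xorwow_state`, `xorwow_default`, `xorwow_next`, and `xorwow_uniform_double` are hypothetical, the starting values are the defaults from Marsaglia's paper, and the conversion to a double uses only 32 random bits. How cuRAND seeds its generator and builds its double-precision output is not reproduced here; the only cuRAND property mirrored is the documented output range, which excludes 0.0 and includes 1.0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State of Marsaglia's xorwow generator: five xorshift words plus a counter. */
typedef struct {
    uint32_t x, y, z, w, v, d;
} xorwow_state;

/* Default starting values given in Marsaglia's paper. */
xorwow_state xorwow_default(void)
{
    xorwow_state s = {123456789u, 362436069u, 521288629u,
                      88675123u, 5783321u, 6615241u};
    return s;
}

/* One step: xorshift update of the five words plus a Weyl-sequence counter. */
uint32_t xorwow_next(xorwow_state *s)
{
    assert(s != NULL);
    uint32_t t = s->x ^ (s->x >> 2);
    s->x = s->y;
    s->y = s->z;
    s->z = s->w;
    s->w = s->v;
    s->v = (s->v ^ (s->v << 4)) ^ (t ^ (t << 1));
    s->d += 362437u;
    return s->d + s->v;
}

/* Map one 32-bit output to a double in (0, 1]: 0.0 excluded, 1.0 included. */
double xorwow_uniform_double(xorwow_state *s)
{
    return ((double)xorwow_next(s) + 1.0) / 4294967296.0;
}
```

A typical use would create one state with `xorwow_default()` and call `xorwow_uniform_double()` once per required random number. In the programs described here this work is instead delegated to cuRAND, which fills a whole device array in one call.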

Restrictions: The system size is limited by the memory of the GPU. Because the present programs use more memory than the previous programs, their maximum system size is smaller.

Running time: For the parameters used in the sample programs, it takes about a minute for each program. The computational time depends on the system size, the number of Monte Carlo steps, etc.

References:

[1] Y. Komura, GPU-based cluster-labeling algorithm without the use of conventional iteration: Application to the Swendsen–Wang multi-cluster spin flip algorithm, Comput. Phys. Comm. 194 (2015) 54–58.
[2] cuRAND, CUDA Toolkit Documentation, http://docs.nvidia.com/cuda/curand/.
[3] K.A. Hawick, A. Leist, D.P. Playne, Parallel graph component labeling with GPUs and CUDA, Parallel Computing 36 (2010) 655–678.
[4] O. Kalentev, A. Rai, S. Kemnitz, R. Schneider, Connected component labeling on a 2D grid using CUDA, J. Parallel Distrib. Comput. 71 (2011) 615–620.
[5] T. Preis, P. Virnau, W. Paul, J.J. Schneider, GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model, J. Comput. Phys. 228 (2009) 4468–4477.
[6] G. Marsaglia, Xorshift RNGs, Journal of Statistical Software 8 (2003).
