Program title: SWspin_v2_0
Catalogue identifier: AERM_v2_0
Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERM_v2_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 6337
No. of bytes in distributed program, including test data, etc.: 26316
Distribution format: tar.gz
Programming language: C, CUDA.
Computer: System with an NVIDIA CUDA enabled GPU.
Operating system: No limits (tested on Linux).
RAM: About 2MiB for the parameters used in the sample programs.
Classification: 23.
Catalogue identifier of previous version: AERM_v1_0 Journal reference of previous version: Comput. Phys. Comm. 185(2014)1038
Does the new version supersede the previous version?: No
Nature of problem: Monte Carlo simulation of classical spin systems. Ising, q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices.
Solution method: GPU-based Swendsen–Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on the work by Hawick et al. [K.A. Hawick, A. Leist, and D. P Playne, Parallel Computing 36 (2010). 655–678], that by Kalentev et al. [O. Kalentev, A. Rai, S. Kemnitzb, and R. Schneider, J. Parallel Distrib. Comput. 71 (2011) 615–620], and that by Komura [Y. Komura, Comput. Phys. Comm. 194 (2015) 54–58].
Reasons for new version:
Adding the method of GPU-based cluster-labeling algorithm without the use of conventional iteration [1].
Adding a random-number generator in the cuRAND library [2] for high-precision calculations.
Fixing several bugs and removing the extra usage of shared memory in the kernel functions.
Summary of revisions:
Recently, we proposed the GPU-based cluster-labeling algorithm without the use of conventional iteration [1]. This cluster-labeling algorithm does not require an iterative method of comparison with the nearest-neighbor sites. The number of comparisons with the nearest-neighbor site in this method is one for a two dimensional system and two for a three-dimensional system if periodic boundary conditions are not employed. To realize this cluster-labeling algorithm, the atomic function, which is performed without interference from any other threads, is needed.
Now, we explain about the added part of programs. In this update, we add the GPU-based cluster-labeling algorithm without the use of conventional iteration as a direct-type algorithm [1] to the present programs. This cluster-labeling algorithm consists of four steps: (i) initialization (ii) analysis (iii) label reduction (iv) analysis, and we add those steps to the present programs for the cluster-labeling algorithm, we can choose the algorithm of Hawick et al. [3] (algorithm = 0), the algorithm by Kalentev et al. [4] (algorithm = 1), or the algorithm by Komura [1] (algorithm = 2). The kernel function
device_function_init_Y K;
is a function for the step of active bond generation, which corresponds to the step of initialization for the algorithm of Komura. Three kernel functions
device_function_analysis_Y K;
device_ReduceLabels;
device_function_analysis_Y K;
are used in the step of cluster labeling for the algorithm of Komura. Those functions correspond to the steps of analysis, label reduction, and analysis, respectively. The cluster-labeling algorithm of Komura does not require an iterative method such as the cluster-labeling algorithms of Hawick et al. and Kalentev et al. The kernel functions
device_function_spin_select;
device_function_spin_flip_Y K;
are used in the step of spin flip.
In the previous programs, we used a linear congruential random-number generator which was proposed by Preis et al. [5]. The computational cost and the usage of memory of the linear congruential random-number generator are low. However, the higher-quality random-number generator is needed for high-precision Monte Carlo simulations. In the CUDA, the cuRAND library [2], which focuses on the simple and efficient generation of high-quality pseudorandom and quasirandom numbers, is provided. In this update, we modify a linear congruential random-number generator to a random-number generator in the cuRAND library. We use a random-number generator of the host API in the cuRAND library. The kernel function
curandCreateGenerator;
is used to create a random-number generator. In the sample programs, we use an XORWOW pseudorandom generator [6]. However we can choose other random-number generators by changing the parameter of this function. The kernel function
curandGenerateUniformDouble;
is used to generate random numbers. The generated random numbers are stored in the d_random_data. For high-precision Monte Carlo simulations, we use the random numbers of double precision.
Restrictions: The system size is limited depending on the memory of a GPU. Since the usage of memory in the present programs is increased compared with that of the previous programs, the maximum system size of the present programs is smaller than that of the previous programs.
Running time: For the parameters used in the sample programs, it takes about a minute for each program. The computational time depends on the system size, the number of Monte Carlo steps, etc.
References:
Y. Komura, GPU-based cluster-labeling algorithm without the use of conventional iteration: Application to the Swendsen–Wang multi-cluster spin flip algorithm, Comput. Phys. Comm. 194 (2015) 54–58.
cuRAND CUDA Toolkit Documentation, http://docs.nvidia.com/cuda/curand/.
K.A. Hawick, A. Leist, D. P Playne, Parallel Graph. Component Labeling with GPUs and CUDA, Parallel Computing 36 (2010) 655–678.
O. Kalentev, A. Rai, S. Kemnitzb, R. Schneider, Connected component labeling on a 2D grid using CUDA, J. Parallel Distrib. Comput. 71 (2011) 615–620.
T. Preis, P Virnau, W. Paul, J.J. Schneider, GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model, J. Comp. Phys. 228 (2009) 4468–4477.
G. Marsaglia, Xorshift RNGs, Journal of Statistical Software 8 (2003).