Secondly, we added the option to run the KPA preconditioner on an OpenCL device (e.g. a GPU). GPUs generally have better memory access times, which particularly speeds up the sparse matrix multiplication.
Program title: hex-ecs
Catalogue identifier: AETI_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AETI_v2_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: MIT License
No. of lines in distributed program, including test data, etc.: 73693
No. of bytes in distributed program, including test data, etc.: 520475
Distribution format: tar.gz
Programming language: C++11.
Computer: Any recent CPU, preferably 64-bit. Computationally intensive parts can be run on a GPU (tested on AMD Tahiti and NVidia TitanX models).
Operating system: Tested on Windows 10 and various Linux distributions.
RAM: Depends on the problem solved and the particular setup; the KPA test run uses approx. 300 MiB.
Classification: 2.4.
Catalogue identifier of previous version: AETI_v1_0
Journal reference of previous version: Comput. Phys. Commun. 185 (2014) 2903
External routines: GSL [1], UMFPACK [2], BLAS and LAPACK (ideally threaded OpenBLAS [3]).
Does the new version supersede the previous version?: Yes
Nature of problem: Solution of the two-particle Schrödinger equation in a central field.
Solution method: The two-electron states are expanded into angular momentum eigenstates, which gives rise to coupled bi-radial equations. The bi-radially dependent solution is then represented in a B-spline product basis, which transforms the set of equations into a large matrix equation in this basis. The boundary condition is of Dirichlet type, thanks to the use of the exterior complex scaling method, which extends the coordinates into the complex plane. The matrix equation is then solved by the preconditioned conjugate orthogonal conjugate gradient method (PCOCG) [4].
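The PCOCG iteration named above can be illustrated with a minimal C++11 sketch. It is not taken from the hex-ecs sources: the names `pcocg`, `bilinear`, `norm2` and the `LinOp` callback type are hypothetical, and the matrix and the preconditioner are passed as generic callbacks. The sketch assumes a complex symmetric (not Hermitian) matrix and does not guard against breakdown of the iteration. The point that distinguishes it from ordinary conjugate gradients is the unconjugated product x^T y.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <functional>
#include <vector>

using Complex = std::complex<double>;
using Vec = std::vector<Complex>;
using LinOp = std::function<Vec(const Vec&)>;  // hypothetical: y = Op(x)

// Unconjugated bilinear form x^T y (not the Hermitian inner product).
Complex bilinear(const Vec& x, const Vec& y)
{
    assert(x.size() == y.size());
    Complex s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// Ordinary Euclidean norm, used only for the stopping criterion.
double norm2(const Vec& x)
{
    double s = 0.0;
    for (const Complex& v : x) s += std::norm(v);
    return std::sqrt(s);
}

// Solves A x = b for complex symmetric A, starting from x = 0.
// Minv applies the preconditioner (an approximate inverse of A).
// Returns the number of iterations used, or -1 if maxit was reached.
// Breakdown (a vanishing bilinear form) is not handled in this sketch.
int pcocg(const LinOp& A, const LinOp& Minv, const Vec& b, Vec& x,
          double tol, int maxit)
{
    x.assign(b.size(), Complex(0.0));
    const double bnorm = norm2(b);
    if (bnorm == 0.0) return 0;

    Vec r = b;              // residual for the zero initial guess
    Vec z = Minv(r);        // preconditioned residual
    Vec p = z;              // search direction
    Complex rho = bilinear(r, z);

    for (int it = 1; it <= maxit; ++it)
    {
        const Vec q = A(p);
        const Complex alpha = rho / bilinear(p, q);
        for (std::size_t i = 0; i < x.size(); ++i)
        {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
        }
        if (norm2(r) <= tol * bnorm) return it;

        z = Minv(r);
        const Complex rho_new = bilinear(r, z);
        const Complex beta = rho_new / rho;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = z[i] + beta * p[i];
        rho = rho_new;
    }
    return -1;
}
```

Passing the operators as callbacks means the same loop works whether the matrix is stored explicitly or its action is computed on the fly, and whether the preconditioner runs on the CPU or elsewhere.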
Reasons for new version: The original program has been updated to achieve better performance. Also, some external dependencies have been removed (HDF5, FFTW3), which simplifies deployment.
Summary of revisions: We implemented a new preconditioner introduced in [5], both for a general CPU and for an arbitrary OpenCL device (e.g. a GPU) conforming to the OpenCL 2.0 specification. Furthermore, many other minor improvements have been made, particularly with the intention of reducing the memory requirements. With appropriate switches, the program no longer precomputes the matrices it uses and instead calculates their elements on the fly. This is also aided by the vectorized B-spline evaluation function, which can now make use of AVX instructions when a single B-spline is evaluated at several points. The accompanying tools hex-db and hex-dwba [6] have also been updated to use the shared code base.
Running time: The KPA test run takes approx. 2 minutes on an Intel i7-4790K (4 threads).
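The idea of evaluating a single B-spline at several points at once can be sketched as follows. This is an illustration only, not the hex-ecs routine: the function name `bspline_at` is hypothetical, the knots are assumed to be real and non-decreasing (the program itself works with complex-scaled coordinates), and no explicit AVX intrinsics are used. The Cox-de Boor recursion is arranged so that the innermost loop runs over the evaluation points with contiguous memory access, which is the kind of loop a compiler may auto-vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluates the i-th B-spline of the given degree on knot sequence t
// at all points xs. Uses knots t[i] ... t[i + degree + 1].
std::vector<double> bspline_at(const std::vector<double>& t, std::size_t i,
                               std::size_t degree,
                               const std::vector<double>& xs)
{
    assert(i + degree + 1 < t.size());
    const std::size_t n = xs.size();

    // N[j * n + p] holds the value of spline (i + j) at point p.
    std::vector<double> N((degree + 1) * n, 0.0);

    // Degree 0: indicator functions of the half-open knot intervals.
    for (std::size_t j = 0; j <= degree; ++j)
        for (std::size_t p = 0; p < n; ++p)
            N[j * n + p] =
                (t[i + j] <= xs[p] && xs[p] < t[i + j + 1]) ? 1.0 : 0.0;

    // Raise the degree step by step (Cox-de Boor recursion).
    for (std::size_t d = 1; d <= degree; ++d)
    {
        for (std::size_t j = 0; j + d <= degree; ++j)
        {
            const double tl = t[i + j];
            const double tr = t[i + j + d + 1];
            const double dl = t[i + j + d] - tl;
            const double dr = tr - t[i + j + 1];
            // Terms with a zero-width support are dropped by convention.
            const double il = dl > 0.0 ? 1.0 / dl : 0.0;
            const double ir = dr > 0.0 ? 1.0 / dr : 0.0;

            // Point-wise update; all knot-dependent factors are hoisted.
            for (std::size_t p = 0; p < n; ++p)
                N[j * n + p] = (xs[p] - tl) * il * N[j * n + p]
                             + (tr - xs[p]) * ir * N[(j + 1) * n + p];
        }
    }

    N.resize(n);  // row j = 0 now holds the requested spline
    return N;
}
```

For example, with the uniform knots 0, 1, 2, 3, 4 the quadratic spline with `i = 0` is supported on [0, 3) and peaks at 0.75 in the middle of its support.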
References:
[1] Galassi M. et al., GNU Scientific Library: Reference Manual, Network Theory Ltd., 2003.
[2] Davis T. A., Algorithm 832: UMFPACK, an unsymmetric-pattern multifrontal method, ACM Trans. Math. Softw. 30 (2004) 196–199.
[3] Xianyi Z. et al., Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor, 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), 17–19 Dec. 2012.
[4] van der Vorst H. A., Melissen J. B. M., A Petrov–Galerkin type method for solving Ax=b, where A is symmetric complex, IEEE Trans. Magn. 26 (1990) 706–708.
[5] Bar-On et al., Parallel solution of the multidimensional Helmholtz/Schroedinger equation using high order methods, Appl. Num. Math. 33 (2000) 95–104.
[6] Benda J., Houfek K., Collisions of electrons with hydrogen atoms I. Package outline and high energy code, Comput. Phys. Commun. 185 (2014) 2893–2902.