New version of hex-ecs, the B-spline implementation of exterior complex scaling method for solution of electron-hydrogen scattering


Authors: Jakub Benda (corresponding author, jakub.benda@seznam.cz); Karel Houfek
Keywords: Electron–hydrogen scattering; Exterior complex scaling
Journal: Computer Physics Communications
Year: 2016
Publication date: July 2016
Volume: 204
Issue: Complete
Pages: 216-217

Abstract

We provide an updated version of the program hex-ecs, originally presented in Comput. Phys. Commun. 185 (2014) 2903–2912. The original version used an iterative method preconditioned by the incomplete LU factorization (ILU), which, though very stable and predictable, requires a large amount of working memory. In the new version we have implemented a “separated electrons” (or “Kronecker product approximation”, KPA) preconditioner, as suggested by Bar-On et al., Appl. Num. Math. 33 (2000) 95–104. This preconditioner has much lower memory requirements, but in return it needs more iterations to reach converged results. By choosing carefully between the ILU and KPA preconditioners, one can make larger calculations computationally feasible.
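
The memory argument for a Kronecker-structured operator can be illustrated with a small sketch. It is not the hex-ecs preconditioner itself; the `Dense` type and the `kron_apply` function are hypothetical names introduced only for this example. The sketch applies the product of an m-by-m matrix A and an n-by-n matrix B (in the Kronecker sense) to a vector of length m*n while storing only the two factors, that is m*m + n*n numbers instead of (m*n)*(m*n).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Vec = std::vector<Complex>;

// Dense row-major square matrix (hypothetical helper type for this sketch).
struct Dense
{
    std::size_t n;  // dimension
    Vec a;          // n*n entries, row-major
    Complex operator()(std::size_t i, std::size_t j) const { return a[i * n + j]; }
};

// Compute y = kron(A, B) * x without forming the Kronecker product.
// With x viewed as a row-major m-by-n matrix X, the result is Y = A * X * B^T.
Vec kron_apply(const Dense& A, const Dense& B, const Vec& x)
{
    const std::size_t m = A.n, n = B.n;
    assert(A.a.size() == m * m && B.a.size() == n * n);
    assert(x.size() == m * n);

    // Step 1: T = X * B^T
    Vec t(m * n, Complex(0.0, 0.0));
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t l = 0; l < n; ++l)
                t[i * n + j] += x[i * n + l] * B(j, l);

    // Step 2: Y = A * T
    Vec y(m * n, Complex(0.0, 0.0));
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t k = 0; k < m; ++k)
            for (std::size_t j = 0; j < n; ++j)
                y[i * n + j] += A(i, k) * t[k * n + j];

    return y;
}
```

Calling `kron_apply(A, B, x)` costs on the order of m*n*(m + n) operations, compared with (m*n)^2 for a multiplication by the explicitly assembled product. Dense factors are used here only for brevity; the same reshaping idea applies to sparse or banded factors.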

Secondly, we have added the option to run the KPA preconditioner on an OpenCL device (e.g. a GPU). GPUs generally have faster memory access, which particularly speeds up the sparse matrix multiplication.

New version program summary

Program title: hex-ecs

Catalogue identifier: AETI_v2_0

Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AETI_v2_0.html

Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland

Licensing provisions: MIT License

No. of lines in distributed program, including test data, etc.: 73693

No. of bytes in distributed program, including test data, etc.: 520475

Distribution format: tar.gz

Programming language: C++11.

Computer: Any recent CPU, preferably 64-bit. Computationally intensive parts can be run on a GPU (tested on AMD Tahiti and NVIDIA TitanX models).

Operating system: Tested on Windows 10 and various Linux distributions.

RAM: Depends on the problem solved and the particular setup; the KPA test run uses approx. 300 MiB.

Classification: 2.4.

Catalogue identifier of previous version: AETI_v1_0

Journal reference of previous version: Comput. Phys. Commun. 185 (2014) 2903

External routines: GSL [1], UMFPACK [2], BLAS and LAPACK (ideally threaded OpenBLAS [3]).

Does the new version supersede the previous version?: Yes

Nature of problem: Solution of the two-particle Schrödinger equation in central field.

Solution method: The two-electron states are expanded into angular momentum eigenstates, which gives rise to coupled bi-radial equations. The bi-radially dependent solution is then represented in a B-spline product basis, which transforms the set of equations into a large matrix equation in this basis. The boundary condition is of Dirichlet type, thanks to the use of the exterior complex scaling method, which extends the coordinates into the complex plane. The matrix equation is then solved by the preconditioned conjugate orthogonal conjugate gradient method (PCOCG) [4].
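
The iteration named above can be sketched as follows. This is a minimal illustration of preconditioned COCG for a complex symmetric (not Hermitian) matrix, under several assumptions that are not taken from hex-ecs: the matrix is stored densely, a simple diagonal (Jacobi) preconditioner stands in for ILU or KPA, and the function names `dot_t`, `norm2`, `matvec` and `cocg_jacobi` are hypothetical. The defining feature of COCG is that it uses the unconjugated product x^T y in place of the usual complex inner product.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Vec = std::vector<Complex>;

// Unconjugated bilinear form x^T y (no complex conjugation).
Complex dot_t(const Vec& x, const Vec& y)
{
    assert(x.size() == y.size());
    Complex s(0.0, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// Euclidean norm, used only for the stopping criterion.
double norm2(const Vec& x)
{
    double s = 0.0;
    for (const Complex& v : x) s += std::norm(v);
    return std::sqrt(s);
}

// y = A x for a dense row-major n-by-n matrix A.
Vec matvec(const Vec& A, const Vec& x)
{
    const std::size_t n = x.size();
    assert(A.size() == n * n);
    Vec y(n, Complex(0.0, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            y[i] += A[i * n + j] * x[j];
    return y;
}

// Solve A x = b for complex symmetric A (A == A^T) by Jacobi-preconditioned COCG.
// On entry x holds the initial guess; the return value is the iteration count.
std::size_t cocg_jacobi(const Vec& A, const Vec& b, Vec& x,
                        double tol, std::size_t max_iter)
{
    const std::size_t n = b.size();
    assert(x.size() == n && A.size() == n * n);

    Vec r(n), z(n), p(n);
    const Vec Ax = matvec(A, x);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ax[i];

    const double bnorm = norm2(b);
    if (norm2(r) <= tol * bnorm) return 0;

    for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i * n + i];  // z = M^{-1} r
    p = z;
    Complex rho = dot_t(r, z);

    for (std::size_t it = 1; it <= max_iter; ++it)
    {
        const Vec q = matvec(A, p);
        const Complex alpha = rho / dot_t(p, q);
        for (std::size_t i = 0; i < n; ++i)
        {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
        }
        if (norm2(r) <= tol * bnorm) return it;

        for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i * n + i];
        const Complex rho_new = dot_t(r, z);
        const Complex beta = rho_new / rho;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rho = rho_new;
    }
    return max_iter;
}
```

Because the bilinear form is not an inner product, the denominators `dot_t(p, q)` and `rho` can in principle vanish for a nonzero vector; a production solver would detect such breakdowns, which this sketch does not.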

Reasons for new version: The original program has been updated to achieve better performance. Also, some external dependencies have been removed (HDF5, FFTW3), which simplifies deployment.

Summary of revisions: We implemented a new preconditioner introduced in [5], both for a general CPU and for an arbitrary OpenCL device (e.g. a GPU) conforming to the OpenCL 2.0 specification. Furthermore, many other minor improvements have been made, particularly with the intention of reducing the memory requirements. With appropriate switches, the program no longer precomputes the matrices it uses and instead calculates their elements on the fly. This is also aided by the vectorized B-spline evaluation function, which can now make use of AVX instructions when a single B-spline is being evaluated at several points. The accompanying tools hex-db and hex-dwba [6] have also been updated to use the shared code base.
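
The idea of evaluating a single B-spline at several points at once can be sketched with the Cox-de Boor recursion. This is an illustration only, not the hex-ecs routine: it assumes real knots (the exterior complex scaling used by the program makes part of the knot sequence complex), it contains no explicit AVX intrinsics, and the name `bspline_at_points` is hypothetical. The point of the layout is that all knot-dependent factors are computed outside the innermost loop, which then runs over the evaluation points with no branches and is therefore a natural candidate for compiler auto-vectorization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Evaluate the B-spline with index i and the given order (degree = order - 1),
// defined on the knot vector t, at all points in x. Returns one value per point.
std::vector<double> bspline_at_points(const std::vector<double>& t,
                                      std::size_t i, std::size_t order,
                                      const std::vector<double>& x)
{
    assert(order >= 1 && i + order < t.size());
    const std::size_t N = x.size();

    // work[j * N + p] holds the value of B-spline i + j at x[p] for the current order.
    std::vector<double> work(order * N);

    // Order 1: indicator functions of the knot intervals [t[i+j], t[i+j+1]).
    for (std::size_t j = 0; j < order; ++j)
        for (std::size_t p = 0; p < N; ++p)
            work[j * N + p] = (t[i + j] <= x[p] && x[p] < t[i + j + 1]) ? 1.0 : 0.0;

    // Raise the degree step by step (Cox-de Boor recursion).
    for (std::size_t d = 1; d < order; ++d)
    {
        for (std::size_t j = 0; j + d < order; ++j)
        {
            const double tl = t[i + j], tr = t[i + j + d + 1];
            const double dl = t[i + j + d] - tl;
            const double dr = tr - t[i + j + 1];
            // Repeated knots give a zero denominator; that term is dropped by convention.
            const double wl = (dl != 0.0) ? 1.0 / dl : 0.0;
            const double wr = (dr != 0.0) ? 1.0 / dr : 0.0;

            // Branch-free inner loop over the evaluation points.
            for (std::size_t p = 0; p < N; ++p)
                work[j * N + p] = (x[p] - tl) * wl * work[j * N + p]
                                + (tr - x[p]) * wr * work[(j + 1) * N + p];
        }
    }

    work.resize(N);  // row j = 0 now holds the requested B-spline
    return work;
}
```

For example, on the uniform knots 0, 1, 2, 3 the quadratic B-spline (order 3, index 0) evaluates to 0.75 at its midpoint x = 1.5, which is a convenient sanity check for the recursion.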

Running time: The KPA test run takes approx. 2 minutes on an Intel i7-4790K (4 threads).

References:

[1]: Galassi M. et al., GNU Scientific Library: Reference Manual, Network Theory Ltd., 2003.
[2]: Davis T. A., Algorithm 832: UMFPACK, an unsymmetric-pattern multifrontal method, ACM Trans. Math. Softw. 30 (2004) 196–199.
[3]: Xianyi Z. et al., Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor, 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), 17–19 Dec. 2012.
[4]: van der Vorst H. A., Melissen J. B. M., A Petrov–Galerkin type method for solving $Ax = b$, where $A$ is symmetric complex, IEEE Trans. Magn. 26 (1990) 706–708.
[5]: Bar-On et al., Parallel solution of the multidimensional Helmholtz/Schroedinger equation using high order methods, Appl. Num. Math. 33 (2000) 95–104.
[6]: Benda J., Houfek K., Collisions of electrons with hydrogen atoms I. Package outline and high energy code, Comput. Phys. Commun. 185 (2014) 2893–2902.
