Program title: FMM–Yukawa
Catalogue identifier: AEEQ_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEQ_v2_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU GPL 2.0
No. of lines in distributed program, including test data, etc.: 78 704
No. of bytes in distributed program, including test data, etc.: 854 265
Distribution format: tar.gz
Programming language: FORTRAN 77, FORTRAN 90, and C. Requires gcc and gfortran version 4.4.3 or later
Computer: All
Operating system: Any
Classification: 4.8, 4.12
Catalogue identifier of previous version: AEEQ_v1_0
Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 2331
Does the new version supersede the previous version?: Yes
Nature of problem: To evaluate the screened Coulomb potential and force field of N charged particles, and to evaluate a convolution type integral where the Green's function is the fundamental solution of the modified Helmholtz equation.
Solution method: The new version of fast multipole method (FMM) that diagonalizes the multipole-to-local translation operator is applied with the tree structure adaptive to sample particle locations.
Reasons for new version: To handle much larger particle ensembles, to enable the iterative use of the subroutines in a solver, and to remove potential contention in assignments for parallelization.
Summary of revisions: The software package FMM–Yukawa has been revised and re-organized in data structure, software architecture, programming methods, and user interface. The revision enables more flexible use of the package and economic use of memory resources. It consists of five stages. The initial stage (stage 1) determines, based on the accuracy requirement and FMM theory, the length of multipole expansions and the number of quadrature points for diagonalization, and loads the quadrature nodes and weights that are computed off line. Stage 2 constructs the oct-tree and interaction lists, with adaptation to the sparsity or density of particles and employing a dynamic memory allocation scheme at every tree level. Stage 3 executes the core FMM subroutine for numerical calculation of the particle interactions. The subroutine can now be used iteratively as in a solver, while the particle locations remain the same. Stage 4 releases the memory allocated in Stage 2 for the adaptive tree and interaction lists. The user can modify the iterative routine easily. When the particle locations are changed such as in a molecular dynamics simulation, stage 2 to 4 can also be used together repeatedly. The final stage releases the memory space used for the quadrature and other remaining FMM parameters. Programs at the stage level and at the user interface are re-written in the C programming language, while most of the translation and interaction operations remain in FORTRAN. As a result of the change in data structures and memory allocation, the revised package can accommodate much larger particle ensembles while maintaining the same accuracy-efficiency performance. The new version is also developed as an important precursor to its parallel counterpart on multi-core or many core processors in a shared memory programming environment. Particularly, in order to ensure mutual exclusion in concurrent updates without incurring extra latency, we have replaced all the assignment statements at a source box that put its data to multiple target boxes with assignments at every target box that gather data from source boxes. This amounts to replacing the column version of matrix-vector multiplication with the row version. The matrix here, however, is in compressive representation. Sufficient care is taken in the revision not to alter the algorithmic complexity or numerical behavior, as concurrent writing potentially takes place in the upward calculation of the multipole expansion coefficients, interactions at every level of the FMM tree, and downward calculation of the local expansion coefficients. The software modules and their compositions are also organized according to the stages they are used. Demonstration files and makefiles for merging the user routines and the library routines are provided.
Restrictions: Accuracy requirement is described in terms of three or six digits. Higher multiples of three digits will be allowed in a later version. Finer decimation in digits for accuracy specification may or may not be necessary.
Unusual features: Ready and friendly for customized use and instrumental in expression of concurrency and dependency for efficient parallelization.
Running time: The running time depends linearly on the number N of particles, and varies with the distribution characteristics of the particle distribution. It also depends on the accuracy requirement, a higher accuracy requirement takes relatively longer time. The code outperforms the direct summation method when N750.