Revision of FMM–Yukawa: An adaptive fast multipole method for screened Coulomb interactions

Authors: Bo Zhang; Jingfang Huang; Nikos P. Pitsianis; Xiaobai Sun
Keywords: Fast multipole method; Screened Coulomb potential; Yukawa potential; Diagonal translation; Exponential sums
Journal: Computer Physics Communications
Publication year: 2010
Publication date: December 2010
Year: 2010
Volume: 181
Issue: 12
Pages: 2206-2207
Full-text size: 91 K

Abstract

FMM–Yukawa is a mathematical software package primarily for rapid evaluation of the screened Coulomb interactions of N particles in three-dimensional space. Since its release, we have revised and re-organized the data structure, software architecture, and user interface to enable more flexible, broader, and easier use of the package. The package and its documentation are available at http://www.fastmultipole.org/, along with a few other closely related mathematical software packages.

New version program summary

Program title: FMM–Yukawa

Catalogue identifier: AEEQ_v2_0

Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEQ_v2_0.html

Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland

Licensing provisions: GNU GPL 2.0

No. of lines in distributed program, including test data, etc.: 78 704

No. of bytes in distributed program, including test data, etc.: 854 265

Distribution format: tar.gz

Programming language: FORTRAN 77, FORTRAN 90, and C. Requires gcc and gfortran version 4.4.3 or later

Computer: All

Operating system: Any

Classification: 4.8, 4.12

Catalogue identifier of previous version: AEEQ_v1_0

Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 2331

Does the new version supersede the previous version?: Yes

Nature of problem: To evaluate the screened Coulomb potential and force field of N charged particles, and to evaluate a convolution type integral where the Green's function is the fundamental solution of the modified Helmholtz equation.

Solution method: The new version of the fast multipole method (FMM), which diagonalizes the multipole-to-local translation operator, is applied with a tree structure that adapts to the sample particle locations.

Reasons for new version: To handle much larger particle ensembles, to enable the iterative use of the subroutines in a solver, and to remove potential contention in assignments for parallelization.

Summary of revisions: The software package FMM–Yukawa has been revised and re-organized in data structure, software architecture, programming methods, and user interface. The revision enables more flexible use of the package and more economical use of memory resources. The revised package is organized in five stages. The initial stage (stage 1) determines, based on the accuracy requirement and FMM theory, the length of multipole expansions and the number of quadrature points for diagonalization, and loads the quadrature nodes and weights that are computed offline. Stage 2 constructs the oct-tree and interaction lists, adapting to the sparsity or density of particles and employing a dynamic memory allocation scheme at every tree level. Stage 3 executes the core FMM subroutine for numerical calculation of the particle interactions. The subroutine can now be used iteratively, as in a solver, while the particle locations remain the same. Stage 4 releases the memory allocated in stage 2 for the adaptive tree and interaction lists. The user can modify the iterative routine easily. When the particle locations change, as in a molecular dynamics simulation, stages 2 to 4 can also be used together repeatedly. The final stage releases the memory space used for the quadrature and other remaining FMM parameters. Programs at the stage level and at the user interface are re-written in the C programming language, while most of the translation and interaction operations remain in FORTRAN. As a result of the change in data structures and memory allocation, the revised package can accommodate much larger particle ensembles while maintaining the same accuracy-efficiency performance. The new version is also developed as an important precursor to its parallel counterpart on multi-core or many-core processors in a shared-memory programming environment.
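The five-stage organization can be sketched in C as below. This is a hedged illustration of the call order only: every name (`fmm_ctx`, `stage1_init`, `run_stages`, and so on) is hypothetical and does not come from the package, and the stage bodies are stubs that merely allocate and free placeholder buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical context; the package's real types and names differ. */
typedef struct {
    int     digits;       /* requested accuracy: 3 or 6 digits */
    double *quadrature;   /* stand-in for precomputed nodes and weights */
    int    *tree;         /* stand-in for oct-tree and interaction lists */
    int     evaluations;  /* number of stage-3 calls made so far */
} fmm_ctx;

/* Stage 1: fix expansion parameters and load quadrature data. */
int stage1_init(fmm_ctx *c, int digits)
{
    if (digits != 3 && digits != 6)
        return -1;
    c->digits = digits;
    c->tree = NULL;
    c->evaluations = 0;
    c->quadrature = malloc(16 * sizeof *c->quadrature);
    return c->quadrature ? 0 : -1;
}

/* Stage 2: build the adaptive tree for the current particle positions. */
int stage2_build_tree(fmm_ctx *c, size_t n, const double *pos)
{
    (void)pos;                          /* stub: positions are unused */
    assert(c->tree == NULL);
    c->tree = malloc((n ? n : 1) * sizeof *c->tree);
    return c->tree ? 0 : -1;
}

/* Stage 3: evaluate interactions; callable repeatedly on a fixed tree. */
void stage3_evaluate(fmm_ctx *c, size_t n, const double *q, double *phi)
{
    (void)q;                            /* stub: no real FMM here */
    assert(c->tree != NULL);
    for (size_t i = 0; i < n; ++i)
        phi[i] = 0.0;
    c->evaluations++;
}

/* Stage 4: release the tree and interaction lists. */
void stage4_free_tree(fmm_ctx *c) { free(c->tree); c->tree = NULL; }

/* Stage 5: release quadrature data and remaining parameters. */
void stage5_finalize(fmm_ctx *c) { free(c->quadrature); c->quadrature = NULL; }

/* Outer loop: particle moves (stages 2 to 4). Inner loop: solver
 * iterations with fixed positions (stage 3 only).
 * Returns the number of stage-3 evaluations, or -1 on failure. */
int run_stages(int digits, size_t n, const double *pos, const double *q,
               double *phi, int n_moves, int n_iters)
{
    fmm_ctx c;
    if (stage1_init(&c, digits) != 0)
        return -1;
    for (int m = 0; m < n_moves; ++m) {
        if (stage2_build_tree(&c, n, pos) != 0) {
            stage5_finalize(&c);
            return -1;
        }
        for (int it = 0; it < n_iters; ++it)
            stage3_evaluate(&c, n, q, phi);
        stage4_free_tree(&c);           /* a caller would update pos here */
    }
    int total = c.evaluations;
    stage5_finalize(&c);
    return total;
}
```

The point of the sketch is the nesting: quadrature data lives across the whole run, the tree lives only while particle positions are fixed, and the evaluation step can be repeated freely inside that window.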
In particular, to ensure mutual exclusion in concurrent updates without incurring extra latency, we have replaced all the assignment statements at a source box that put its data to multiple target boxes with assignments at every target box that gather data from source boxes. This amounts to replacing the column version of matrix-vector multiplication with the row version. The matrix here, however, is in a compressed representation. Sufficient care is taken in the revision not to alter the algorithmic complexity or numerical behavior, as concurrent writing potentially takes place in the upward calculation of the multipole expansion coefficients, interactions at every level of the FMM tree, and downward calculation of the local expansion coefficients. The software modules and their compositions are also organized according to the stages in which they are used. Demonstration files and makefiles for merging the user routines and the library routines are provided.
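The switch from source-side "put" to target-side "gather" can be illustrated with a small sparse matrix-vector product in C. This is a simplified sketch under stated assumptions: the function names and the compressed column/row index arrays are hypothetical, and scalar weights stand in for the FMM translation operators, which in the package act on expansion coefficients rather than single numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Scatter ("column") form: loop over sources. Each source adds its
 * contribution into several targets, so two sources may update the same
 * y[t]; a parallel loop over s would need locks or atomic updates. */
void apply_scatter(size_t n_src, const size_t *col_ptr,
                   const size_t *tgt_idx, const double *w,
                   const double *x, double *y)
{
    for (size_t s = 0; s < n_src; ++s)
        for (size_t k = col_ptr[s]; k < col_ptr[s + 1]; ++k)
            y[tgt_idx[k]] += w[k] * x[s];
}

/* Gather ("row") form: loop over targets. Each target reads from its
 * sources and is the only writer of y[t], so iterations of the outer
 * loop never write to the same location. */
void apply_gather(size_t n_tgt, const size_t *row_ptr,
                  const size_t *src_idx, const double *w,
                  const double *x, double *y)
{
    for (size_t t = 0; t < n_tgt; ++t) {
        double acc = 0.0;
        for (size_t k = row_ptr[t]; k < row_ptr[t + 1]; ++k)
            acc += w[k] * x[src_idx[k]];
        y[t] += acc;
    }
}
```

Both routines compute the same product; only the ownership of writes differs. In the gather form the outer loop over targets can be distributed across threads without synchronization on `y`, which is the property the revision relies on.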

Restrictions: The accuracy requirement is specified as either three or six digits. Higher multiples of three digits will be allowed in a later version. A finer granularity in digits for accuracy specification may or may not be necessary.

Unusual features: Ready and friendly for customized use and instrumental in expression of concurrency and dependency for efficient parallelization.

Running time: The running time depends linearly on the number N of particles, and varies with the characteristics of the particle distribution. It also depends on the accuracy requirement; a higher accuracy requirement takes relatively longer. The code outperforms the direct summation method when N ≥ 750.
