A Jacobi_PCG solver for sparse linear systems on multi-GPU cluster
Abstract
The General Purpose Graphics Processing Unit (GPGPU or GPU) offers powerful floating-point performance and is well suited to compute-intensive tasks such as solving large linear systems. The Jacobi Preconditioned Conjugate Gradient method (Jacobi_PCG or JPCG), a preconditioned iterative method for the numerical solution of large sparse linear systems, is highly parallel and especially appropriate for implementation on GPUs. On a multi-GPU cluster, the matrix–vector multiplication in each PCG iteration needs vector entries generated by both the current GPU and the other GPUs, so communication between GPUs becomes a major performance bottleneck. In this paper, we study the implementation of JPCG on a multi-GPU cluster. Considering the coarse-grained parallelism between GPUs and the sparsity of matrices arising from the finite element method (FEM), we present a simple and fast node reordering method that reduces the bandwidth of the sparse matrices and, in turn, the communication between GPUs. This novel reordering method is based on integerized nodal coordinates of the FEM mesh and the counting sort algorithm. Additionally, computation and communication are overlapped using CUDA asynchronous memory transfers and MPI_sendrecv communication to further reduce the communication cost. A JPCG solver for multi-GPU clusters is developed using CUDA Fortran. Tests show that this solver has high efficiency and strong scalability.
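For readers unfamiliar with the method, the following is a minimal single-process sketch of the textbook Jacobi-preconditioned conjugate gradient iteration in pure Python. It is not the authors' CUDA Fortran multi-GPU solver: the function names (`csr_matvec`, `jacobi_pcg`, `poisson_1d_csr`), the CSR storage layout, the tolerance, and the 1D Poisson test matrix are all assumptions chosen for illustration. The sparse matrix–vector product is the step that, on a multi-GPU cluster, would need vector entries owned by other GPUs.

```python
from math import sqrt


def csr_matvec(indptr, indices, data, x):
    """Return y = A @ x for a sparse matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y[i] = s
    return y


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (CSR).

    Uses the Jacobi preconditioner M = diag(A).
    Returns (x, number_of_iterations).
    """
    n = len(b)
    # Jacobi preconditioner: store 1 / a_ii for every row.
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                inv_diag[i] = 1.0 / data[k]

    x = [0.0] * n
    r = list(b)                      # residual r = b - A x with x = 0
    b_norm = sqrt(dot(b, b)) or 1.0  # avoid dividing by zero when b = 0
    if sqrt(dot(r, r)) <= tol * b_norm:
        return x, 0

    z = [inv_diag[i] * r[i] for i in range(n)]  # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)

    for it in range(1, max_iter + 1):
        q = csr_matvec(indptr, indices, data, p)  # communication-heavy step
        alpha = rz / dot(p, q)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * q[i] for i in range(n)]
        if sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


def poisson_1d_csr(n):
    """Hypothetical test matrix: tridiagonal (-1, 2, -1) in CSR form."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data
```

A possible usage is to build a small matrix with `poisson_1d_csr(5)`, form `b` by multiplying the matrix with a known vector, and check that `jacobi_pcg` recovers that vector. The narrow band of this test matrix also hints at why bandwidth matters: each row of the product touches only nearby entries of `p`, which is the property the reordering described in the abstract aims to strengthen.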
