We implement an FFT-based solver for Poisson’s equation on a cluster of GPUs. The implementation overlaps computation, inter-node communication, and CPU/GPU communication. The results show good scalability up to 16 GPUs, and our implementation is about 2.5 times faster than an optimized CPU-based solver.
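
The core FFT step can be illustrated with a minimal single-process sketch. This is not the multi-GPU implementation described above: it assumes periodic boundary conditions on a cubic domain, uses NumPy on the CPU, and the function name `solve_poisson_periodic` and its parameters are hypothetical. It solves ∇²u = f by dividing each Fourier mode of f by -|k|² and fixing the undetermined mean of u to zero.

```python
import numpy as np


def solve_poisson_periodic(f, length=2.0 * np.pi):
    """Solve laplacian(u) = f on a periodic box of side `length`.

    Illustrative assumption: f is a real 3D array sampled on a uniform grid.
    The zero-mean solution is returned, since u is only defined up to a constant.
    """
    n0, n1, n2 = f.shape
    # Angular wavenumbers 2*pi*m/length along each axis.
    k0 = 2.0 * np.pi * np.fft.fftfreq(n0, d=length / n0)
    k1 = 2.0 * np.pi * np.fft.fftfreq(n1, d=length / n1)
    k2 = 2.0 * np.pi * np.fft.fftfreq(n2, d=length / n2)
    kx, ky, kz = np.meshgrid(k0, k1, k2, indexing="ij")
    k_sq = kx**2 + ky**2 + kz**2

    f_hat = np.fft.fftn(f)       # forward 3D transform
    k_sq[0, 0, 0] = 1.0          # placeholder to avoid dividing by zero
    u_hat = -f_hat / k_sq        # in Fourier space: -|k|^2 u_hat = f_hat
    u_hat[0, 0, 0] = 0.0         # pin the mean of u to zero
    return np.fft.ifftn(u_hat).real  # inverse transform back to real space


# Usage: recover a known solution from its Laplacian.
n = 16
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
u_exact = np.sin(X) * np.cos(2 * Y) * np.sin(3 * Z)
f = -(1 + 4 + 9) * u_exact       # Laplacian of u_exact
u = solve_poisson_periodic(f)
```

In a distributed setting, the 3D transform would typically be split across devices, which is where the communication that the implementation overlaps with computation would arise; the sketch above leaves that part out entirely.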