Parallelizing SOR for GPGPUs using alternate loop tiling
详细信息查看全文 | 推荐本文 |
摘要
Gauss-Seidel and SOR, which are widely used smoothers in multigrid methods, are difficult to parallelize, particularly on GPGPUs due to the existence of DOACROSS data dependences. In this paper, we present a new parallel SOR method that admits more efficient data-parallel SIMD execution than red-black SOR on GPGPUs. Our solution is obtained non-conventionally, by starting from a K-layer SOR method and then parallelizing it by applying a non-dependence-preserving scheme consisting of a new domain decomposition technique followed by a loop tiling technique called alternate tiling. Despite its relatively slower convergence, our new method outperforms red-black SOR by making a better balance between data reuse and parallelism and by trading off convergence rate for SIMD parallelism. Our experimental results highlight the importance of synergy between domain experts, compiler optimizations and performance tuning in maximizing the performance of PDE-like DOACROSS loops on GPGPUs.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700