Research on a Heterogeneous Many-Core Fully-Implicit Solver for MHD Equations
  • Authors: Liu Fangfang; Chen Daokun; Yang Chao; Zhao Yuwen
  • Keywords: 2D magnetic reconnection; magnetohydrodynamics; heterogeneous many-core computing; fully-implicit solver; Sunway TaihuLight; SW26010 processor
  • Journal: Journal on Numerical Methods and Computer Applications (数值计算与计算机应用)
  • Journal code: SZJS
  • Affiliations: Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences; School of Mathematical Sciences, Peking University
  • Publication date: 2019-03-14
  • Year: 2019
  • Volume: v.40
  • Issue: 01
  • Funding: National Key R&D Program of China, Key Special Project on High Performance Computing (2016YFB0200603); National Natural Science Foundation of China (91530323)
  • Language: Chinese
  • Pages: 36-52 (17 pages)
  • Record ID: SZJS201901004
  • CN: 11-2124/TP
Abstract
The system of magnetohydrodynamic (MHD) equations is widely used in studies of magnetically confined fusion devices such as tokamaks, as well as in astrophysics and MHD power generation. These equations are typically nonlinear, multiscale, and multi-physics, which makes large-scale numerical solution difficult. Current large-scale solvers for incompressible MHD problems mostly use fully implicit or semi-implicit methods, but the existing work targets homogeneous supercomputers rather than today's mainstream heterogeneous many-core systems. Targeting the Chinese homegrown Sunway TaihuLight supercomputer, this paper studies a heterogeneous many-core fully implicit solver for the MHD equations. For Newton-Krylov-type fully implicit solvers, it proposes heterogeneous many-core parallel algorithms for the SW26010 processor and parallelizes and optimizes the core kernels. Sparse matrix-vector multiplication (SpMV) is accelerated with a matrix-free formulation; sparse triangular solve (SpTRSV) uses a heterogeneous many-core parallel algorithm based on geometric information, with several optimizations for its memory-bound nature, including storage-format tuning, separation of data loading from computation dependencies, and inter-core register communication; the nonlinear residual and other stencil computations, along with more than ten vector kernels, are also parallelized for the many-core architecture. These heterogeneous many-core parallel algorithms can be reused by other application software. Experiments on a 2D magnetic reconnection problem show a speedup of up to 13.6x with 16 processes, enabling high-resolution, long-time simulations that accurately capture the magnetic reconnection phenomenon. The fully implicit solver scales to 532,480 cores, with a strong-scaling parallel efficiency of 33.8% and a weak-scaling parallel efficiency of 80.7%.
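The matrix-free SpMV approach mentioned in the abstract can be illustrated with a toy example: on a structured grid, every matrix row follows the same stencil pattern, so the product y = A x can be computed directly from the grid indices without ever assembling or storing A. The sketch below uses a 2D 5-point Laplacian with homogeneous Dirichlet boundaries as a stand-in operator; the stencil and grid sizes are illustrative assumptions, not the paper's actual MHD Jacobian.

```python
# Matrix-free SpMV sketch: apply a 2D 5-point Laplacian (homogeneous
# Dirichlet boundaries, unit grid spacing) to a vector without assembling
# the matrix. The stencil is an illustrative stand-in for the MHD Jacobian.

def apply_laplacian(x, nx, ny):
    """Compute y = A @ x, where A is the 5-point Laplacian on an nx-by-ny
    grid, using only the stencil pattern (no stored matrix entries)."""
    y = [0.0] * (nx * ny)
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i          # row index of grid point (i, j)
            center = 4.0 * x[k]     # diagonal contribution
            # Off-diagonal neighbours; out-of-range ones are Dirichlet zeros.
            west  = x[k - 1]  if i > 0      else 0.0
            east  = x[k + 1]  if i < nx - 1 else 0.0
            south = x[k - nx] if j > 0      else 0.0
            north = x[k + nx] if j < ny - 1 else 0.0
            y[k] = center - west - east - south - north
    return y

if __name__ == "__main__":
    nx, ny = 4, 3
    x = [1.0] * (nx * ny)           # constant vector: rows sum their weights
    y = apply_laplacian(x, nx, ny)
    print(y[0])                     # corner point: 4 - 1 - 1 = 2.0
```

Because no matrix entries are loaded from memory, the kernel degenerates into a bandwidth-friendly stencil sweep over the grid, which is the kind of regular, cacheable access pattern that many-core architectures such as the SW26010 favor.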
