Abstract
Laser-plasma particle simulation is widely used to explore scientific problems involving matter under extreme conditions. This paper ports LARED-P, a three-dimensional plasma particle simulation code based on the particle-in-cell (particle cloud in cell) method, to the Intel Xeon Phi coprocessor. The port combines two programming modes, Native and Offload. First, Native mode is used to study and optimize the hotspot computing task in LARED-P; with SIMD extension instructions, this task achieves a 4.61x speedup. Second, Offload mode is used to port the code to a CPU-Intel Xeon Phi heterogeneous system, where asynchronous data transfer and double buffering improve performance by 9.8% and 21.8%, respectively.
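The double-buffering idea mentioned in the abstract can be illustrated with a minimal sketch. This is not LARED-P code and does not use the Intel Offload pragmas; it only shows the ordering of operations, with a Python background thread standing in for the asynchronous host-to-coprocessor transfer. The names `transfer`, `compute`, `run_double_buffered`, and `run_serial` are hypothetical, and the "kernel" (a sum of squares) is a placeholder chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer(chunk):
    """Stand-in for a host-to-coprocessor copy: returns a private copy of the chunk."""
    return list(chunk)


def compute(buffer):
    """Stand-in for the offloaded kernel: sums the squares of the buffered values."""
    return sum(x * x for x in buffer)


def run_double_buffered(chunks):
    """Process chunks so that the transfer of chunk i+1 is issued before computing chunk i."""
    results = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Fill the first buffer before the pipeline starts.
        pending = pool.submit(transfer, chunks[0])
        for i in range(len(chunks)):
            current = pending.result()  # wait until buffer i has arrived
            if i + 1 < len(chunks):
                # Start filling the other buffer in the background.
                pending = pool.submit(transfer, chunks[i + 1])
            # Compute on the current buffer while the next transfer is in flight.
            results.append(compute(current))
    return results


def run_serial(chunks):
    """Reference version: transfer, then compute, one chunk at a time."""
    return [compute(transfer(c)) for c in chunks]
```

Both functions return the same results; the double-buffered version differs only in issuing the next transfer before the current compute step, which is what lets a real asynchronous copy hide behind computation. The sketch demonstrates the schedule, not the reported performance gains.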