深穿透粒子输运蒙特卡罗模拟的CPU/GPU协同算法研究

英文题名：Research on CPU/GPU Synergetic Algorithm for Monte Carlo Deep Penetration Particle Transport
作者：杨博
论文级别：硕士
学科专业名称：计算机科学与技术
中文关键词：深穿透粒子输运 ; 蒙特卡罗模拟 ; MCNP ; CPU/GPU协同算法 ; CUDA
英文关键词：deep penetration particle transport ; Monte Carlo method ; MCNP ; CPU/GPU synergic algorithm ; CUDA
学位年度：2011
导师：胡庆丰
学科代码：081202
学位授予单位：国防科学技术大学
论文提交日期：2011-11-01

摘要

近些年,由于GPU在性能和可编程性方面都有很大提升,通用GPU计算以高性价比的优势越来越受人们关注。众多研究人员都将GPU应用于所属领域,GPU的应用领域已从早期的单一图形计算扩展到通用计算,尤其是科学计算领域。粒子输运模拟在国民经济建设和大规模科学工程计算中具有重要应用,粒子输运蒙特卡罗(Monte Carlo,简称MC)方法求解相对于确定性方法在求解某些复杂粒子输运问题时有显著的优势,但往往需要的计算量极大。CPU/GPU异构混合系统的出现为这一问题的解决带来了机遇和挑战。
     本文在现有粒子输运MC模拟算法的基础上,针对CPU/GPU混合异构体系结构的特点,提出了一种面向大规模异构混合系统的深穿透粒子输运MC模拟CPU/GPU协同算法,并实现了该算法与MCNP程序的整合。主要工作如下:
     1)提出一种基于GPU的MCNP伪随机数发生器,采用了与已有MCNP伪随机数发生器相同参数的线性同余法(LCG)来生成随机数,首先通过跳跃法快速为每个线程生成随机数种子,然后利用GPU多线程并行生成多个随机数子序列。相对运行在Intel X5670上的MCNP伪随机数发生器,本文提出的基于GPU的伪随机数发生器在NVIDIA M2050上获得了11倍加速比。
     2)提出一种基于GPU的深穿透粒子输运MC模拟的细粒度数据级并行算法,在MCNP中粒子输运MC模拟算法的基础上,针对GPU的计算和访存特点提出了一种基于粒子数的任务划分方法和高效并行数据结构及和归约方法。给出了几种消除分支和优化存储器的方法,有效的提高了算法在GPU上的性能。相比运行在X5670上的MCNP程序,整合了基于GPU的深穿透粒子输运MC模拟细粒度数据级并行算法的MCNP-GPU程序获得3.4的加速比。
     3)给出了一种针对CPU/GPU混合异构系统的深穿透粒子输运MC模拟CPU/GPU协同算法,在该算法中提出了一种异构节点内部CPU/GPU之间的启发式任务划分方法,在此基础上给出一种针对大规模异构系统的多级任务划分方法,及其与之适应的多级伪随机数发生器和层次归约算法。基于MPI计算环境和CUDA编程模型,将改进后的基于GPU的MCNP伪随机数发生器和深穿透粒子输运MC模拟CPU/GPU协同算法与MCNP整合为MCNP-Hybrid程序,在TianHe-1A的64个节点上对整合后的MCNP-Hybrid程序进行了测试,结果表明该算法具有良好性能和可扩展性。
Over the last decade, the performance and programmability of GPU has been improved greatly. Due to general-purpose GPU computing with advantages such as cost-effective, it is paid more and more attention. Many researchers apply GPU in their field of study, so GPU have evolved from specialty hardware to massively parallel general computation devices. Simulation of particle transport plays an important role in national economical construction and large-scale computing in science and engineering. Monte Carlo (MC) simulation of particle transport owns great advantage over the determined methods to solve some complex types of particle transport, however, the computational complexity of MC method is very huge. CPU/GPU hybrid system has brought great many opportunities and challenges for the solution of that problem.
     On the basis of existing algorithm of MC particle transport, this thesis presents an algorithm based on large-scale hybrid system for MC deep penetration particle transport, which is designed to fit in with the peculiarity of the hybrid system and is well integrated with MCNP. The following is the main work:
     1) A GPU based MCNP pseudo random number generator is proposed, and in the generator LCG method is used with the same parameters of MCNP pseudo random number generator. First, the generator quickly generates the random number seeds of every thread through jump method, and then parallel generates several random number subsequences on GPU threads. Compared with MCNP pseudo random number generator on Intel X5670 6 cores CPU, the GPU based Pseudo random number generator proposed in this paper achieve 11 fold speedup on NVIDIA M2050 GPU.
     2) A GPU based algorithm for deep penetration particle transport MC simulation is proposed, and on the basis of MCNP algorithm for particle transport, a particle number based task decomposition method, high efficiency parallel data structure and reduction method are proposed in the algorithm. This thesis presents some methods to eliminate branch and optimize the usage of GPU memory, which effectively improve the performance of algorithm. Compared MCNP running on X5670, the MCNP-GPU which is MCNP integrated with GPU based algorithm for deep penetration particle transport MC simulation achieves up to 3.4-fold speedup on M2050.
     3) A hybrid system based CPU/GPU synergic algorithm for deep penetration particle transport MC simulation is proposed. In the algorithm a heuristic task decomposition method for CPU and GPU in a hybrid node is proposed, on the basis of which a multi-level task decomposition design to fit in with the peculiarity of the hybrid system is presented, then a multi-level pseudo random number generator and reduction method which are adapt to the multi-level task decomposition. Using MPI and CUDA, the multi-level pseudo random number generator and hybrid system based CPU/GPU synergic algorithm for deep penetration particle transport MC simulation can be integrated with MCNP to form MCNP-Hybrid, and the performance on subsystem of TianHe-1A proves that the synergic algorithm has good performance and scalability.

引文

[1]杜书华.输运问题的计算机模拟.湖南科学技术出版社, 1989.
    [2] von Neumann, J., N. Metropolis, and S. Ulam, Monte Carlo Method. National Bureau of Standards/Applied Math. Series, 1951. 12: p. 36-38.
    [3] Meteopolis, N. and S. Ulam, The monte carlo method. Journal of the American Statistical Association, 1949. 44(247): p. 335-341.
    [4]谢仲生,邓力.中子输运理论数值计算方法.西北工业大学出版社, 2005.
    [5]徐忠济.蒙特卡罗方法.上海:上海科技出版社, 1985.
    [6] Shreider, Y.A., Method of Statistical Testing: Monte Carlo Method1964: Elsevier.
    [7]裴鹿成,张孝泽.蒙特卡罗方法及其在粒子输运问题中的应用.科学出版社, 1980.
    [8] Briesmeister, J.F. and L.A.N. Laboratory, MCNP--A general Monte Carlo code for neutron and photon transport1986: Los Alamos National Laboratory.
    [9] Mattson, T.G., D. Scott, and S. Wheat. A TeraFLOP supercomputer in 1996: the ASCI TFLOP system. 1996. IEEE.
    [10] Pepeljugoski, P., et al. Low power and high density optical interconnects for future supercomputers. 2010. IEEE.
    [11] Kahn, H., Use of different Monte Carlo sampling techniques1955: Rand Corporation.
    [12] Blanchard, O.J. and C.M. Kahn, The solution of linear difference models under rational expectations. Econometrica: Journal of the Econometric Society, 1980: p. 1305-1311.
    [13] Berger, M.J. and J. Doggett, Reflection and transmission of gamma radiation by barriers: Semianalytic Monte Carlo calculation. J. Research Natl. Bur. Standards, 1956. 56.
    [14] Karcher, R., R. Erdmann, and O. Baldonado, THE APPLICATION OF TRACK-LENGTH DISTRIBUTION BIASING IN MONTE CARLO DEEP-PENETRATION CALCULATIONS, 1968, Holmes and Narver, Inc., Los Angeles.
    [15] Lanore, J., WEIGHTING AND BIASING OF A MONTE CARLO CALCULATION FOR VERY DEEP PENETRATION OF RADIATION, 1971, Nuclear Research Center, Fontenay-aux-Roses, France.
    [16] Tang, J. and T. Hoffman, Monte Carlo shielding analyses using an automated biasing procedure. Nucl. Sci. Eng.;(United States), 1988. 99(4).
    [17] Goldstein, M. and E. Greenspan, A recursive Monte Carlo method for estimating importance function distributions in deep-penetration problems. Nucl. Sci.Eng.;(United States), 1980. 76(3).
    [18] Gardner, R., M. Mickael, and K. Verghese, A new direction biasing approach for Monte Carlo simulation. Nuclear Science and Engineering;(USA), 1988. 98(1).
    [19]裴鹿成.蒙特卡罗方法发展中的若干问题.计算物理, 1992. 9(A01): p. 563-566.
    [20] Petrie, L.M. and N.F. Landers, KENO Va: An Improved Monte Carlo Criticality Program with Supergrouping. Sect. F, 1990. 11.
    [21] Hirayama, H., et al., The EGS5 code system2005: United States. Dept. of Energy.
    [22] Wright, G., et al., The status of the general radiation transport code MCBEND. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, 2004. 213: p. 162-166.
    [23] Lee, Y., et al. CRISTAL Criticality Safety Package Validation: TRIPOLI 4 Monte Carlo code, JEF 2.2 Library and ICSBEP Experiments.
    [24] Arnautova, M., et al., Monte Carlo simulation in nuclear geophysics. Intercomparison of the PRIZMA Monte Carlo program and benchmark experiments. The International journal of radiation applications and instrumentation. Part E. Nuclear geophysics, 1993. 7(3): p. 407-418.
    [25] Fasso, A., et al., FLUKA: a multi-particle transport code, 2005, CERN-2005-10.
    [26] Emmett, M., MORSE-CGA: a Monte Carlo radiation transport code with array geometry capability, 1985, Oak Ridge National Lab., TN (USA).
    [27] Allison, J., et al., Geant4 developments and applications. Nuclear Science, IEEE Transactions on, 2006. 53(1): p. 270-278.
    [28] Brown, F.B., et al., MCNP version 5. Trans. Am. Nucl. Soc, 2002. 87(273): p. 4.
    [29] Howes, L. and D. Thomas, Efficient random number generation and application using CUDA. GPU gems, 2007. 3: p. 805-830.
    [30] Langdon, W. A fast high quality pseudo random number generator for nVidia CUDA. 2009. ACM.
    [31] Langdon, W. A fast high quality pseudo random number generator for graphics processing units. 2008. IEEE.
    [32] Pang, W.M., T.T. Wong, and P.A. Heng. Generating massive high-quality random numbers using GPU. 2008. IEEE.
    [33] Thomas, D.B., L. Howes, and W. Luk. A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation. 2009. ACM.
    [34] Zafar, F., M. Olano, and A. Curtis. GPU random numbers via the tiny encryption algorithm. 2010. Eurographics Association.
    [35] Demchik, V., Pseudo-random number generators for Monte Carlo simulations on ATI Graphics Processing Units. Computer Physics Communications, 2010.
    [36] Ford, E.B., Parallel algorithm for solving Kepler's equation on Graphics Processing Units: Application to analysis of Doppler exoplanet searches. New Astronomy, 2009. 14(4): p. 406-412.
    [37] Trapnell, C. and M.C. Schatz, Optimizing data intensive GPGPU computations for DNA sequence alignment. Parallel computing, 2009. 35(8): p. 429-440.
    [38] Gu, X., et al., GPU-based ultra-fast dose calculation using a finite size pencil beam model. Physics in medicine and biology, 2009. 54: p. 6287.
    [39] Preis, T., et al., Accelerated fluctuation analysis by graphic cards and complex pattern formation in financial markets. New Journal of Physics, 2009. 11: p. 093024.
    [40] Heimlich, A., A. Mol, and C. Pereira, GPU-based Monte Carlo simulation in neutron transport and finite differences heat equation evaluation. Progress in Nuclear Energy, 2010.
    [41] Tickner, J., Monte Carlo simulation of X-ray and gamma-ray photon transport on a graphics-processing unit. Computer Physics Communications, 2010. 181(11): p. 1821-1832.
    [42] Block, B., P. Virnau, and T. Preis, Multi-GPU accelerated multi-spin Monte Carlo simulations of the 2D Ising model. Computer Physics Communications, 2010. 181(9): p. 1549-1556.
    [43] Meredith, J.S., et al., Accuracy and performance of graphics processors: A Quantum Monte Carlo application case study. Parallel computing, 2009. 35(3): p. 151-163.
    [44] NVIDIA, C., C Programming Guide v3. 2. Nvidia Corp, 2010.
    [45] NVIDIA, C., Best Practices Guide. URL http://developer. download. nvidia. com/compute/cuda/2_3/toolkit/docs/NVIDIA_CUDA_Best_PracticesGuide_2, 2010. 3.
    [46] Glaskowsky, P.N., NVIDIA’s Fermi: the first complete GPU computing architecture. Nvidia Corp, 2009.
    [47]张舒,褚艳利,赵开勇,张钰勃, GPU高性能运算之CUDA[M],北京:中国水利水电出版社, 2009, 1-8.
    [48] BRIESMEISTER, J., MCNP—A general Monte Carlo N-particle transport code system: Version 4C, LA-13709M, April 2000. CCC-700, MCNP4C, Radiation Safety Information Computational Center, Oak Ridge National Laboratory, Los Alamos, NM, 2000.
    [49] Brown, F.B. and Y. Nagaya, The MCNP5 random number generator, 2002, Citeseer.
    [50] Knuth, D.E., The Art of Computer Programming. Vol. 2: Seminumerical Algorithms, Addison-Wesley. Reading, Mass, 1969. 2: p. 00-00.
    [51] Lee, V.W., et al. Debunking the 100X GPU vs. CPU myth: an evaluation ofthroughput computing on CPU and GPU. 2010. ACM.
    [52] Nguyen, H., GPU Gem 3, 2007, Santa Clara, CA: Addison-Wesly.
    [53] Allen, J. and K. Kennedy. CP an Joe Warren. Conversion of control dependence to data dependence.
    [54]卢风顺,等. CPU/GPU协同并行计算研究综述.计算机科学, 2011. 38(3): p. 5-9.
    [55] Yang, X.J., et al., The TianHe-1A Supercomputer: Its Hardware and Software. Journal of Computer Science and Technology, 2011. 26(3): p. 344-351.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700