Survey of CPU/GPU Synergetic Parallel Computing (CPU/GPU协同并行计算研究综述)

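English title: Survey of CPU/GPU Synergetic Parallel Computing
Journal (Chinese name): 计算机科学
Journal (English name): Computer Science
Authors: 卢风顺; 宋君强; 银福康; 张理论
Authors (English): LU Feng-shun; SONG Jun-qiang; YIN Fu-kang; ZHANG Li-lun (College of Computer Science, National University of Defense Technology, Changsha 410073, China)
Keywords (translated from Chinese): Heterogeneous hybrid; Synergetic parallel computing; GPU computing; Performance optimization; Scalability
Keywords (English): Heterogeneous hybrid; Synergetic parallel computing; GPU computing; Performance optimization
Publication date: 2011-03-15
Institution: College of Computer Science, National University of Defense Technology
Year: 2011
Issue: 03
Publisher: Computer Science (计算机科学)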

Abstract

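With their strong computing capability, high cost-effectiveness, and low energy consumption, CPU/GPU heterogeneous hybrid parallel systems have become a new type of high-performance computing platform, but their complex architecture poses great challenges for parallel computing research. CPU/GPU synergetic parallel computing is an emerging research field and remains an open topic. This survey divides research on CPU/GPU synergetic parallel computing into three categories according to the scale of the computing resources used, then introduces several hybrid computing projects with emphasis on their rationale, research content, and research methods, and points out directions for further research, with the aim of providing a reference for domain scientists undertaking synergetic parallel computing research.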
Offering tremendous computing capability, a high performance/price ratio, and low power consumption, heterogeneous hybrid CPU/GPU parallel systems have become a new class of high-performance computing platform. However, the architectural complexity of these hybrid systems poses many challenges for designing parallel algorithms on them. According to the scale of the computational resources involved in synergetic parallel computing, we classify recent research into three categories, detail the motivations, methodologies, and applications of several projects, and finally discuss some ongoing research issues in this direction. We hope that domain experts can gain useful information about synergetic parallel computing from this work.
