引文
[1]Macedonia M.The GPU enters computing’s mainstream[J].IEEE Computer,2003,36(10):106-108
[2]Owens J D,Houston M,Luebke D,et al.GPU computing[J].Proceedings of the IEEE,2008,96(5):879-899
[3]张舒,褚艳利,等.GPU高性能运算之CUDA[M].北京:中国水利水电出版社,2009:1-13
[4]Michalakes J,Vachharajani M.GPU acceleration of numericalweather prediction[J].Parallel Processing Letters,2008,18(4):531-548
[5]刘钦,佟小龙.GPU/CPU协同并行计算(CPPC)在地震勘探资料处理中的应用[R].北京:北京吉星吉达公司,2008
[6]Bell N,Garland M.Implementing Sparse Matrix-Vector Multi-plication on Throughput-oriented Processors[C]∥SC2009.NewYork:ACM,2009
[7]Bolz J,Farmer I,Grinspun E,et al.Sparse matrix solvers on theGPU:Conjugate gradients and multigrid[J].ACM Transactionon Graphics,2003,22(3):917-924
[8]Stone J,Phillips J,Hardy D,et al.Accelerating molecular model-ing applications with graphics processors[J].Journal of Compu-tational Chemistry,2007,28(16):2618-2640
[9]Anderson J A,Lorenz C D,Travesset A.General purpose molec-ular dynamics simulations fully implemented on graphics pro-cessing units[J].Journal of Chemical Physics,2008,227(10):5342-5359
[10]Govindaraju N K,Lloyd B,Wang W,et al.Fast computation ofdatabase operations using graphics processors[C]∥SIGMOD2004.New York:ACM,2004
[11]Nukada A,Ogata Y,Endo T,et al.Bandwidth intensive 3-DFFT kernel for GPUs using CUDA[C]∥SC2008.New York:ACM,2008
[12]Govindaraju N K,Lloyd B,Dotsenko Y,et al.High PerformanceDiscrete Fourier Transforms on Graphics Processors[C]∥SC2008.New York:ACM,2008
[13]吴恩华.图形处理器用于通用计算的技术、现状及其挑战[J].软件学报,2004,15(10):1493-1504
[14]Blythe D.Rise of the Graphics Processor[J].Proceedings of theIEEE,2008,96(5):761-778
[15]Goddeke D,Wobker H,Strzodka R,et al.Co-processor accelera-tion of an unmodified parallel solid mechanics code withFEASTGPU[J].International Journal of Computational Scienceand Engineering,2009,4(4):254-269
[16]Goddeke D,Buijssen S H M,Wobker H,et al.GPU Accelerationof an Unmodified Parallel Finite Element Navier-Stokes Solver[C]∥High Performance Computing&Simulation 2009.LogosVerlag:IEEE,2009
[17]Larsen E S,McAllister D.Fast matrix multiplies using graphicshardware[C]∥SC2001.New York:the ACM Press,2001
[18]NVIDIA Corporation.CUDA Programming Guide Version 2.2[EB/OL].http://developer.download.nvidia.com/compute/cuda/2_21/toolkit/docs/NVIDIA_CUDA_Programming_Guide_2.2.1.pdf,2009-08-12
[19]Devices A M.ATI Steam Computing User Guide[EB/OL].ht-tp://developer.amd.com/gpu_assets/Stream_Computing_Us-er_Guide.pdf,2010-03-25
[20]Liu W,Schmidt B,Voss G,et al.Accelerating molecular dynam-ics simulations using graphics psrocessing units with CUDA[J].Computer Physics Communications,2008,179(9):634-641
[21]Cevahir A,Nukada A,Matsuoka S.Fast Conjugate Gradientswith Multiple GPUs[C]∥Allen G,et al.,eds.ICCS 2009.PartI.LNCS 5544,2009:893-903
[22]Chen S,Qin J,Xie Y.A Fast and Flexible Sorting Algorithmwith CUDA[C]∥Hua A,Chang S-L,eds.ICA3PP 2009.LNCS5574.2009:281-290
[23]Igual F D,Mayo R,Quintana-orti E S.Attaining High Perform-ance in General-purpose Computations on Current Graphics Pro-cessors[C]∥Palma J M L M,et al.,eds.VECPAR 2008.LNCS5336.2008:406-419
[24]方旭东.面向大规模科学计算的CPU-GPU异构并行技术研究[D].长沙:国防科学技术大学,2009
[25]Michalakes J,Vachharajani M.GPU Acceleration of NWP:Benchmark Kernels[EB/OL].http://www.mmm.ucar.edu/wrf/WG2/GPU,2009-02-25
[26]Michalakes J,Vachharajani M.GPU Acceleration of Scalar Ad-vection[EB/OL].http://www.mmm.ucar.edu/wrf/WG2/GPU/Scalar_Advect.htm,2009-02-25
[27]Linford J,Michalakes J,Sandu A,et al.Multi-core accelerationof chemical kinetics for simulation and prediction[C]∥SC2009.Portland,Oregon,the IEEE Press,2009
[28]Pande lab.Folding@Home[EB/OL].http://folding.stanford.edu,2010-03-18
[29]Elsen E,Vishal V,Houston M,et al.N-body simulations onGPUs[C]∥SC 2006.New York:ACM,2006
[30]Chen Yong,Sun Xian-he,Wu Ming.Algorithm-system scalabili-ty of heterogeneous computing[J].Journal of Parallel and Dis-tributed Computing,2008,68:1403-1412
[31]Friedrichs M S,Eastman P,Vaidyanathan V,et al.AcceleratingMolecular Dynamic Simulation on Graphics Processing Units[J].Journal of Computational Chemistry,2009,30(6):864-872
[32]Agullo M,Demmel J,Dongarra J,et al.Numerical linear algebraon emerging architectures:the PLASMA and MAGMA projects[J].Journal of Physics:Conference Series,2009,180(1)
[33]Tomov S,Dongarra J,Baboulin M.Towards Dense Linear Alge-bra for Hybrid GPU Accelerated Manycore Systems[R].Ten-nessee:University of Tennessee Computer Science,2008
[34]Tomov S,Nath R,Ltaief H,et al.Dense Linear Algebra Solversfor Multicore with GPU Accelerators[C]∥High-level ParallelProgramming Models and Supportive Environments 2010.At-lanta:IEEE,2010
[35]Ltaief H,Tomov S,Nath R,et al.A Scalable High PerformantCholesky Factorization for Multicore with GPU Accelerators[R].Innovative Computing Laboratory,2009
[36]FEAST Group.FEAST:Finite Element Analysis&SolutionsTools[EB/OL].http://www.feast.uni-dortmund.de/index.html,2010-4-07(下转第46页)
[37]Becker C,Buijssen S H M,Wobker H,et al.FEAST:Develop-ment of HPC technologies for FEM applications[C]∥MünsterG,Wolf D,Kremer M,eds.High Performance Computing in Sci-ence and Engineering.Berlin:Springer,2008
[38]Goddeke D,Strzodka R,Mohd-Yusof J,et al.Exploring weakscalability for FEM calculations on a GPU-enhanced cluster[J].Parallel Computing,2007(33):685-699
[39]Pastor L,Orero J L B.An Efficiency and Scalability Model forHeterogeneous Clusters[C]∥Proceedings of the 2001IEEE In-ternational Conference on Cluster Computing.Newport Beach:IEEE,2001
[40]Zhe F,Feng Q,Kaufman A,et al.GPU cluster for high perform-ance computing[C]∥SC2004.Washington:IEEE,2004
[41]Ogawa S,Aoki T.GPU computing for 2-dimensional incom-pressible-flow simulation based on multigrid method[C]∥Transactions of the Japan Society for Computational Enginee-ring and Science.2009:20090021
[42]Nukada A,Matsuoka S.Auto-tuning 3-D FFT Library for CU-DA GPUs[C]∥SC2009.Portland:ACM,2009
[43]Matsuoka S.Petascaling Commodity onto Exascale:GPUs asMultithreaded Massively-parallel Vector Processors-the OnlyRoad to Exascale[C]∥IEEE Cluster Computing Conference2009.New Orleans:IEEE,2009
[44]Matsuoka S,Aoki T,Endo T,et al.GPU accelerated compu-ting-from hype to mainstream,the rebirth of vector computing[J].Journal of Physics:Conference Series,2009,180(1):012043
[45]葛震.GPU加速PQMRCGSTAB算法研究[D].长沙:国防科学技术大学,2009
[46]吴强.GPU加速高速粒子碰撞模拟[D].长沙:国防科学技术大学,2009
[47]Fang Xu-dong,Tang Yu-hua,Wang Gui-bin,et al.Optimizingstencil application on multi-thread GPU architecture usingstream programming model[C]∥Muller-Schloer C,Karl W,Ye-hia S,eds.ARCS.LNCS 5974.2010:234-245
[48]Ma An-guo,Cai Jing,Cheng Yu,et al.Performance OptimizationStrategies of High Performance Computing on GPU[C]∥DouY,Gruber R,Joller J,eds.APPT.LNCS 5737.2009:150-164
[49]Chen Fei-guo,Ge Wei,Guo Li,et al.Multi-scale HPC system formulti-scale discrete simulation—Development and application ofa supercomputer with 1Petaflops peak performance in singleprecision[J].Particuology,2009,7:332-335
[50]Asanovic K,Bodik R,Catanzaro B,et al.The Landscape of Par-allel Computing Research:A View from Berkeley[R].Califor-nia:Electrical Engineering and Computer Sciences University ofCalifornia at Berkeley,2006
[51]Bell G.Ultracomputers:a teraflop before its time[J].Communi-cation of the ACM,1992,35(8):26-47
[52]Gupta A,Kumar V.Scalability of Parallel Algorithms for MatrixMultiplication[C]∥1993International Conference on ParallelProcessing.New York:IEEE,1993:115-123
[53]Sun Xian-he,Rover D T.Scalability of Parallel Algorithm-Ma-chine Combinations[J].IEEE Transactions on Parallel and Dis-tributed Systems,1994,5(6):599-613
[54]Kumar V,Gupta A.Analysis of scalability of parallel algorithmsand architectures:A survey[C]∥International Conference onSupercomputing.Cologne:ACM,1991:396-405