Research and Application of Computing Model Based on CPU-GPU Heterogeneous Platform

Original title (Chinese): CPU-GPU异构平台计算模型的研究与应用
Author: Hu Jie (胡杰)
Degree level: Master's
Discipline: Computer Technology
Keywords: CPU; GPU; scientific computing; computing model; optimization programming
Degree year: 2011
Advisor: Wang Zhanjie (王占杰)
Discipline code: 081202
Degree-granting institution: Dalian University of Technology (大连理工大学)
Submission date: 2011-10-26

Abstract

Graphics processing unit (GPU) performance has advanced rapidly in recent years, and GPUs now play an extremely important role in parallel computing, especially in large-scale scientific computing. Using the CPU and GPU cooperatively to accelerate large-scale scientific computing has become an inevitable trend in high-performance computing, and CPU-GPU parallel computing models have accordingly become an active research topic in the field, both in China and abroad. Many such models have been proposed, but in most of them the computation is carried out by either the GPU or the CPU alone, so the combined performance of the two is not fully exploited.

Based on the characteristics of the CPU and the GPU and their respective strengths in parallel computing, this thesis proposes a computing model based on CPU-GPU cooperation. The model includes a task scheduling mechanism that lets the CPU and GPU take part in computing a task at the same time, together with a coordination strategy that partitions data dynamically between the CPU and GPU to balance the load across processors. Compared with traditional models, it uses existing resources more efficiently. The model also provides a data partitioning method for large-scale tasks, which addresses the case where the GPU cannot hold all of the data at once. The scheduling mechanism and partitioning method apply to many typical large-scale parallel workloads, such as matrix operations, image processing, and text processing. In addition, a user-facing interface model is designed that makes the internal parallelism transparent to users, so that they can meet their parallel computing needs conveniently and effectively.

Optimized GPU programming is one of the core parts of general-purpose GPU computing. On the basis of the proposed model, this thesis therefore implements three typical applications: it analyzes the characteristics of each application's algorithm, designs an effective parallel algorithm accordingly, and applies a range of parallel optimization programming techniques. Experimental results show that the strategies given by the model and the optimization methods used are feasible and effective.
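
The abstract names dynamic data partitioning between CPU and GPU and a partitioning method for data that exceeds GPU memory, but it does not state the rules used. The Python sketch below is therefore only one plausible reading, not the thesis's method: it assumes the split is proportional to each processor's measured throughput, that the estimate is smoothed between rounds, and that the GPU share is cut into fixed-size chunks. The names `Device`, `split_by_throughput`, `update_throughput`, and `gpu_chunks` and the smoothing factor `alpha` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Device:
    """A processor taking part in the computation (hypothetical model)."""
    name: str
    throughput: float  # estimated items per second, from earlier rounds


def split_by_throughput(n_items: int, cpu: Device, gpu: Device) -> Tuple[int, int]:
    """Return (cpu_items, gpu_items), proportional to estimated throughput.

    Assumption: proportional splitting stands in for the thesis's
    coordination strategy, which the abstract does not describe.
    """
    total = cpu.throughput + gpu.throughput
    if total <= 0:
        raise ValueError("throughput estimates must be positive")
    gpu_items = round(n_items * gpu.throughput / total)
    return n_items - gpu_items, gpu_items


def update_throughput(dev: Device, items_done: int, seconds: float,
                      alpha: float = 0.5) -> None:
    """Blend the latest measurement into the estimate (exponential smoothing)."""
    if items_done > 0 and seconds > 0:
        measured = items_done / seconds
        dev.throughput = (1 - alpha) * dev.throughput + alpha * measured


def gpu_chunks(n_items: int, bytes_per_item: int,
               gpu_free_bytes: int) -> List[Tuple[int, int]]:
    """Cut the GPU share into (start, stop) ranges that each fit in GPU memory."""
    per_chunk = gpu_free_bytes // bytes_per_item
    if per_chunk == 0:
        raise ValueError("a single item does not fit in GPU memory")
    return [(start, min(start + per_chunk, n_items))
            for start in range(0, n_items, per_chunk)]


# Example round: the GPU is estimated to be three times faster than the CPU.
cpu = Device("cpu", throughput=100.0)
gpu = Device("gpu", throughput=300.0)
cpu_n, gpu_n = split_by_throughput(1000, cpu, gpu)
chunks = gpu_chunks(gpu_n, bytes_per_item=4, gpu_free_bytes=1200)
```

After each round, calling `update_throughput` with the observed item count and elapsed time shifts the next split toward whichever processor finished its share faster, which is the load-balancing effect the abstract describes.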

