Research Progress on Applications of Compute Unified Device Architecture (CUDA) Technology
Abstract
Compute Unified Device Architecture (CUDA) is a parallel computing framework for graphics processing units (GPUs) introduced by NVIDIA in recent years. Thanks to its compatibility with the C language and the strong parallel computing capability of GPUs, CUDA has delivered good speedups in fields such as image processing and scientific computing. This paper reviews and summarizes applications of CUDA, focuses on the principles by which CUDA accelerates computation in different applications, and discusses directions for its future development.
