Research on Several Techniques Based on Programmable Graphics Hardware Acceleration

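English title: Relevant Technology Study on Programmable Graphics Hardware
Author: 董朝 (Dong Chao)
Degree level: Master's
Discipline: Computer Application Technology
Chinese keywords (translated): GPU ; programmable graphics hardware ; voxelization ; real-time rendering ; point-based rendering ; shadows
English keywords: GPU ; Programmable Graphics Hardware ; Voxelization ; Real-time rendering ; Point-based rendering ; Shadow mapping
Degree year: 2005
Advisors: 彭群生 (Peng Qunsheng) ; 陈为 (Chen Wei)
Discipline code: 081203
Degree-granting institution: 浙江大学 (Zhejiang University)
Thesis submission date: 2005-03-01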

Abstract

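The computing power of the graphics processing unit (GPU) in current graphics hardware is growing faster than that of the central processing unit (CPU); leading graphics hardware vendors claim that GPU performance now doubles roughly every 12 months. One of the most important breakthroughs in graphics hardware has been the introduction of programmability, which lets users write custom shader programs that replace certain functional modules of the former fixed-function pipeline, so that the GPU behaves more like a general-purpose processor. Despite the GPU's very high computing speed, algorithms previously implemented on the CPU cannot simply be moved to the GPU unchanged, because the GPU executes instructions differently: its architecture is a highly parallel single-instruction, multiple-data (SIMD) design. To implement on programmable graphics hardware algorithms that run inefficiently on the CPU, the data structures and steps of each algorithm must therefore be reorganized to fully exploit the performance advantage of the GPU's parallel architecture. The algorithms in this thesis are all implemented on programmable graphics hardware and achieve real-time performance while preserving the quality of the results.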
     The research in this thesis covers the following areas:
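     1. Real-time voxelization and its applications
     We propose an efficient voxelization method for complex geometric models. The algorithm first transforms the faces of the geometric model into three discrete volume spaces according to each face's orientation, then stores the voxels generated in each volume space as 2D textures in three worksheets; the three worksheets are finally merged into a single worksheet containing all the data of the volume model. The initial geometric model is traversed only once during the entire run. Because the whole process runs on the GPU, the algorithm reaches real-time rates for geometric models with two million faces. The algorithm is simple to implement and extends easily to many applications such as volume modeling, transparency rendering, and collision detection.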
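The steps above can be sketched on the CPU as follows. This is a minimal illustration, not the thesis implementation: it assumes triangles are already given in voxel-grid coordinates, that each sheet texel is an integer bitmask whose bit k marks an occupied voxel at depth k, and that rasterization samples texel centers. The names `dominant_axis` and `voxelize` and the bitmask encoding are hypothetical; the GPU texture encoding actually used is not stated in this abstract.

```python
def dominant_axis(tri):
    """Index (0=x, 1=y, 2=z) of the largest component of the triangle normal."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = tri
    ux, uy, uz = bx - ax, by - ay, bz - az
    vx, vy, vz = cx - ax, cy - ay, cz - az
    normal = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    return max(range(3), key=lambda i: abs(normal[i]))


def voxelize(triangles, n):
    """Return an n x n worksheet; bit z of worksheet[x][y] marks voxel (x, y, z)."""
    # One directional sheet per projection axis (assumed encoding: depth bitmask).
    sheets = [[[0] * n for _ in range(n)] for _ in range(3)]
    for tri in triangles:  # the model is traversed exactly once
        d = dominant_axis(tri)
        u, v = [a for a in range(3) if a != d]
        p0, p1, p2 = tri
        # Doubled signed area of the triangle projected along axis d.
        area = (p1[u] - p0[u]) * (p2[v] - p0[v]) - (p2[u] - p0[u]) * (p1[v] - p0[v])
        if area == 0:
            continue  # degenerate triangle
        lo_u = max(0, int(min(p[u] for p in tri)))
        hi_u = min(n - 1, int(max(p[u] for p in tri)))
        lo_v = max(0, int(min(p[v] for p in tri)))
        hi_v = min(n - 1, int(max(p[v] for p in tri)))
        for iu in range(lo_u, hi_u + 1):
            for iv in range(lo_v, hi_v + 1):
                cu, cv = iu + 0.5, iv + 0.5  # texel center
                w1 = ((cu - p0[u]) * (p2[v] - p0[v]) - (p2[u] - p0[u]) * (cv - p0[v])) / area
                w2 = ((p1[u] - p0[u]) * (cv - p0[v]) - (cu - p0[u]) * (p1[v] - p0[v])) / area
                w0 = 1.0 - w1 - w2
                if min(w0, w1, w2) < 0:
                    continue  # texel center lies outside the projected triangle
                depth = w0 * p0[d] + w1 * p1[d] + w2 * p2[d]
                k = min(n - 1, max(0, int(depth)))
                sheets[d][iu][iv] |= 1 << k
    # Merge the three directional sheets into one worksheet indexed by (x, y).
    worksheet = [[0] * n for _ in range(n)]
    for d in range(3):
        u, v = [a for a in range(3) if a != d]
        for iu in range(n):
            for iv in range(n):
                bits, k = sheets[d][iu][iv], 0
                while bits:
                    if bits & 1:
                        xyz = [0, 0, 0]
                        xyz[d], xyz[u], xyz[v] = k, iu, iv
                        worksheet[xyz[0]][xyz[1]] |= 1 << xyz[2]
                    bits >>= 1
                    k += 1
    return worksheet
```

Projecting each triangle along the dominant axis of its normal keeps the projected area as large as possible, which is presumably why the faces are split by orientation before rasterization.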
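     2. Real-time, high-quality rendering of large point models
     We propose an adaptive rendering algorithm for large point models. In a preprocessing stage, the algorithm first partitions the point model into many point patches, builds a hierarchy for each patch, and stores it as a linear binary tree. During rendering, the model is processed patch by patch: a fast visibility test culls invisible patches, and each visible patch is rendered in real time on the GPU in a rendering mode chosen according to its distance from the viewpoint. The algorithm not only makes full use of the GPU but also effectively balances the load between the GPU and the CPU. To cope with the excessive data volume of large models, we also propose a fast compression/decompression technique that compresses the rendering data held in video memory by a factor of more than 8. With these algorithms, point-sampled models with on the order of a million points can be rendered in real time and at high quality on an ordinary PC.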
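As a rough illustration of the per-patch hierarchy and the distance-dependent selection, the sketch below stores a bounding-sphere hierarchy as a linear binary tree in heap order (node i has children 2i+1 and 2i+2) and picks a coarser or finer level depending on the viewer distance. The median split, the radius/distance criterion, the behind-the-viewer visibility test, and all function names are assumptions made for this example; the abstract does not specify the thesis's actual criteria.

```python
import math


def build_linear_tree(points, depth):
    """Bounding-sphere hierarchy stored in heap order; each node is (center, radius)."""
    nodes = [None] * (2 ** (depth + 1) - 1)

    def fill(i, pts, level):
        center = tuple(sum(p[a] for p in pts) / len(pts) for a in range(3))
        radius = max(math.dist(center, p) for p in pts)
        nodes[i] = (center, radius)
        if level == depth or len(pts) < 2:
            return
        # Split at the median along the axis of largest extent.
        axis = max(range(3), key=lambda a: max(p[a] for p in pts) - min(p[a] for p in pts))
        pts = sorted(pts, key=lambda p: p[axis])
        mid = len(pts) // 2
        fill(2 * i + 1, pts[:mid], level + 1)
        fill(2 * i + 2, pts[mid:], level + 1)

    fill(0, list(points), 0)
    return nodes


def patch_visible(center, radius, eye, view_dir):
    """Cull a patch whose bounding sphere lies entirely behind the viewer."""
    to_center = [center[a] - eye[a] for a in range(3)]
    return sum(t * v for t, v in zip(to_center, view_dir)) > -radius


def select_nodes(nodes, eye, tolerance):
    """Indices of the coarsest nodes whose radius/distance ratio is within tolerance."""
    selected, stack = [], [0]
    while stack:
        i = stack.pop()
        center, radius = nodes[i]
        distance = max(math.dist(center, eye), 1e-9)
        left = 2 * i + 1
        has_children = left + 1 < len(nodes) and nodes[left] is not None
        if radius / distance <= tolerance or not has_children:
            selected.append(i)
        else:
            stack.extend((left, left + 1))
    return selected
```

Because the tree is a flat array with implicit child indices, it needs no pointers, which is one plausible reason a linear layout suits transfer to GPU memory.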
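     3. Real-time shadow mapping
     Shadow mapping is an image-space shadow rendering algorithm. It is implemented with techniques provided by graphics hardware, such as textures and the depth buffer, and reaches very high rendering efficiency with GPU acceleration. The thesis describes in detail two implementations of real-time shadow mapping: ordinary GPU-based shadow mapping and hardware shadow mapping.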
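The depth comparison at the core of shadow mapping can be sketched in a few lines. The example assumes a directional light shining straight down the z axis from height `light_height`, a square shadow map covering `[0, extent)` in x and y, and a small constant depth bias; these parameters and the function names are illustrative assumptions, not details taken from the thesis. In the hardware variant, this comparison is typically done by the texture unit during the depth-texture lookup rather than in shader code.

```python
def texel(x, y, size, extent):
    """Map world-space (x, y) to shadow-map indices, clamped to the map."""
    i = min(size - 1, max(0, int(x / extent * size)))
    j = min(size - 1, max(0, int(y / extent * size)))
    return i, j


def build_shadow_map(points, size, extent, light_height):
    """Pass 1: per texel, keep the smallest depth as seen from the light."""
    shadow_map = [[float("inf")] * size for _ in range(size)]
    for x, y, z in points:
        i, j = texel(x, y, size, extent)
        shadow_map[i][j] = min(shadow_map[i][j], light_height - z)
    return shadow_map


def in_shadow(point, shadow_map, extent, light_height, bias=1e-3):
    """Pass 2: a point is shadowed if something nearer to the light covers its texel."""
    x, y, z = point
    i, j = texel(x, y, len(shadow_map), extent)
    return (light_height - z) > shadow_map[i][j] + bias
```

The bias term guards against a surface shadowing itself because of limited depth precision; its value here is arbitrary.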
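     At the end of the thesis, the author summarizes lessons learned from working with programmable graphics hardware and proposes some directions for future research.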
     Keywords: GPU; programmable graphics hardware; voxelization; real-time rendering; point-based rendering; shadows
The computational power of the Graphics Processing Unit (GPU) in current commodity graphics hardware is increasing at a much faster rate than that of the Central Processing Unit (CPU). Leading graphics card manufacturers quote a performance doubling time of roughly 12 months for the GPU. A recent major breakthrough in graphics hardware technology has been the introduction of programmability, which allows the user to replace portions of the fixed graphics pipeline with customized shader programs, enabling the GPU to function more like a general-purpose processing unit. Despite all this rendering power, it is neither possible nor meaningful to run algorithms designed with the CPU in mind unchanged on graphics hardware. The essential difference is that the GPU provides a highly parallel Single Instruction, Multiple Data (SIMD) architecture. The key to harnessing this resource is to reengineer computationally expensive algorithms to take advantage of this architecture and of the rendering optimizations built into the programmable graphics pipeline. This thesis presents several novel graphics approaches that use programmable graphics hardware to obtain both real-time frame rates and high-quality results.
Our research in this thesis focuses mainly on the following aspects:
1. Real-time Voxelization for Complex Models
We present an efficient voxelization algorithm for complex polygonal models that exploits the newest programmable graphics hardware. We first convert the model into three discrete voxel spaces according to its surface orientation. The resulting voxels are encoded as 2D textures and stored in three intermediate buffers called directional sheet buffers. These buffers are finally synthesized into one worksheet, which records the volumetric representation of the target. The whole algorithm traverses the geometric model only once and runs entirely on the GPU, achieving real-time frame rates for models with up to 2 million triangles. The algorithm is simple to implement and can be integrated easily into diverse applications such as volume-based modeling, transparent rendering, and collision detection.

2. High-Quality Real-time Rendering of Large-Scale Point Models
We introduce an adaptive rendering algorithm for large-scale point models. In a preprocessing step, the algorithm first subdivides the target model into multiple patches. A hierarchical structure is built for each patch and then converted into a linear binary tree. During rendering, the model is processed patch by patch. A fast visibility test culls invisible patches. Visible patches are displayed on the GPU in an appropriate rendering mode chosen by a distance-dependent strategy. Our algorithm takes full advantage of the GPU and effectively balances the workload between the CPU and the GPU. We also propose a fast compression/decompression technique that achieves an 8-fold compression ratio. The results demonstrate high-performance, high-quality rendering of large-scale point models on a consumer PC.
3. Real-time Shadow Mapping
Shadow mapping is an image-based shadowing technique. It is particularly amenable to hardware implementation because it relies on existing hardware functionality, namely texturing and depth buffering. We present in detail the implementation of two real-time shadow mapping methods: common GPU-based shadow mapping and hardware shadow mapping.
Finally, I summarize my own research experience with the programmable graphics pipeline and propose some potential topics for future research.

