基于动态任务调度的层次包围盒构建算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:Bounding Volume Hierarchy Construction Algorithm Based on Dynamic Task Scheduling
  • 作者:张正昌 ; 何发智 ; 周毅
  • 英文作者:Zhang Zhengchang;He Fazhi;Zhou Yi;School of Compute Science, Wuhan University;
  • 关键词:GPU加速 ; 动态任务调度 ; 光线跟踪 ; 层次包围盒
  • 英文关键词:GPU acceleration;;dynamic task scheduling;;ray tracing;;bounding volume hierarchy
  • 中文刊名:JSJF
  • 英文刊名:Journal of Computer-Aided Design & Computer Graphics
  • 机构:武汉大学计算机学院;
  • 出版日期:2018-03-15
  • 出版单位:计算机辅助设计与图形学学报
  • 年:2018
  • 期:v.30
  • 基金:国家自然科学基金(61472289);; 国家重点研究发展计划(2016YFC0106305)
  • 语种:中文;
  • 页:JSJF201803015
  • 页数:8
  • CN:03
  • ISSN:11-2925/TP
  • 分类号:127-134
摘要
交点计算是光线跟踪算法中开销最大的部分,层次包围盒(BVH)则是主流加速结构.为了提高BVH的构建速度,提出一种基于动态任务调度和warp线程优化的BVH构建算法,并针对目前主流GPU架构特点进行优化.该算法根据表面积启发式(SAH)值对BVH进行自底向上多轮优化;在每次循环的开始阶段判断当前线程是否空闲,若空闲,则根据记录任务进度的全局变量进行任务分配,否则,继续遍历BVH;当遍历到符合条件的节点时以该节点为幼树根节点进行幼树重构,这一阶段使用同一warp中的32个线程协同进行幼树重构,并且可以依据幼树叶子节点数调整同时处理的幼树个数.对经典的三维场景进行实验的结果表明,在BVH构建质量相同的情况下,当场景中三角元片数超过10万时,BVH构建速度会得到提升;当三角元片数大于100万时,该算法比聚类幼树重构层次包围盒(Atr BVH)算法在BVH优化阶段速度提升47%,从而使整个构建速度提高25%.
        Intersection computation is the most expensive part of the ray-tracing while BVH is the main acceleration structure. To improve the construction speed of BVH, a new BVH construction method based on dynamic task scheduling and threads optimization of warp is presented and it is optimized according to some main GPU architectures. The algorithm performed multiple rounds of bottom-up traversal over the BVH based on the SAH value. At the beginning of each cycle, we assigned the task according to the global variables that recorded the progress of the task if the current thread was idle, otherwise we continued to traverse the BVH. We reconstructed the treelet that used the node as treelet root when the node met the conditions during the traversal. At this stage, we leveraged 32 threads in the same warp to reconstruct the treelet collaboratively and we could adjust the number of parallel tasks according to the number of treelet leaves. According to the experiments on some classical scenes, our solution can be faster while the number of triangles is greater than one hundred thousand and can be about 47% faster when the number of triangles is above one million in the optimization stage of Atr BVH, thus increasing the overall speed by 25%, in the case of BVH construction quality being the same.
引文
[1]Appel A.Some techniques for shading machine renderings of solids[C]//Proceedings of Spring Joint Computer Conference.New York:ACM Press,1968:37-45
    [2]Goldsmith J,Salmon J.Automatic creation of object hierarchies for ray tracing[J].IEEE Computer Graphics and Applications,1987,7(5):14-20
    [3]Mac Donald J D,Booth K S.Heuristics for ray tracing using space subdivision[J].The Visual Computer,1990,6(3):153-166
    [4]Zou Hua,Gao Xinbo,Lu Xinrong.A ray casting algorithm based on hierarchical bounding volumes and GPU[J].Journal of Computer-Aided Design&Computer Graphics,2009,21(2):172-178(in Chinese)(邹华,高新波,吕新荣.层次包围盒与GPU实现相结合的光线投射算法[J].计算机辅助设计与图形学学报,2009,21(2):172-178)
    [5]Du Peng,Tang Min,Tong Ruofeng.Parallel collision detection on multi-core platform[J].Journal of Computer-Aided Design&Computer Graphics,2011,23(5):833-838(in Chinese)(杜鹏,唐敏,童若锋.多核加速的并行碰撞检测[J].计算机辅助设计与图形学学报,2011,23(5):833-838)
    [6]Yang Xin,Wang Tianming,Xu Duanqing.Fast BVH construction on GPU[J].Journal of Zhejiang University:Engineering Science,2012,46(1):84-89(in Chinese)(杨鑫,王天明,许端清.基于GPU的层次包围盒快速构造方法[J].浙江大学学报:工学版,2012,46(1):84-89)
    [7]Lauterbach C,Garland M,Sengupta S,et al.Fast BVH construction on GPUs[J].Computer Graphics Forum,2009,28(2):375-384
    [8]Karras T.Maximizing parallelism in the construction of BVHs,octrees,and k-D trees[C]//Proceedings of the 4th ACM SIGGRAPH/Eurographics Conference on High-Performance Graphics.Aire-la-Ville:Eurographics Association Press,2012:33-37
    [9]Stich M,Friedrich H,Dietrich A.Spatial splits in bounding volume hierarchies[C]//Proceedings of the Conference on High Performance Graphics.New York:ACM Press,2009:7-13
    [10]Ernst M,Greiner G.Early split clipping for bounding volume hierarchies[C]//Proceedings of IEEE Symposium on Interactive Ray Tracing.Los Alamitos:IEEE Computer Society Press,2007:73-78
    [11]Dammertz H,Keller A.The edge volume heuristic-robust triangle subdivision for improved BVH performance[C]//Proceedings of IEEE Symposium on Interactive Ray Tracing.Los Alamitos:IEEE Computer Society Press,2008:155-158
    [12]Karras T,Aila T.Fast parallel construction of high quality bounding volume hierarchies[C]//Proceedings of the 5th HighPerformance Graphics Conference.New York:ACM Press,2013:89-99
    [13]Domingues L R,Pedrini H.Bounding volume hierarchy optimization through agglomerative treelet restructuring[C]//Proceedings of the 7th Conference on High-Performance Graphics.New York:ACM Press,2015:13-20
    [14]Walter B,Bala K,Kulkarni M,et al.Fast agglomerative clustering for rendering[C]//Proceedings of IEEE Symposium on Interactive Ray Tracing.Los Alamitos:IEEE Computer Society Press,2008:81-86
    [15]Gu Y,He Y,Fatahalian K,et al.Efficient BVH construction via approximate agglomerative clustering[C]//Proceedings of the5th High-Performance Graphics Conference.New York:ACM Press,2013:81-88
    [16]Aila T,Laine S.Understanding the efficiency of ray traversal on GPUs[C]//Proceedings of the Conference on High Performance Graphics.New York:ACM Press,2009:145-149
    [17]NVIDIA.CUDA C Programming Guide[OL].[2017-05-10].http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#abstract
    [18]Frey S,Reina G,Ertl T.SIMT microscheduling:reducing thread stalling in divergent iterative algorithms[C]//Proceedings of the20th Euromicro International Conference on Parallel,Distributed and Network-Based Processing.Los Alamitos:IEEE Computer Society Press,2012:399-406
    [19]Zhang T,Shu W,Wu M Y.Optimization of N-queens solvers on graphics processors[C]//Proceedings of the 9th International Conference on Advanced Parallel Processing Technologies.Heidelberg:Springer Press,2011:142-156
    [20]Zhang T,Shu W,Wu M Y.CUIRRE:An open-source library for load balancing and characterizing irregular applications on GPUs[J].Journal of Parallel and Distributed Computing,2014,74(10):2951-2966
    [21]Wald I,Ize T,Kensler A,et al.Ray tracing animated scenes using coherent grid traversal[J].ACM Transactions on Graphics,2006,25(3):485-493
    [22]Aila T,Laine S.Alias-free shadow maps[C]//Proceedings of the 15th Eurographics Conference on Rendering Techniques.Aire-la-Ville:Eurographics Association Press,2004:161-166
    [23]Zhou Y,He F Z,Qiu Y M.Optimization of parallel iterated local search algorithms on graphics processing unit[J].The Journal of Supercomputing,2016,72(6):2394-2416
    [24]Zhou Y,He F Z,Qiu Y M.Dynamic strategy based parallel ant colony optimization on GPUs for TSPs[J].Science China Information Sciences,2017,60(6):Article No.068102
    (1)http://g3d.cs.williams.edu/g3d/data10/
    (2)http://eg2011.bangor.ac.uk/dragon/Welsh-Dragon.html
    (3)https://hameed.deviantart.com/art/Time-Machine-3D-Model-OBJ-1-OF-3-346027845

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700