GPU-based Parallel Computing Method for Contact and Impact Problems of Automotive Body (基于GPU的车身结构接触碰撞过程并行计算方法)

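English title: GPU-based Parallel Computing Method for Contact and Impact Problems of Automotive Body
Author: Cai Yong (蔡勇)
Degree level: Doctoral
Discipline: Vehicle Engineering
Keywords (Chinese): 图形处理器 ; 统一计算架构 ; 并行有限元 ; 接触/碰撞 ; 板料成形
Keywords (English): Graphics Processing Unit ; Compute Unified Device Architecture ; Parallel Finite Element Method ; Contact/impact ; Sheet Forming
Degree year: 2013
Supervisor: Li Guangyao (李光耀)
Discipline code: 080204
Degree-granting institution: Hunan University (湖南大学)
Thesis submission date: 2013-09-01
Defense committee chair: Liu Tengxi (刘腾喜)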

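Abstract

Finite element analysis of contact and impact in automotive body structures is an important part of automotive CAE. It mainly covers engineering problems such as vehicle crash analysis and the forming of body panels. Mechanically, it involves three kinds of nonlinearity: material nonlinearity, geometric nonlinearity, and the boundary nonlinearity of contact interfaces. Such analyses often face enormous computational cost and low efficiency, so practical applications have a strong demand for parallel computing. Most common parallel finite element methods today use coarse-grained strategies such as domain decomposition and run on networked computer clusters with CPUs as the computing cores. Their efficiency depends directly on the number of compute nodes, the workflow is complex, and expensive hardware is required, so the cost-effectiveness of this kind of parallel computing is low.
     The modern graphics processing unit (GPU) is a many-core processor with a high degree of internal parallelism, and its floating-point performance is far higher than that of contemporary CPUs. The arrival of programmable shaders gave the GPU the characteristics of a general-purpose processor, and it began to be used for general-purpose computing, bringing new ideas and methods to large-scale data processing and numerical simulation. Early general-purpose computing on GPUs (GPGPU) was programmed in high-level shading languages such as Cg and was applied to various finite element computations. However, GPGPU at that stage supported only single-precision arithmetic and had low data transfer efficiency, so GPU-parallel finite element computation had low accuracy and limited speedup, which severely restricted engineering use. The Compute Unified Device Architecture (CUDA) brought efficient and intuitive tools for developing GPU parallel programs; CUDA-based GPU parallel computing is characterized by low hardware cost and simple program development.
     Guided by the needs of engineering applications, this thesis uses CUDA to study high-accuracy, high-efficiency fine-grained parallel methods for explicit finite element analysis, together with a parallel contact algorithm that executes at fine granularity throughout. The result is fast parallel computation, on an ordinary personal computer, of two classes of large-scale nonlinear finite element problems: automotive body crash simulation and sheet metal stamping simulation. The main work and results are as follows:
     (1) Considering the natural parallelism of nonlinear explicit finite element analysis and the lightweight thread execution model of the GPU, a GPU-based explicit finite element computing platform with independent intellectual property rights was developed (invention patent application number: 201210266435.1). Its main feature is three levels of abstract mapping, thread to element, thread to node, and thread to degree of freedom, which fit explicit finite element computation closely to GPU threads. Compared with coarse-grained parallel finite element strategies based on mesh partitioning, this fine-grained strategy needs no preprocessing, and on a single graphics card there is no boundary data handling, so it can greatly improve computational efficiency. As a result, most stages of explicit finite element analysis, such as nodal velocity and displacement computation, can be run efficiently in parallel on the GPU.
     (2) To address the technical bottleneck that assembling nodal stresses in element computation is difficult to parallelize on the GPU, a pre-indexed parallel stress assembly strategy is proposed, achieving fine-grained GPU parallelism for two shell elements: the BT quadrilateral element and the EST triangular element. A parallel-reduction-based method is proposed for computing single-valued quantities such as the time step on the GPU. The entire explicit finite element algorithm runs on the GPU, which reduces data exchange between GPU and CPU and maximizes computational efficiency. Computations of nonlinear plate and shell problems show that the GPU parallel results are identical to those of the original serial algorithm on the CPU, and that efficiency improves markedly compared with a contemporary CPU of the same price. On a GTX580 graphics card, solving an elastic-plastic large-deformation problem with 1.85 million degrees of freedom using the EST element reaches a speedup of nearly 37 times.
     (3) In contact-impact finite element analysis, the contact algorithm takes more than 70% of the computing time. This thesis therefore proposes a fine-grained parallel contact algorithm executed entirely on the GPU, comprising a parallel hierarchy-territory contact search algorithm, a parallel defense-node contact force method, and a parallel penalty-function contact force method. The hierarchy-territory algorithm is an efficient search algorithm suited to complex self-contact problems, and the computational independence of contact segments within the same hierarchy level also meets the requirements of fine-grained GPU computing. The thesis proposes a one-to-one thread-to-segment mapping, GPU parallel sorting, and increased per-thread computing granularity to realize the parallel search for test pairs on the GPU. In the contact pair search stage, a mapping between threads and test pairs enables parallel search of contact pairs within one level, and a sort-after-compute strategy handles data exchange between one level and the next. In the contact force stage, a thread-to-contact-pair mapping gives a fine-grained parallel method for computing penetration and contact force, and atomic operations are used to scatter the contact forces. Finally, building on the self-developed crash simulation software DYSI3D, the GPU-based parallel crash simulation software CPS-GPU was developed (software copyright number: 2011SR001966). With this software on a GTX580 graphics card, a body-in-white crash computation with 1.77 million degrees of freedom achieves a speedup of about 20 times.
     (4) A complete GPU parallel computing method for sheet metal stamping is proposed. Because sheet forming simulation places high demands on modeling material flow, a GPU parallel technique for elements with complex material constitutive computation and a GPU parallel contact force method that accounts for friction are proposed. A GPU computing strategy for the integrated contact search algorithm is given: the broad-phase search method used for real-time collision detection in computer graphics is introduced to find test pairs, and, given precomputed adjacency information for contact segments, a fine-grained parallel method for updating contact pairs in the post-contact search is presented. Based on the self-developed sheet forming simulation software CADEMII, the GPU-based sheet forming parallel software CADEM-GPU was developed (software copyright number: 2010SR052426), with an asynchronous data output mode and OpenGL-based real-time display added to further improve efficiency and usability. Numerical examples show that the software has high accuracy and efficiency; on a GTX460 graphics card, simulation models with tens of thousands of elements achieve speedups of more than 20 times, effectively shortening simulation time.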
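     Item (2) above names two GPU techniques without giving their details: conflict-free assembly of element contributions at shared nodes, and a parallel reduction for single values such as the time step. The sketch below emulates both serially in Python, with each loop iteration standing in for one GPU thread. It assumes the pre-index is a per-node list of contributing (element, local slot) pairs, which the abstract does not confirm, and every function name is hypothetical.

```python
# Serial emulation of two GPU patterns; all names are hypothetical illustrations,
# not the thesis's code. Each loop iteration stands in for one GPU thread.

def build_node_index(elements, num_nodes):
    """Precompute, per node, the (element, local slot) pairs that contribute to it."""
    index = [[] for _ in range(num_nodes)]
    for e, conn in enumerate(elements):
        for slot, node in enumerate(conn):
            index[node].append((e, slot))
    return index


def assemble_nodal_values(node_index, element_values):
    """One 'thread' per node gathers its own sum, so no two threads write the same slot."""
    return [sum(element_values[e][slot] for e, slot in contribs)
            for contribs in node_index]


def reduce_min(values):
    """Pairwise tree reduction over a non-empty sequence (e.g. per-element stable time steps)."""
    data = list(values)
    n = len(data)
    while n > 1:
        half = (n + 1) // 2
        for i in range(n // 2):  # each i would be one thread in a reduction step
            data[i] = min(data[i], data[i + half])
        n = half
    return data[0]
```

     With two quadrilaterals `(0, 1, 2, 3)` and `(1, 4, 5, 2)` that share nodes 1 and 2, the index is built once before time stepping; every later step then only gathers. The gather form trades a little extra index memory for the absence of write conflicts, which is one plausible reason to prefer it over a scatter with atomics for this stage.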
Finite element (FE) simulation of contact and impact processes is an important part of automotive CAE technology. It is widely applied to engineering problems such as car crash simulation and sheet metal forming simulation. This kind of simulation usually involves material nonlinearity, geometric nonlinearity, and nonlinear boundary conditions. Because of these three kinds of nonlinearity, the FE analysis of contact and impact problems faces enormous computational cost and low computing efficiency. Therefore, there is a very strong demand for parallel computing in practical applications. At present, the most common parallel computing methods are based on coarse-grained domain decomposition and use CPU-based computer networks as the computing hardware. In these traditional methods, computational efficiency is directly related to the number of computing nodes. Furthermore, in practice, more computing nodes require more complex programming and more expensive hardware. These methods are therefore not cost-effective for either individuals or businesses.
     The modern graphics processing unit (GPU) has developed into a many-core processor with a high degree of internal parallelism, and its floating-point performance is much higher than that of contemporary CPUs. Meanwhile, the appearance of programmable shaders gave the GPU several characteristics of a general-purpose processor. General-purpose computing on GPUs (GPGPU) has become a novel and effective approach to large-scale data processing and numerical simulation. Early GPGPU code had to be written in high-level shading languages such as Cg. Several researchers tried to use early GPGPU to improve computing efficiency, but these GPU-based FE codes could not meet the requirements for accuracy and efficiency, mainly because of limited double-precision floating-point support and low data transfer efficiency. Later, NVIDIA introduced the Compute Unified Device Architecture (CUDA), an efficient and intuitive set of GPGPU development tools. CUDA provides an efficient route to GPGPU with low computing cost and a general-purpose programming language.
     In this thesis, a GPU-based parallel strategy for explicit FE computing with a fully fine-grained parallel contact algorithm is presented to meet the demands of engineering applications. High-performance parallel computing of automotive body crash simulation and sheet forming simulation is realized on an ordinary personal computer with a CUDA-capable device. The main research content and results are as follows:
     (1) A GPU-based parallel explicit FE computing platform with independent intellectual property rights is presented (patent pending, number 201210266435.1), based on the characteristics of the explicit scheme and the lightweight-thread parallel computing model of the GPU. The main feature of this platform is three kinds of one-to-one mapping between CUDA threads and computing objects: thread-to-element, thread-to-node, and thread-to-degree-of-freedom. Compared with coarse-grained parallel FE algorithms based on mesh partitioning, the fine-grained parallel strategy improves computational efficiency without any preprocessing and, on a single graphics card, without boundary data processing. Therefore, most parts of the explicit FE calculation, including nodal velocity and displacement computation, can be mapped to the GPU with high efficiency.
     (2) Nodal force assembly on fine-grained parallel platforms has long been a difficult problem. This thesis proposes a pre-index strategy that realizes parallel assembly on the GPU with little additional work. Parallel strategies for two kinds of shell element, the Belytschko-Tsay (BT) shell element and the edge-based smoothed triangular (EST) shell element, are presented on the above platform. A parallel reduction method is introduced to calculate single-valued quantities such as the global time step. Finally, a fully parallelized explicit FE iterative process on the GPU is proposed, which obtains the best computational efficiency by reducing data transfers between CPU and GPU. Numerical examples for nonlinear shell structures show that this method greatly improves computational efficiency while giving the same results as serial computing on the CPU. For example, a GTX580 GPU achieves a speedup of about 37 times over an i7 CPU for an elastic-plastic large-deformation problem with 1.85 million degrees of freedom.
     (3) In the FE analysis of a contact problem, the contact algorithm usually takes more than 70% of the total computation time. Therefore, a contact algorithm that runs entirely on the GPU is proposed, including a parallel hierarchy-territory contact-searching algorithm (HITA) and two parallel contact force algorithms: a parallel penalty function method and a parallel defense node algorithm. HITA is an efficient contact-searching algorithm that is especially suitable for complex problems involving self-contact. Furthermore, the computational independence of contact segment searching within the same hierarchy suits GPU parallel computing. First, several techniques are proposed to realize the parallel search for test pairs on the GPU, including a thread-to-segment mapping scheme, a GPU-based sorting method, and an increase in thread granularity. Second, in the contact pair searching phase, a mapping between threads and test pairs is presented to achieve parallel searching within the same hierarchy, and a sort-based storage strategy is used for efficient data transfer between higher-level and lower-level hierarchies. In the contact force calculation phase, a fine-grained parallel strategy based on thread-to-contact-pair mapping is presented to compute contact forces in parallel, and atomic operations are used to scatter the contact forces. Based on these algorithms, a GPU-based contact process simulation software package named CPS-GPU (software registration number 2011SR001966) is developed from the self-developed serial contact process simulation software DYSI3D. Numerical examples demonstrate that this software achieves high accuracy and efficiency. For example, a speedup of about 20 times is obtained on a GTX580 graphics card for a body-in-white (BIW) crash model with 1.77 million degrees of freedom.
     (4) A complete GPU parallel computing method is presented to accelerate the FE analysis of the sheet metal forming process. To meet the requirement of high accuracy for material flow in sheet forming simulation, a parallel computing method for shell elements with complex material constitutive models and a contact force computation that accounts for friction are proposed. The parallelization of the contact algorithm integrated in the self-developed sheet metal forming simulation software CADEMII is also studied. First, the broad-phase search method used in real-time collision detection is introduced for test pair searching during pre-contact searching. Second, a parallel contact pair update method for post-contact searching is proposed based on the information of adjacent contact segments. Finally, a GPU-based sheet metal forming parallel computing software package named CADEM-GPU (software registration number 2010SR052426) is developed based on CADEMII. To improve the computing efficiency and practicality of this software, several useful technologies such as asynchronous data output and OpenGL-based real-time display are added. Numerical examples show that a speedup of more than 20 times can be obtained on a GTX460 graphics card for sheet metal FE models with tens of thousands of elements.
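     Item (3) describes a thread-per-contact-pair force computation whose results are scattered to nodes with atomic operations. The following is a minimal sketch of that pattern for a penalty method, emulated serially in Python. The data layout (a `gap` that is negative on penetration, a unit `normal`, and shape-function `weights` on the master segment nodes) and the single scalar `stiffness` are assumptions made for illustration; the abstract does not give the thesis's formulation.

```python
# Serial emulation of penalty contact forces with a scatter to shared nodes.
# The pair layout and all names are hypothetical, not taken from CPS-GPU.

def penalty_contact_forces(pairs, stiffness, num_nodes):
    """Return per-node force vectors accumulated from all contact pairs."""
    forces = [[0.0, 0.0, 0.0] for _ in range(num_nodes)]
    for pair in pairs:                      # one 'thread' per contact pair
        depth = -pair["gap"]                # positive when the slave node has penetrated
        if depth <= 0.0:
            continue                        # separated pair: no force
        magnitude = stiffness * depth       # penalty force grows linearly with penetration
        for axis in range(3):
            push = magnitude * pair["normal"][axis]
            # On a GPU these '+=' and '-=' updates would be atomic adds, because
            # several pairs can share a node and their threads run concurrently.
            forces[pair["slave"]][axis] += push
            for node, weight in zip(pair["masters"], pair["weights"]):
                forces[node][axis] -= weight * push   # equal and opposite reaction
    return forces
```

     Because the weights on each master segment sum to one, the forces from every pair cancel in total, which gives a quick sanity check on any implementation of this pattern. The scatter is where atomics become necessary: unlike a per-node gather, the set of nodes touched by each pair changes as contact evolves, so a fixed index cannot be precomputed.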

引文

[1] CHEN C J, USMAN M. Design optimisation for automotive applications.International Journal of Vehicle Design,2001,25(1):126-141.
    [2] ONDA H, SAKANASHI T, MIHARA K, et al. Casting CAE for automotivecasting parts. Nissan Technical Review,2003,52:21-26.
    [3] LAKSHMINARAYAN V, WANG H, WILLIAMS W, et al. Application of CAEnonlinear crash analysis to aluminum automotive crashworthiness design. SAETechnical Paper951080,1995.
    [4]王勖成.有限单元法.第一版.北京:清华大学出版社有限公司,2003,117-120.
    [5] THOLE C A, STUBEN K. Industrial simulation on parallel computers. ParallelComput,1999,25(13-14):2015-2037.
    [6] NCAC. Finite Element Model Archive, http://www.ncac.gwu.edu/vml/models.html,2011-12-1.
    [7] TOMOV S, MCGUIGAN M, BENNETT R, et al. Benchmarking andimplementation of probability-based simulations on programmable graphicscards. Computers&Graphics,2005,29(1):71-80.
    [8] ALMASI G S, GOTTLIEB A. Highly parallel computing. Redwood City:Benjamin-Cummings Publishing Co.,1989,106-110.
    [9] NVIDIA C. Compute unified device architecture programming guide,2007,1-5.
    [10] CLOUGH R W. Thse Finite Element Method in Plane Stress Analysis. In:ASCE2nd Conference on Electronic Compuataion. Reston: American Societyof Civil Engineers,1960,345-378.
    [11] HRENNIKOFF A. Solution of problems of elasticity by the framework method.Journal of Applied Mechanics,1941,8(4):169-175.
    [12] MCHENRY D. A lattice analogy for the solution of plane stress problems.Journal of Insitute of Civil Engineers,1943,21(2):59-82.
    [13] COURANT R. Variational methods for the solution of problems of equilibriumand vibrations. Bull Amer Math Soc,1943,49(1):1-23.
    [14] TURNER M J, CLOUGH R W, MARTIN H C, et al. Stiffness and deflectionanalysis of complex structures. Journal of Aeronautical Society,1956,23(9):805-823.
    [15] MARCAL P V, KING I P. Elastic-Plastic Analysis of Two-dimensional StressSystems by the Finite Element Method. International Journal of MechanicalSciences,1967,9:143-155.
    [16] YAMADA Y, YISHIMURA N, SAKURAI T. Plastic Stress-Strain Matrix andits Application for the Solution of Elastic-Plastic Problems by the FiniteElement Method. International Journal of Mechanical Sciences,1968,10:343-354.
    [17] HIBBITT H D, MARCAL P V, RICE J R. A finite element formulation forproblems of large strain and large displacement. International Journal of Solidsand Structures,1970,6(8):1069-1086.
    [18] MCMEEKING R M, RICE J R. Finite-element formulations for problems oflarge elastic-plastic deformation. International Journal of Solids and Structures,1975,11(5):601-616.
    [19] OH S I, ALTAN T. Metal forming and the finite-element method. Oxford:Oxford University Press,1989:210-216.
    [20] REBELO N, NAGTEGAAL J, HIBBITT H. Finite element analysis of sheetforming processes. International journal for numerical methods in engineering,1990,30(8):1739-1758.
    [21] LI J, YANG Q, NIU P, et al. Analysis of Thermal Field on Integrated LEDLight Source Based on COMSOL Multi-physics Finite Element Simulation.Physics Procedia,2011,22:150-156.
    [22] REDDY J. Nonlinear finite element analysis. Oxford: Oxford University Press,2004:90-93.
    [23] LEE S H. A CAD–CAE integration approach using feature-based multi-resolution and multi-abstraction modelling techniques. Computer-AidedDesign,2005,37(9):941-955.
    [24] CLOUGH R W, WILSON E L. Early finite element research at Berkeley. In:Proceedings of the Fifth US National Conference Computational Mechanics.Boulder,1999,1-35.
    [25] WILSON E L. SAP: A General Structural Analysis Program, Report to WallaWalla District U.S. Engineers Office. Structural Engineering Laboratory,University of California,1970.
    [26] BATHE K J, WILSON E L, IDING R H. NONSAP: a structural analysisprogram for static and dynamic response of nonlinear systems. Springfield, VA:National Technical Information Service,1974,3-5.
    [27]袁明武,陈璞.微机结构分析通用程序SAP84(版本4.0).计算结构力学及其应用,1995,12(3):298-300.
    [28]张早明. CAE在汽车工业中的应用.汽车科技,2008,5:7-11.
    [29]张洪武,陈飙松,李云鹏,张盛,彭海军.面向集成化CAE软件开发的SiPESC研发工作进展.计算机辅助工程,2011,20(2):39-49.
    [30] NOOR A. Parallel processing in finite element structural analysis. In: Parallelcomputations and their impact on mechanics. USA: NASA,1987,253-277.
    [31] NOOR A K, LAMBIOTTE JR J J. Finite element dynamic analysis on CDCSTAR-100computer. Computers&Structures,1979,10(1–2):7-19.
    [32] NOOR A K, PETERS J M. Element stiffness computation on CDC CYBER205computer. Communications in Applied Numerical Methods,1986,2(3):317-328.
    [33] NOOR A K, KAMEL H A, FULTON R E. Substructuring techniques—statusand projections. Computers&Structures,1978,8(5):621-632.
    [34] FARHAT C, WILSON E. A new finite element concurrent computer programarchitecture. International journal for numerical methods in engineering,1987,24(9):1771-1792.
    [35] FARHAT C. A simple and efficient automatic FEM domain decomposer.Computers&Structures,1988,28(5):579-602.
    [36] FARHAT C, LESOINNE M. Automatic partitioning of unstructured meshes forthe parallel solution of problems in computational mechanics. Internationaljournal for numerical methods in engineering,1993,36(5):745-764.
    [37] FARHAT C, ROUX F-X. An unconventional domain decomposition method foran efficient parallel solution of large-scale finite element systems. SIAMJournal on Scientific and Statistical Computing,1992,13(1):379-396.
    [38] HUGHES T J, LEVIT I, WINGET J. Element-by-element implicit algorithmsfor heat conduction. Journal of Engineering Mechanics,1983,109(2):576-585.
    [39] BARRAGY E, CAREY G F. A parallel element‐by‐element solution scheme.International journal for numerical methods in engineering,2005,26(11):2367-2382.
    [40] CAREY G, BARRAGY E, MCLAY R, et al. Element‐by‐element vector andparallel computations. Communications in Applied Numerical Methods,1988,4(3):299-307.
    [41]李晓梅,程建钢,李明瑞.并行计算环境与有限元分析并行算法.中国科学基金,1995,9(1):15-21.
    [42]周树荃,梁维泰,邓绍忠.有限元结构分析并行计算.北京:科学出版社,1997,15-20.
    [43]周树荃,邓绍忠.有限元结构分析并行计算的若干研究进展.南京航空航天大学学报,1995,27(1):27-32.
    [44]周树荃,邓绍忠.变带宽大型稀疏线性方程组的并行直接解法及其在YH-1上的实现.在：航空科学基金论文集(3).北京:航空工业出版社,1993,216-220.
    [45]邓绍忠,周树荃.不规则结构分析有限元方程组的并行迭代解法及其实现.在：全国第三届并行算法学术会议论文集.武汉:华中理工大学出版社,1992,116-120.
    [46]张汝清.并行计算结构力学.重庆大学出版社,1993,2-5.
    [47]张汝清.并行计算结构力学的发展和展望.力学进展,1994,24(4):511-517.
    [48] BATHE K J, RAMM E, WILSON E L. Finite element formulations for largedeformation dynamic analysis. International Journal for Numerical Methods inEngineering,1975,9(2):353-386.
    [49] BATHE K-J, BOLOURCHI S. A geometric and material nonlinear plate andshell element. Computers&structures,1980,11(1):23-48.
    [50] KEY S W. A finite element procedure for the large deformation dynamicresponse of axisymmetric solids. Computer Methods in Applied Mechanics andEngineering,1974,4(2):195-218.
    [51] HUGHES T, LIU W. Implicit-explicit finite elements in transient analysis. I-Stability theory. II-Implementation and numerical examples. Journal ofApplied Mechanics(ASME Transactions),1978,45:371-378.
    [52] HUGHES T J, PISTER K S, TAYLOR R L. Implicit-explicit finite elements innonlinear transient analysis. Computer Methods in Applied Mechanics andEngineering,1979,17:159-182.
    [53] BELYTSCHKO T, LIN J I, CHEN-SHYH T. Explicit algorithms for thenonlinear dynamics of shells. Computer methods in applied mechanics andengineering,1984,42(2):225-251.
    [54] HUGHES T J, TAYLOR R L, SACKMAN J L, et al. A finite element methodfor a class of contact-impact problems. Computer Methods in AppliedMechanics and Engineering,1976,8(3):249-276.
    [55] HALLQUIST J, GOUDREAU G, BENSON D. Sliding interfaces withcontact-impact in large-scale Lagrangian computations. Computer Methods inApplied Mechanics and Engineering,1985,51(1):107-137.
    [56] ZHI-HUA Z, NILSSON L. A contact searching algorithm for general contactproblems. Computers&Structures,1989,33(1):197-209.
    [57] ZHONG Z-H. Finite element procedures for contact-impact problems. Oxford:Oxford university press,1993,345-360.
    [58]钟志华,李光耀.薄板冲压成型过程的计算机仿真与应用.北京:北京理工大学出版社,1998,72-75.
    [59] ZHONG Z-H, NILSSON L. A unified contact algorithm based on the territoryconcept. Computer Methods in Applied Mechanics and Engineering,1996,130(1):1-16.
    [60] BENSON D J, HALLQUIST J O. A single surface contact algorithm for thepost-buckling analysis of shell structures. Computer Methods in AppliedMechanics and Engineering,1990,78(2):141-163.
    [61] OLDENBURG M, NILSSON L. The position code algorithm for contactsearching. International journal for numerical methods in engineering,1994,37(3):359-386.
    [62] BELYTSCHKO T, NEAL M O. Contact‐impact by the pinball algorithm withpenalty and Lagrangian methods. International journal for numerical methodsin engineering,1991,31(3):547-572.
    [63] CHAMORET D, SAILLARD P, RASSINEUX A, et al. New smoothingprocedures in contact mechanics. Journal of Computational and appliedMathematics,2004,168(1):107-116.
    [64] K Yamazaki. M Mori. Analysis of an elastic contact problem by the boundaryelement method (Anapproach by the penalty function method). JSME Int. J.Series1,1989,32(4):508-513.
    [65] BATHE K J, CHAUDHARY A. A solution method for planar and axisymmetriccontact problems. International Journal for Numerical Methods in Engineering,1985,21(1):65-88.
    [66] KAMAL M M. Analysis and simulation of vehicle to barrier impact. SAETechnical Paper700414,1970.
    [67] TANI M. Study of Automobile Crashworthiness. SAE Technical Paper700175,1970.
    [68] CARRUTHERS J J, KETTLE A, ROBINSON A. Energy absorption capabilityand crashworthiness of composite material structures: A review. AppliedMechanics Reviews,1998,51(10):635-649.
    [69] LANGSETH M, HOPPERSTAD O, BERSTAD T. Crashworthiness ofaluminium extrusions: validation of numerical simulation, effect of mass ratioand impact velocity. International Journal of Impact Engineering,1999,22(9):829-854.
    [70] JACOB G C, FELLERS J F, STARBUCK J M, et al. Crashworthiness ofautomotive composite material systems. Journal of applied polymer science,2004,92(5):3218-3225.
    [71] ZAREI H, KR GER M, ALBERTSEN H. An experimental and numericalcrashworthiness investigation of thermoplastic composite crash boxes.Composite structures,2008,85(3):245-257.
    [72] PERRONE N. Crashworthiness and biomechanics of vehicle impact.Washington, D.C.: Catholic University of America,1970,10-13.
    [73] AMBR SIO J, SILVA M. Structural and biomechanical crashworthiness usingmulti-body dynamics. Proceedings of the Institution of Mechanical Engineers,Part D: Journal of Automobile Engineering,2004,218(6):629-645.
    [74] HU K, HU P. Parameters Optimization Method for Hot Forming Panels Basedon the Crashworthiness and Lightweight. Applied Mechanics and Materials,2013,327:318-321.
    [75] LONSDALE G, CLINCKEMAILLIE J, VLACHOUTSIS S, et al.Communication requirements in parallel crashworthiness simulation. LectureNotes in Computer Science,1994,796:55-61.
    [76] BELYTSCHKO T, PLASKACZ E, CHIANG H-Y. Explicit finite elementmethods with contact-impact on SIMD computers. Computing Systems inEngineering,1991,2(2):269-276.
    [77] PLASKACZ E J, BELYTSCHKO T, CHIANG H-Y. Contact-impactsimulations on massively parallel SIMD supercomputers. Computing Systemsin Engineering,1992,3(1):347-355.
    [78] BROWN K, ATTAWAY S, PLIMPTON S, et al. Parallel strategies for crash andimpact simulations. Computer Methods in Applied Mechanics and Engineering,2000,184(2):375-390.
    [79] LONSDALE G, PETITET A, ZIMMERMANN F, et al. Programmingcrashworthiness simulation for parallel platforms. Mathematical and computermodelling,2000,31(1):61-76.
    [80]钟志华.汽车耐撞性分析的有限元法.汽车工程,1994,16(1):1-6.
    [81]黄世霖,张金换,王晓冬.汽车碰撞与安全.北京:清华大学出版社,2000,15-20.
    [82]寇哲君.可扩展冲击-接触并行计算及其在汽车碰撞模拟中的应用:[清华大学博士学位毕业论文].北京:清华大学,2003,60-65.
    [83] MEI L, THOLE C. Data analysis for parallel car-crash simulation results andmodel optimization. Simulation modelling practice and theory,2008,16(3):329-337.
    [84] PARTL A M, MASELLI A, CIARDI B, et al. Enabling parallel computing inCRASH. Monthly Notices of the Royal Astronomical Society,2011,414(1):428-444.
    [85] WANG N-M, BUDIANSKY B. Analysis of sheet metal stamping by afinite-element method. Journal of Applied Mechanics,1978,45:73-76.
    [86] NAKAMACHI E, SOWERBY R. A numerical analysis of bulging and punchstretching of circular disks based on Kirchhoff plate theory. AdvancedTechnology of Plasticity,1984,1:666-671.
    [87] TANG S C. Computer prediction of the deformed shape of a draw blank duringthe binder-wrap stage. Journal of Applied Metalworking,1980,1(3):22-29.
    [88]林忠钦.车身覆盖件冲压成形仿真.北京:机械工业出版社,2005,35-40.
    [89]李光耀.三维板料成形过程的显式有限元分析.计算结构力学及其应用,1995,13(3):253-268.
    [90]陈涛,李光耀.覆盖件拉延模工艺补充及压料面的参数化设计新方法.机械工程学报,2006,42(5):69-74.
    [91] BUBAK M, CHROBAK R, KITOWSKI J, et al. Parallel finite elementcalculation of plastic deformations on Exemplar SPP1000and on networkedworkstations. Journal of materials processing technology,1996,60(1):409-413.
    [92] XIE H, ZHONG Z, LI G, et al. Parallel computation and application of sheetforming numerical simulation. Zhongguo Jixie Gongcheng/China MechanicalEngineering,2003,14(21):1842-1844.
    [93] NIKISHKOV G, KAWKA M, MAKINOUCHI A, et al. Porting an industrialsheet metal forming code to a distributed memory parallel computer [J].Computers&Structures,1998,67(6):439-449.
    [94] MORI K-I, OTOMO Y, YOSHIMURA H. Parallel processing of3Drigid-plastic finite element method using diagonal matrix. Journal of MaterialsProcessing Technology,2006,177(1):63-67.
    [95] GODDEKE D, STRZODKA R, MOHD-YUSOF J, et al. Exploring weakscalability for FEM calculations on a GPU-enhanced cluster. ParallelComputing,2007,33(10–11):685-699.
    [96] YANG C-T, HUANG C-L, LIN C-F. Hybrid CUDA, OpenMP, and MPI parallelprogramming on multicore GPU clusters. Computer Physics Communications,2011,182(1):266-269.
    [97] EKMAN M, WARG F, NILSSON J. An in-depth look at computer performancegrowth. SIGARCH Comput Archit News,2005,33(1):144-147.
    [98] LINDHOLM E, KILGARD M J, MORETON H. A user-programmable vertexengine. In: Proceedings of the28th annual conference on Computer graphicsand interactive techniques. New York: ACM,2001,149-158.
    [99] PURCELL T J, BUCK I, MARK W R, et al. Ray tracing on programmablegraphics hardware. In: Proceedings of the29th annual conference on Computergraphics and interactive techniques. San Antonio: ACM,2002,703-712.
    [100] KR GER J, WESTERMANN R. Linear algebra operators for GPUimplementation of numerical algorithms. In: Proceedings of the ACMTransactions on Graphics (TOG). New York: ACM,2003,908-916.
    [101] OWENS J D, LUEBKE D, GOVINDARAJU N, et al. A Survey ofGeneral-Purpose Computation on Graphics Hardware. Computer GraphicsForum,2007,26(1):80-113.
    [102] HILLESLAND K E, MOLINOV S, GRZESZCZUK R. Nonlinear optimizationframework for image-based modeling on programmable graphics hardware. In:ACM SIGGRAPH2005Courses. Los Angeles: ACM,2005,224-233.
    [103] LI W, WEI X, KAUFMAN A. Implementing lattice Boltzmann computation ongraphics hardware. Visual Comp,2003,19(7-8):444-456.
    [104]吴恩华,柳有权.基于图形处理器(GPU)的通用计算.计算机辅助设计与图形学学报,2004,16(5):601-612.
    [105] BUCK I, FOLEY T, HORN D, et al. Brook for GPUs: stream computing ongraphics hardware. In: ACM SIGGRAPH2004Papers. Los Angeles: ACM,2004,777-786.
    [106] GODDEKE D, WOBKER H, STRZODKA R, et al. Co-processor accelerationof an unmodified parallel solid mechanics code with FEASTGPU. InternationalJournal of Computational Science and Engineering,2009,4(4):254-269.
    [107] GARCIA M, GUTIERREZ J, RUEDA N. Fluid–structure coupling usinglattice-Boltzmann and fixed-grid FEM. Finite Elements in Analysis and Design,2011,47(8):906-912.
    [108] HAIXIANG S, SCHMIDT B, WEIGUO L, et al. Accelerating error correctionin high-throughput short-read DNA sequencing data with CUDA. In:Proceedings of the Parallel&Distributed Processing2009. Los Alamitos:IEEE,2009,11-18.
    [109] LIGOWSKI L, RUDNICKI W. An efficient implementation of Smith Watermanalgorithm on GPU using CUDA, for massively parallel scanning of sequencedatabases. In: Proceedings of the Parallel&Distributed Processing2009. LosAlamitos: IEEE,2009,1-8.
    [110] SCHERL H, KECK B, KOWARSCHIK M, et al. Fast GPU-Based CTReconstruction using the Common Unified Device Architecture (CUDA). In:Proceedings of the Nuclear Science Symposium Conference Record2007. LosAlamitos: IEEE,2007,4464-4466.
    [111] STONE S S, HALDAR J P, TSAO S C, et al. Accelerating advanced MRIreconstructions on GPUs. Journal of Parallel and Distributed Computing,2008,68(10):1307-1318.
    [112] GOVETT M, MIDDLECOFF J, HENDERSON T. Running the NIMNext-Generation Weather Model on GPUs. In: Proceedings of the Cluster,Cloud and Grid Computing (CCGrid),201010th IEEE/ACM InternationalConference. Los Alamitos: IEEE,2010,792-796.
    [113] MICHALAKES J, VACHHARAJANI M. GPU ACCELERATION OFNUMERICAL WEATHER PREDICTION. Parallel Processing Letters,2008,18(4):531-548.
    [114] FANG R, HE B, LU M, et al. GPUQP: query co-processing using graphicsprocessors. In: Proceedings of the2007ACM SIGMOD internationalconference on Management of data. Los Angeles: ACM,2007,1061-1063.
    [115] CHEN S, QIN J, XIE Y, et al. A Fast and Flexible Sorting Algorithm withCUDA. In:9th International Conference, ICA3PP2009. Berlin: SpringerBerlin Heidelberg,2009,281-290.
    [116] MA W, AGRAWAL G. A compiler and runtime system for enabling data miningapplications on gpus. SIGPLAN Not,2009,44(4):287-288.
    [117] CATANZARO B, SU B-Y, SUNDARAM N, et al. Efficient, high-quality imagecontour detection. In: Proceedings of the Computer Vision,2009IEEE12thInternational Conference. Los Alamotis: IEEE,2009,2381-2388.
    [118] ALLUSSE Y, HORAIN P, AGARWAL A, et al. GpuCV: A GPU-AcceleratedFramework for Image Processing and Computer Vision. In: Proceedings of the4th International Symposium on Visual Computing. Berlin: Springer BerlinHeidelberg,2008,430-439.
    [119] LEE M, JEON J H, BAE J, et al. Parallel implementation of a financialapplication on a GPU. In: Proceedings of the2nd International Conference onInteraction Sciences: Information Technology. Los Angeles: ACM,2009,1136-1141.
    [120] SINGLA N, HALL M, SHANDS B, et al. Financial Monte Carlo simulation onarchitecturally diverse systems. In: Proceedings of the High PerformanceComputational Finance2008. Los Alamitos: IEEE,2008,1-7.
    [121] CORRIGAN A, CAMELLI F F, L HNER R, et al. Running unstructuredgrid-based CFD solvers on modern graphics hardware. International Journal forNumerical Methods in Fluids,2011,66(2):221-229.
    [122] APPLEYARD J, DRIKAKIS D. Higher-order CFD and interface trackingmethods on highly-Parallel MPI and GPU systems. Computers&Fluids,2011,46(1):101-105.
    [123] DE DONNO D, ESPOSITO A, TARRICONE L, et al. Introduction to GPUComputing and CUDA Programming: A Case Study on FDTD. Antennas andPropagation Magazine,2010,52(3):116-122.
    [124] TAKADA N, SHIMOBABA T, MASUDA N, et al. High-speed FDTDsimulation algorithm for GPU with compute unified device architecture. In:Proceedings of the Antennas and Propagation Society International Symposium2009. Los Alamitos: IEEE,2009,1-4.
    [125] ANDERSON J A, LORENZ C D, TRAVESSET A. General purpose moleculardynamics simulations fully implemented on graphics processing units. Journalof Computational Physics,2008,227(10):5342-5359.
    [126] LIU W, SCHMIDT B, VOSS G, et al. Molecular Dynamics Simulations onCommodity GPUs with CUDA. In: High Performance Computing HiPC2007.Berlin: Springer Berlin Heidelberg,2007,185-196.
    [127]柳有权,尹康学,吴恩华.大规模稀疏线性方程组的GMRES-GPU快速求解算法.计算机辅助设计与图形学学报,2011,23(4):553-560.
    [128]李建江,路川,张磊.基于指导语句的CUDA程序性能分析工具研究与实现.电子科技大学学报,2012,41(2):280-284.
    [129]刘小虎,胡耀国,符伟.大规模有限元系统的GPU加速计算研究.计算力学学报,2012,29(1):146-152.
    [130] XU J, WANG X, HE X, et al. Application of the Mole-8.5supercomputer:Probing the whole influenza virion at the atomic level. Chin Sci Bull,2011,56(20):2114-2118.
    [131] HAN L, HIPWELL J, TAYLOR Z, et al. Fast deformation simulation of breastsusing GPU-based dynamic explicit finite element method. DigitalMammography,2010,6236:728-735.
    [132] KRAWEZIK G P, POOLE G. Accelerating the ANSYS direct sparse solverwith GPUs. In: Proceedings of the2010Symposium on ApplicationAccelerators in High Performance Computing (SAAHPC’10). Illinois,2010,1-3.
    [133] NVIDIA. Popular GPU-accelerated Applications, http://www.nvidia.com/object/gpu-accelerated-applications.html,2012-10-1.
    [134] HENSLEY J. Amd ctm overview. In: SIGGRAPH’07. New York: ACM,2007,1-26.
    [135] BAYOUMI A, CHU M, HANAFY Y, et al. Scientific and engineeringcomputing using ati stream technology. Computing in Science and Engineering,2009,11(6):92-97.
    [136] STONE J E, GOHARA D, SHI G. OpenCL: A parallel programming standardfor heterogeneous computing systems. Computing in science&engineering,2010,12(3):66-72.
    [137] KARIMI K, DICKSON N G, HAMZE F. A performance comparison of CUDAand OpenCL. arXiv:10052581,2010.
    [138] DU P, WEBER R, LUSZCZEK P, et al. From CUDA to OpenCL: Towards aperformance-portable solution for multi-platform GPU programming. ParallelComputing,2012,38(8):391-407.
    [139] CUI X, LIU GR, LI GY, et al. Analysis of plates and shells using an edge-basedsmoothed finite element method. Computational Mechanics,2010,45(2):141-156.
    [140] ZHENG G, CUI X, LI G, et al. An edge-based smoothed triangle element fornon-linear explicit dynamic analysis of shells. Computational Mechanics,2011,48(1):65-80.
    [141] FLYNN M J. Some Computer Organizations and Their Effectiveness.Computers, IEEE Transactions on,1972, C21(9):948-960.
    [142] CHEN Guoliang. Design and analysis of parallel algorithms. Beijing: Higher Education Press,1994,11-15.
    [143] SIEWERT S. Using Intel Streaming SIMD Extensions and Intel Integrated Performance Primitives to Accelerate Algorithms. http://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms,2009-11-04.
    [144] REINDERS J. Intel threading building blocks: outfitting C++ for multi-core processor parallelism. Sebastopol, CA: O'Reilly Media,2007,20-25.
    [145] SHAPIRO B A, WU J C, BENGALI D, et al. The massively parallel genetic algorithm for RNA folding: MIMD implementation and population variation. Bioinformatics,2001,17(2):137-148.
    [146] CANTÚ-PAZ E. A survey of parallel genetic algorithms. Calculateurs parallèles, réseaux et systèmes répartis,1998,10(2):141-171.
    [147] YUAN F, LIAO G, FAN W, et al. An interactive 3D visualization system based on PC using Intel SIMD, 3D texturing and thinning techniques. International Journal of Pattern Recognition and Artificial Intelligence,2006,20(3):393-416.
    [148] CLARK T W, MCCAMMON J A. Parallelization of a molecular dynamics non-bonded force algorithm for MIMD architecture. Computers & Chemistry,1990,14(3):219-224.
    [149] KENNEDY K, BENDER C F, CONNOLLY J W D, et al. A nationwide parallel computing environment. Commun ACM,1997,40(11):62-72.
    [150] WITTENBRINK C M, KILGARIFF E, PRABHU A. Fermi GF100 GPU architecture. IEEE Micro,2011,31(2):50-59.
    [151] NICKOLLS J, DALLY W J. The GPU computing era. IEEE Micro,2010,30(2):56-69.
    [152] EWING R E, SHARPLEY R C, MITCHUM D, et al. Distributed computation of wave propagation models using PVM. Parallel & Distributed Technology: Systems & Applications, IEEE,1994,2(1):26-31.
    [153] UEHARA H, TAMURA M, YOKOKAWA M. An MPI benchmark program library and its application to the Earth Simulator. In: Proceedings of the High Performance Computing. Berlin: Springer,2002,219-230.
    [154] GEIST A, BEGUELIN A, DONGARRA J, et al. PVM: Parallel Virtual Machine: A Users' Guide and Tutorial for Network Parallel Computing. Cambridge: MIT Press,1994,87-95.
    [155] SNIR M, OTTO S, HUSS-LEDERMAN S, et al. MPI: The Complete Reference (Vol.1): Volume 1-The MPI Core. Cambridge: MIT Press,1998,1-3.
    [156] ZHANG Wusheng, XUE Wei, LI Jianjiang. MPI parallel programming: a tutorial with examples. Beijing: Tsinghua University Press,2009,2-5.
    [157] HILLIS W D, STEELE JR G L. Data parallel algorithms. Communications of the ACM,1986,29(12):1170-1183.
    [158] HATCHER P J, QUINN M J, LAPADULA A J, et al. Data-parallel programming on MIMD computers. Parallel and Distributed Systems, IEEE Transactions on,1991,2(3):377-383.
    [159] DAGUM L, MENON R. OpenMP: an industry standard API for shared-memory programming. Computational Science & Engineering, IEEE,1998,5(1):46-55.
    [160] GLASKOWSKY P N. NVIDIA’s Fermi: the first complete GPU computing architecture. http://www.nvidia.com/content/PDF/fermi_white_papers/P.Glaskowsky_NVIDIA's_Fermi-The_First_Complete_GPU_Architecture.pdf,2009-9-1.
    [161] ATALLAH M J. Algorithms and Theory of Computation Handbook. Boca Raton: CRC Press,1998,17-20.
    [162] WULF W A, MCKEE S A. Hitting the memory wall: Implications of the obvious. ACM SIGARCH Computer Architecture News,1995,23(1):20-24.
    [163] KANTER D. Nvidia's gt200: Inside a parallel processor, http://www.realworldtech.com/gt200/,2008-9-8.
    [164] GODEL N, NUNN N, WARBURTON T, et al. Scalability of higher-order discontinuous Galerkin FEM computations for solving electromagnetic wave propagation problems on GPU clusters. Magnetics (IEEE Transactions),2010,46(8):3469-3472.
    [165] BELYTSCHKO T, HUGHES T J. Computational methods for transient analysis (Mechanics and Mathematical Methods-Series of Handbooks). Amsterdam: North-Holland,1983,84-87.
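    [166] CUI Xiangyang. Theoretical research on novel low-order, high-accuracy elements for mechanical structural analysis: [Doctoral dissertation, Hunan University]. Changsha: Hunan University,2011,61-65.
    [167] BELYTSCHKO T, LIU W K, MORAN B. Nonlinear finite elements for continua and structures. 1st edition. Singapore: Wiley,2000,104-108.
    [168] WANG Hu. Research on key technologies in parallel-computing-based simulation of metal plastic forming: [Doctoral dissertation, Hunan University]. Changsha: Hunan University,2006,14-20.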
    [169] ELSEN E, LEGRESLEY P, DARVE E. Large calculation of the flow over a hypersonic vehicle using a GPU. Journal of Computational Physics,2008,227(24):10148-10161.
    [170] WANG Jianhua, LI Guangyao, LI Sheng, et al. A fast GPU-based computing method for elasticity problems. China Mechanical Engineering,2011,22(8):932-937.
    [171] JOLDES G R, WITTEK A, MILLER K. Real-time nonlinear finite element computations on GPU–Application to neurosurgical simulation. Computer Methods in Applied Mechanics and Engineering,2010,199(49):3305-3314.
    [172] NVIDIA CORPORATION. NVIDIA CUDA C programming guide 4.0. Santa Clara: NVIDIA Corporation,2011,10-20.
    [173] AHN J H, EREZ M, DALLY W J. Scatter-add in data parallel architectures. In: HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture. Washington: IEEE Computer Society,2005,132-142.
    [174] HE B, GOVINDARAJU N K, LUO Q, et al. Efficient gather and scatter operations on graphics processors. In: Proceedings of the 2007 ACM/IEEE conference on Supercomputing. Reno, Nevada: ACM,2007,1-12.
    [175] GENAUD S, GIERSCH A, VIVIEN F. Load-balancing scatter operations for grid computing. Parallel Computing,2004,30(8):923-946.
    [176] KIRK D B, HWU W W. Programming massively parallel processors: a hands-on approach. San Francisco: Morgan Kaufmann,2010,2-5.
    [177] BELYTSCHKO T, LEVIATHAN I. Physical stabilization of the 4-node shell element with one point quadrature. Computer Methods in Applied Mechanics and Engineering,1994,113(3):321-350.
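    [178] ZHONG Yang, ZHONG Zhihua, LI Guangyao, et al. Review of algorithms for explicit computation of contact-impact interfaces in mechanical systems. Journal of Mechanical Engineering,2011,47(13):44-58.
    [179] WHIRLEY R G, ENGELMANN B E. Automatic contact in DYNA-3D for vehicle crashworthiness. In: Crashworthiness and Occupant Protection in Transportation System, Proceedings of the 1993 ASME Winter Annual Meeting. New York: ASME Applied Mechanics Division,1993,15-29.
    [180] XIE Hui, ZHONG Zhihua, LI Guangyao, et al. Parallel computing for numerical simulation of sheet metal stamping and its application. China Mechanical Engineering,2003,14(21):1842-1844.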
    [181] BELL N, HOBEROCK J. GPU Computing Gems: Jade Edition. San Francisco: Morgan Kaufmann,2011,359-372.
    [182] OLDENBURG M. Finite element analysis of thin-walled structures subjected to impact loading: [Doctoral thesis]. Porsön: Luleå University of Technology,1988,3-8.
    [183] MAKINOUCHI A. Sheet metal forming simulation in industry. Journal of Materials Processing Technology,1996,60(1-4):19-26.
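    [184] ZHENG Gang. Research on the drawbead model and its parameter inversion in stamping of automotive panels: [Doctoral dissertation, Hunan University]. Changsha: Hunan University,2008.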
    [185] HILL R. A theory of the yielding and plastic flow of anisotropic metals. Proceedings of the Royal Society of London Series A Mathematical and Physical Sciences,1948,193(1033):281-297.
    [186] BARLAT F, LIAN K. Plastic behavior and stretchability of sheet metals. Part I: A yield function for orthotropic sheets under plane stress conditions. International Journal of Plasticity,1989,5(1):51-66.
    [187] PAPELEUX L, PONTHOT JP. Finite element simulation of springback in sheet metal forming. Journal of Materials Processing Technology,2002,125:785-791.
    [188] ERICSON C. Real-time collision detection. San Francisco: Morgan Kaufmann,2004,1-6.
    [189] GREEN S. Particle simulation using CUDA. Santa Clara: NVIDIA Corporation,2010,1-3.
    [190] LE GRAND S. Broad-phase collision detection with CUDA, GPU Gems. Boston: Addison-Wesley Professional,2007,3697-3721.
    [191] GÖDDEKE D, STRZODKA R, TUREK S. Accelerating double precision FEM simulations with GPUs. In: Proceedings of ASIM 2005-18th Symposium on Simulation Technique. Erlangen,2005,1-6.
    [192] KUTTER O, SHAMS R, NAVAB N. Visualization and GPU-accelerated simulation of medical ultrasound from CT images. Computer Methods and Programs in Biomedicine,2009,94(3):250-266.
    [193] FARBER R. CUDA application design and development. Waltham: Morgan Kaufmann,2011,207-239.