A Multi-GPU Parallel FDTD Algorithm and Its Application in Electromagnetic Simulation
Abstract
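Combining theory, experiment, and computation has become the basic mode of scientific research, and in electromagnetic science and engineering the finite-difference time-domain (FDTD) algorithm has become an important method for electromagnetic field analysis. FDTD is a time-domain method for solving Maxwell's equations: the electromagnetic field is discretized directly on a Yee grid, and the partial derivatives in Maxwell's curl equations are approximated by central differences in both space and time, which yields an alternating (leapfrog) update of the electric and magnetic fields in time. The method is simple to implement and easy to understand, and it adapts well to media of many shapes and materials. Because FDTD solves Maxwell's equations directly, all electromagnetic phenomena are implicit in it, so it is suited to radiation, transmission, and scattering problems. Since Yee proposed it in 1966, FDTD has developed continuously and is now widely used for electromagnetic simulation across frequency bands.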
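The alternating update described above can be illustrated with a minimal one-dimensional sketch. This is not the dissertation's code: the function name `fdtd_1d`, the normalized units (the Courant number `courant` absorbs the material constants and step sizes), the Gaussian soft source, and the perfectly conducting end walls are all assumptions chosen to keep the example short.

```python
import math

def fdtd_1d(n_cells=200, n_steps=100, courant=0.5, src=100):
    """Leapfrog update of Ez and Hy on a 1D Yee grid (normalized units)."""
    ez = [0.0] * n_cells          # Ez sampled at integer grid points
    hy = [0.0] * (n_cells - 1)    # Hy sampled halfway between Ez points
    for n in range(n_steps):
        # Half step: update H from the central difference of E in space.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Half step: update E from the central difference of H in space.
        # The two end points are never updated, so they act as PEC walls.
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft source: add a Gaussian pulse to the field at one cell.
        ez[src] += math.exp(-((n - 30) / 10.0) ** 2)
    return ez, hy
```

Each cell update reads only its nearest neighbours from the previous half step, which is why the cells within one half step can be updated independently of each other.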
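     As a finite-difference method, FDTD is subject to numerical dispersion and numerical stability constraints, so its accuracy depends on fairly strict limits on the grid. The spatial step should generally be smaller than 1/10 of the wavelength, and when the object is structurally complex even more sample points are needed to model it faithfully. The time step must satisfy the Courant stability condition, which ties it to the spatial step. FDTD is therefore often very time-consuming for electrically large problems or problems with fine structures.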
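As a rough illustration of these two limits, the sketch below computes a cell size from the 1/10-wavelength rule and the largest stable time step from the standard 3D Courant condition, dt <= 1 / (c * sqrt(1/dx^2 + 1/dy^2 + 1/dz^2)). The function names, the `safety` factor, and the vacuum wave speed are assumptions made for the example, not values taken from the dissertation.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def max_cell_size(freq_hz, cells_per_wavelength=10, rel_permittivity=1.0):
    """Largest cell size that keeps the given number of cells per wavelength."""
    wavelength = C0 / (freq_hz * math.sqrt(rel_permittivity))
    return wavelength / cells_per_wavelength

def courant_time_step(dx, dy, dz, safety=0.99):
    """Time step at a fraction `safety` of the 3D Courant limit (vacuum)."""
    return safety / (C0 * math.sqrt(1 / dx**2 + 1 / dy**2 + 1 / dz**2))
```

At 1 GHz in vacuum the wavelength is about 30 cm, so `max_cell_size(1e9)` gives a cell of about 3 cm. Halving the cell size in 3D multiplies the cell count by eight and also halves the allowed time step, which is why fine structures become expensive so quickly.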
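     FDTD is naturally parallelizable, so parallel computing can effectively reduce run time and speed up simulation and design. Most parallel FDTD work has focused on algorithms for networked machines such as supercomputers and PC clusters, but because of cost and network speed this approach is not very cost-effective. Parallel FDTD on programmable devices has also drawn attention from some researchers, but it has not been widely adopted because of the complexity of such devices and the pace of their development.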
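     In recent years, driven by demand from the game market, graphics processing units (GPUs) have developed faster than Moore's law predicts, and their floating-point performance far exceeds that of contemporary CPUs, so their use in general scientific computing has attracted growing attention. With the rapid development of general-purpose computing on GPUs (GPGPU), GPUs are now widely used for general algorithms and scientific computing in many fields, and their use in computational electromagnetics, FDTD in particular, has drawn wide interest. The Compute Unified Device Architecture (CUDA) model made GPGPU parallel programs faster and more efficient to develop; it has been welcomed by researchers and quickly adopted across disciplines.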
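     This work is part of a project under the National Basic Research Program of China, "Local coupling effects in metal/dielectric nano-heterostructures and their applications in photoelectric conversion devices." It covers the part of the project concerned with a GPU-based parallel simulation system for light-emitting diodes (LEDs). The dissertation studies GPU-based parallel FDTD algorithms and ultimately implements hybrid parallel FDTD on a multi-GPU platform, which greatly increases the speed of FDTD electromagnetic simulation. The system has been applied to LED simulation and design in a study of LED emission enhancement. The dissertation is organized as follows: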
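     First, the dissertation introduces the relevant background, including the foundations of computational electromagnetics and parallel computing, and states the significance and main content of the work. It then studies parallel computing techniques, analyzes the characteristics of the various parallel approaches, discusses in depth the development and application of GPUs and GPGPU, and examines the hardware and software basis and the programming model of CUDA. CUDA is chosen as the basis for the parallel FDTD algorithms.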
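     Second, it reviews the principles of the basic FDTD algorithm and related topics such as numerical dispersion, boundary conditions, and excitation sources, and then discusses the state of parallel FDTD computing, which leads to the specific topics studied here.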
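     The dissertation proposes implementations of 2D and 3D parallel FDTD under the CUDA architecture. It implements the uniaxial (anisotropic) perfectly matched layer (UPML) absorbing boundary condition for 2D FDTD, and both UPML and the convolutional perfectly matched layer (CPML) for 3D FDTD. The implemented sources are a 2D line current source, a 3D dipole source, and a plane-wave source; injecting the plane wave also required a parallel 1D FDTD with the Mur absorbing boundary condition. For 2D problems, a 2D thread organization controls the field updates, and several memory-access optimizations are proposed, including two ways of accessing shared memory and the use of texture memory. For 3D problems, two thread-organization schemes are proposed, implemented, and optimized, and their speeds are compared; both achieve speedups of more than 10 times over the conventional serial CPU algorithm. To suit the different characteristics of UPML and CPML, the dissertation handles them differently, by extending the PML for UPML and by computing the PML separately for CPML, with corresponding optimizations for each. With accuracy preserved, both achieve high speed: speedups over the serial algorithm are generally above 20 times, and the highest is 58 times.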
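     Building on single-GPU computation, the dissertation extends the parallel algorithm to a multi-GPU platform. Multi-GPU FDTD is implemented with FDTD domain decomposition, a suitable boundary-exchange scheme, and synchronous data transfer between GPU and CPU memory. To reduce the cost of data transfer, an asynchronous transfer scheme is proposed for multi-GPU FDTD, and tests confirm that it effectively improves multi-GPU parallel efficiency. The dissertation reports the first implementation of hybrid parallel FDTD that combines parallelism within each GPU, parallelism across GPUs, and task parallelism between data transfer and computation. In performance tests, FDTD with a 10-layer CPML reached more than 4000 Mcells/s on a platform of eight GTX295 cards.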
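The idea of domain decomposition with boundary exchange can be sketched in one dimension with plain Python lists standing in for the memory of two devices. Everything here is an illustrative assumption: the names `run_single` and `run_split`, the two-way split, the normalized Courant number `s`, and the Gaussian source. In a real multi-GPU code the two "halo" reads would be memory copies between devices, which is the traffic an asynchronous transfer scheme tries to overlap with computation.

```python
import math

def gaussian(t):
    return math.exp(-((t - 30) / 10.0) ** 2)

def run_single(n, steps, s=0.5, src=20):
    """Reference: 1D FDTD on one undivided grid."""
    ez, hy = [0.0] * n, [0.0] * (n - 1)
    for t in range(steps):
        for i in range(n - 1):
            hy[i] += s * (ez[i + 1] - ez[i])
        for i in range(1, n - 1):
            ez[i] += s * (hy[i] - hy[i - 1])
        ez[src] += gaussian(t)
    return ez

def run_split(n, steps, s=0.5, src=20):
    """Same update on two subdomains that exchange one boundary value each."""
    m = n // 2
    ez_l, hy_l = [0.0] * m, [0.0] * m                   # left: cells 0..m-1
    ez_r, hy_r = [0.0] * (n - m), [0.0] * (n - m - 1)   # right: cells m..n-1
    for t in range(steps):
        halo_ez = ez_r[0]            # exchange: right sends its first Ez left
        for i in range(m):
            e_right = ez_l[i + 1] if i + 1 < m else halo_ez
            hy_l[i] += s * (e_right - ez_l[i])
        for i in range(n - m - 1):
            hy_r[i] += s * (ez_r[i + 1] - ez_r[i])
        halo_hy = hy_l[m - 1]        # exchange: left sends its last Hy right
        for i in range(1, m):
            ez_l[i] += s * (hy_l[i] - hy_l[i - 1])
        for i in range(n - m - 1):
            h_left = hy_r[i - 1] if i > 0 else halo_hy
            ez_r[i] += s * (hy_r[i] - h_left)
        if src < m:
            ez_l[src] += gaussian(t)
        else:
            ez_r[src - m] += gaussian(t)
    return ez_l + ez_r
```

Because each subdomain needs only one layer of neighbouring field values per half step, the exchanged data grows with the interface area while the computation grows with the subdomain volume.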
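     Using the GPU platform, the dissertation studies how the CPML parameters affect absorption in 3D FDTD and simulates a microstrip antenna and a filter. It proposes a method for computing the radiated power of a dipole with FDTD, verifies it on the multi-GPU platform, uses it to compute the radiated optical power of an LED model, and increases that power with a top photonic crystal.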
The combination of theory, experiment, and computation has become the basic pattern of scientific research. In electromagnetic science and engineering, the finite-difference time-domain (FDTD) method has become an important tool for electromagnetic analysis. FDTD is a time-domain method for solving Maxwell's equations. The electromagnetic fields are discretized on Yee cells, and Maxwell's curl equations are converted to difference equations by central differences in both space and time, so that the electric and magnetic fields can be updated alternately in time. The method is simple to implement and to understand, and most dielectric and geometrically complex objects can be modeled with it easily. Because all propagation phenomena are implicitly accounted for in its formulation, it can be used to solve radiation, transmission, and scattering problems. Since Yee proposed it in 1966, FDTD has continued to develop and has been widely applied to electromagnetic simulation across the spectrum.
     As a difference method, FDTD is constrained by numerical dispersion and stability, so the space and time steps must be small enough to guarantee accuracy. The space step should generally be less than 1/10 of the wavelength. If the geometric model is more complex, the number of samples per wavelength should be increased to represent the object as closely as possible. The time step must satisfy the Courant stability condition, which relates it to the space step. As a result, simulating electrically large or finely detailed structures with FDTD takes a long time.
     Because FDTD is an inherently data-parallel algorithm, parallel computing is an efficient way to reduce computation time and speed up simulation. Most parallel FDTD algorithms run on networked computers, including supercomputer systems and PC clusters. However, this approach is not cost-effective because of the expense of supercomputers and the network speed of clusters. Programmable devices are another option, but hardware programming languages are complex to use for FDTD and these devices have developed more slowly than personal computers, so this approach has not been widely used.
     In recent years, driven by demand from games, graphics processing units (GPUs) have developed faster than Moore's law. The floating-point performance of a GPU is much higher than that of a contemporary CPU, and the use of GPUs in general scientific computing has attracted increasing attention. With the development of general-purpose computation on GPUs (GPGPU), more and more general algorithms in many scientific fields have been ported to the GPU. The Compute Unified Device Architecture (CUDA) model has made GPU programming faster and more efficient; it has become popular with researchers and has been adopted rapidly in many fields.
     This dissertation is part of a National Basic Research Program of China project titled "Effect of localization coupling in metal/dielectric nano-heterostructures and its applications in photoelectric conversion devices." Its purpose is to develop a parallel computing system for light-emitting diode (LED) simulation using GPGPU technology. The dissertation studies parallel FDTD algorithms and implements hybrid parallel FDTD on multi-GPU platforms, which greatly increases the speed of FDTD simulation. The resulting system is used for LED simulation, for example to enhance light emission power with a top photonic crystal. The dissertation is divided into the following parts:
     First, background and related knowledge are presented, including computational electromagnetics and the basics of parallel computing, and the significance and content of the study are introduced. Parallel computing technology is then studied: various methods are described and compared, the development and application of GPU and GPGPU technology are discussed, and the software and hardware environment of CUDA is investigated. The CUDA model is chosen as the basis for the parallel FDTD implementation.
     Second, the basic FDTD algorithm and related topics are introduced, such as numerical dispersion, boundary conditions, and sources. The state of parallel FDTD development is discussed, which motivates the content of this dissertation.
     Two-dimensional (2D) and three-dimensional (3D) parallel FDTD implementations are proposed based on the CUDA model. 2D FDTD with a uniaxial perfectly matched layer (UPML) and 3D FDTD with UPML and convolutional PML (CPML) are implemented on the GPU. A line current source in 2D and dipole and plane-wave sources in 3D are implemented, and one-dimensional FDTD with the Mur absorbing boundary condition is implemented for the 3D plane-wave source. For 2D problems, a 2D thread arrangement controls the electromagnetic field updates. Several memory-access optimizations are proposed to increase speed, such as two ways of accessing shared memory and the use of texture memory. For 3D problems, two thread-arrangement schemes are proposed and implemented. Both are optimized and their speeds compared, showing speedups above 10 times in almost every case. PML parameter extension and separate computation are used to handle UPML and CPML, respectively, and corresponding optimizations are implemented for each. PML-FDTD computation is generally accelerated by more than 20 times while computational accuracy is preserved.
     The parallel FDTD algorithm is extended to multiple GPUs (multi-GPU). Domain decomposition and an appropriate boundary data exchange are used in the multi-GPU system, with a synchronous memory copy scheme for exchanging data between GPU and CPU memory. To hide the memory transfer time, an asynchronous memory copy scheme is then used, which proves effective for multi-GPU parallel computing. Hybrid parallelism that combines parallel computing within a single GPU, parallel computing across GPUs, and task parallelism between computation and data exchange is implemented for the first time. The performance of these schemes is evaluated on a multi-GPU system containing 8 GTX295 graphics cards. A speed above 4000 Mcells/s is obtained for 3D FDTD with a 10-layer CPML.
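For readers unfamiliar with the performance figures quoted here, the sketch below shows how such metrics are conventionally computed: throughput in millions of Yee-cell updates per second (Mcells/s), speedup over a reference run, and parallel efficiency across devices. The function names are assumptions, and the numbers in the usage note are made-up inputs, not measurements from the dissertation.

```python
def mcells_per_second(nx, ny, nz, n_steps, elapsed_s):
    """Million cell updates per second for an nx*ny*nz grid run for n_steps."""
    return nx * ny * nz * n_steps / elapsed_s / 1e6

def speedup(t_reference_s, t_parallel_s):
    """How many times faster the parallel run is than the reference run."""
    return t_reference_s / t_parallel_s

def parallel_efficiency(t_one_gpu_s, t_n_gpus_s, n_gpus):
    """Fraction of ideal linear scaling achieved on n_gpus devices."""
    return t_one_gpu_s / (n_gpus * t_n_gpus_s)
```

As a hypothetical example, a 200 x 200 x 200 grid advanced 500 steps in 2 s corresponds to `mcells_per_second(200, 200, 200, 500, 2.0)`, that is, 2000 Mcells/s.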
     The effect of the CPML parameters on absorption is tested on the GPU platform. A microstrip antenna and a filter are simulated with 3D parallel FDTD. A method for calculating the radiated power of a dipole with FDTD is proposed and verified on the multi-GPU system. A light-emitting diode (LED) model is simulated and its radiated power is calculated with this method. A photonic crystal is used to enhance emission.