Time-Domain BEM for the Wave Equation: Optimization and Hybrid Parallelization


Authors: Berenger Bramas (16)
Olivier Coulaud (16)
Guillaume Sylvand (17)
Keywords: Boundary element method (BEM); time domain; sparse matrix-vector product (SpMV); shared/distributed memory parallelization; SIMD
Publication: Lecture Notes in Computer Science
Publication year: 2014
Volume: 8632
Issue: 1
Pages: 511-523
Full-text size: 992 KB
Author affiliations: Berenger Bramas (16)
Olivier Coulaud (16)
Guillaume Sylvand (17)

16. Inria Bordeaux, Sud-Ouest, 33405, Talence, France
17. Airbus Group Innovations, Applied Mathematics and Simulation, Toulouse, France
ISSN: 1611-3349

Abstract

The time-domain BEM for the wave equation in acoustics and electromagnetism can be expressed as a sparse linear system composed of multiple interaction/convolution matrices. It can be solved using sparse matrix-vector products, which make it difficult to achieve a high Flop-rate. In this paper we present a novel approach based on re-ordering the interaction matrices into slices. This yields a custom multi-vectors/vector product operation, which we compute using SIMD intrinsic functions. We take advantage of the new computation order to parallelize in shared and distributed memory. We demonstrate the performance of our system by studying the sequential Flop-rate and the parallel scalability, and we provide results for an industrial test case on up to 32 nodes.
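
The abstract does not spell out the slice layout, so the following is only a minimal sketch of one plausible reading of the re-ordering idea, written in plain Python without SIMD. It assumes the convolution term has the form s[i] = sum over k and j of M^k[i][j] * past[k][j], and that a "slice" gathers, for one column j, the entries M^k[i][j] across all time lags k. The function names, the dense list-of-lists storage, and the tiny test data are all hypothetical and chosen for illustration; they are not the authors' implementation.

```python
# Illustrative sketch only. Names, shapes, and the dense storage are
# assumptions made for this example, not the paper's data structures.

def convolution_by_matrices(matrices, past):
    """Baseline: one matrix-vector product per interaction matrix M^k.

    matrices[k][i][j] holds M^k[i][j]; past[k][j] holds the value of
    unknown j at time lag k. Returns s with s[i] = sum_k sum_j M^k[i][j] * past[k][j].
    """
    n = len(matrices[0])
    s = [0.0] * n
    for k, m in enumerate(matrices):
        for i in range(n):
            for j in range(n):
                s[i] += m[i][j] * past[k][j]
    return s


def build_slices(matrices):
    """Re-order the stack of matrices into per-column slices.

    slices[j][i][k] = M^k[i][j], so each row of a slice stores the
    time-lag entries of one (i, j) pair contiguously.
    """
    num_lags = len(matrices)
    n = len(matrices[0])
    return [
        [[matrices[k][i][j] for k in range(num_lags)] for i in range(n)]
        for j in range(n)
    ]


def convolution_by_slices(slices, past):
    """Same result as the baseline, computed slice by slice.

    For each column j, every row of the slice is multiplied (dot product)
    with the history of unknown j: many row vectors against one vector.
    """
    n = len(slices)
    num_lags = len(past)
    s = [0.0] * n
    for j, slice_j in enumerate(slices):
        history_j = [past[k][j] for k in range(num_lags)]
        for i, row in enumerate(slice_j):
            s[i] += sum(r * h for r, h in zip(row, history_j))
    return s


# Tiny hypothetical example: two time lags, two unknowns.
example_matrices = [
    [[1, 0], [2, 3]],  # M^0
    [[0, 4], [5, 0]],  # M^1
]
example_past = [
    [1, 2],  # values at lag 0
    [3, 4],  # values at lag 1
]
example_slices = build_slices(example_matrices)
by_matrices = convolution_by_matrices(example_matrices, example_past)
by_slices = convolution_by_slices(example_slices, example_past)
```

In this sketch both routines return the same vector; only the loop order and memory layout differ. The slice version turns the inner work into dot products over contiguous rows, which is the kind of access pattern that SIMD instructions can exploit, and the outer loop over slices is a natural unit for distributing work across threads or nodes.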
