16. Inria Bordeaux, Sud-Ouest, 33405, Talence, France 17. Airbus Group Innovations, Applied Mathematics and Simulation, Toulouse, France
ISSN:1611-3349
Abstract
Time-domain BEM for the wave equation in acoustics and electromagnetism can be expressed as a sparse linear system composed of multiple interaction/convolution matrices. It can be solved with sparse matrix-vector products, but these are ill-suited to reaching a high Flop-rate. In this paper we present a novel approach based on reordering the interaction matrices into slices. This yields a custom multi-vectors/vector product operation, which we compute using SIMD intrinsic functions. We take advantage of the new computation order to parallelize in both shared and distributed memory. We evaluate our implementation by studying the sequential Flop-rate and the parallel scalability, and we report results for an industrial test case on up to 32 nodes.
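The reordering idea can be illustrated with a small sketch. It is not the authors' implementation: it assumes that a "slice" gathers column j of every interaction matrix, so that each row of a slice holds a short run of values over the time index k, which is then multiplied by the past values of unknown j. The function names, the dictionary-based storage, and the `past` layout are all hypothetical, and plain Python loops stand in for the SIMD kernel.

```python
def convolve_by_matrices(mats, past):
    """Reference ordering: one sparse matrix-vector product per matrix.

    mats[k] is a dict {(i, j): value} standing in for interaction matrix k
    (hypothetical storage); past[k][j] is the value of unknown j at the
    time step paired with matrix k.
    """
    n = len(past[0])
    out = [0.0] * n
    for k, matrix in enumerate(mats):
        for (i, j), value in matrix.items():
            out[i] += value * past[k][j]
    return out


def build_slices(mats, n):
    """Regroup the entries by column: slices[j][i] lists (k, value) pairs."""
    slices = [dict() for _ in range(n)]
    for k, matrix in enumerate(mats):
        for (i, j), value in matrix.items():
            slices[j].setdefault(i, []).append((k, value))
    return slices


def convolve_by_slices(slices, past):
    """Slice ordering: each row of a slice is dotted with the history of one unknown."""
    n = len(slices)
    out = [0.0] * n
    for j, rows in enumerate(slices):
        # History of unknown j over all stored time steps.
        history = [past[k][j] for k in range(len(past))]
        for i, entries in rows.items():
            # Short dot product over the time index; a real kernel
            # could vectorize this loop.
            out[i] += sum(value * history[k] for k, value in entries)
    return out
```

Both orderings compute the same sum; the slice ordering turns many scattered accesses into short contiguous dot products, which is the kind of access pattern that vector instructions handle well.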