FFTs and Multiple Collective Communication on Multiprocessor-Node Architectures
详细信息    查看全文
  • 作者:Andreas Jocksch (1)
  • 关键词:FFT &#8211 ; all ; to ; all personalized communication &#8211 ; multiprocessor nodes
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2012
  • 出版时间:2012
  • 年:2012
  • 卷:7203
  • 期:1
  • 页码:163-172
  • 全文大小:222.1 KB
  • 参考文献:1. Adler, M., Byers, J.W., Karp, R.M.: Scheduling Parallel Communication: The h-Relation Problem. In: Wiedermann, J., H谩jek, P. (eds.) MFCS 1995. LNCS, vol. 969, pp. 1–20. Springer, Heidelberg (1995)
    2. Brass, A., Pawley, G.S.: Two and three dimensional FFTs on highly parallel computers. Parallel Comput. 3, 167–184 (1986)
    3. Bruck, J., Ho, C.T., Kipnis, S., Upfal, E., Weathersby, D.: Efficient algorithms for all-to-all communications in multiport message-passing systems. IEEE T. Parall. Distr. 8(11), 1143–1156 (1997)
    4. Chan, A., Balaji, P., Gropp, W., Thakur, R.: Communication Analysis of Parallel 3D FFT for Flat Cartesian Meshes on Large Blue Gene Systems. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V.K. (eds.) HiPC 2008. LNCS, vol. 5374, pp. 350–364. Springer, Heidelberg (2008)
    5. Fang, B., Deng, Y., Martyna, G.: Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer. Comput. Phys. Commun. 176, 531–538 (2007)
    6. Fraigniaud, P., Lazard, E.: Methods and problems of communication in usual networks. Discrete Appl. Math. 53, 79–133 (1994)
    7. Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. P IEEE 93(2), 216–231 (2005)
    8. Goldman, A., Peters, J.G., Trystram, D.: Exchanging messages of different size. J. Parallel Distr. Com. 66, 1–18 (2006)
    9. Gupta, A., Kumar, V.: The scalability of FFT on parallel computers. IEEE T. Parall. Distr. 4(8), 922–932 (1993)
    10. Helman, D.R., Bader, D.A., J谩J谩, J.: Parallel algorithms for personalized communication and sorting with an experimental study (extended abstract). In: SPAA 1996: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 211–222. ACM, New York (1996)
    11. Johnsson, S.L., Ho, C.T.: Optimum broadcasting and personalized communication in hypercubes. IEEE T. Comput. 38(9), 1249–1268 (1989)
    12. van Loan, C.: Computational Frameworks for the Fast Fourier Transfrom. SIAM, Philadelphia (1992)
    13. Sanders, P., Solis-Oba, R.: How helpers hasten h-relations. J. Algorithm. 41, 86–98 (2001)
    14. Sanders, P., Tr盲ff, J.L.: The Hierarchical Factor Algorithm for All-to-All Communication. In: Monien, B., Feldmann, R. (eds.) Euro-Par 2002. LNCS, vol. 2400, pp. 799–803. Springer, Heidelberg (2002)
    15. Swarztrauber, P.N.: Multiprocessor FFTs. Parallel Comput. 5, 197–210 (1987)
    16. Takahashi, D.: An Implementation of Parallel 3-D FFT with 2-D Decomposition on a Massively Parallel Cluster of Multi-core Processors. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds.) PPAM 2009. LNCS, vol. 6067, pp. 606–614. Springer, Heidelberg (2010)
    17. Thakur, R., Rabenseifner, R., Gropp, W.: Optimization of collective communication operations in MPICH. Int. J. High Perform. C. 19(1), 49–66 (2005)
    18. Tipparaju, V., Nieplocha, J., Panda, D.: Fast collective operations using shared and remote memory access protocols on clusters. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France (April 2003)
  • 作者单位:1. CSCS, Swiss National Supercomputing Centre, Galleria 2, Via Cantonale, 6928 Manno, Switzerland
  • ISSN:1611-3349
文摘
We consider FFTs for networks with multiprocessor nodes using 2D data decomposition. In this application, processors perform collective all-to-all communication in different groups independently at the same time. Thus the individual processors of the nodes might be involved in independent collective communication. The underlying communication algorithm should account for that fact. For short messages, we propose a sparse version of Bruck’s algorithm which handles such multiple collectives. The distribution of the FFT data to the nodes is discussed for the local and global application of Bruck’s original algorithm, as well as the suggested sparse version. The performance of the different approaches is compared.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700