Hierarchical task mapping for parallel applications on supercomputers
详细信息    查看全文
  • 作者:Jingjin Wu ; Xuanxing Xiong ; Zhiling Lan
  • 关键词:Topology ; aware task mapping ; Hierarchical task mapping ; Parallel applications ; Communication optimization ; Fat ; tree network ; Torus network
  • 刊名:The Journal of Supercomputing
  • 出版年:2015
  • 出版时间:May 2015
  • 年:2015
  • 卷:71
  • 期:5
  • 页码:1776-1802
  • 全文大小:2,345 KB
  • 参考文献:1.LibTopoMap: A Generic Topology Mapping Library. http://?www.?unixer.?de/?research/?mpitopo/?libtopomap/-/span>
    2.MPI: A message-passing interface standard. version 3.0. http://?www.?mpi-forum.?org/-/span>
    3.ALCF Intrepid. https://?www.?alcf.?anl.?gov/?intrepid
    4.METIS: Graph Partitioning Tool. http://?glaros.?dtc.?umn.?edu/?gkhome/?views/?metis
    5.NICS Kraken User Guide. https://?www.?xsede.?org/?web/?guest/?nics-kraken
    6.TACC Stampede User Guide. http://?www.?tacc.?utexas.?edu/?user-services/?user-guides/?stampede-user-guide
    7.Top 500 Supercomputer Sites. http://?www.?top500.?org/-/span>
    8.TOPOMap. http://?bluesky.?cs.?iit.?edu/?topomap/-/span>
    9.Abts D (2011) The Cray XT4 and Seastar 3-D Torus interconnect. Encycl Parallel Comput 4:470-77.
    10.Agarwal T, Sharma A, Laxmikant A, Kale LV (2006) Topology-aware task mapping for reducing communication contention on large parallel machines. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS)
    11.Aleliunas R, Rosenberg AL (1982) On embedding rectangular grids in square grids. IEEE Trans Comput C-31(9):907-13. doi:10.-109/?TC.-982.-676109
    12.Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Int J Eurogr Assoc (Computer Graphics Forum) 8(1):3-2View Article
    13.Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference. The 1993 high performance computing: new horizons supercomputing symposium, pp 349-57.
    14.Arimilli B, Arimilli R, Chung V, Clark S, Denzel W, Drerup B, Hoefler T, Joyner J, Lewis J, Li J, Ni N, Rajamony R (2010) The PERCS high-performance interconnect. In: Proceedings of the 18th IEEE symposium on high performance interconnects, pp 75-2
    15.Berman F, Snyder L (1987) On mapping parallel algorithms into parallel architectures. J Parallel Distrib Comput 4(5):439-58View Article
    16.Bhatele A (2010) Automating topology aware mapping for supercomputers. PhD thesis, University of Illinois at Urbana-Champaign, Urbana.
    17.Bhatele A, Kale LV (2008) Application-specific topology-aware mapping for three dimensional topologies. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 1-
    18.Bokhari SH (1981) On the mapping problem. IEEE Trans Comput 30(3):207-14View Article MathSciNet
    19.Broquedis F, Clet-Ortega J, Moreaud S, Furmento N, Goglin B, Mercier G, Thibault S, Namyst R (2010) hwloc: a generic framework for managing hardware affinities in hpc applications. In: Proceedings of the 18th euromicro international conference on parallel, distributed and network-based processing (PDP), pp 180-86. doi:10.-109/?PDP.-010.-7
    20.Chockalingam T, Arunkumar S (1992) A randomized heuristics for the mapping problem: the genetic approach. Parallel Comput 18(10):1157-165View Article MATH
    21.Chung IH, Lee CR, Zhou J, Chung YC (2011) Hierarchical mapping for HPC applications. In: Proceedings of IEEE international symposium on parallel and distributed processing workshops and Phd forum (IPDPSW), pp 1815-823
    22.Cuthill E, McKee J (1969) Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 24th National Conference, pp 157-72.
    23.Davis TA, Hu Y (2011) The university of florida sparse matrix collection. ACM Trans Math Softw 38(1):1-5MathSciNet
    24.Deveci M, Rajamanickam S, Leung VJ, Pedretti K, Olivier SL, Bunde DP, ?atalyürek UV, Devine K (2014) Exploiting geometric partitioning in task mapping for parallel computers. In: Proceedings of the 2014 IEEE 28th international parallel and distributed processing symposium, pp 27-6
    25.Ercal F, Ramanujam J, Sadayappan P (1988) Task allocation onto a hypercube by recursive mincut bipartitioning. In: Proceedings of the third conference on hypercube concurrent computers and applications: architecture, software, computer systems, and general issues, vol 1, no C3P, pp 210-21
    26.Hoefler T, Snir M (2011) Generic topology mapping strategies for large-scale parallel architectures. In: Proceedings of the international conference on supercomputing (ICS), pp 75-4
    27.Jeannot E, Mercier G (2010) Near-optimal placement of MPI processes on hierarchical NUMA architectures. In: Proceedings of the 16th International euro-par conference on parallel processing: part II, pp 199-10
    28.Karypis G, Kumar V (1998) Multilevel k-way partitioning scheme for irregular graphs. J Parallel Distrib Comput 48(1):96-29View Article MathSciNet
    29.Kravtsov AV, Klypin AA, Khokhlov AM (1997) Adaptive refinement tree: a new high-resolution N-body code for cosmological simulations. Astrophys J Suppl Ser 111:73-4View Article
    30.Lee C, Bic L (1989) On the mapping problem using simulated annealing. In: Proceedings of international phoenix co
  • 作者单位:Jingjin Wu (1)
    Xuanxing Xiong (2)
    Zhiling Lan (3)

    1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
    2. Design Group, Synopsys, Inc., Mountain View, CA, 94043, USA
    3. Department of Computer Science, Illinois Institute of Technology, Chicago, IL, 60616, USA
  • 刊物类别:Computer Science
  • 刊物主题:Programming Languages, Compilers and Interpreters
    Processor Architectures
    Computer Science, general
  • 出版者:Springer Netherlands
  • ISSN:1573-0484
文摘
As the scale of supercomputers grows, so does the size of the interconnect network. Topology-aware task mapping, which maps parallel application processes onto processors to reduce communication cost, becomes increasingly important. Previous works mainly focus on the task mapping between compute nodes (i.e., inter-node mapping), while ignoring the mapping within a node (i.e., intra-node mapping). In this paper, we propose a hierarchical task mapping strategy, which performs both inter-node and intra-node mapping. We consider supercomputers with popular fat-tree and torus network topologies, and introduce two mapping algorithms: (1) a generic recursive tree mapping algorithm, which can handle both inter-node mapping and intra-node mapping; (2) a recursive bipartitioning mapping algorithm for torus topology, which efficiently partitions the compute nodes according to their coordinates. Moreover, a hierarchical task mapping library is developed. Experimental results show that the proposed approach significantly improves the communication performance by up to 77?% with low runtime overhead.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700