Hierarchical task mapping for parallel applications on supercomputers

详细信息查看全文

作者：Jingjin Wu ; Xuanxing Xiong ; Zhiling Lan
关键词：Topology ; aware task mapping ; Hierarchical task mapping ; Parallel applications ; Communication optimization ; Fat ; tree network ; Torus network
刊名：The Journal of Supercomputing
出版年：2015
出版时间：May 2015
年：2015
卷：71
期：5
页码：1776-1802
全文大小：2,345 KB
参考文献：1.LibTopoMap: A Generic Topology Mapping Library. http://?www.?unixer.?de/?research/?mpitopo/?libtopomap/-/span>
2.MPI: A message-passing interface standard. version 3.0. http://?www.?mpi-forum.?org/-/span>
3.ALCF Intrepid. https://?www.?alcf.?anl.?gov/?intrepid
4.METIS: Graph Partitioning Tool. http://?glaros.?dtc.?umn.?edu/?gkhome/?views/?metis
5.NICS Kraken User Guide. https://?www.?xsede.?org/?web/?guest/?nics-kraken
6.TACC Stampede User Guide. http://?www.?tacc.?utexas.?edu/?user-services/?user-guides/?stampede-user-guide
7.Top 500 Supercomputer Sites. http://?www.?top500.?org/-/span>
8.TOPOMap. http://?bluesky.?cs.?iit.?edu/?topomap/-/span>
9.Abts D (2011) The Cray XT4 and Seastar 3-D Torus interconnect. Encycl Parallel Comput 4:470-77.
10.Agarwal T, Sharma A, Laxmikant A, Kale LV (2006) Topology-aware task mapping for reducing communication contention on large parallel machines. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS)
11.Aleliunas R, Rosenberg AL (1982) On embedding rectangular grids in square grids. IEEE Trans Comput C-31(9):907-13. doi:10.-109/?TC.-982.-676109
12.Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Int J Eurogr Assoc (Computer Graphics Forum) 8(1):3-2View Article
13.Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference. The 1993 high performance computing: new horizons supercomputing symposium, pp 349-57.
14.Arimilli B, Arimilli R, Chung V, Clark S, Denzel W, Drerup B, Hoefler T, Joyner J, Lewis J, Li J, Ni N, Rajamony R (2010) The PERCS high-performance interconnect. In: Proceedings of the 18th IEEE symposium on high performance interconnects, pp 75-2
15.Berman F, Snyder L (1987) On mapping parallel algorithms into parallel architectures. J Parallel Distrib Comput 4(5):439-58View Article
16.Bhatele A (2010) Automating topology aware mapping for supercomputers. PhD thesis, University of Illinois at Urbana-Champaign, Urbana.
17.Bhatele A, Kale LV (2008) Application-specific topology-aware mapping for three dimensional topologies. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 1-
18.Bokhari SH (1981) On the mapping problem. IEEE Trans Comput 30(3):207-14View Article MathSciNet
19.Broquedis F, Clet-Ortega J, Moreaud S, Furmento N, Goglin B, Mercier G, Thibault S, Namyst R (2010) hwloc: a generic framework for managing hardware affinities in hpc applications. In: Proceedings of the 18th euromicro international conference on parallel, distributed and network-based processing (PDP), pp 180-86. doi:10.-109/?PDP.-010.-7
20.Chockalingam T, Arunkumar S (1992) A randomized heuristics for the mapping problem: the genetic approach. Parallel Comput 18(10):1157-165View Article MATH
21.Chung IH, Lee CR, Zhou J, Chung YC (2011) Hierarchical mapping for HPC applications. In: Proceedings of IEEE international symposium on parallel and distributed processing workshops and Phd forum (IPDPSW), pp 1815-823
22.Cuthill E, McKee J (1969) Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 24th National Conference, pp 157-72.
23.Davis TA, Hu Y (2011) The university of florida sparse matrix collection. ACM Trans Math Softw 38(1):1-5MathSciNet
24.Deveci M, Rajamanickam S, Leung VJ, Pedretti K, Olivier SL, Bunde DP, ?atalyürek UV, Devine K (2014) Exploiting geometric partitioning in task mapping for parallel computers. In: Proceedings of the 2014 IEEE 28th international parallel and distributed processing symposium, pp 27-6
25.Ercal F, Ramanujam J, Sadayappan P (1988) Task allocation onto a hypercube by recursive mincut bipartitioning. In: Proceedings of the third conference on hypercube concurrent computers and applications: architecture, software, computer systems, and general issues, vol 1, no C3P, pp 210-21
26.Hoefler T, Snir M (2011) Generic topology mapping strategies for large-scale parallel architectures. In: Proceedings of the international conference on supercomputing (ICS), pp 75-4
27.Jeannot E, Mercier G (2010) Near-optimal placement of MPI processes on hierarchical NUMA architectures. In: Proceedings of the 16th International euro-par conference on parallel processing: part II, pp 199-10
28.Karypis G, Kumar V (1998) Multilevel k-way partitioning scheme for irregular graphs. J Parallel Distrib Comput 48(1):96-29View Article MathSciNet
29.Kravtsov AV, Klypin AA, Khokhlov AM (1997) Adaptive refinement tree: a new high-resolution N-body code for cosmological simulations. Astrophys J Suppl Ser 111:73-4View Article
30.Lee C, Bic L (1989) On the mapping problem using simulated annealing. In: Proceedings of international phoenix co
作者单位：Jingjin Wu (1)
Xuanxing Xiong (2)
Zhiling Lan (3)

1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
2. Design Group, Synopsys, Inc., Mountain View, CA, 94043, USA
3. Department of Computer Science, Illinois Institute of Technology, Chicago, IL, 60616, USA
刊物类别：Computer Science
刊物主题：Programming Languages, Compilers and Interpreters
Processor Architectures
Computer Science, general
出版者：Springer Netherlands
ISSN：1573-0484

文摘

As the scale of supercomputers grows, so does the size of the interconnect network. Topology-aware task mapping, which maps parallel application processes onto processors to reduce communication cost, becomes increasingly important. Previous works mainly focus on the task mapping between compute nodes (i.e., inter-node mapping), while ignoring the mapping within a node (i.e., intra-node mapping). In this paper, we propose a hierarchical task mapping strategy, which performs both inter-node and intra-node mapping. We consider supercomputers with popular fat-tree and torus network topologies, and introduce two mapping algorithms: (1) a generic recursive tree mapping algorithm, which can handle both inter-node mapping and intra-node mapping; (2) a recursive bipartitioning mapping algorithm for torus topology, which efficiently partitions the compute nodes according to their coordinates. Moreover, a hierarchical task mapping library is developed. Experimental results show that the proposed approach significantly improves the communication performance by up to 77?% with low runtime overhead.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700