Utilizing Multiple Xeon Phi Coprocessors on One Compute Node

Authors: Xinnan Dong (25)
Jun Chai (25)
Jing Yang (25)
Mei Wen (25)
Nan Wu (25)
Xing Cai (26) (27)
Chunyuan Zhang (25)
Zhaoyun Chen (25)
Journal: Lecture Notes in Computer Science
Year: 2014
Volume: 8631
Issue: 1
Pages: 68-81
Full-text size: 1,160 KB
Author affiliations:

25. School of Computer Science, National University of Defense Technology, Changsha, Hunan, 410073, China
26. Simula Research Laboratory, P.O. Box 134, 1325 Lysaker, Norway
27. Department of Informatics, University of Oslo, P.O. Box 1080 Blindern, 0316 Oslo, Norway
ISSN: 1611-3349

Abstract

Future exascale systems are expected to adopt compute nodes that incorporate many accelerators. This paper therefore investigates how to program multiple Xeon Phi coprocessors inside one compute node. Besides a standard MPI-OpenMP programming approach, which belongs to the symmetric usage mode, two offload-mode programming approaches are considered. The first offload approach is conventional and uses compiler pragmas, whereas the second is new and combines Intel’s coprocessor offload infrastructure (COI) and symmetric communication interface (SCIF) APIs for low-latency communication. While the pragma-based approach is simpler to program, the COI-SCIF approach has three advantages: (1) lower overhead when launching offloaded code, (2) higher data transfer bandwidth, and (3) more advanced asynchrony between computation and data movement. The low-level COI-SCIF approach is also shown to have benefits over its MPI-OpenMP counterpart. All the programming approaches are tested with a real-world 3D application, for which the COI-SCIF approach delivers the best performance on a Tianhe-2 compute node with three Xeon Phi coprocessors.
