Effective Barrier Synchronization on Intel Xeon Phi Coprocessor
  • Keywords: Barrier synchronization; Scalability; Algorithms; Many-core architectures; Intel Xeon Phi
  • Journal title: Lecture Notes in Computer Science
  • Publication year: 2015
  • Volume: 9233
  • Issue: 1
  • Pages: 588-600
  • Author affiliations: Andrey Rodchenko (16)
    Andy Nisbet (16)
    Antoniu Pop (16)
    Mikel Luján (16)

    16. School of Computer Science, The University of Manchester, Manchester, UK
  • Book series: Euro-Par 2015: Parallel Processing
  • ISBN: 978-3-662-48096-0
  • Publication category: Computer Science
  • Publication subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1611-3349
Abstract
Barriers are a fundamental synchronization primitive underpinning the parallel execution models of many modern shared-memory parallel programming languages, such as OpenMP, OpenCL or Cilk, and are one of the main challenges to scaling. State-of-the-art barrier synchronization algorithms differ in their tradeoffs between critical path length, communication traffic patterns and memory footprint. In this paper, we evaluate the efficiency of five such algorithms on the Intel Xeon Phi coprocessor. In addition, we present a novel hybrid barrier implementation that exploits the topology, the memory hierarchy and the streaming stores of the Xeon Phi architecture to achieve a 3\(\times\) lower overhead than the Intel OpenMP barrier implementation (ICC 14.0.0), thus outperforming, to the best of our knowledge, all other implementations. We evaluate it on the CG and MG kernels from the NAS Parallel Benchmarks, the direct N-body simulation kernel and the EPCC barrier OpenMP microbenchmark. The optimized barriers presented in the paper are available at https://github.com/arodchen/cbarriers, released as free software.
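
For readers unfamiliar with the primitive the abstract refers to, the following is a minimal sketch of a textbook centralized sense-reversing barrier. It is not the hybrid barrier presented in the paper and is not taken from the cbarriers repository; whether this variant is among the five evaluated algorithms is not stated in the abstract. The names `SenseReversingBarrier` and `run_demo` are hypothetical, and a `threading.Lock` is assumed here as a stand-in for the atomic fetch-and-decrement that a native implementation would use.

```python
import threading
import time


class SenseReversingBarrier:
    """Illustrative centralized sense-reversing barrier (hypothetical name)."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.remaining = num_threads    # threads still to arrive in this episode
        self.sense = False              # global flag, flipped once per episode
        self.lock = threading.Lock()    # assumption: stands in for an atomic decrement
        self.local = threading.local()  # holds each thread's private sense

    def wait(self):
        # Every thread flips its private sense at the start of each episode.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.lock:
            self.remaining -= 1
            last = self.remaining == 0
            if last:
                # Last arrival resets the counter and releases the waiters.
                self.remaining = self.num_threads
                self.sense = my_sense
        if not last:
            # Spin until the last arrival publishes the new sense.
            while self.sense != my_sense:
                time.sleep(0)  # yield so other threads can make progress


def run_demo(num_threads=4, rounds=50):
    """Return how many times a thread saw a peer that had not reached the barrier."""
    barrier = SenseReversingBarrier(num_threads)
    progress = [0] * num_threads
    violations = [0] * num_threads

    def worker(tid):
        for r in range(1, rounds + 1):
            progress[tid] = r          # work done before the barrier
            barrier.wait()
            # After the barrier, every peer must have recorded round r or later.
            if any(p < r for p in progress):
                violations[tid] += 1

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(violations)


if __name__ == "__main__":
    print("violations:", run_demo())
```

All threads decrement one shared counter and spin on one shared flag, so the critical path is short but the communication traffic concentrates on a single location. This is the kind of tradeoff between critical path length, traffic pattern and memory footprint that the abstract says distinguishes barrier algorithms.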
