Resource-efficient utilization of CPU/GPU-based heterogeneous supercomputers for Bayesian phylogenetic inference
  • Authors: Jun Chai; Huayou Su; Mei Wen; Xing Cai; Nan Wu…
  • Keywords: Bayesian inference; CPU/GPU-based heterogeneous supercomputer; Hybrid programming; Resource-efficient utilization
  • Journal: The Journal of Supercomputing
  • Published: October 2013
  • Volume: 66
  • Issue: 1
  • Pages: 364-380
  • Full-text size: 887 KB
  • Affiliations: Jun Chai (1)
    Huayou Su (1)
    Mei Wen (1)
    Xing Cai (2) (3)
    Nan Wu (1)
    Chunyuan Zhang (1)

    1. School of Computer, National University of Defense Technology, 410073, Changsha, P.R. China
    2. Simula Research Laboratory, P.O. Box 134, 1325, Lysaker, Norway
    3. Department of Informatics, University of Oslo, P.O. Box 1080, Blindern, 0316, Oslo, Norway
  • ISSN: 1573-0484
Abstract
Bayesian inference is one of the most important methods for estimating phylogenetic trees in bioinformatics. Because of its potentially huge computational requirements, several parallel algorithms for Bayesian inference have been implemented on CPU-based clusters, multicore CPUs, or small clusters of CPUs and GPUs. To the best of our knowledge, however, none of the existing methods can simultaneously and fully utilize both CPUs and GPUs for the computations, leaving either the CPU part or the GPU part of modern heterogeneous supercomputers idle. Aiming at optimized utilization of heterogeneous computing resources, a promising hardware architecture for future bioinformatics applications, we present a new hybrid parallel algorithm and implementation of Bayesian phylogenetic inference that combines MPI, OpenMP, and CUDA programming. The novelty of our algorithm, denoted oMC3, is its ability to use CPU cores simultaneously with GPUs for the computations while ensuring a fair work division between the two types of hardware components. We have implemented oMC3 based on MrBayes, one of the most popular software packages for Bayesian phylogenetic inference. Numerical experiments show that oMC3 obtains a 2.5× speedup over nMC3, a cutting-edge GPU implementation of MrBayes, on a single server consisting of two GPUs and sixteen CPU cores. Moreover, oMC3 scales well when 128 GPUs and 1536 CPU cores are in use.
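The central idea of the abstract, a fair division of work between GPUs and CPU cores so that neither sits idle, can be illustrated with a small sketch. It assumes each device's throughput has already been measured and splits a pool of work units in proportion to those rates, so both sides finish at roughly the same time. The work unit (for example, site patterns of an alignment), the function names `split_work` and `makespan`, and the numbers are all hypothetical; this is a generic proportional-partitioning scheme, not the oMC3 implementation described in the paper.

```python
"""Hypothetical sketch: split work units between a GPU and CPU cores in
proportion to measured throughput so both finish at about the same time.
This is NOT the oMC3 algorithm; names and numbers are illustrative only."""

from typing import Dict


def split_work(total_units: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Return an integer share of `total_units` for each device.

    `throughput` maps a device name to its measured rate (units per second).
    Shares are proportional to throughput; units lost to rounding down are
    handed to the devices with the largest fractional remainders.
    """
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    if not throughput or any(r <= 0 for r in throughput.values()):
        raise ValueError("every device needs a positive throughput")

    total_rate = sum(throughput.values())
    exact = {d: total_units * r / total_rate for d, r in throughput.items()}
    shares = {d: int(x) for d, x in exact.items()}  # floor, since x >= 0

    # Largest-remainder method: distribute the units lost to rounding.
    leftover = total_units - sum(shares.values())
    by_remainder = sorted(exact, key=lambda d: exact[d] - shares[d], reverse=True)
    for d in by_remainder[:leftover]:
        shares[d] += 1
    return shares


def makespan(shares: Dict[str, int], throughput: Dict[str, float]) -> float:
    """Time until the slowest device finishes (devices run concurrently)."""
    return max(shares[d] / throughput[d] for d in shares)


if __name__ == "__main__":
    # Illustrative rates only, not measurements from the paper.
    rates = {"gpu": 3.0, "cpu_cores": 1.0}
    shares = split_work(1000, rates)
    print(shares)                   # {'gpu': 750, 'cpu_cores': 250}
    print(makespan(shares, rates))  # 250.0
```

With these made-up rates, giving all 1000 units to the GPU alone would take about 333 time units, whereas the proportional split finishes in 250 because the CPU cores contribute instead of idling. A real scheme would also have to account for host-device transfer costs and per-iteration synchronization, which this sketch ignores.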
