WhatsHap: Haplotype Assembly for Future-Generation Sequencing Reads
  • Authors: Murray Patterson (20) (21)
    Tobias Marschall (20)
    Nadia Pisanti (22) (23)
    Leo van Iersel (20)
    Leen Stougie (20) (24)
    Gunnar W. Klau (20) (24)
    Alexander Schönhuth (20)
  • Publication: Lecture Notes in Computer Science
  • Year: 2014
  • Volume: 8394
  • Issue: 1
  • Pages: 237-249
  • Full-text size: 262 KB
  • Affiliations:
    20. Life Sciences, CWI, Amsterdam, The Netherlands
    21. LBBE, CNRS, Université de Lyon 1, Villeurbanne, France
    22. Department of Computer Science, University of Pisa, Italy
    23. LIACS, Leiden University, The Netherlands
    24. VU University Amsterdam, The Netherlands
  • ISSN:1611-3349
Abstract
The human genome is diploid; that is, each of its chromosomes comes in two copies. This requires phasing the single nucleotide polymorphisms (SNPs), that is, assigning them to the two copies, beyond just detecting them. The resulting haplotypes, lists of SNPs belonging to each copy, are crucial for downstream analyses in population genetics. Currently, statistical approaches, which avoid making use of direct read information, constitute the state of the art. Haplotype assembly, which addresses phasing directly from sequencing reads, suffers from the fact that sequencing reads of the current generation are too short to serve the purposes of genome-wide phasing. Future sequencing technologies, however, promise to generate reads with lengths and error rates that allow bridging all SNP positions in the genome with a sufficient number of SNPs per read. Existing haplotype assembly approaches, however, profit in terms of computational complexity precisely from the limited length of current-generation reads, because their runtime is usually exponential in the number of SNPs per sequencing read. This implies that such approaches will not be able to exploit the benefits of sufficiently long future-generation reads. Here, we suggest WhatsHap, a novel dynamic programming approach to haplotype assembly. It is the first approach that yields provably optimal solutions to the weighted minimum error correction (wMEC) problem in runtime linear in the number of SNPs per sequencing read, making it suitable for future-generation reads. WhatsHap is a fixed-parameter tractable (FPT) approach with coverage as the parameter. We demonstrate that WhatsHap can handle datasets of coverage up to 20x, processing chromosomes on standard workstations in only 1-2 hours. Our simulation study shows that the quality of haplotypes assembled by WhatsHap improves significantly with increasing read length, both in terms of genome coverage and in terms of switch errors. The switch error rates we achieve in our simulations are superior to those obtained by state-of-the-art statistical phasers.
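
The abstract names the wMEC objective without spelling it out. The sketch below illustrates it under a commonly used formulation, which is an assumption here: reads are split into two groups, and at each SNP position every group pays the smaller of the total weight of its 0-entries and the total weight of its 1-entries. The function name `wmec_brute_force` and the input layout are hypothetical. It tries every bipartition, so it is exponential in the number of reads and is not the dynamic programming approach of WhatsHap; it only shows what an optimal wMEC solution means on a toy input.

```python
from itertools import product


def wmec_brute_force(reads):
    """Return (minimum wMEC cost, read-to-group assignment) by exhaustive search.

    reads: list of dicts mapping SNP position -> (allele, weight), where
    allele is 0 or 1 and weight is the cost of flipping that entry.
    Illustrative only: the running time is exponential in len(reads).
    """
    n = len(reads)
    positions = sorted({p for read in reads for p in read})
    best_cost, best_assignment = None, None
    # Pin read 0 to group 0 so mirror-image bipartitions are not counted twice.
    for rest in product((0, 1), repeat=max(n - 1, 0)):
        assignment = (0,) + rest if n else ()
        cost = 0
        for p in positions:
            # w[group][allele] = total weight of entries showing that allele
            w = [[0, 0], [0, 0]]
            for read, group in zip(reads, assignment):
                if p in read:
                    allele, weight = read[p]
                    w[group][allele] += weight
            # Each group flips whichever allele is cheaper to remove.
            cost += min(w[0]) + min(w[1])
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


# Toy input: the third read agrees with the first at SNP 0 but carries a
# low-confidence (weight 2) conflicting allele at SNP 1.
reads = [
    {0: (0, 5), 1: (0, 5)},
    {0: (1, 5), 1: (1, 5)},
    {0: (0, 5), 1: (1, 2)},
]
print(wmec_brute_force(reads))  # (2, (0, 1, 0))
```

On this input the cheapest solution groups the first and third reads together and flips the single weight-2 entry, which is why the weights matter: a low-confidence conflict is cheaper to correct than a high-confidence one.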
