GAM-NGS: genomic assemblies merger for next generation sequencing
  • Authors: Riccardo Vicedomini (1) (2)
    Francesco Vezzi (3)
    Simone Scalabrin (2)
    Lars Arvestad (3) (4)
    Alberto Policriti (1) (2)
  • Journal: BMC Bioinformatics
  • Publication date: April 2013
  • Volume: 14
  • Issue: Suppl 7
  • Full text size: 931KB
  • Author affiliations:

    1. Department of Mathematics and Computer Science, University of Udine, 33100, Udine, Italy
    2. IGA, Institute of Applied Genomics, 33100, Udine, Italy
    3. KTH Royal Institute of Technology, Science for Life Laboratory, School of Computer Science and Communication, 17121, Solna, Sweden
    4. Swedish e-Science Research Centre, Dept. of Computer Science and Numerical Analysis, Stockholm University, 17121, Solna, Sweden
  • ISSN:1471-2105
Abstract
Background: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to run several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers yield better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to improve on the contiguity and correctness of each. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through read alignments and stored in a weighted graph. The merging phase is then carried out with the help of this weighted graph, which allows an optimal resolution of local problematic regions.

Results: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show that GAM-NGS outputs an improved and reliable set of sequences. GAM-NGS is also very efficient, merging assemblies with substantially fewer computational resources than comparable tools. It achieves this by avoiding global alignment between contigs, which makes its strategy unique among assembly reconciliation tools.

Conclusions: The difficulty of obtaining correct and reliable assemblies with a single assembler is driving the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects, and it shows its full potential with multi-library Illumina-based projects.
With more than 20 assemblers available, it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting, in each region, the sequence that is most likely to be correct.
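
The block idea in the Background can be illustrated with a small sketch. This is a toy illustration under stated assumptions, not GAM-NGS's actual implementation or data structures: the function name `build_block_graph`, the input format (one dictionary per assembly mapping a read ID to the contig it aligns to), and the `min_shared_reads` threshold are all hypothetical. It only shows how reads aligned to both assemblies can link contigs that cover the same locus, with the number of shared reads used as an edge weight.

```python
from collections import defaultdict


def build_block_graph(aln_a, aln_b, min_shared_reads=2):
    """Link contigs of two assemblies through the reads they share.

    aln_a, aln_b: dict mapping read ID -> contig ID (hypothetical,
    simplified input; real alignments also carry positions and strands).
    Returns a dict mapping (contig_a, contig_b) -> number of shared reads,
    keeping only pairs supported by at least `min_shared_reads` reads.
    """
    weights = defaultdict(int)
    for read_id, contig_a in aln_a.items():
        contig_b = aln_b.get(read_id)
        if contig_b is not None:
            # The same read maps to both assemblies, so these two contigs
            # are candidates for representing the same genomic locus.
            weights[(contig_a, contig_b)] += 1
    # Drop weakly supported pairs (threshold is an assumption of this sketch).
    return {pair: w for pair, w in weights.items() if w >= min_shared_reads}


if __name__ == "__main__":
    reads_on_a = {"r1": "A1", "r2": "A1", "r3": "A2", "r4": "A2"}
    reads_on_b = {"r1": "B1", "r2": "B1", "r3": "B1", "r5": "B2"}
    print(build_block_graph(reads_on_a, reads_on_b))
```

Because the links come from read alignments that are typically already available in an assembly project, no contig-to-contig global alignment is needed in this sketch, which mirrors the motivation given in the abstract.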
