Comparative analysis of de novo transcriptome assembly
  • Authors: Kaitlin Clarke (1)
    Yi Yang (1) (2)
    Ronald Marsh (2)
    LingLin Xie (3)
    Ke K. Zhang (1)
  • Keywords: transcriptome assembly; next-generation sequencing; RNA-Seq; de Bruijn graph; overlap graph
  • Journal: Science China Life Sciences
  • Publication date: February 2013
  • Volume: 56
  • Issue: 2
  • Pages: 156-162
  • Affiliations:

    1. Bioinformatics Core, Department of Pathology, University of North Dakota, Grand Forks, ND, 58202, USA
    2. Department of Computer Science, University of North Dakota, Grand Forks, ND, 58202, USA
    3. Department of Biochemistry and Molecular Biology, University of North Dakota, Grand Forks, ND, 58202, USA
Abstract
The rapid development of next-generation sequencing technology presents a major computational challenge for data processing and analysis. The de Bruijn graph, a fast algorithmic approach, has been used successfully for de novo assembly of genomic DNA; its performance for transcriptome assembly, however, is unclear. In this study, we used both simulated and real RNA-Seq data, from either artificial RNA templates or human transcripts, to evaluate five de novo assemblers: ABySS, Mira, Trinity, Velvet and Oases. Of these, ABySS, Trinity, Velvet and Oases are based on de Bruijn graphs, whereas Mira uses an overlap graph algorithm. Various numbers of short RNA reads were selected from the External RNA Controls Consortium (ERCC) data and human chromosome 22. A number of statistics were then calculated for the contigs produced by each assembler. Each experiment was repeated multiple times to obtain mean statistics and standard error estimates. Trinity performed relatively well on both ERCC and human data, but it may not consistently generate full-length transcripts. ABySS was the fastest method, but its assembly quality was low. Mira achieved a good rate of mapping its contigs onto human chromosome 22, but its computational speed was not satisfactory. Our results suggest that transcriptome assembly remains a challenging problem for the bioinformatics community. A novel assembler is therefore needed for transcriptome data generated by next-generation sequencing.
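
The two graph models contrasted in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or from any of the five assemblers: the function names `build_de_bruijn` and `find_overlaps`, the toy reads, and the parameters `k` and `min_len` are all assumptions chosen for illustration, and real assemblers add error correction, graph simplification and paired-end handling on top of this.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each k-mer
    found in a read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def find_overlaps(reads, min_len):
    """Build overlap-graph edges by all-against-all comparison.
    Returns (i, j, n) where the last n bases of reads[i] equal the
    first n bases of reads[j], keeping the longest such n >= min_len."""
    edges = []
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            # Try the longest possible overlap first, then shorter ones.
            for n in range(min(len(a), len(b)), min_len - 1, -1):
                if a[-n:] == b[:n]:
                    edges.append((i, j, n))
                    break
    return edges


# Hypothetical toy reads, not data from the study.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)
overlaps = find_overlaps(reads, min_len=3)
```

In this sketch the de Bruijn graph is built in a single pass over the reads, while the overlap graph compares every pair of reads. That difference in work is one plausible reason for the speed gap the abstract reports between the de Bruijn graph assemblers and Mira, although the sketch does not measure it.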
