VTBuilder: a tool for the assembly of multi isoform transcriptomes
  • Authors: John Archer ; Gareth Whiteley ; Nicholas R Casewell ; Robert A Harrison…
  • Keywords: Transcriptomics ; de novo ; Contigs ; Next generation sequencing ; Software ; Java ; Chimeras ; Haplotypes ; Non-chimeric ; Transcripts
  • Journal: BMC Bioinformatics
  • Publication year: 2014
  • Publication date: December 2014
  • Volume: 15
  • Issue: 1
  • Full-text size: 793 KB
  • Journal subjects: Bioinformatics; Microarrays; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Combinatorial Libraries; Algorithms
  • Publisher: BioMed Central
  • ISSN: 1471-2105
Abstract
Background Within many research areas, such as transcriptomics, the millions of short DNA fragments (reads) produced by current sequencing platforms need to be assembled into transcript sequences before they can be used. Despite recent advances in assembly software, creating such transcripts from read data harboring isoform variation remains challenging. Current approaches either fail to identify all variants present or create chimeric transcripts, within which relationships between co-evolving sites and other evolutionary factors are disrupted. We present VTBuilder, a tool for constructing non-chimeric transcripts from read data sequenced from sources containing isoform complexity.
Results We validated VTBuilder using reads simulated from 54 Sanger sequenced transcripts (SSTs) expressed in the venom gland of the saw-scaled viper, Echis ocellatus. The SSTs were selected to represent genes from major co-expressed toxin groups known to harbor isoform variants. From the simulated reads, VTBuilder constructed 55 transcripts, 50 of which had greater than 99% sequence similarity to 48 of the SSTs. In contrast, the popular assembler Trinity (r2013-02-25) constructed only 14 transcripts with a similar level of sequence identity, matching just 11 SSTs. Furthermore, VTBuilder produced transcripts with a length distribution similar to that of the SSTs, while those produced by Trinity were considerably shorter. To demonstrate that our approach can be scaled to real-world data, we assembled the venom gland transcriptome of the African puff adder, Bitis arietans, using paired-end reads sequenced on Illumina’s MiSeq platform. VTBuilder constructed 1481 transcripts from 5 million reads and, following annotation, all major toxin genes were recovered, demonstrating reconstruction of complex underlying sequence and isoform diversity.
Conclusion Unlike other approaches, VTBuilder strives to maintain the relationships between co-evolving sites within the constructed transcripts, and thus increases transcript utility for research areas ranging from transcriptomics to phylogenetics, including the monitoring of drug-resistant parasite populations. Additionally, improving the quality of transcripts assembled from read data will benefit future studies that query these data. VTBuilder has been implemented in Java and is available, under the GPL GPU V0.3 license, from http://www.lstmed.ac.uk/vtbuilder.
