Barnacle: detecting and characterizing tandem duplications and fusions in transcriptome assemblies
  • Authors: Lucas Swanson (1) (2)
    Gordon Robertson (1)
    Karen L Mungall (1)
    Yaron S Butterfield (1)
    Readman Chiu (1)
    Richard D Corbett (1)
    T Roderick Docking (1)
    Donna Hogge (3)
    Shaun D Jackman (1)
    Richard A Moore (1)
    Andrew J Mungall (1)
    Ka Ming Nip (1)
    Jeremy DK Parker (1)
    Jenny Qing Qian (1)
    Anthony Raymond (1)
    Sandy Sung (1)
    Angela Tam (1)
    Nina Thiessen (1)
    Richard Varhol (1)
    Sherry Wang (1)
    Deniz Yorukoglu (1) (2) (5)
    YongJun Zhao (1)
    Pamela A Hoodless (3) (4)
    S Cenk Sahinalp (2)
    Aly Karsan (1)
    Inanc Birol (1) (2) (4)
  • Keywords: Transcriptome assembly ; Chimeric transcripts ; Fusion ; Partial tandem duplication ; PTD ; Internal tandem duplication ; ITD ; RNA-seq ; Transcriptome
  • Journal: BMC Genomics
  • Publication year: 2013
  • Publication date: December 2013
  • Volume: 14
  • Issue: 1
  • Full-text size: 370 KB
  • Affiliations:

    1. Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada
    2. School of Computing Science, Simon Fraser University, Burnaby, Canada
    3. Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, Canada
    4. Department of Medical Genetics, University of British Columbia, Vancouver, Canada
    5. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA
Abstract
Background: Chimeric transcripts, including partial and internal tandem duplications (PTDs, ITDs) and gene fusions, are important in the detection, prognosis, and treatment of human cancers.
Results: We describe Barnacle, a production-grade analysis tool that detects such chimeras in de novo assemblies of RNA-seq data, and supports prioritizing them for review and validation by reporting the relative coverage of co-occurring chimeric and wild-type transcripts. We demonstrate applications in large-scale disease studies, by identifying PTDs in MLL, ITDs in FLT3, and reciprocal fusions between PML and RARA, in two deeply sequenced acute myeloid leukemia (AML) RNA-seq datasets.
Conclusions: Our analyses of real and simulated data sets show that, with appropriate filter settings, Barnacle makes highly specific predictions for three types of chimeric transcripts that are important in a range of cancers: PTDs, ITDs, and fusions. High specificity makes manual review and validation efficient, which is necessary in large-scale disease studies. Characterizing an extended range of chimera types will help generate insights into progression, treatment, and outcomes for complex diseases.