VERSE: a novel approach to detect virus integration in host genomes through reference genome customization
  • Authors: Qingguo Wang (1)
    Peilin Jia (1) (2)
    Zhongming Zhao (1) (2) (3) (4)

    1. Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
    2. Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
    3. Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
    4. Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
  • Journal: Genome Medicine
  • Publication year: 2015
  • Publication date: December 2015
  • Volume: 7
  • Issue: 1
  • Full-text size: 1,509 KB
  • Subjects: Human Genetics; Proteomics; Bioinformatics; Internal Medicine
  • Publisher: BioMed Central
  • ISSN: 1756-994X
Abstract
Fueled by the widespread application of high-throughput next-generation sequencing (NGS) technologies and the urgent need to counter the threat of pathogenic viruses, large-scale studies have recently investigated virus integration in host genomes (for example, human tumor genomes), which may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and the resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability resulting from virus insertions. To tackle these challenges and improve our ability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection by customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2, which is available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.
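
The general idea of iterative reference customization can be illustrated with a toy Python sketch. This is not the VERSE algorithm or any part of VirusFinder 2; it only shows why updating a reference with the sample's own variants can let previously unalignable reads align. All names (`best_placement`, `customize_once`, `customize_reference`), the brute-force ungapped aligner, the thresholds, and the sequences are assumptions made for illustration.

```python
from collections import Counter


def best_placement(reference, read, max_mismatches):
    """Return the first offset with the fewest mismatches for an ungapped
    placement of `read`, or None if every placement exceeds the limit.
    (Hypothetical stand-in for a real read aligner.)"""
    best_offset, best_mm = None, max_mismatches + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mm = sum(1 for a, b in zip(window, read) if a != b)
        if mm < best_mm:
            best_offset, best_mm = offset, mm
    return best_offset


def customize_once(reference, reads, max_mismatches=1, min_depth=2):
    """One round: align reads, then replace each reference base with the
    strict-majority base of the reads covering it, if depth >= min_depth."""
    columns = [Counter() for _ in reference]
    aligned = 0
    for read in reads:
        offset = best_placement(reference, read, max_mismatches)
        if offset is None:
            continue  # too divergent from the current reference
        aligned += 1
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    new_ref = []
    for ref_base, counts in zip(reference, columns):
        depth = sum(counts.values())
        if depth >= min_depth:
            base, n = counts.most_common(1)[0]
            new_ref.append(base if n * 2 > depth else ref_base)
        else:
            new_ref.append(ref_base)
    return "".join(new_ref), aligned


def customize_reference(reference, reads, max_rounds=10, **kwargs):
    """Repeat customize_once until the reference stops changing.
    Returns the final reference and the read count aligned in the last round."""
    aligned = 0
    for _ in range(max_rounds):
        new_ref, aligned = customize_once(reference, reads, **kwargs)
        if new_ref == reference:
            break
        reference = new_ref
    return reference, aligned


# Made-up data: the sample strain differs from the reference at two positions.
reference = "GATTACAGGCTTAACCGTGA"
sample = "GATTATAGACTTAACCGTGA"
reads = [sample[i:i + 8] for i in (0, 1, 3, 6, 8)]

_, aligned_before = customize_once(reference, reads)
custom_ref, aligned_after = customize_reference(reference, reads)
print(aligned_before, aligned_after, custom_ref == sample)
```

In this made-up example, only three of the five reads align to the original reference under the one-mismatch limit; after two rounds of customization the reference matches the sample strain and all five reads align.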
