Identification of indels in next-generation sequencing data
详细信息    查看全文
  • 作者:Aakrosh Ratan (1) (3)
    Thomas L Olson (2)
    Thomas P Loughran Jr (2)
    Webb Miller (1)

    1. Center for Comparative Genomics and Bioinformatics
    ; Pennsylvania State University ; 506 ; Wartik Laboratory ; University Park ; PA ; 16802 ; USA
    3. Department of Public Health Sciences and Center for Public Health Genomics
    ; University of Virginia ; Charlottesville ; VA ; 22908 ; USA
    2. Department of Medicine
    ; Hematology and Oncology ; and the University of Virginia Cancer Center ; University of Virginia ; Charlottesville ; VA ; 22908 ; USA
  • 关键词:Indels ; Variants ; Sequencing analysis
  • 刊名:BMC Bioinformatics
  • 出版年:2015
  • 出版时间:December 2015
  • 年:2015
  • 卷:16
  • 期:1
  • 全文大小:879 KB
  • 参考文献:1. Lam HYK, Pan C, Clark MJ, Lacroute P, Chen R, Haraksingh R, et al. Detecting and annotating genetic variations using the HugeSeq pipeline. Nat Biotechnol. 2012;30:226鈥?. CrossRef
    2. Stenson PD, Ball EV, Mort M, Phillips AD, Shaw K, Cooper DN. The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution. Curr Protoc Bioinform. 2012;39:1. 13:1.13.1鈥?.13.20.
    3. Li S, Li R, Li H, Lu J, Li Y, Bolund L, et al. SOAPindel: efficient identification of indels from short paired reads. Genome Res. 2013;23:195鈥?00. CrossRef
    4. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009;25:2865鈥?1. CrossRef
    5. Jiang Y, Wang Y, Brudno M. PRISM: Pair read informed split read mapping for base-pair level detection of insertion, deletion and structural variants. Bioinformatics. 2012;28(20):2576鈥?3. CrossRef
    6. Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: An approach to discover, genotype and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011;21:974鈥?4. CrossRef
    7. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, et al. The diploid genome sequence of an individual human. PLoS Biol. 2007;5:e254. CrossRef
    8. Hu X, Yuan J, Shi Y, Lu J, Liu B, Li Z, et al. pIRS: Profile-based Illumina pair-end reads simulator. Bioinformatics. 2012;28:1533鈥?. CrossRef
    9. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754鈥?0. CrossRef
    10. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297鈥?03. CrossRef
    11. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078鈥?. CrossRef
    12. Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008;453:56鈥?4. CrossRef
    13. Albers C, Lunter G, MacArthur DG, McVean G, Ouwehand WH, Durbin R. Dindel: accurate indel calls from short-read data. Genome Res. 2010:961鈥?73
    14. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456:53鈥?. CrossRef
    15. McKernan KJ, Peckham HE, Costa GL, McLaughlin SF, Fu Y, Tsung EF, et al. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res. 2009;19:1527鈥?1. CrossRef
    16. Pruitt KD, Tatusova T, Brown GR, Maglott DR. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012;40(Database issue):D130鈥?. CrossRef
    17. Mills RE, Pittard WS, Mullaney JM, Farooq U, Creasy TH, Mahurkar AA, et al. Natural genetic variation caused by small insertions and deletions in the human genome. Genome Res. 2011;21:830鈥?. CrossRef
    18. Mullaney JM, Mills RE, Pittard WS, Devine SE. Small insertions and deletions (INDELs) in human genomes. Hum Mol Genet. 2010;19:R131鈥?. CrossRef
    19. Chen H, Boutros PC. VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R. BMC Bioinformatics. 2011;12:35. CrossRef
    20. Sokol L, Loughran TP. Large granular lymphocyte leukemia. Curr Hematol Malig Rep. 2007;2:278鈥?2. CrossRef
    21. Loughran TP, Kadin ME, Starkebaum G, Abkowitz JL, Clark EA, Disteche C, et al. Leukemia of large granular lymphocytes: association with clonal chromosomal abnormalities and autoimmune neutropenia, thrombocytopenia, and hemolytic anemia. Ann Intern Med. 1985;102:169鈥?5. CrossRef
    22. Banerji S, Cibulskis K, Rangel-Escareno C, Brown KK, Carter SL, Frederick AM, et al. Sequence analysis of mutations and translocations across breast cancer subtypes. Nature. 2012;486:405鈥?. CrossRef
    23. Ratan A, Miller W, Guillory J, Stinson J, Seshagiri S, Schuster SC. Comparison of sequencing platforms for single nucleotide variant calls in a human sample. PLoS One. 2013;8:e55089. CrossRef
    24. Chao KM, Pearson WR, Miller W. Aligning two sequences within a specified diagonal band. Comput Appl Biosci. 1992;8:481鈥?.
  • 刊物主题:Bioinformatics; Microarrays; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Combinatorial Libraries; Algorithms;
  • 出版者:BioMed Central
  • ISSN:1471-2105
文摘
Background The discovery and mapping of genomic variants is an essential step in most analysis done using sequencing reads. There are a number of mature software packages and associated pipelines that can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance. However, the same cannot be said for tools that are used to identify the other types of variants. Indels represent the second most frequent class of variants in the human genome, after single nucleotide polymorphisms. The reliable detection of indels is still a challenging problem, especially for variants that are longer than a few bases. Results We have developed a set of algorithms and heuristics collectively called indelMINER to identify indels from whole genome resequencing datasets using paired-end reads. indelMINER uses a split-read approach to identify the precise breakpoints for indels of size less than a user specified threshold, and supplements that with a paired-end approach to identify larger variants that are frequently missed with the split-read approach. We use simulated and real datasets to show that an implementation of the algorithm performs favorably when compared to several existing tools. Conclusions indelMINER can be used effectively to identify indels in whole-genome resequencing projects. The output is provided in the VCF format along with additional information about the variant, including information about its presence or absence in another sample. The source code and documentation for indelMINER can be freely downloaded from www.bx.psu.edu/miller_lab/indelMINER.tar.gz.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700