ANGSD: Analysis of Next Generation Sequencing Data
  • Authors: Thorfinn Sand Korneliussen (1)
    Anders Albrechtsen (2)
    Rasmus Nielsen (1) (3)

    1. Centre for GeoGenetics, Natural History Museum of Denmark, Copenhagen, Denmark
    2. Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, Copenhagen, DK-2200, Denmark
    3. Department of Integrative Biology and Statistics, UC-Berkeley, 4098 VLSB, Berkeley, California, 94720, USA
  • Keywords: Next-generation sequencing; Bioinformatics; Population genetics; Association studies
  • Journal: BMC Bioinformatics
  • Publication year: 2014
  • Publication date: December 2014
  • Volume: 15
  • Issue: 1
  • Full-text size: 1,220 KB
  • Journal subjects: Bioinformatics; Microarrays; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Combinatorial Libraries; Algorithms
  • Publisher: BioMed Central
  • ISSN: 1471-2105
Abstract
Background: High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory-efficient implementations are needed to facilitate analyses of thousands of samples simultaneously.
Results: We present a multithreaded program suite called ANGSD. The program can calculate various summary statistics and perform association mapping and population genetic analyses, utilizing the full information in next-generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods.
Conclusions: The open-source C/C++ program ANGSD is available at http://www.popgen.dk/angsd. The program is tested and validated on GNU/Linux systems. It supports multiple input formats, including BAM and imputed Beagle genotype probability files. It allows the user to choose between combinations of existing methods and can perform analyses that are not implemented elsewhere.
