Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans
  • Authors: A. Xavier; William M. Muir; Katy M. Rainey
  • Keywords: Empirical Bayes; Heritability; Genomic selection; Association studies
  • Journal: BMC Bioinformatics
  • Publication year: 2016
  • Publication date: December 2016
  • Year: 2016
  • Volume: 17
  • Issue: 1
  • Full-text size: 744 KB
  • Author affiliations: A. Xavier (1)
    William M. Muir (2)
    Katy M. Rainey (1)

    1. Department of Agronomy, Purdue University, Lilly Hall of Life Sciences, 915 W. State St., West Lafayette, Indiana, 47907, USA
    2. Department of Animal Science, Purdue University, Lilly Hall of Life Sciences, 915 W. State St., West Lafayette, Indiana, 47907, USA
  • Journal subjects: Bioinformatics; Microarrays; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Combinatorial Libraries; Algorithms
  • Publisher: BioMed Central
  • ISSN: 1471-2105
Abstract
Background Success in genome-wide association studies and marker-assisted selection depends on good phenotypic and genotypic data: the more complete these data are, the more powerful the analysis. However, some next-generation genotyping technologies provide genotypic information despite large proportions of missing data, so the procedure used to impute the missing genotypes can strongly affect downstream analyses. This study aims to (1) compare the genetic variance captured by a single-nucleotide polymorphism (SNP) panel of soybean when missing data are imputed using various methods, (2) evaluate the imputation accuracy and post-imputation quality associated with these methods, and (3) evaluate the impact of the imputation method on heritability and on the accuracy of genome-wide prediction of soybean traits. The imputation methods we evaluated were a multivariate mixed model, a hidden Markov model, a logical algorithm, k-nearest neighbor, singular value decomposition, and random forest. We used raw genotypes from the SoyNAM project and the following phenotypes: plant height, days to maturity, grain yield, and seed protein composition.
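k-nearest neighbor is one of the imputation methods compared. The record does not say how the authors configured it, so the following is only a generic illustration, not the study's implementation. It is a minimal sketch that assumes genotypes are coded as allele dosages 0/1/2 with None for a missing call. Other assumptions are labeled in the comments: distance is the mean absolute dosage difference over markers called in both individuals, a missing call is filled with the rounded mean dosage of the k nearest individuals that carry that marker, and the value of k is arbitrary.

```python
from typing import List, Optional

Genotype = Optional[int]  # allele dosage 0, 1, 2, or None for a missing call


def distance(a: List[Genotype], b: List[Genotype]) -> float:
    """Assumed metric: mean absolute dosage difference over markers called in both individuals."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")  # no overlap, so the pair cannot be compared
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(geno: List[List[Genotype]], k: int = 3) -> List[List[Genotype]]:
    """Fill each missing call with the rounded mean dosage of the k nearest individuals
    that have that marker called. Calls with no eligible neighbour stay None.
    The input panel is not modified."""
    imputed = [row[:] for row in geno]
    for i, row in enumerate(geno):
        missing = [m for m, g in enumerate(row) if g is None]
        if not missing:
            continue
        # Distances are computed on the raw (unimputed) panel.
        dists = {j: distance(row, other) for j, other in enumerate(geno) if j != i}
        for m in missing:
            donors = [j for j in dists
                      if geno[j][m] is not None and dists[j] != float("inf")]
            donors.sort(key=lambda j: dists[j])  # stable sort: ties keep row order
            nearest = donors[:k]
            if nearest:
                mean_dose = sum(geno[j][m] for j in nearest) / len(nearest)
                imputed[i][m] = round(mean_dose)  # Python rounds .5 ties to the even code
    return imputed


# Usage on a toy panel (rows = individuals, columns = SNP markers).
panel = [
    [0, 0, 2, 2, None],
    [0, 0, 2, 2, 2],
    [0, 1, 2, 2, 2],
    [2, 2, 0, 0, 0],
]
print(knn_impute(panel, k=2)[0])  # [0, 0, 2, 2, 2]
```

Here the two closest individuals to the first row are rows 2 and 3, and both carry dosage 2 at the last marker, so the missing call becomes 2. Rounding keeps imputed calls on the discrete 0/1/2 scale. A variant that keeps the unrounded mean would instead yield an expected dosage.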