Population genomics methods: from classical statistics to supervised learning
  • English title: Population genomics: From classical statistics to supervised learning
  • Authors: 施怿 (SHI Yi); 李海鹏 (LI Hai Peng)
  • Affiliations: Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences
  • Keywords: population genomics; natural selection; recombination rate; classical statistics; supervised learning
  • Journal code (Chinese title): JCXK
  • Journal (English title): Scientia Sinica (Vitae)
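  • Publication date: 2019-03-25 14:08
  • Publisher: 中国科学:生命科学 (Scientia Sinica Vitae)
  • Year: 2019
  • Issue: v.49
  • Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB13040800); National Natural Science Foundation of China (Grant Nos. 91531306, 91731304)
  • Language: Chinese
  • Page: JCXK201904015
  • Number of pages: 11
  • CN: 04
  • ISSN: 11-5840/Q
  • Classification number: 159-169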
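Abstract
A major goal of population genetics is to understand how evolutionary forces such as mutation, natural selection, genetic drift, population structure, and changes in population size jointly shape genetic variation in the genome. By analyzing DNA sequence polymorphism data, one can infer the forces that have acted on the genome and, in turn, explore the process of biological evolution. In recent years, with the rapid innovation of second-generation DNA sequencing technology, population genetics has entered the genomics era, and the associated methods continue to develop. Population genomics methods can be divided into classical statistical methods and emerging machine learning methods. The former include classical population genetics statistics, detection of natural selection using a single statistic or several statistics combined, joint estimation of population history and natural selection, and methods based on coalescent trees and ancestral recombination graphs. The latter are mainly based on supervised learning and bring a new paradigm to big-data analysis in the population genomics era. Starting from the theoretical foundations, this paper comprehensively reviews how population genomics methods have developed, highlights the latest advances in the field, and offers an outlook on future directions.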
        It is essential to understand how the patterns of genetic variation in organisms have been shaped by different evolutionary forces, such as mutation, natural selection, genetic drift, population structure, and population size change. In recent years, with the rapid innovation of next-generation sequencing technology, we have entered a new era of population genomics. The relevant population genomics methods can be classified as classical statistics and supervised learning. The classical statistical methods include many popular ones for detecting natural selection and inferring demographic parameters, which are based on single or multiple combined statistics. The supervised learning methods may promise a new paradigm for making sense of large datasets in the genomic era. Here, we first give a brief introduction to the important theory in population genomics. We then review recent research progress in population genomics and share our perspectives on its future development.
References
1 Darwin C.On The Origin of Species By Means of Natural Selection,or The Preservation of Favoured Races in The Struggle For Life.London:John Murray,1859.126-127
    2 Huxley J.Evolution:The Modern Synthesis.London:George Allen and Unwin,1942.1-45
    3 Crow J F.Population genetics history:a personal view.Annu Rev Genet,1987,21:1-22
    4 Kimura M.Evolutionary rate at the molecular level.Nature,1968,217:624-626
    5 Hughes A L.Looking for Darwin in all the wrong places:the misguided quest for positive selection at the nucleotide sequence level.Heredity,2007,99:364-373
    6 Voight B F,Kudaravalli S,Wen X,et al.A map of recent positive selection in the human genome.PLoS Biol,2006,4:e72
    7 The International HapMap Consortium.A haplotype map of the human genome.Nature,2005,437:1299-1320
    8 Gibbs R A,Boerwinkle E,Doddapaneni H,et al.A global reference for human genetic variation.Nature,2015,526:68-74
    9 Sudmant P H,Rausch T,Gardner E J,et al.An integrated map of structural variation in 2,504 human genomes.Nature,2015,526:75-81
    10 Walter K,Min J L,Huang J,et al.The UK10K project identifies rare variants in health and disease.Nature,2015,526:82-90
    11 Geihs M,Yan Y,Walter K,et al.An interactive genome browser of association results from the UK10K cohorts project.Bioinformatics,2015,31:4029-4031
    12 Hoban S,Bertorelle G,Gaggiotti O E.Computer simulations:tools for population and evolutionary genetics.Nat Rev Genet,2012,13:110-122
    13 Ke Y,Su B,Song X,et al.African origin of modern humans in East Asia:a tale of 12,000 Y chromosomes.Science,2001,292:1151-1153
    14 He Y X,Qi X B,Ouzhuluobu X,et al.Blunted nitric oxide regulation in Tibetans under high-altitude hypoxia.Natl Sci Rev,2018,5:516-529
    15 Marciniak S,Perry G H.Harnessing ancient genomes to study the history of human adaptation.Nat Rev Genet,2017,18:659-674
    16 Wang G D,Shao X J,Bai B,et al.Structural variation during dog domestication:insights from grey wolf and dhole genomes.Natl Sci Rev,2019,doi:10.1093/nsr/nwy076
    17 Ling S,Hu Z,Yang Z,et al.Extremely high genetic diversity in a single tumor points to prevalence of non-Darwinian cell evolution.Proc Natl Acad Sci USA,2015,112:E6496-E6505
    18 Wu C I,Wang H Y,Ling S,et al.The ecology and evolution of cancer:the ultra-microevolutionary process.Annu Rev Genet,2016,50:347-369
    19 Wang H Y,Chen Y X,Tong D,et al.Is the evolution in tumors Darwinian or non-Darwinian?Natl Sci Rev,2018,5:15-17
    20 Schneider A,Souvorov A,Sabath N,et al.Estimates of positive Darwinian selection are inflated by errors in sequencing,annotation,and alignment.Genome Biol Evol,2009,1:114-118
    21 Wright S.Evolution in Mendelian populations.Genetics,1931,16:97-159
    22 Wright S.Breeding structure of populations in relation to speciation.Am Nat,1940,74:232-248
    23 Watterson G A.On the number of segregating sites in genetical models without recombination.Theor Popul Biol,1975,7:256-276
    24 Tajima F.Statistical method for testing the neutral mutation hypothesis by DNA polymorphism.Genetics,1989,123:585-595
    25 Fu Y X,Li W H.Statistical tests of neutrality of mutations.Genetics,1993,133:693-709
    26 Fay J C,Wu C I.Hitchhiking under positive Darwinian selection.Genetics,2000,155:1405-1413
    27 Zeng K,Fu Y X,Shi S,et al.Statistical tests for detecting positive selection by utilizing high-frequency variants.Genetics,2006,174:1431-1439
    28 Fu Y X.A phylogenetic estimator of effective population size or mutation rate.Genetics,1994,136:685-692
    29 Fu Y X,Li W H.Maximum likelihood estimation of population parameters.Genetics,1993,134:1261-1270
    30 Wright S.The genetical structure of populations.Ann Eugen,1949,15:323-354
    31 Weir B S,Cockerham C C.Estimating F-statistics for the analysis of population structure.Evolution,1984,38:1358-1370
    32 Hudson R R,Kaplan N L.Statistical properties of the number of recombination events in the history of a sample of DNA sequences.Genetics,1985,111:147-164
    33 Nielsen R.Molecular signatures of natural selection.Annu Rev Genet,2005,39:197-218
    34 Vitti J J,Grossman S R,Sabeti P C.Detecting natural selection in genomic data.Annu Rev Genet,2013,47:97-120
    35 Zhou Q,Wang W.Detecting natural selection at the DNA level(in Chinese).Zool Res,2004,25:73-80[周琦,王文.DNA水平自然选择作用的检测.动物学研究,2004,25:73-80]
    36 Lin K,Li H P.Advances in detecting positive selection on genome(in Chinese).Hereditas,2009,31:896-902[林栲,李海鹏.DNA水平上检测正选择方法的研究进展.遗传,2009,31:896-902]
    37 Chen H,Hey J,Slatkin M.A hidden Markov model for investigating recent positive selection through haplotype structure.Theor Popul Biol,2015,99:18-30
    38 Chen H,Patterson N,Reich D.Population differentiation as a test for selective sweeps.Genome Res,2010,20:393-402
    39 Yi X,Liang Y,Huerta-Sanchez E,et al.Sequencing of 50 human exomes reveals adaptation to high altitude.Science,2010,329:75-78
    40 Akey J M.Constructing genomic maps of positive selection in humans:where do we go from here?Genome Res,2009,19:711-722
    41 Li W H,Wu C I,Luo C C.A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes.Mol Biol Evol,1985,2:150-174
    42 Hurst L D.The Ka/Ks ratio:diagnosing the form of sequence evolution.Trends Genet,2002,18:486-487
    43 McDonald J H,Kreitman M.Adaptive protein evolution at the Adh locus in Drosophila.Nature,1991,351:652-654
    44 Pollard K S,Salama S R,King B,et al.Forces shaping the fastest evolving regions in the human genome.PLoS Genet,2006,2:e168
    45 Pollard K S,Salama S R,Lambert N,et al.An RNA gene expressed during cortical development evolved rapidly in humans.Nature,2006,443:167-172
    46 Kim Y,Stephan W.Detecting a local signature of genetic hitchhiking along a recombining chromosome.Genetics,2002,160:765-777
    47 Sabeti P C,Reich D E,Higgins J M,et al.Detecting recent positive selection in the human genome from haplotype structure.Nature,2002,419:832-837
    48 Wang E T,Kodama G,Baldi P,et al.Global landscape of recent inferred Darwinian selection for Homo sapiens.Proc Natl Acad Sci USA,2006,103:135-140
    49 Lewontin R C,Krakauer J.Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms.Genetics,1973,74:175-195
    50 Shriver M D,Kennedy G C,Parra E J,et al.The genomic distribution of population substructure in four populations using 8,525 autosomal SNPs.Hum Genom,2004,1:274-286
    51 Jaillon O,Aury J M,Brunet F,et al.Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype.Nature,2004,431:946-957
    52 Hufford M B,Xu X,van Heerwaarden J,et al.Comparative population genomics of maize domestication and improvement.Nat Genet,2012,44:808-811
    53 Zhou Z,Jiang Y,Wang Z,et al.Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean.Nat Biotechnol,2015,33:408-414
    54 Li J,Li H,Jakobsson M,et al.Joint analysis of demography and selection in population genetics:where do we stand and where could we go?Mol Ecol,2012,21:28-44
    55 Bank C,Ewing G B,Ferrer-Admettla A,et al.Thinking too positive?Revisiting current methods of population genetic selection inference.Trends Genet,2014,30:540-546
    56 Zeng K,Shi S,Wu C I.Compound tests for the detection of hitchhiking under positive selection.Mol Biol Evol,2007,24:1898-1908
    57 Watterson G A.The homozygosity test of neutrality.Genetics,1978,88:405-417
    58 Grossman S R,Shlyakhter I,et al.A composite of multiple signals distinguishes causal variants in regions of positive selection.Science,2010,327:883-886
    59 Ewens W J.The sampling theory of selectively neutral alleles.Theor Popul Biol,1972,3:87-112
    60 Simonson T S,Yang Y,Huff C D,et al.Genetic evidence for high-altitude adaptation in Tibet.Science,2010,329:72-75
    61 Lin K,Li H,Schlötterer C,et al.Distinguishing positive selection from neutral evolution:boosting the performance of summary statistics.Genetics,2011,187:229-244
    62 Nielsen R,Williamson S,Kim Y,et al.Genomic scans for selective sweeps using SNP data.Genome Res,2005,15:1566-1575
    63 Pavlidis P,Hutter S,Stephan W.A population genomic approach to map recent positive selection in model species.Mol Ecol,2008,17:3585-3598
    64 Li H,Stephan W.Inferring the demographic history and rate of adaptive substitution in Drosophila.PLoS Genet,2006,2:1580-1589
    65 Li H,Durbin R.Inference of human population history from individual whole-genome sequences.Nature,2011,475:493-496
    66 Schiffels S,Durbin R.Inferring human population size and separation history from multiple genome sequences.Nat Genet,2014,46:919-925
    67 Liu X,Fu Y X.Exploring population size changes using SNP frequency spectra.Nat Genet,2015,47:555-559
    68 Chen H,Hey J,Chen K.Inferring very recent population growth rate from population-scale sequencing data:using a large-sample coalescent estimator.Mol Biol Evol,2015,32:2996-3011
    69 Kaplan N L,Hudson R R,Langley C H.The hitchhiking effect revisited.Genetics,1989,123:887-899
    70 Li H.A new test for detecting recent positive selection that is free from the confounding impacts of demography.Mol Biol Evol,2011,28:365-375
    71 Li H,Wiehe T.Coalescent tree imbalance and a simple test for selective sweeps based on microsatellite variation.PLoS Comput Biol,2013,9:e1003060
    72 Yang Z,Li J,Wiehe T,et al.Detecting recent positive selection with a single locus test bipartitioning the coalescent tree.Genetics,2018,208:791-805
    73 Wang M,Huang X,Li R,et al.Detecting recent positive selection with high accuracy and reliability by conditional coalescent tree.Mol Biol Evol,2014,31:3068-3080
    74 Disanto F,Schlizio A,Wiehe T.Yule-generated trees constrained by node imbalance.Math Biosci,2013,246:139-147
    75 Ronen R,Tesler G,Akbari A,et al.Predicting carriers of ongoing selective sweeps without knowledge of the favored allele.PLoS Genet,2015,11:e1005527
    76 Akbari A,Vitti J J,Iranmehr A,et al.Identifying the favored mutation in a positive selective sweep.Nat Meth,2018,15:279-282
    77 Hunter-Zinck H,Clark A G.Aberrant time to most recent common ancestor as a signature of natural selection.Mol Biol Evol,2015,32:2784-2797
    78 Schrider D R,Kern A D.Supervised machine learning for population genetics:a new paradigm.Trends Genet,2018,34:301-312
    79 Ghahramani Z.Unsupervised Learning.Advanced Lectures On Machine Learning.Heidelberg:Springer,2003.72-112
    80 Kotsiantis S B.Supervised machine learning:a review of classification techniques.Informatica(lithuanian Academy of Sciences),2007,31:249-268
    81 Zhu X J,Goldberg A B.Introduction To Semi-supervised Learning.San Rafael:Morgan&Claypool,2009.9-20
    82 Mnih V,Kavukcuoglu K,Silver D,et al.Human-level control through deep reinforcement learning.Nature,2015,518:529-533
    83 Novembre J,Johnson T,Bryc K,et al.Genes mirror geography within Europe.Nature,2008,456:98-101
    84 Silver D,Schrittwieser J,Simonyan K,et al.Mastering the game of Go without human knowledge.Nature,2017,550:354-359
    85 Zeiler M D,Fergus R.Visualizing and understanding convolutional networks.Lect Notes Comput Sc,2014,8689:818-833
    86 Pavlidis P,Jensen J D,Stephan W.Searching for footprints of positive selection in whole-genome SNP data from nonequilibrium populations.Genetics,2010,185:907-922
    87 Schrider D R,Kern A D.S/HIC:robust identification of soft and hard sweeps using machine learning.PLoS Genet,2016,12:e1005928
    88 Sheehan S,Song Y S.Deep learning for population genetic inference.PLoS Comput Biol,2016,12:e1004845
    89 Pudlo P,Marin J M,Estoup A,et al.Reliable ABC model choice via random forests.Bioinformatics,2016,32:859-866
    90 Lin K,Futschik A,Li H.A fast estimate for the population recombination rate based on regression.Genetics,2013,194:473-484
    91 Gao F,Ming C,Hu W,et al.New software for the fast estimation of population recombination rates(FastEPRR)in the genomic era.G3,2016,6:1563-1571
    92 Auton A,McVean G.Recombination rate estimation in the presence of hotspots.Genome Res,2007,17:1219-1227
    93 Enard W,Przeworski M,Fisher S E,et al.Molecular evolution of FOXP2,a gene involved in speech and language.Nature,2002,418:869-872
    94 Enard W,Gehre S,Hammerschmidt K,et al.A humanized version of Foxp2 affects cortico-basal ganglia circuits in mice.Cell,2009,137:961-971
    95 Krause J,Lalueza-Fox C,Orlando L,et al.The derived FOXP2 variant of modern humans was shared with Neandertals.Curr Biol,2007,17:1908-1912
    96 Maricic T,Günther V,Georgiev O,et al.A recent evolutionary change affects a regulatory element in the human FOXP2 gene.Mol Biol Evol,2013,30:844-852
    97 Coop G,Bullaughey K,Luca F,et al.The timing of selection at the human FOXP2 gene.Mol Biol Evol,2008,25:1257-1259
    98 Preuss T M.Human brain evolution:from gene discovery to phenotype discovery.Proc Natl Acad Sci USA,2012,109:10709-10716
    99 Atkinson E G,Audesse A J,Palacios J A,et al.No evidence for recent selection at FOXP2 among diverse human populations.Cell,2018,174:1424-1435.e15
    100 Xiang-Yu J,Yang Z,Tang K,et al.Revisiting the false positive rate in detecting recent positive selection.Quant Biol,2016,4:207-216
    101 Gao F,Li H P.Application of computer simulators in population genetics(in Chinese).Hereditas,2016,38:707-717[高峰,李海鹏.群体遗传学模拟软件应用现状.遗传,2016,38:707-717]
