Abstract
This paper reviews the basic bioinformatics methods and frontier technologies for detecting genomic structural variation. It explains in detail the principles and characteristics of four detection methods based on next-generation (second-generation) sequencing: the read-pair, read-depth, split-read, and sequence-assembly methods. It then analyzes the characteristics and development trends of applying next-generation sequencing to structural variation detection. Finally, it introduces the application of newer technologies, including third-generation sequencing, linked-reads, and optical mapping, to structural variation detection, and discusses the features and advantages of detection methods that incorporate these technologies.