Genome assembly based on the third-generation sequencing technology and its application in tobacco
  • Original title: 基于第三代测序技术的基因组组装方法及其在烟草中的应用
  • Authors: LU Peng (卢鹏); JIN Jingjing (金静静); LI Zefeng (李泽锋); CAO Peijian (曹培健); FAN Kai (范楷); XU Yalong (许亚龙)
  • Affiliation: Zhengzhou Tobacco Research Institute of CNTC (中国烟草总公司郑州烟草研究院)
  • Keywords: Third-generation sequencing; Genome; Assembly; Tobacco
  • Journal: Tobacco Science & Technology (烟草科技; journal code YCKJ)
  • Publication date: 2018-02-15
  • Publisher: Tobacco Science & Technology (烟草科技)
  • Year: 2018
  • Volume/issue: v.51; No.371
  • Funding: President's Science and Technology Development Fund of the Zhengzhou Tobacco Research Institute of CNTC, project "Methods for filling missing genome sequences using third-generation sequencing data" (902016CA0170)
  • Language: Chinese
  • Article ID: YCKJ201802013
  • Page count: 8
  • Issue number: 02
  • CN: 41-1137/TS
  • Pages: 93-100
Abstract
Third-generation sequencing technology is widely used in genome research because it produces longer reads. This paper reviews the development of sequencing technology and summarizes the advantages and disadvantages of the three generations of sequencing technologies. It then surveys genome assembly methods for third-generation sequencing data, covering data pre-processing, sequence assembly, post-assembly sequence repair, and the anchoring of sequences to chromosomes. Finally, it describes how sequencing technology and genome assembly methods have been applied in tobacco genome research.
