Current status of bioinformatics platforms for metagenomic research (宏基因组研究的生物信息学平台现状)
  • English title: A review on the bioinformatics pipelines for metagenomic research
  • Authors: 叶丹丹 ; 樊萌萌 ; 关琼 ; 陈红菊 ; 马占山
  • Authors (romanized) and affiliations: YE Dan-Dan¹, FAN Meng-Meng¹, GUAN Qiong¹, CHEN Hong-Ju¹,², MA Zhan-Shan¹,* (1. Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China; 2. College of Mathematics, Honghe University, Mengzi, Yunnan 661100, China)
  • Keywords: Metagenomics; Amplicon sequencing; Whole genome sequencing; Bioinformatics pipeline; High throughput sequencing
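  • Journal code (Chinese title): DWXY
  • Journal (English title): Zoological Research
  • Institutions: Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences; College of Mathematics, Honghe University
  • Publication date: 2012-12-17 15:46
  • Publisher: Zoological Research (动物学研究)
  • Year: 2012
  • Issue: v.33
  • Funding: National Natural Science Foundation of China (Grant No. 61175071); Yunnan Province High-End Science and Technology Talent Program; Yunnan Province Overseas High-Level Talent Program; Yunnan Province Innovation Team for "Computational Evolution - Natural Evolution" Collaborative Research, Chinese Academy of Sciences
  • Language: Chinese
  • Page: DWXY201206007
  • Number of pages: 12
  • CN: 06
  • ISSN: 53-1040/Q
  • Classification number: 30-41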
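Abstract
The term metagenome, proposed by Handelsman et al (1998), refers broadly to the genomes of all species in the microbial community of a given environmental sample (for example, the guts of humans and animals, breast milk, soil, lakes, glaciers, and oceans). Metagenomic technology originated in environmental microbiology, and next-generation high-throughput sequencing has made its wide application possible. As in genomics, the current bottleneck in metagenomics is how to analyze efficiently the massive volume of data produced by high-throughput sequencing, so the associated bioinformatics methods and platforms are key to metagenomic research. This article introduces the main bioinformatics software and tools currently used in the field. Because the two major sequencing approaches used in metagenomic research, "whole genome sequencing" and "amplicon sequencing", yield substantially different data and call for different analysis methods, the corresponding software platforms are introduced separately.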
Metagenome, a term coined by Handelsman in 1998 as "the genomes of the total microbiota found in nature", refers to sequence data sampled directly from the environment (any habitat in which microbes live, such as the guts of humans and animals, milk, soil, lakes, glaciers, and oceans). Metagenomic technologies originated in environmental microbiology, and their wide application has been greatly facilitated by next-generation high-throughput sequencing. As in genomics, the bottleneck of metagenomic research is how to analyze the enormous volume of metagenomic sequence data effectively and efficiently with bioinformatics pipelines so as to obtain meaningful biological insights. In this article, we briefly review the state-of-the-art bioinformatics software tools in metagenomic research. Because the data obtained from whole genome sequencing (i.e., shotgun metagenomics) and from amplicon sequencing (i.e., 16S rRNA and gene-targeted metagenomics) differ substantially, so do the corresponding bioinformatics tools; accordingly, we review the computational pipelines for these two types of data separately.
References
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. 2000. Gene ontology: tool for the unification of biology[J]. Nat Genet, 25(1): 25-29.
Batzoglou S, Jaffe DB, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov JP, Lander ES. 2002. ARACHNE: a whole-genome shotgun assembler[J]. Genome Res, 12(1): 177-189.
Borodovsky M, McIninch J. 1993. GeneMark: parallel gene recognition for both DNA strands[J]. Computers & Chemistry, 17(2): 123-133.
Burge C, Karlin S. 1997. Prediction of complete gene structures in human genomic DNA[J]. J Mol Biol, 268(1): 78-94.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R. 2010. QIIME allows analysis of high-throughput community sequencing data[J]. Nat Methods, 7(5): 335-336.
Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. 2010. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants[J]. Nucleic Acids Res, 38(6): 1767-1771.
Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen AS, McGarrell DM, Marsh T, Garrity GM, Tiedje JM. 2009. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis[J]. Nucleic Acids Res, 37: 141-145.
Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research[J]. Bioinformatics, 21: 3674-3676.
Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. 1999. Improved microbial gene identification with GLIMMER[J]. Nucleic Acids Res, 27: 4636-4641.
Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA. 2003. DAVID: Database for Annotation, Visualization, and Integrated Discovery[J]. Genome Biol, 4(5): 3.
DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. 2006. Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB[J]. Appl Environ Microbiol, 72(7): 5069-5072.
Frigaard NU, Martinez A, Mincer TJ, DeLong EF. 2006. Proteorhodopsin lateral gene transfer between marine planktonic Bacteria and Archaea[J]. Nature, 439(7078): 847-850.
Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM. 1998. Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products[J]. Chem Biol, 5(10): R245-R249.
He JZ, Zhang LM, Shen JP, Zhu YG. 2008. Advances and perspectives of metagenomics[J]. Acta Scientiae Circumstantiae, 25(2): 231-234. [贺纪正, 张丽梅, 沈菊培, 朱永官. 2008. 宏基因组学(Metagenomics)的研究现状和发展趋势. 环境科学学报, 25(2): 231-234.]
Huang X, Yang SP. 2005. Generating a genome assembly with PCAP[M]//Current Protocols in Bioinformatics. New York: John Wiley & Sons.
Miller JR, Koren S, Sutton G. 2010. Assembly algorithms for next-generation sequencing data[J]. Genomics, 95(6): 315-327.
Kanehisa M, Goto S. 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes[J]. Nucleic Acids Res, 28(1): 27-30.
Korf I, Flicek P, Duan D, Brent MR. 2001. Integrating genomic homology into gene structure prediction[J]. Bioinformatics, 17(1): 140-148.
Krogh A, Mian IS, Haussler D. 1994. A hidden Markov model that finds genes in E. coli DNA[J]. Nucleic Acids Res, 22(22): 4768-4778.
Li H, He JJ, Zhang Y, Xu H, Chen GX. 2008a. Application of metagenomic technique in the exploring of uncultured environmental microbial gene resource[J]. Acta Ecol Sin, 28(4): 1762-1762, 1773. [李慧, 何晶晶, 张颖, 徐慧, 陈冠雄. 2008. 宏基因组技术在开发未培养环境微生物基因资源中的应用. 生态学报, 28(4): 1762-1762, 1773.]
Li R, Li Y, Kristiansen K, Wang J. 2008b. SOAP: short oligonucleotide alignment program[J]. Bioinformatics, 24(5): 713-714.
Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, Förster W, Brettske I, Gerber S, Ginhart AW, Gross O, Grumann S, Hermann S, Jost R, König A, Liss T, Lüßmann R, May M, Nonhoff B, Reichel R, Strehlow R, Stamatakis A, Stuckmann S, Vilbig A, Lenke M, Ludwig T, Bode A, Schleifer KH. 2004. ARB: a software environment for sequence data[J]. Nucleic Acids Res, 32(4): 1363-1371.
Lukashin AV, Borodovsky M. 1998. GeneMark.hmm: new solutions for gene finding[J]. Nucleic Acids Res, 26(4): 1107-1115.
Ma ZS. 2012. A note on extending Taylor's power law for characterizing human microbial communities: Inspiration from comparative studies on the distribution patterns of insects and galaxies, and as a case study for medical ecology. [Online] Available: arXiv.org/abs/1205.3504 (2012/5/15).
Ma ZS, Geng JW, Abdo Z, Forney LJ. 2012. A Bird's Eye View of Microbial Community Dynamics//Microbial Ecology Theory: Current Perspectives. Norwich, UK: Horizon Scientific Press: 57-70.
Maidak BL, Olsen GJ, Larsen N, Overbeek R, McCaughey, Woese CR. 1997. The RDP (Ribosomal Database Project)[J]. Nucleic Acids Res, 25(1): 109-110.
Melsted P, Pritchard JK. 2011. Efficient counting of k-mers in DNA sequence using a bloom filter[J]. BMC Bioinformatics, 12(1): 333.
Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA, Anson EL, Bolanos RA, Chou HH, Jordan CM, Halpern AL, Lonardi S, Beasley EM, Brandon RC, Chen L, Dunn PJ, Lai Z, Liang Y, Nusskern DR, Zhan M, Zhang Q, Zheng X, Rubin GM, Adams MD, Venter JC. 2000. A whole-genome assembly of Drosophila[J]. Science, 287(5461): 2196-2204.
Pevzner PA, Tang HX, Waterman MS. 2001. An Eulerian path approach to DNA fragment assembly[J]. Proc Natl Acad Sci USA, 98(17): 9748-9753.
Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glockner FO. 2007. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB[J]. Nucleic Acids Res, 35(21): 7188-7196.
Salamov AA, Solovyev VV. 2000. Ab initio gene finding in Drosophila genomic DNA[J]. Genome Res, 10(4): 516-522.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. 2009. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities[J]. Appl Environ Microbiol, 75(23): 7537-7541.
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. 2009. ABySS: A parallel assembler for short read sequence data[J]. Genome Res, 19(6): 1117-1123.
Sun HX, Wang XJ. 2009. The development and future perspectives of DNA sequencing technology[J]. e-Science, 2(3): 19-29. [孙海汐, 王秀杰. 2009. DNA测序技术发展及其展望. e-Science技术, 2(3): 19-29.]
Sun RY, Li QF, Niu CJ, Lou AR. 2002. Basic Ecology[M]. Beijing: Higher Education Press: 112-144. [孙儒泳, 李庆芬, 牛翠娟, 娄安如. 2002. 基础生态学. 北京: 高等教育出版社: 112-114.]
Wang X. 2011. The Generation Algorithm Based on de Bruijn Graph DNA Contig[D]. Harbin: Harbin Institute of Technology. [王旭. 2011. 基于de Brujin图的DNAContig生成算法. 哈尔滨: 哈尔滨工业大学.]
Warren RL, Sutton GG, Jones SJ, Holt RA. 2007. Assembling millions of short DNA sequences using SSAKE[J]. Bioinformatics, 23(4): 500-501.
Wu QF. 2003. An introduction of several programs used in genomic analysis[J]. Hereditas, 25(6): 708-712. [吴清发. 2003. 基因组学研究中一些常用软件的概述. 遗传, 25(6): 708-712.]
Ye CX, Ma ZS, Cannon CH, Pop M, Yu DW. 2011a. SparseAssembler: de novo Assembly with the Sparse de Bruijn Graph. [Online] Available: arXiv.org/abs/1106.2603 (2011/6/14).
Ye CX, Cannon CH, Ma ZS, Yu DW, Pop M. 2011b. SparseAssembler2: Sparse k-mer Graph for Memory Efficient Genome Assembly. [Online] Available: arXiv.org/abs/1108.3556 (2011/8/17).
Ye CX, Ma ZS, Cannon CH, Pop M, Yu DW. 2012. Exploiting sparseness in de novo genome assembly[J]. BMC Bioinformatics, 13(S1): S1.
Ye J, Fang L, Zheng HK, Zhang Y, Chen J, Zhang ZJ, Wang J, Li ST, Li RQ, Bolund L, Wang J. 2006. WEGO: a web tool for plotting GO annotations[J]. Nucleic Acids Res, 34(S2): 293-297.
Zdobnov EM, Apweiler R. 2001. InterProScan - an integration platform for the signature-recognition methods in InterPro[J]. Bioinformatics, 17(9): 847-848.
Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs[J]. Genome Res, 18(5): 821-829.
Zhang H, Cui HZ. 2010. Metagenomics and its research progress[J]. China Animal Husbandry & Veterinary Medicine, 37(3): 87-90. [张辉, 崔焕忠. 2010. 宏基因组学及其研究进展. 中国畜牧兽医, 37(3): 87-90.]
Zhang EM, Hai R, Yu DZ. 2009. Research progress of gene prediction methods[J]. Chin J Vector Biol & Control, 20(3): 271-273. [张恩民, 海荣, 俞东征. 2009. 基因预测方法的研究进展. 中国媒介生物学及控制杂志, 20(3): 271-273.]
Zhou DQ. 1993. Laboratory Experiments in Microbiology[M]. Beijing: Higher Education Press, 396-398. [周德庆. 1993. 微生物学教程. 北京: 高等教育出版社, 396-398.]
Zhu HQ, Hu GQ, Yang YF, Wang J, She ZS. 2007. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes[J]. BMC Bioinformatics, 8(1): 97.
