Abstract
In phylogenetics, tree-construction algorithms are the basis for studying evolutionary relationships. This paper examines and verifies the feasibility of building phylogenetic trees by combining the k-mer-based dissimilarity measure d2S with the neighbor-joining (NJ) method. Trees were constructed for two evolutionary scenarios, 12 16S rRNA sequences and 36 bacteriophage sequences, with k = 4, 6, 8, 10. The three classes of 16S rRNA separated clearly into archaea, bacteria and eukaryotes, a result largely consistent with the Woese catalogue. The four types of bacteriophage were likewise separated according to their known evolutionary relationships, and the grouping was only slightly affected by the parameter k. These results show that d2S combined with NJ classifies both data sets well, and that the combination offers a new algorithm and line of thinking for inferring phylogenetic relationships.
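The abstract does not state the d2S formula, so the following is only a minimal sketch of how a k-mer-based d2S dissimilarity matrix might be computed before it is handed to an NJ routine. It assumes the form of d2S commonly given in the alignment-free literature, with k-mer counts centered by their expectation under a zero-order (i.i.d. nucleotide) background model estimated from each sequence. The function names `kmer_counts`, `centered_counts`, `d2s` and `d2s_matrix` are hypothetical, and the background model and the handling of the degenerate case are assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping words with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def centered_counts(seq, k):
    """Return count minus expected count for every possible k-mer.

    Assumption: the expectation uses an i.i.d. background whose
    nucleotide frequencies are estimated from the sequence itself.
    """
    seq = seq.upper()
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid k-mers for this k")
    bases = Counter(c for c in seq if c in ALPHABET)
    n_bases = sum(bases.values())
    freq = {c: bases[c] / n_bases for c in ALPHABET}
    centered = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        prob = 1.0
        for c in word:
            prob *= freq[c]
        centered[word] = counts[word] - total * prob
    return centered


def d2s(seq_x, seq_y, k):
    """d2S dissimilarity in [0, 1]; 0 means identical centered profiles."""
    x = centered_counts(seq_x, k)
    y = centered_counts(seq_y, k)
    num = norm_x = norm_y = 0.0
    for word in x:
        denom = sqrt(x[word] ** 2 + y[word] ** 2)
        if denom == 0.0:
            continue  # this word carries no signal in either sequence
        num += x[word] * y[word] / denom
        norm_x += x[word] ** 2 / denom
        norm_y += y[word] ** 2 / denom
    if norm_x == 0.0 or norm_y == 0.0:
        # Degenerate case: treated as zero dissimilarity (an assumption).
        return 0.0
    return 0.5 * (1.0 - num / (sqrt(norm_x) * sqrt(norm_y)))


def d2s_matrix(seqs, k):
    """Symmetric pairwise d2S matrix for a list of sequences."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = d2s(seqs[i], seqs[j], k)
    return matrix
```

Because every one of the 4^k possible words is enumerated, this sketch becomes slow at k = 10 and recomputes the centered counts for each pair; a practical version would cache them per sequence. The resulting matrix is the kind of input a neighbor-joining implementation takes.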