Feature Analysis of Biological Sequences Based on Time Series Theory and Methods
Abstract
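The main objects of study in bioinformatics are DNA, RNA, and protein molecules, because these biological macromolecules contain all the information about heredity and species evolution. Now that DNA and proteins are being sequenced, obtaining more biological information from these sequences is a challenging problem. As the number of bases and amino acids stored in gene databases grows exponentially, applying new theoretical methods to the study of DNA and protein sequences becomes increasingly important. Many biologists, physicists, mathematicians, and computer scientists have been drawn to this field.
     After introducing the research background of bioinformatics, this thesis first presents the time series methods used to study the properties of biological sequences. It describes in detail the short-memory ARMA model and the long-memory ARFIMA model used throughout, laying the theoretical groundwork for studying DNA and protein sequences.
     Chaos Game Representation (CGR) is an iterative mapping technique that maps each unit of a sequence, such as a nucleotide in a DNA sequence or an amino acid in a protein sequence, into a continuous coordinate space. Based on CGR coordinates, we propose a method for converting a DNA sequence into a time series (the CGR-walk sequence) and analyze it with the long-memory ARFIMA(p, d, q) model. We analyzed the CGR-walk sequences of ten DNA sequences and found that all of them are fitted by long-memory ARFIMA(p, d, q) models at a high level of significance. As a classical time series model with well-developed algorithms, the ARFIMA model can help uncover unknown properties of DNA sequences.
     Because the success rate of selecting a suitable ARFIMA model is low and maximum likelihood estimation of its parameters is computationally expensive, approximating a long-memory model by a short-memory model is a problem of interest to researchers. We consider approximating a long-memory ARFIMA(p, d, q) process by a short-memory ARMA(1, 1) process and prove a mean square error criterion for this adaptive method. Applying it to the ten CGR-walk sequences of DNA confirms the effectiveness of the approximation and yields an algorithmically simpler approximate model for long-memory DNA sequences.
     Building on this, we also consider approximating the ARFIMA(0, d, 0) model by an ARMA(2, 2) model. Comparing the efficiency loss ratios of the ARMA(2, 2) and ARMA(1, 1) models shows that the ARMA(2, 2) approximation is better than the ARMA(1, 1) approximation. To verify this conclusion, we analyze CGR-walk sequences that follow an ARFIMA(0, d, 0) model and compare how well the two models approximate it; the residual standard deviations confirm that the ARMA(2, 2) approximation is better.
     We modify the Kalman filter recursions to handle missing data in long-memory ARFIMA models, and verify the effectiveness of the method on the CGR-walk sequence of a DNA sequence.
     Based on the CGR-walk model established for DNA sequences, we construct an analogous CGR-walk model for linked protein sequences based on the detailed HP model and analyze it with the long-memory ARFIMA(p, d, q) model. We find that the CGR-walk sequences of linked protein sequences from 12 complete bacterial genomes are fitted significantly by long-memory ARFIMA(p, d, q) models.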
DNA, RNA and protein sequences are of fundamental importance in understanding living organisms, since all the information about heredity and species evolution is contained in these macromolecules. Once DNA and proteins have been sequenced, how to extract more biological information from these sequences is a challenging problem. The number of nucleotides and amino acids stored in GenBank has been growing exponentially, so it has become important to develop new theoretical methods for analyzing DNA and protein sequences. Many biologists, physicists, mathematicians and computer specialists are attracted to this research field.
     After introducing the background of bioinformatics, this thesis first introduces the time series methods applied to the study of biological sequence characteristics. We describe the short-memory ARMA model and the long-memory ARFIMA model, which are applied to biological sequence analysis throughout the thesis.
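     For reference, the two model classes named here are conventionally written as below. This is standard textbook notation and an assumption about the thesis's conventions, not a quotation from it: B is the backshift operator, ε_t is white noise, and the sign convention for θ(B) varies between authors.

```latex
% ARMA(p, q): short memory, autocorrelations decay exponentially
\phi(B)\,X_t = \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^i,
\quad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^j

% ARFIMA(p, d, q): the fractional difference (1 - B)^d is added
\phi(B)\,(1 - B)^d X_t = \theta(B)\,\varepsilon_t,
\qquad
(1 - B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k

% For 0 < d < 1/2 the process is stationary with long memory:
\rho(k) \sim C\,k^{2d-1} \quad (k \to \infty)
```

     The hyperbolic decay of ρ(k) for 0 < d < 1/2, as opposed to the exponential decay of an ARMA process, is what "long memory" refers to.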
     Chaos Game Representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, and assigns each unit a position in a continuous coordinate space. A CGR-walk model for DNA sequences is proposed based on CGR coordinates: the CGR coordinates are converted into a time series (the CGR-walk sequence), and a long-memory ARFIMA(p, d, q) model is introduced for DNA sequence analysis. This model is fitted to real CGR-walk sequence data from ten genomic sequences. Remarkable long-range correlations are uncovered in the data, and all ten sequences are fitted at a high level of significance by ARFIMA(p, d, q) models. As a classical time series model with well-developed algorithms, the ARFIMA model can help reveal unknown characteristics of DNA sequences.
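     The iterative CGR mapping itself can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions: the corner assignment (A, C, G, T on the corners of the unit square) and the starting point at the centre follow a commonly used convention and are not taken from this abstract, and the function name `cgr_coordinates` is hypothetical. The abstract does not say how CGR coordinates are turned into the CGR-walk series, so that step is deliberately left out.

```python
# Assumed corner assignment on the unit square; the abstract does not specify one.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_coordinates(sequence, start=(0.5, 0.5)):
    """Return the CGR point for every nucleotide in `sequence`.

    Each point is the midpoint between the previous point and the
    corner that belongs to the current nucleotide.
    """
    x, y = start
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        corner_x, corner_y = CORNERS[base]
        x, y = (x + corner_x) / 2.0, (y + corner_y) / 2.0
        points.append((x, y))
    return points


points = cgr_coordinates("ACGT")
print(points)
```

     Because every step halves the distance to a corner, the position of a point encodes the recent history of the sequence, which is what makes the coordinates usable as a numerical signal.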
     Because the success rate of selecting the right ARFIMA model is low and the maximum likelihood calculations for parameter estimation are complicated, approximating an ARFIMA model by a short-memory process for prediction is a topic of interest in the literature. We analyze the approximation of a general long-memory ARFIMA(p, d, q) process by a short-memory ARMA(1, 1) process. To validate this approximation, a mean square error forecast criterion is proved. The performance of the ARMA(1, 1) approximation to an ARFIMA model is illustrated with an application to the CGR-walk sequences of ten DNA sequences. This yields an approximating model with a simpler algorithm.
     We also study the approximation of a long-memory fractionally differenced ARFIMA(0, d, 0) model by a short-memory ARMA(2, 2) process. Comparing the efficiency loss ratios of the ARMA(2, 2) and ARMA(1, 1) models shows that ARMA(2, 2) approximates the ARFIMA(0, d, 0) model better than ARMA(1, 1). To validate this conclusion, the two approximating models are applied to CGR-walk sequences that follow an ARFIMA(0, d, 0) model. According to the prediction error standard deviation, the ARMA(2, 2) approximation is again better than the ARMA(1, 1) approximation.
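     To see why a low-order ARMA model can only approximate ARFIMA(0, d, 0), it helps to look at the expansion coefficients of the fractional difference operator (1 - B)^d, which decay polynomially rather than exponentially. The sketch below computes them with the standard recursion π_0 = 1, π_k = π_{k-1}(k - 1 - d)/k. The function name `frac_diff_weights` and the value d = 0.4 are illustrative assumptions, not values from the thesis.

```python
def frac_diff_weights(d, n_terms):
    """Coefficients pi_k of (1 - B)^d = sum_k pi_k B^k for k = 0 .. n_terms - 1."""
    weights = [1.0]
    for k in range(1, n_terms):
        # Standard recursion for the binomial series of (1 - B)^d.
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


# d = 0.4 is an arbitrary illustrative value inside the long-memory range (0, 0.5).
weights = frac_diff_weights(0.4, 5)
print(weights)
```

     An ARMA(1, 1) or ARMA(2, 2) model has only a handful of parameters to imitate this slowly decaying infinite sequence, which is why comparing the loss of efficiency between the two approximations is meaningful.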
     By modifying the Kalman filter recursive equations, the proposed method allows efficient estimation of a long-memory ARFIMA process with missing values. To illustrate its application and effectiveness, we analyze the CGR-walk sequence of a DNA sequence; the results confirm that the proposed approach is effective.
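     The abstract does not give the modified recursions, so the sketch below only illustrates the generic idea that state-space methods use for missing observations: run the prediction step at every time point and skip the update step where the observation is missing. It uses a scalar AR(1)-plus-noise model rather than an ARFIMA model, and the function name, parameters, and data are all hypothetical.

```python
def kalman_ar1_with_gaps(observations, phi, state_var, obs_var):
    """Kalman filter for x_t = phi * x_{t-1} + w_t, y_t = x_t + v_t.

    `observations` may contain None to mark a missing y_t.
    Returns a list of (filtered mean, filtered variance) pairs.
    Requires |phi| < 1 so that the stationary prior variance exists.
    """
    mean = 0.0
    var = state_var / (1.0 - phi * phi)  # stationary prior variance
    filtered = []
    for y in observations:
        # Prediction step: always performed.
        mean = phi * mean
        var = phi * phi * var + state_var
        # Update step: only when an observation is available.
        if y is not None:
            gain = var / (var + obs_var)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        filtered.append((mean, var))
    return filtered


# Illustrative data only; None marks the missing observation.
result = kalman_ar1_with_gaps([1.0, None, 0.5], phi=0.5, state_var=1.0, obs_var=1.0)
print(result)
```

     At the missing time point the filtered variance grows, reflecting the lost information, and shrinks again once the next observation arrives.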
     Based on the CGR-walk model of DNA sequences, a new CGR-walk model of linked protein sequences from complete genomes is proposed based on the detailed HP model. A long-memory ARFIMA(p, d, q) model is introduced into protein sequence analysis and fitted to real CGR-walk sequence data of twelve linked protein sequences from twelve complete bacterial genomes. Remarkable long-range correlations are uncovered in the data, and the CGR-walk sequences are fitted significantly by ARFIMA(p, d, q) models.
