Analysis of DNA Sequence Coding Properties Based on Error-Correction Coding Theory
Abstract
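Research in modern biology is no longer confined to a single discipline; it is interdisciplinary and integrative. Because biological systems are inherently complex, many analytical theories and research methods must be brought to bear on the field. The rapid growth of gene data produced by genetic engineering has stimulated interest in analyzing these data with new methods, techniques, and tools. Because information transfer and coding in biological systems resemble information transmission and coding in modern communication systems, the error-correction coding theory of modern communication engineering has been applied to the study of genetic sequences and to the design of biological test systems, with some encouraging progress.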
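     This work studies methods for analyzing information in biological systems on the basis of the error-correction coding theory of communication engineering and analyzes the sequences of several organisms, with the aim of finding new approaches and methods for applying communication error-correction coding theory in biological research.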
     The main work is as follows:
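     1. Because triplet codons play a central role in the expression of genetic information, the triplet codon (rather than the single base) is taken as the basic unit of genetic information, and the interaction between adjacent codons is taken into account. Drawing on the design and analysis methods of block-code encoding models in communication coding theory, a (6,3) block code model was selected through experiments. Twelve prokaryotes and nine eukaryotes with different GC contents were chosen as analysis objects; their DNA sequences were analyzed with the (6,3) block code model, and the code distance was used as a characteristic parameter and compared with the biological features of the objects. The results show that, for both prokaryotic and eukaryotic objects, the average code distance changes markedly near the start codon and near the stop codon, and it also changes markedly in the Shine-Dalgarno (SD) region of prokaryotes.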
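The encoding step in item 1 can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: bases are mapped to the integers mod 4 (A=0, C=1, G=2, T=3), each codon is encoded into a 6-symbol codeword with a hypothetical systematic generator matrix G = [I3 | P], and the "code distance" is taken to be the Hamming distance between the codewords of adjacent codons. The matrix `P` and all function names are illustrative, not the thesis's actual model.

```python
from typing import List

# Assumed base-to-symbol mapping over Z4 (not specified in the abstract).
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}

# Hypothetical 3x3 parity part P; the generator is G = [I3 | P] (systematic), mod 4.
P = [
    [1, 2, 3],
    [3, 1, 2],
    [2, 3, 1],
]


def encode_codon(codon: str) -> List[int]:
    """Encode one codon into a 6-symbol codeword: 3 information + 3 parity symbols."""
    u = [BASE_TO_INT[b] for b in codon.upper()]
    parity = [sum(u[i] * P[i][j] for i in range(3)) % 4 for j in range(3)]
    return u + parity


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions in which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def adjacent_codon_distances(seq: str) -> List[int]:
    """Code distance between the codewords of each pair of adjacent codons (frame 0)."""
    usable = len(seq) - len(seq) % 3  # drop an incomplete final codon
    words = [encode_codon(seq[i:i + 3]) for i in range(0, usable, 3)]
    return [hamming(words[i], words[i + 1]) for i in range(len(words) - 1)]


print(encode_codon("ATG"))                 # [0, 3, 2, 1, 1, 0]
print(adjacent_codon_distances("ATGAAA"))  # [4]
```

Because the code is systematic, two different codons always yield codewords that differ in at least one information position, so a distance of 0 occurs only between identical codons.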
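     2. In error-correction coding, the convolutional code is a channel coding scheme with good performance; theory and practice have both shown that convolutional codes perform at least as well as block codes, so a better convolutional coding model should be available for analyzing the coding properties of DNA sequences. Building on the method and results of the block code model, drawing on the design and analysis methods of convolutional-code encoding models in communication coding theory, and based on codon degeneracy, codon context correlation, and the dominance of short-range base correlation, a (6,3,1) convolutional code analysis model was designed with the triplet codon as the basic information unit. The DNA sequences of the selected 12 prokaryotes and 9 eukaryotes were analyzed with the (6,3,1) convolutional code model. The results show that, for both prokaryotic and eukaryotic objects, the average code distance changes markedly near the start codon and near the stop codon, and it also changes markedly in the SD region of prokaryotes. In addition, the average code distance curves of all objects show a clear period-3 property in the coding region. Based on the observation that the average code distance curves of objects with different GC contents separate from one another (especially for prokaryotes), we defined a new parameter, the characteristic average code distance (CACD). CACD is correlated with GC content and is approximately proportional to the GC content of prokaryotes. This gives the coding parameter a biological meaning and indicates that the convolutional code model has potential for further research and application in bioinformatics.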
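The (6,3,1) model in item 2 adds one codon of memory, so each output block depends on the current codon and the one before it. The sketch below assumes a textbook convolutional encoder over Z4, v = u_t·G0 + u_(t-1)·G1 (mod 4); the matrices `G0` and `G1`, the base mapping, and the way the per-base curve is formed (distance between the output blocks of the codon starting at base i and the codon starting at base i+3, recomputed at every base offset) are all hypothetical choices, not the thesis's definitions. Sliding base by base rather than codon by codon is one plausible way a curve indexed by base position could expose a period-3 pattern.

```python
from typing import List

# Assumed base-to-symbol mapping over Z4 (not specified in the abstract).
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}

# Hypothetical 3x6 generator matrices over Z4 (memory m = 1).
G0 = [  # applied to the current codon
    [1, 0, 0, 1, 2, 3],
    [0, 1, 0, 3, 1, 2],
    [0, 0, 1, 2, 3, 1],
]
G1 = [  # applied to the previous codon
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
]


def conv_block(cur: str, prev: str) -> List[int]:
    """One (6,3,1) output block: v = cur*G0 + prev*G1 (mod 4)."""
    u = [BASE_TO_INT[b] for b in cur]
    w = [BASE_TO_INT[b] for b in prev]
    return [
        (sum(u[i] * G0[i][j] for i in range(3))
         + sum(w[i] * G1[i][j] for i in range(3))) % 4
        for j in range(6)
    ]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions in which two equal-length blocks differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_curve(seq: str) -> List[int]:
    """Per-base code-distance curve (hypothetical definition).

    At each base offset i that has a previous codon (i-3..i-1) and a following
    codon (i+3..i+5), compare the output block of the codon at i with that of
    the codon at i+3; each block uses the codon just before it as memory.
    """
    s = seq.upper()
    curve = []
    for i in range(3, len(s) - 5):
        v_now = conv_block(s[i:i + 3], s[i - 3:i])
        v_next = conv_block(s[i + 3:i + 6], s[i:i + 3])
        curve.append(hamming(v_now, v_next))
    return curve


print(conv_block("ATG", "AAA"))  # [0, 3, 2, 1, 1, 0]
print(distance_curve("A" * 12))  # [0, 0, 0, 0]
```

Averaging such curves over many genes, position by position, would give an "average code distance" curve of the kind the abstract describes.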
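     Because these analysis models are designed on the basis of general properties of biological genetic information, they do not depend on the analysis object and can be applied to many kinds of objects without adjusting the model.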
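     3. Focusing on the convolutional-code-based analysis models, the model parameters were compared according to the dominance of short-range base correlation. Since conventional analysis methods usually take the single base as the basic information unit, a (2,1,1) convolutional code model was selected for analysis; to compare an intermediate (transition) case, a (3,2,1) convolutional code model was also selected. By comparing parameters such as the encoded output length and the code length used for code distance calculation, the (6,3,1), (3,2,1), and (2,1,1) models were provisionally identified as the better-performing analysis models.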
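The models compared in item 3 differ in the input unit (k symbols per step), the output per step (n symbols), and therefore the encoded output length. A generic (n, k, m) convolutional encoder makes the comparison concrete. This is a minimal sketch under assumptions: Z4 arithmetic, a zero initial state, no tail flushing, and hypothetical generator matrices for each model. With these choices a sequence of L bases yields n * (L // k) output symbols, so (2,1,1) and (6,3,1) double the length while (3,2,1) multiplies it by 1.5.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed base-to-symbol mapping over Z4 (not specified in the abstract).
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}

Gens = List[List[List[int]]]  # gens[lag] is a k x n matrix for the block `lag` steps back


def conv_encode(symbols: Sequence[int], gens: Gens, k: int, n: int) -> List[int]:
    """Generic (n, k, m) convolutional encoder over Z4, with m = len(gens) - 1.

    Zero initial state, no tail flushing; an incomplete final block is dropped,
    so the output has n * (len(symbols) // k) symbols.
    """
    m = len(gens) - 1
    blocks = [list(symbols[i:i + k]) for i in range(0, len(symbols) - k + 1, k)]
    out: List[int] = []
    for t in range(len(blocks)):
        for j in range(n):
            acc = 0
            for lag in range(min(m, t) + 1):  # earlier blocks are zero (initial state)
                u = blocks[t - lag]
                acc += sum(u[i] * gens[lag][i][j] for i in range(k))
            out.append(acc % 4)
    return out


# Hypothetical generators for the three models: name -> (k, n, gens).
MODELS: Dict[str, Tuple[int, int, Gens]] = {
    "(2,1,1)": (1, 2, [[[1, 1]], [[0, 1]]]),
    "(3,2,1)": (2, 3, [[[1, 0, 1], [0, 1, 1]], [[0, 0, 1], [0, 0, 0]]]),
    "(6,3,1)": (3, 6, [
        [[1, 0, 0, 1, 2, 3], [0, 1, 0, 3, 1, 2], [0, 0, 1, 2, 3, 1]],
        [[0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1]],
    ]),
}

seq = [BASE_TO_INT[b] for b in "ATGGCTAAATAG"]  # 12 bases
for name, (k, n, gens) in MODELS.items():
    print(name, len(conv_encode(seq, gens, k, n)))  # prints 24, 18 and 24 in turn
```

Flushing the encoder with m zero blocks would add n * m symbols per model; whether the thesis does so is not stated.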
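     4. The error-correction-coding-based analysis models were applied to sequence similarity analysis. The designed (6,3,1), (3,2,1), and (2,1,1) convolutional code models were used to analyze the similarity/dissimilarity of the coding sequences of the first exon of the β-globin gene of 11 species (human, goat, opossum, chicken, lemur, mouse, rat, rabbit, cattle, gorilla, and chimpanzee). An 8-component vector was built from the normalized leading eigenvalues of the L/L and M/M matrices, and the Euclidean distance between the end points of the vectors was computed for each pair of species. The results reflect the strong similarity among the three primates (human, chimpanzee, gorilla) arising from their evolutionary relationship, and the weak similarity of the opossum (the species most distant from the remaining mammals) and the chicken (the only non-mammal among the objects) to the others. The results indicate that the proposed method can capture important information in the analyzed DNA sequences.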
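The similarity step in item 4 uses matrix invariants of the kind introduced by Randić and co-workers: for a curve of N points, the M/M matrix holds the Euclidean distance between points i and j divided by the number of edges between them, and the L/L matrix divides the same distance by the path length along the curve. The abstract does not say how the curves are built from the code models or how the eight components are grouped, so this sketch takes a list of curves (each an N x d array of points, a hypothetical input) and concatenates the (L/L, M/M) leading eigenvalues of each, divided by N (an assumed normalization). Four curves give an 8-component vector.

```python
from typing import Sequence, Tuple

import numpy as np


def ll_mm_leading(points) -> Tuple[float, float]:
    """Normalized leading eigenvalues (lambda_1 / N) of the L/L and M/M matrices.

    points: (N, d) array of curve vertices in order, N >= 2.
    M/M[i, j] = Euclidean distance / number of edges |i - j| between vertices.
    L/L[i, j] = Euclidean distance / path length along the curve between them.
    Diagonals are zero. Dividing by N is an assumed normalization.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    euclid = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    arc = np.concatenate(([0.0], np.cumsum(steps)))  # arc length from vertex 0
    path = np.abs(arc[:, None] - arc[None, :])
    idx = np.arange(n)
    hops = np.abs(idx[:, None] - idx[None, :]).astype(float)

    off = ~np.eye(n, dtype=bool)
    mm = np.zeros((n, n))
    ll = np.zeros((n, n))
    mm[off] = euclid[off] / hops[off]
    nonzero = off & (path > 0)  # skip pairs joined only by zero-length steps
    ll[nonzero] = euclid[nonzero] / path[nonzero]

    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return (float(np.linalg.eigvalsh(ll)[-1] / n),
            float(np.linalg.eigvalsh(mm)[-1] / n))


def feature_vector(curves: Sequence) -> np.ndarray:
    """Concatenate (L/L, M/M) values over several curves; 4 curves -> 8 components."""
    return np.array([v for c in curves for v in ll_mm_leading(c)])


def euclidean(u: np.ndarray, v: np.ndarray) -> float:
    """Distance between the end points of two feature vectors."""
    return float(np.linalg.norm(u - v))


line = [[0, 0], [1, 0], [2, 0]]
print(ll_mm_leading(line))  # both values are 2/3 for three collinear unit-spaced points
```

For each pair of species, a smaller `euclidean` value between their feature vectors indicates greater similarity.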
Research in modern biology is interdisciplinary rather than confined to a single subject. The complexity of biological systems requires the combination of various theories and methods. The rapidly increasing data obtained from genetic engineering have aroused scholars' interest in studying biological systems as information transmission systems. Based on the similarity of information transmission and coding between biological systems and modern communication engineering, the error-correction coding theory of modern communication engineering has been employed to study genetic sequences and to design biological test systems, resulting in some notable progress.
     In our research, we studied information analysis methods for biosystems based on the error-correction coding theory of communication engineering and analyzed the sequences of several organisms. This work aims to explore new approaches to applying communication coding theory in the biological field.
     The main work is as follows:
     1. A codon, rather than a nucleotide, is treated as the basic unit of genetic information because of the importance of codons in the expression of genetic information. Considering the interaction between adjacent codons and using the design method of block-code encoding models in communication coding theory as a reference, we designed a (6,3) block code model for analysis. The DNA sequences of twelve prokaryotic organisms and nine eukaryotic organisms with different GC contents were selected and analyzed with the (6,3) block code model. Code distance was used as a characteristic parameter for detecting the corresponding biological features. We observe that the average code distance fluctuates markedly near the initiation codon and the termination codon. Remarkable changes also appear in the SD region of prokaryotic organisms.
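The observation in item 1 about the initiation and termination codons rests on an average code distance taken over many sequences aligned at a landmark. A minimal sketch of that averaging, assuming each sequence already has a per-base code-distance curve and a known anchor index (for example, the first base of the initiation codon); the function name and the window parameters `before` and `after` are illustrative. Positions that a sequence does not cover are treated as missing rather than as zero.

```python
from typing import Sequence

import numpy as np


def aligned_average(curves: Sequence[Sequence[float]], anchors: Sequence[int],
                    before: int, after: int) -> np.ndarray:
    """Mean of several per-base curves after aligning them on an anchor position.

    curves[g][i] is the code distance at base i of sequence g, anchors[g] the anchor
    index in that sequence. Output index `before` corresponds to the anchor itself.
    Each output value averages only the curves that cover that relative position.
    """
    width = before + after + 1
    stack = np.full((len(curves), width), np.nan)  # NaN marks "not covered"
    for g, (curve, a) in enumerate(zip(curves, anchors)):
        for off in range(-before, after + 1):
            i = a + off
            if 0 <= i < len(curve):
                stack[g, off + before] = curve[i]
    # np.nanmean ignores NaNs; a column no curve covers yields NaN (with a warning).
    return np.nanmean(stack, axis=0)


profile = aligned_average([[1, 2, 3, 4], [10, 20, 30]], anchors=[1, 0], before=1, after=2)
print(profile.tolist())  # [1.0, 6.0, 11.5, 17.0]
```

The same function applies to the termination codon by passing the stop-codon index as the anchor.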
     2. Convolutional codes have been shown, in theory and in practice, to perform at least as well as block codes in coding systems, which inspired us to search for a better convolutional code model for analyzing DNA sequences. Drawing on convolutional code encoding models and on the results of our block code model, we designed a (6,3,1) convolutional code model based on the degeneracy of codons, the context of codons, the short-range dominance of base correlation, and the use of a codon as the information unit. We then analyzed the selected DNA sequences of the twelve prokaryotic and nine eukaryotic organisms with the (6,3,1) convolutional code model. We observe that the average code distance fluctuates markedly near the initiation codon and the termination codon, and remarkable changes also appear in the SD region of prokaryotic organisms. We also observe an obvious period-3 feature in the coding region of all objects. We defined a new parameter, the characteristic average code distance (CACD), to describe the separation of the average code distance curves of objects with different GC contents (especially for prokaryotic organisms). CACD is correlated with GC content and is approximately proportional to the GC content of prokaryotic organisms, so the code parameter carries certain biological information. This shows that the model deserves further study and use in biological information processing.
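Item 2 reports a period-3 feature in the coding-region curves and a near-proportional relation between CACD and the GC content of prokaryotes. The abstract gives no formula for either, so the sketch below swaps in two standard, generic measures rather than the thesis's own definitions: the ratio of the spectral power at frequency 1/3 to the average spectral power of a mean-removed curve (a common period-3 statistic in genomic signal processing), and a least-squares fit through the origin plus the Pearson correlation between per-organism CACD values and GC content. The CACD values are taken as given inputs, since their definition is not stated.

```python
from typing import Sequence, Tuple

import numpy as np


def period3_ratio(curve: Sequence[float]) -> float:
    """Power at frequency 1/3 divided by the mean power over nonzero frequencies."""
    x = np.asarray(curve, dtype=float)
    n = len(x)
    if n < 2:
        return 0.0
    x = x - x.mean()  # remove the DC component
    p3 = abs(np.sum(x * np.exp(-2j * np.pi * np.arange(n) / 3.0))) ** 2
    total = float(np.sum(np.abs(np.fft.fft(x)) ** 2))  # equals n * sum(x**2) (Parseval)
    if total == 0.0:
        return 0.0  # flat curve: no periodic component at all
    return float(p3 / (total / (n - 1)))  # DC bin is zero, so average over n - 1 bins


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    s = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in s) / len(s) if s else 0.0


def proportional_fit(gc: Sequence[float], cacd: Sequence[float]) -> Tuple[float, float]:
    """Slope of cacd ~ a * gc (least squares through the origin) and Pearson r."""
    g = np.asarray(gc, dtype=float)
    c = np.asarray(cacd, dtype=float)
    slope = float(np.dot(g, c) / np.dot(g, g))
    r = float(np.corrcoef(g, c)[0, 1])
    return slope, r


print(gc_content("ATGC"))                 # 0.5
print(period3_ratio([3, 0, 0] * 10) > 1)  # True: a strongly period-3 curve
```

A ratio well above 1 suggests a period-3 component; an r close to 1 together with a small intercept in an ordinary fit would support the "approximately proportional" reading.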
     We established these models on the basis of general features of genetic information, so they are species-independent and suitable for analyzing various kinds of objects without adjusting the model.
     3. Focusing on the convolutional code model, we compared several model parameters based on the short-range dominance of base correlation. Treating a nucleotide as the genetic information unit, as is usual, we selected a (2,1,1) convolutional code model, and a (3,2,1) model was selected as a transition case. We compared the length of the coded output and the code length used for code distance calculation, and tentatively confirmed that the (6,3,1), (3,2,1), and (2,1,1) models provide good results.
     4. The analysis models based on error-correction coding theory were applied to the similarity study of DNA sequences. With the (6,3,1), (3,2,1), and (2,1,1) convolutional models, we studied the similarities/dissimilarities among the coding sequences of the first exon of the β-globin gene of 11 species (human, goat, opossum, chicken (Gallus), lemur, mouse, rabbit, rat, gorilla, bovine, and chimpanzee). We constructed an 8-component vector whose components are the normalized leading eigenvalues of the L/L and M/M matrices. Based on the Euclidean distances between the end points of the 8-component vectors, the results show that the three primates (human, chimpanzee, and gorilla) are strongly similar to each other because of their evolutionary relationship, whereas opossum (the species most remote from the remaining mammals) and chicken (the only non-mammalian representative) are only weakly similar to the others. The results demonstrate that the approach can reflect important information in the DNA sequences considered.