Similarity Analysis of Biological Sequences and Gene Identification Based on Signal Processing Techniques
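Abstract
Bioinformatics is an emerging interdisciplinary field. Using computers and networks as tools, it applies theories and methods from mathematics and information science to study biological macromolecules such as nucleic acids and proteins. Research in bioinformatics helps us further explore major questions such as biological evolution and the nature of life. At the same time, the vast amount of information contained in living systems will further advance other disciplines.
     This dissertation explores the application of signal processing techniques in bioinformatics. The main topics are similarity analysis of biological sequences and gene identification.
     The research results can be summarized as follows:
     ①Because the structural features of an RNA secondary structure are reflected mainly in its base pairs, this work takes base pairs as the starting point and extracts the base sequence corresponding to each RNA secondary structure sequence. Borrowing the ideas of orthogonal projection and the wavelet transform from signal processing, a base-pair transform is designed on the extracted base sequences and then used to construct a similarity function between sequences. The function combines the differences between the base-pair-transformed values of two sequences with the differences between the corresponding positions, so it compares sequences comprehensively and supports similarity analysis of RNA secondary structures. The method based on the base-pair transform has low time complexity. In addition, the similarity values it produces are well separated, which facilitates subsequent cluster analysis of the results.
     ②Based on the Hamming distance from information theory, this work proposes a universal bilateral similarity function suited to similarity analysis of DNA sequences, RNA secondary structure sequences, and protein sequences. The method requires no numerical mapping of the biological sequences, extracts the information in the sequences well, and handles all three kinds of biological sequence in a unified way with low time complexity, which demonstrates the validity and universality of the bilateral similarity function. For RNA secondary structure sequences in particular, the results obtained without structural information are approximately the same as those obtained with it, which simplifies the similarity analysis of RNA secondary structure sequences.
     ③Based on the principle of symbolic dynamics, this work proposes a new representation of DNA sequences. The representation has good numerical characteristics and can reveal chaotic features of DNA sequences, and it also allows sequences to be visualized. Its visual form enables graphical alignment and codon alignment of DNA sequences. From the codon alignment results, a similarity percentage between sequences is constructed and used for similarity analysis of DNA sequences. Using feature vectors composed of geometric centers, the new representation likewise supports similarity analysis of DNA sequences, which shows that the principle of symbolic dynamics can be applied effectively to DNA sequence analysis.
     ④Taking into account the differences between RNA secondary structure sequences and DNA sequences, the symbolic-dynamics representation of DNA sequences is modified to suit RNA secondary structure sequences. The starting point is that the stability of an RNA secondary structure is determined mainly by the free energy of its base pairs. The discussion focuses on how the truncation length in the modified representation affects the similarity analysis results. In the time domain, the modified representation is combined with matrix invariants to analyze the similarity of RNA secondary structure sequences quantitatively. To further validate the modified representation, the discrete Fourier transform is applied to the represented sequences and their similarity is analyzed qualitatively in the frequency domain. The experimental results show that the principle of symbolic dynamics can also be applied effectively to similarity analysis of RNA secondary structure sequences.
     ⑤Combining the symbolic-dynamics representation and the Z-curve representation of DNA sequences, this work uses the period-3 property of gene coding regions to design a gene identification model based on the extended Kalman filter. The method exploits the prediction ability of the extended Kalman filter to locate exons effectively. To reduce background noise in the identification results, a windowing step is applied to the output, which further improves the discrimination between coding and noncoding regions.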
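Result ② starts from the Hamming distance and works directly on symbol strings, with no numerical mapping. The abstract does not define the bilateral similarity function itself, so the following is only a minimal sketch of the underlying building block: a plain normalized Hamming similarity for equal-length sequences. The function names and the example strings are illustrative assumptions, not taken from the dissertation.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_similarity(a: str, b: str) -> float:
    """Fraction of matching positions, in [0, 1] (hypothetical helper)."""
    if not a:
        return 1.0  # two empty sequences are treated as identical
    return 1.0 - hamming_distance(a, b) / len(a)


# The same function applies to any alphabet, since symbols are only
# compared for equality and never mapped to numbers.
dna_similarity = hamming_similarity("ATGCGTAC", "ATGAGTTC")
protein_similarity = hamming_similarity("MKVLAT", "MKILAT")
```

Because only symbol equality is tested, the same code accepts DNA, RNA secondary structure, or protein strings unchanged. How the dissertation handles sequences of different lengths is not stated in the abstract, so this sketch simply rejects them.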
Bioinformatics is an emerging interdisciplinary field. With the aid of computers and the Internet, it studies biological macromolecules such as nucleic acids and proteins using theories and methods from mathematics and information science. Research in bioinformatics helps us explore fundamental questions about biological evolution and the nature of life. In addition, the wealth of information hidden in living systems can accelerate the development of other disciplines.
     This dissertation is aimed at exploring the applications of signal processing techniques in bioinformatics. The main research focuses on similarity analysis of biological sequences and gene identification.
     The main results obtained can be summarized as follows.
     ①Since the structural information of an RNA secondary structure is carried mainly by its base pairs, we take base pairs as the starting point and extract the corresponding base sequences from RNA secondary structure sequences. Borrowing the ideas of orthogonal projection and the wavelet transform, we design a base-pair transform on these base sequences and use it to construct a similarity function for comparing RNA secondary structures. The similarity function combines the differences between the transformed values of two sequences with the differences between the associated positions, so it compares sequences comprehensively and can be applied to similarity analysis of RNA secondary structures. The proposed method has low time complexity. In addition, the similarity values it produces are well separated, which facilitates subsequent cluster analysis of the obtained results.
     ②Based on the Hamming distance from information theory, a universal bilateral similarity function is proposed for similarity analysis of biological sequences, including DNA sequences, RNA secondary structures, and proteins. The method requires no numerical mapping of the biological sequences, retains much of the information they contain, has low time complexity, and unifies the similarity analysis of the three kinds of biological sequence. Simulation results demonstrate the validity and universality of the bilateral similarity function. For RNA secondary structures in particular, the results obtained with structural information are consistent with those obtained without it, which simplifies the similarity analysis of RNA secondary structures.
     ③Based on the principle of symbolic dynamics, a novel representation of DNA sequences is proposed. The representation has good numerical characteristics, which help reveal the chaotic features of DNA sequences, and it also supports visualization. Its visual form enables graphical alignment and codon alignment of DNA sequences. From the codon alignment results, a similarity percentage between sequences is constructed and used for similarity analysis of DNA sequences. Similarity analysis can also be carried out effectively with feature vectors composed of geometric centers. The obtained results show that the principle of symbolic dynamics can be applied effectively to DNA sequence analysis.
     ④Taking into account the differences between RNA secondary structure sequences and DNA sequences, the symbolic-dynamics representation of DNA is modified to suit RNA secondary structures. The starting point is that the stability of an RNA secondary structure is determined mainly by the free energy of its base pairs. The influence of the truncation length on the similarity analysis results is discussed in detail. In the time domain, the modified representation is combined with matrix invariants to analyze the similarity of RNA secondary structures quantitatively. In the frequency domain, a qualitative analysis is carried out to further validate the modified method. Simulation results show that the principle of symbolic dynamics can also be applied effectively to similarity analysis of RNA secondary structures.
     ⑤Combining the symbolic-dynamics representation and the Z-curve representation of DNA sequences, the period-3 property of protein-coding regions is used to design a gene identification model based on the extended Kalman filter. Exploiting the prediction ability of the extended Kalman filter, the proposed model can effectively locate exons. To reduce background noise, a windowing operation is applied to the model output, which further improves the discrimination between coding and noncoding regions.
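Result ⑤ relies on two ingredients that can be illustrated compactly: the Z-curve representation and the period-3 property of coding regions. The sketch below is not the extended Kalman filter model of the dissertation. It substitutes a much simpler, well-known detector, the sliding-window spectral power at frequency 1/3 summed over the four base indicator sequences, only to show what signal such a model looks for. The function names, window length, and step size are assumptions chosen for illustration.

```python
import cmath

# Z-curve increments per base: x separates purines from pyrimidines,
# y separates amino from keto bases, z separates weak from strong
# hydrogen bonding.
Z_STEPS = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}


def z_curve(seq):
    """Return the cumulative Z-curve points (x_n, y_n, z_n) of a DNA string."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        dx, dy, dz = Z_STEPS.get(base, (0, 0, 0))  # unknown symbols add nothing
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def period3_power(window):
    """Sum over A, C, G, T of |DFT at frequency 1/3|^2 of the base indicators."""
    w = cmath.exp(-2j * cmath.pi / 3)
    total = 0.0
    for base in "ACGT":
        coeff = sum(w ** (n % 3) for n, s in enumerate(window) if s == base)
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, window=351, step=3):
    """Period-3 power for each window position (window and step are assumed values)."""
    seq = seq.upper()
    return [period3_power(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]
```

A strongly codon-periodic string such as `"ATG" * 10` gives a large `period3_power`, while a homopolymer such as `"A" * 30` gives a value near zero. On real sequences, peaks in `sliding_period3` are only a rough indicator of coding regions, which is why the dissertation adds a predictive filter and a windowing step on top.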