With the rapid progress of the Human Genome Project (HGP) and advances in the study of gene sequences, structures, and functions, ever more bioinformatics data are being generated. Such large volumes of data are typically processed with automated analysis methods. Similarity analysis provides sequence and structure information from which structure, function, and evolutionary relationships can be inferred or estimated, and it has therefore become a fundamental topic in bioinformatics. Sequence and structure analysis comprises similarity analysis, mutation analysis, phylogenetic analysis, and function analysis, all of which rest on the similarity analysis of sequences and structures. The dissertation therefore proposed methods for the similarity analysis of DNA sequences and RNA secondary structures based on new representation models of both, together with methods for mutation analysis and the construction of phylogenetic trees.
     The dissertation first reviewed recent advances in sequence and structure analysis, and then studied the graphical representation of DNA sequences based on dual nucleotides, a numerical coding method for RNA secondary structure, mutation analysis and structure alignment based on the coded sequences of RNA secondary structures, sequence similarity analysis based on graphical representation, and the construction of phylogenetic trees.
     The main contents are listed as follows:
     a) The author proposed a 3D graphical representation of DNA sequences based on the physical and chemical properties of dual nucleotides, together with a similarity analysis. The dependencies and interactions between bases are known to be important in determining the structure and function of sequences. To give a simple and intrinsic visualization of gene sequences, the dissertation proposed a 3D curve representation of DNA sequences and a dissimilarity measure based on geometric center covariance matrices. The experiments showed that the proposed approach measures sequence similarity precisely, which helps infer the relationships among species, especially between humans and other species. It may also help reveal mechanisms in humans through studies of other species.
     b) The author proposed a sequence comparison method and a similarity analysis method based on a coding scheme. Following DNA coding principles, the dissertation proposed a method that solved four basic problems and supports similarity analysis between DNA sequences. The coding method represents sequences efficiently and makes mutations visible, which helps in uncovering the mechanisms of diseases. In addition, it provides a better mathematical model for quantifying the similarity or dissimilarity between DNA sequences, and in that sense it can improve genetic testing and the prediction of gene function.
     c) The author proposed a coding scheme for RNA secondary structure, with mutation analysis and structure comparison based on the coding and the XOR operator. Representations of RNA secondary structure are complex and prone to degeneracy. The proposed coding method and its extension separate free bases from base pairs and distinguish different structures, including pseudoknots. Based on a three-digit coding, the dissertation presented methods for RNA secondary structure comparison and mutation analysis, and it also proposed a novel structure comparison method based on the coding rules. The experiments demonstrated the effectiveness of the method.
     d) The author proposed two novel phylogenetic tree construction methods, one based on fuzzy clustering and one on minimum spanning trees, both of which make use of the proposed similarity and dissimilarity matrices.
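
     As a rough illustration of the minimum spanning tree idea in d), the sketch below runs Prim's algorithm on a symmetric dissimilarity matrix. The abstract does not describe the dissertation's actual construction, so this is only a generic sketch under stated assumptions: the function name, the sequence labels, and the matrix values are all hypothetical and are not taken from the dissertation.

```python
from typing import List, Tuple


def minimum_spanning_tree(dist: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on a symmetric dissimilarity matrix.

    Returns the n-1 tree edges as (parent, child, weight) tuples,
    in the order they are added to the tree. Node 0 is the start node.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known link from the tree to each node
    parent = [-1] * n          # tree node that provides that cheapest link
    best[0] = 0.0
    edges: List[Tuple[int, int, float]] = []
    for _ in range(n):
        # Pick the node outside the tree with the cheapest link into it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, dist[parent[u]][u]))
        # Relax the links from the newly added node to the remaining nodes.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


# Hypothetical dissimilarity matrix for four sequences (illustrative values only).
labels = ["seq_A", "seq_B", "seq_C", "seq_D"]
D = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.6],
    [0.9, 0.7, 0.0, 0.3],
    [0.8, 0.6, 0.3, 0.0],
]
tree = minimum_spanning_tree(D)
for a, b, w in tree:
    print(f"{labels[a]} -- {labels[b]}  ({w})")
```

     With these made-up values the tree links seq_A to seq_B, seq_B to seq_D, and seq_D to seq_C. A spanning tree of this kind only connects the observed sequences to one another; how the dissertation turns such a structure into a phylogenetic tree is not stated in the abstract.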
