Research on Computational Problems Related to Non-coding RNA
Abstract
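Non-coding RNAs (ncRNAs) are transcripts that do not encode proteins. They play important roles in many life processes, including gene regulation, chromatin remodeling, gene localization, gene modification, and DNA imprinting. Research on ncRNAs has significant theoretical and practical value and provides indispensable tools for exploring fundamental questions about life. Experimental approaches to ncRNA problems are usually costly, time-consuming, and poorly targeted. With the successive completion of genome sequencing for many organisms, and the establishment and continued enrichment of the corresponding databases, applying computational methods to ncRNA research has become both possible and necessary. This dissertation takes as its subject three classic computational problems related to ncRNAs: sequence-structure alignment, secondary structure prediction, and ncRNA gene identification. It studies them in depth using methods from pattern classification. The main contents and contributions are as follows: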
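     1. ncRNA sequence-structure alignment. Sequence alignment is a classic topic in computational molecular biology. Because ncRNAs are more strongly conserved in structure than in sequence, the results of traditional sequence alignment programs cannot meet the needs of ncRNA analysis. Structural information must therefore be taken into account during alignment, which multiplies the algorithmic complexity. This dissertation introduces the quantum genetic algorithm into ncRNA sequence-structure alignment. It exploits the superposition of quantum encoding, the diversity of the population, and the parallelism of evolution by quantum rotation gates, and combines these with the mutation and crossover operations of the traditional genetic algorithm. It proposes a full-interference pairing-conserving crossover operator that takes both structure and sequence information into account, and defines an optimization objective function that fully considers both. As a result, evolution speed and control over trapping in local optima reach fairly satisfactory levels; compared with the traditional genetic algorithm, optimization time is shortened and alignment quality is improved.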
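     2. ncRNA secondary structure prediction. For ncRNA genes, which follow the principle that "structure determines function", secondary structure plays an important role in related research. Traditional ncRNA secondary structure prediction methods are all based on optimization algorithms, with high computational complexity and poor timeliness. This dissertation treats ncRNA secondary structure prediction as a classification problem. It focuses on deciding, for a given sequence alignment and the information it provides, whether any pair of alignment columns can form a base pair. After summarizing the numerical measures used in existing structure prediction algorithms, it analyzes these measures quantitatively with several feature selection techniques (earlier prediction methods analyzed them only qualitatively) and selects a feature subset suited to structure prediction by classification. This optimal subset combines thermodynamic information (the average base-pairing matrix), covariation information (a covariation score that includes base-pair stacking), and evolutionary information (the R statistic used by Akmaev, which integrates the evolutionary relationships among sequences). Using an SVM classifier with the selected features, together with stem combination rules, it presents a classification-based ncRNA secondary structure prediction method, offering a new line of approach to structure prediction.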
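     3. Identification of microRNA gene precursors. ncRNA genes lack the recognition signals of conventional genes, are widely distributed in the genome, and vary in class and length, so general-purpose ncRNA gene identification methods perform poorly. MicroRNAs are an important class of regulatory RNAs that play important roles in many life processes, and identifying microRNA precursors (pre-miRNAs) is a prerequisite for related analyses. Although the hairpin secondary structure is a salient feature of pre-miRNAs, the genome contains many non-pre-miRNA sequences that can also fold into hairpins. Centering on the hairpin secondary structure of pre-miRNAs, this dissertation studies how to identify pre-miRNAs among sequences with similar structural features. First, by "stretching" the RNA secondary structure, we propose a new local sequence-structure feature that contains sequence information from the stem of the hairpin and also accounts for bulges and interior loops. Tests show that these new local sequence-structure features classify better than the comparable 3SVM features. Second, to characterize the hairpin secondary structure of pre-miRNAs in detail, we combine graph theory with topological indices from computational chemistry to construct new topological index features on free-energy-weighted graphs. Statistical analysis of these indices, and a comprehensive comparison with other existing pre-miRNA identification features, show that the new features describe the topological relationships among the elements of the pre-miRNA hairpin well and, through the free-energy weights, reflect base composition and relative position in the structure. Finally, using feature selection, we choose from 52 candidate features, including the 4 topological index features, a global subset of 23 features suited to pre-miRNA identification. By effectively handling class-imbalanced data, we obtain a pre-miRNA identification model with good performance.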
Non-coding RNAs (ncRNAs) are defined as all functional RNA transcripts other than protein-encoding messenger RNAs (mRNAs). ncRNAs play key roles in many life processes, including gene regulation, chromatin remodeling, gene localization, gene modification, and DNA imprinting. Research on ncRNAs is important both in theory and in application, and offers necessary tools for exploring the nature of life. Experimental methods, although essential for understanding ncRNAs, are usually expensive, time-consuming, and poorly targeted. With the successive sequencing of many genomes and the establishment and enrichment of the corresponding databases, computational tools for studying ncRNAs have become both possible and necessary. This dissertation focuses on classic computational problems related to ncRNAs, including sequence-structure alignment, secondary structure prediction, and identification of ncRNA genes. The main contents and contributions of the dissertation are summarized as follows:
     1. ncRNA sequence-structure alignment. Sequence alignment is one of the classic problems in computational molecular biology. ncRNA molecules are highly conserved in secondary structure but share little sequence similarity, so traditional multiple alignment methods fail to meet the needs of ncRNA analysis. Computing reliable ncRNA alignments therefore requires taking structural information into account, which markedly increases computational complexity. To deal with this problem, we employ the quantum genetic algorithm (QGA), which is based on concepts and principles of quantum computing such as the quantum bit and the superposition of states. Moreover, we design a new full interference pair crossover operator and construct a fitness function, both of which consider sequence and structure information simultaneously. Experiments on BRAliBase show that QGA performs well without premature convergence, and has shorter optimization time and higher solution quality than the conventional genetic algorithm.
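As a rough illustration of the quantum-bit encoding and rotation-gate evolution mentioned above, the following minimal sketch may help. It is not the dissertation's alignment algorithm: the fitness is a toy "count the ones" function standing in for the sequence-structure objective, each Q-bit is simplified to a single angle theta in [0, pi/2] (so alpha = cos theta, beta = sin theta), and all names (`prob_one`, `observe`, `rotate`, `toy_qga`) and parameter values are assumptions made for this example. The crossover operator described in the abstract is not modeled.

```python
import math
import random


def prob_one(theta):
    """Probability that a Q-bit with angle theta is observed as 1 (beta**2)."""
    return math.sin(theta) ** 2


def observe(thetas, rng):
    """Collapse a Q-bit individual into a classical bit string."""
    return [1 if rng.random() < prob_one(t) else 0 for t in thetas]


def rotate(thetas, solution, best, delta=0.05 * math.pi):
    """Rotation-gate update: where the observed bit differs from the best
    solution, rotate the Q-bit angle toward the best bit. Adding delta to
    theta is equivalent to applying a 2x2 rotation matrix to (alpha, beta).
    Angles are clipped to [0, pi/2] (a simplification of this sketch)."""
    updated = []
    for theta, x, b in zip(thetas, solution, best):
        if x != b:
            theta += delta if b == 1 else -delta
        updated.append(min(max(theta, 0.0), math.pi / 2))
    return updated


def toy_qga(n_bits=16, pop_size=8, generations=60, seed=0):
    """Toy quantum-inspired GA maximizing the number of 1 bits."""
    rng = random.Random(seed)
    # theta = pi/4 gives an equal superposition: P(0) = P(1) = 0.5.
    population = [[math.pi / 4] * n_bits for _ in range(pop_size)]
    best, best_fit = None, -1
    for _ in range(generations):
        observed = [observe(ind, rng) for ind in population]
        for sol in observed:
            fit = sum(sol)  # hypothetical stand-in for the alignment score
            if fit > best_fit:
                best, best_fit = sol, fit
        population = [rotate(ind, sol, best)
                      for ind, sol in zip(population, observed)]
    return best, best_fit


if __name__ == "__main__":
    best, best_fit = toy_qga()
    print(best_fit, best)
```

In a real alignment setting the bit string would have to encode gap placements and the fitness would score both sequence and base-pair agreement; those parts are left out here because the abstract does not specify them.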
     2. ncRNA secondary structure prediction. The secondary structures of ncRNAs, which determine their function, are crucial to related research. Most traditional methods for ncRNA secondary structure prediction use optimization algorithms, which suffer from high space and time complexity. Given aligned ncRNA sequences, we treat secondary structure prediction as a classification problem: deciding whether any two columns in the alignment correspond to a base pair, using the information the alignment provides. After analyzing the computational measures used in existing prediction methods, we compared their classification capability quantitatively using filter and wrapper approaches in combination with a support vector machine (SVM) classifier. As a result, an optimal subset of computational measures, including thermodynamic, covariation, and phylogenetic information, was selected for predicting RNA secondary structure by classification. Our method uses an SVM classifier with the selected measures, together with rules for stem combination, to predict ncRNA secondary structure, which represents a new methodology for future ncRNA secondary structure prediction approaches.
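To make the column-pair classification idea concrete, the sketch below computes one common covariation measure, the mutual information between two alignment columns, which could serve as one feature for a column-pair classifier. This is an illustrative stand-in only: the abstract does not give the exact covariation score used (it mentions one that includes base-pair stacking), and the function name, gap handling, and toy alignment are assumptions of this example.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Rows with a gap ('-') in either column are skipped, which is an
    assumption of this sketch rather than a rule from the dissertation.
    """
    pairs = [(seq[i], seq[j]) for seq in alignment
             if seq[i] != "-" and seq[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    pair_counts = Counter(pairs)
    left_counts = Counter(x for x, _ in pairs)
    right_counts = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in pair_counts.items():
        p_xy = count / n
        p_x = left_counts[x] / n
        p_y = right_counts[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


if __name__ == "__main__":
    # Toy alignment: columns 0 and 4 covary as Watson-Crick pairs,
    # while columns 1 and 3 are fully conserved.
    alignment = ["GACUC", "CAGUG", "AAUUU", "UAAUA"]
    print(column_mutual_information(alignment, 0, 4))
    print(column_mutual_information(alignment, 1, 3))
```

Perfectly covarying columns score high while fully conserved columns score zero, which is why covariation alone is usually combined with thermodynamic and phylogenetic measures, as the paragraph above describes.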
     3. Identification of microRNA gene precursors. General computational methods for identifying ncRNA genes are far from satisfactory because ncRNA genes carry fewer signals than protein-coding genes, are widely distributed in the genome, and vary greatly in kind and length. As an important class of regulatory ncRNAs, microRNAs play crucial roles in many life processes. Identifying microRNA precursors (pre-miRNAs) is a primary step in analyses involving microRNA genes. While the hairpin secondary structure is a distinguishing feature of pre-miRNAs, a large number of sequences that are not pre-miRNAs also fold into hairpins. Focusing on the hairpin secondary structure, we study prediction methods that distinguish pre-miRNA hairpins from pre-miRNA-like pseudo hairpins. First, we propose 25 novel local features for identifying pre-miRNA hairpin structures by "pulling" the RNA hairpin; these capture characteristics of not only the stem but also the bulges and interior loops in the structure. Tests show that the classifier with the new features outperforms 3SVM. Second, to characterize the pre-miRNA hairpin in detail, four topological indices weighted by free energy are defined. Exploration of these indices shows that they characterize the topological connections among structural elements and also depict the composition and relative position of bases in the structure. Finally, we select 23 features from 52 candidates, which include the 4 new topological indices, as the feature set for identifying pre-miRNAs. Moreover, by handling the class imbalance problem in the datasets, an effective classifier model for pre-miRNAs is developed.
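The idea of a topological index on a weighted graph of structural elements can be sketched with the classical Wiener index, the sum of shortest-path distances over all vertex pairs, extended to weighted edges. The abstract does not define its four indices, so the choice of the Wiener index, the graph layout (one vertex per structural element), and the edge weights (hypothetical non-negative values standing in for free-energy magnitudes) are assumptions of this example.

```python
from itertools import combinations


def weighted_wiener_index(n_vertices, edges):
    """Sum of weighted shortest-path distances over all unordered vertex pairs.

    edges: iterable of (u, v, weight) with non-negative weights.
    Distances are computed with the Floyd-Warshall algorithm.
    """
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n_vertices)]
            for i in range(n_vertices)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n_vertices):
        for i in range(n_vertices):
            for j in range(n_vertices):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return sum(dist[i][j] for i, j in combinations(range(n_vertices), 2))


if __name__ == "__main__":
    # Hypothetical hairpin as a path of elements:
    # stem (0) - interior loop (1) - stem (2) - hairpin loop (3).
    unweighted = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
    weighted = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 3.0)]  # made-up weights
    print(weighted_wiener_index(4, unweighted))
    print(weighted_wiener_index(4, weighted))
```

Two hairpins with the same element topology but different weights yield different index values, which is the property the paragraph above attributes to free-energy weighting.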
引文
[1] Crick F H C. Central dogma of molecular biology [J]. Nature, 1958, 227: 561-563.
    [2] Guerrier-Takada C, Gardiner K, Marsh T, et al. The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme [J]. Cell, 1983, 35(3 Pt 2): 849-857.
    [3] Gilbert W. Origin of Life, The RNA World [J]. Nature, 1986, 319: 618.
    [4] Berget S M, Moore C, Sharp P A. Spliced segments at the 5' terminus of adenovirus 2 late mRNA [J]. Proceedings of the National Academy of Sciences of the United States of America, 1977, 74(8): 3171-3175.
    [5] Chow L T, Gelinas R E, Broker T R, et al. An amazing sequence arrangement at the 5' ends of adenovirus 2 messenger RNA [J]. Cell, 1977, 12(1): 1-8.
    [6]金由辛.核糖核酸与核糖核酸组学[M].北京:科学出版社, 2005.
    [7] Barciszewski J, Erdmann V A. Noncoding RNAs: Molecular Biology and Molecular Medicine[M]. New York: Kluwer Academic/Plenum Publishers, 2003.
    [8] Kavanaugh L A. Use of comparative genomics for non-coding RNA prediction and investigation of DNA introgression in yeast[D]. Durham, North Carolina, U.S.: Duke University, 2008.
    [9] Eddy S R. Non-coding RNA genes and the modern RNA world [J]. Nature Rev Genet, 2001, 2: 919-929.
    [10] Huttenhofer A, Schattner P, Polacek N. Non-coding RNAs: hope or hype [J]. Trends Genet, 2005, 21(5): 289-297.
    [11] Mattick J S. Challenging the dogma: the hidden layer of nonprotein-coding RNAs in complex organisms [J]. Bioessays, 2003, 25: 930.
    [12] Mattick J S. RNA regulation: a new genetics? [J]. Nature Reviews, 2004, 5(4): 316-323.
    [13]齐力旺, Li X,张守攻等.非编码蛋白RNA的遗传调控[J].中国科学C辑生命科学2006, 36(3): 193-208.
    [14] Ogita S, Uefuji H, Yamaguchi Y, et al. Producing decaffeinated coffee plants [J]. Nature, 2003, 423: 823.
    [15] Liu Q, Singh S, Green A. High-oleic and high-stearic cottonseed oils: Nutritionally improved cooking oils developed using gene silencing [J]. Journal of the American College of Nutrition, 2002, 21(Suppl 3): 205s-211s.
    [16] Segal G, Song R, Messing J. A new opaque variant of maize by a single dominant RNA-interference-inducing transgene [J]. Genetics, 2003, 165(1): 387-397.
    [17] Calin G A, Croce C M. MicroRNA-Cancer connection: the beginning of a new tale [J]. Cancer Research, 2006, 66: 7390-7394.
    [18]郭艳合,张义玲,刘立等.非编码RNA与人类重大疾病的发生及其在生物医学领域内的应用[J].中国病理生理杂志, 2009, 25(6): 1232-1239.
    [19] Filipowicz W. imprinted expression of small nucleolar RNAs in brain: time for RNomics[C]. 2000:14035-14037.
    [20]金由辛.核糖核酸与核糖核酸组学[M].北京:科学出版社, 2005.
    [21] Bindewald E, Shapiro B A. RNA secondary structure prediction from sequence alignments using a network of k-nearest neighbor classifiers [J]. RNA, 2006, 12(3): 342-352.
    [22] Knudsen B, Hein J. Pfold: RNA secondary structure prediction using stochastic context-free grammars [J]. Nucleic Acids Res, 2003, 31(13): 3423-3428.
    [23] Pedersen J S, Meyer I M, Forsberg R, et al. A comparative method for finding and folding RNA secondary structures within protein-coding regions [J]. Nucleic Acids Res, 2004, 32(16): 4925-4936.
    [24] Griffiths-Jones S, Baterman, A., Marshall,M.,Khanna,A.,Eddy,S.R. Rfam: an RNA family database [J]. Nucleic Acids Res, 2003, 31, 439–441.
    [25] Rivas E, Eddy,S. Noncoding RNA gene detection using comparative sequence analysis [J]. BMC Bioinformatics, 2001, 2, 8.
    [26] Washietl S, Hofacker,I. and Stadler,P. Fast and reliable prediction of noncoding RNAs [J]. Proc Natl Acad Sci USA, 2005, 102, 2454–2459.
    [27] Hudelot C, Gowri-Shankar,V., Jow,H., Rattray,M. and Higgs,P. RNA-based phylogenetic methods: application to mammalian mitochondrial RNA sequences [J]. Mol Phylogenet Evol, 2003, 28: 241–252.
    [28] Gardner P P, Wilm A, Washietl S. A benchmark of multiple sequence alignment programs upon structural RNAs [J]. Nucleic Acids Res, 2005, 33(8): 2433-2439.
    [29] Washietl S, Hofacker I L. Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics [J]. Journal of Molecular Biology, 2004, 342(1): 19-30.
    [30] Bauer M, Klau G W, Reinert K. Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization [J]. BMC Bioinformatics, 2007, 8: 271.
    [31] Thompson J D, Higgins D G, Gibson T J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice [J]. Nucleic Acids Res, 1994, 22(22): 4673-4680.
    [32] Hofacker I L, Fekete M, Stadler P F. Secondary structure prediction for aligned RNA sequences [J]. Journal of Molecular Biology, 2002, 319(5): 1059-1066.
    [33] Sankoff D. Simultaneous solution of the RNA folding,alignment, and proto-sequence problems [J]. SIAM Journal on Applied Mathematics, 1985, 45: 810-825.
    [34] Smith T, Waterman M. Identification of common molecular subsequences [J]. J Mol Biol, 1981, 147: 195-197.
    [35] Nussinov R, Pieczenik G, Griggs J, et al. Algorithms for loop matchings [J]. SIAM JAppl Math, 1978, 35(1): 68-82.
    [36] Zuker M, Stiegler P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information [J]. Nucleic Acids Res, 1981, 9(1): 133-148.
    [37] Corpet F, Michot B. RNAlign program: alignment of RNA sequences using both primary and secondary structures [J]. Comput Appl Biosci, 1994, 10(4): 389-399.
    [38] Bafna V, Muthukrishnan S, Ravi R. Computing similarity between RNA strings[C]. Springer, 1995:1-16.
    [39] Gorodkin J, Heyer L J, Stormo G D. Finding the most significant common sequence and structure motifs in a set of RNA sequences [J]. Nucleic Acids Res, 1997, 25(18): 3724-3732.
    [40] Mathews D H, Turner D H. Dynalign: an algorithm for finding thesecondary structure common to two RNA sequences [J]. Journal of Molecular Biology, 2002, 317(2): 191-203.
    [41] Kiryu H, Tabei Y, Kin T, et al. Murlet: a practical multiple alignment tool for structural RNA sequences [J]. Bioinformatics, 2007, 23(13): 1588-1598.
    [42] Do C B, Foo C-S, Batzoglou S. A max-margin model for efficient simultaneous alignment and folding of RNA sequences [J]. Bioinformatics, 2008, 24(13): i68-76.
    [43] Sakakibara Y, Brown M, Hughey R, et al. Stochastic context-free grammars for tRNA modeling [J]. Nucleic Acids Res, 1994, 22(23): 5112-5120.
    [44] Hofacker I L, Bernhart S H, Stadler P F. Alignment of RNA base pairing probability matrices [J]. Bioinformatics, 2004, 20(14): 2222-2227.
    [45] Torarinsson E, Havgaard J H, Gorodkin J. Multiple structural alignment and clustering of RNA sequences [J]. Bioinformatics, 2007.
    [46] Will S, Reiche K, Hofacker I L, et al. Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering [J]. PLoS Computational Biology, 2007, 3(4): e65.
    [47] Holm L. A probabilistic model for the evolution of RNA structure [J]. BMC Bioinformatics, 2004, 5: 166.
    [48] Holm L, Rubin G M. Pairwise RNA structure comparison with stochastic context-free grammars [J]. Pacific Symposium on Biocomputing, 2002:163-174.
    [49] Siebert S, Backofen R. MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons [J]. Bioinformatics, 2005, 21(16): 3352-3359.
    [50] Notredame C, Higgins D G, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment [J]. Journal of Molecular Biology, 2000, 302(1): 205-217.
    [51] Dalli D, Wilm A, Mainz I, et al. STRAL: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time [J]. Bioinformatics, 2006, 22(13): 1593-1599.
    [52] Waterman M S, Smith T F. RNA secondary structure: a complete mathematical analysis [J]. Math Biosci, 1978, 42: 257-266.
    [53] Waterman M. Secondary structure of single-stranded nucleic acids [J]. Studies in Foundations &Combinatorics, Advances in Mathematics Supplementary Studies, 1978, 1: 167-212.
    [54] Zuker C, Lodish H F. Repetitive DNA sequences cotranscribed with developmentally regulated Dictyostelium discoideum mRNAs [J]. Proceedings of the National Academy of Sciences of the United States of America, 1981, 78(9): 5386-5390.
    [55] Turner D H, Sugimoto N. RNA structure prediction [J]. Annual Review of Biophysics and Biophysical Chemistry, 1988, 17: 167-192.
    [56] Zuker M. On finding all suboptimal foldings of an RNA molecule [J]. Science, 1989, 244(4900): 48-52.
    [57] Rivas E, Eddy S R. A dynamic programming algorithm for RNA structure prediction including pseudoknots [J]. Journal of Molecular Biology, 1999, 285(5): 2053-2068.
    [58] Zuker M. Calculating nucleic acid secondary structure [J]. Curr Opin Struct Biol, 2000, 10(3): 303-310.
    [59] James B D, Olsen G J, Pace N R. Phylogenetic comparative analysis ofRNA secondary structure [J]. Methods in Enzymology, 1989, 180: 227-239.
    [60] Winker S, Overbeek R, Woese C R, et al. Structure detection through automated covariance search [J]. Comput Appl Biosci, 1990, 6(4): 365-371.
    [61] Eddy S R, Durbin R. RNA sequence analysis using covariance models [J]. Nucleic Acids Res, 1994, 22(11): 2079-2088.
    [62] Le S Y, Zhang K, Maizel J V, Jr. A method for predicting common structures of homologous RNAs [J]. Computers and Biomedical Research, 1995, 28(1): 53-66.
    [63] Knudsen B, Hein J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history [J]. Bioinformatics, 1999, 15(6): 446-454.
    [64] Searls D B. Linguistic approaches to biological sequences [J]. Comput Appl Biosci, 1997, 13(4): 333-344.
    [65] Witwer C, Hofacker I L, Stadler P F. Prediction of consensus RNA secondary structures including pseudoknots [J]. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 2004, 1(2): 66-77.
    [66] Lück R, Riesner D, Steger G. Thermodynamic prediction of conserved secondary structure: application to the RRE element of HIV, the tRNA-like element of CMV and the mRNA of prion protein [J]. Journal of Molecular Biology, 1996, 258(5): 813-826.
    [67] Lück R, Gr?f S, Steger G. ConStruct: a tool for thermodynamic controlled prediction of conserved secondary structure [J]. Nucleic Acids Research, 1999, 27(21): 4208-4217.
    [68] Do C B, Woods D A, Batzoglou S. CONTRAfold: RNA secondary structure prediction without physics-based models [J]. Bioinformatics, 2006, 22(14): e90-98.
    [69] Tabaska J, Cary R B, Gabow H, et al. An RNA folding method capable of identifying pseudoknots and base triples [J]. Bioinformatics, 1998, 14: 691-699.
    [70] Ruan J, Stormo G D, Zhang W. An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots [J]. Bioinformatics, 2004, 20:58-66.
    [71] Altschul S F. Basic local alignment search tool [J]. J Mol Biol, 1990, 215(3): 403-410.
    [72] Pearson W R. Flexible sequence similarity searching with the FASTA3 program package [J]. Methods in Molecular Biology, 2000, 132: 185-219.
    [73] Tanzer A, Stadler P F. Molecular evolution of a microRNA cluster [J]. Journal of Molecular Biology, 2004, 339(2): 327-335.
    [74] Weber M J. New human and mouse microRNA genes found by homology search [J]. The FEBS Journal, 2005, 272(1): 59-73.
    [75] Macke T J, Ecker D J, Gutell R R, et al. RNAMotif, an RNA secondary structure definition and search algorithm [J]. Nucleic Acids Res, 2001, 29(22): 4724-4735.
    [76] Pesole G, Liuni S, D'Souza M. PatSearch: a pattern matcher software that finds functional elements in nucleotide and protein sequences and assesses their statistical significance [J]. Bioinformatics, 2000, 16(5): 439-450.
    [77] Gautheret D, Lambert A. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles [J]. Journal of Molecular Biology, 2001, 313(5): 1003-1011.
    [78] Lai E C, et al. Computational identification of Drosophila microRNA genes [J]. Genome Biology, 2003, 4(7): R42.
    [79] Laslett D, Canback B, Andersson S. BRUCE: a program for the detection of transfermessenger RNA genes in nucleotide sequences [J]. Nucleic Acids Research, 2002, 30: 3449-3453.
    [80] Lowe T M, Eddy S R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence [J]. Nucleic Acids Research, 1997, 25(5): 955-964.
    [81] Lowe T M, Eddy S R. A computational screen for methylation guide snoRNAs in yeast [J]. Science, 1999, 283(5405): 1168-1171.
    [82] Edvardsson S, al. e. A search for H/ACA snoRNAs in yeast using MFE secondary structure prediction [J]. Bioinformatics, 2003, 19(7): 865-873.
    [83] Regalia M, Rosenblad M A, Samuelsson T. Prediction of signal recognition particle RNA genes [J]. Nucleic Acids Research, 2002, 30(15): 3368-3377.
    [84] Rivas E, Eddy S R. Noncoding RNA gene detection using comparative sequence analysis [J]. BMC Bioinformatics, 2001, 2: 8.
    [85] di Bernardo D, T. Down, Hubbard T. ddbRNA: detection of conserved secondary structures in multiple alignments [J]. Bioinformatics, 2003, 19(13): 1606-1611.
    [86] Washietl S, Hofacker I L, Stadler P F. Fast and reliable prediction of noncoding RNAs [J]. Proceedings of the National Academy of Sciences of the United States of America, 2005, 102(7): 2454-2459.
    [87] Coventry A, Kleitman D J, Berger B. MSARI: multiple sequence alignments for statistical detection of RNA secondary structure[J]. Proceedings of the National Academy of Science of the USA, 2004, 101(33):12102-12107.
    [88] Pedersen J S, Bejerano G, Siepel A, et al. Identification and Classification of Conserved RNA Secondary Structures in the Human Genome [J]. PLoS Computational Biology, 2006, 2(4): e33.
    [89] Holbrook S R, Kim S H. RNA crystallography [J]. Biopolymers, 1997, 44: 3-21.
    [90] Kjems J, Egebjerg J. Modern methods for probing RNA structure [J]. Cur Opin Biotech, 1996, 9: 59-65.
    [91] Brion P, Westhof E. Hirarchy and dynamics of RNA folding [J]. Rev Biophys Bioimol Struct, 1997, 26: 113-137.
    [92] Onoa B, Tinoco I, Jr. RNA folding and unfolding [J]. Curr Opin Struct Biol, 2004, 14(3): 374-379.
    [93] Pyle A M, Green J B. RNA folding [J]. Curr Opin Struct Biol, 1995, 5(3): 303-310.
    [94] Tinoco I, Jr., Bustamante C. How RNA folds [J]. Journal of Molecular Biology, 1999, 293(2): 271-281.
    [95] Doty P, Boedtker H, Fresco J R, et al. Secondary structure in ribonucleic acids [C]. 1959:482-499.
    [96] Fresco J R, Alberts B M, Doty P. Some molecular details of the secondary structure of ribonucleic acid [J]. Nature, 1960, 188: 98-101.
    [97] Hogeweg P, Hesper B. Energy directed folding of RNA sequences [J]. Nucleic Acids Res, 1984, 12(1 Pt 1): 67-74.
    [98] Hofacker I L, Fontana W, Stadler P F, et al. Fast folding and comparison of RNA secondary structures [J]. Monatsh Chemie, 1994, 125: 167-188.
    [99] Gan H H, Pasquali S, Schlick T. Exploring the repertoire of RNA secondary motifs using graph theory; implications for RNA design [J]. Nucleic Acids Research, 2003, 31(11): 2926-2943.
    [100] Fontana W, Konings D A, Stadler P F, et al. Statistics of RNA secondary structures [J]. Biopolymers, 1993, 33(9): 1389-1404.
    [101] Shu W, Bo X, Zheng Z, et al. A novel representation of RNA secondary structure based on element-contact graphs [J]. BMC Bioinformatics, 2008, 9: 188.
    [102] Batuwita R, Palade V. microPred: Effective classification of pre-miRNAs for human miRNA gene prediction [J]. Bioinformatics, 2009, 25(8): 989-995.
    [103] Zuker M, Mathews D H, Turner D H. Algorithms and thermodynamics for RNA secondary structure prediction: A practical guide [Z]. The Netherlands: Kluwer Academic Publishers Dordrecht, 1999:11-43.
    [104] Wiener H. Structural determination of paraffin boiling points [J]. J Amer Chen Soc, 1947, 69: 17-20.
    [105] Plesnik J. On the sum of all distances in a graph or digraph [J]. J Graph Theory, 1984, 8: 1-12.
    [106] Entringer R C, Jackson D E, Snyder D A. Distance in graphs [J]. Czechoslovak Math J, 1976.
    [107] Bollobás B, Erd?s P. Graphs of extremal weights [J]. Ars Combinatoria, 1998, 50: 225-233.
    [108] Zmazek B, Zerovnik J. Computing the Weighted Wiener and Szeged Number on Weighted Cactus Graphs in Linear Time [J]. Croatica Chemica Acta, 2003, 76(2): 137-143.
    [109] Levenshtein V I. Binary codes capable of correcting deletions, insertions, and reversals [J]. Cybern Control Theory, 1966, 10: 707-710.
    [110] Wilm A, Higgins D G, Notredame C. R-Coffee: a method for multiple alignment of non-coding RNA [J]. Nucleic Acids Research, 2008, 36(9): e52.
    [111] Wilm A, Mainz I, Steger G. An enhanced RNA alignment benchmark for sequence alignment programs [J]. Algorithms Mol Biol, 2006, 1: 19.
    [112] Thompson J D, Plewniak F, Poch O. A comprehensive comparison of multiple sequence alignment programs [J]. Nucleic Acids Research, 1999, 27(13): 2682-2690.
    [113] Levenshtein V I. Binary codes capable of correcting deletions, insertions and reversals [J]. Soviet Physics Doklady, 1966, 6: 707-710.
    [114] Reinert K. A Polyhedral Approach to Sequence Alignment Problems[D]: Universtit?t des Saarlandes, 1999.
    [115] Mount D W. Bioinformatics: Sequence and Genome Analysis[M]: Cold Spring Harbor Laboratory Press, 2001.
    [116] Gibbs A J, McIntyre G A. The diagram, a method for comparing sequences. Its use with amino acid and nucleotide sequences [J]. Eur J Biochem, 1970, 16: 1-11.
    [117] Needleman S B, Wunsch C D. A general method applicable to the search for similarities in the amino acid sequence of two proteins [J]. Journal of Molecular Biology, 1970, 48(3): 443-453.
    [118] Taylor W R. A flexible method to align large numbers of biological sequences [J]. J Mol Evol, 1988, 28: 161-169.
    [119] Higgins D G, Sharp P M. CLUSTAL: a package for performing multiple sequence alignment on a microcomputer [J]. Gene, 1988, 73: 237-244.
    [120] Feng D F, Doolittle R F. Progressive sequence alignment as a prerequisite to correct phylogenetic trees [J]. Journal of Molecular Evolution, 1987, 25(4): 351-360.
    [121] Higgins D G, Thompson J D, Gibson T J. Using CLUSTAL for multiple sequence alignments [J]. Methods in Enzymology, 1996, 266: 383-402.
    [122] Eddy S R. Profile hidden Markov models [J]. Bioinformatics, 1998, 14: 755-763.
    [123] Karplus K, Hu B. Evaluation of protein multiple alignments by SAM-T99 using the BAliBASE ultiple alignment test set [J]. Bioinformatics, 2001, 17: 713-720.
    [124] Kim J, Pramanik S, Chung M J. Multiple sequence alignment using simulated annealing [J]. Comput Applic Biosci, 1994, 10: 419-426.
    [125] Riaz T, Wang Y, Li K B. Multiple sequence alignment using tabu search[C]. Australian Computer Society, New Zealand, 2004:223-232.
    [126] Notredame C, Higgins D G. SAGA: sequence alignment by genetic algorithm [J]. Nucleic Acids Res, 1996, 24: 1515-1524.
    [127] Zhang C, Wang A K. A genetic algorithm for multiple molecular sequence alignment [J]. Comput Appl Biosci, 1997, 13: 565-581.
    [128] Anbarasu L A, Narayanasamy P, Sundararajan V. Multiple molecular sequence alignment by island parallel genetic algorithm [J]. Curr Sci, 2000, 78: 858-863.
    [129] Nguyen H D, Yoshihara I, Yamamori K, et al. Aligning multiple protein sequences by parallel hybrid genetic algorithm [J]. Genome Informatics, 2002, 13: 123-132.
    [130] Shyu C, Sheneman L, Foster J A. Multiple sequence alignment with evolutionary computation [J]. Genet Prog Evol Mach, 2004, 5: 121-144.
    [131] Chellapilla K, Fogel G B. Multiple sequence alignment using evolutionary programming[Z]. Washington DC, 1999.
    [132] Holland J H. Genetic algorithms and classifier systems: foundations and their applications[C]. Lawrence Erlbaum Associates, 1987.
    [133] Holland J H. Adaptation in natural and artificial systems[M]. Ann Arbor The University of Michigan Press, 1975.
    [134] Benioff P. Quantum Mechanical Hamiltonian model of Turning Machines [J]. J Stat Phys, 1982, 29: 515-546.
    [135] Feynman R. Simulating Physics with Compters [J]. Int J Theory phys, 1982, 21(6-7): 467-488.
    [136] Deutsch D. Quantum theory, the Chruch-Turing principle and the universal quantum computer [J]. Proc of Roy Soc London A, 1985, 400: 97-117.
    [137] Shor P W. Algorithm for quantum computation:Discrete logarithms and factoring[C]. IEEE Press, 1994:124-134.
    [138] Grover L K. A fast quantum mechanical algorithm for database search[Z]. New York: ACM, 1996:212-219.
    [139] Spector L, Barnum H, Bernstein H J, et al. Finding a better-than-classical quantum AND/OR algorithm using genetic programming[C]. Washington, D.C., 1999:2239-2246.
    [140] Rubinstein B I P. Evolving quantum circuits using genetic programming[C]. Seoul,Korea, 2001:144-151.
    [141] Lukac M, Perkowski M. Evolving quantum circuits using genetic algorithm[C]. 2002:177-185.
    [142] Han K-H, Kim J-H. Quantum-inspired evolutionary algorithm for a classof combinatorial optimization[C]. 2002:580-593.
    [143] Narayanan A, Moore M. Quantum-inspired genetic algorithms[C]. Nagoya, Japan, 1996:61-66.
    [144] Han K. A graphical tool for parametric simulation of the RNA structure formation [J]. Molecules and Cells, 2000, 10(3): 348-355.
    [145] Han K-H, Kim J-H. On setting the parameters of quantum-inspired evolutionary algorithm for practical applications[C]. Canberra, Australia, 2003:178-184.
    [146] Han K H, Kim J H. Quantum-inspired evolutionary algorithm for a class of combinatorial optimization [J]. IEEE Trans Evol Comput, 2002, 6(6): 580-593.
    [147] Carrillo H, Lipman D J. The multiple sequence alignment problem in biology [J]. SIAM J Appl Math, 1988, 48: 1073-1082.
    [148] Durbin R, Eddy S, Krogh A, et al. Biological Sequence Analysis: Probabilistic Models of Protein and Nucleic Acids[M]. Cambridge: Cambridge University Press, 1998.
    [149] Altschul S F, Erickson B W. Optimal sequence alignment using affine gap costs [J]. Bull Math Biol, 1986, 48: 603-616.
    [150] Notredame C, Holm L, Higgins D G. COFFEE: an objective function for multiple sequence alignments [J]. Bioinformatics, 1998, 14(5): 407-422.
    [151] Lindgreen S, Gardner P P, Krogh A. MASTR: Multiple alignment and structure prediction of non-coding RNAs using simulated annealing [J]. Bioinformatics, 2007, 23(24): 3304-3311.
    [152] McCaskill J S. The equilibrium partition function and base pair binding probabilities for RNA secondary structure [J]. Biopolymers, 1990, 29(6-7): 1105-1119.
    [153] Junan Y, Bin L, Zhengquan Z. Research of quantum genetic algorithm and its application in blind source separation [J]. Journal of Electronics (China), 2003, 20(1): 62-68.
    [154] SQUID. SQUID - C function library for sequence analysis[Z].
    [155] Dash M, Liu H. Feature Selection for Classification [J]. Intelligent Data Analysis 1, 1997, 3: 131-156.
    [156] Y.Saeys. Feature selection for classification of nucleic acid sequences[D]: Ghent University, 2004.
    [157] Guyon I, Elisseeff A. An introduction to variable and feature selection [J]. J Mach Learn Res, 2003, 3: 1157-1182.
    [158] Webb A R. Statistical pattern recognition[M]. New York: John Wiley & Sons, 2002.
    [159] Vapnik V. The Nature of Statistical Learning Theory[M]. New York: Springer, 1995.
    [160] Vapnik V. Statistical Learning Theory[M]. New York: Wiley, 1998.
    [161] Ding C H Q, Dubcbak I. Multi-class protein fold recognition using support vector machines and neural networks [J]. Bioinformatics, 2001, 17(349-358).
    [162] Furey T S, Cristianini N, Duffy N, et al. Support vectr machine classification and validation of cancer tissue samples using microarray expression data [J]. Bioinformatics, 2000, 16: 906-914.
    [163] Zien A, Ratsch G, Mika S, et al. Engineering support vector machine kernels that recognize translation initiation sites [J]. Bioinformatics, 2000, 16: 799-807.
    [164] Chiu D K, Kolodziejczak T. Inferring consensus structure from nucleic acid sequences [J]. Comput Appl Biosci, 1991, 7(3): 347-352.
    [165] Gutell R R, Power A, Hertz G Z, et al. Identifying constraints on thehigher-order structure of RNA: continued development and application of comparative sequence analysis methods [J]. Nucleic Acids Research, 1992, 20(21): 5785-5795.
    [166] Freyhult E, Moulton V, Gardner P. Predicting RNA structure using mutual information [J]. Applied Bioinformatics, 2005, 4(1): 53-59.
    [167] Martin L C, Gloor G B, Dunn S D, et al. Using information theory to search for co-evolving residues in proteins [J]. Bioinformatics, 2005, 21: 4116-4124.
    [168] Lindgreen S, Gardner P P, Krogh A. Measuring covariation in RNA alignments: physical realism improves information measures [J]. Bioinformatics, 2006, 22(24): 2988-2995.
    [169] Klingler T, Brutlag D. Detection of correlations in tRNA sequences with structural implications[C]. 1993:225-233.
    [170] Muse S V. Evolutionary analyses of DNA sequences subject to constraints of secondary structure [J]. Genetics, 1995, 139(3): 1429-1439.
    [171] Gulko B, Haussler D. Using multiple alignments and phylogenetic trees to detect RNA secondary structure [J]. Pacific Symposium on Biocomputing, 1996: 350-367.
    [172] Akmaev V R, Kelley S T, Stormo G D. A phylogenetic approach to RNA structure prediction [J]. Proceedings / International Conference on Intelligent Systems for Molecular Biology ; ISMB, 1999: 10-17.
    [173] Akmaev V R, Kelley S T, Stormo G D. Phylogenetically enhanced statistical tools for RNA structure prediction [J]. Bioinformatics, 2000, 16(6): 501-512.
    [174] Juan V, Wilson C. RNA secondary structure prediction based on free energy and phylogenetic analysis [J]. Journal of Molecular Biology, 1999, 289(4): 935-947.
    [175] Hofacker I L. Vienna RNA secondary structure server [J]. Nucleic Acids Research, 2003, 31(13): 3429-3431.
    [176] Hofacker I L, Fontana W, Stadler P F, et al. Fast folding and comparison of RNA secondary structures [J]. Monatsh Chemie, 1994, 125: 167-188.
    [177] Griffiths-Jones S. Rfam: annotating non-coding RNAs in complete genomes [J]. Nucleic Acids Research, 2005, 33: 121-124.
    [178] Gardner P P, Giegerich R. A comprehensive comparison of comparative RNA structure prediction approaches [J]. BMC Bioinformatics, 2004, 5: 140.
    [179] Theodoridis S, Koutroumbas K. Pattern Recognition, 2nd ed.[M]. Elsevier Science, 2004.
    [180] Bindewald E, Schneider T D, Shapiro B A. CorreLogo: an online server for 3D sequence logos of RNA and DNA alignments [J]. Nucleic Acids Research, 2006, 34: W405-W411.
    [181] Mattick J S. The Functional Genomics of Noncoding RNA [J]. Science, 2005, 309: 1527-1528.
    [182] FANTOM Consortium, RIKEN Genome Exploration Research Group, Genome Science Group (Genome Network Project Core Group). The transcriptional landscape of the mammalian genome [J]. Science, 2005, 309: 1559-1563.
    [183] FANTOM Consortium, RIKEN Genome Exploration Research Group, Genome Science Group (Genome Network Project Core Group). Antisense transcription in the mammalian transcriptome [J]. Science, 2005, 309: 1564-1566.
    [184] Shabalina S A, Spiridonov N A. The mammalian transcriptome and the function of non-coding DNA sequences [J]. Genome Biol, 2004, 5: 105.
    [185] Bartel D P. MicroRNAs: genomics, biogenesis, mechanism, and function [J]. Cell, 2004, 116: 281-297.
    [186] Kim V N, Nam J. Genomics of microRNA [J]. Trends in Genetics, 2006, 22(3): 165-173.
    [187] Metzler M, Wilda M, Busch K, et al. High expression of precursor microRNA-155/BIC RNA in children with Burkitt lymphoma [J]. Genes, Chromosomes & Cancer, 2004, 39(2): 167-169.
    [188] Lu J, Getz G, Miska E A, et al. MicroRNA expression profiles classify human cancers [J]. Nature, 2005, 435(7043): 834-838.
    [189] Johnson S M, Grosshans H, Shingara J, et al. RAS is regulated by the let-7 microRNA family [J]. Cell, 2005, 120: 635-647.
    [190] O'Donnell K A, Wentzel E A, Zeller K I, et al. c-Myc-regulated microRNAs modulate E2F1 expression [J]. Nature, 2005, 435: 839-843.
    [191] He L, Thomson J M, Hemann M T, et al. A microRNA polycistron as a potential human oncogene [J]. Nature, 2005, 435: 828-833.
    [192] Iorio M V, Ferracin M, Liu C G, et al. MicroRNA gene expression deregulation in human breast cancer [J]. Cancer Res, 2005, 65: 7065-7070.
    [193] Kumar M S, Lu J, Mercer K L, et al. Impaired microRNA processing enhances cellular transformation and tumorigenesis [J]. Nat Genet, 2007, 39: 673-677.
    [194] Brown J R, Sanseau P. A computational view of microRNAs and their targets [J]. Drug Discovery Today: BIOSILICO, 2005, 10(8): 595-601.
    [195] Lagos-Quintana M, Rauhut R, Lendeckel W, et al. Identification of novel genes coding for small expressed RNAs [J]. Science, 2001, 294(5543): 853-858.
    [196] Lau N C, Lim L P, Weinstein E G, et al. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans [J]. Science, 2001, 294(5543): 858-862.
    [197] Lee R C, Ambros V. An extensive class of small RNAs in Caenorhabditis elegans [J]. Science, 2001, 294: 862-864.
    [198] Lagos-Quintana M, Rauhut R, Meyer J, et al. New microRNAs from mouse and human [J]. RNA, 2003, 9: 175-179.
    [199] Legendre M, Lambert A, Gautheret D. Profile-based detection of microRNA precursors in animal genomes [J]. Bioinformatics, 2005, 21: 841-845.
    [200] Hertel J, Lindemeyer M, Missal K, et al. The expansion of the metazoan microRNA repertoire [J]. BMC Genomics, 2006, 7: 25.
    [201] Dezulian T, Remmert M, Palatnik J F, et al. Identification of plant microRNA homologs [J]. Bioinformatics, 2006, 22(3): 359-360.
    [202] Lim L P, Lau N C, Weinstein E G, et al. The microRNAs of Caenorhabditis elegans [J]. Genes & Development, 2003, 17: 991-1008.
    [203] Lai E C, Tomancak P, Williams R W, et al. Computational identification of Drosophila microRNA genes [J]. Genome Biol, 2003, 4(7): R42.
    [204] Wang X, Zhang J, Li F, et al. MicroRNA identification based on sequence and structure alignment [J]. Bioinformatics, 2005, 21(18): 3610-3614.
    [205] Szafranski K, Megraw M, Reczko M, et al. Support vector machines for predicting microRNA hairpins[Z]. Las Vegas, Nevada, USA: CSREA Press, 2006.
    [206] Hertel J, Stadler P F. Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data [J]. Bioinformatics, 2006, 22(14): e197-e202.
    [207] Berezikov E, Guryev V, van de Belt J, et al. Phylogenetic shadowing and computational identification of human microRNA genes [J]. Cell, 2005, 120(1): 21-24.
    [208] Ng K L S, Mishra S K. De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures [J]. Bioinformatics, 2007, 23(11): 1321-1330.
    [209] Bentwich I, Avniel A, Karov Y, et al. Identification of hundreds of conserved and nonconserved human microRNAs [J]. Nature Genetics, 2005, 37(7): 766-770.
    [210] Xue C, Li F, He T, et al. Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine [J]. BMC Bioinformatics, 2005, 6: 310.
    [211] Jiang P, Wu H, Wang W, et al. MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features [J]. Nucleic Acids Research, 2007, 35: 339-344.
    [212] Sewer A, Paul N, Landgraf P, et al. Identification of clustered microRNAs using an ab initio prediction method [J]. BMC Bioinformatics, 2005, 6: 267.
    [213] Yousef M, Nebozhyn M, Shatkay H, et al. Combining multi-species genomic data for microRNA identification using a Naïve Bayes classifier [J]. Bioinformatics, 2006, 22: 1325-1334.
    [214] Nam J W, Shin K R, Han J, et al. Human microRNA prediction through a probabilistic co-learning model of sequence and structure [J]. Nucleic Acids Res, 2005, 33: 3570-3581.
    [215] Kadri S, Hinman V, Benos P V. HHMMiR: efficient de novo prediction of microRNAs using hierarchical hidden Markov models [J]. BMC Bioinformatics, 2009, 10(Suppl 1): S35.
    [216] Xu Y, Zhou X, Zhang W. MicroRNA prediction with a novel ranking algorithm based on random walks [J]. Bioinformatics, 2008, 24: 50-58.
    [217] Brameier M, Wiuf C. Ab initio identification of human microRNAs based on structure motifs [J]. BMC Bioinformatics, 2007, 8: 478.
    [218] Croce C M, Calin G A. miRNAs, cancer, and stem cell division [J]. Cell, 2005, 122: 6-7.
    [219] Zhang B H, Pan X P, Cox S B, et al. Evidence that miRNAs are different from other RNAs [J]. Cell Mol Life Sci, 2006, 63(2): 246-254.
    [220] Zheng Y, Hsu W, Lee M L, et al. Exploring Essential Attributes For Detecting MicroRNA Precursors From Background Sequences[C]. Springer, 2006:131-145.
    [221] Sewer A, Paul N, Landgraf P, et al. Identification of clustered microRNAs using an ab initio prediction method [J]. BMC Bioinformatics, 2005, 6: 267.
    [222] Freyhult E, Gardner P P, Moulton V. A comparison of RNA folding measures [J]. BMC Bioinformatics, 2005, 6: 241.
    [223] Moulton V, Zuker M, Steel M, et al. Metrics on RNA secondary structures [J]. J Comput Biol, 2000, 7(1-2): 277-292.
    [224] Seffens W, Digby D. mRNAs have greater negative folding free energies than shuffled or codon choice randomized sequences [J]. Nucleic Acids Res, 1999, 27: 1578-1584.
    [225] Zhang B H, Pan X P, Cox S B, et al. Evidence that miRNAs are different from other RNAs [J]. Cell Mol Life Sci, 2006, 63(2): 246-254.
    [226] Fera D, Kim N, Shiffeldrim N, et al. RAG: RNA-As-Graphs web resource [J]. BMC Bioinformatics, 2004.
    [227] Gan H H, Fera D, Zorn J, et al. RAG: RNA-As-Graphs database--concepts, analysis, and features [J]. Bioinformatics, 2004, 20(8): 1285-1291.
    [228] Ng K L S, Mishra S K. De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures [J]. Bioinformatics, 2007, 23: 1321-1330.
    [229] Ng K L S, Mishra S K. Unique folding of precursor microRNAs: Quantitative evidence and implications for de novo identification [J]. RNA, 2007, 13: 170-187.
    [230] Bonnet E, Wuyts J, Rouze P, et al. Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences [J]. Bioinformatics, 2004, 20(17): 2911-2917.
    [231] Griffiths-Jones S. The microRNA registry [J]. Nucleic Acids Res, 2004, 32(Database): 109-111.
    [232] Griffiths-Jones S. Annotating noncoding RNA genes [J]. Annu Rev Genomics Hum Genet, 2007, 8: 279-298.
    [233] Guyon I, Elisseeff A. An introduction to variable and feature selection [J]. Journal of Machine Learning Research, 2003, 3: 1157-1182.
    [234] Kailath T. The divergence and Bhattacharyya distance measures in signal selection [J]. IEEE Transactions on Communications Technology, 1967, COM-15: 52-60.
    [235] Devijver P A, Kittler J. Pattern Recognition, A Statistical Approach[M]. London: Prentice Hall, 1982.
    [236] Weiss G M. Mining with rarity: a unifying framework [J]. SIGKDD Expl, 2004, 6: 7-19.
    [237] Chawla N V. SMOTE: synthetic minority over-sampling technique [J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
    [238] Molinara M. Facing imbalance classes through aggregation of classifiers[Z]. Italy, 2007:43-48.
    [239] Veropoulos K. Controlling the sensitivity of support vector machines[C]. Sweden, 1999:55-60.
    [240] Imam T. z-SVM: An SVM for improved classification of imbalanced data[C]. Australia, 2006:264-273.
