Research on Several Computational Biology Problems Concerning RNA Secondary Structure
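Abstract
Research on the structure and function of RNA molecules is a very important topic in bioinformatics today. As research has deepened, RNA has evolved in our view from a simple, linear, single-function molecule into a diverse class of molecules with complex structures and specific functions. In particular, the discovery of large numbers of non-coding RNAs and the analysis of their functional properties have gradually revealed the diversity and importance of the RNA world. RNA is no longer regarded merely as an information carrier between DNA and protein; it is coming to hold a place in the central dogma as important as that of DNA and protein. RNA-omics (Rnomics), or ribonomics (Ribonomics), has likewise emerged as a new interdisciplinary field of systems biology, following genomics (Genomics), proteomics (Proteomics), and related concepts.
     The study of RNA structure and function depends not only on experimental methods but also on bioinformatics analysis. RNA research has now entered an era of large-scale, high-throughput, systematic analysis, so developing bioinformatics methods and techniques for RNA is essential to exploring the relationship between RNA structure and function and to understanding how RNA works in living systems. Against this background, this thesis studies computational biology problems related to RNA molecular structure from the perspective of algorithm design and software platform construction. It covers the representation of RNA structure, RNA structure prediction, RNA structure comparison, the compression of RNA structure and the measurement of its information content, the construction of an integrated RNA analysis platform, and non-coding RNA gene prediction. The work lies at the intersection of computer science, biology, and medicine.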
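     The main contents and results of the thesis are summarized as follows:
     (1). The theory of RNA structure representation is analyzed systematically and the various representation methods are compared. A representation of RNA secondary structure based on 6-D encoding is proposed: the secondary structure of an RNA conformation is converted into a structure matrix, and the vector of singular values of that matrix is extracted as its main structural feature, giving a precise description of the molecular structure from the standpoint of matrix algebra.
     (2). Prediction algorithms for RNA secondary structure are reviewed systematically. Building on the graph-theoretic notion of a maximal independent set, an effective algorithm that performs the prediction in parallel with a Hopfield network is proposed, further improving the efficiency of RNA secondary structure prediction.
     (3). Similarity measures for RNA secondary structures are discussed and compared, and a structure comparison algorithm that applies singular value decomposition to the 6-D encoding is designed.
     (4). Based on this similarity measure, a new kernel fuzzy c-means (Kernel Fuzzy C-means, KFCM) clustering algorithm is proposed and applied to the clustering of RNA secondary structure conformations. The results show that the algorithm is very effective for RNA conformational analysis.
     (5). An integrated software platform for RNA structure comparison and conformational clustering (RNACluster) is built. A clustering algorithm based on a minimum spanning tree (Minimum Spanning Tree, MST) representation is applied to RNA conformational clustering, RNA conformational switches, and non-coding RNA prediction.
     (6). The concept of RNA secondary structure compression is introduced for the first time. An algorithm that compresses RNA secondary structures using a context-free grammar is designed and implemented in the software RNACompress, which models the RNA primary sequence and secondary structure together and compresses them losslessly.
     (7). Compression-based Kolmogorov complexity is introduced for the first time to measure the information content of RNA structures. It is applied to 11 GTP-binding RNA aptamers to measure their structural information content and to study quantitatively the relationship between binding activity and structural information complexity.
     (8). The concepts of non-coding RNA and the algorithms for predicting non-coding RNA genes are reviewed. Online platforms and database resources for non-coding RNA are summarized systematically, and the author's views on future research directions and open topics in the computational biology of non-coding RNA are presented.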
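Item (5) describes clustering RNA conformations through a minimum spanning tree. The following is a minimal sketch of that general idea, not RNACluster's actual implementation. It assumes structures are written in well-formed dot-bracket notation (the abstract does not say which notation the platform uses), uses the base-pair distance as a stand-in metric, and forms clusters by dropping the heaviest tree edges. All function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def bp_distance(a, b):
    """Base-pair distance: number of pairs found in exactly one structure."""
    return len(base_pairs(a) ^ base_pairs(b))


def minimum_spanning_tree(items, dist):
    """Prim's algorithm on the complete graph; returns edges (weight, u, v)."""
    in_tree = {0}
    edges = []
    while len(in_tree) < len(items):
        # Cheapest edge leaving the current tree.
        w, u, v = min(
            (dist(items[u], items[v]), u, v)
            for u in in_tree
            for v in range(len(items))
            if v not in in_tree
        )
        in_tree.add(v)
        edges.append((w, u, v))
    return edges


def mst_clusters(items, dist, k):
    """Split the MST into k clusters by dropping its k - 1 heaviest edges."""
    edges = sorted(minimum_spanning_tree(items, dist))
    kept = edges[: len(items) - k]  # the MST has n - 1 edges; keep n - k
    parent = list(range(len(items)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in kept:
        parent[find(u)] = find(v)
    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


structures = [
    "((((....))))",  # single hairpin
    "(((......)))",  # same hairpin, one pair fewer
    "((..))((..))",  # two small hairpins
    "((..))(....)",  # two small hairpins, one pair fewer
]
print(mst_clusters(structures, bp_distance, k=2))
```

In this toy input the two hairpin variants differ from each other by one base pair and from the two-hairpin variants by six or more, so the heaviest tree edge is the one joining the two families. Prim's algorithm as written is cubic in the number of structures, which is acceptable for a sketch but not for large ensembles.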
Computational study of RNA structure and function is one of the hottest research topics in bioinformatics. As more and more non-coding RNAs have been identified, RNA has come to be viewed as a complex molecule with diverse functions rather than merely a messenger between DNA and protein. The emerging view is that the RNA world is much more complicated than previously thought, and that RNA may play as important a role in the central dogma as DNA and protein do. In addition, "Rnomics" and "Ribonomics" have become two new interdisciplinary fields, following the introduction of "Genomics" and "Proteomics".
     Computational techniques are an extremely helpful complement to the experimental study of RNA structure and function, especially when that study is large-scale, high-throughput, and systematic. To facilitate it, more efficient bioinformatic analysis approaches are needed. In this thesis, we carry out basic computational studies of RNA secondary structure and its function from the perspective of algorithm design and platform implementation. The thesis covers the representation of RNA secondary structure; RNA secondary structure prediction; comparison of RNA secondary structures; compression of RNA secondary structure and measurement of its informational complexity; construction of an integrated platform for RNA structure analysis; and non-coding RNA prediction.
     The main contributions of this thesis are as follows:
     (1). Different representations of RNA secondary structure are compared, and a representation method based on 6-D encoding is presented. The RNA secondary structure is represented as a structure matrix, and the singular value vector of this matrix is computed to extract the main information of the structure. This representation gives a mathematically precise description of RNA secondary structure.
     (2). A thorough discussion of RNA secondary structure prediction approaches is given. Based on the Discrete Hopfield Neural Network (DHNN) and the Maximal Independent Set (MIS) from graph theory, a heuristic algorithm is presented that selects stems in an RNA structure, and it is applied to RNA secondary structure prediction. This method substantially improves the efficiency of RNA secondary structure prediction.
     (3). Different RNA secondary structure comparison methods are discussed. A novel method is presented that measures the similarity of RNA secondary structures using a matrix representation of the structures. Relevant features of the structures can be extracted easily through singular value decomposition (SVD) of the representing matrices.
     (4). A fuzzy kernel clustering method, using the similarity metric defined above, is applied to cluster RNA secondary structure ensembles. The results suggest that this method is highly promising for classifying RNA structure ensembles because of its low computational complexity and high clustering accuracy.
     (5). An integrated platform (RNACluster) is constructed to calculate and compare different distances between RNA secondary structures, and to identify clusters that yield useful information about RNA structure ensembles, using a minimum spanning tree (MST) based clustering algorithm. RNACluster can be used for clustering RNA structure ensembles, analyzing RNA conformational switches, and predicting non-coding RNA.
     (6). A universal algorithm for the compression of RNA secondary structure is presented (RNACompress). RNACompress employs an efficient context-free-grammar-based model to compress RNA sequences and their secondary structures simultaneously.
     (7). A novel measure of the informational complexity of RNA secondary structure is presented, based on the notion of Kolmogorov complexity. A comparison of the activities of 11 distinct GTP-binding RNAs (aptamers) with their structural complexity shows that the informational complexity defined here can be used to describe how complexity varies with activity.
     (8). A survey of the concept of non-coding RNA and its computational prediction is given. A list of web resources for non-coding RNA analysis is compiled. Several future research directions and topics in this field are discussed.
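Item (7) relies on the fact that Kolmogorov complexity is not computable but can be bounded from above by the output length of a real compressor. A minimal sketch of that estimate follows. It uses the general-purpose `zlib` compressor as a stand-in, which is an assumption: the thesis uses its own grammar-based RNACompress, whose interface is not described here. The function names and the example sequences are hypothetical.

```python
import random
import zlib


def compressed_size(text):
    """Upper bound on the complexity of `text`: its zlib-compressed size in bytes."""
    return len(zlib.compress(text.encode("ascii"), 9))


def complexity_ratio(text):
    """Compressed bytes per input character; lower means more regular."""
    return compressed_size(text) / len(text)


# A highly regular sequence versus a pseudo-random one of the same length.
regular = "GC" * 200
rng = random.Random(0)
irregular = "".join(rng.choice("ACGU") for _ in range(400))

print(complexity_ratio(regular), complexity_ratio(irregular))
```

The same function can be applied to a secondary structure string, or to a sequence and its structure concatenated, to compare molecules by estimated information content. The absolute numbers depend on the compressor, so only comparisons made with the same compressor are meaningful.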
