含伪结的RNA二级结构预测算法的研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
RNA二级结构预测是近年来RNA研究的热点问题,研究者提出了多种预测算法,并取得了丰富的成果。但对于含伪结的二级结构,现有算法无法得到很好的预测结果:要么无法预测,要么算法复杂度过高、预测精度比较低。本文针对这一问题,分别从建立更可靠的协变信息计算模型和提出更有效的启发式算法两个方面对同源RNA序列的公共结构预测算法进行研究,在保持较低复杂度的条件下,本文提出的算法提高了含伪结RNA二级结构的预测精度,并且可预测出多个次优结构。具体来讲,本文主要包括以下两部分的内容和结果:
     一、提出了一种基于堆积协变信息和最小自由能的同源RNA含伪结二级结构迭代预测算法。该算法在Ifold算法基础上,考虑相邻碱基对之间的相互作用对协变信息的影响,引入堆积协变信息计算模型,并结合最小自由能通过逐步迭代求得含伪结的RNA二级结构。数值实验表明,该算法能正确预测伪结,其平均敏感性和特异性优于现有算法,并且在堆积协变信息的权重因子比值为5:1时,预测性能达到最优。
     二、提出了一种预测同源RNA多个次优公共结构的种子集合扩展算法。该算法将HotKnots算法的种子集合扩展方法应用于同源RNA公共结构预测,组合多个种子形成种子集合并扩展得到多个次优公共结构。数值实验表明,该算法预测得到的多个次优结构比较接近真实结构,并可从中分析出稳定的子结构。其平均敏感性和特异性要优于其它现有算法,其时间复杂度高于迭代算法,而远低于动态规划算法。
The prediction of RNA secondary structure is a hotspot in RNA research. Many secondary structure prediction methods have been presented and rich results have been achieved. But most of these methods cann't properly deal well with pseudoknots prediction, either of relatively high complexity or of low accuracy. On this issue, this thesis studies a more reliable covariance model and a more effective heuristic algorithm for the consensus structure prediction of homology RNA sequences. With the relatively low complexity, the algorithms presented in this thesis improve the accuracy of pseudoknots prediction and provide multiple suboptimal structures. The thesis mainly includes two sections of following contents and conclusions.
     (1) Based on stacking covariance and minimum free energy, the thesis presents an iteration algorithm to predict the consensus structure with pseudoknots of homology RNA sequences. With emphasis on the impact of neighbour base pairs on covariance, the algorithm introduces a model of stacking covariance into Ifold and combines with minimum free energy to assess RNA secondary structure with pseudoknots though iterations. The numerical test shows that this algorithm can correctly predict pseudoknots, with the mean sensitivity and specificity better than that of other algorithms. The performance of the algorithm achieves the best result when the factor of stacking covariance is 5:1.
     (2) A seed set expansion algorithm for predicting multiple suboptimal consensus structures of homology RNA sequences is presented. The algorithm applies seed set expansion algorithm derived from HotKnots to predict consensus structure of homology RNA sequences. The seed sets are made up of seeds and expanded to multiple suboptimal consensus structures. The numerical test shows that multiple suboptimal structures are all close to the reference structures and some stable substructures can be achieved. The mean sensitivity and specificity of the algorithm are better than that of the reference algorithms and the time complexity is more than that of iteration algorithms and far less than that of dynamic programming algorithms.
引文
[1]Sean R.E.,Computational genomics of noncoding RNA genes[J],Cell,2002,109(2):137-140
    [2]Gisela S.,An expanding universe of noncoding RNAs[J],Science,2002,296(5571):260-1263
    [3]孙强,黄红艳,韩骅,ncRNA候选基因sp1的克隆与初步分析[J],遗传学报,2004,31(5):485-488
    [4]刘次全,白春礼,张静,结构分子生物学[M],北京:高等教育出版社,1997
    [5]Dan E.Krane and Michael L.Raymer著,孙啸,陆祖宏,谢建明等译,生物信息学概论[M],北京:清华大学出版社,2004
    [6]Bruce A Shapiro,Yaroslava G Yingling,Wojciech Kasprzak and Eckart Bindewald,Bridging the gap in RNA structure prediction[J],Current Opinion in Structural Biology,2007,17(2):157-165
    [7]付微,关于RNA二级结构的研究[D],武汉大学硕士学位论文,2004
    [8]刘海军,RNA二级结构预测的建模及其应用研究[D],上海大学博士学位论文,2005
    [9]Caroline Thurner,Conserved and Consensus RNA Structures[D],Doctor's Thesis for University of Vienna,2004
    [10]Zuker M.and Stiegler P.,Optimal computer folding of larger RNA sequences using thermodynamics and auxiliary information[J],Nucleic Acids Research.,1981,9:133-148
    [11]Zuker M.,Computer prediction of RNA structure[J],Methods in Enzymology,1989,180:262-28
    [12]Zuker M.,Calculating nucleic acid secondary structure[J],Curr.Opin.Struct.Biol.,2000,10:303-310
    [13]I.L.Hofacker,W.Fontana,P.F.Stadler,S.Bonhoeffer,M.Tacker and P.Schuster,Fast Folding and Comparison of RNA Secondary Structures[J],Monatsheftef.Chemie,1994,125:167-188
    [14]Susan M.Freier,Ryszard Kierzek,John A.Jaeger,Naoki Sugimoto,Marvin H.Caruthers,Thomas Neilson,and Douglas H.Tumer,Improved free-energy parameters for predictions of RNA duplex stability[J],Proc.Natl.Acad.Sci.USA,1986,83(24):9373-9377
    [15]Mathews D.H.,Sabina J.,Zuker M.and Turner D.H.,Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secodary Structure[J],J.Mol.Biol.,1999,288:911
    [16]Eddy SR.,A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure[J],BMC Bioinformatics,2002,3:18
    [17]Durbin R,Eddy S,Krogh A and Mitchison G.,Biological Sequence Analysis:Probabilistic Models of Proteins and Nucleic Acids[M],Cambridge:Cambridge University Press,1998
    [18]Hofacker,I.L.,Fekete,M.and Stadler,P.F.,Secondary structure prediction for aligned RNA sequences[J],J.Mol.Biol.,2002,318:1059-1066
    [19]Knudsen B.and Hein J.,RNAsecondary structure prediction using stochastic context-free grammars and evolutionary history[J],Bioinformatics,1999,15:446-454
    [20]Knudsen B.and Hein J.,Pfold:RNA secondary structure prediction using stochastic context-free grammars[J],Nucleic Acids Res,2003,31:3423-3428
    [21]McCaskill J.S.,The equilibrium partition function and base pair binding probabilities for RNA secondary structure[J],Biopolymers,1990,29:1105-1119
    [22]Rune B.Lyngsφ and Christian N.S.,Pedersen.RNA Pseudoknot Prediction in Energy BasedModels[J],J.Comput.Biol,2000,7..409
    [23]Rivas E.,Eddy S.R.,A Dynamic Programming Algorithm for RNA Structure Prediction Including Pseudoknots[J],J.Mol.Biol.,1999,285:2053-2068
    [24]Rivas E and Eddy SR.,The language of RNA:a formal grammar that included pseudoknots[J].Bioinformatics,2000,16:334
    [25]Lyngsφ RB,Christian NS,Pseudoknots in RNA Secondary Structure[C],Proceedings of Recomb,Tokyo Japan USA,2000
    [26]李恒武,朱大铭,RNA二级结构预测算法[J],计算机科学,2002,29(z1):275-278
    [27]Ruan J.,Stormo G.D.and Zhang W.,An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots[J],Bioinformatics,2004,20:58-66
    [28]王金华,骆志刚,管乃洋等,基于最小自由能和协变信息预测带伪结RNA二级结构的迭代化方法[J],遗传,2007,29(7):889-897
    [29]Ren J.,B.Rastegari,et al,HotKnots:heuristic prediction of RNA secondary structures including pseudoknots[J], RNA, 2005, 11(10): 1494-504
    
    [30] Van Bantenburg F.H.D., Gultyaev AT, et al. An APL-programmed Genetic Algorithm for the Prediction of RNA Secondary Structure[J], J. Theor Biol, 1995,174: 269-28
    
    [31] Gultyaev A.P., Van Batenburg F.H.D., et al. The Computer Simulation of RNA Folding Pathways Usinga Genetic Algorithm [J], J. Mol. Biol., 1995, 250:37-51
    
    [32] Steeg E.W., Neural networks: adaptive optimization and RNA secondary structure prediction[C], In Artiftcal intelligence and molecular biology, Hunter L.Ed. Menlo Park: AAAI Press, 1995: 121-160
    
    [33] Bindewald E. and Shapiro B.A., RNA secondary structure prediction from sequence alignments using a network of k-nearest neighbor classifiers[J], RNA,2006, 12(3): 342-352
    
    [34] Tao Jiang, Ying Xu and Michael Q. Zhang, Current Topics in Computational Molecular Biology[M], London: The MIT Press, 2002
    
    [35] Pleij C.W.A., Pseudoknots: a new motif in the RNA game[J], Trends Biochem. Sci., 1990, 15: 143-157
    
    [36] Shapiro B.A. and Wu J., Prediction RNA H-type pseudoknots with the massively parallel genetic algorithm[J], CABIOS, 1997, 13: 459-47
    
    [37] ten Dam E., Pleij C.W.A., Draper D., Structural and functional aspects of RNA pseudoknots[J], Biochemistry, 1992, 31: 11665-11676
    
    [38] Zuker M., Sankof D., RNA secondary structures and their prediction[J],Bulletin of Mathematical Biology, 1984, 46: 591-621
    
    [39] Waterman M. and Smith T., Rapid dynamic programming algorithms for RNA secondary structure[J], Advances in Applied Mathematics, 1986, 7: 455-464
    
    [40] Lyngφ R., Zuker M. and Pedersen C, Fast evaluation of internal loops in RNA secondary structure prediction[J], Bioinformatics, 1999, 15(6): 440-445
    
    [41] Chiu D.K. and Kolodziejczak T., Inferring consensus structure from nucleic acid sequences[J], CABIOS, 1991, 7: 347-352
    
    [42] Gutell R.R., Power A., Hertz G.Z., Putz E.J. and Stormo G.D., Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods[J], Nucl. Acids Res., 1992,20: 5785-5795
    
    [43] Gulko Brad, Using Phylogenetic Markov Trees to Detect Conserved Structure in RNA Multiple Alignments[D], Master's Thesis for the University of California at Santa Cruzboard of Computer Engineering, 1995
    
    [44] Akmaev V.R., Kelley S.T., Stormo G.D., Phylogenetically enhanced statistical tools for RNA structure prediction[J], Bioinformatics, 2000, 16:501-512
    
    [45] Chiu D.K., Kolodziejczak T., Inferring consensus structure from nucleic acid sequences[J], CABIOS, 1991, 7: 347-352
    
    [46] Gutell R.R., Power A., Hertz G.Z., Putz E.J., Stormo G.D., Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods[J], Nucl. Acids Res., 1992,20: 5785-5795
    
    [47] Gutell R.R. and Woese C.R., Higher order structural elements in ribosomal RNAs: Pseudoknots and the use of noncanonical pairs[J], Proc. Natl. Acad. Sci.,USA, 1990, 87: 663-667
    
    [48] Akmaev V.R., Kelley S.T. and Stormo G.D., Phylogenetically enhanced statistical tools for RNA structure prediction[J], Bioinformatics, 2000, 16:501-512
    
    [49] R Luck, S Graf and G Steger., Construct: a tool for thermodynamic controlled prediction of conserved secondary structure[J], Nucleic Acids Research,1999, 27(21): 4208-4217
    
    [50] Smith T.F. and Waterman M.S., Identification of Common Molecular Subsequces[J].J.Mol.Biol., 1981, 147: 195-197
    
    [51] Andronescu M., Zhang Z.C. and Condon A., Secondary structure prediction of interacting RNA molecules[J], J.Mol.Biol., 2005, 345: 987-1001
    
    [52] Borer PN, Dengler B., Tinoco I. Jr and Uhlenbeck OC, Stability of ribonucleic acid double-stranded helices[J], J. Mol.Evol., 1974, 86: 843-853
    
    [53] Onoa B. and Tinoco I. Jr., RNA folding and unfolding[J], Curr. Opin. Struct.Biol., 2004, 14: 374-379
    
    [54] S. Lindgreen, P. P. Gardner and A. Krogh, Measuring covariation in RNA alignments: physical realism improves information measures[J], Bioinformatics,2006, 22(24): 2988-2995
    
    [55] Christina Witwer, Ivo L. Hofacker and Peter F. Stadler., Prediction of consensus RNA secondary structures including pseudoknots[J] , IEEE/ACM Transaction on Computational Biology and Bioinformatics, 2004, 1(2): 66—77
    
    [56] S. Griffiths-Jones, A. Bateman, M. Marshall, A. Khann and S.R. Eddy,Rfam: an RNA family database [J], Nucl. Acids Res., 2003, 31: 439-441, http://www.sanger,ac.uk/Software/Rfam/
    [57]J.Gorodkin,B.Knudsen,C.Zwieb and T.Samuelsson,SRPDB:signal recognition particle database[J],Nucl.Acids Res.,2001,29(1):169-170,http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html
    [58]B.Knudsen,J.Wower,C.Zwieb and J.Gorodkin,tmRDB(tmRNA database)[J],Nucl.Acids Res.,2001,29(1):171-172,http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html
    [59]J.W.Brown,The ribonuclease P database[J],Nucl.Acids Res.,1999,27(1):314,http://www.mbio.ncsu.edu/RNaseP/home.html
    [60]P.Baldi,S.Brunak,Y.Chauvin,C.Andersen and H.Nielsen,Assessing the accuracy of prediction algorithms for classification:an overview[J],Bioinformatics,2000,16:412-424
    [61]Peter De Rijk,Jan Wuyts and Rupert De Wachter,RnaViz2:an improved representation of RNA secondary structure[J],Bioinformatics,2003,19(2):299-300
    [62]Stefan Wuchty,Walter Fontana,Ivo L.Hofacker and Peter Schuster,Complete suboptimal folding of RNA and the stability of secondary structure[J],Biopolymers,1999,49(2):145-165
    [63]严蔚敏,吴伟民,数据结构(C语言版)[M],北京:清华大学出版社,2002
    [64]Serra M.J.,Turner D.H.and Freier S.M.,Predicting thermodynamic properties of RNA[J],Meth.Enzymol.,1995,259:243-261
    [65]Dirks R.M.and Pierce N.A.,A partition function algorithm for nucleic acid secondary structure including pseudoknots[J],J.Comput.Chem.,2003,24:1664-1677
    [66]张法,乔香珍,刘志勇,基于Smith-Waterman算法的并行分而治之生物序列比对算法[J],中国科学E辑(技术科学),2004,34(2):190-199
    [67]曹新谱,算法设计与分析[M],湖南长沙:国防科学技术大学,1983
    [68]王金华,RNA二级结构预测的研究[D],国防科学技术大学硕士学位论文,2007

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700