基于混沌模拟退火的RNA二级结构预测的研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
RNA是生物遗传信息的中间载体,参与蛋白质合成,在细胞分化凋亡、生物发育、疾病发生等方面起着重要作用。RNA二级结构是由碱基配对与核苷酸链折叠而成的茎环空间结构,其茎环结构不仅可用于RNA功能分析,还可用于RNA三级结构预测。因此,RNA二级结构预测具有重要意义。RNA二级结构可通过物理实验测定,但耗时长且成本高。借助生物信息学方法预测RNA二级结构,可加速认识RNA分子空间结构及其生物学功能。
     本论文探讨基于混沌模拟退火的RNA二级结构预测问题,主要工作包括:
     1、阐述RNA二级结构的表达方式和形式化表示;
     2、描述RNA二级结构预测的现有主要方法并总结各方法特点;
     3、基于混沌映射的随机性,遍历性特点和模拟退火的优化能力,给出一个基于混沌模拟退火的RNA二级结构预测算法,通过控制混沌系统的轨道密度调节RNA序列上发生折叠的位置,以冷却进度表控制模拟退火过程,采用最小自由能作为目标函数以预测RNA二级结构。
     对RNA序列Asellus aquaticus, Haloarcula marismortui, Saccharomyces cerevisiae进行的仿真实验分别取得了69.12%、55.26%和89.19%的碱基对正确率,表明了基于幂函数载波的混沌退火算法应用于RNA二级结构预测的可行性。选择更长的RNA序列试验、进一步研究幂函数载波方法对预测结果的影响是下一步的研究方向。
RNA is the intermediate carrier of genetic information, participates in the synthesis of proteins, plays an important role in apects of cell differentiation and apoptosis, biological development, disease triggering etc. RNA secondary structure is a kind of stem-loop space structure which forms by base pairing and nucleotide chain self-folding. The stem-loop structures in RNA secondary structure not only on the analysis of RNA function but also can be used to predict RNA tertiary structure, therefore, RNA secondary structure prediction is a significant work. RNA secondary structure can be determined by the method of physical experiments, but it is time-consuming and costly. So, researchers predict RNA secondary structure with bioinformatics methods to accelerate understanding of the spatial structure of RNA molecules and their biological functions.
     This thesis discusses simulated annealing-based RNA secondary structure prediction problem, including:
     1. Elaborates the expression method of RNA secondary structure and formal representation;
     2. Describes the existing RNA secondary structure prediction methods and summarizes their characteristics;
     3. Based on the stochastic, ergodicity characteristics of chaos mapping and the optimal capacity of simulated annealing, this thesis proposed an algorithm based on chaotic simulated annealing algorithm through the technical of controling chaotic systems track density to adjust the position where folding occurs, using cooling schedule to regulate the annealing process, taking RNA secondary structure free energy as the objective function to predict RNA secondary structure.
     The simulation obtains 69.12%, 55.26% 89.19% base pair correct rate for predicting RNA sequence Aquaticus Asellus , Haloarcula Saccharomyces and Cerevisiae Marismortui respectively. It shows the feasibility of appling the CSA algorithm based on power function carrier to predict RNA secondary structure. Further work will be done in apects of testing for longer RNA sequence and studing of the affects of power function carrier to the base pair correct rate.
引文
[1] Pal, S., S. Bandyopadhyay and S. Ray, Evolutionary computation in bioinformatics: A Review. IEEE transactions on systems man and cybernetics—part c: Applications and reviews, 2006, 36(5): 601-615
    [2] Smith, Temple F., Waterman, Michael S. Identification of common molecular subsequences, Journal of molecular biology, 1981, 147(1): 195–197
    [3] A. P. Gultyaev,V. Batenburg, C.W. A. Pleij, The computer simulation of RNA folding pathways using an genetic algorithm, J. Mol. Biol., 1995, 250(1):37–51
    [4] Yu, U., et al. Bioinformatics in the Post-genome Era, Journal of biochemistry and molecular biology, 2004, 37(1): 75-82
    [5] Larsen, N., C. Zwieb, SRP-RNA sequence alignment and secondary structure, Nucleic acids research, 1991, 19(2):209-215
    [6] Jr, T.I., U. OC and L. MD, Estimation of secondary structure in ribonucleic acids, Nature, 1971. 230(5293):362-367
    [7] Michael, Z. and S. Patrick, Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information, Nucleic acids research, 1981, 9(1): 133-148
    [8]刘海军.RNA二级结构预测的的建模及其应用研究:[博士学位论文].上海:上海大学,2005
    [9] Waterman, M. and T. Smith, Rapid dynamic programming algorithms for RNA secondary structure. Advances in applied mathematics, 1986, 7(4): 455-464
    [10] M, S. and S. G, Description of RNA folding by simulated annealing. Journal of molecular biology, 1996, 255(1): 254-266
    [11]任清华,莫中息,陶玉敏.预测RNA二级结构的一种遗传模拟退火算法.武汉大学学报(理学版), 2004,50(1):23-28
    [12]邹权,郭茂祖,张涛涛, RNA二级结构预测方法综述.电子学报, 2008, 36(2): 331-337
    [13] Knudsen, B. and J. Hein, Pfold: RNA secondary structure prediction using stochastic context-free grammars, Nucleic acids research, 2003, 31(13): 3423-3428
    [14] Hofacker, I.L., Vienna RNA secondary structure server, Nucleic acids research, 2003, 31(13): 3429-3431
    [15] Mathews, D.H. and D.H. Turner, Dynalign: an algorithm for finding the secondary structure common to two RNA sequences, Journal of molecular biology, 2002, 317(2): 191-203
    [16] Siebert, S. and R. Backofen, MARNA multiple alignment and consensus structure predictionMARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons, Bioinformatics, 2005, 21: 3352-3359
    [17] Baldi, P., et al., Assessing the accuracy of prediction algorithms for classification: an overview, Bioinformatics, 2000, 16(5):412-424
    [18] Hamada, M., et al., Prediction of RNA secondary structure using generalized centroid estimators. Bioinformatics, 2009, 25(4): 465-473
    [19] Smith, S.F., Covariance-Model-Based RNA gene finding: using dynamic programming versus evolutionary computing, Computational intelligence in bioinformatics, 2008, 94(7):183-208
    [20] Knudsen, B. and J. Hein, RNA secondary structure prediction using stochastic context-free grammars and evolutionary history, Bioinformatics, 1999, 15(6): 446-454
    [21] Y, Y.S., et al., Stochastic context-free grammars for tRNA modeling. Nucleic acids research, 1994, 22(23): 5112-5120
    [22] Richard, D. and E. Sean, Biological sequence analysis: Probabilistic models of proteins and nucleic acids, Cambridge: Cambridge University Press, 1998
    [23] Eddy, S.R., What is a hidden Markov model?, Nature biotechnology, 2004, 22(10):1315 - 1316
    [24] SR, E. and D. R, RNA sequence analysis using covariance models, Nucleic acids research, 1994. 22(11): 2079-2088
    [25] Nussinov, R., et al., Algorithms for Loop Matchings, SIAM journal on applied mathematics, 1978. 35(1): 68-82
    [26] NUSSINOV, R. and A.B. JACOBSON, Fast algorithm for predicting the secondary structure of fast algorithm for predicting the secondary structure of single-stranded RNA, Proceedings of the national academy of sciences, 1980. 77(11):6309-6313
    [27] Hunt, J.W. and T.G. Szymanski, A fast algorithm for computing longest common subsequences. Communications of the ACM, 1977, 20(5):350 - 353
    [28] Hirschberg, D.S., Algorithms for the Longest Common Subsequence Problem. Journal of the ACM, 1977, 24(4):664 - 675
    [29] Wagner, R.A. and M.J. Fischer, The string-to-string correction problem. Journal of the ACM, 1974, 21(1):168 - 173
    [30] Rivasa, E. and S.R. Eddya, A dynamic programming algorithm for RNA structure predictionincluding pseudoknots. Journal of molecular biology, 1999, 285(5): 2053-2068
    [31] Kirkpatrick, S., Optimization by simulated annealing: Quantitative studies. Journal of statistical physics, 1983, 34(5-6):975-986
    [32]康立山,谢云,尤矢勇等.非数值并行算法—模拟退火算法.北京:科学出版社,1994
    [33] Mitchell, M., An introduction to genetic algorithms. 1996: MIT Press Cambridge, MA, USA
    [34] B., N., L. ERIC and WESTHOF, Geometric nomenclature and classification of RNA base pairs, RNA, 2001. 7(4):499-512
    [35] Eddy, S.R., Noncoding RNA genes, Current opinion in genetics & development, 1999. 9(6):695-699
    [36] Mattick, J.S. and I.V. Makunin, Non-coding RNA. Human molecular genetics, 2006, 15(1): R17-R29
    [37] Lari, K. and S. Young, The estimation of stochastic context-free grammars using the Inside-Outside algorithm. Computer speech and language, 1990, 4:35-56
    [38] J V Maizel, J. and R.P. Lenk, Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc. Natt Acad. Sci., 1981. 78(12):7665-7669
    [39] Jessica S Reuter, David H Mathews, RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics, 2010, 11:129
    [40] DH, M., et al., Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure, Journal of molecular biology, 1999, 288(5): 911-940
    [41] Pan, H., L. Wang and B. Liu, Chaotic annealing with hypothesis test for function optimization in noisy environments, Chaos, Solitons & Fractals, 2008. 35(5):888-894
    [42] Tang, W., Chaotic optimization method based on power function carrier and its application. Control and Decision, 2005, 20(9):1043-1046
    [43]胡桂武,彭宏.基于免疫粒子群集成的RNA二级结构预测算法.计算机工程与应用,2007.43(3):26-29
    [44] Yanqiu Che, Zheng Tang, and Shaozhi Liu, An Efficient Algorithm Based on Hopfield Neural Network for RNA Secondary Structure Prediction, International journal of computer science and network security, 2007, 7(5):49-54
    [45] Quan Zou, Tuo Zhao, Yang Liu etc., Predicting RNA secondary structure based on the class information and hopfield network. Computers in biology and medicine, 2009, 39(3):206-214

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700