用户名: 密码: 验证码:
基于遗传禁忌算法的蛋白质三维折叠结构预测
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
蛋白质折叠成特有的空间结构后,行使各种各样的生物活性,对其折叠结构进行研究,是生物信息学的关键问题之一。基于蛋白质天然构象处于其能量最低态的热力学假说,研究者们正采用从头预测法从氨基酸残基序列直接预测其空间结构,遇到了两大难点:其一,需要一个能区分蛋白质天然态和非天然态的势能函数;其二,需要一个有效的全局优化方法,搜索其最低自由能状态。
     本文将能反映蛋白质真实特性的三维AB非格模型作为简化模型,结合遗传算法和禁忌算法,形成了一种新的混合优化方法——遗传禁忌算法,并提出了五个改进策略:染色体编码、可变群体规模、排序选择、变异算子TSM和交叉算子TSR,用以增强算法的全局优化能力和局部搜索力度。
     本文将遗传禁忌算法应用于两类数据的三维折叠结构预测:被广泛使用的4条斐波纳契序列和从PDB获取的9条真实蛋白质序列,均能在保持较高精度的情况下,快速搜索到最低自由能和最低能量构象。实验结果表明,本文搜索到的最低能量值都优于同类其它算法得到的值,且随着序列长度的增加,遗传禁忌算法对解的改善程度越高,此外,本文的最低能量构象均形成了紧凑的疏水核,亲水残基包围在外面,与蛋白质的真实特征保持一致。这些都说明遗传禁忌算法为蛋白质三维折叠结构预测提供了一个有效的高性能的全局搜索方法。
While protein folds into its specific spatial structure, it performs a wide variety of bioactivities, thus the study of protein folding structure is a key issue in bioinformatics. Based on the thermodynamic hypothesis that the native conformation of protein is the one at lowest free energy state, researchers are using the ab inito prediction method to predict stereo structure from its amino acid sequence directly, and encounter two major difficulties. One is how to obtain a potential energy function that distinguishes the native conformation from non-native structures, and the other is how to develop an effective global optimization method to search the minimum free energy.
     The three-dimensional AB off-lattice model which remains faithful to protein characteristics is adopted as simplified model in this thesis, and a novel hybrid approach genetic tabu search algorithm (GATS) that combines genetic algorithm and tabu search algorithm is presented for dealing with the 3D protein folding structure prediction. In order to enhance the global optimization ability and local search intensity, five improved strategies are developed for the proposed method, which are chromosome encoding, variable population size, ranking selection, tabu search mutation (TSM) and tabu search recombination (TSR).
     The proposed genetic tabu search algorithm is applied to two types of data for folding structure prediction in three dimensions, which are the widely-used four Fibonacci sequences and the nine real protein sequences from Protein Data Bank (PDB), and can search the minimum free energies and the lowest-energy landscapes quickly with the higher accuracy. Experimental results show that the lowest energies obtained by our algorithm are better than those searched by other methods, and the optimal solutions are improved even more with the sequence length increases. In addition, the lowest-energy conformations of this thesis all form compact hydrophobic cores, surrounded by hydrophilic residues as observed in real protein. All of these demonstrate that the proposed GATS provides an effective and high-performance global search method for protein folding structure prediction.
引文
[1] [英] D.惠特福德(著),魏群(主译).蛋白质结构与功能[M].北京:科学出版社, 2008.
    [2] G. T. Montelione, D. Zheng, Y. J. Huang, et al. Protein NMR spectroscopy in structural genomics [J]. Nature Structural Biology, 2000, 7(Supp.): 982–985.
    [3] V. Heun. Approximate protein folding in the HP side chain model on extended cubic lattices [J]. Discrete Applied Mathematics, 2003, 127: 163–177.
    [4] M. M. Gromiha, S. Selvaraj. Inter-residue interactions in protein folding and stability [J]. Progress in biohysics and molecular biology, 2004, 86: 25–277.
    [5] M. P. Morrissey, Z. Ahmed, E. I. Shakhovich. The role of cotranslation in protein folding: a lattice model study [J]. Polymer, 2004, 45: 557–571.
    [6] C. B. Anfinsen. Principles that govern the folding of protein chains [J]. Science, 1973, 181 (4096): 223–227.
    [7] C. Branden, J. Tooze. Introduction to protein structure (2nd edn.) [M]. New York: Garland Science Publishing, 1999.
    [8] [德] T.伦盖威尔(主编).郑珩,王非(译).生物信息学-从基因组到药物[M].北京:化学工业出版社, 2006.
    [9] R. Unger, J. Moult. Finding the lowest free energy conformation of a protein is an NP-hard problem: proof and implication [J]. Bulletin of Mathematical Biology, 1993, 55(6): 1183–1198.
    [10] A. S. Fraenkel. Complexity of protein folding [J]. Bulletin of Mathematical Biology, 1993, 55(6): 1192–1210.
    [11] A. N. Lupas. The long coming of computational structural biology [J]. Journal of Structural Biology, 2008, 163(3): 254–257.
    [12]殷志祥,蛋白质结构预测方法的研究进展[J].计算机工程与应用, 2004, 20: 54–57.
    [13] J. Lee, A. Liwo, H. A. Scheraga. Energy-based de novo protein folding by conformational space annealing and an off-lattice united-residue force field: Application to the 10-55 fragment of staphylococcal protein A and to apo calbindin D9K [J]. Proc. Natl. Acad. Sci. Biophysics, 1999, 96: 2025–2030.
    [14] H. S. Lopes. Evolutionary algorithms for the protein folding problem: A Review and Current Trends [J]. Computational Intelligence in Biomedicine and Bioinformatics, Springer Berlin / Heidelberg, 2008, 151: 297–315.
    [15] K. A. Dill. Theory for the folding and stability of globular proteins [J]. Biochemistry, 1985, 24: 1501-1509.
    [16] A. Irb?ck, E. Sandelin. Local Interactions and Protein Folding: Model Study on the Squareand Triangular Lattices [J]. Journal of Chemical Physics, 1998, 108(5): 2245–2250.
    [17] F. H. Stillinger, T. Head-Gordon and C. L. Hirshfel. Toy model for protein folding [J]. Physical Review. E, 1993, 48: 1469–1477.
    [18] F. H. Stillinger, T. Head-Gordon. Collective aspects of protein folding illustrated by a toy model [J]. Physical Review. E, 1995, 52: 2872–2877.
    [19] C. Clementi, H. Nymeyer, J. N. Onuchic. Topological and energetic factors: what determines the structural details of the transition state ensemble and en-route intermediates for protein folding? An investigation for small globular proteins [J]. Journal of Molecular Biology, 2000, 298: 937–953.
    [20] S. Matysiak, C. Clementi: Optimal combination of theory and experiment for the characterization of the protein folding landscape of S6: how far can a minimalist model go? [J]. Journal of molecular biology, 2004, 343: 235–248.
    [21] J. Karanicolas, C. L. Brooks III: Improved Gō-like models demonstrate the robustness of protein folding mechanisms towards non-native interactions [J]. Journal of molecular biology, 2003, 334: 309–325.
    [22] J. Liu, L. Wang, L. He, F. Shi. Analysis of toy model for protein folding based on particle swarm optimization algorithm [J]. ICNC, 2005, (3): 636–645.
    [23]许如初,秦明,黄文奇.蛋白质结构预测的拟物拟人算法研究[J].计算机应用研究, 2007, 24(8): 45–47.
    [24] W. Huang, J. Liu.Structure optimization in a three-dimensional off-lattice protein model [J]. Biopolymers, 2006, 82: 93–98.
    [25] A. Irb?ck, C.Peterson, F. Potthast. Identification of amino acid sequences with good folding properties in an off-lattice model [J]. Physical Review E, 1997, 55: 860–867.
    [26] W. E. Hart, S. Istrail. Robust proofs of NP-hardness for protein folding general lattices and energy potentials [J]. Journal of Computational Biology, 1997, 4(1): 1–22.
    [27] J. T. Ngo, J. Marks, M. Karplus. Computational complexity, protein structure prediction, and the Levinthal paradox [M]. The Protein folding problem and tertiary structure prediction. Boston: Birkhauser, 1994.
    [28] M. Bachmann, H. Arkin, W. Janke. Multicanonical study of coarse-grained off-lattice models for folding heteropolymers [J]. Physical Review E, 2005, 71, 031906.
    [29] S. Y. Kim, S. B. Lee, J. Lee. Structure optimization by conformational space annealing in an off-lattice protein model [J]. Physical Review E, 2005, 72, 011916.
    [30]陈矛,黄文奇,吕志鹏.求解蛋白质折叠问题的模拟退火算法[J].小型微型计算机系统, 2007, 28(1):75–78.
    [31] X. Zhang, X. Lin. Structure optimization by Genetic-Annealing algorithm in a 3D off-lattice protein model [C]. LNAI, 2007, 4819: 186-193.
    [32] X. Zhang, W. Cheng. Protein 3D structure prediction by improved Tabu Search in Off-Lattice AB model [C]. iCBBE, 2008, 184-187.
    [33] T. Wüst, D. P. Landau. The HP model of protein folding: a challenging testing ground for Wang-Landau sampling [J]. Computer Physics Communications, 2008, 179:124–127.
    [34] X. Zhao. Advances on protein folding simulations based on the lattice HP models with natural computing [J]. Applied Soft Computing, 2008, 8: 1029–1040.
    [35]许如初,秦明,黄文奇.蛋白质三维连续模型结构预测的高效算法[J].计算机工程与科学, 2007, 29(4): 68–71.
    [36] C. Cotta. Hybrid Evolutionary Algorithms for Protein Structure Prediction under the HPNX Model [J]. Advances in Soft Computing, 2005, 2: 525–534.
    [37]许忠能.生物信息学[M] .北京:清华大学出版社, 2008.
    [38]刘培楠,吴国利.基础分子生物学[M].北京:高等教育出版社, 1993.
    [39]王翼飞,史定华.生物信息学——智能优化算法及其应用[M].北京:化学工业出版社, 2006.
    [40]陶士珩.生物信息学[M].北京:科学出版社, 2007.
    [41]张阳德.生物信息学(第二版) [M].北京:科学出版社, 2009.
    [42]王克夷.蛋白质导论[M].北京:科学出版社, 2007.
    [43] [美] T. K. Attwood, D. J. Parry-Smith (著),罗静初(译).生物信息学概论[M].北京:北京大学出版社, 2002.
    [44] G. D. Fassman. Prediction of protein structure and the principles of protein conformation [M]. New York: Plenum Press, 1989.
    [45] C. B. Anfinsen, E. Haber, M. Sela, F. H. White. The kinetics of formation of native ribonuculease during oxidation of the reduced polypeptide chain [J]. PNAS, 1961, 47: 1309–1314.
    [46] [美] D. W. Mount (著),钟扬,王莉,张亮(主译).生物信息学[M].北京:高等教育出版社, 2003.
    [47] B. Rost, C. Sander. Combining evolutionary information and neural networks to predict protein secondary structure [J]. Proteins: Structure, Function, and Genetics, 1994, 19: 55-72.
    [48] B. Rost, C. Sander. Prediction of protein secondary structure at better than 70% accuracy [J]. Journal of Molecular Biology, 1993, 23(2): 584-99.
    [49] K. Ginalski,N. V. Grishin.Practical lessons from protein structure prediction [J]. Nucleic Acids Research, 2005, 33(6): 1874-1891.
    [50]徐晓风.蛋白质折叠[J].分子生物物理学研究生评述, 2003.
    [51]杨铭.结构生物学概论[M].北京:北京医科大学出版社, 2002.
    [52] C. Clementi. Coarse-grained models of protein folding: toy models or predictive tools? [J]. Current Opinion in Structural Biology, 2008, 18: 10–15.
    [53] C. Branden, J. Tooze. Introduction to protein structure (2nd edn.) [M]. New York: Garland Publishing Inc, 1991.
    [54] T. M. Mitchell. Machine Learning [M]. New York: McGraw-Hill Companies, 1997.
    [55]刘勇,康立山等.非数值并行算法——遗传算法[M].北京:科学出版社, 1995.
    [56]王凌.智能优化算法及其应用[M].北京:清华大学出版社,柏林:施普林格出版社, 2001.
    [57] K. A. Dill, S. Bromberg, K. Z. Yue, et al. Principle of protein folding - a perspective from simple exact models [J]. Protein Science, 1995, 4(4): 561-602.
    [58] R. K?nig, T. Dandekar. Refined Genetic Algorithm Simulation to Model Proteins [J]. Journal of Molecular Modeling. Springer Berlin/Heidelberg, 1999, 5: 317-324.
    [59] M. T. Hoque, M. Chetty, L. S. Dooley. A New Guided Genetic Algorithm for 2D Hydrophobic-Hydrophilic Model to Predict Protein Folding [C]. IEEE Congress on Evolutionary Computation, Edinburgh, 2005, 1: 259–266.
    [60]云庆夏.进化算法[M].北京:冶金工业出版社, 2000.
    [61] D. E. Goldberg. Genetic algorithms in search, optimization, and machine learning [M]. Boston, MA: Addison-Wesley Longman Publishing, 1989.
    [62] H. Mühlenbein. Evolution in time and space - the parallel genetic algorithm [C]. Foundations of Genetic Algorithms. San Mateo: Morgan Kaufmann Publishers, 1991.
    [63] J. C. Poths, T. D. Giddens, S. B. Yadaw. The development and evaluation for an improved genetic algorithm based on migration and artificial selection [J]. IEEE Transactions on Systems, 1994, 24(1): 37–86.
    [64]朱剑英.智能系统非经典数学方法[M].武汉:华中科技大学出版社, 2004.
    [65] T. Starkweather, S. Mcdaniel, D. Whitley, et al. A comparison of genetic sequencing operators [C]. Proceedings of the fourth International Conference on Genetic Algorithms. Los Altos: Morgan Kaufmann Publishers, 1991, 69–76.
    [66] F. Glover. Future paths for integer programming and links to artificial intelligence [J]. Computers and Operations Research, 1986, 13: 533–549.
    [67] F. Glover. Tabu search: Part I [J]. ORSA. Journal on Computing, 1989, 1: 190–206.
    [68] F. Glover. Tabu search: part II [J]. ORSA. Journal on Computing, 1990, 2: 4–32.
    [69]岳晓辉.基于禁忌搜索算法的蛋白质结构预测的研究[D].大连理工大学, 2005.
    [70] F. Glover, Kelly J, Laguna M. Genetic algorithms and tabu search: Hybrids for optimization [J]. Computers and Operations Research, 1995, 22(1): 111–134.
    [71] B. B. Mabrouk, H. Hasni, Z. Mahjoub. On a parallel genetic-tabu search based algorithm for solving the graph colouring problem [J]. European Journal of Operational Research, 2009, 197(3): 1192–1201.
    [72] G. Vilcot, J. -C. Billaut. A tabu search and a genetic algorithm for solving a bicriteriageneral job shop scheduling problem [J]. European Journal of Operational Research, 2008, 190: 398–411.
    [73]吴圣宁,李思昆.遗传算法和关键事件禁忌搜索相融合的ARM/Thumb处理器指令选择[J].计算机学报, 2007, 30(4): 680–685.
    [74]陈友,沈华伟,李洋,等.一种高效的面向轻量级入侵检测系统的特征选择算法[J].计算机学报, 2007, 30(8): 1398–1408.
    [75] Z. Michalewicz. Genetic algorithms + data structures = evolution programs (3rd edn.) [M]. New York: Springer-Verlag, 1996.
    [76]程文.基于改进的禁忌算法的蛋白质三维结构预测[D].武汉科技大学,2008.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700