Research on Methods for Protein Structure Prediction and Structure Alignment
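Abstract
One of the principal research tasks of the post-genomic era is to elucidate the functions of proteins. Knowledge of protein function helps us understand complex life phenomena. In many cases, however, it is not enough to know what a protein does; one also needs to understand why it has that function, which requires in-depth study of protein structure. Because of the limits of experimental techniques for determining protein structure and function, the structures and functions of a large number of known proteins remain unknown. The rapid development of bioinformatics offers an effective route to this problem. On this basis, this thesis investigates several related problems in the prediction and analysis of protein structure and function through feature discovery and mining. The main work is as follows:
     (1) A secondary structure prediction tool was developed. Analysis of the amino acid distribution near the end positions of protein secondary structures showed that the distribution at these positions is markedly specific. On this basis, combined with other features, E-SSpred, a tool that predicts entire secondary structure segments as a whole, was built. Tests on standard benchmark datasets show that E-SSpred predicts secondary structure more accurately than comparable software, with a particularly large improvement in the accuracy of predicted segment end positions.
     (2) An energy function that accounts for the hydrophobic environment of the template was proposed, and a fold recognition system was developed on top of it. Analysis of how the hydrophobic environment within protein structures affects pairwise residue interaction energies revealed that these energies differ between hydrophobic environments. Based on this, the energy function used in fold recognition was improved and then applied in a fold recognition method. Tests show that taking the hydrophobic environment into account effectively improves fold recognition accuracy.
     (3) A structure alignment method based on secondary structure elements was developed. After analysing the properties of secondary structure elements and of residue alignment algorithms, and to address the weakness of current secondary-structure-based alignment methods in finding related residues, this work takes element length into account when computing the similarity of secondary structure elements and improves the residue alignment algorithm. The result is the protein structure alignment tool 3D-Sali. Comparative tests against similar software show that 3D-Sali identifies homologous proteins well and also finds the corresponding residues between aligned proteins well.
     (4) The features that determine how an amino acid substitution affects protein function were analysed, and the features found were used for prediction. The analysis shows that substitutions at functional sites and their related sites are far more likely to affect function than substitutions elsewhere, whereas widely used features such as evolutionary information do not capture this. To address this, functional annotation databases and site correlation analysis were used to obtain functional sites and their related sites, and a method for predicting whether an amino acid mutation affects function was developed on that basis. Comparative tests show that this method effectively improves prediction accuracy.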
One of the most important tasks in the post-genome era is inferring the function of proteins. Knowing protein function helps us understand complex living systems. In many cases, however, explaining why a protein has a given function matters more than the function itself, and this requires knowledge of the protein's structure. Because of the limits of experimental techniques, most sequenced proteins still lack functional annotations or structures. Bioinformatics, which has been developing rapidly, provides an efficient way to address this problem. In view of this, this work uses computational methods based on feature discovery to study several issues in protein structure prediction, function prediction, and analysis.
     In this research, a protein secondary structure prediction tool was developed. The position-specific residue preferences around the ends of protein secondary structures were analyzed, and the results showed that the residue distribution around these sites is specific. Based on this new feature and other features, E-SSpred, a tool that predicts secondary structure fragments as a whole, was proposed. E-SSpred was evaluated on standard test datasets and compared with other tools, and the results indicated that E-SSpred performs better.
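     To illustrate the kind of end-position analysis described above, the following Python sketch tabulates log-odds residue preferences at fixed offsets around the start of each segment. It is not the E-SSpred implementation: the function names, the window size, and the three-state string encoding (H/E/C) are assumptions made for this example.

```python
from collections import Counter
import math

def segment_starts(ss, state="H"):
    """Indices where a run of `state` begins in a secondary-structure string."""
    return [i for i, s in enumerate(ss)
            if s == state and (i == 0 or ss[i - 1] != state)]

def end_propensities(pairs, state="H", window=2):
    """Log-odds residue preferences around segment starts.

    pairs: iterable of (sequence, ss_string) with equal lengths (assumed input).
    Returns {offset: {residue: ln(freq at offset / background freq)}}.
    """
    background = Counter()
    by_offset = {k: Counter() for k in range(-window, window + 1)}
    for seq, ss in pairs:
        background.update(seq)
        for start in segment_starts(ss, state):
            for k in by_offset:
                j = start + k
                if 0 <= j < len(seq):
                    by_offset[k][seq[j]] += 1
    total_bg = sum(background.values())
    result = {}
    for k, counts in by_offset.items():
        n = sum(counts.values())
        # Positive values mean the residue is enriched at this offset.
        result[k] = {aa: math.log((c / n) / (background[aa] / total_bg))
                     for aa, c in counts.items()}
    return result

if __name__ == "__main__":
    toy = [("ASPEELLKA", "CCCHHHHHC")]  # made-up sequence and assignment
    print(end_propensities(toy)[0])
```

     The same counting could be applied to segment ends by scanning for the last residue of each run instead of the first.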
     A fold recognition method built on a novel energy function was also proposed. Residue-residue pairwise interaction energies were found to differ between hydrophobic environments. Based on this, a new energy function was proposed and used in fold recognition. The new energy function was tested against a commonly used energy function, and the results imply that considering the hydrophobic environment can improve the accuracy of fold recognition.
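     One common way to obtain pairwise energies that depend on environment is a knowledge-based (statistical) contact potential computed separately per environment class. The Python sketch below shows that general idea only; the abstract does not specify the thesis's actual functional form, so the environment labels, the pseudocount, and the function name `env_pair_potential` are all assumptions.

```python
from collections import Counter
import math

def env_pair_potential(contacts, pseudocount=1.0):
    """Environment-specific contact energies, E = -ln(observed / expected).

    contacts: iterable of (aa1, aa2, env), one tuple per observed residue
    contact, where env is a label such as "buried" or "exposed" (assumed).
    Returns {(env, (aa_a, aa_b)): energy} with the pair sorted alphabetically.
    """
    pair_counts = {}
    res_counts = {}
    for a, b, env in contacts:
        pair = tuple(sorted((a, b)))
        pair_counts.setdefault(env, Counter())[pair] += 1
        rc = res_counts.setdefault(env, Counter())
        rc[a] += 1
        rc[b] += 1
    energies = {}
    for env, pc in pair_counts.items():
        n_pairs = sum(pc.values())
        n_res = sum(res_counts[env].values())
        for (a, b), n_ab in pc.items():
            fa = res_counts[env][a] / n_res
            fb = res_counts[env][b] / n_res
            # Expected count if residues paired at random within this environment.
            expected = n_pairs * fa * fb * (1 if a == b else 2)
            energies[(env, (a, b))] = -math.log(
                (n_ab + pseudocount) / (expected + pseudocount))
    return energies

if __name__ == "__main__":
    toy = ([("L", "V", "buried")] * 3 + [("L", "K", "buried")]
           + [("L", "V", "exposed")] + [("K", "E", "exposed")] * 3)
    for key, value in sorted(env_pair_potential(toy).items()):
        print(key, round(value, 3))
```

     In a threading setting, such a table would be looked up with the environment of the template position, so the same residue pair can score differently in a buried and an exposed context.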
     A structure comparison method based on secondary structure elements was proposed. To address the weakness of existing methods in matching related residues between the query and target proteins, we improved the secondary structure similarity scoring function and the residue-residue alignment algorithm, and developed a structure alignment tool, 3D-Sali. The test results indicated that 3D-Sali performs well both in detecting homologous proteins and in finding corresponding residues.
     In the last part, the key factors that determine how an amino acid substitution affects protein function were analyzed and used to improve prediction. Substitutions at functional sites and their related sites were found to be more likely to affect protein function than substitutions elsewhere. However, widely used features such as evolutionary information do not capture this. To solve this problem, a function annotation database and correlated mutation analysis were used to find the functional sites and their related sites, and a method using a novel feature based on these sites was proposed to predict the effect of amino acid substitutions on protein function. The test results indicated that this method improves prediction accuracy effectively.
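     Correlated mutation analysis can be carried out with several statistics. The Python sketch below uses mutual information between alignment columns, which is one widely used choice and not necessarily the measure used in this thesis. The function names, the 0.5-bit threshold, and the toy alignment are assumptions for illustration.

```python
from collections import Counter
import math

def mutual_information(msa, i, j):
    """Mutual information (bits) between columns i and j of an alignment.

    msa: list of equal-length aligned sequences (strings).
    """
    n = len(msa)
    ci = Counter(s[i] for s in msa)
    cj = Counter(s[j] for s in msa)
    cij = Counter((s[i], s[j]) for s in msa)
    mi = 0.0
    for (a, b), c in cij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((ci[a] / n) * (cj[b] / n)))
    return mi

def coupled_positions(msa, site, threshold=0.5):
    """Columns whose mutual information with `site` reaches the threshold."""
    length = len(msa[0])
    return [j for j in range(length)
            if j != site and mutual_information(msa, site, j) >= threshold]

if __name__ == "__main__":
    toy_msa = ["ACD", "ACE", "GTD", "GTE"]  # made-up alignment
    # Column 0 stands in for a known functional site; list columns coupled to it.
    print(coupled_positions(toy_msa, site=0))
```

     Given functional sites taken from an annotation database, the columns returned for each site would play the role of the "related sites" mentioned above.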
