Analysis and Prediction of Inter-Residue Interactions in Proteins
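Abstract
In recent years, the rapid development of bioinformatics has produced large volumes of biological data that need to be corrected, managed, interpreted, and fully exploited, and machine learning is well suited to data of this kind, which are large and noisy. Many machine learning algorithms have already been applied successfully to process biological data and to mine unknown biological knowledge. This thesis uses the support vector machine (SVM) as its main machine learning tool for protein structure analysis. It focuses on using SVMs and a genetic algorithm (GA) to predict residue temperature factors (B-factors), and on that basis analyzes and predicts long-range interactions between residues. The main contributions are as follows:
     1. A method is proposed that uses a multi-class support vector machine to analyze and predict the temperature factors of protein residues. In general, a residue's B-factor reflects its thermal instability or degree of free motion, and a higher B-factor corresponds to a larger exposed surface area. Predicting residue B-factors therefore helps in understanding and predicting protein structure. The thesis describes the physicochemical features of amino acids that are used as input to the multi-class SVM, for example the sequence profile of the protein, the evolutionary rate of each residue, and the hydrophobicity value of each residue.
     2. An SVM method based on predicted residue B-factors and hydrophobicity-profile features is proposed for analyzing and predicting the centers of inter-residue contact clusters. In a protein contact map, long-range interaction points between residues tend to gather into clusters. Analysis shows that most of these clusters correspond to regions of low residue B-factor or of strong hydrophobicity. Selecting samples on this basis reduces the imbalance between positive and negative samples and so yields higher prediction performance. Finally, an SVM is used to predict the contact cluster centers, from which the inter-residue interaction sites are obtained.
     3. A genetic algorithm based on protein sequence-profile centers is constructed to analyze the sequence-profile centers of residue pairs and, from these, to predict long-range interaction sites between residues. Analysis with a genetic-algorithm-based multi-classifier (Genetic algorithm based Multi-Classification, GaMC) system shows that most long-range interaction sites lie around the sequence-profile centers. This classifier can separate contacting from non-contacting residue pairs and can therefore predict whether a given residue pair has a long-range interaction.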
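As a rough illustration of how raw B-factors might be turned into class labels for a multi-class classifier such as the one in item 1, the sketch below z-score normalizes the C_α B-factors of one chain and bins them into three flexibility classes. The per-chain normalization, the choice of three classes, and the cut-offs of -0.5 and 0.5 are assumptions made for illustration; the abstract does not say how the classes are defined. The function names and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def normalize_bfactors(bfactors):
    """Z-score normalize the B-factors of a single chain."""
    mu = mean(bfactors)
    sigma = pstdev(bfactors)
    if sigma == 0:
        # A chain with identical B-factors carries no flexibility signal.
        return [0.0 for _ in bfactors]
    return [(b - mu) / sigma for b in bfactors]


def bfactor_classes(bfactors, low=-0.5, high=0.5):
    """Map each residue to 0 (rigid), 1 (intermediate) or 2 (flexible).

    The cut-offs `low` and `high` are illustrative assumptions.
    """
    labels = []
    for z in normalize_bfactors(bfactors):
        if z < low:
            labels.append(0)
        elif z > high:
            labels.append(2)
        else:
            labels.append(1)
    return labels


# Made-up C-alpha B-factors (in square angstroms) for a ten-residue chain.
toy_bfactors = [35.0, 28.0, 15.0, 12.0, 11.0, 14.0, 20.0, 27.0, 33.0, 40.0]
toy_labels = bfactor_classes(toy_bfactors)
```

Labels of this kind would serve as the targets of the classifier, with per-residue features (sequence profile, evolutionary rate, hydrophobicity) as inputs. Normalizing per chain is one common way to make B-factors from different crystal structures comparable.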
In recent years, the rapid development of bioinformatics has produced more and more biological data that need to be corrected, managed, interpreted, and fully utilized. Machine learning methods are well suited to handling such large and noisy data. So far, many machine learning algorithms have been used successfully to process large biological data sets and to mine unknown biological knowledge. This thesis uses machine learning tools, mainly the support vector machine (SVM), to analyze protein structure, and adopts SVMs and a genetic algorithm (GA) to predict residue temperature factors (B-factors) and long-range contacts between residues. The main work of the thesis is as follows:
     1. A prediction method based on a multi-class support vector machine (SVM) is proposed to analyze and predict the B-factors of protein residues. In general, the temperature factor, or B-factor, of a residue is linearly related to the mean square displacement of its C_α atom and indicates atomic flexibility in the crystalline state. Previous work has shown that hydrophobic residues, which are usually buried, tend to be more rigid, whereas charged residues tend to be more flexible. Consequently, predicting B-factors may help in understanding and predicting the three-dimensional structure of a protein. The thesis uses selected properties of amino acid residues, such as the sequence profile of the protein chain, the evolutionary rate of each residue, and the hydrophobicity value of each residue, as input to the multi-class SVM.
     2. An approach is proposed to predict inter-residue contact cluster centers from predicted residue B-factors and residue hydrophobicity values using an SVM. Inter-residue contacts are known to gather into clusters in protein contact maps. It is observed that almost all inter-residue contact clusters correspond to pairs of residues at local B-factor minima or within regions of higher hydrophobicity. Selectively extracting input vectors for the predictor on the basis of these characteristics reduces the imbalance between positive and negative samples, so higher prediction performance can be obtained. An SVM is then used to predict the inter-residue contact cluster centers, from which the inter-residue interaction sites are obtained.
     3. A genetic algorithm based on the sequence profile (SP) centers of residue pairs is constructed to predict the SP centers and the long-range interaction sites between residues. First, a genetic-algorithm-based multiple classifier (GaMC) is constructed, and it is found that most long-range contacts are clustered around their SP centers. Second, the GaMC predictor can separate residue pairs in contact from those not in contact. Finally, whether two residues are in long-range contact can be decided from the GaMC predictor and the SP centers.
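The sample-selection idea in item 2 can be sketched as follows. The code builds a contact map from C_α coordinates and then keeps only candidate pairs in which both residues have a low B-factor, which shrinks the negative set. The 8 Å distance cut-off, the minimum sequence separation of 4, the B-factor threshold of 20, the toy hairpin geometry, and all function names are assumptions for illustration. The abstract gives none of these values, and the actual selection rule, which also uses hydrophobicity, may differ.

```python
from math import dist


def long_range_contacts(coords, cutoff=8.0, min_sep=4):
    """Return residue pairs (i, j), i < j, that lie within `cutoff`
    angstroms and are at least `min_sep` positions apart in sequence."""
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if dist(coords[i], coords[j]) <= cutoff
    }


def candidate_pairs(n, min_sep=4):
    """All residue pairs that satisfy the sequence-separation rule."""
    return [(i, j) for i in range(n) for j in range(i + min_sep, n)]


def select_low_b_pairs(pairs, bfactors, threshold):
    """Keep pairs in which both residues have a B-factor below `threshold`."""
    return [
        (i, j)
        for i, j in pairs
        if bfactors[i] < threshold and bfactors[j] < threshold
    ]


def positive_fraction(pairs, contacts):
    """Fraction of `pairs` that are true contacts."""
    return sum(p in contacts for p in pairs) / len(pairs) if pairs else 0.0


# Toy 12-residue chain: a two-strand hairpin (residues 0-7) plus a tail
# (residues 8-11) that extends away from it. Coordinates are made up.
toy_coords = (
    [(3.8 * i, 0.0, 0.0) for i in range(4)]
    + [(3.8 * (7 - i), 5.0, 0.0) for i in range(4, 8)]
    + [(-3.8 * (i - 7), 5.0, 0.0) for i in range(8, 12)]
)
# Made-up B-factors: low in the hairpin, high in the tail.
toy_b = [16, 14, 12, 18, 19, 13, 11, 15, 32, 38, 45, 50]

contacts = long_range_contacts(toy_coords)
all_pairs = candidate_pairs(len(toy_coords))
selected = select_low_b_pairs(all_pairs, toy_b, threshold=20)
```

In this toy case the share of positive samples rises from 7 of 36 candidate pairs to 6 of 10 selected pairs. The price is that one true contact, between residue 0 and the flexible residue 8, is filtered out, so a selection rule of this kind trades some coverage for a better class balance.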
