Prediction of protein-protein interaction sites using an ensemble method
  • Authors: Lei Deng (1)
    Jihong Guan (1)
    Qiwen Dong (2) (3)
    Shuigeng Zhou (2) (3)
  • Journal: BMC Bioinformatics
  • Year: 2009
  • Publication date: December 2009
  • Volume: 10
  • Issue: 1
  • Full-text size: 2838KB
  • Author affiliations: Lei Deng (1)
    Jihong Guan (1)
    Qiwen Dong (2) (3)
    Shuigeng Zhou (2) (3)

    1. Department of Computer Science and Technology, Tongji University, Shanghai, 201804, China
    2. Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, 200433, China
    3. School of Computer Science, Fudan University, Shanghai, 200433, China
  • ISSN:1471-2105
Abstract
Background: Predicting protein-protein interaction sites is one of the most challenging and intriguing problems in computational biology. Although much progress has been made using various machine learning methods and a variety of available features, the problem is still far from solved.

Results: This paper proposes an ensemble method that combines a bootstrap resampling technique, SVM-based fusion classifiers, and a weighted voting strategy to overcome the class imbalance problem and make effective use of a wide variety of features. We evaluate the ensemble classifier on a dataset extracted from 99 polypeptide chains using 10-fold cross-validation and obtain an AUC score of 0.86, with a sensitivity of 0.76 and a specificity of 0.78, which are better than those of existing methods. To make the method more widely applicable, two special ensemble classifiers are designed to handle cases where homologues or structural information are missing, respectively, and their performance remains encouraging. The robustness of the ensemble method is also evaluated by classifying interaction sites among surface residues as well as among all residues in proteins. Moreover, we demonstrate the applicability of the method by identifying interaction sites in the non-structural (NS) proteins of the influenza A virus, which may serve as potential drug target sites.

Conclusion: Our experimental results show that the ensemble classifiers are quite effective in predicting protein interaction sites. The Sub-EnClassifiers with the resampling technique can alleviate the class imbalance problem, and the combination of Sub-EnClassifiers with a wide variety of feature groups can significantly improve prediction performance.
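
The three ingredients named in the abstract (bootstrap resampling against class imbalance, SVM base classifiers, weighted voting) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, a small linear SVM trained by Pegasos-style sub-gradient descent stands in for whatever SVM software the paper used, weighting each vote by training accuracy is an assumption, and the per-feature-group Sub-EnClassifiers are left out.

```python
import random

def balanced_bootstrap(X, y, rng):
    """Sample with replacement, drawing equally many sites (+1) and non-sites (-1)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    n = min(len(pos), len(neg))
    idx = [rng.choice(pos) for _ in range(n)] + [rng.choice(neg) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def decision_value(w, x):
    """Signed score of a linear model; the last weight is the bias."""
    return sum(wj * xj for wj, xj in zip(w, x + [1.0]))

def train_linear_svm(X, y, rng, lam=0.01, epochs=20):
    """Stand-in base learner: stochastic sub-gradient descent on the hinge loss."""
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            violated = y[i] * decision_value(w, X[i]) < 1
            w = [(1 - eta * lam) * wj for wj in w]  # regularisation shrink
            if violated:  # margin violated: move towards the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i] + [1.0])]
    return w

def accuracy(w, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    hits = sum(1 for xi, yi in zip(X, y)
               if (1 if decision_value(w, xi) > 0 else -1) == yi)
    return hits / len(y)

def train_ensemble(X, y, n_models=5, seed=0):
    """Train one SVM per balanced bootstrap sample; keep (weights, vote weight)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        Xb, yb = balanced_bootstrap(X, y, rng)
        w = train_linear_svm(Xb, yb, rng)
        # Assumption: vote weight = accuracy on the full training set.
        models.append((w, accuracy(w, X, y)))
    return models

def predict(models, x):
    """Weighted vote over the base classifiers' +1 / -1 decisions."""
    vote = sum(weight * (1 if decision_value(w, x) > 0 else -1)
               for w, weight in models)
    return 1 if vote > 0 else -1
```

Here `X` is a list of feature lists and `y` a list of labels in {+1, -1}; `train_ensemble(X, y)` returns the models that `predict` combines. Each base classifier sees equal numbers of interface and non-interface residues, so the majority class cannot dominate training, while the vote still draws on several differently resampled views of the data.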
