Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns
  • Authors: Pieter Meysman (1) (2)
    Cheng Zhou (1)
    Boris Cule (1)
    Bart Goethals (1)
    Kris Laukens (1) (2)

    1. Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
    2. Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
  • Keywords: Protein structure ; Frequent pattern mining ; Thermostability ; Protein-DNA complexes
  • Journal: BioData Mining
  • Publication year: 2015
  • Publication date: June 2015
  • Volume: 8
  • Issue: 1
  • Full-text size: 1,717 KB
  • Journal subjects: Computer Appl. in Life Sciences; Computational Biology/Bioinformatics; Data Mining and Knowledge Discovery; Bioinformatics; Algorithms
  • Publisher: BioMed Central
  • ISSN: 1756-0381
Abstract
Background: The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and other biomolecules, such as DNA. The discovery and characterization of these patterns is an important research topic within structural biology, as it can give fundamental insight into protein structures and can aid in the prediction of unknown structures.

Results: Here we apply an efficient spatial pattern miner to search for sets of amino acids that occur frequently in close spatial proximity in the protein structures of the Protein DataBank. This allowed us to mine for a new class of amino acid patterns, which we term FreSCOs (Frequent Spatially Cohesive Component sets), that feature synergetic combinations. To demonstrate the relevance of these FreSCOs, we compared them with respect to the thermostability of the protein structure and the interaction preferences of DNA-protein complexes. In both cases, the results matched well with prior investigations using more complex methods on smaller data sets.

Conclusions: The currently characterized protein structures feature a diverse set of frequent amino acid patterns that can be related to the stability of the protein molecular structure and that are independent of protein function or specific conserved domains.
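
The idea described under Results, finding sets of amino acids that frequently occur in close spatial proximity, can be illustrated with a small brute-force sketch. This is not the authors' miner: the data layout (one coordinate per residue), the pairwise distance cutoff, the per-structure support count, and the function names `cohesive_sets` and `frequent_patterns` are all assumptions made for illustration, and the toy coordinates are invented.

```python
from collections import Counter
from itertools import combinations
from math import dist


def cohesive_sets(residues, cutoff, size):
    """Return sorted amino acid tuples for every group of `size` residues
    whose pairwise distances are all at most `cutoff` (assumed criterion)."""
    found = []
    for combo in combinations(residues, size):
        if all(dist(a[1], b[1]) <= cutoff for a, b in combinations(combo, 2)):
            found.append(tuple(sorted(name for name, _ in combo)))
    return found


def frequent_patterns(structures, cutoff=6.0, size=2, min_support=2):
    """Count the number of structures containing each pattern and keep
    the patterns found in at least `min_support` structures."""
    support = Counter()
    for residues in structures:
        # set(): each structure contributes at most once per pattern
        support.update(set(cohesive_sets(residues, cutoff, size)))
    return {p: n for p, n in support.items() if n >= min_support}


# Toy input: each structure is a list of (residue name, (x, y, z)) pairs.
structures = [
    [("ARG", (0, 0, 0)), ("ASP", (3, 0, 0)), ("LEU", (20, 0, 0))],
    [("ASP", (1, 1, 1)), ("ARG", (2, 4, 1)), ("GLY", (30, 0, 0))],
    [("LEU", (0, 0, 0)), ("GLY", (4, 0, 0))],
]

result = frequent_patterns(structures)
print(result)  # {('ARG', 'ASP'): 2}
```

Enumerating every combination is only workable for toy inputs; mining the whole Protein DataBank, as the abstract describes, requires an efficient miner rather than this exhaustive search.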
