Exact Protein Structure Classification Using the Maximum Contact Map Overlap Metric
  • Authors: Inken Wohlers (20)
    Mathilde Le Boudic-Jamin (21)
    Hristo Djidjev (22)
    Gunnar W. Klau (23)
    Rumen Andonov (21)
  • Keywords: k-nearest neighbours; metric spaces; maximum contact map overlap; automatic classification of proteins
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2014
  • Volume: 8542
  • Issue: 1
  • Pages: 262-273
  • Affiliations:
    20. Genome Informatics, University of Duisburg-Essen, Germany
    21. INRIA Rennes-Bretagne Atlantique and University of Rennes 1, France
    22. Los Alamos National Laboratory, Los Alamos, NM, USA
    23. Life Sciences, CWI, Science Park 123, 1098 XG, Amsterdam, The Netherlands
  • ISSN: 1611-3349
Abstract
In this work we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that this measure, which we call the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric on that space makes it possible to avoid pairwise comparisons against the entire database and thus to explore the protein space significantly faster than in non-metric spaces. On two gold-standard classification benchmark sets of 6,759 and 67,609 proteins, our exact k-nearest neighbor scheme classifies up to 95% and 99% of queries correctly, respectively. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on contact map overlap.
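
To illustrate why a metric lets a k-NN search skip most pairwise comparisons, here is a minimal Python sketch of exact k-NN classification with triangle-inequality pruning. It is not the authors' algorithm. The function names, the pivot-table layout and the toy distance are assumptions made for illustration. Because computing a real max-CMO distance is expensive, the absolute difference between numbers stands in for it here; the pruning relies only on the metric properties, in particular the bound |d(q,p) - d(p,x)| <= d(q,x).

```python
import heapq
from collections import Counter


def build_pivot_table(database, pivots, dist):
    """Precompute d(pivot, x) for every pivot index and database item."""
    return [[dist(database[p], x) for x in database] for p in pivots]


def knn_classify(query, database, labels, dist, pivots, table, k=3):
    """Exact k-NN with triangle-inequality pruning (illustrative sketch).

    Returns (predicted_label, number_of_query_distance_evaluations).
    """
    # Distances from the query to the pivots are computed once and cached.
    cache = {p: dist(query, database[p]) for p in pivots}
    q_to_pivot = [cache[p] for p in pivots]

    # For any metric, |d(q,p) - d(p,x)| is a lower bound on d(q,x).
    lower = [
        max(abs(qp - row[i]) for qp, row in zip(q_to_pivot, table))
        for i in range(len(database))
    ]

    best = []  # max-heap via negated distances: (-distance, index)
    for i in sorted(range(len(database)), key=lower.__getitem__):
        if len(best) == k and lower[i] >= -best[0][0]:
            break  # no remaining item can beat the current k-th neighbor
        if i not in cache:
            cache[i] = dist(query, database[i])
        d = cache[i]
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))

    votes = Counter(labels[i] for _, i in best)
    return votes.most_common(1)[0][0], len(cache)


# Toy data: numbers stand in for protein structures (assumption).
database = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = ["a", "a", "a", "b", "b", "b"]
toy_dist = lambda x, y: abs(x - y)  # stand-in for a max-CMO distance

pivots = [0]  # index of the database item used as pivot
table = build_pivot_table(database, pivots, toy_dist)
label, evaluations = knn_classify(5.05, database, labels, toy_dist, pivots, table)
print(label, evaluations, "of", len(database))
```

On this toy data the query is assigned to class "b" after evaluating 4 of the 6 possible query distances; the remaining items are ruled out by their lower bounds alone. The result is still exact, because an item is skipped only when its lower bound already reaches the distance of the current k-th neighbor.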
