A review of protein domain partitioning methods and online services (蛋白质结构域划分方法及在线服务综述)
  • English title: A review of protein domain partitioning methods and online services
  • Authors: 王燕 (WANG Yan); 石强 (SHI Qiang); 薛志东 (XUE Zhi-dong)
  • Keywords: protein; domain; discontinuous domain; prediction; online service
  • Chinese journal title: GUDZ
  • English journal title: Journal of Guangzhou University (Natural Science Edition)
  • Affiliations: School of Life Science and Technology, Huazhong University of Science and Technology; School of Software Engineering, Huazhong University of Science and Technology
  • Publication date: 2019-02-15
  • Publisher: Journal of Guangzhou University (Natural Science Edition)
  • Year: 2019
  • Issue: v.18; No.103
  • Funding: National Natural Science Foundation of China (61772217)
  • Language: Chinese
  • Page: GUDZ201901003
  • Page count: 10
  • CN:01
  • ISSN:44-1546/N
  • Classification number: 24-33
Abstract
Protein domains are the basic units for studying protein structure, function, and evolution, and different domains can combine to form more complex protein molecules. Once a protein has been partitioned into domains, its structure, function, and evolution can be studied at the domain level, which reduces the complexity of the analysis. Statistics on proteins of known structure show that about 40% are multi-domain proteins, and in some of these, amino acid segments that are not adjacent in the primary sequence belong to the same domain, forming what are known as discontinuous domains. This paper reviews current progress in China and abroad on protein domain boundary prediction, discontinuous domain detection, and domain databases and online services, as a reference for researchers in the field.
References
[1] Baker M.Proteomics:The interaction map[J].Nature,2012,484(7393):271-275.
    [2] Markus G.Structural genomics:Open collaboration is key to new drugs[J].Nature,2012,491(7422):40.
    [3] Rose P W,Prlic A,Altunkaya A,et al.The RCSB protein data bank:Integrative view of protein,gene and 3D structural information[J].Nucleic Acids Research,2017,45(D1):D271-D281.
    [4] wwPDB consortium.Protein Data Bank:The single global archive for 3D macromolecular structure data[J].Nucleic Acids Research,2019,47(D1):D520-D528.
    [5] Xue Z,Jang R,Govindarajoo B,et al.Extending protein domain boundary predictors to detect discontinuous domains[J].PLoS One,2015,10(10):e0141541.
    [6] Wetlaufer D B.Nucleation,rapid folding,and globular intrachain regions in proteins[J].Proceedings of the National Academy of Sciences,1973,70(3):697-701.
    [7] Guo J T,Xu D,Kim D,et al.Improving the performance of domain parser for structural domain partition using neural network[J].Nucleic Acids Research,2003,31(3):944-952.
    [8] Xu Y,Xu D,Gabow H N.Protein domain decomposition using a graph-theoretic approach (vol 16,pg 1091,2000)[J].Bioinformatics,2001,17(3):290-290.
    [9] Xu Y,Xu D,Gabow H N.Protein domain decomposition using a graph-theoretic approach[J].Bioinformatics,2000,16(12):1091-1104.
    [10] Alexandrov N,Shindyalov I.PDP:Protein domain parser[J].Bioinformatics,2003,19(3):429-430.
    [11] Ebina T,Suzuki R,Tsuji R,et al.H-DROP:An SVM based helical domain linker predictor trained with features optimized by combining random forest and stepwise selection[J].Journal of Computer-Aided Molecular Design,2014,28(8):831-839.
    [12] Ebina T,Toh H,Kuroda Y.DROP:An SVM domain linker predictor trained with optimal features selected by random forest[J].Bioinformatics,2011,27(4):487-494.
    [13] Cheng J L,Sweredoski M J,Baldi P.DOMpro:Protein domain prediction using profiles,secondary structure,relative solvent accessibility,and recursive neural networks[J].Data Mining and Knowledge Discovery,2006,13(1):1-10.
    [14] Eickholt J,Deng X,Cheng J.DoBo:Protein domain boundary prediction by integrating evolutionary signals and machine learning[J].BMC Bioinformatics,2011,12:43.
    [15] Xue Z,Xu D,Wang Y,et al.ThreaDom:Extracting protein domain boundary information from multiple threading alignments[J].Bioinformatics,2013,29(13):i247-i256.
    [16] Sonnhammer E L,Eddy S R,Durbin R.Pfam:A comprehensive database of protein domain families based on seed alignments[J].Proteins,1997,28(3):405-420.
    [17] El-Gebali S,Mistry J,Bateman A,et al.The Pfam protein families database in 2019[J].Nucleic Acids Research,2019,47(D1):D427-D432.
    [18] Letunic I,Doerks T,Bork P.SMART:Recent updates,new developments and status in 2015[J].Nucleic Acids Research,2015,43:D257-D260.
    [19] Letunic I,Doerks T,Bork P.SMART 7:Recent updates to the protein domain annotation resource[J].Nucleic Acids Research,2012,40:D302-D305.
    [20] Andreeva A,Howorth D,Brenner S E,et al.SCOP database in 2004:Refinements integrate structure and sequence family data[J].Nucleic Acids Research,2004,32:D226-D229.
    [21] Murzin A G,Brenner S E,Hubbard T,et al.SCOP:A structural classification of proteins database for the investigation of sequences and structures[J].Journal of Molecular Biology,1995,247(4):536-540.
    [22] Cuff A L,Sillitoe I,Lewis T,et al.Extending CATH:Increasing coverage of the protein structure universe and linking structure with function[J].Nucleic Acids Research,2011,39:D420-D426.
    [23] Greene L H,Lewis T E,Addou S,et al.The CATH domain structure database:New protocols and classification levels give a more comprehensive resource for exploring evolution[J].Nucleic Acids Research,2007,35:D291-D297.
    [24] Mitchell A,Chang H Y,Daugherty L,et al.The InterPro protein families database:The classification resource after 15 years[J].Nucleic Acids Research,2015,43:D213-D221.
    [25] Wang Y,Wang J,Li R,et al.ThreaDomEx:A unified platform for predicting continuous and discontinuous protein domains by multiple-threading and segment assembly[J].Nucleic Acids Research,2017,45(W1):W400-W407.
    [26] Veretnik S,Shindyalov I.Computational methods for domain partitioning of protein structures[J].Biological & Medical Physics Biomedical Engineering,2007:125-145.
    [27] Hubbard T J,Ailey B,Brenner S E,et al.SCOP,Structural classification of proteins database:Applications to evaluation of the effectiveness of sequence alignment methods and statistics of protein structural data[J].Acta Crystallogr D Biol Crystallogr,1998,54(Pt 6 Pt 1):1147-1154.
    [28] Lo Conte L,Ailey B,Hubbard T J,et al.SCOP:A structural classification of proteins database[J].Nucleic Acids Research,2000,28(1):257-259.
    [29] Orengo C A,Michie A D,Jones S,et al.CATH:A hierarchic classification of protein domain structures[J].Structure,1997,5(8):1093-1108.
    [30] Rossmann M G,Liljas A.Letter:Recognition of structural domains in globular proteins[J].Journal of Molecular Biology,1974,85(1):177-181.
    [31] Crippen G.The tree structural organization of proteins[J].Journal of Molecular Biology,1979,126:315-332.
    [32] Rose G D.Hierarchic organization of domains in globular proteins[J].Journal of Molecular Biology,1979,134(3):447-470.
    [33] Wodak S J,Janin J.Location of structural domains in proteins[J].Biochemistry,1981,20(23):6544-6552.
    [34] Holm L,Sander C.Mapping the protein universe[J].Science,1996,273(5275):595-603.
    [35] Swindells M B.A procedure for detecting structural domains in proteins[J].Protein Science,1995,4(1):103-112.
    [36] Islam S A,Luo J,Sternberg M J.Identification and analysis of domains in proteins[J].Protein Engineering,1995,8(6):513-525.
    [37] Siddiqui A S,Barton G J.Continuous and discontinuous domains:An algorithm for the automatic generation of reliable protein domain definitions[J].Protein Science,1995,4(5):872-884.
    [38] Sowdhamini R,Blundell T L.An automatic method involving cluster analysis of secondary structures for the identification of domains in proteins[J].Protein Science,1995,4(3):506-520.
    [39] Taylor W R.Protein structural domain identification[J].Protein Engineering,1999,12(3):203-216.
    [40] Wernisch L,Hunting M,Wodak S J.Identification of structural domains in proteins by a graph heuristic[J].Proteins-Structure Function and Genetics,1999,35(3):338-352.
    [41] Xuan Z Y,Ling L J,Chen R S.A new method for protein domain recognition[J].European Biophysics Journal,2000,29(1):7-16.
    [42] Berezovsky I N.Discrete structure of van der Waals domains in globular proteins[J].Protein Engineering,Design and Selection,2003,16(3):161-167.
    [43] Kundu S,Sorensen D C,Phillips G N.Automatic domain decomposition of proteins by a Gaussian Network Model[J].Proteins,2004,57(4):725-733.
    [44] Bondugula R,Lee M S,Wallqvist A.FIEFDom:A transparent domain boundary recognition system using a fuzzy mean operator[J].Nucleic Acids Research,2009,37(2):452-462.
    [45] Fiser A,Sali A.Modeller:Generation and refinement of homology-based protein structure models[J].Methods Enzymol,2003,374:461-491.
    [46] Sanchez R,Sali A.Evaluation of comparative protein structure modeling by MODELLER-3[J].Proteins,1997(S1):50-58.
    [47] Zhang Y.I-TASSER server for protein 3D structure prediction[J].BMC Bioinformatics,2008,9(1):40.
    [48] George R A,Heringa J.SnapDRAGON:A method to delineate protein structural domains from sequence data[J].Journal of Molecular Biology,2002,316(3):839-851.
    [49] Kim D E,Chivian D,Malmstrom L,et al.Automated prediction of domain boundaries in CASP6 targets using Ginzu and RosettaDOM[J].Proteins-Structure Function and Bioinformatics,2005,61(S7):193-200.
    [50] Wu Y,Dousis A D,Chen M,et al.OPUS-Dom:Applying the folding-based method VECFOLD to determine protein domain boundaries[J].Journal of Molecular Biology,2009,385(4):1314-1329.
    [51] Joshi R R,Samant V V.Bayesian data mining of protein domains gives an efficient predictive algorithm and new insight[J].Journal of Molecular Modeling,2007,13(1):275-282.
    [52] Suyama M,Ohara O.DomCut:Prediction of inter-domain linker regions in amino acid sequences[J].Bioinformatics,2003,19(5):673-674.
    [53] Dumontier M,Yao R,Feldman H J,et al.Armadillo:Domain boundary prediction by amino acid composition[J].Journal of Molecular Biology,2005,350(5):1061-1073.
    [54] Sim J,Kim S Y,Lee J.PPRODO:Prediction of protein domain boundaries using neural networks[J].Proteins,2005,59(3):627-632.
    [55] Yoo P D,Sikder A R,Taheri J,et al.DomNet:Protein domain boundary prediction using enhanced general regression network and new profiles[J].IEEE Trans Nanobioscience,2008,7(2):172-181.
    [56] Zou S,Huang Y,Wang Y,et al.A novel method for prediction of protein domain using distance-based maximal entropy[C]//Advances in Neural Networks-ISNN 2007,Berlin Heidelberg:Springer,2007:1264-1272.
    [57] Zou S,Huang Y,Wang Y,et al.Prediction of protein domains from sequence information using support vector machines[C]//Advances in Neural Networks-ISNN 2006,Berlin Heidelberg:Springer,2006:674-681.
    [58] Li B Q,Hu L L,Chen L,et al.Prediction of protein domain with mRMR feature selection and analysis[J].PLoS One,2012,7(6):e39308.
    [59] Zhang X Y,Lu L J,Song Q,et al.DomHR:Accurately identifying domain boundaries in proteins using a hinge region strategy[J].PLoS One,2013,8(4):e60559.
    [60] Cheng J.DOMAC:An accurate,hybrid protein domain prediction server[J].Nucleic Acids Research,2007,35:W354-W356.
    [61] Saini H K,Fischer D.Meta-DP:Domain prediction meta-server[J].Bioinformatics,2005,21(12):2917-2920.
    [62] Sikder A R,Zomaya A Y.Inferring boundary information of discontinuous-domain proteins[J].IEEE Trans Nanobioscience,2008,7(3):200-205.
    [63] Sigrist C J,Cerutti L,De Castro E,et al.PROSITE:A protein domain database for functional characterization and annotation[J].Nucleic Acids Research,2010,38:D161-D166.