Algorithm Development for Sequence-Based Prediction of Protein Functional Sites
Abstract
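Identifying protein functional sites is important for a deeper understanding of the biological functions of proteins, and predicting such sites with computational methods is an important topic in bioinformatics. In this thesis, the author studied the prediction of two kinds of protein functional sites: ubiquitination sites and zinc-binding sites. First, based on the sequence features of ubiquitination sites in yeast and human, the author developed two prediction tools, CKSAAP_UbSite for yeast and hCKSAAP_UbSite for human. Next, using datasets from four species, the author systematically evaluated the performance of existing ubiquitination site prediction tools. Finally, the author analyzed the sequence features of protein zinc-binding sites and integrated multiple prediction methods and features to develop a new sequence-based tool for zinc-binding site prediction.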
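     As an important reversible post-translational modification, protein ubiquitination is involved in numerous biological processes and is closely linked to many diseases. Identifying ubiquitination sites is the first, and a particularly important, step toward understanding ubiquitination-related biological processes and molecular mechanisms. The author therefore developed a yeast-specific ubiquitination site prediction tool named CKSAAP_UbSite, based on the sequence features surrounding ubiquitination sites in yeast. CKSAAP_UbSite applies the CKSAAP encoding to ubiquitination site prediction for the first time and builds its prediction model with a support vector machine. For the convenience of the academic community, an online server (http://protein.cau.edu.cn/cksaap_ubsite/) and accompanying software were developed to run the CKSAAP_UbSite algorithm. CKSAAP_UbSite can also be used to predict ubiquitination sites across a whole proteome.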
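     With the development of mass spectrometry-based proteomics, tens of thousands of human ubiquitination sites have been determined experimentally. To address the complex sequence features surrounding human ubiquitination sites, the author developed a human-specific ubiquitination site prediction tool by integrating multiple complementary prediction methods. First, a prediction model was built with the CKSAAP encoding and a support vector machine. Next, to further mine the sequence features around human ubiquitination sites, the author used support vector machines with the orthogonal (binary) encoding, the physicochemical property encoding, and the protein aggregation propensity encoding to build three more prediction models. Finally, the outputs of the four models were integrated by logistic regression to build hCKSAAP_UbSite. In 5-fold cross-validation, hCKSAAP_UbSite reached an AUC (area under the ROC curve) of 0.770. For user convenience, the hCKSAAP_UbSite algorithm was integrated into the CKSAAP_UbSite online server.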
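     In recent years, many ubiquitination site prediction tools have been developed. These tools differ considerably in the classification algorithms they adopt, the features they use, and the species their datasets come from, which leaves users unsure which tool to choose. To address this problem, the author collected datasets from four different species and comprehensively compared the prediction performance of five tools. The author then evaluated the ease of use of each tool from the user's perspective, to help users choose a prediction tool quickly and efficiently. Finally, the author tested the ability of several commonly used encoding features to predict ubiquitination sites and ranked them, to find which features predict well in a given species.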
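     As an important trace element, zinc is closely associated with many biological processes and diseases, and zinc ions play an important role in enabling proteins to carry out their functions. Given this biological importance, the author proposed ZincExplorer, a new sequence-based method for predicting zinc-binding sites. ZincExplorer is an ensemble algorithm that integrates the results of three prediction methods (an SVM-based predictor, a cluster-based predictor, and a template-based predictor) and can make predictions for four residue types (CYS, HIS, ASP, and GLU). In 5-fold cross-validation, ZincExplorer reached an AURPC (area under the recall-precision curve) of 0.851; at a recall of 70%, its precision reached 85.6% (specificity = 98.4%, MCC = 0.747). ZincExplorer can also identify the interdependent relationships (IRs) among multiple residues bound to the same zinc ion. Finally, the author set up an online server (http://protein.cau.edu.cn/ZincExplorer/) that runs the ZincExplorer algorithm and is free for academic use.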
Identification of protein functional sites is of great importance for understanding the biological functions of protein molecules, and in silico prediction of protein functional sites has become an important topic in bioinformatics. In this thesis, the author focused on the prediction of two kinds of protein functional sites: ubiquitination sites and zinc-binding sites. First, based on the characteristics of ubiquitination sites in yeast and human, the author developed two species-specific ubiquitination site prediction tools, CKSAAP_UbSite and hCKSAAP_UbSite. Then, the author conducted a comprehensive evaluation of existing ubiquitination site prediction tools on four datasets from different species. Finally, after an intensive feature analysis of zinc-binding and non-zinc-binding sites, multiple prediction methods and features were integrated into a prediction tool named ZincExplorer.
     As one of the most important reversible protein post-translational modifications (PTMs), ubiquitination is involved in many biological processes and is closely implicated in various diseases. To fully decipher the molecular mechanisms of ubiquitination-related biological processes, an initial but crucial step is the recognition of ubiquitylated substrates and their ubiquitination sites. The author first developed a new bioinformatics tool named CKSAAP_UbSite to predict ubiquitination sites from protein sequences in yeast. CKSAAP_UbSite uses a support vector machine (SVM), and its distinguishing feature is that it takes as input the composition of k-spaced amino acid pairs (CKSAAP) surrounding a query site (i.e. any lysine in a query sequence). To support the community's research, a web server for CKSAAP_UbSite was constructed and is freely available at http://protein.cau.edu.cn/cksaap_ubsite/; it can also be used for proteome-wide ubiquitination site identification.
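The CKSAAP encoding mentioned above can be illustrated with a minimal Python sketch. It is not the CKSAAP_UbSite implementation: the padding symbol `X`, the default flank of 13 residues, the default `k_max` of 5, and all function names are assumptions chosen for illustration only. For each gap size k, the sketch counts every residue pair separated by exactly k positions inside a window centred on a lysine and normalizes by the number of such pairs.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical symbol for positions beyond the sequence ends
ALPHABET = AMINO_ACIDS + PAD
PAIRS = ["".join(p) for p in product(ALPHABET, repeat=2)]  # 21 * 21 = 441 pairs


def window_around(sequence, site, flank):
    """Return the 2*flank+1 residue window centred on `site` (0-based).

    Positions outside the sequence, and any non-standard residue letters,
    are replaced by PAD (an assumption of this sketch).
    """
    padded = PAD * flank + sequence + PAD * flank
    window = padded[site:site + 2 * flank + 1]
    return "".join(c if c in AMINO_ACIDS else PAD for c in window)


def cksaap(window, k_max):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each k, count every residue pair separated by exactly k positions
    and divide by the number of such pairs in the window.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(window) - k - 1
        for i in range(n_pairs):
            counts[window[i] + window[i + k + 1]] += 1
        features.extend(counts[p] / n_pairs for p in PAIRS)
    return features


def encode_lysines(sequence, flank=13, k_max=5):
    """Encode every lysine (K) in `sequence`; returns {position: feature vector}."""
    return {
        i: cksaap(window_around(sequence, i, flank), k_max)
        for i, residue in enumerate(sequence)
        if residue == "K"
    }
```

Each lysine yields a fixed-length vector of (k_max + 1) × 441 values, which is the kind of input a classifier such as an SVM can be trained on.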
     Recent developments in mass spectrometry (MS)-based proteomics have greatly expedited proteome-wide analysis of PTMs, and more than ten thousand ubiquitination sites in human have been determined. Given the complicated sequence context of human ubiquitination sites, the author developed a novel human-specific ubiquitination site predictor by integrating multiple complementary classifiers. First, an SVM classifier was constructed based on the CKSAAP encoding, which had been used in the earlier yeast ubiquitination site predictor. To further exploit the patterns and properties of the ubiquitination sites and their flanking residues, three additional SVM classifiers were constructed using the binary amino acid encoding, the AAindex physicochemical property encoding, and the protein aggregation propensity encoding, respectively. Through an integration based on logistic regression, the resulting predictor, termed hCKSAAP_UbSite, achieved an area under the ROC curve (AUC) of 0.770 in a 5-fold cross-validation test on a class-balanced training dataset. For the convenience of users, hCKSAAP_UbSite has been integrated into the existing CKSAAP_UbSite server.
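The two ideas in this paragraph, combining several classifier scores through a logistic function and scoring the result by AUC, can be sketched in plain Python. This is an illustration under stated assumptions, not the hCKSAAP_UbSite code: the weights and intercept below are placeholders (in a real logistic regression they would be fitted on training data), the toy scores and labels are made up, and the function names are hypothetical.

```python
import math


def combine_scores(scores, weights, bias):
    """Logistic combination of per-classifier scores into one probability."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive outscores a
    randomly chosen negative, counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: each row holds four made-up classifier scores for one candidate lysine.
toy_scores = [
    (0.9, 0.7, 0.8, 0.6),
    (0.2, 0.4, 0.1, 0.3),
    (0.6, 0.5, 0.7, 0.4),
    (0.3, 0.6, 0.2, 0.5),
]
toy_labels = [1, 0, 1, 0]
weights = (1.0, 1.0, 1.0, 1.0)  # placeholder weights, not fitted values
bias = -2.0                     # placeholder intercept

combined = [combine_scores(row, weights, bias) for row in toy_scores]
print(auc(toy_labels, combined))
```

The pairwise AUC computation is quadratic in the number of samples, which is fine for a sketch but would normally be replaced by a sort-based version on large datasets.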
     In the past several years, several tools have been developed for the prediction of ubiquitination sites, but users are frequently confused by the differences in the prediction algorithms adopted, the features selected, and the performance in different species. To address this problem, the author first compared and analyzed five popular standalone or web-server tools on four large datasets from different species. Then, the author summarized the ease of use of the tools under investigation, to help users choose among them more efficiently. Finally, the author tested most of the features used in previous prediction tools and ranked them by performance, to find out which features contribute significantly to predicting ubiquitination sites in a specific species.
     As one of the most important trace elements within an organism, zinc has been shown to be involved in numerous biological processes and closely implicated in various diseases. The zinc ion is important for proteins to perform their functional roles. Motivated by the biological importance of zinc, the author proposed a new method called ZincExplorer to predict zinc-binding sites from protein sequences. ZincExplorer is an accurate hybrid method that integrates the outputs of three different types of predictors, namely SVM-, cluster-, and template-based predictors. Four types of zinc-binding amino acids, the CHEDs (i.e. CYS, HIS, ASP and GLU), can be predicted using ZincExplorer. It achieved a high AURPC (area under the recall-precision curve) of 0.851 and a precision of 85.6% (specificity = 98.4%, MCC = 0.747) at 70.0% recall for the CHEDs in a 5-fold cross-validation test. Moreover, ZincExplorer can also identify the interdependent relationships (IRs) of the predicted zinc-binding sites bound to the same zinc ion, which makes it a useful tool for providing in-depth zinc-binding site annotation. To support the research community, an online web server for ZincExplorer was constructed, which is freely accessible at http://protein.cau.edu.cn/ZincExplorer/.
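The evaluation measures quoted above (recall, precision, specificity, and the Matthews correlation coefficient, MCC) all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name is hypothetical and the counts in the usage line are made-up illustrative numbers, not results from the thesis.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    recall = tp / (tp + fn)          # fraction of true sites that are found
    precision = tp / (tp + fp)       # fraction of predicted sites that are true
    specificity = tn / (tn + fp)     # fraction of non-sites correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "mcc": mcc,
    }


# Made-up counts for illustration: 100 true sites and 900 non-sites.
metrics = binary_metrics(tp=70, fp=10, tn=890, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because binding residues are usually far rarer than non-binding ones, specificity tends to stay high even for weak predictors, which is one reason precision at a fixed recall and MCC are reported alongside it.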
引文
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang,J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST:a new generation of protein database search programs. Nucleic Acids Res.25,3389-3402.
    Amitai, G., Shemesh, A., Sitbon, E., Shklar, M., Netanely, D., Venger, I., and Pietrokovski, S. (2004). Network analysis of protein structures identifies functional residues. J. Mol. Biol.344,1135-1146.
    Andreini, C., Banci, L., Bertini, I., Elmi, S., and Rosato, A. (2007). Non-heme iron through the three domains of life. Proteins 67,317-324.
    Andreini, C., Banci, L., Bertini, I., and Rosato, A. (2006). Counting the zinc-proteins encoded in the human genome. J. Proteome Res.5,196-201.
    Andreini, C., Banci, L., Bertini, I., and Rosato, A. (2008). Occurrence of copper proteins through the three domains of life:a bioinformatic approach. J. Proteome Res.7,209-216.
    Anfinsen, C.B. (1973). Principles that govern the folding of protein chains. Science 181,223-230.
    Argos, P., Rao, J.K., and Hargrave, P.A. (1982). Structural prediction of membrane-bound proteins. Eur. J. Biochem.128,565-575.
    Babor, M., Gerzon, S., Raveh, B., Sobolev, V., and Edelman, M. (2008). Prediction of transition metal-binding sites from apo protein structures. Proteins 70,208-217.
    Bahar, I., Lezon, T.R., Yang, L.W., and Eyal, E. (2010). Global dynamics of proteins:bridging between structure and function. Annu. Rev. Biophys.39,23-42.
    Berman, H.M., Battistuz, T., Bhat, T.N., Bluhm, W.F., Bourne, P.E., Burkhardt, K., Feng, Z., Gilliland, G.L., Iype, L., Jain, S., et al. (2002). The Protein Data Bank. Acta Crystallogr., Sect. D:Biol. Crystallogr.58,899-907.
    Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer, E.F., Jr., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and Tasumi, M. (1977). The Protein Data Bank. A computer-based archival file for macromolecular structures. Eur. J. Biochem.80,319-324.
    Berry, E.A., Dalby, A.R., and Yang, Z.R. (2004). Reduced bio basis function neural network for identification of protein phosphorylation sites:comparison with pattern recognition algorithms. Comput. Biol. Chem.28,75-85.
    Bettmer, J. (2005). Metalloproteomics:a challenge for analytical chemists. Anal. Bioanal. Chem.383, 370-371.
    Bienkowska, J.R., Dalgin, G.S., Batliwalla, F., Allaire, N., Roubenoff, R., Gregersen, P.K., and Carulli, J.P. (2009). Convergent Random Forest predictor:methodology for predicting drug response from genome-scale data applied to anti-TNF response. Genomics 94,423-432.
    Blom, N., Gammeltoft, S., and Brunak, S. (1999). Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol.294,1351-1362.
    Blom, N., Sicheritz-Ponten, T., Gupta, R., Gammeltoft, S., and Brunak, S. (2004). Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 4,1633-1649.
    Bordner, A.J. (2008). Predicting small ligand binding sites in proteins using backbone structure. Bioinformatics 24,2865-2871.
    Cai, Y, Huang, T., Hu, L., Shi, X., Xie, L., and Li, Y. (2012). Prediction of lysine ubiquitination with mRMR feature selection and analysis. Amino Acids 42,1387-1395.
    Capra, J.A., and Singh, M. (2007). Predicting functionally important residues from sequence conservation. Bioinformatics 23,1875-1882.
    Centor, R.M. (1991). Signal detectability:the use of ROC curves and their analyses. Med. Decis. Making 11,102-106.
    Chang, C.C., and Lin, C.J. (2011). LIBSVM:A library for support vector machines. ACM Trans. Intell. Syst. Technol.2,1-27.
    Chasapis, C.T., Loutsidou, A.C., Spiliopoulou, C.A., and Stefanidou, M.E. (2012). Zinc and human health:an update. Arch. Toxicol.86,521-534.
    Chen, K., Jiang, Y., Du, L., and Kurgan, L. (2009). Prediction of integral membrane protein type by collocated hydrophobic amino acid pairs. J. Comput. Chem.30,163-172.
    Chen, K., Kurgan, L., and Rahbari, M. (2007a). Prediction of protein crystallization using collocation of amino acid pairs. Biochem. Biophys. Res. Commun.355,764-769.
    Chen, K., Kurgan, L.A., and Ruan, J. (2007b). Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs. BMC Struct. Biol.7,25.
    Chen, K., Kurgan, L.A., and Ruan, J. (2008a). Prediction of protein structural class using novel evolutionary collocation-based sequence representation. J. Comput. Chem.29,1596-1604.
    Chen, P.C., Na, C.H., and Peng, J. (2012). Quantitative proteomics to decipher ubiquitin signaling. Amino Acids 43,1049-1060.
    Chen, T., Zhou, T., He, B., Yu, H., Guo, X., Song, X., and Sha, J. (2014). mUbiSiDa:A Comprehensive Database for Protein Ubiquitination Sites in Mammals. PLoS One 9, e85744.
    Chen, X., Qiu, J.D., Shi, S.P., Suo, S.B., Huang, S.Y., and Liang, R.P. (2013a). Incorporating Key Position and Amino Acid Residue Features to Identify General and Species-specific Ubiquitin Conjugation Sites. Bioinformatics 29,1614-1622.
    Chen, Y.Z., Tang, Y.R., Sheng, Z.Y., and Zhang, Z. (2008b). Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs. BMC Bioinformatics 9,101.
    Chen, Z., Chen, Y.Z., Wang, X.F., Wang, C., Yan, R.X., and Zhang, Z. (2011). Prediction of ubiquitination sites by using the composition of k-spaced amino acid pairs. PLoS One 6, e22930.
    Chen, Z., Wang, Y, Zhai, Y.F., Song, J., and Zhang, Z. (2013b). ZincExplorer:an accurate hybrid method to improve the prediction of zinc-binding sites from protein sequences. Mol. Biosyst.9, 2213-2222.
    Chen, Z., Zhou, Y., Song, J., and Zhang, Z. (2013c). hCKSAAP_UbSite:Improved prediction of human ubiquitination sites by exploiting amino acid pattern and properties. Biochim. Biophys. Acta 1834, 1461-1467.
    Chen, Z.J., and Sun, L.J. (2009). Nonproteolytic functions of ubiquitin in cell signaling. Mol, Cell 33, 275-286.
    Chernorudskiy, A.L., Garcia, A., Eremin, E.V., Shorina, A.S., Kondratieva, E.V., and Gainullin, M.R. (2007). UbiProt:a database of ubiquitylated proteins. BMC Bioinformatics 8,126.
    Chien, T.Y., Chang, D.T., Chen, C.Y., Weng, Y.Z., and Hsu, C.M. (2008). E1DS:catalytic site prediction based on 1D signatures of concurrent conservation. Nucleic Acids Res.36, W291-296.
    Coleman, J.E. (1992). Zinc proteins:enzymes, storage proteins, transcription factors, and replication proteins. Annu. Rev. Biochem.61,897-946.
    Crooks, G.E., Hon, G., Chandonia, J.M., and Brenner, S.E. (2004). WebLogo:a sequence logo generator. Genome Res.14,1188-1190.
    Danielsen, J.M., Sylvestersen, K.B., Bekker-Jensen, S., Szklarczyk, D., Poulsen, J.W., Horn, H., Jensen, L.J., Mailand, N., and Nielsen, M.L. (2011). Mass spectrometric analysis of lysine ubiquitylation reveals promiscuity at site level. Mol. Cell. Proteomics 10, M110 003590.
    Degtyarenko, K. (2000). Bioinorganic motifs:towards functional classification of metalloproteins. Bioinformatics 16,851-864.
    Dekker, J.P., Fodor, A., Aldrich, R.W., and Yellen, G. (2004). A perturbation-based method for calculating explicit likelihood of evolutionary co-variance in multiple sequence alignments. Bioinformatics 20,1565-1572.
    Dikic, I., Wakatsuki, S., and Walters, K.J. (2009). Ubiquitin-binding domains-from structures to functions. Nat. Rev. Mol. Cell Biol.10,659-671.
    Ding, C., and Peng, H. (2005). Minimum redundancy feature selection from microarray gene expression data. J. Bioinform. Comput. Biol.3,185-205.
    Dinkel, H., Chica, C., Via, A., Gould, C.M., Jensen, L.J., Gibson, T.J., and Diella, F. (2011). Phospho.ELM:a database of phosphorylation sites--pdate 2011. Nucleic Acids Res.39, D261-267.
    Dinkel, H., Van Roey, K., Michael, S., Davey, N.E., Weatheritt, R.J., Born, D., Speck, T., Kruger, D., Grebnev, G., Kuban, M., et al. (2014). The eukaryotic linear motif resource ELM:10 years and counting. Nucleic Acids Res.42, D259-266.
    Dosztanyi, Z., Csizmok, V, Tompa, P., and Simon, I. (2005). The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J. Mol. Biol.347,827-839.
    Du, Y., Xu, N., Lu, M., and Li, T. (2011). hUbiquitome:a database of experimentally verified ubiquitination cascades in humans. Database (Oxford) 2011, bar055.
    Dukka, B.K., and Livesay, D.R. (2008). Improving position-specific predictions of protein functional sites using phylogenetic motifs. Bioinformatics 24,2308-2316.
    Dye, B.T., and Schulman, B.A. (2007). Structural mechanisms underlying posttranslational modification by ubiquitin-like proteins. Annu. Rev. Biophys. Biomol. Struct.36,131-150.
    Ebert, J.C., and Altman, R.B. (2008). Robust recognition of zinc binding sites in proteins. Protein Sci. 17,54-65.
    Eisenberg, D. (1984). Three-dimensional structure of membrane and surface proteins. Annu. Rev. Biochem.53,595-623.
    Fauchere, J.L., Charton, M., Kier, L.B., Verloop, A., and Pliska, V. (1988). Amino acid side chain parameters for correlation studies in biology and pharmacology. Int. J. Pept. Protein Res.32, 269-278.
    Finley, D. (2009). Recognition and processing of ubiquitin-protein conjugates by the proteasome. Annu. Rev. Biochem.78,477-513.
    Fischer, J.D., Mayer, C.E., and Soding, J. (2008). Prediction of protein functional residues from sequence by probability density estimation. Bioinformatics 24,613-620.
    Fukushima, K. (1975). Cognitron:a self-organizing multilayered neural network. Biol. Cybern.20, 121-136.
    Garavelli, J.S. (2003). The RESID Database of Protein Modifications:2003 developments. Nucleic Acids Res.31,499-501.
    Garbuzynskiy, S.O., Lobanov, M.Y., and Galzitskaya, O.V. (2010). FoldAmyloid:a method of prediction of amyloidogenic regions from protein sequence. Bioinformatics 26,326-332.
    Gentry, M.S., Worby, C.A., and Dixon, J.E. (2005). Insights into Lafora disease:malin is an E3 ubiquitin ligase that ubiquitinates and promotes the degradation of laforin. Proc. Natl. Acad. Sci. U. S.A.102,8501-8506.
    George, D.G., Dodson, R.J., Garavelli, J.S., Haft, D.H., Hunt, L.T., Marzec, C.R., Orcutt, B.C., Sidman, K.E., Srinivasarao, G.Y., Yeh, L.S., et al. (1997). The Protein Information Resource (PIR) and the PIR-International Protein Sequence Database. Nucleic Acids Res.25,24-28.
    Glickman, M.H., and Ciechanover, A. (2002). The ubiquitin-proteasome proteolytic pathway: destruction for the sake of construction. Physiol. Rev.82,373-428.
    Gobel, U., Sander, C., Schneider, R., and Valencia, A. (1994). Correlated mutations and residue contacts in proteins. Proteins 18,309-317.
    Gong, H., Liu, X., Wu, J., and He, Z. (2013). Data construction for phosphorylation site prediction. Brief Bioinform, (DOI:10.1093/bib/bbt012).
    Gong, S., Park, C., Choi, H., Ko, J., Jang, I., Lee, J., Bolser, D.M., Oh, D., Kim, D.S., and Bhak, J. (2005). A protein domain interaction interface database:InterPare. BMC Bioinformatics 6,207.
    Goyal, K., and Mande, S.C. (2008). Exploiting 3D structural templates for detection of metal-binding sites in protein structures. Proteins 70,1206-1218.
    Gribskov, M., and Robinson, N.L. (1996). Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Comput. Chem.20,25-33.
    Gutteridge, A., Bartlett, G.J., and Thornton, J.M. (2003). Using a neural network and spatial clustering to predict the location of active sites in enzymes. J. Mol. Biol.330,719-734.
    Hagai, T., Azia, A., Toth-Petroczy, A., and Levy, Y. (2011). Intrinsic disorder in ubiquitination substrates. J. Mol. Biol.412,319-324.
    Hagai, T., and Levy, Y. (2010). Ubiquitin not only serves as a tag but also assists degradation by inducing protein unfolding. Proc. Natl. Acad. Sci. U. S. A.107,2001-2006.
    Hagai, T., Toth-Petroczy, A., Azia, A., and Levy, Y. (2012). The origins and evolution of ubiquitination sites. Mol. Biosyst.8,1865-1877.
    Haglund, K., and Dikic, I. (2005). Ubiquitylation and cell signaling. EMBO J.24,3353-3359.
    Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I.H. (2009). The WEKA data mining software:an update. SIGKDD Explor. Newsl.11,10-18.
    Halperin, I., Glazer, D.S., Wu, S., and Altman, R.B. (2008). The FEATURE framework for protein function annotation:modeling new functions, improving performance, and extending to novel applications. BMC Genomics 9 Suppl 2, S2.
    Han, L., Zhang, Y.J., Song, J., Liu, M.S., and Zhang, Z. (2012a). Identification of Catalytic Residues Using a Novel Feature that Integrates the Microenvironment and Geometrical Location Properties of Residues. PLoS One 7, e41370.
    Han, Y, Lee, H., Park, J.C., and Yi, G.S. (2012b). E3Net:a system for exploring E3-mediated regulatory networks of cellular functions. Mol. Cell. Proteomics 11, O111 014076.
    Harris, J.L., Alper, P.B., Li, J., Rechsteiner, M., and Backes, B.J. (2001). Substrate specificity of the human proteasome. Chem. Biol.8,1131-1141.
    Hershko, A., and Ciechanover, A. (1998). The ubiquitin system. Annu. Rev. Biochem.67,425-479.
    Hicke, L. (2001). Protein regulation by monoubiquitin. Nat. Rev. Mol. Cell Biol.2,195-201.
    Jennissen, H.P. (1995). Ubiquitin and the enigma of intracellular protein degradation. Eur. J. Biochem. 231,1-30.
    Joachims, T. (2002). Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (Edmonton, Alberta, Canada:ACM), pp.133-142.
    Johnson, E.S., Ma, P.C., Ota, I.M., and Varshavsky, A. (1995). A proteolytic pathway that recognizes ubiquitin as a degradation signal. J. Biol. Chem.270,17442-17456.
    Jones, D.T. (1999a). GenTHREADER:an efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol.287,797-815.
    Jones, D.T. (1999b). Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol.292,195-202.
    Kaganovich, D., Kopito, R., and Frydman, J. (2008). Misfolded proteins partition between two distinct quality control compartments. Nature 454,1088-1095.
    Kawashima, S., and Kanehisa, M. (2000). AAindex:amino acid index database. Nucleic Acids Res.28, 374.
    Kawashima, S., Ogata, H., and Kanehisa, M. (1999). AAindex:Amino Acid Index Database. Nucleic Acids Res.27,368-369.
    Kawashima, S., Pokarowski, P., Pokarowska, M., Kolinski, A., Katayama, T., and Kanehisa, M. (2008). AAindex:amino acid index database, progress report 2008. Nucleic Acids Res.36, D202-205.
    Keshava Prasad, T.S., Goel, R., Kandasamy, K., Keerthikumar, S., Kumar, S., Mathivanan, S., Telikicherla, D., Raju, R., Shafreen, B., Venugopal, A., et al. (2009). Human Protein Reference Database--2009 update. Nucleic Acids Res.37, D161-112.
    Kim, D.Y., Scalf, M., Smith, L.M., and Vierstra, R.D. (2013). Advanced proteomic analyses yield a deep catalog of ubiquitylation targets in Arabidopsis. Plant Cell 25,1523-1540.
    Kim, P.K., Hailey, D.W., Mullen, R.T., and Lippincott-Schwartz, J. (2008). Ubiquitin signals autophagic degradation of cytosolic proteins and peroxisomes. Proc. Natl. Acad. Sci. U. S. A.105, 20567-20574.
    Kim, W., Bennett, E.J., Huttlin, E.L., Guo, A., Li, J., Possemato, A., Sowa, M.E., Rad, R., Rush, J., Comb, M.J., et al. (2011). Systematic and quantitative assessment of the ubiquitin-modified proteome. Mol. Cell 44,325-340.
    Kumar, K.K., Pugalenthi, G., and Suganthan, P.N. (2009). DNA-Prot:identification of DNA binding proteins from protein sequence information using random forest. J. Biomol. Struct. Dyn.26, 679-686.
    La, D., Sutch, B., and Livesay, D.R. (2005). Predicting protein functional sites with phylogenetic motifs. Proteins 58,309-320.
    Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K.,' Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409,860-921.
    Lee, H., Yi, G.S., and Park, J.C. (2008a). E3Miner:a text mining tool for ubiquitin-protein ligases. Nucleic Acids Res.36, W416-422.
    Lee, M.H., Zhao, R., Phan, L., and Yeung, S.C. (2011a). Roles of COP9 signalosome in cancer. Cell Cycle 10,3057-3066.
    Lee, T.Y., Chen, S.A., Hung, H.Y., and Ou, Y.Y. (2011b). Incorporating distant sequence features and radial basis function networks to identify ubiquitin conjugation sites. PLoS One 6, el7331.
    Lee, W.C., Lee, M., Jung, J.W., Kim, K.P., and Kim, D. (2008b). SCUD:Saccharomyces cerevisiae ubiquitination database. BMC Genomics 9,440.
    Li, H., Xing, X., Ding, G., Li, Q., Wang, C., Xie, L., Zeng, R., and Li, Y. (2009). SysPTM:a systematic resource for proteomic research on post-translational modifications. Mol. Cell. Proteomics 8, 1839-1849.
    Li, W., Bengtson, M.H., Ulbrich, A., Matsuda, A., Reddy, V.A., Orth, A., Chanda, S.K., Batalov, S., and Joazeiro, C.A. (2008). Genome-wide and functional annotation of human E3 ubiquitin ligases identifies MULAN, a mitochondrial E3 that regulates the organelle's dynamics and signaling. PLoS One 3, e1487.
    Li, Y, Kong, Y, Zhou, Z., Chen, H., Wang, Z., Hsieh, Y.C., Zhao, D., Zhi, X., Huang, J., Zhang, J., et al. (2013). The HECTD3 E3 ubiquitin ligase facilitates cancer cell survival by promoting K63-linked polyubiquitination of caspase-8. Cell Death Dis.4, (DOI:10.1038/cddis.2013.464).
    Liu, Z.P., Wu, L.Y., Wang, Y., Zhang, X.S., and Chen, L. (2010). Prediction of protein-RNA binding sites by a random forest method with combined features. Bioinformatics 26,1616-1622.
    Lobinski, R., Moulin, C., and Ortega, R. (2006). Imaging and speciation of trace elements in biological environment. Biochimie 88,1591-1604.
    Lockless, S.W., and Ranganathan, R. (1999). Evolutionaly conserved pathways of energetic connectivity in protein families. Science 286,295-299.
    Lu, C.T., Huang, K.Y., Su, M.G., Lee, T.Y., Bretana, N.A., Chang, W.C., Chen, Y.J., Chen, Y.J., and Huang, H.D. (2013). DbPTM 3.0:an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications. Nucleic Acids Res.41, D295-305.
    Mann, M., and Jensen, O.N. (2003). Proteomic analysis of post-translational modifications. Nat. Biotechnol.21,255-261.
    McGuffin, L.J., Bryson, K., and Jones, D.T. (2000). The PSIPRED protein structure prediction server. Bioinformatics 16,404-405.
    Mertins, P., Qiao, J.W., Patel, J., Udeshi, N.D., Clauser, K.R., Mani, D.R., Burgess, M.W., Gillette, M.A., Jaffe, J.D., and Carr, S.A. (2013). Integrated proteomic analysis of post-translational modifications by serial enrichment. Nat. Methods 10,634-637.
    Mittelman, D., Sadreyev, R., and Grishin, N. (2003). Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments. Bioinformatics 19,1531-1539.
    Mocchegiani, E., Muzzioli, M., and Giacconi, R. (2000). Zinc and immunoresistance to infection in aging:new biological tools. Trends Pharmacol. Sci.21,205-208.
    Mooney, S.D., Liang, M.H., DeConde, R., and Altman, R.B. (2005). Structural characterization of proteins using residue environments. Proteins 61,741-747.
    Neduva, V., Linding, R., Su-Angrand, I., Stark, A., de Masi, F., Gibson, T.J., Lewis, J., Serrano, L., and Russell, R.B. (2005). Systematic discovery of new recognition peptides mediating protein interaction networks. PLoS Biol.3, e405.
    Neutzner, M., and Neutzner, A. (2012). Enzymes of ubiquitination and deubiquitination. Essays. Biochem.52,37-50.
    Obradovic, Z., Peng, K., Vucetic, S., Radivojac, P., and Dunker, A.K. (2005). Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins 61 Suppl 7,176-182.
    Pagani, I., Liolios, K., Jansson, J., Chen, I.M., Smirnova, T., Nosrat, B., Markowitz, V.M., and Kyrpides, N.C. (2012). The Genomes OnLine Database (GOLD) v.4:status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res.40, D571-579.
    Passerini, A., Andreini, C., Menchetti, S., Rosato, A., and Frasconi, P. (2007). Predicting zinc binding at the proteome level. BMC Bioinformatics 8,39.
    Passerini, A., Lippi, M., and Frasconi, P. (2011). MetalDetector v2.0:predicting the geometry of metal binding sites from protein sequence. Nucleic Acids Res.39, W288-292.
    Passerini, A., Punta, M., Ceroni, A., Rost, B., and Frasconi, P. (2006). Identifying cysteines and histidines in transition-metal-binding sites using support vector machines and neural networks. Proteins 65,305-316.
    Pavlidis, P., Wapinski, I., and Noble, W.S. (2004). Support vector machine classification on the web. Bioinformatics 20,586-587.
    Pechmann, S., Levy, E.D., Tartaglia, G.G., and Vendruscolo, M. (2009). Physicochemical principles that regulate the competition between functional and dysfunctional association of proteins. Proc. Natl. Acad. Sci. U. S. A.106,10159-10164.
    Peng, J., Schwartz, D., Elias, J.E., Thoreen, C.C., Cheng, D., Marsischky, G., Roelofs, J., Finley, D., and Gygi, S.P. (2003). A proteomics approach to understanding protein ubiquitination. Nat. Biotechnol. 21,921-926.
    Peng, K., Radivojac, P., Vucetic, S., Dunker, A.K., and Obradovic, Z. (2006). Length-dependent prediction of protein intrinsic disorder. BMC Bioinformatics 7,208.
    Petrova, N.V., and Wu, C.H. (2006). Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties. BMC Bioinformatics 7,312.
    Pickart, C.M. (2001a). Mechanisms underlying ubiquitination. Annu. Rev. Biochem 70,503-533.
    Pickart, C.M. (2001b). Ubiquitin enters the new millennium. Mol. Cell 8,499-504.
    Pickart, C.M., and Eddins, M.J. (2004). Ubiquitin:structures, functions, mechanisms. Biochim. Biophys. Acta 1695,55-72.
    Pickart, C.M., and Fushman, D. (2004). Polyubiquitin chains:polymeric protein signals. Curr. Opin. Chem.Biol.8,610-616.
    Porter, C.T., Bartlett, G.J., and Thornton, J.M. (2004). The Catalytic Site Atlas:a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res. 32, D129-133.
    Qi, Y., Klein-Seetharaman, J., and Bar-Joseph, Z. (2005). Random forest similarity for protein-protein interaction prediction from multiple sources. Pac. Symp. Biocomput,531-542.
    Radivojac, P., Vacic, V., Haynes, C., Cocklin, R.R., Mohan, A., Heyen, J.W., Goebl, M.G., and Iakoucheva, L.M. (2010). Identification, analysis, and prediction of protein ubiquitination sites. Proteins 78,365-380.
    Rink, L., and Gabriel, P. (2000). Zinc and the immune system. Proc. Nutr. Soc.59,541-552.
    Rost, B. (1999). Twilight zone of protein sequence alignments. Protein Eng.12,85-94.
    Sadowski, M., Suryadinata, R., Lai, X., Heierhorst, J., and Sarcevic, B. (2010). Molecular basis for lysine specificity in the yeast ubiquitin-conjugating enzyme Cdc34. Mol. Cell Biol.30,2316-2329.
    Sanger, F., Nicklen, S., and Coulson, A.R. (1992). DNA sequencing with chain-terminating inhibitors. 1977. Biotechnology 24,104-108.
    Sanz-Medel, A. (2005). From metalloproteomics to heteroatom-tagged proteomics. Anal. Bioanal. Chem.381,1-2.
    Schneider, T.D., and Stephens, R.M. (1990). Sequence logos:a new way to display consensus sequences. Nucleic Acids Res.18,6097-6100.
    Shao, J., Xu, D., Tsai, S.N., Wang, Y., and Ngai, S.M. (2009). Computational identification of protein methylation sites through bi-profile Bayes feature extraction. PLoS One 4, e4920.
    Sharifi, H.J., Furuya, A.M., and de Noronha, C.M. (2012). The role of HIV-1 Vpr in promoting the infection of nondividing cells and in cell cycle arrest. Curr. Opin. HIV AIDS 7, 187-194.
    Shi, W., and Chance, M.R. (2008). Metallomics and metalloproteomics. Cell Mol. Life Sci. 65, 3040-3048.
    Shi, W., Zhan, C., Ignatov, A., Manjasetty, B.A., Marinkovic, N., Sullivan, M., Huang, R., and Chance, M.R. (2005). Metalloproteomics: high-throughput structural and functional annotation of proteins in structural genomics. Structure 13, 1473-1486.
    Shu, N., Zhou, T., and Hovmoller, S. (2008). Prediction of zinc-binding sites in proteins from sequence. Bioinformatics 24, 775-782.
    Sigrist, C.J., de Castro, E., Cerutti, L., Cuche, B.A., Hulo, N., Bridge, A., Bougueleret, L., and Xenarios, I. (2013). New and continuing developments at PROSITE. Nucleic Acids Res. 41, D344-347.
    Spiro, R.G. (2002). Protein glycosylation: nature, distribution, enzymatic formation, and disease implications of glycopeptide bonds. Glycobiology 12, 43R-56R.
    Starita, L.M., Lo, R.S., Eng, J.K., von Haller, P.D., and Fields, S. (2012). Sites of ubiquitin attachment in Saccharomyces cerevisiae. Proteomics 12, 236-240.
    Stefanidou, M., Maravelias, C., Dona, A., and Spiliopoulou, C. (2006). Zinc: a multipurpose trace element. Arch. Toxicol. 80, 1-9.
    Tainer, J.A., Roberts, V.A., and Getzoff, E.D. (1991). Metal-binding sites in proteins. Curr. Opin. Biotechnol. 2, 582-591.
    Tang, Y.R., Chen, Y.Z., Canchaya, C.A., and Zhang, Z. (2007). GANNPhos: a new phosphorylation site predictor based on a genetic algorithm integrated neural network. Protein Eng. Des. Sel. 20, 405-412.
    Tang, Y.R., Sheng, Z.Y., Chen, Y.Z., and Zhang, Z. (2008). An improved prediction of catalytic residues in enzyme structures. Protein Eng. Des. Sel. 21, 295-302.
    Temporini, C., Calleri, E., Massolini, G., and Caccialanza, G. (2008). Integrated analytical strategies for the study of phosphorylation and glycosylation in proteins. Mass Spectrom. Rev. 27, 207-236.
    Tian, W., Li, B., Warrington, R., Tomchick, D.R., Yu, H., and Luo, X. (2012). Structural analysis of human Cdc20 supports multisite degron recognition by APC/C. Proc. Natl. Acad. Sci. U. S. A. 109, 18419-18424.
    Tokunaga, F., and Iwai, K. (2012). LUBAC, a novel ubiquitin ligase for linear ubiquitination, is crucial for inflammation and immune responses. Microbes Infect. 14, 563-572.
    Tomlinson, E., Palaniyappan, N., Tooth, D., and Layfield, R. (2007). Methods for the purification of ubiquitinated proteins. Proteomics 7, 1016-1022.
    Torrance, J.W., Macarthur, M.W., and Thornton, J.M. (2008). Evolution of binding sites for zinc and calcium ions playing structural roles. Proteins 71, 813-830.
    Tung, C.W., and Ho, S.Y. (2008). Computational identification of ubiquitylation sites from protein sequences. BMC Bioinformatics 9, 310.
    Tupler, R., Perini, G., and Green, M.R. (2001). Expressing the human genome. Nature 409, 832-833.
    Udeshi, N.D., Svinkina, T., Mertins, P., Kuhn, E., Mani, D.R., Qiao, J.W., and Carr, S.A. (2013). Refined preparation and use of anti-diglycine remnant (K-epsilon-GG) antibody enables routine quantification of 10,000s of ubiquitination sites in single proteomics experiments. Mol. Cell. Proteomics 12, 825-831.
    UniProt Consortium (2011). Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res. 39, D214-219.
    van Erkel, A.R., and Pattynama, P.M. (1998). Receiver operating characteristic (ROC) analysis: basic principles and applications in radiology. Eur. J. Radiol. 27, 88-94.
    van Wijk, S.J., de Vries, S.J., Kemmeren, P., Huang, A., Boelens, R., Bonvin, A.M., and Timmers, H.T. (2009). A comprehensive framework of E2-RING E3 interactions of the human ubiquitin-proteasome system. Mol. Syst. Biol. 5, 295.
    Varshavsky, A. (1992). The N-end rule. Cell 69, 725-735.
    Varshavsky, A. (2008). Discovery of cellular regulation by protein degradation. J. Biol. Chem. 283, 34469-34489.
    Vasak, M., and Hasler, D.W. (2000). Metallothioneins: new functional and structural insights. Curr. Opin. Chem. Biol. 4, 177-183.
    Verger, A., Perdomo, J., and Crossley, M. (2003). Modification with SUMO. A role in transcriptional regulation. EMBO Rep. 4, 137-142.
    Verma, R., McDonald, H., Yates, J.R., 3rd, and Deshaies, R.J. (2001). Selective degradation of ubiquitinated Sic1 by purified 26S proteasome yields active S phase cyclin-Cdk. Mol. Cell 8, 439-448.
    Wagner, S.A., Beli, P., Weinert, B.T., Nielsen, M.L., Cox, J., Mann, M., and Choudhary, C. (2011). A proteome-wide, quantitative survey of in vivo ubiquitylation sites reveals widespread regulatory roles. Mol. Cell. Proteomics 10, M111.013284.
    Wagner, S.A., Beli, P., Weinert, B.T., Scholz, C., Kelstrup, C.D., Young, C., Nielsen, M.L., Olsen, J.V., Brakebusch, C., and Choudhary, C. (2012). Proteomic analyses reveal divergent ubiquitylation site patterns in murine tissues. Mol. Cell. Proteomics 11, 1578-1585.
    Wang, X.F., Chen, Z., Wang, C., Yan, R.X., Zhang, Z., and Song, J. (2011). Predicting residue-residue contacts and helix-helix interactions in transmembrane proteins using an integrative feature-based random forest approach. PLoS One 6, e26767.
    Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F., and Jones, D.T. (2004). Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol. 337, 635-645.
    Welchman, R.L., Gordon, C., and Mayer, R.J. (2005). Ubiquitin and ubiquitin-like proteins as multifunctional signals. Nat. Rev. Mol. Cell Biol. 6, 599-609.
    Wu, S., and Zhang, Y. (2008). A comprehensive assessment of sequence-based and template-based methods for protein contact prediction. Bioinformatics 24, 924-931.
    Xu, J., He, Y., Qiang, B., Yuan, J., Peng, X., and Pan, X.M. (2008). A novel method for high accuracy sumoylation site prediction from protein sequences. BMC Bioinformatics 9, 8.
    Xue, B., Dunbrack, R.L., Williams, R.W., Dunker, A.K., and Uversky, V.N. (2010). PONDR-FIT: a meta-predictor of intrinsically disordered amino acids. Biochim. Biophys. Acta 1804, 996-1010.
    Xue, Y., Li, A., Wang, L., Feng, H., and Yao, X. (2006). PPSP: prediction of PK-specific phosphorylation site with Bayesian decision theory. BMC Bioinformatics 7, 163.
    Xue, Y., Ren, J., Gao, X., Jin, C., Wen, L., and Yao, X. (2008). GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy. Mol. Cell. Proteomics 7, 1598-1608.
    Yan, R.X., Si, J.N., Wang, C., and Zhang, Z. (2009). DescFold: a web server for protein fold recognition. BMC Bioinformatics 10, 416.
    Yang, J., Roy, A., and Zhang, Y. (2013). BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions. Nucleic Acids Res. 41, D1096-1103.
    Yang, X.G., and Feng, Z.P. (2002). Predicting membrane protein types using residue-pair models based on reduced similarity dataset. J. Biomol. Struct. Dyn. 20, 163-172.
    Yang, X.G., Luo, R.Y., and Feng, Z.P. (2007). Using amino acid and peptide composition to predict membrane protein types. Biochem. Biophys. Res. Commun. 353, 164-169.
    Ye, Y., and Rape, M. (2009). Building ubiquitin chains: E2 enzymes at work. Nat. Rev. Mol. Cell Biol. 10, 755-764.
    Youn, E., Peters, B., Radivojac, P., and Mooney, S.D. (2007). Evaluation of features for catalytic residue prediction in novel folds. Protein Sci. 16, 216-226.
    Zhang, J.P., Bloedorn, E., Rosen, L., and Venese, D. (2004). Learning rules from highly unbalanced data sets. Paper presented at: Data Mining, 2004. ICDM'04. Fourth IEEE International Conference on.
    Zhang, T., Zhang, H., Chen, K., Shen, S., Ruan, J., and Kurgan, L. (2008). Accurate sequence-based prediction of catalytic residues. Bioinformatics 24, 2329-2338.
    Zhao, W., Xu, M., Liang, Z., Ding, B., Niu, L., Liu, H., and Teng, M. (2011a). Structure-based de novo prediction of zinc-binding sites in proteins of unknown function. Bioinformatics 27, 1262-1268.
    Zhao, X., Li, X., Ma, Z., and Yin, M. (2011b). Prediction of lysine ubiquitylation with ensemble classifier and feature selection. Int. J. Mol. Sci. 12, 8347-8361.
    Zheng, C., Wang, M., Takemoto, K., Akutsu, T., Zhang, Z., and Song, J. (2012). An integrative computational framework based on a two-step random forest algorithm improves prediction of zinc-binding sites in proteins. PLoS One 7, e49716.
    Zhou, F.F., Xue, Y., Chen, G.L., and Yao, X. (2004). GPS: a novel group-based phosphorylation predicting and scoring method. Biochem. Biophys. Res. Commun. 325, 1443-1448.
