Research on Prediction Methods for Protein Post-Translational Modifications and Protein-Protein Interactions
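Abstract
Post-translational modifications (PTMs) of proteins and protein-protein interactions underlie normal protein function and are essential in living organisms. Because experimental techniques are limited and the relevant data are scattered and incomplete, only a few PTMs have been studied well, even though more than 350 types have been confirmed experimentally. Identifying PTM sites by conventional experiments is time-consuming and laborious, and optimizing the enzymatic reactions is itself a very slow process; these factors severely constrain progress in the field. Computational methods have therefore been proposed that can predict PTM sites efficiently and accurately and can provide leads for subsequent in vivo or in vitro validation. Research on protein-protein interactions, in turn, helps to understand biological processes at the systems level, provides reliable data for investigating disease mechanisms, and opens paths toward new drug targets and drug development. This thesis studies prediction methods for PTM sites and for protein-protein interactions. The main results are as follows: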
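     (1) An ensemble-learning method for predicting protein ubiquitination sites is proposed. First, four types of features are used to encode each lysine site and the amino acids at its neighboring positions. Next, to reduce computational complexity and improve prediction accuracy, an effective feature selection method is used to select the optimal feature subset. Finally, an ensemble classifier is built on the selected subset, and the features in that subset are analyzed. Comparative experiments against other prediction methods on public datasets show that the ensemble classifier performs well.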
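     (2) A new pupylation site classifier is built by extracting informative features of pupylation substrates. First, for each sample sequence in the training set, five types of information are extracted and used to encode the pupylation site and its neighboring residues. Next, maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS) are applied to the combined set of five feature types to find the optimal feature subset. Finally, a nearest neighbor algorithm (NNA) model built on this subset predicts pupylation sites, reaching an accuracy of 70.93% in the leave-one-out (jackknife) test. Biological analysis of the optimal subset shows that evolutionary information and physicochemical/biochemical properties play a major role in recognizing pupylation sites, and that positions 7, 10 and 11 contribute the most. These results indicate that combining mRMR with IFS screens features in biological datasets effectively, and that models built on the selected features both predict satisfactorily and make the biological meaning of those features easier to find.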
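     (3) The composition of k-spaced amino acid pairs (CKSAAP) encoding is applied to phosphorylation site prediction for the first time, and it improves prediction accuracy. In comparisons with the three prediction tools PPRED, DISPHOS and NetPhos, the CKSAAP_PhSite tool built in this chapter predicts phosphorylation sites more accurately. For serine sites, CKSAAP_PhSite reaches a sensitivity of 84.81%, a specificity of 86.07% and an accuracy of 85.43%; for threonine sites, a sensitivity of 78.59%, a specificity of 82.26% and an accuracy of 80.31%; for tyrosine sites, a sensitivity of 74.44%, a specificity of 78.03% and an accuracy of 76.21%. The experiments confirm that the method is effective and practical, and the accompanying feature analysis shows that CKSAAP encoding captures sequence patterns near phosphorylation sites. An online prediction tool was built on this work.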
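     (4) A new protein-protein interaction prediction method based on an augmented Chou's pseudo amino acid composition encoding is proposed. First, three groups of descriptors are used to encode each protein pair. Principal component analysis (PCA) is then used to reduce the dimensionality of the resulting 930 sequence features; the reduced subset contains few features while retaining as much of the information in the original set as possible. Finally, the reduced feature subset is used as the input vector of a support vector machine model for predicting protein-protein interactions, which is compared with other prediction methods on the Drosophila melanogaster and Helicobacter pylori datasets. The results show that the proposed model predicts protein-protein interactions more accurately.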
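As a rough illustration of the CKSAAP encoding named in item (3), the sketch below counts, for each gap k, how often each ordered amino acid pair occurs with exactly k residues between its two members, and normalizes by the number of pair positions in the window. The function name `cksaap`, the 20-letter alphabet, the default `k_max` and the normalization are assumptions made for illustration; the abstract does not state the window length, the range of k, or how positions beyond the sequence ends are handled.

```python
from itertools import product

# The 20 standard amino acids (assumed alphabet; no gap symbol is used here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window, k_max=2):
    """Encode a sequence window as k-spaced amino acid pair frequencies.

    For each k in 0..k_max, the pair (window[i], window[i + k + 1]) is counted
    for every valid i, and counts are divided by the number of such positions.
    The result has 400 * (k_max + 1) entries, ordered by k and then by PAIRS.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = max(len(window) - k - 1, 0)
        for i in range(n_positions):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
        denom = n_positions if n_positions > 0 else 1
        features.extend(counts[p] / denom for p in PAIRS)
    return features


if __name__ == "__main__":
    vec = cksaap("AAKSA", k_max=1)
    print(len(vec), vec[PAIRS.index("AA")], vec[400 + PAIRS.index("AK")])
```

A vector of this kind would then be passed to a classifier; which classifier CKSAAP_PhSite uses is not stated in the abstract.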
Post-translational modifications and protein-protein interactions are the basis of normal protein function and play a very important role in living organisms. Because of limited experimental methods and the lack of sufficient data for analysis, only a few protein post-translational modifications have been well characterized, although more than 350 kinds have been discovered. Conventional experimental identification of post-translational modification sites is laborious and expensive, and optimizing the enzymatic reaction is also very time-consuming; these factors severely limit progress in the field. Therefore, computational methods have been proposed and applied with varying success. These methods can predict post-translational modification sites efficiently and accurately, and can also provide clues for further in vivo or in vitro confirmation. Research on protein-protein interactions helps researchers understand biological processes in depth at the systems level, provides a reliable data source for further exploring the mechanisms of disease, and points the way toward new drug targets and drug development. This thesis studies the prediction of protein post-translational modification sites and protein-protein interactions. The main results are summarized as follows:
     (1) We propose an ensemble computational method to predict lysine ubiquitylation sites. First, four kinds of features are used to describe each lysine site and the amino acids at its surrounding positions. Second, to reduce computational complexity and improve the overall accuracy of the predictor, an effective feature selection method is used to select optimal feature subsets. Finally, an ensemble classifier is built with the optimal feature subsets as input and compared with other predictors. The experimental results show that the method is promising for predicting lysine ubiquitylation sites.
     (2) Based on informative pupylation substrate features, we construct a novel predictor of pupylation sites. First, we extract five kinds of features for each protein sequence in the training dataset and use them to encode each pupylation site and its surrounding residues. Then, the maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS) methods are applied to the feature set to select the optimal feature subset. Finally, the prediction model is built on the optimal feature subset using the nearest neighbor algorithm (NNA); its accuracy is 70.93% under jackknife cross-validation. Biological analysis of the optimal feature subset shows that evolutionary information and physicochemical/biochemical properties play an important role in the recognition of pupylation sites, and that positions 7, 10 and 11 contribute the most to their determination. The experimental results indicate that combining mRMR and IFS can effectively select the optimal feature subset of biological datasets. A model built on this subset gives satisfactory prediction performance and helps reveal the biological significance of the selected features.
     (3) The composition of k-spaced amino acid pairs (CKSAAP) encoding is used to predict protein phosphorylation sites for the first time, and it improves the prediction accuracy. When benchmarked against PPRED, DISPHOS and NetPhos, CKSAAP_PhSite achieves a sensitivity of 84.81%, a specificity of 86.07% and an accuracy of 85.43% for serine; a sensitivity of 78.59%, a specificity of 82.26% and an accuracy of 80.31% for threonine; and a sensitivity of 74.44%, a specificity of 78.03% and an accuracy of 76.21% for tyrosine. The experimental results indicate that the proposed approach is effective and practical. A corresponding online web server has been established based on the phosphorylation site prediction model.
     (4) We propose a new method for predicting protein-protein interactions based on an augmented Chou's pseudo amino acid composition. First, three groups of descriptors are used to encode each interacting pair, so that each pair is represented by 930 features. Principal component analysis (PCA) is then used for dimensionality reduction. The resulting feature subset contains few features while retaining as much of the information in the whole set as possible. Finally, a protein-protein interaction prediction model is built on the resulting feature subset and compared with other predictors on the Drosophila melanogaster and Helicobacter pylori datasets. The experimental results show that the method is promising for predicting protein-protein interactions.
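The feature selection procedure in item (2) can be sketched roughly as follows. The feature ranking is assumed to come from mRMR and is simply passed in as a list of column indices; the mRMR step itself is not implemented here. The function names, the squared Euclidean distance, the use of a single nearest neighbor and the rule of keeping the first subset that reaches the best accuracy are all assumptions for illustration, not details taken from the thesis.

```python
def loo_accuracy_1nn(X, y, feature_idx):
    """Leave-one-out (jackknife) accuracy of a 1-nearest-neighbor classifier.

    Each sample is classified by the label of its closest other sample,
    using squared Euclidean distance over the columns in feature_idx.
    """
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue  # the held-out sample may not vote for itself
            d = sum((X[i][f] - X[j][f]) ** 2 for f in feature_idx)
            if d < best_d:
                best_d, best_j = d, j
        if y[best_j] == y[i]:
            correct += 1
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Grow the feature set along the given ranking and keep the best prefix.

    ranked_features is assumed to be ordered from most to least important
    (for example by mRMR). Returns (best_subset, best_accuracy).
    """
    best_subset, best_acc = [], -1.0
    for n in range(1, len(ranked_features) + 1):
        subset = ranked_features[:n]
        acc = loo_accuracy_1nn(X, y, subset)
        if acc > best_acc:  # strict '>' keeps the smallest best subset
            best_subset, best_acc = list(subset), acc
    return best_subset, best_acc


if __name__ == "__main__":
    # Toy data: column 0 separates the classes, column 1 is misleading noise.
    X = [[0.0, 5.0], [0.1, 0.0], [1.0, 5.1], [1.1, 0.1]]
    y = [0, 0, 1, 1]
    print(incremental_feature_selection(X, y, ranked_features=[0, 1]))
```

Evaluating every prefix of the ranking keeps the search linear in the number of features, which is the main reason a ranking step is paired with IFS instead of searching all subsets.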
