Research on Protein Function Prediction Based on Data Mining Techniques
Abstract
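As we enter the post-genomic era, proteomics, which takes the proteome as its object of study, has attracted growing attention and developed rapidly. Proteins are essential components of the cell and the executors of life activities. They perform many important functions in the cell, including forming organs, catalyzing biochemical reactions, receiving and transmitting cellular signals, and maintaining the cellular environment. However, protein functional annotation is still incomplete; for higher organisms in particular, the functions of a considerable fraction of proteins remain unclear. Determining protein function with traditional experimental methods is slow and costly, and cannot address the proteome as a whole. Emerging high-throughput technologies have produced massive amounts of proteomics data, which makes it possible to study protein function computationally. Based on data mining techniques, this thesis uses proteomics data generated by high-throughput technologies, including protein expression mass spectrometry data, protein amino acid sequences, and protein-protein interactions, to study the problem of protein function prediction in depth. The specific contributions are as follows: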
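     1) We constructed FGsub, a new model for predicting protein subcellular localizations in Fusarium graminearum. We collected and curated a non-redundant dataset of fungal subcellular localization annotations. On the one hand, starting from protein amino acid sequences, we applied feature extraction and feature selection and used support vector machines with multiple feature vectors to build an ensemble classifier that predicts the subcellular locations of F. graminearum proteins. On the other hand, we used BLAST sequence alignment between the dataset and F. graminearum proteins to find homologous proteins, and used the annotations of those homologs to predict the subcellular locations of F. graminearum proteins. To handle data imbalance, we also proposed a new balancing algorithm. Based on amino acid sequences and several data mining techniques, the model predicts the subcellular localization of F. graminearum proteins accurately, enriches the functional annotation of F. graminearum proteins, and provides necessary and reliable information for studying the infection mechanism of F. graminearum as a pathogenic fungus.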
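     2) We proposed a novel model for predicting protein S-glutathionylation sites. To predict this post-translational modification, we first built a protein S-glutathionylation database by text mining. Then, based on the amino acid sequences flanking S-glutathionylation sites, we applied feature extraction and feature selection and used machine learning to build a model that predicts protein S-glutathionylation sites. In addition, starting from protein structural information, we used statistical methods to discuss the mechanism of protein S-glutathionylation. The model predicts protein S-glutathionylation sites effectively. It also identifies important features of S-glutathionylation sites, and these features provide useful information for studying how protein S-glutathionylation occurs and is regulated.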
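     3) We proposed a new model for constructing protein phosphorylation networks. Based on protein expression data, protein phosphorylation expression data, protein-protein interaction data, and existing prior information, we proposed a new probabilistic model relating phosphorylation substrates to protein kinases. We first constructed a global phosphorylation network, then used the tissue specificity of protein expression to construct tissue-specific phosphorylation networks for three human tissues and to select tissue-specific phosphorylation relations. We also validated the functions of the three tissue-specific phosphorylation networks. The results show that these networks reflect the biological functions characteristic of the corresponding tissues, which supports the reliability and biological relevance of the tissue-specific phosphorylation networks we constructed.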
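The abstract does not say which sequence descriptors FGsub combines. As a minimal sketch of the feature extraction step in item 1), assuming two widely used descriptors (amino acid composition and dipeptide composition), a protein sequence can be turned into a fixed-length vector as follows. The function names and the choice of descriptors are illustrative assumptions, not the thesis's actual feature set.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 relative frequencies, one per standard amino acid."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 relative frequencies, one per ordered residue pair."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Skip pairs containing non-standard symbols such as X or B.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [counts[dp] / len(pairs) for dp in DIPEPTIDES]


def sequence_features(seq):
    """Concatenate both descriptors into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

Vectors of this kind could then be passed to an SVM library; in an ensemble setting, one classifier per descriptor would be trained and their outputs combined.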
With the coming of the post-genomic era, proteomics, which focuses on investigating protein function, has attracted a lot of attention in the life sciences. Proteins play a critical role in life and are responsible for many important functions, such as forming organs, catalyzing biochemical reactions, receiving and transmitting cell signals, and maintaining the cellular environment. However, the function of many proteins is unknown. For example, about half of human proteins remain uncharacterized. Characterizing protein function with traditional experimental techniques in the lab is expensive and time-consuming. Computational biology provides an alternative way to predict protein function using high-throughput proteomics data. In our study, we use data mining to predict protein function based on protein primary structures, protein-protein interactions, protein expression data, and related data. In particular, we focus on the following topics.
     1) We constructed a novel model, FGsub, to predict protein subcellular localizations for the fungal pathogen Fusarium graminearum (teleomorph Gibberella zeae). All available fungal protein subcellular localization annotations were collected and integrated into a database. On the one hand, we designed an ensemble classifier to predict protein subcellular localizations, in which the Support Vector Machine (SVM) was employed as the learner on diverse feature descriptions. On the other hand, BLAST was used to transfer annotations of homologous proteins to uncharacterized F. graminearum proteins so that the F. graminearum proteins are annotated more comprehensively. Furthermore, we present a new algorithm to cope with the class imbalance that arises in protein subcellular localization prediction and to avoid false positive results. The highly accurate predictions from FGsub can help one better understand the functions of F. graminearum proteins and provide insights into the pathogenic mechanisms of this destructive fungal pathogen.
     2) A new model was developed to predict protein S-glutathionylation sites. First, we collected experimentally determined S-glutathionylated proteins and constructed a protein S-glutathionylation database by text mining. Then, we proposed a new method for predicting S-glutathionylation sites by applying machine learning methods to protein sequence data. The model can predict protein S-glutathionylation sites effectively and helps to uncover the mechanisms of protein S-glutathionylation.
     3) A novel probabilistic model was proposed to construct the protein phosphorylation network. First, we integrated protein phosphorylation expression data and protein-protein interaction data and scanned all expressed proteins for phosphorylation motifs. Then, we calculated the probability that a motif interacts with a kinase and the probability that a protein substrate is phosphorylated by that kinase, and predicted kinase-substrate relations. Finally, we constructed human tissue-specific protein phosphorylation networks by incorporating tissue-specific protein expression data. Network function enrichment analysis demonstrated that each of the three tissue-specific phosphorylation networks was functionally consistent with its corresponding tissue.
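Item 2) builds its predictor from the amino acid sequence flanking each candidate site. S-glutathionylation occurs on cysteine residues, so a natural first step is to cut a fixed-width window around every cysteine. The sketch below illustrates that step only; the flank width of 5, the padding symbol `X`, and the function name are assumptions made for illustration, since the abstract does not state the window size or the encoding used.

```python
def cysteine_windows(seq, flank=5, pad="X"):
    """Return (1-based position, window) for every cysteine in seq.

    Each window has length 2 * flank + 1 with the cysteine at its
    center. Positions that would run past either end of the protein
    are filled with the padding symbol.
    """
    seq = seq.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "C":
            # Index i in seq corresponds to index i + flank in padded,
            # so the window starts at padded[i].
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    for position, window in cysteine_windows("MKCAAGLCT", flank=3):
        print(position, window)
```

Each window could then be encoded numerically and labeled as modified or unmodified using a curated site database before training a classifier.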