Prediction algorithm of phosphokinase based on PU-learning
  • English title: Prediction algorithm of phosphokinase based on PU-learning
  • Authors: WANG Yiqi; WANG Mingju; ZHANG Jin; PENG Zhicai; WEI Sen; XIE Duoshuang (Department of Information Resource, Taihe Hospital)
  • Keywords: protein phosphorylation; bioinformatics; semi-supervised learning; PU-learning; kinase prediction
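  • Journal: Beijing Biomedical Engineering
  • Institution: Taihe Hospital
  • Publication date: 2019-08-15 16:53
  • Publisher: Beijing Biomedical Engineering
  • Year: 2019
  • Issue: 04
  • Funding: Young Scientists Fund of the National Natural Science Foundation of China (31501070); Natural Science Foundation of Hubei Province (2017CFB137)
  • Language: Chinese
  • Pages: 34-42
  • Page count: 9
  • CN: 11-2261/R
  • ISSN: 1002-3208
  • CLC classification: Q55; Q811.4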
Abstract
Objective: Protein phosphorylation is the process by which a kinase catalyzes the transfer of a phosphate group to an amino acid residue at a specific site on a substrate protein, and it is an important mechanism for studying protein activity and function. Most of the thousands of phosphorylation sites identified so far lack kinase information. To address this, we propose a phosphokinase prediction algorithm based on PU-learning that, by iteratively labeling phosphorylation sites, can accurately predict the kinases that catalyze phosphopeptides. Methods: The algorithm uses PU-learning as its framework and applies maximum entropy variance to automatically select the optimal threshold for each kind of phosphokinase, thereby extracting the potential phosphorylation sites on each phosphopeptide. It then determines the kinase corresponding to each phosphorylation site by statistical analysis. Finally, its prediction performance is evaluated by five-fold cross-validation on the Phospho.ELM database and compared with existing algorithms. Results: On a single data set, the cross-validated specificity and sensitivity of the algorithm are up to 4% and 10% higher, respectively, than those of the best existing algorithm, and its prediction accuracy on Phospho.ELM data reaches 79.52%. Conclusions: The PU-learning-based phosphokinase prediction algorithm significantly outperforms existing algorithms and can accurately predict kinases for phosphopeptides whose kinase is unknown in the Phospho.ELM database, offering strong guidance for phosphorylation experiments.
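The abstract combines three ingredients: PU-learning with iterative labeling, an automatically selected per-kinase threshold, and five-fold cross-validation reported as sensitivity, specificity and accuracy. The Python sketch below shows one way such a loop could fit together for a single kinase. It is not the authors' implementation. The abstract does not define "maximum entropy variance", so `otsu_threshold` substitutes Otsu's maximum between-class variance criterion as a stand-in. The scoring model (`build_pssm`, a position-specific log-odds matrix against a uniform background), every function name, and the toy peptide windows are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def otsu_threshold(scores, bins=64):
    """Stand-in for the paper's threshold rule (assumption): pick the cut that
    maximizes the between-class variance of the scores (Otsu's criterion)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return hi
    n = len(scores)
    best_t, best_var = lo, -1.0
    for i in range(1, bins):
        t = lo + (hi - lo) * i / bins
        below = [s for s in scores if s < t]
        above = [s for s in scores if s >= t]
        if not below or not above:
            continue
        w0, w1 = len(below) / n, len(above) / n
        m0, m1 = sum(below) / len(below), sum(above) / len(above)
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:  # keep the first cut that reaches the maximum
            best_t, best_var = t, var
    return best_t


def build_pssm(windows, pseudo=1.0):
    """Hypothetical scoring model: per-position log-odds of each residue
    against a uniform background, with pseudocounts."""
    pssm = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        total = len(windows) + pseudo * len(AMINO_ACIDS)
        pssm.append({a: math.log((counts[a] + pseudo) / total * len(AMINO_ACIDS))
                     for a in AMINO_ACIDS})
    return pssm


def score(pssm, window):
    # Residues outside the 20 standard amino acids contribute 0.
    return sum(col.get(a, 0.0) for col, a in zip(pssm, window))


def pu_iterate(positives, unlabeled, max_iter=10):
    """PU-style self-labeling for one kinase: refit on the labeled set,
    re-threshold, and move confident unlabeled windows into the positive set."""
    labeled, pool = list(positives), list(unlabeled)
    for _ in range(max_iter):
        pssm = build_pssm(labeled)
        pool_scores = [score(pssm, w) for w in pool]
        all_scores = [score(pssm, w) for w in labeled] + pool_scores
        t = otsu_threshold(all_scores)
        newly = [w for w, s in zip(pool, pool_scores) if s >= t]
        if not newly:
            break  # converged: no unlabeled window clears the threshold
        labeled += newly
        pool = [w for w, s in zip(pool, pool_scores) if s < t]
    return labeled, pool


def five_folds(items, k=5):
    """Split items into k interleaved folds for cross-validation."""
    return [items[i::k] for i in range(k)]


def sens_spec_acc(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    return sens, spec, acc
```

For example, `pu_iterate(["RRASL", "RRASI", "RKASL"], ["RRGSL", "AAASA", "GGGSG", "RRASV"])` starts from three known substrate windows of one kinase and returns the grown positive set together with the windows left unlabeled. Evaluating it on each split from `five_folds` with `sens_spec_acc` mirrors the shape of the reported evaluation, not its numbers.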