Prediction of Protein Palmitoylation Sites Using Modified CKSAAP Combined with the RFE Algorithm
  • English title: Identification of Palmitoylation Sites of Proteins Using Modified CKSAAP Combined with RFE Method
  • Authors: TANG Yadong (汤亚东); XIE Lu (谢鹭); CHEN Lanming (陈兰明)
  • Keywords: protein palmitoylation sites; composition of k-spaced amino acid pairs; position-specific scoring matrix; support vector machine; recursive feature elimination
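  • Journal code: JSGG
  • Journal (English title): Computer Engineering and Applications
  • Affiliations: College of Food Science and Technology, Shanghai Ocean University; Shanghai Center for Bioinformation Technology
  • Publication date: 2018-04-20 16:40
  • Publisher: 计算机工程与应用 (Computer Engineering and Applications)
  • Year: 2019
  • Volume and serial no.: v.55; No.924
  • Funding: General Program of the National Natural Science Foundation of China (No. 31671946)
  • Language: Chinese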
  • Article ID: JSGG201905023
  • Number of pages: 6
  • Issue: 05
  • Pages: 149-154
Abstract
Protein palmitoylation is a reversible post-translational modification that plays important roles in protein stability, subcellular localization and many other functions. This study constructs a new model for predicting protein palmitoylation sites, designated PSSM-CKSAAP-RFE. Protein sequences are represented with a composition of k-spaced amino acid pairs (CKSAAP) method that incorporates evolutionary information from a position-specific scoring matrix (PSSM), and optimal features are selected by recursive feature elimination (RFE). A support vector machine (SVM) classifier is trained on the selected features, and model performance is evaluated with the jackknife cross-validation test (JCVT).
On the training dataset, the accuracy, Matthews correlation coefficient (MCC), specificity, sensitivity and area under the receiver operating characteristic curve (AUC) are 98.44%, 0.94, 98.95%, 95.65% and 0.990, respectively; on the independent test dataset, the corresponding values are 98.41%, 0.93, 99.39%, 92.31% and 0.994. These results are superior to those of related methods reported in the literature, and this study provides a new model for the prediction of protein palmitoylation sites.
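The abstract does not spell out how CKSAAP features are computed. For reference, the Python sketch below implements the standard CKSAAP encoding used in works such as [4] and [9] for a single sequence window (for example, a fixed-length window around a candidate cysteine; the window length is not given in the abstract). It does not reproduce the paper's modified, PSSM-aware variant; the function name `cksaap`, the default `k_max=3` and the handling of non-standard residues are assumptions of this sketch.

```python
from __future__ import annotations

from itertools import product

# The 20 standard amino acids and all 400 ordered pairs ("AA", "AC", ..., "YY").
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(window: str, k_max: int = 3) -> list[float]:
    """Standard CKSAAP encoding of one sequence window (illustrative sketch).

    For each gap k = 0..k_max, every residue pair (a, b) with b located k + 1
    positions after a is counted, and the counts are divided by the number of
    k-spaced pairs in the window, len(window) - k - 1. Pairs containing a
    non-standard symbol (e.g. 'X' padding at sequence ends) are not counted
    but still contribute to the denominator (an assumption of this sketch).
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(window) - k - 1
        for i in range(n_pairs):  # empty range if the window is too short
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        # Normalize counts to a composition; a too-short window yields zeros.
        features.extend((counts[p] / n_pairs) if n_pairs > 0 else 0.0 for p in PAIRS)
    return features
```

Each gap contributes 400 features, so `k_max=3` yields a 1600-dimensional vector. For example, `cksaap("ACDC", k_max=1)` gives AC, CD and DC a value of 1/3 each in the k = 0 block, and AD and CC a value of 0.5 each in the k = 1 block.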
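The abstract says the CKSAAP representation carries evolutionary information (the PSSM in the model name) but not how the PSSM enters the encoding. One simple possibility, assumed here and not confirmed by the abstract, is to collapse each PSSM row into the residue with the highest score, giving a "profile sequence" that can be encoded in place of the raw sequence. The column order follows PSI-BLAST's ASCII PSSM output; `profile_sequence` is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Sequence

# Column order of the 20 amino-acid scores in PSI-BLAST's ASCII PSSM output.
PSSM_COLUMNS = "ARNDCQEGHILKMFPSTWYV"


def profile_sequence(pssm_rows: Sequence[Sequence[float]]) -> str:
    """Collapse an L x 20 PSSM into a length-L 'profile sequence'.

    Assumption (not from the paper): each position is replaced by the residue
    whose PSSM score is highest at that position; ties go to the residue that
    appears first in PSSM_COLUMNS.
    """
    residues = []
    for i, row in enumerate(pssm_rows):
        if len(row) != len(PSSM_COLUMNS):
            raise ValueError(f"row {i} has {len(row)} scores, expected 20")
        # max() returns the first index with the largest score.
        best = max(range(len(PSSM_COLUMNS)), key=lambda j: row[j])
        residues.append(PSSM_COLUMNS[best])
    return "".join(residues)
```

The resulting string can be passed to any standard CKSAAP encoder. Other ways of merging PSSM scores into k-spaced pair features are possible, and the abstract does not say which one the authors chose.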
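RFE with an SVM follows the SVM-RFE idea of Guyon et al. [13]: train a linear SVM, drop the features with the smallest weights, and repeat. The minimal scikit-learn sketch below combines this selection with jackknife (leave-one-out) testing; scikit-learn's `SVC` is built on LIBSVM [12]. The number of retained features, the 10% elimination step and the RBF kernel for the final classifier are assumptions, not settings reported in the abstract.

```python
from sklearn.feature_selection import RFE
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def jackknife_predict(X, y, n_features=50, step=0.1):
    """Jackknife (leave-one-out) predictions for an SVM-RFE + SVM pipeline.

    Hypothetical settings: a linear SVM ranks features by weight, and RFE
    removes 10% of the original feature count (at least one) per round until
    n_features remain; an RBF-kernel SVM is then trained on the survivors.
    Because RFE sits inside the pipeline, selection is repeated in every fold,
    so the held-out sample never influences which features are kept.
    """
    model = Pipeline([
        ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=n_features, step=step)),
        ("svm", SVC(kernel="rbf", C=1.0, gamma="scale")),
    ])
    # Each sample is predicted by a model fitted on all the other samples.
    return cross_val_predict(model, X, y, cv=LeaveOneOut())
```

Calling `jackknife_predict(X, y)` on an (n_samples, n_features) matrix of encoded windows returns one predicted label per sample, which can be tallied into TP, TN, FP and FN. Running RFE once on the full dataset before cross-validation would let each held-out sample influence feature selection and inflate the jackknife estimate. With n samples the sketch performs n complete RFE runs, so a larger `step` is a practical way to cut the cost for high-dimensional CKSAAP vectors.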
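Assuming the reported metrics follow their usual definitions, they come from the confusion-matrix counts TP, TN, FP and FN: Sn = TP/(TP+FN), Sp = TN/(TN+FP), Acc = (TP+TN)/(TP+TN+FP+FN) and MCC = (TP·TN - FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes them, together with AUC in its rank (Mann-Whitney) form: the probability that a randomly chosen positive site scores higher than a randomly chosen negative one, with ties counted as one half. The function names are illustrative.

```python
from __future__ import annotations

import math
from typing import Sequence


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)  # fraction of true sites recovered
    sp = tn / (tn + fp)  # fraction of non-sites correctly rejected
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0  # 0 by convention
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties = 0.5).

    The pairwise loop is O(len(pos) * len(neg)), which is fine for a sketch.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For instance, made-up counts TP = 8, TN = 6, FP = 2, FN = 4 give Sn ≈ 0.667, Sp = 0.75, Acc = 0.70 and MCC ≈ 0.408; these numbers are illustrative only and are not taken from the paper.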
References
[1] Raghava G P S, Kumari B, Kumar R, et al. PalmPred: an SVM based palmitoylation prediction method using sequence profile information[J]. PLoS One, 2014, 9: e89246.
[2] Shi S P, Sun X Y, Qiu J D, et al. The prediction of palmitoylation site locations using a multiple feature extraction method[J]. Journal of Molecular Graphics & Modelling, 2013, 40: 125-130.
[3] Hu L L, Wan S B, Niu S, et al. Prediction and analysis of protein palmitoylation sites[J]. Biochimie, 2011, 93: 489-496.
[4] Wang X B, Wu L Y, Wang Y C, et al. Prediction of palmitoylation sites using the composition of k-spaced amino acid pairs[J]. Protein Engineering, Design & Selection, 2009, 22: 707-712.
[5] Ferri N, Paoletti R, Corsini A. Lipid-modified proteins as biomarkers for cardiovascular disease: a review[J]. Biomarkers: Biochemical Indicators of Exposure, Response, and Susceptibility to Chemicals, 2005, 10: 219-237.
[6] Zhou F, Xue Y, Yao X, et al. CSS-Palm: palmitoylation site prediction with a Clustering and Scoring Strategy (CSS)[J]. Bioinformatics, 2006, 22: 894-896.
[7] Xue Y, Chen H, Jin C, et al. NBA-Palm: prediction of palmitoylation site implemented in Naive Bayes algorithm[J]. BMC Bioinformatics, 2006, 7: 458.
[8] Ren J, Wen L, Gao X, et al. CSS-Palm 2.0: an updated software for palmitoylation sites prediction[J]. Protein Engineering, Design & Selection, 2008, 21: 639-644.
[9] Chen K, Kurgan L A, Ruan J. Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs[J]. BMC Structural Biology, 2007, 7: 25.
[10] Tung C W. Prediction of pupylation sites using the composition of k-spaced amino acid pairs[J]. Journal of Theoretical Biology, 2013, 336: 11-17.
[11] Altschul S F, Madden T L, Schäffer A A, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs[J]. Nucleic Acids Research, 1997, 25(17): 3389-3402.
[12] Chang C C, Lin C J. LIBSVM: a library for support vector machines[J]. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): 1-27.
[13] Guyon I, Weston J, Barnhill S, et al. Gene selection for cancer classification using support vector machines[J]. Machine Learning, 2002, 46: 389-422.
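[14] Li Y, Wang Y L, He G P. Extraction of colon cancer informative genes based on support vector machines[J]. Journal of Shandong University of Science and Technology (Natural Science), 2012, 31(3): 84-89. (in Chinese)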
[15] Si J, Zhao R, Wu R. An overview of the prediction of protein DNA-binding sites[J]. International Journal of Molecular Sciences, 2015, 16: 5194-5215.
[16] Zou Q, Xie S, Lin Z, et al. Finding the best classification threshold in imbalanced classification[J]. Big Data Research, 2016, 5: 2-8.
[17] Chen Z, Chen Y Z, Wang X F, et al. Prediction of ubiquitination sites by using the composition of k-spaced amino acid pairs[J]. PLoS One, 2011, 6: e22930.
[18] Hasan M M, Zhou Y, Lu X, et al. Computational identification of protein pupylation sites by using profile-based composition of k-spaced amino acid pairs[J]. PLoS One, 2015, 10: e0129635.
[19] Bui V M, Weng S L, Lu C T, et al. SOHSite: incorporating evolutionary information and physicochemical properties to identify protein S-sulfenylation sites[J]. BMC Genomics, 2016, 17: 1-9.
[20] Zangooei M H, Jalili S. Protein secondary structure prediction using DWKF based on SVR-NSGAII[J]. Neurocomputing, 2012, 94: 87-101.
[21] Liu T, Qin Y, Wang Y, et al. Prediction of protein structural class based on gapped-dipeptides and a recursive feature selection approach[J]. International Journal of Molecular Sciences, 2015, 17.
[22] Meher P K, Sahu T K, Banchariya A, et al. DIRProt: a computational approach for discriminating insecticide resistant proteins from non-resistant proteins[J]. BMC Bioinformatics, 2017, 18: 190.
