Protein Subcellular Localization Prediction Based on SVM
  • English title: Protein Subcellular Localization Prediction Based on SVM
  • Authors: LIU Qinghua; LAI Yuping; DING Hongwei; YANG Zhijun; CUI Xiaolong
  • Affiliations: School of Information Science and Engineering, Yunnan University; College of Computer Science and Technology, North China University of Technology; Institute of Microbiology, Yunnan University
  • Keywords: feature fusion; entropy density; autocorrelation coefficient; Linear Discriminant Analysis (LDA); support vector machine
  • Journal: Computer Engineering and Applications (计算机工程与应用)
  • Publication date: 2018-09-03 15:01
  • Publisher: Computer Engineering and Applications
  • Year: 2019
  • Issue: 11
  • Funding: National Natural Science Foundation of China (No. 61461053, No. 61461054, No. 61072079)
  • Language: Chinese
  • Pages: 141-146
  • Page count: 6
  • ISSN: 1002-8331
  • Classification codes: TP181; Q811.4
Abstract
Following a feature-fusion approach, the method combines amino acid composition, entropy density, and autocorrelation coefficients into a 190-dimensional feature vector. Compared with traditional methods that consider only amino acid composition, this representation captures protein structure information more fully. Linear Discriminant Analysis (LDA) is then applied for dimensionality reduction, which lowers computational complexity and strengthens the correlation among samples of the same class. A support vector machine (SVM) serves as the classifier for localization prediction, and the method is evaluated by leave-one-out (jackknife) cross-validation on Gram-negative and Gram-positive datasets. The experimental results show that the multi-feature combination outperforms both the traditional amino acid composition method and the plain autocorrelation coefficient method, which supports the effectiveness of the new method.
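The feature-fusion step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the record does not confirm: entropy density is taken as each residue's share of the sequence's Shannon entropy, the autocorrelation coefficients are computed on the Kyte-Doolittle hydropathy scale, and the helper names (`amino_acid_composition`, `entropy_density`, `autocorrelation`, `fused_features`) and the `max_lag` parameter are hypothetical. The record does not say how the 190 dimensions are divided among the three feature groups, so the vector length here (20 + 20 + `max_lag`) is illustrative only. The LDA and SVM stages are not covered.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. Assumption: the record does not name
# the physicochemical property used for the autocorrelation coefficients.
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def amino_acid_composition(seq):
    """Relative frequency of each of the 20 standard residues."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def entropy_density(seq):
    """Each residue's share of the Shannon entropy of the composition.

    Assumed definition: s_i = -p_i * log2(p_i) / H, with H = sum of the
    numerators. Returns zeros when the entropy is zero.
    """
    terms = [-p * math.log2(p) if p > 0 else 0.0
             for p in amino_acid_composition(seq)]
    total = sum(terms)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [t / total for t in terms]


def autocorrelation(seq, max_lag):
    """Mean product of hydropathy values at positions i and i + lag."""
    values = [HYDROPATHY[a] for a in seq]
    n = len(values)
    return [
        sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def fused_features(seq, max_lag=5):
    """Concatenate the three feature groups into one vector."""
    # Drop anything that is not one of the 20 standard residues.
    seq = "".join(a for a in seq.upper() if a in HYDROPATHY)
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    return (amino_acid_composition(seq)
            + entropy_density(seq)
            + autocorrelation(seq, max_lag))
```

Calling `fused_features("MKKLLPTAAAGLLLLAAQPAMA")` returns a list of 45 floats with the default `max_lag=5`. In a pipeline like the one the abstract outlines, such vectors would be stacked into a matrix and passed to a dimensionality-reduction step and a classifier.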
