A deep neural network-based framework for predicting protein-protein interactions
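  • English title: Prediction of protein-protein interactions based on deep neural networks
  • Authors: 刘桂霞; 王沫沅; 苏令涛; 吴春国; 孙立岩; 王荣全
  • Authors (English): LIU Gui-xia; WANG Mo-yuan; SU Ling-tao; WU Chun-guo; SUN Li-yan; WANG Rong-quan (College of Computer Science and Technology, Jilin University; Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University)
  • Keywords: artificial intelligence; protein-protein interaction; protein features; protein sequence; deep neural network
  • Journal abbreviation (Chinese): JLGY
  • Journal (English): Journal of Jilin University (Engineering and Technology Edition)
  • Institutions: College of Computer Science and Technology, Jilin University; Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University
  • Publication date: 2018-06-08 09:38
  • Publisher: 吉林大学学报(工学版) (Journal of Jilin University, Engineering and Technology Edition)
  • Year: 2019
  • Volume and issue: v.49; No.202
  • Funding: National Natural Science Foundation of China (61772226, 61373051, 61502343); Jilin Province Science and Technology Development Plan Project (20140204004GX)
  • Language: Chinese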
  • Article ID: JLGY201902030
  • Page count: 8
  • Issue: 02
  • CN: 22-1341/T
  • Pages: 251-258
Abstract
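To address the high false-positive and false-negative rates of results from experimental methods, protein feature data are integrated and a deep neural network-based framework for predicting protein-protein interactions is proposed. GO semantic similarity, sequence similarity, protein essentiality, and subcellular localization information are extracted for each protein to obtain low-dimensional input data. A deep neural network is then built to make predictions. Dropout is used to reduce complex co-adapted neurons in the network, which improves overall performance. On the Saccharomyces cerevisiae protein dataset the framework reaches an accuracy of 95.67% and a precision of 96.38%. The experimental results show that the extracted features are well suited to predicting protein interactions, and that the framework generalizes well, achieving good results on a variety of data.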
To deal with the high false-positive and false-negative rates of experimental methods, a deep neural network (DNN) is constructed on top of several biological features. Protein features, including GO term semantic similarity, sequence similarity, essentiality, and subcellular localization information, are integrated from diverse databases to form a fixed-length feature vector. This vector carries a large amount of relevant information and can serve as the input of a classifier that predicts protein interactions. A data-driven DNN is then constructed to learn automatically from the input data and predict whether previously unseen protein pairs interact. Dropout is used during the training phase to prevent co-adaptation and improve performance. The method achieves a prediction accuracy of 95.67% with 96.38% precision on the S. cerevisiae dataset. Experimental results show that the extracted features are suitable for predicting PPIs, and that many commonly used machine learning models can predict interactions effectively and efficiently from this feature vector. Moreover, the DNN has good generalization capacity and performs well on various feature data.
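Two ideas in the abstract are easy to make concrete: dropout during training, and the difference between the reported accuracy and precision. The following is a minimal sketch, not the authors' implementation. It assumes the common "inverted dropout" variant (surviving units are rescaled during training) and binary 0/1 labels for interacting versus non-interacting pairs; the function names `dropout` and `accuracy_and_precision` are hypothetical.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout (an assumed variant, not confirmed by the source).

    During training each unit is zeroed with probability `rate`, and the
    surviving units are scaled by 1 / (1 - rate) so the expected activation
    is unchanged. At inference time the input is returned untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels, 1 = interacting pair.

    accuracy  = (TP + TN) / all pairs
    precision = TP / (TP + FP), defined here as 0.0 when nothing is predicted positive
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [0.3, 1.2, 0.0, 0.7]          # toy hidden-layer activations
    print(dropout(hidden, rate=0.5, rng=rng))
    print(accuracy_and_precision([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because units are dropped at random, no hidden unit can rely on a specific partner unit being present, which is the co-adaptation the abstract refers to. Precision counts only the pairs predicted as interacting, so it can differ from accuracy when false positives and false negatives are unbalanced.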
