Identification of Functional Motifs in Disordered Proteins Based on a Deep Convolutional Neural Network
  • English title: Identifying Molecular Recognition Feature in Disordered Proteins with Deep Convolutional Neural Network
  • Authors: FANG Chun (方春); TIAN Aikui (田爱奎); SUN Fuzhen (孙福振); LI Caihong (李彩虹); ZHU Daming (朱大铭)
  • Affiliations: School of Computer Science and Technology, Shandong University of Technology; Shandong Provincial Key Laboratory of Software Engineering, Shandong University
  • Keywords: deep convolutional neural network; disordered protein; sequence pattern; identification
  • Journal code: SDJC
  • Journal: Journal of University of Jinan (Science and Technology)
  • Publication date: 2018-06-13 16:59
  • Publisher: Journal of University of Jinan (Natural Science Edition) (济南大学学报(自然科学版))
  • Year: 2018
  • Issue: v.32; No.136
  • Funding: National Natural Science Foundation of China (61602280, 61473179); Natural Science Foundation of Shandong Province (ZR2014FQ028)
  • Language: Chinese
  • Page: SDJC201804004
  • Page count: 6
  • CN: 04
  • ISSN: 37-1378/N
  • Classification number: 23-28
Abstract
Experimental identification of functional motifs (molecular recognition features, MoRFs) in intrinsically disordered proteins is time-consuming, laborious, and difficult, while traditional computational methods rely heavily on hand-selected features and have low accuracy. To address these problems, a method based on a deep convolutional neural network is proposed for predicting the positions of MoRFs in protein sequences. The method takes the protein sequence directly as input and maps it to a numerical matrix by computing the position-specific scoring matrix (PSSM) of the sequence and three groups of amino acid index features. The model then extracts features on its own and automatically identifies the latent sequence patterns of MoRFs to make its predictions. The results show that, when trained and tested on the same datasets, the proposed method clearly outperforms other traditional identification algorithms, reaching an area under the receiver operating characteristic curve (AUC) of 0.708 on the validation set and 0.760 on the test set. This suggests that a deep convolutional neural network can effectively identify the latent sequence patterns of MoRFs. The method can also be applied to identifying other clustered functional sites in proteins.
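The sequence-to-matrix mapping described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the window size of 5, the use of Kyte-Doolittle hydropathy as a stand-in for a single amino acid index (the paper uses three groups of indexes), and the toy one-hot "PSSM" that replaces a real profile, which would normally come from a profile search tool such as PSI-BLAST.

```python
# Illustrative sketch only: the index table, window size, and toy PSSM are
# assumptions for demonstration, not values or code from the paper.

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Kyte-Doolittle hydropathy, used as a stand-in for one amino acid index.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def toy_pssm(sequence):
    """Hypothetical placeholder profile: 1.0 for the observed residue, else 0.0."""
    return [[1.0 if aa == res else 0.0 for aa in AMINO_ACIDS] for res in sequence]


def encode_sequence(sequence, pssm):
    """Return one feature row per residue: 20 PSSM scores plus 1 index value."""
    if len(sequence) != len(pssm):
        raise ValueError("PSSM must have one row per residue")
    rows = []
    for residue, scores in zip(sequence, pssm):
        if len(scores) != len(AMINO_ACIDS):
            raise ValueError("each PSSM row must have 20 scores")
        rows.append(list(scores) + [HYDROPATHY[residue]])
    return rows


def sliding_windows(matrix, window):
    """Return a zero-padded window of rows centered on each residue."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a center residue")
    half = window // 2
    width = len(matrix[0])
    pad = [[0.0] * width for _ in range(half)]
    padded = pad + matrix + pad
    return [padded[i:i + window] for i in range(len(matrix))]


seq = "MKTAYIAKQR"  # arbitrary example sequence
features = encode_sequence(seq, toy_pssm(seq))
windows = sliding_windows(features, window=5)
print(len(features), len(features[0]), len(windows), len(windows[0]))
```

Each window is a small two-dimensional matrix (residues by features) of the kind a convolutional network could take as input to score its central residue; the actual network architecture and feature dimensions used by the authors are not given in this record.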
