A New Hybrid Deep Learning Model for Homologous Protein Detection (面向同源蛋白质探测的一种新型混合深度学习模型)
  • English title: A New Hybrid Deep Learning Model for Homologous Protein Detection
  • Authors: ZHANG Qian (张茜); SUN Yijia (孙一佳); BAI Lin (白琳); LI Taoshen (李陶深)
  • Affiliations: The First Affiliated Hospital of Guangxi Medical University; School of Computer, Electronics and Information, Guangxi University; Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing Technology
  • Keywords: hybrid deep learning; homologous proteins; deep convolutional neural network; protein feature learning; deep learning model; machine learning algorithm
  • Journal code: GXKK
  • Journal: Guangxi Sciences (广西科学)
  • Publication date: 2019-06-21 11:38
  • Publisher: Guangxi Sciences (广西科学)
  • Year: 2019
  • Issue: v.26; No.113
  • Funding: Supported by the Guangxi Natural Science Foundation (2018GXNSFAA138085)
  • Language: Chinese
  • Page: GXKK201903004
  • Number of pages: 8
  • CN: 03
  • ISSN: 45-1206/G3
  • Classification number: 35-42
Abstract
Detecting homologous proteins from amino acid chains, and thereby predicting protein function, is a major challenge in bioinformatics and a foundation for many areas of biomedical research, with significant scientific value and broad practical demand. The main difficulties are (1) how to learn protein features that are effective and useful for homologous protein prediction, and (2) how to make better use of those features to detect and recognize homologous proteins. To address these difficulties, this paper proposes a homologous protein detection and recognition model based on a hybrid deep learning architecture (HDLM-PHP). A unified "pipelined" deep learning architecture combines protein feature learning with detection and recognition in a single model, improving the effectiveness of homologous protein detection and recognition. Multiple parallel deep convolutional neural networks learn the various attributes of proteins in order to obtain rich, high-level correlation features between the query protein and the target protein. A fully connected multi-layer RBM (restricted Boltzmann machine) structure then fuses and refines these correlation features into global correlation features. Through this unified network, guided by the detection and recognition task, the model learns the protein feature information that is most effective and comprehensive for homologous protein prediction. The model was evaluated for performance and efficiency on the standard SCOPe dataset. The results show that it effectively learns task-oriented protein features, improves the accuracy and recall of homologous protein detection and recognition, and outperforms the existing models and algorithms.
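The architecture described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no layer sizes, attribute encodings, or training procedure, so everything below is an assumption made for illustration. Each protein attribute is assumed to be one float per residue, the query and target are assumed to be padded to equal length, the weights are random rather than trained, and the names `conv_branch`, `rbm_up`, `build_model`, and `homology_score` are hypothetical. The sketch shows only the forward pass: one 1D convolution branch per attribute, followed by stacked RBM-style sigmoid layers that fuse the branch outputs into a single score.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def conv_branch(query, target, filters):
    """One convolution branch for a single protein attribute (assumed encoding).

    query/target: equal-length lists of floats, one value per residue.
    filters: list of (w_query, w_target) kernel pairs of equal width.
    Returns one ReLU + global-max-pooled activation per filter.
    """
    features = []
    for w_q, w_t in filters:
        k = len(w_q)
        best = 0.0  # ReLU floor doubles as the max-pool start value
        for i in range(len(query) - k + 1):
            s = sum(w_q[j] * query[i + j] + w_t[j] * target[i + j] for j in range(k))
            best = max(best, s)
        features.append(best)
    return features


def rbm_up(visible, weights, bias):
    """Deterministic RBM up-pass: p(h_j = 1 | v) = sigmoid(b_j + sum_i W_ji * v_i)."""
    return [sigmoid(b + sum(w * v for w, v in zip(row, visible)))
            for row, b in zip(weights, bias)]


def build_model(n_attributes, n_filters=4, kernel=3, hidden=(8, 4), seed=0):
    """Create untrained parameters; all sizes here are arbitrary assumptions."""
    rng = random.Random(seed)
    branches = [
        [(init_matrix(1, kernel, rng)[0], init_matrix(1, kernel, rng)[0])
         for _ in range(n_filters)]
        for _ in range(n_attributes)
    ]
    sizes = [n_attributes * n_filters, *hidden, 1]
    layers = [(init_matrix(sizes[i + 1], sizes[i], rng), [0.0] * sizes[i + 1])
              for i in range(len(sizes) - 1)]
    return branches, layers


def homology_score(model, query_attrs, target_attrs):
    """Run the parallel branches, concatenate their features, fuse with stacked layers."""
    branches, layers = model
    fused = []
    for filters, q, t in zip(branches, query_attrs, target_attrs):
        fused.extend(conv_branch(q, t, filters))
    for weights, bias in layers:
        fused = rbm_up(fused, weights, bias)
    return fused[0]  # value in (0, 1); meaningless until the weights are trained


# Two hypothetical attribute tracks for a six-residue query and target.
model = build_model(n_attributes=2)
query = [[0.1, 0.4, 0.3, 0.9, 0.2, 0.5], [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]]
target = [[0.2, 0.4, 0.1, 0.8, 0.3, 0.5], [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]]
score = homology_score(model, query, target)
```

Keeping one branch per attribute mirrors the "multiple parallel networks" idea, and concatenating the branch outputs before the sigmoid layers corresponds to the fully connected fusion step. Training, which in a pipelined model would be driven by the detection and recognition objective, is omitted here.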