Iterative Method for Multiple Structure Superposition of Proteins with Missing Data
  • English title: Iterative Method for Multiple Structure Superposition of Proteins with Missing Data
  • Authors: LU Jian-Bo (路建波); GAO Hua-Fang (高华方); ZHANG Shi-Hua (张世华); LU Ben-Zhuo (卢本卓); MA Xu (马旭)
  • Institutions: Human Genetics Resource Center, National Research Institute for Family Planning; Institute of Applied Mathematics, Academy of Mathematics and System Sciences, Chinese Academy of Sciences; LSEC, Institute of Computational Mathematics, Academy of Mathematics and System Sciences, Chinese Academy of Sciences
  • Keywords: protein structure superposition; protein structure alignment; iterative algorithm; missing data
  • Chinese journal title: SWHZ
  • English journal title: Chinese Journal of Biochemistry and Molecular Biology
  • Publication date: 2017-06-20
  • Publisher: Chinese Journal of Biochemistry and Molecular Biology
  • Year: 2017
  • Issue: v.33
  • Funding: Supported by the National Key Research and Development Program of China (No. 2016YFC1000307 and No. 2016YFB0201304); the National Natural Science Foundation of China (No. 21573274); a sub-project of the National Key Research and Development Program of China (No. 2016YFC1000307-10); and the General Program of the Science and Technology Innovation Fund of the National Research Institute for Family Planning (No. 2017GJM04)
  • Language: Chinese
  • Page: SWHZ201706014
  • Page count: 8
  • CN: 06
  • ISSN: 11-3870/Q
  • Classification number: 107-114
Abstract
The main problem in three-dimensional protein structure superposition is that some amino acid residues are missing from the target structures, whereas most multiple-structure superposition methods require complete amino acid sequences. The common workaround is to delete the positions with missing residues outright, which makes the superposition inaccurate. Because homologous proteins have similar structures, a region missing from one structure may be present in a homologous structure. On this basis, we propose a novel, simple, and effective method (ITEMDM) for superposing multiple protein structures with missing data. The method computes the superposition by iterating over the missing data, and obtains the rotation matrices and translation vectors with an optimized least-squares algorithm combined with singular value decomposition (SVD). We used ITEMDM to superpose proteins of the cytochrome C family and of the standard Fischer's database (67 protein pairs), and compared the results with those of other methods. Numerical experiments show that (1) compared with the THESEUS algorithm, ITEMDM runs faster and needs fewer iterations; and (2) compared with the PSSM algorithm, ITEMDM is more accurate and takes less computing time. These results indicate that ITEMDM superposes three-dimensional protein structures with missing data more effectively.
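The two ingredients named in the abstract, a least-squares rigid fit solved by SVD and an iteration that fills in missing residues, can be illustrated with a generic sketch. This is not the authors' ITEMDM implementation: it uses the standard Kabsch procedure and a simple consensus-based imputation. The function names `kabsch` and `superpose_with_missing`, the use of NaN to mark missing residues, and the `n_iter` and `tol` parameters are all assumptions made for illustration. It also assumes that residue correspondences are already known (column i is the same position in every structure), that every residue is observed in at least one structure, and that each structure shares at least three non-collinear observed residues with the first one.

```python
import numpy as np


def kabsch(P, Q):
    """Rigid fit of P onto Q: return R, t minimising sum ||R @ p_i + t - q_i||^2."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)            # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def superpose_with_missing(X, n_iter=50, tol=1e-8):
    """Superpose K structures of N residues; X has shape (K, N, 3), NaN = missing.

    Hypothetical helper for illustration only, not the published ITEMDM code.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X).any(axis=2)          # (K, N) mask of available residues
    Y = X.copy()

    # Initial guess: fit every structure onto structure 0 using shared residues.
    for k in range(1, len(Y)):
        common = observed[0] & observed[k]
        R, t = kabsch(X[k, common], X[0, common])
        Y[k] = X[k] @ R.T + t                    # rows containing NaN stay NaN
    mean = np.nanmean(Y, axis=0)                 # consensus over observed residues

    for _ in range(n_iter):
        for k in range(len(Y)):
            obs = observed[k]
            R, t = kabsch(Y[k, obs], mean[obs])  # fit only where data exist
            Y[k] = Y[k] @ R.T + t
        new_mean = np.nanmean(Y, axis=0)
        change = np.linalg.norm(new_mean - mean)
        mean = new_mean
        if change < tol:
            break

    # Impute each missing residue with the consensus position.
    filled = np.where(observed[..., None], Y, mean)
    return filled, mean
```

Calling `superpose_with_missing` on a `(K, N, 3)` array returns the superposed structures with missing residues replaced by the consensus coordinates, together with the consensus structure itself. Fitting only on observed residues, rather than deleting a column whenever any structure lacks it, is what lets the information in homologous structures contribute to the result.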
