A Study of Protein-Protein Interactions Based on Ab Initio Prediction from Sequence
Abstract
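Protein interactions underlie most cellular functions and are directly tied to the diversity of biological function. They take two main forms: "physical" interactions and functional interactions. In general usage, an interaction means that the proteins take part in the same metabolic pathway and have similar functions, that is, a functional interaction.
     Proteomics studies the structure, interactions, and functions of proteins at the whole-system level. Because interactions link protein structure to function, they are undoubtedly a focus of research. Work on protein interactions has moved beyond experimental techniques alone: computational methods are now used to further validate interactions and to predict them at high throughput. These include genome-based methods, evolution-based methods, and ab initio prediction from protein sequence. Studies show that genome-based and evolution-based methods each have limitations; genome-based methods, for example, require whole-genome information. Ab initio prediction from sequence needs only the primary structure of the protein and places no restriction on sequence length, so it is widely applicable.
     This thesis uses ab initio prediction from protein sequence to identify interacting proteins. Several properties of each sequence are computed, such as amino acid hydrophobicity, the molecular weight of the protein sequence, polarity, and average buried area. A BP neural network and a support vector machine (SVM) classifier are then applied to a protein interaction dataset and compared. The Saccharomyces cerevisiae (yeast) interaction dataset from the MIPS database serves as the benchmark dataset, with 4837 positive pairs and 9674 negative pairs. The experiments show that both the BP neural network and the SVM have fairly high accuracy: the BP neural network reaches an accuracy above 87% with high sensitivity, and the SVM with a Gaussian kernel reaches an accuracy above 64% on this dataset. Both can therefore be used to validate and predict protein interaction datasets obtained by experimental means.
     In addition, further analysis of the experiments shows that ab initio prediction from protein sequence, combined with the classification algorithms used here, can effectively identify interacting protein pairs.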
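The feature step described in the abstract could be sketched roughly as follows. This is an illustration only, not the thesis's implementation: the abstract does not say which hydrophobicity scale or polarity definition was used, so the Kyte-Doolittle hydropathy index, the `POLAR` residue set, the function names `protein_features` and `pair_features`, and the choice to concatenate the two proteins' features are all assumptions. Molecular weight and average buried area are omitted to keep the sketch short.

```python
# Kyte-Doolittle hydropathy index. ASSUMPTION: the thesis does not name
# its hydrophobicity scale; this is one widely used choice.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# ASSUMPTION: an illustrative grouping of polar and charged residues.
POLAR = set("STNQYCDEKRH")


def protein_features(seq):
    """Return [mean hydropathy, polar fraction] for one protein sequence.

    Characters that are not one of the 20 standard residues are skipped,
    so sequences of any length can be handled.
    """
    residues = [r for r in seq.upper() if r in KD]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    mean_hydropathy = sum(KD[r] for r in residues) / n
    polar_fraction = sum(1 for r in residues if r in POLAR) / n
    return [mean_hydropathy, polar_fraction]


def pair_features(seq_a, seq_b):
    """Concatenate the two proteins' features into one vector for a pair.

    ASSUMPTION: simple concatenation; the thesis may combine them differently.
    """
    return protein_features(seq_a) + protein_features(seq_b)
```

Each protein pair then becomes one fixed-length vector regardless of sequence length, which is the form a BP neural network or an SVM expects as input.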
Proteins are probably the most important players in a living cell, and many cellular functions are carried out through protein interactions. Protein-protein interactions are closely tied to the diversity of biological function. They take two main forms: "physical" interactions and functional interactions. In general, interacting proteins participate in the same metabolic pathway and perform similar functions; in other words, the interaction is a functional one.
     Proteomics is the systematic study of the structure, interactions, and functions of proteins. Protein interaction is clearly one of the most active topics in proteomics. The experimental techniques for finding protein-protein interactions have several limitations, which has stimulated research into computational ways of predicting them. These mainly include methods based on genome information, methods based on evolutionary information, and methods based on the primary structure of the protein. Some of them have significant limitations; genome-based methods, for instance, need full genome information. The approach based on primary structure, however, requires only the primary structure of the protein, places no restriction on sequence length, and is therefore widely applicable.
     In this paper, we use the primary structure of proteins to predict protein-protein interactions. A statistical method is used to generate sequence features, which are then normalized for the experiments. A few features are calculated for each protein: hydrophobicity, molecular weight, polarity, and average buried area. A BP neural network and an SVM are used to classify protein pairs as interacting or non-interacting. We used the S. cerevisiae (yeast) dataset to verify the predictive ability of our method; it includes 4837 interacting protein pairs and 9674 non-interacting protein pairs. The BP neural network achieves above 87% accuracy under 10-fold cross-validation, and the SVM achieves above 64% accuracy.
     In addition, the experiments show that our methods can effectively identify and predict interacting protein pairs.
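The evaluation protocol mentioned in the abstract (10-fold cross-validation, accuracy, sensitivity) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's code: the function names `k_fold_indices` and `accuracy_and_sensitivity` are hypothetical, labels are assumed to be 1 for interacting pairs and 0 for non-interacting pairs, and a real run would shuffle the pairs before splitting.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1.

    ASSUMPTION: the data has already been shuffled by the caller.
    """
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def accuracy_and_sensitivity(y_true, y_pred):
    """Return (accuracy, sensitivity) for binary labels (1 = interacting)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Sensitivity: the share of truly interacting pairs that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, sensitivity


# The dataset sizes reported in the abstract: 4837 positive + 9674 negative pairs.
folds = k_fold_indices(4837 + 9674, k=10)
```

Each fold is held out once as the test set while the other nine train the classifier. Because the dataset has exactly twice as many negative pairs as positive pairs, a classifier that always predicts "non-interacting" would already score about 66.7% accuracy, so sensitivity is worth reading alongside accuracy.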
