Proteins are the primary components of the cellular machinery, and the body cannot function without them. Predicting the function and working principles of proteins is one of the most important topics in the life sciences today. Many proteins mediate their biological function through protein interactions, which are crucial for many aspects of cellular biology. First, genetic interactions often correlate with physical interactions between the corresponding gene products. Second, protein interactions are required to physically tether the components of signal-transduction pathways. Third, enzyme-protein substrate interactions are important for catalysis and are often found to be more stable than presumed. Finally, protein interactions are crucial for the integrity of multicomponent enzymatic machines such as RNA polymerases and the spliceosome. Computational prediction of protein interactions has therefore been pursued under the assumption that identifying interaction partners for proteins of unknown function can provide insight into their biological function.
    In this work, the positive dataset is downloaded from the Saccharomyces cerevisiae core subset of the DIP database. Since a noninteracting protein dataset is not readily available, a hypothetical noninteracting dataset is generated from subcellular localization information retrieved from the MIPS database; it consists of protein pairs that do not colocalize. First, each protein's amino acid sequence is converted into a feature vector using the CTD encoding approach. A set of SVMs was trained to predict protein interactions, and the prediction accuracy averaged 79% over the ensemble of statistical experiments. After the parameter vectors were optimized by different strategies, the predictive accuracy obtained through 5-fold cross-validation tests is 82.43%, about 5% higher than reported in the literature. We then predict protein interactions with four other encoding approaches. All of the results are better than those in the literature. The predictive
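The CTD (composition, transition, distribution) encoding mentioned above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not say which physicochemical properties or residue groupings were used, so the three-class hydrophobicity grouping below is an assumption, as are the rule for picking the distribution positions and the choice to represent a protein pair by concatenating the two per-protein vectors. The names `GROUPS`, `ctd_features`, and `pair_vector` are hypothetical.

```python
from typing import Dict, List

# Assumed three-class hydrophobicity grouping often used with CTD descriptors;
# the thesis may rely on different or additional properties.
GROUPS: Dict[str, str] = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def to_classes(seq: str) -> List[int]:
    """Map each residue to its class index (0, 1 or 2); unknown letters are skipped."""
    lookup = {aa: i for i, members in enumerate(GROUPS.values()) for aa in members}
    return [lookup[aa] for aa in seq.upper() if aa in lookup]


def ctd_features(seq: str) -> List[float]:
    """Return 21 values: 3 composition + 3 transition + 15 distribution."""
    classes = to_classes(seq)
    n = len(classes)
    if n == 0:
        return [0.0] * 21

    # Composition: fraction of residues falling in each class.
    comp = [classes.count(c) / n for c in range(3)]

    # Transition: fraction of adjacent residue pairs that switch between
    # two given classes, counted in either direction.
    adjacent = list(zip(classes, classes[1:]))
    trans = []
    for a, b in ((0, 1), (0, 2), (1, 2)):
        hits = sum(1 for x, y in adjacent if {x, y} == {a, b})
        trans.append(hits / (n - 1) if n > 1 else 0.0)

    # Distribution: relative sequence position of the first, 25%, 50%, 75%
    # and last residue of each class (indexing rule is one common convention).
    dist = []
    for c in range(3):
        positions = [i + 1 for i, x in enumerate(classes) if x == c]
        if not positions:
            dist.extend([0.0] * 5)
            continue
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            k = max(1, int(len(positions) * frac))
            dist.append(positions[k - 1] / n)

    return comp + trans + dist


def pair_vector(seq_a: str, seq_b: str) -> List[float]:
    """Assumed pair representation: the two CTD vectors placed side by side."""
    return ctd_features(seq_a) + ctd_features(seq_b)
```

Vectors built this way for interacting and non-colocalized pairs could then be passed to any SVM implementation and scored with 5-fold cross-validation. Note that plain concatenation makes the representation depend on the order of the two proteins, which a real pipeline would need to address.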
