Research on Algorithms for Interaction and Function Prediction in Protein Networks
Abstract
Predicting protein-protein interactions (PPIs) and predicting protein function are important research topics in post-genomic biology. A large body of work has addressed both problems on PPI networks.
     This paper first briefly reviews research progress in PPI prediction and protein function prediction. It then analyzes the basic topological properties of PPI networks and the topological distribution of false positives (FPs) within them.
     Based on the topological structure of PPI networks and on the distribution of FPs in those networks, which previous interaction prediction methods appear not to have considered, we propose a new interaction prediction algorithm, VTC (Vertex To Clique), built on the DC (Defective Clique) interaction prediction algorithm. Experiments show that VTC outperforms DC on every prediction measure evaluated: it predicts more interactions, and with higher reliability.
     In PPI networks the vertex degree distribution follows a power law, and most vertices have low degree. To exploit these characteristics, we propose a protein function prediction algorithm based on Google search technology. Following the preferential attachment principle of complex networks, the method also removes most false positives from the predicted function set, which improves prediction reliability. Experiments on the S. cerevisiae PPI dataset confirm the effectiveness of the method.
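The abstract does not spell out how the Google search technology is applied to function prediction. As a point of reference only, the sketch below shows standard PageRank power iteration on a small undirected interaction graph. It is not the thesis's algorithm: the function name `pagerank`, the damping factor 0.85, and the toy network `toy_network` are all assumptions made for illustration.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration.

    `adjacency` maps each vertex to the set of its neighbours and is
    assumed to be symmetric (an undirected interaction network).
    Illustrative only; not the method proposed in the thesis.
    """
    nodes = sorted(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by isolated vertices is spread uniformly over all vertices.
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new_rank = {}
        for v in nodes:
            # Each neighbour u passes an equal share of its rank to v.
            incoming = sum(rank[u] / len(adjacency[u]) for u in adjacency[v])
            new_rank[v] = (1.0 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy network: one hub protein with three low-degree partners.
toy_network = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1"},
}
ranks = pagerank(toy_network)
for protein in sorted(ranks, key=ranks.get, reverse=True):
    print(protein, round(ranks[protein], 4))
```

In this toy network the hub P1 receives the largest score and the three low-degree vertices tie, which illustrates why a ranking of this kind can still discriminate between vertices in a network where most degrees are low.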