复杂生物网络聚类分析方法
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
基因组学和高通量技术提供了大量生命系统组成元件(如蛋白质)以及它们之间相互关系的数据,由这些关系数据构成的复杂生物网络蕴含着丰富的生命系统运行机制的知识,挖掘这些隐蔽的知识成为后基因组时代的主要任务之一。作为知识发现重要手段的图聚类方法,在复杂生物网络分析上受到了普遍关注。
     本文开发了高性能的图聚类算法CD(contraction-dilation),以复杂生物网络为研究对象,研究了基于蛋白质相似性网络的远同源性探测,分析了从蛋白质相互作用网络中探测到的功能模块。为了对图聚类算法进行评价,提出了一种融入研究对象重要特征的接近现实的网络模型。主要研究内容包括:
     (1)针对模块性(modularity)这个描述复杂网络集团结构强弱的全局指标,提出了一种简单而高效的基于模块性优化的图聚类算法CD。将此算法应用于计算机的生成基准测试网络,生物网络和社会网络,并将性能表现与已有的同是基于模块性优化的聚类算法进行了比较分析,实验结果表明CD算法能在较短的CPU时间内找到较大数值的模块性,探测到稳定且较精确的集团结构,且对内存需求非常低,可用于分析大型网络。该方法是本文后面分析的基础。
     (2)远同源蛋白间的序列相似性很低,处于随机涨落区域边缘(twilight zone),很难区分通过比对获得的序列特征是进化过程中功能约束还是随机突变导致的结果。为了从具有高度噪音的比对分数中提取关于同源性的微弱信号,本文将图聚类算法CD应用于蛋白质相似性网络来探测远同源性。将序列间的Smith-Waterman分数通过S型曲线变换作为蛋白质关联图的连接权重,然后使用CD算法对此关联图进行聚类分析。将该方法应用于蛋白质结构分类数据库SCOP超家族层次的几个数据集,且与谱聚类方法和MCL方法进行比较,实验结果表明,该方法能较好地探测蛋白质远同源性,因为输出的集团在很大程度上对应着蛋白质超家族;该方法输出的集团数目接近数据集中超家族的个数;该方法得到的结果明显优于其他方法。研究结果表明,序列相似性确实携带了关于远同源性的显著信息,使用CD算法挖掘这些信息能够较准确地探测蛋白质远同源性
     (3)蛋白质相互作用网络是典型的复杂生物网络,它的节点多且连接分布不均匀,不易被划分为有统计意义的集团。为此,我们从文献提供的数据集中挑选具有中度和高度置信度的酵母蛋白质相互作用数据来构建酵母蛋白质相互作用网络,采用CD算法来将此网络划分为集团。对于得到的集团,我们从拓扑结构的角度以及生物学意义的角度进行了分析。实验结果表明,通过CD算法得到的集团是内部连接紧密的子图,且MIPS数据库的ComplexCat中已知的蛋白质复合体很大程度上包含于这些集团中,且有许多蛋白质复合体完全包含于这些集团中。此外,我们使用超几何聚集分布的P值来分析一个集团对某个特定功能的富集程度,将最小的P值对应的功能作为该集团的主要功能。分析集团中成员的功能发现,集团中大部分的蛋白质具有相同的功能,且与集团的主要功能相一致。
     (4)在通常情况下,我们缺乏对现实网络背景知识的了解。为了评价图聚类算法在现实网络上的性能表现,本文构建了一种接近现实的网络模型(near-realistic model,简称NR模型),通过算法在模型网络上的性能表现来推断其分析现实网络的能力。为了确保此推断的合理性,构建的模型网络具有与所研究网络完全相同的一阶统计特征,即模型网络中每个节点的度与所研究网络中相应节点的度完全一致。同时,构建的模型网络可具有任意设定的集团结构,这就相当于给定了背景知识,即真实的分类信息是已知的。构建的NR模型为客观评价图聚类算法提供了一条途径。
Genomics and high-throughput technologies have yielded large amount of relational data about the components (like proteins) of living systems; such data form various complex biological networks that carry rich knowledge about the systems’mechanisms. In the post-genome era, a current challenge is to mine the hidden knowledge stored in the networks. As an important means for knowledge discovery, graph clustering attracts much attention in the analysis of complex biological networks.
     We develop a high-performance graph clustering algorithm CD (contraction-dilation) in this paper, investigate the detection of protein remote homology based on protein similarity network and analyze functional modules in protein-protein interaction network. To evaluate graph clustering algorithms, a near-realistic network model with key characteristic of the research subject is proposed. The main contents include:
     A novel graph clustering algorithm CD based on modularity optimization is proposed. The modularity is a global index describes the strength of community structures of complex networks. The proposed algorithm is extensively tested on computer-generated benchmark networks, biological networks and social networks. Compared with existing algorithms, CD algorithm is efficient to discover more stable community structures with higher modularity scores and accuracies at lower expenses of both CPU time and memory. Thus, it can be applied to analyze large networks. CD algorithm is the basis for this paper.
     Remote homologues are hidden in the twilight zone, where their sequence similarity is too low to distinguish them from pairs of sequences with equal or even higher sequence similarity due to chance. To extract weak signals about homology from sequence similarity with high noises, we apply CD algorithm to a protein similarity network to detect remote homology. The similarity matrix is formed by the sigmoidal transformation of the Smith-Waterman alignment scores, and then fed into the CD algorithm. The method is tested on several datasets from superfamily level of SCOP database. Numerical experiments show that CD algorithm manages to detect protein remote homology, for the communities it outputs are largely corresponding to protein superfamilies and the number of communities is close to the number of superfamilies in the given dataset. Also, the performance of CD algorithm is superior to spectral clustering algorithm and MCL algorithm. The results show that sequence similarity carry significant information on remote homology, which can be mined by using CD algorithm.
     Protein-protein interaction network is a typical complex biological network with many nodes and unevenly distributed links, thus is not easily divided into communities with statistically significant. We use CD algorithm to partition a yeast protein-protein interaction network, which is constructed by selecting yeast protein interactions with medium and high confidence from the literature. For the communities outputs by CD algorithm, we first assess the validity from the topology perspective and find that they are densely connected local subgraphs; then we find known protein complexes annotated by MIPS database are largely contained in them in their entirety. This indicates that these communities may have biological significance. Moreover, we annotate these communities based on P-values. And we investigate the functions of individual proteins in communities and find that most of them usually share common functions, which are in accordance with the main functions of communities.
     In most cases, we know little background knowledge of real networks. To evaluate the performance of graph clustering algorithms on real networks, a near-realistic network model (NR model) is constructed. Inferring the ability of a graph clustering algorithm to analyze real networks through its performance on model networks. To ensure rationality of this inference, the model network constructed has the same first-order statistical properties as the network under investigated, that is, degree of every node in the model network is in accordance with the corresponding node in the network under investigated. Meanwhile, the model network constructed can have any given community structure. It is equivalent to give the background knowledge, that is, the real classification information is known. NR model constructed here provides a way to evaluate graph clustering algorithms objectively.
引文
1. Watson J D, Crick F H. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid[J]. Nature, 1953, 171(4356): 737-738.
    2.顾燕诒.分子生物学的产生.世界科学, 1981(02): 14-18.
    3.沃森J D,贝克T A,贝尔S P等.基因的分子生物学[M].杨焕明译.北京:科学出版社,1982.
    4. Alon U. Biological networks: the tinkerer as an engineer[J]. Science, 2003, 301(5641): 1866-1867.
    5. Hartwell L H, Hopfield J J, Leibler S, et al. From molecular to modular cell biology[J]. Nature, 1999, 402(6761 Suppl): C47-C52.
    6. Hasty J, McMillen D, Collins J J. Engineered gene circuits[J]. Nature, 2002, 420(6912): 224-230.
    7. Kitano H. Computational systems biology[J]. Nature, 2002, 420(6912): 206-210.
    8. Koonin E V, Wolf Y I, Karev G. P. The structure of the protein universe and genome evolution[J]. Nature, 2002, 420(6912): 218-223.
    9. Oltvai Z N, Barabasi A L. Systems biology: Life's complexity pyramid[J]. Science, 2002, 298(5594): 763-764.
    10. Bray D. Molecular networks: the top-down view[J]. Science, 2003, 301(5641): 1864-1865.
    11. Barabasi A L, Oltvai Z N. Network biology: understanding the cell's functional organization[J]. Nat Rev Genet, 2004, 5(2): 101-113.
    12. Bader G. D, Betel, D, Hogue C W. BIND: the Biomolecular Interaction Network Database[J]. Nucleic Acids Res, 2003, 31(1): 248-250.
    13. Orengo C A, Michie A D, Jones S, et al. CATH--a hierarchic classification of protein domain structures[J]. Structure, 1997, 5(8): 1093-1108.
    14. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000[J]. Nucleic Acids Res, 2000, 28(1): 45-48.
    15. Kanehisa M. The KEGG database[J]. Novartis Found Symp, 2002, 247: 91-101.
    16. Watts D J, Strogatz S H. Collective dynamics of 'small-world' networks[J]. Nature, 1998,393(6684): 440-442.
    17. Barabasi A L, Albert R. Emergence of scaling in random networks[J]. Science, 1999, 286(5439): 509-512.
    18. Albert R, Barabasi A L. Statistical mechanics of complex networks[J]. Reviews of Modern Physics, 2002, 74(1): 47-97.
    19. Amaral L A N, Ottino J M. Complex networks - Augmenting the framework for the study of complex systems[J]. Eur Phys J B, 2004, 38(2): 147-162.
    20. Rzhetsky A and Gomez S M. Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome[J]. Bioinformatics, 2001, 17(10): 988-996.
    21. Lee R E, Megeney L A. The yeast kinome displays scale free topology with functional hub clusters[J]. BMC Bioinformatics, 2005, 6: 271.
    22. Wuchty S. Scale-free behavior in protein domain networks[J]. Mol Biol Evol, 2001, 18(9): 1694-702.
    23. Fell D A, Wagner A. The small world of metabolism[J]. Nat Biotechnol, 2000, 18(11): 1121-1122.
    24. Wagner A, Fell D A. The small world inside large metabolic networks[J]. Proc Biol Sci, 2001, 268(1478): 1803-1810.
    25. del Sol A, Fujihashi H, O’Meara P. Topology of small-world networks of protein-protein complex structures[J]. Bioinformatics, 2005, 21(8): 1311-1315.
    26. Wall M E, Hlavacek W S, Savageau M A. Design of gene circuits: lessons from bacteria[J]. Nat Rev Genet, 2004, 5(1): 34-42.
    27. Eisenberg E, Levanon E Y. Preferential attachment in the protein network evolution[J]. Phys Rev Lett, 2003, 91(13): 138701.
    28. Ravasz E, Somera A L, Mongru D A, et al. Hierarchical organization of modularity in metabolic networks[J]. Science, 2002, 297(5586): 1551-1555.
    29. Yook S H, Oltvai Z N, Barabasi A L. Functional and topological characterization of protein interaction networks[J]. Proteomics, 2004, 4(4): 928-942.
    30. Albert R, Jeong H, Barabasi A L. Error and attack tolerance of complex networks[J]. Nature, 2000, 406(6794): 378-382.
    31. Winzeler E A, Shoemaker D D, Astromoff A, et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis[J]. Science, 1999, 285(5429):901-906.
    32. Gerdes S Y, Scholle M D, Campbell J W, et al. Experimental determination and system level analysis of essential genes in Escherichia coli MG1655[J]. J Bacteriol, 2003, 185(19): 5673-5684.
    33. Milo R, Shen-Orr S, Itzkovitz S, et al. Network motifs: simple building blocks of complex networks[J]. Science, 2002, 298(5594): 824-827.
    34. Shen-Orr S S, Milo R, Mangan S, et al. Network motifs in the transcriptional regulation network of Escherichia coli[J]. Nat Genet, 2002, 31(1): 64-68.
    35. Rice J J, Kershenbaum A, Stolovitzky G.. Lasting impressions: motifs in protein-protein maps may provide footprints of evolutionary events[J]. Proc Natl Acad Sci USA, 2005, 102(9): 3173-3174.
    36. Wuchty S, Oltvai Z N, Barabasi A L. Evolutionary conservation of motif constituents in the yeast protein interaction network[J]. Nat Genet, 2003, 35(2): 176-179.
    37. Ott S, Hansen A, Kim S Y, et al. Superiority of network motifs over optimal networks and an application to the revelation of gene network evolution[J]. Bioinformatics, 2005, 21(2): 227-238.
    38. Lee W P, Jeng B C, Pai T W, et al. Differential evolutionary conservation of motif modes in the yeast protein interaction network[J]. BMC Genomics, 2006, 7: 89.
    39. Kashtan N, Itzkovitz S, Milo R, et al., Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs[J]. Bioinformatics, 2004, 20(11): 1746-1758.
    40. Berg J, Lassig M. Local graph alignment and motif search in biological networks[J]. Proc Natl Acad Sci USA, 2004, 101(41): 14689-14694.
    41. Alon N, Dao P, Hajirasouliha I, et al. Biomolecular network motif counting and discovery by color coding[J]. Bioinformatics, 2008, 24(13): i241-i249.
    42. Wernicke S, Rasche F. FANMOD: a tool for fast network motif detection[J]. Bioinformatics, 2006, 22(9): 1152-1153.
    43. Wang C, Xuan J, Chen L, et al. Motif-directed network component analysis for regulatory network inference[J]. BMC Bioinformatics, 2008, 9(Suppl 1): S21.
    44. Zhang Y J, Xuan J H, de los Reyes B G, et al. Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data[J].BMC Bioinformatics, 2008, 9: 203.
    45. Newman M E, Girvan M. Finding and evaluating community structure in networks[J]. Phys Rev E, 2004. 69(2): 026113.
    46. Gibson D, Kleinberg J, Raghavan P. Inferring web communities from link topology[C]. In: Hypertext 98-Proceedings of the Ninth ACM Conference on Hypertext and Hypermedia. Pittsburgh: ACM Press, 1998: 225-234.
    47. Redner S. How popular is your paper? An empirical study of the citation distribution[J]. EUR PHYS J B, 1998; 4(2): 131-134.
    48. Girvan M, Newman M E. Community structure in social and biological networks[J]. Proc Natl Acad Sci USA, 2002. 99(12): 7821-7826.
    49. Rives A W, Galitski T. Modular organization of cellular networks[J]. Proc Natl Acad Sci USA, 2003. 100(3): 1128-1133.
    50. Spirin V, Mirny L A. Protein complexes and functional modules in molecular networks[J]. Proc Natl Acad Sci USA, 2003, 100(21): 12123 -12128.
    51. Caretta-Cartozo C, de los Rios P, Piazza1 F, et al. Bottleneck genes and community structure in the cell cycle network of S. pombe[J]. PLoS Comput Biol, 2007. 3(6): e103.
    52. Holme P, Huss M, Jeong H. Subnetwork hierarchies of biochemical pathways[J]. Bioinformatics, 2003, 19(4): 532-538.
    53. Papin J A, Reed J L, Palsson B O. Hierarchical thinking in network biology: the unbiased modularization of biochemical networks[J]. Trends Biochem Sci, 2004. 29(12): 641-647.
    54 Dahlke E E, Olson R M, Leverentz H R, et al. Assessment of the accuracy of density functionals for prediction of relative energies and geometries of low-lying isomers of water hexamers[J]. J Phys Chem A, 2008. 112(17): 3976-3984.
    55. Duda R O, Hart P E, Stork D G.模式分类[M]:第二版.李宏东,姚天翔等译.北京:机械工业出版社,2003. 415-467.
    56. Liao T W. Clustering of time series data--a survey. Pattern Recognition, 2005, 38(11): 1857-1874.
    57. Brohee S, van Helden J. Evaluation of clustering algorithms for protein-protein interaction networks[J]. BMC Bioinformatics, 2006, 7: 488.
    58. Filippone M, Camastra F, Masulli F, et al., A survey of kernel and spectral methods for clustering[J]. Pattern Recognition, 2008, 41(1): 176-190.
    59. Vashist A, Kulikowski, C A, Muchnik I. Ortholog clustering on a multipartite graph. IEEE/ACM Trans Comput Biol Bioinform, 2007. 4(1): 17-27.
    60. Qi Y J, Balem F, Faloutsos C, et al. Protein complex identification by supervised graph local clustering[J]. Bioinformatics, 2008, 24(13): i250-i258.
    61. Wasserman S, Faust K. Social Network Analysis: Methods and Applications[M]. Cambridge: Cambridge University Press, 1994.
    62. Bu D B, Zhao Y, Cai L, et al. Topological structure analysis of the protein-protein interaction network in budding yeast[J]. Nucleic Acids Res, 2003. 31(9): 2443-2450.
    63. Mewes H W, Dietmann S, Frishman D, et al. MIPS: analysis and annotation of genome information in 2007[J]. Nucleic Acids Res, 2008. 36(Database issue): D196-D201.
    64. Chang H, Yeung DY. Robust path-based spectral clustering[J]. Pattern Recognition, 2008. 41(1): 191-203.
    65. Meila M, Shi J. A random walks view of spectral segmentation[C]. In: International Conference on AI and Statistics(AISTAT). Florida, 2001, 177-182.
    66. van Vaerenbergh S, Santamaria I. A spectral clustering approach to underdetermined postnonlinear blind source separation of sparse sources[J]. IEEE Trans Neural Netw, 2006, 17(3): 811-814.
    67. Sen T Z, Kloczkowski A, Jernigan R L. Functional clustering of yeast proteins from the protein-protein interaction network[J]. BMC Bioinformatics, 2006. 7: 355.
    68. Breitkreutz B J, Stark C, Tyers M. The GRID: the General Repository for Interaction Datasets[J]. Genome Biol, 2003, 4(3): R23.
    69 Hwang W, Cho Y R, Zhang A D, et al. A novel functional module detection algorithm for protein-protein interaction networks[J]. Algorithms Mol Biol, 2006, 1: 24.
    70. Xenarios I, Salwínski L, Duan X J, et al. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions[J]. Nucleic Acids Res, 2002, 30(1): 303-305.
    71. Brun C, Herrmann C, Guenoche A. Clustering proteins from interaction networks for the prediction of cellular functions[J]. BMC Bioinformatics, 2004, 5: 95.
    72. Brun C, Chevenet F, Martin D, et al. Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network[J]. Genome Biol, 2003. 5(1): R6.
    73. Yona G., Linial N, Linial M. ProtoMap: automatic classification of protein sequences and hierarchy of protein families[J]. Nucleic Acids Res, 2000. 28(1): 49-55.
    74. Pipenbacher P, Schliep A, Schneckener S, et al. ProClust: improved clustering of protein sequences with an extended graph-based approach[J]. Bioinformatics, 2002, 18(Suppl 2): S182-S191.
    75. Kawaji H, Yamaguchi Y, Matsuda H, et al. A graph-based clustering method for a large set of sequences using a graph partitioning algorithm[J]. Genome Inform, 2001. 12: 93-102.
    76. Fiduccia C M, Mattheyses R M. A Linear-time heuristic for improving network partitions[C]. In: Proceedings of the 19th Design Automation Conference. Piscataway: IEEE Press, 1982, 175-181.
    77. Apweiler1 R, Attwood T K, Bairoch A, et al. The InterPro database, an integrated documentation resource for protein families, domains and functional sites[J]. Nucleic Acids Res, 2001. 29(1): 37-40.
    78. Enright A J, Ouzounis C A. GeneRAGE: a robust algorithm for sequence clustering and domain detection[J]. Bioinformatics, 2000, 16(5): 451-457.
    79. Enright A J, van Dongen S, Ouzounis C A. An efficient algorithm for large-scale detection of protein families[J]. Nucleic Acids Res, 2002, 30(7): 1575-1584.
    80. Murzin AG, Brenner S E, Hubbard T, et al. SCOP: a structural classification of proteins database for the investigation of sequences and structures[J]. J Mol Biol, 1995, 247(4): 536-540.
    81. Paccanaro A, Casbon J A, Saqi M A. Spectral clustering of protein sequences[J]. Nucleic Acids Res, 2006, 34(5): 1571-1580.
    82. Corpet F. Multiple sequence alignment with hierarchical clustering[J]. Nucleic Acids Res, 1988, 16(22): 10881-10890.
    83. Jeong H, Tombor B, Albert R, et al., The large-scale organization of metabolic networks[J]. Nature, 2000, 407(6804): 651-654.
    84. Guimera R, Amaral L A N. Functional cartography of complex metabolic networks[J]. Nature, 2005, 433(7028): 895-900.
    85. Kirkpatrick S, Gelatt Jr C D, Vecchi, M P. Optimization by Simulated Annealing. Science, 1983, 220(4598): 671-680.
    86. Schuster S, Pfeiffer T, Moldenhauer F, et al. Exploring the pathway structure ofmetabolism: decomposition into subnetworks and application to Mycoplasma pneumoniae[J]. Bioinformatics, 2002, 18(2): 351-361.
    87. Overbeek R, Larsen N, Pusch G D, et al. WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction[J]. Nucleic Acids Res, 2000, 28(1): 123-125.
    88. Ma H W, Zeng A P. The connectivity structure, giant strong component and centrality of metabolic networks[J]. Bioinformatics, 2003, 19(11): 1423-1430.
    89. Bauer A, Kuster B Affinity purification-mass spectrometry. Powerful tools for the characterization of protein complexes[J]. Eur J Biochem, 2003, 270(4): 570-578.
    90. Gavin A C, Superti-Furga G.. Protein complexes and proteome organization from yeast to man[J]. Curr Opin Chem Biol, 2003, 7(1): 21-27.
    91. Fields S, Song O. A novel genetic system to detect protein-protein interactions[J]. Nature, 1989, 340(6230): 245-246.
    92. Uetz P, Giot L, Cagney G., et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae[J]. Nature, 2000, 403(6770): 623-627.
    93. Rigaut G, Shevchenko A, Rutz B, et al. A generic protein purification method for protein complex characterization and proteome exploration[J]. Nat Biotechnol, 1999, 17(10): 1030-1032.
    94. Gavin A C, B?sche M, Krause R, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes[J]. Nature, 2002, 415(6868): 141-147.
    95. Gavin A C, Aloy P, Grandi P, et al. Proteome survey reveals modularity of the yeast cell machinery[J]. Nature, 2006, 440(7084): 631-636.
    96. Zhu H, Bilgin M, Bangham R, et al. Global analysis of protein activities using proteome chips[J]. Science, 2001, 293(5537): 2101-2105.
    97. Ho Y, Gruhler A, Heilbut A, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry[J]. Nature, 2002, 415(6868): 180-183.
    98. Smith G. P. Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface[J]. Science, 1985, 228(4705): 1315-1317.
    99. Tong A H, Drees B, Nardelli G, et al. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules[J]. Science, 2002, 295(5553): 321-324.
    100. Rain J C, Selig L, de Reuse H, et al. The protein-protein interaction map of Helicobacter pylori[J]. Nature, 2001, 409(6817): 211-215.
    101. Ito T, Chiba T, Ozawa R, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome[j]. Proc Natl Acad Sci USA, 2001, 98(8): 4569-4574.
    102. Li S, Armstrong C M, Bertin N, et al. A map of the interactome network of the metazoan C. elegans[J]. Science, 2004, 303(5657): 540-543.
    103. Giot L, Bader J S, Brouwer C, et al. A protein interaction map of Drosophila melanogaster[J]. Science, 2003, 302(5651): 1727-1736.
    104. von Mering C, Krause R, Snel B, et al. Comparative assessment of large-scale data sets of protein-protein interactions[J]. Nature, 2002, 417(6887): 399-403.
    105. Jeong H, Mason S P, Barabási A L, et al. Lethality and centrality in protein networks[J]. Nature, 2001,. 411(6833): 41-42.
    106. Kelly W, Stumpf M. Protein-protein interactions: from global to local analyses[J]. Curr Opin Biotechnol, 2008, 19(4): 396-403.
    107. Hu X. Mining and analysing scale-free protein-protein interaction network[J]. Int J Bioinform Res Appl, 2005, 1(1): 81-101.
    108. Needleman S B, Wunsch C D. A general method applicable to the search for similarities in the amino acid sequence of two proteins[J]. J Mol Biol, 1970, 48(3): 443-453.
    109. Smith T F, Waterman M S. Identification of common molecular subsequences[J]. J Mol Biol, 1981, 147(1): 195-197.
    110. Lipman D J, Pearson W R. Rapid and sensitive protein similarity searches[J]. Science, 1985, 227(4693): 1435-1441.
    111. Pearson W R, Lipman D J. Improved tools for biological sequence comparison[J]. Proc Natl Acad Sci USA, 1988, 85(8): 2444-2448.
    112. Altschul S F, Gish W, Miller W, et al. Basic local alignment search tool[J]. J Mol Biol, 1990, 215(3): 403-410.
    113. Altschul S F, Madden T L, Sch?ffer A A, et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs[J]. Nucleic Acids Res, 1997, 25(17): 3389-402.
    114. Schuster S, Fell D A, Dandekar T. A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks[J]. Nat Biotechnol, 2000,18(3): 326-332.
    115. Ma H, Zeng A P. Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms[J]. Bioinformatics, 2003, 19(2): 270-277.
    116. Gagneur J, Jackson D B, Casari G. Hierarchical analysis of dependency in metabolic networks[J]. Bioinformatics, 2003, 19(8): 1027-1034.
    117. Lemke N, Herédia F, Barcellos C K, et al. Essentiality and damage in metabolic networks[J]. Bioinformatics, 2004, 20(1): 115-119.
    118. Ma H W, Zhao X M, Yuan Y J, et al. Decomposition of metabolic network into functional modules based on the global connectivity structure of reaction graph[J]. Bioinformatics, 2004, 20(12): 1870-1876.
    119. Horne A B, Hodgman T C, Spence H D, et al. Constructing an enzyme-centric view of metabolism. Bioinformatics[J]. 2004, 20(13): 2050-2055.
    120. Yildirim M A, Goh K I, Cusick M E, et al., Drug-target network[J]. Nat Biotechnol, 2007, 25(10): 1119-1126.
    121. Wishart D S, Knox C, Guo A C, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration[J]. Nucleic Acids Res, 2006, 34(Database issue): D668-D672.
    122. Kleinberg J, Lawrence S. Network analysis. The structure of the Web[J]. Science, 2001, 294(5548): 1849-1850.
    123. Sen P, Dasgupta S, Chatterjee A, et al. Small-world properties of the Indian railway network[J]. Physical Review E, 2003, 67(3): 036106.
    124. Carreras B A, Newman, D E, Dobson, I, et al. Evidence for self-organized criticality in a time series of electric power system blackouts[J]. IEEE Transactions on Circuits and Systems I-Regular Papers. 2004, 51(9): 1733-1740.
    125. Newman M E. Coauthorship networks and patterns of scientific collaboration[J]. Proc Natl Acad Sci USA, 2004, 101(Suppl 1): 5200-5205.
    126. Bader G D, Hogue C W. Analyzing yeast protein-protein interaction data obtained from different sources[J]. Nat Biotechnol, 2002, 20(10): 991-997.
    127. Flake G W, Lawrence S, Giles C L, et al. Self-organization and identification of web communities[J]. IEEE Comput, 2002. 35(3): 66-71.
    128. Liao T W. Clustering of time series data - a survey[J]. Pattern Recognition, 2005, 38(11): 1857-1874.
    129. Boykov Y, Veksler O, Zabih R. Fast approximate energy minimization via graph cuts[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(11): 1222-1239.
    130. Shi J B, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888-905.
    131. Carballido-Gamio J, Belongie S J, Majumdar S. Normalized cuts in 3-D for spinal MRI segmentation[J]. IEEE Transactions on Medical Imaging, 2004, 23(1): 36-44.
    132. Fortunato S, Barthelemy M. Resolution limit in community detection[J]. Proc Natl Acad Sci USA, 2007, 104(1): 36-41.
    133. Arenas A, Fernandez A, Gomez, S. Analysis of the structure of complex networks at different resolution levels[J]. New Journal of Physics, 2008, 10: 053039.
    134. Ruan J, Zhang W. Identifying network communities with a high resolution[J]. Phys Rev E , 2008, 77(1): 016104.
    135. Schuetz P, Caflisch A. Efficient modularity optimization by multistep greedy algorithm and vertex mover refinement[J]. Phys Rev E . 2008, 77(4): 046112.
    136. Reichardt J, Bornholdt S. Statistical mechanics of community detection[J]. Phys Rev E. 2006, 74(1 ): 016110.
    137. Lehmann S, Hansen L K. Deterministic modularity optimization[J]. European Physical Journal B, 2007, 60(1): 83-88.
    138. Newman M E. Modularity and community structure in networks[J]. Proc Natl Acad Sci USA, 2006, 103(23): 8577-8582.
    139. Brandes U, Delling D, Gaertler M, et al. Maximizing Modularity is hard. arXiv:physics/0608255v2, 2006.
    140. Danon L, Duch J, Diaz-Guilera A, et al. Comparing community structure identification[J]. Journal of Statistical Mechanics-Theory and Experiment, 2005: P09008.
    141. Clauset A, Newman M E J, Moore C. Finding community structure in very large networks[J]. Physical Review E, 2004, 70(6): 066111.
    142. Duch J, Arenas A. Community detection in complex networks using extremal optimization[J]. Phys Rev E, 2005, 72(2): 027104.
    143. Pujol J M, Bejar J, Delgado J. Clustering algorithm for determining community structure in large networks[J]. Phys Rev E, 2006, 74(1): 016107.
    144. Lancichinetti A, Fortunato S, Radicchi F. Benchmark graphs for testing communitydetection algorithms[J]. Phys Rev E Stat Nonlin Soft Matter Phys, 2008, 78(4): 046110.
    145. Guimera R, Danon L, Díaz-Guilera A, et al. Self-similar community structure in a network of human interactions[J]. Phys Rev E, 2003, 68(6): 065103.
    146. Palla, G., Derenyi I, Farkas I, et al. Uncovering the overlapping community structure of complex networks in nature and society[J]. Nature, 2005, 435(7043): 814-818.
    147. Lancichinetti A, Fortunato S, Kertesz J. Detecting the overlapping and hierarchical community structure in complex networks[J]. New Journal of Physics, 2009, 11: 033015.
    148. Zipf G K. Selective studies and the principle of relative frequency in language[M]. Cambridge, MA: Harvard Univ. Press, 1932.
    149. Zachary W W. An information flow model for conflict and fission in small groups[J]. Journal of Anthropological Research, 1977, 33: 452-473.
    150. Duch J, Arenas A. Community detection in complex networks using extremal optimization[J]. Physical Review E, 2005, 72(2 ): 027104.
    151. Xenarios I, Rice D W, Salwinski L, et al. DIP: the database of interacting proteins[J]. Nucleic Acids Res, 2000, 28(1): 289-291.
    152. Newman M E. The structure of scientific collaboration networks[J]. Proc Natl Acad Sci USA, 2001, 98(2): 404-409
    153. Kuncheva L I, Hadjitodorov S T. Using diversity in cluster ensembles[C]. In: Proceedings of IEEE Int. Conf. on Systems, Man and Cybernetics. USA: IEEE Press, 2004, 2:1214-1219.
    154. Fred A L N, Jain A K. Robust data clustering[C]. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition. USA: IEEE Press, 2003, 2:128-136.
    155. Bernal A, Ear U, Kyrpides N. Genomes OnLine Database (GOLD): a monitor of genome projects world-wide[J]. Nucleic Acids Res, 2001, 29(1): 126-127.
    156. Tsoka S, Ouzounis C A. Recent developments and future directions in computational genomics[J]. FEBS Lett, 2000, 480(1): 42-48.
    157. Eisenberg D, Marcotte E M, Xenarios I, et al. Protein function in the post-genomic era[J]. Nature, 2000, 405(6788): 823-826.
    158. Bork P, Dandekar T, Diaz-Lazcoz Y, et al. Predicting function: from genes to genomes and back[J]. J Mol Biol, 1998, 283(4): 707-725.
    159. Hegyi H, Gerstein M. The relationship between protein structure and function: acomprehensive survey with application to the yeast genome[J]. J Mol Biol, 1999, 288(1): 147-164.
    160. Bolten E, Schliep A, Schneckener S, et al. Clustering protein sequences--structure prediction by transitive homology[J]. Bioinformatics, 2001, 17(10): 935-941.
    161. Gerstein M. Measurement of the effectiveness of transitive sequence comparison, through a third 'intermediate' sequence[J]. Bioinformatics, 1998, 14(8): 707-714.
    162. Park J, Teichmann S A, Hubbard T, et al. Intermediate sequences increase the detection of homology between sequences[J]. J Mol Biol, 1997, 273(1): 349-354.
    163. Pearson W R. Identifying distantly related protein sequences[J]. Comput Appl Biosci, 1997, 13(4): 325-332.
    164. Aittokallio T, Schwikowski B. Graph-based methods for analysing networks in cell biology[J]. Brief Bioinform, 2006, 7(3): 243-255.
    165. Schaeffer S E. Graph clustering[J]. Computer Science Review, 2007, 1(1): 27-64.
    166. Krause, A, Stoye J, Vingron M. The SYSTERS protein sequence cluster set[J]. Nucleic Acids Res, 2000, 28(1): 270-272.
    167. Brenner S E, Koehl P, Levitt M. The ASTRAL compendium for protein structure and sequence analysis[J]. Nucleic Acids Res, 2000, 28(1): 254-256.
    168. Ng A, Jordan M, Weiss Y. On spectral clustering: Analysis and an algorithm[C]. In: In Advances in Neural Information Processing Systems, USA: MIT Press, 2001, (14): 849-856.
    169. Fleischmann R D, Adams M D, White O, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd[J]. Science, 1995, 269(5223): 496-512.
    170. Letovsky S, Kasif S. Predicting protein function from protein/protein interaction data: a probabilistic approach[J]. Bioinformatics, 2003, 19(Suppl 1): i197-i204.
    171. Pereira-Leal J B, Enright A J, Ouzounis C A. Detection of functional modules from protein interaction networks[J]. Proteins, 2004, 54(1): 49-57.
    172. Spirin V, Mirny L A. Protein complexes and functional modules in molecular networks[J]. Proc Natl Acad Sci USA, 2003, 100(21): 12123-12128.
    173. Bader G D, Hogue C W. An automated method for finding molecular complexes in large protein interaction networks[J]. BMC Bioinformatics, 2003, 4(1): 2.
    174. King A D, Przulj N, Jurisica I. Protein complex prediction via cost-based clustering[J]. Bioinformatics, 2004, 20(17): 3013-3020.
    175. Hartuv E, Shamir R. A clustering algorithm based on graph connectivity[J]. Information Processing Letters, 2000, 76(4-6): 175-181.
    176. Samanta M P, Liang S. Predicting protein functions from redundancies in large-scale protein interaction networks[J]. Proc Natl Acad Sci USA, 2003, 100(22): 12579-12583.
    177. Deane C M, Salwiński ?, Xenarios I, et al. Protein interactions: two methods for assessment of the reliability of high throughput observations[J]. Mol Cell Proteomics, 2002, 1(5): 349-356.
    178. Radicchi F, Castellano C, Cecconi F, et al. Defining and identifying communities in networks[J]. Proc Natl Acad Sci USA, 2004, 101(9): 2658-2663.
    179. Altaf-Ul-Amin M, Shinbo Y, Mihara K, et al. Development and implementation of an algorithm for detection of protein complexes in large interaction networks[J]. BMC Bioinformatics, 2006, 7(1): 207.
    180. Ruepp A, Zollner A, Maier D, et al. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes[J]. Nucleic Acids Res, 2004, 32(18): 5539-5545.
    181. Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks[J]. Genome Res, 2003, 13(11): 2498-2504.
    182. Wu Z, Leahy R. An optimal graph theoretic approach to data clustering: theory and its application to image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993, 15(11): 1101-1113.
    183.杨春梅.基因表达数据聚类分析算法研究和应用[D]: [博士学位论文].天津:天津大学精密仪器与光电子工程学院, 2006.
    184. Albert R. Scale-free networks in cell biology[J]. J Cell Sci, 2005, 118(21): 4947-4957.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700