Statistical Modeling for the Analysis of High-Throughput Biological Data and Its Applications
Abstract
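With the development of modern biology, it is widely accepted that studying biological problems at the molecular level plays an important role in uncovering the essence of life phenomena, and especially in understanding the mechanisms of disease. High-throughput experimental techniques include microarrays [78;115;135], yeast two-hybrid assays [58;130], mass spectrometry [40;54], chromatin immunoprecipitation [59;109], and others. Driven by the rapid development of these techniques, it has become possible to obtain, simultaneously, data on thousands of molecules and on the interactions among them in humans and other model organisms. Such a volume of data offers a valuable opportunity to rethink cell biology and disease. At the same time, high-throughput data have distinctive features: the number of predictors far exceeds the number of samples, the data structure is very complex, the noise is high, and observations are missing or uncertain. Under these conditions, most traditional statistical methods either fail to give correct predictions or give predictions of limited use. The major challenge we face is therefore to design new statistical models that efficiently acquire, analyze, and interpret the information contained in these data.
     In this thesis, we build statistical models for the analysis of biological data from the following perspectives:
     1. Predicting functional modules in biological networks with dK random graph models.
     Many aspects of biological function can be modeled with biological networks, such as protein interaction networks, metabolic networks, and gene coexpression networks. Studying the statistical features of these networks helps us infer biological function. Complex statistical network models can describe a network more precisely, but it is not clear whether such models help to find biologically meaningful subnetworks.
     Recent studies have shown that the degree distribution of the nodes is not sufficient to characterize a network. In Chapter 2, we extend the degree distribution to second-order and third-order degree correlations and design a pseudo-likelihood method to estimate the parameters. We apply this method to the MIPS and BIOGRID yeast protein interaction networks and to two yeast gene coexpression networks. The results show that, in both protein interaction networks and gene coexpression networks, the second-order degree correlation model predicts interactions between genes better. For predicting functional modules, however, the degree correlation model performs slightly better than the ordinary degree distribution model on protein interaction networks, and worse than the ordinary degree distribution model on gene coexpression networks.
     Our computations show that incorporating degree correlation information can improve prediction accuracy in some respects, but that the third-order degree correlation model predicts worse in every respect. With other parameter estimation methods, such as maximum likelihood, the second-order and third-order degree correlation models might show an advantage in predicting functional modules.
     2. Locating disease-causing variants on protein domains, starting from the protein domain interaction network.
     Identifying the genetic variants that underlie complex human diseases and locating the responsible genes is very important. A protein molecule generally consists of several protein domains. We assume that harmful genetic variants alter the structure of protein domains, affect protein function, and eventually cause disease. Starting from this assumption, we explore the use of the protein domain interaction network to recover associations between protein domains and diseases. We define associations between protein domains and complex diseases on the basis of associations between nonsynonymous single nucleotide polymorphisms and complex diseases. Based on the protein domain interaction network, we propose a "guilt-by-proximity" method: candidate protein domains are ranked by their average distance to the seed domains in the protein domain interaction network. We validate the method with large-scale cross-validation experiments in three settings: simulated linkage intervals, random controls, and the whole genome. The method is evaluated quantitatively by the AUC score and the mean rank ratio of disease domains. The results show an AUC score of 77.9% and a mean rank ratio of 21.82%. We further rank the associations between protein domains and diseases across the whole genome and provide a free query website, which offers useful information for locating the genetic variants that underlie complex diseases.
     3. Identifying functional loci when candidate loci are in strong linkage disequilibrium.
     Within a single genomic region, multiple markers may exhibit strong linkage disequilibrium, and a phenotype may show strong statistical association with several of these markers. Linkage disequilibrium between variations at neighboring loci, especially strong linkage disequilibrium, not only makes it difficult to identify the markers associated with a particular phenotype, but also hinders the distinction between functionally relevant and non-functional variations. In Chapter 4, we consider five methods: boosting, Lasso, ridge regression, stepwise regression, and single-locus analysis. Using simulation, we compare how well these five methods predict functional variations when the variations are in linkage disequilibrium. We find that, with 100 samples, ridge regression performs best under strong linkage disequilibrium among 20 loci, while boosting performs best under decaying linkage disequilibrium among 500 or 1000 loci.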
With the rapid development of modern biology, it is generally accepted that research at the molecular level is of great importance for uncovering the essence of biological phenomena and, in particular, for understanding the pathogenesis of human disease. Spurred by advances in high-throughput data collection techniques such as microarrays [78;115;135], yeast two-hybrid assays [58;130], mass spectrometry [40;54], and chromatin immunoprecipitation [59;109], data on thousands of molecules and their interactions in humans and most model species have become available. This flood of information presents exciting new opportunities for understanding cellular biology and disease. At the same time, high-throughput data are characterized by a number of predictors far exceeding the number of samples, complex data structure, high noise, and uncertain or missing values. In this setting, most traditional statistical tools either fail or provide outcomes of limited usefulness, so the great challenge is to develop new statistical models to explore, analyze, and interpret this information effectively and efficiently.
     In this thesis, we analyze high-throughput data by establishing statistical models in the following areas:
     1. Prediction of functionally homogeneous modules in biological networks with dK random graph models.
     Many aspects of biological function can be modeled by biological networks, such as protein interaction networks, metabolic networks, and gene coexpression networks. Studying the statistical properties of these networks in turn allows us to infer biological function. Complex statistical network models can potentially describe the networks more accurately, but it is not clear whether such models are better suited to finding biologically meaningful subnetworks.
     Recent studies have shown that the degree distribution of the nodes is not an adequate statistic in many molecular networks. In Chapter 2, we extend this statistic with 2nd and 3rd order degree correlations and develop a pseudo-likelihood approach to estimate the parameters. The approach is used to analyze the MIPS and BIOGRID yeast protein interaction networks and two yeast coexpression networks. We show that 2nd order degree correlation information gives better predictions of gene interactions in both protein interaction and gene coexpression networks. However, in the biologically important task of predicting functionally homogeneous modules, degree correlation information performs marginally better in the case of the MIPS and BIOGRID protein interaction networks, but worse in the case of gene coexpression networks.
     Our use of dK models shows that incorporating degree correlations can increase predictive power in some contexts, albeit sometimes marginally, but that in all contexts the use of 3rd order degree correlations decreases accuracy. It is possible, however, that other parameter estimation methods, such as maximum likelihood, would show the usefulness of incorporating 2nd and 3rd order degree correlations in predicting functionally homogeneous modules.
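     To make the statistics behind the dK models concrete, the following minimal sketch computes the degree distribution (the first-order statistic) and the joint degree distribution of connected node pairs (the 2nd order degree correlation) for a small undirected network. It is an illustration only, not the thesis's pseudo-likelihood estimator; the function names and the toy edge list are assumptions introduced here.

```python
from collections import Counter

def degree_sequence(edges):
    """Return {node: degree} for an undirected simple graph given as an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def degree_distribution(edges):
    """First-order statistic: number of nodes having each degree k."""
    return dict(Counter(degree_sequence(edges).values()))

def joint_degree_distribution(edges):
    """Second-order statistic: number of edges joining a degree-k1 node
    to a degree-k2 node. Keys are stored as (smaller, larger)."""
    deg = degree_sequence(edges)
    jdd = Counter()
    for u, v in edges:
        k1, k2 = sorted((deg[u], deg[v]))
        jdd[(k1, k2)] += 1
    return dict(jdd)

# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
one_k = degree_distribution(toy_edges)
two_k = joint_degree_distribution(toy_edges)
print(one_k)
print(two_k)
```

     A 3rd order statistic would count connected triples of nodes (open paths and triangles) by their degree triples in the same way; it is omitted here to keep the sketch short.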
     2. Recovery of associations between protein domains and complex diseases based on a protein domain interaction network.
     It is of vital importance to find the genetic variants that underlie complex human diseases and to locate the genes responsible for these diseases. Since proteins are typically composed of several structural domains, it is reasonable to assume that harmful genetic variants may alter the structures of protein domains, affect the functions of proteins, and eventually cause disorders. With this understanding, in Chapter 3 we explore the possibility of recovering associations between protein domains and complex diseases using domain-domain interaction networks. We define associations between protein domains and disease families on the basis of associations between nonsynonymous single nucleotide polymorphisms (nsSNPs) and complex diseases, similarities between diseases, and relations between proteins and domains. Based on a domain-domain interaction network, we propose a "guilt-by-proximity" principle that ranks candidate domains according to their average distance to a set of seed domains in the network. We validate the method through large-scale cross-validation experiments on simulated linkage intervals, random controls, and the whole genome, and we evaluate it in terms of the AUC score and the mean rank ratio of disease domains. Results show that the AUC scores can be as high as 77.90% and the mean rank ratios as low as 21.82%. We further calculate a genome-wide landscape of associations between domains and disease families and offer a freely accessible web interface for this landscape. The landscape can potentially be used together with existing methods for determining disease genes, and so provides useful information for localizing the genetic risk factors underlying complex diseases.
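     The ranking step of the "guilt-by-proximity" principle can be sketched as follows: each candidate domain is scored by its average shortest-path distance to the seed domains, and candidates are sorted by that score. This is a minimal sketch under stated assumptions, not the thesis's implementation: the domain names are hypothetical, unreachable candidates receive a penalty distance equal to the number of nodes, and the rank ratio is taken here to be the rank of a domain divided by the number of candidates.

```python
from collections import deque

def build_adjacency(edges):
    """Build a symmetric adjacency map from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def rank_by_proximity(adj, seeds, candidates):
    """Rank candidates by average distance to the seeds (closest first)."""
    penalty = len(adj)  # assumed distance for candidates unreachable from a seed
    seed_dists = [bfs_distances(adj, s) for s in seeds]
    scores = {
        c: sum(d.get(c, penalty) for d in seed_dists) / len(seed_dists)
        for c in candidates
    }
    ranking = sorted(candidates, key=lambda c: (scores[c], c))
    return ranking, scores

def rank_ratio(ranking, target):
    """Rank of target (1-based) divided by the number of ranked candidates."""
    return (ranking.index(target) + 1) / len(ranking)

# Hypothetical domain-domain interaction network with seeds S1 and S2.
edges = [("S1", "S2"), ("S1", "A"), ("S2", "A"), ("A", "B"), ("B", "C")]
adj = build_adjacency(edges)
ranking, scores = rank_by_proximity(adj, seeds=["S1", "S2"], candidates=["C", "A", "B"])
print(ranking, scores, rank_ratio(ranking, "A"))
```

     In a cross-validation setting, a known disease domain would be held out of the seed set and placed among the candidates; a small rank ratio for the held-out domain then indicates that the ranking recovers it.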
     3. Identification of functional loci when candidate loci are in strong linkage disequilibrium.
     In large-scale genetic association studies, multiple markers in strong linkage disequilibrium (LD) within a single genomic region may each show a compelling statistical association with a phenotype of interest. LD, especially strong LD, between variations at neighboring loci not only makes it difficult to discern the markers associated with the phenotype, but also makes it difficult to distinguish functionally relevant variations from nonfunctional ones. In Chapter 4, we compare five methods (boosting, Lasso, ridge regression, stepwise regression, and single-locus analysis) via simulation for identifying the true functional variations when LD exists among variations at different loci. We find that, with 100 samples, ridge regression performs best in the case of strong LD among 20 loci, while boosting outperforms the other methods in the case of decaying LD among 500 or 1000 loci.
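     As background for the comparison above, the strength of LD between two biallelic loci is commonly summarized by the coefficient D and the squared correlation r^2 computed from haplotype frequencies. The sketch below uses the standard definitions D = p_AB - p_A p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)); the abstract does not say which LD measure the thesis uses, so this choice, the function name, and the toy haplotypes are assumptions for illustration.

```python
def ld_stats(haplotypes):
    """Return (D, r_squared) for two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) pairs,
    with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n          # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d, d * d / denom

# Hypothetical haplotype samples.
complete_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]   # alleles always co-occur
equilibrium = [(1, 1), (1, 0), (0, 1), (0, 0)]   # alleles independent
d_strong, r2_strong = ld_stats(complete_ld)
d_none, r2_none = ld_stats(equilibrium)
print(d_strong, r2_strong, d_none, r2_none)
```

     When r^2 is close to 1, the two loci carry nearly the same information, which is why a single-locus test cannot tell a functional variant from its neighbors and why methods that model all loci jointly are compared in Chapter 4.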
References
[1]Adie,E.A.,Adams,R.R.,Evans,K.L.,Porteous,D.J.,and Pickard,B.S.,Speeding Disease Gene Discovery by Sequence Based Candidate Prioritization,BMC Bioinformatics,6:55,2005
    [2]Aerts,S.,Lambrechts,D.,Maity,S.,Van Loo,P.,Coessens,B.,De Smet,F.,Tranchevent,L.C.,De Moor,B.,Marynen,P.,Hassan,B.,Carmeliet,P.,and Moreau,Y.,Gene Prioritization through Genomic Data Fusion,Nature Biotechnology,24(5):537-544,2006.
    [3]Aggarwal,A.,Guo,D.L.,Hoshida,Y.,Yuen,S.T.,Chu,K.M.,So,S.,Boussioutas,A.,Chen,X.,Bowtell,D.,Aburatani,H.,Leung,S.Y.,and Tan,P.,Topological and functional discovery in a gene coexpression meta-network of gastric cancer,Cancer Research,66(1):232-241,2006.
    [4]Albert,R.,Jeong,H.and Barabasi,A.L.,Internet:Diameter of the world-wide web,Nature,401(6749):130-131,1999.
    [5]Alberts,B.,Molecular biology of the cell,Garland Science,2008.
    [6]Alon,U.,Biological networks:the tinkerer as an engineer,Science,301(5641):1866-1867,2003.
    [7]Altshuler,D.,Daly,M.and Kruglyak,L.,Guilt by Association,Nature Genetics,26(2):135-137,2000.
    [8]Apic,G.,Gough,J.and Teichmann,S.A.,Domain Combinations in Archaeal,Eubacterial and Eukaryotic Proteomes,Journal of Molecular Biology,310(2):311-325,2001.
    [9]Barabasi,A.L.and Albert,R.,Emergence of scaling in random networks,Science,286(5439):509-512,1999.
    [10]Barabasi,A.L.and Oltvai,Z.N.,Network biology:Understanding the cell's functional organization,Nature Reviews Genetics,5(2):101-113,2004.
    [11]Barrett,T.,Troup,D.B.,Wilhite,S.E.,Ledoux,P.,Rudnev,D.,Evangelista,C.,Kim,I.F.,Soboleva,A.,Tomashevsky,M.and Edgar,R.,NCBI GEO:mining tens of millions of expression profiles-database and tools update,Nucleic Acids Research,35:D760-5,2006
    [12]Bateman,A.,Birney,E.,Cerruti,L.,Durbin,R.,Etwiller,L.,Eddy,S.R.,Griffiths-Jones,S.,Howe,K.L.,Marshall,M.,and Sonnhammer,E.L.,The Pfam Protein Families Database,Nucleic Acids Research 30(1):276-280,2002.
    [13]Beck,M.,Mucopolysaccharidosis I,Orphanet encyclopedia,September:1-3,2003
    [14]Berends,M.J.,Wu,Y.,Sijmons,R.H.,Mensink,R.G.,van der Sluis,T.,Hordijk-Hos,J.M.,de Vries,E.G.,Hollema,H.,Karrenbeld,A.,Buys,C.H.,van der Zee,A.G.,Hofstra,R.M.,and Kleibeuker,J.H.,Molecular and Clinical Characteristics of Msh6 Variants:An Analysis of 25 Index Carriers of a Germline Variant,The American Journal of Human Genetics 70(1):26-37,2002
    [15]Boguna,M.and Pastor-Satorras,R.,Epidemic spreading in correlated complex networks,Physical Review E,66:047104.1-047104.4,2002.
    [16]Boguna,M.and Pastor-Satorras,R.,Class of correlated random networks with hidden variables,Physical Review E,68:036112.1-036112.13,2003.
    [17]Boguna,M.,Pastor-Satorras,R.,and Vespignani,A.,Absence of epidemic threshold in scale-free networks with degree correlations,Physical Review Letters,90:028701.1-028701.4,2003.
    [18]Bork,P.,Jensen,L.J.,von Mering,C.,Ramani,A.K.,Lee,I.,and Marcotte,E.M.,Protein interaction networks from yeast to human,Curr Opin Struct Biol,14(3):292-299,2004.
    [19]Borrebaeck,C.A.K.,Ekstrom,S.,Hager,A.C.M.,Nilsson,J.,Laurell,T.,and Marko-Varga,G.,Protein chips based on recombinant antibody fragments:A highly sensitive approach as detected by mass spectrometry,Biotechniques,30(5):1126-1130,2001.
    [20]Bowcock,A.M.,Genomics:Guilt by Association,Nature 447(7145):645-646,2007
    [21]Bray,D.,Molecular networks:the top-down view,Science,301(5641):1864-1865,2003.
    [22]Brenner,S.,Johnson,M.,Bridgham,J.,Golda,G.,Lloyd,D.H.,Johnson,D.,Luo,S.,McCurdy,S.,Foy,M.,Ewan,M.,Roth,R.,George,D.,Eletr,S.,Albrecht,G.,Vermaas,E.,Williams,S.R.,Moon,K.,Burcham,T.,Pallas,M.,DuBridge,R.B.,Kirchner,J.,Fearon,K.,Mao,J.,and Corcoran,K.,Gene Expression Analysis by Massively Parallel Signature Sequencing(Mpss) on Microbead Arrays,Nature Biotechnology,18(6):630-634,2000
    [23]Brunner,H.G.and van Driel,M.A.,From Syndrome Families to Functional Genomics,Nature Reviews Genetics,5(7):545-551,2004
    [24]Burke,W.,Coughlin,S.S.,Lee,N.C.,Weed,D.L.,and Khoury,M.J.,Application of Population Screening Principles to Genetic Screening for Adult-Onset Conditions,Genetic Testing 5(3):201-211,2001.
    [25]Carter,S.L.,Brechbuhler,C.M.,Griffin,M.,and Bond,A.T.,Gene co-expression network topology provides a framework for molecular characterization of cellular state,Bioinformatics,20(14):2242-2250,2004.
    [26]Chasman,D.and Adams,R.M.,Predicting the Functional Consequences of Non-Synonymous Single Nucleotide Polymorphisms:Structure-Based Assessment of Amino Acid Variation,Journal of Molecular Biology,307(2):683-706,2001
    [27]Chung,F.and Lu,L.,Connected components in random graphs with given expected degree sequences,Annals of Combinatorics,6:125-145,2002.
    [28]Consortium TGO,Gene Ontology:tool for the unification of biology,Nature Genetics,25:25-9,2000.
    [29]Consortium,I.H.,A Haplotype Map of the Human Genome,Nature,437(7063):1299-1320,2005
    [30]Cyr,J.L.and Heinen,C.D.,Hereditary Cancer-Associated Missense Mutations in Hmsh6 Uncouple Atp Hydrolysis from DNA Mismatch Binding,The Journal of Biological Chemistry 283(46):31641-31648,2008
    [31]Davey Smith,G.,Ebrahim,S.,Lewis,S.,Hansell,A.L.,Palmer,L.J.and Burton,P.R.,Genetic epidemiology and public health:hope,hype,and future prospects,The Lancet 366(9495):1484-98,2005.
    [32]Draper,N.R.and Smith,H.,Applied Regression Analysis,Second Edition,John Wiley and Sons,New York,1981.
    [33]Efron,B.,Hastie,T.,Johnstone,I.and Tibshirani,R.,Least angle regression,Annals of Statistics,32:407-409,2004
    [34]Erdős,P.and Rényi,A.,On random graphs,Publicationes Mathematicae Debrecen,6:290-297,1959.
    [35]Farkas,I.J.,Beg,Q.K.,and Oltvai,Z.N.,Exploring transcriptional regulatory networks in the worm,Cell,125(6):1032-1034,2006.
    [36]Franke,L.,van Bakel,H.,Diosdado,B.,van Belzen,M.,Wapenaar,M.,and Wijmenga,C.,Team:A Tool for the Integration of Expression,and Linkage and Association Maps,European Journal of Human Genetics,12(8):633-638,2004
    [37]Franke,L.,van Bakel,H.,Fokkens,L.,de Jong,E.D.,Egmont-Petersen,M.,and Wijmenga,C.,Reconstruction of a Functional Human Gene Network,with an Application for Prioritizing Positional Candidate Genes,The American Journal of Human Genetics,78(6):1011-1025,2006
    [38]Freudenberg,J.,and Propping,P.,A Similarity-Based Method for Genome-Wide Prediction of Disease-Relevant Human Genes,Bioinformatics,18(Suppl 2):S110-115,2002
    [39]Furney,S.J.,Higgins,D.G.,Ouzounis,C.A.,and Lopez-Bigas,N.,Structural and Functional Properties of Genes Involved in Human Cancer,BMC Genomics 7:3,2006
    [40]Gavin,A.C.,Bosche,M.,Krause,R.,Grandi,P.,Marzioch,M.,Bauer,A.,Schultz,J.,Rick,J.M.,Michon,A.M.,Cruciat,C.M.,Remor,M.,Hofert,C.,Schelder,M.,Brajenovic,M.,Ruffner,H.,Merino,A.,Klein,K.,Hudak,M.,Dickson,D.,Rudi,T.,Gnau,V.,Bauch,A.,Bastuck,S.,Huhse,B.,Leutwein,C.,Heurtier,M.A.,Copley,R.R.,Edelmann,A.,Querfurth,E.,Rybin,V.,Drewes,G.,Raida,M.,Bouwmeester,T.,Bork,P.,Seraphin,B.,Kuster,B.,Neubauer,G.,and Superti-Furga,G.,Functional organization of the yeast proteome by systematic analysis of protein complexes,Nature,415(6868):141-147,2002.
    [41]Girvan,M.,and Newman,M.E.,Community structure in social and biological networks,Proceedings of the National Academy of Sciences of the United States of America,99(12):7821-7826,2002.
    [42]Glazier,A.M.,Nadeau,J.H.,and Aitman,T.J.,Finding Genes That Underlie Complex Traits,Science,298(5602):2345-2349,2002.
    [43]Glenner,G.G.,and Wong,C.W.,Alzheimer's Disease and Down's Syndrome:Sharing of a Unique Cerebrovascular Amyloid Fibril Protein,Biochemical and Biophysical Research Communications,122(3):1131-1135,1984
    [44]Goh,K.I.,Cusick,M.E.,Valle,D.,Childs,B.,Vidal,M.,and Barabasi,A.L.,The Human Disease Network,Proceedings of the National Academy of Sciences of the United States of America,104(21):8685-8690,2007
    [45]Golde,T.E.,Estus,S.,Usiak,M.,Younkin,L.H.,and Younkin,S.G.,Expression of Beta Amyloid Protein Precursor Mrnas:Recognition of a Novel Alternatively Spliced Form and Quantitation in Alzheimer's Disease Using Pcr,Neuron 4(2):253-267,1990
    [46]Goedert,M.,and Spillantini,M.G.,A Century of Alzheimer's Disease,Science 314(5800):777-781,2006
    [47]Couzin,J.,and Kaiser,J.,Genome-wide association:Closing the net on common disease genes,Science,316:820-822,2007.
    [48]Hamosh,A.,Scott,A.F.,Amberger,J.,Bocchini,C.,Valle,D.,and McKusick,V.A.,Online Mendelian Inheritance in Man(Omim),a Knowledgebase of Human Genes and Genetic Disorders,Nucleic Acids Research,30(1):52-55,2002.
    [49]Harbison,C.T.,Gordon,D.B.,Lee,T.I.,Rinaldi,N.J.,Macisaac,K.D.,Danford,T.W.,Hannett,N.M.,Tagne,J.B.,Reynolds,D.B.,Yoo,J.,Jennings,E.G.,Zeitlinger,J.,Pokholok,D.K.,Kellis,M.,Rolfe,P.A.,Takusagawa,K.T.,Lander,E.S.,Gifford,D.K.,Fraenkel,E.,and Young,R.A.,Transcriptional regulatory code of a eukaryotic genome,Nature,431(7004):99-104,2004.
    [50]Hartwell,L.H.,Hopfield,J.J.,Leibler,S.,and Murray,A.W.,From molecular to modular cell biology,Nature,402(6761 Suppl):C47-52,1999.
    [51]Hastie,T.,Tibshirani,R.,and Friedman,J.,The Elements of Statistical Learning:Data Mining,Inference,and Prediction,Springer,New York,2001.
    [52]Hasty,J.,McMillen,D.,and Collins,J.J.,Engineered gene circuits,Nature,420(6912):224-230,2002.
    [53]Hayhurst,G.P.,Lee,Y.H.,Lambert,G.,Ward,J.M.,and Gonzalez,F.J.,Hepatocyte Nuclear Factor 4alpha(Nuclear Receptor 2a1) Is Essential for Maintenance of Hepatic Gene Expression and Lipid Homeostasis,Molecular and Cellular Biology,21(4):1393-1403,2001
    [54]Ho,Y.,Gruhler,A.,Heilbut,A.,Bader,G.D.,Moore,L.,Adams,S.L.,Millar,A.,Taylor,P.,Bennett,K.,Boutilier,K.,Yang,L.Y.,Wolting,C.,Donaldson,I.,Schandorff,S.,Shewnarane,J.,Vo,M.,Taggart,J.,Goudreault,M.,Muskat,B.,Alfarano,C.,Dewar,D.,Lin,Z.,Michalickova,K.,Willems,A.R.,Sassi,H.,Nielsen,P.A.,Rasmussen,K.J.,Andersen,J.R.,Johansen,L.E.,Hansen,L.H.,Jespersen,H.,Podtelejnikov,A.,Nielsen,E.,Crawford,J.,Poulsen,V.,Sorensen,B.D.,Matthiesen,J.,Hendrickson,R.C.,Gleeson,F.,Pawson,T.,Moran,M.F.,Durocher,D.,Mann,M.,Hogue,C.W.V.,Figeys,D.,and Tyers,M.,Systematic identification of protein complexes in saccharomyces cerevisiae by mass spectrometry,Nature,415(6868):180-183,2002.
    [55]Hoerl,A.E.and Kennard,R.W.,Ridge regression:Biased estimation for nonorthogonal problems,Technometrics,12:55-67,1970
    [56]Huang,H.,Winter,E.E.,Wang,H.,Weinstock,K.G.,Xing,H.,Goodstadt,L.,Stenson,P.D.,Cooper,D.N.,Smith,D.,Alba,M.M.,Ponting,C.P.,and Fechtel,K.,Evolutionary Conservation and Selection of Human Disease Gene Orthologs in the Rat and Mouse Genomes,Genome Biology,5(7):R47,2004
    [57]Hugot,J.P.,Chamaillard,M.,Zouali,H.,Lesage,S.,Cezard,J.P.,Belaiche,J.,Almer,S.,Tysk,C.,O'Morain,C.A.,Gassull,M.,Binder,V.,Finkel,Y.,Cortot,A.,Modigliani,R.,Laurent-Puig,P.,Gower-Rousseau,C.,Macry,J.,Colombel,J.F.,Sahbatou,M.,and Thomas,G.,Association of Nod2 Leucine-Rich Repeat Variants with Susceptibility to Crohn's Disease,Nature 411(6837):599-603,2001
    [58]Ito,T.,Chiba,T.,Ozawa,R.,Yoshida,M.,Hattori,M.,and Sakaki,Y.,A comprehensive two-hybrid analysis to explore the yeast protein interactome,Proceedings of the National Academy of Sciences of the United States of America,98(8):4569-4574,2001.
    [59]Iyer,V.R.,Horak,C.E.,Scafe,C.S.,Botstein,D.,Snyder,M.,and Brown,P.O.,Genomic binding sites of the yeast cell-cycle transcription factors sbf and mbf,Nature,409(6819):533-538,2001.
    [60]Jeong,H.,Tombor,B.,Albert,R.,Oltvai,Z.N.,and Barabasi,A.L.,The large-scale organization of metabolic networks,Nature,407(6804):651-654,2000.
    [61]Jiang,G.,Nepomuceno,L.,Hopkins,K.,and Sladek,F.M.,Exclusive Homodimerization of the Orphan Receptor Hepatocyte Nuclear Factor 4 Defines a New Subclass of Nuclear Receptors,Molecular and Cellular Biology,15(9):5131-5143,1995
    [62]Jiang,R.,Tu,Z.,Chen,T.,and Sun,F.,Network motif identification in stochastic networks,Proceedings of the National Academy of Sciences of the United States of America,103(25):9404-9409,2006.
    [63]Kann,M.G.,Protein Interactions and Disease:Computational Approaches to Uncover the Etiology of Diseases,Briefings in Bioinformatics,8(5):333-346,2007.
    [64]Hoerl,A.E.,Kennard,R.W.and Baldwin,K.F.,Ridge regression:Some simulations,Communications in Statistics,Theory Methods 4:105-123,1975.
    [65]Kirkpatrick,S.,Gelatt,C.D.and Vecchi,M.P.,Optimization by Simulated Annealing,Science,220(4598):671-680,1983.
    [66]Kitano,H.,Computational systems biology,Nature,420(6912):206-210,2002.
    [67]Kohler,S.,Bauer,S.,Horn,D.,and Robinson,P.N.,Walking the Interactome for Prioritization of Candidate Disease Genes,The American Journal of Human Genetics,82(4):949-958,2008
    [68]Kolodner,R.D.,Tytell,J.D.,Schmeits,J.L.,Kane,M.F.,Gupta,R.D.,Weger,J.,Wahlberg,S.,Fox,E.A.,Peel,D.,Ziogas,A.,Garber,J.E.,Syngal,S.,Anton-Culver,H.,and Li,F.P.,Germ-Line Msh6 Mutations in Colorectal Cancer Families,Cancer Research,59(20):5068-5074,1999.
    [69]Koonin,E.V.,Wolf,Y.I.,and Karev,G.P.,The structure of the protein universe and genome evolution,Nature,420(6912):218-223,2002.
    [70]Krishnan,V.G.,and Westhead,D.R.,A Comparative Study of Machine-Learning Methods to Predict the Effects of Single Nucleotide Polymorphisms on Protein Function,Bioinformatics 19(17):2199-2209,2003
    [71]Kruglyak,L.,and Lander,E.S.,High-resolution genetic mapping of complex traits,The American Journal of Human Genetics,56:1212-1223,1995
    [72]Lage,K.,Karlberg,E.O.,Storling,Z.M.,Olason,P.I.,Pedersen,A.G.,Rigina,O.,Hinsby,A.M.,Turner,Z.,Pociot,F.,Tommerup,N.,Moreau,Y.,and Brunak,S.,A Human Phenome-Interactome Network of Protein Complexes Implicated in Genetic Disorders,Nature Biotechnology,25(3):309-316,2007
    [73]Lausen,J.,Thomas,H.,Lemm,I.,Bulman,M.,Borgschulze,M.,Lingott,A.,Hattersley,A.T.,and Ryffel,G.U.,Naturally Occurring Mutations in the Human Hnf4alpha Gene Impair the Function of the Transcription Factor to a Varying Degree,Nucleic Acids Research,28(2):430-437,2000
    [74]Lander,E.S.,and Schork,N.J.,Genetic Dissection of Complex Traits,Science,265(5181):2037-2048,1994.
    [75]Lee,T.I.,Rinaldi,N.J.,Robert,F.,Odom,D.T.,Bar-Joseph,Z.,Gerber,G.K.,Hannett,N.M.,Harbison,C.T.,Thompson,C.M.,Simon,I.,Zeitlinger,J.,Jennings,E.G.,Murray,H.L.,Gordon,D.B.,Ren,B.,Wyrick,J.J.,Tagne,J.B.,Volkert,T.L.,Fraenkel,E.,Gifford,D.K.,and Young,R.A.,Transcriptional regulatory networks in saccharomyces cerevisiae,Science,298(5594):799-804,2002.
    [76]Lee,H.K.,Hsu,A.K.,Sajdak,J.,Qin,J.,and Pavlidis,P.,Coexpression analysis of human genes across many microarray data sets,Genome Research,14(6):1085-1094,2004.
    [77]Lu,P.,Liu,J.,Melikishvili,M.,Fried,M.G.,and Chi,Y.I.,Crystallization of Hepatocyte Nuclear Factor 4 Alpha(Hnf4 Alpha) in Complex with the Hnf1 Alpha Promoter Element,Acta Crystallographica Section F:Structural Biology and Crystallization Communications,64(Pt 4):313-317,2008.
    [78]Lockhart,D.J.,Dong,H.,Byrne,M.C.,Follettie,M.T.,Gallo,M.V.,Chee,M.S.,Mittmann,M.,Wang,C.,Kobayashi,M.,Horton,H.,and Brown,E.L.,Expression monitoring by hybridization to high-density oligonucleotide arrays,Nature Biotechnology,14(13):1675-1680,1996.
    [79]Luan,Y.,and Li,H.,Group Additive Regression Models for Genomic Data Analysis,Biostatistics,9(1):100-113,2008
    [80]Mahadevan,P.,Krioukov,D.,Fall,K.,and Vahdat,A.,Systematic Topology Analysis and Generation Using Degree Correlations,SIGCOMM,36:135-46,2006.
    [81]Malo,N.,Libiger,O.,and Schork,N.J.,Accommodating Linkage Disequilibrium in Genetic-Association Analyses via Ridge Regression,The American Journal of Human Genetics,82:375-385,2008
    [82]Margulies,M.,Egholm,M.,Altman,W.E.,Attiya,S.,Bader,J.S.,Bemben,L.A.,Berka,J.,Braverman,M.S.,Chen,Y.J.,Chen,Z.,Dewell,S.B.,Du,L.,Fierro,J.M.,Gomes,X.V.,Godwin,B.C.,He,W.,Helgesen,S.,Ho,C.H.,Irzyk,G.P.,Jando,S.C.,Alenquer,M.L.,Jarvie,T.P.,Jirage,K.B.,Kim,J.B.,Knight,J.R.,Lanza,J.R.,Leamon,J.H.,Lefkowitz,S.M.,Lei,M.,Li,J.,Lohman,K.L.,Lu,H.,Makhijani,V.B.,McDade,K.E.,McKenna,M.P.,Myers,E.W.,Nickerson,E.,Nobile,J.R.,Plant,R.,Puc,B.P.,Ronan,M.T.,Roth,G.T.,Sarkis,G.J.,Simons,J.F.,Simpson,J.W.,Srinivasan,M.,Tartaro,K.R.,Tomasz,A.,Vogt,K.A.,Volkmer,G.A.,Wang,S.H.,Wang,Y.,Weiner,M.P.,Yu,P.,Begley,R.F.,and Rothberg,J.M.,Genome Sequencing in Microfabricated High-Density Picolitre Reactors,Nature 437(7057):376-380,2005.
    [83]Maslov,S.,and Sneppen,K.,Specificity and stability in topology of protein networks,Science,296(5569):910-913,2002.
    [84]Masseroli,M.,Galati,O.,and Pinciroli,F.,Gfinder:Genetic Disease and Phenotype Location Statistical Analysis and Mining of Dynamically Annotated Gene Lists,Nucleic Acids Research,33(Web Server issue):W717-723,2005
    [85]Milo,R.,Itzkovitz,S.,Kashtan,N.,Levitt,R.,Shen-Orr,S.,Ayzenshtat,I.,Sheffer,M.,and Alon,U.,Superfamilies of evolved and designed networks,Science,303(5663):1538-42,2004.
    [86]Middendorf,M.,Ziv,E.,Adams,C.,Hom,J.,Koytcheff,R.,Levovitz,C.,Woods,G.,Chen,L.,and Wiggins,C.,Discriminative topological features reveal biological network mechanisms,BMC Bioinformatics,5:-,2004.
    [87]Mewes,H.W.,Dietmann,S.,Frishman,D.,Gregory,R.,Mannhaupt,G.,Mayer,K.F.,Munsterkotter,M.,Ruepp,A.,Spannagl,M.,Stumpflen,V.,and Rattei,T.,MIPS:analysis and annotation of genome information in 2007,Nucleic Acids Research,36(Database issue):D196-201,2008
    [88]Miura,A.,Yamagata,K.,Kakei,M.,Hatakeyama,H.,Takahashi,N.,Fukui,K.,Nammo,T.,Yoneda,K.,Inoue,Y.,Sladek,F.M.,Magnuson,M.A.,Kasai,H.,Miyagawa,J.,Gonzalez,F.J.,and Shimomura,I.,Hepatocyte Nuclear Factor-4alpha Is Essential for Glucose-Stimulated Insulin Secretion by Pancreatic Beta-Cells,The Journal of Biological Chemistry,281(8):5246-5257,2006
    [89]Moir,R.D.,Lynch,T.,Bush,A.I.,Whyte,S.,Henry,A.,Portbury,S.,Multhaup,G.,Small,D.H.,Tanzi,R.E.,Beyreuther,K.,and Masters,C.L.,Relative Increase in Alzheimer's Disease of Soluble Forms of Cerebral Abeta Amyloid Protein Precursor Containing the Kunitz Protease Inhibitory Domain,The Journal of Biological Chemistry,273(9):5013-5019,1998.
    [90]Motulsky,A.G.,Genetics of complex diseases,Journal of Zhejiang University SCIENCE B,7(2):167-168,2006
    [91]Nebert,D.W.,Dalton,T.P.,Stuart,G.W.,and Carvan,M.J.,3rd,"Gene-Swap Knock-In" Cassette in Mice to Study Allelic Differences in Human Genes,Annals of the New York Academy of Sciences,919:148-170,2000
    [92]Newman,M.E.J.,Assortative mixing in networks,Physical Review Letters,89:208701.1-208701.4,2002.
    [93]Newman,M.E.J.,Mixing patterns in networks,Physical Review E,67:026126.1-026126.13,2002.
    [94]Odom,D.T.,Zizlsperger,N.,Gordon,D.B.,Bell,G.W.,Rinaldi,N.J.,Murray,H.L.,Volkert,T.L.,Schreiber,J.,Rolfe,P.A.,Gifford,D.K.,Fraenkel,E.,Bell,G.I.,and Young,R.A.,Control of Pancreas and Liver Gene Expression by Hnf Transcription Factors,Science 303(5662):1378-1381,2004
    [95]Olsen,J.V.,Blagoev,B.,Gnad,F.,Macek,B.,Kumar,C.,Mortensen,P.,and Mann,M.,Global,in Vivo,and Site-Specific Phosphorylation Dynamics in Signaling Networks,Cell 127(3):635-648,2006
    [96]Oltvai,Z.N.,and Barabasi,A.L.,Systems biology,life's complexity pyramid,Science,298(5594):763-764,2002.
    [97]Oti,M.,and Brunner,H.G.,The Modular Nature of Genetic Diseases,Clinical Genetics,71(1):1-11,2007
    [98]Oti,M.,Snel,B.,Huynen,M.A.,and Brunner,H.G.,Predicting Disease Genes Using Protein-Protein Interactions,Journal of Medical Genetics,43(8):691-698,2006
    [99]Ott,J.,Analysis of Human Genetic Linkage,The Johns Hopkins University Press,Baltimore,1991.
    [100]Pagon,R.A.,Genetic Testing for Disease Susceptibilities:Consequences for Genetic Counseling,Trends in Molecular Medicine,8(6):306-307,2002
    [101]Park,T.,and Casella,G.,The Bayesian Lasso,The Journal of the American Statistical Association,103:681-686,2008
    [102]Pastor-Satorras,R.,Vazquez,A.,and Vespignani,A.,Dynamical and correlation properties of the internet,Physical Review Letters,87:258701.1-258701.4,2001.
    [103]Pawson,T.,and Nash,P.,Assembly of Cell Regulatory Systems through Protein Interaction Domains,Science 300(5618):445-452,2003
    [104]Pearson,H.,What is a gene?,Nature,441(7092):398-401,2006.
    [105]Portelius,E.,Zetterberg,H.,Gobom,J.,Andreasson,U.,and Blennow,K.,Targeted Proteomics in Alzheimer's Disease:Focus on Amyloid-Beta,Expert Review of Proteomics,5(2):225-237,2008.
    [106]Raghavachari,B.,Tasneem,A.,Przytycka,T.M.,and Jothi,R.,Domine:A Database of Protein Domain Interactions,Nucleic Acids Research,36(Database issue):D656-661,2008
    [107]Radicchi,F.,Castellano,C.,Cecconi,F.,Loreto,V.,and Parisi,D.,Defining and identifying communities in networks,Proceedings of the National Academy of Sciences of the United States of America,101(9):2658-2663,2004.
    [108]Rajas,F.,Gautier,A.,Bady,I.,Montano,S.,and Mithieux,G.,Polyunsaturated Fatty Acyl Coenzyme a Suppress the Glucose-6-Phosphatase Promoter Activity by Modulating the DNA Binding of Hepatocyte Nuclear Factor 4 Alpha,The Journal of Biological Chemistry,277(18):15736-15744,2002
    [109]Ren,B.,Robert,F.,Wyrick,J.J.,Aparicio,O.,Jennings,E.G.,Simon,I.,Zeitlinger,J.,Schreiber,J.,Hannett,N.,Kanin,E.,Volkert,T.L.,Wilson,C.J.,Bell,S.P.,and Young,R.A.,Genome-wide location and function of dna binding proteins,Science,290(5500):2306-2309,2000.
    [110]Rives,A.W.,and Galitski,T.,Modular organization of cellular networks,Proceedings of the National Academy of Sciences of the United States of America,100(3):1128-1133,2003.
    [111]Rossi,S.,Masotti,D.,Nardini,C.,Bonora,E.,Romeo,G.,Macii,E.,Benini,L.,and Volinia,S.,Tom:A Web-Based Integrated Approach for Identification of Candidate Disease Genes,Nucleic Acids Research,34(Web Server issue):W285-292,2006
    [112]Sachidanandam,R.,Weissman,D.,Schmidt,S.C.,Kakol,J.M.,Stein,L.D.,Marth,G.,Sherry,S.,Mullikin,J.C.,Mortimore,B.J.,Willey,D.L.,Hunt,S.E.,Cole,C.G.,Coggill,P.C.,Rice,C.M.,Ning,Z.,Rogers,J.,Bentley,D.R.,Kwok,P.Y.,Mardis,E.R.,Yeh,R.T.,Schultz,B.,Cook,L.,Davenport,R.,Dante,M.,Fulton,L.,Hillier,L.,Waterston,R.H.,McPherson,J.D.,Gilman,B.,Schaffner,S.,Van Etten,W.J.,Reich,D.,Higgins,J.,Daly,M.J.,Blumenstiel,B.,Baldwin,J.,Stange-Thomann,N.,Zody,M.C.,Linton,L.,Lander,E.S.,and Altshuler,D.,A Map of Human Genome Sequence Variation Containing 1.42 Million Single Nucleotide Polymorphisms,Nature,409(6822):928-933,2001
    [113]Scheuner,M.T.,Yoon,P.W.,and Khoury,M.J.,Contribution of Mendelian Disorders to Common Chronic Disease:Opportunities for Recognition,Intervention,and Prevention,American Journal of Medical Genetics Part C:Seminars in Medical Genetics,125C(1):50-65,2004
    [114]Schrem,H.,Klempnauer,J.,and Borlak,J.,Liver-Enriched Transcription Factors in Liver Function and Development.Part Ⅰ:The Hepatocyte Nuclear Factor Network and Liver-Specific Gene Expression,Pharmacological Reviews,54(1):129-158,2002
    [115]Schena,M.,Shalon,D.,Davis,R.W.,and Brown,P.O.,Quantitative monitoring of gene expression patterns with a complementary dna microarray,Science,270(5235):467-470,1995.
    [116]Shen-Orr,S.S.,Milo,R.,Mangan,S.,and Alon,U.,Network motifs in the transcriptional regulation network of escherichia coli,Nature Genetics,31(1):64-68,2002.
    [117]Shriner,D.,Baye,T.M.,Padilla,M.A.,Zhang,S.,Vaughan,L.K.,and Loraine,A.E.,Commonality of Functional Annotation:A Method for Prioritization of Candidate Genes from Genome-Wide Linkage Studies,Nucleic Acids Research,36(4):e26,2008
    [118]Spirin,V.,and Mirny,L.A.,Protein complexes and functional modules in molecular networks,Proceedings of the National Academy of Sciences of the United States of America,100(21):12123-12128,2003.
    [119]Stabenau,A.,McVicker,G.,Melsopp,C.,Proctor,G.,Clamp,M.,and Birney,E.,The Ensembl Core Software Libraries,Genome Research 14(5):929-933,2004
    [120]Stark,C.,Breitkreutz,B.J.,Reguly,T.,Boucher,L.,Breitkreutz,A.,and Tyers,M.,Biogrid:A General Repository for Interaction Datasets,Nucleic Acids Research,34:D535-9,2006
    [121]Stoffel,M.,and Duncan,S.A.,The Maturity-Onset Diabetes of the Young (Mody1) Transcription Factor Hnf4alpha Regulates Expression of Genes Required for Glucose Transport and Metabolism,Proceedings of the National Academy of Sciences of the United States of America,94(24):13209-13214,1997
    [122]Tanay,A.,Sharan,R.,and Shamir,R.,Discovering statistically significant biclusters in gene expression data,Bioinformatics,18 Suppl 1:S136-44,2002
    [123]Tanay,A.,Sharan,R.,Kupiec,M.,and Shamir,R.,Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data,Proceedings of the National Academy of Sciences of the United States of America,101(9):2981-6,2004
    [124]Tanzi,R.E.,and Bertram,L.,Twenty Years of the Alzheimer's Disease Amyloid Hypothesis:A Genetic Perspective,Cell,120(4):545-555,2005
    [125]Tibshirani,R.,Regression shrinkage and selection via the lasso,Journal of the Royal Statistical Society Series B,58:267-288,1996.
    [126]Tiffin,N.,Kelso,J.F.,Powell,A.R.,Pan,H.,Bajic,V.B.,and Hide,W.A.,Integration of Text-and Data-Mining Using Ontologies Successfully Selects Disease Gene Candidates,Nucleic Acids Research,33(5):1544-1552,2005
    [127]Topol,E.J.,Murray,S.S.,and Frazer,K.A.,The genomics gold rush,The Journal of the American Medical Association,298:218-221,2007
    [128]Tornow,S.,and Mewes,H.W.,Functional modules by relating protein interaction networks and gene expression,Nucleic Acids Research,31(21):6283-6289,2003.
    [129]Turner,F.S.,Clutterbuck,D.R.,and Semple,C.A.,POCUS:Mining Genomic Sequence Annotation to Predict Disease Genes,Genome Biology,4(11):R75,2003
    [130]Uetz,P.,Giot,L.,Cagney,G.,Mansfield,T.A.,Judson,R.S.,Knight,J.R.,Lockshon,D.,Narayan,V.,Srinivasan,M.,Pochart,P.,Qureshi-Emili,A.,Li,Y.,Godwin,B.,Conover,D.,Kalbfleisch,T.,Vijayadamodar,G.,Yang,M.J.,Johnston,M.,Fields,S.,and Rothberg,J.M.,A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae,Nature,403(6770):623-627,2000.
    [131]van Driel,M.A.,Bruggeman,J.,Vriend,G.,Brunner,H.G.,and Leunissen,J.A.,A Text-Mining Analysis of the Human Phenome,European Journal of Human Genetics,14(5):535-542,2006
    [132]van Driel,M.A.,Cuelenaere,K.,Kemmeren,P.P.,Leunissen,J.A.,Brunner,H.G.,and Vriend,G.,GeneSeeker:Extraction and Integration of Human Disease-Related Information from Web-Based Genetic Databases,Nucleic Acids Research,33(Web Server issue):W758-761,2005
    [133]Vazquez,A.,and Moreno,Y.,Resilience to damage of graphs with degree correlations,Physical Review E,67:015101.1-015101.4,2003.
    [134]Vazquez,A.,Pastor-Satorras,R.,and Vespignani,A.,Large-scale topological and dynamical properties of the internet,Physical Review E,65:066130.1-066130.12,2002.
    [135]Velculescu,V.E.,Zhang,L.,Vogelstein,B.,and Kinzler,K.W.,Serial analysis of gene expression,Science,270(5235):484-487,1995.
    [136]Wall,M.E.,Hlavacek,W.S.,and Savageau,M.A.,Design principles for regulator gene expression in a repressible gene circuit,Journal of Molecular Biology,332(4):861-876,2003.
    [137]Watts,D.J.,and Strogatz,S.H.,Collective dynamics of 'small-world' networks,Nature,393(6684):440-442,1998.
    [138]Wilkie,G.S.,and Schirmer,E.C.,Guilt by Association:The Nuclear Envelope Proteome and Disease,Molecular & Cellular Proteomics,5(10):1865-1875,2006
    [139]Wu,Y.,Berends,M.J.,Mensink,R.G.,Kempinga,C.,Sijmons,R.H.,van Der Zee,A.G.,Hollema,H.,Kleibeuker,J.H.,Buys,C.H.,and Hofstra,R.M.,Association of Hereditary Nonpolyposis Colorectal Cancer-Related Tumors Displaying Low Microsatellite Instability with MSH6 Germline Mutations,The American Journal of Human Genetics,65(5):1291-1298,1999.
    [140]Wu,C.H.,Apweiler,R.,Bairoch,A.,Natale,D.A.,Barker,W.C.,Boeckmann,B.,Ferro,S.,Gasteiger,E.,Huang,H.,Lopez,R.,Magrane,M.,Martin,M.J.,Mazumder,R.,O'Donovan,C.,Redaschi,N.,and Suzek,B.,The Universal Protein Resource (UniProt):An Expanding Universe of Protein Information,Nucleic Acids Research,34(Database issue):D187-191,2006.
    [141]Wu,X.,Jiang,R.,Zhang,M.Q.,and Li,S.,Network-Based Global Inference of Human Disease Genes,Molecular Systems Biology,4:189,2008.
    [142]Xing,B.,and van der Laan,M.J.,A statistical method for constructing transcriptional regulatory networks using gene expression and sequence data,Journal of Computational Biology,12(2):229-246,2005.
    [143]Xu,J.,and Li,Y.,Discovering Disease-Genes by Topological Features in Human Protein-Protein Interaction Network,Bioinformatics,22(22):2800-2805,2006
    [144]Yamagata,K.,Oda,N.,Kaisaki,P.J.,Menzel,S.,Furuta,H.,Vaxillaire,M.,Southam,L.,Cox,R.D.,Lathrop,G.M.,Boriraj,V.V.,Chen,X.,Cox,N.J.,Oda,Y.,Yano,H.,Le Beau,M.M.,Yamada,S.,Nishigori,H.,Takeda,J.,Fajans,S.S.,Hattersley,A.T.,Iwasaki,N.,Hansen,T.,Pedersen,O.,Polonsky,K.S.,Bell,G.I.,et al.,Mutations in the Hepatocyte Nuclear Factor-1alpha Gene in Maturity-Onset Diabetes of the Young (MODY3),Nature,384(6608):455-458,1996
    [145]Ye,Z.Q.,Zhao,S.Q.,Gao,G.,Liu,X.Q.,Langlois,R.E.,Lu,H.,and Wei,L.,Finding New Structural and Sequence Attributes to Predict Possible Disease Association of Single Amino Acid Polymorphism (SAP),Bioinformatics,23(12):1444-1450,2007
    [146]Yeger-Lotem,E.,Sattath,S.,Kashtan,N.,Itzkovitz,S.,Milo,R.,Pinter,R.Y.,Alon,U.,and Margalit,H.,Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction,Proceedings of the National Academy of Sciences of the United States of America,101(16):5934-5939,2004.
    [147]Yue,P.,and Moult,J.,Identification and Analysis of Deleterious Human SNPs,Journal of Molecular Biology,356(5):1263-1274,2006
    [148]Zhang,A.,Protein interaction networks:computational analysis,Cambridge University Press,2009.
    [149]Zhu,H.,and Snyder,M.,Protein chip technology,Current Opinion in Chemical Biology,7(1):55-63,2003.
