Construction of Gene Regulatory Networks Based on Bayesian Methods
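Abstract
With the development of gene chip technology, the massive gene expression data it produces can be combined with computational methods to reconstruct gene regulatory networks. Many models have been applied to this task. Among them, the Bayesian network model is used more and more widely thanks to its solid theoretical foundation, natural representation of knowledge structure, flexible inference, and efficient decision-making mechanism, and it has become a powerful tool for constructing gene regulatory networks.
     Research on constructing gene regulatory networks with Bayesian methods has developed along several directions, such as information-theoretic constraint-based methods, the integration of prior knowledge with gene expression data, and the study of scale-free networks. Among these, constructing a network with mutual information theory can account for the influence of other genes on a given gene, but it yields only the functional characteristics of genes and not the causal relationships between them. Integrating prior knowledge can overcome the sparsity problem in gene data, but experimental comparisons of prior-knowledge integration on time-series gene data sets are lacking, so no information is available on the sensitivity to errors in the prior knowledge.
     Building on a survey and analysis of the current state of research on Bayesian construction of gene regulatory networks, this thesis addresses the problems above. The main work is as follows:
     1. On the basis of PCA-CMI, the path consistency algorithm based on conditional mutual information, the PCA-CMI-NO algorithm for constructing regulatory networks is established using a topological ordering of the nodes. To build this algorithm, the graph splitting method is improved: the mutual information between gene pairs is first filtered, the subgraphs are then sorted by Bayesian score, and the direction of an edge between a gene pair that appears in several subgraphs is chosen according to the subgraph order, which determines the order of the nodes in the gene expression data. Finally, the node ordering is applied to the network constructed by PCA-CMI to obtain a directed network, while conditional mutual information is used to remove edges between independent genes in order to improve network accuracy;
     2. The energy function of the Gibbs distribution is used to integrate a single source of biological prior knowledge, and the approach is extended to the integration of multiple sources. Different confidence indicators are used to reduce the impact of inconsistencies between prior knowledge and data. Finally, the MCMC algorithm and the hill-climbing algorithm are each used to construct gene regulatory networks from time-series expression data for the different biological sources, yielding information on the sensitivity to errors in the prior knowledge;
     3. The first method is tested on the 10-gene and 50-gene yeast networks of DREAM3. The second method is tested on a regulatory network of 14 genes (including 3 transcription factors) selected from the KEGG database, with one set of prior knowledge taken from the experimental data of Lee and another from the experimental data of Harbison. An experimental system for Bayesian gene regulatory network construction was implemented for each method, verifying the effectiveness of the methods.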
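The edge-removal step in item 1 can be illustrated with a short sketch. It assumes jointly Gaussian expression values, so that conditional mutual information reduces to a ratio of covariance determinants, and it removes an edge when the largest conditional mutual information over the shared neighbours falls below a threshold. The function names, the threshold of 0.05, and the maximum conditioning order are illustrative assumptions, not values taken from the thesis, and the node-ordering step of PCA-CMI-NO is not shown.

```python
from itertools import combinations

import numpy as np


def _logdet_cov(data, idx):
    """Log-determinant of the covariance of the selected columns."""
    if len(idx) == 0:
        return 0.0  # determinant of an empty matrix is 1
    cov = np.atleast_2d(np.cov(data[:, idx], rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def gaussian_cmi(data, i, j, cond=()):
    """I(X_i; X_j | X_cond) under a joint-Gaussian assumption.

    With an empty conditioning set this is plain mutual information.
    """
    cond = list(cond)
    return 0.5 * (
        _logdet_cov(data, [i] + cond)
        + _logdet_cov(data, [j] + cond)
        - _logdet_cov(data, cond)
        - _logdet_cov(data, [i, j] + cond)
    )


def prune_edges(data, threshold=0.05, max_order=1):
    """Start from a complete undirected graph and drop weak edges.

    threshold and max_order are hypothetical settings for illustration.
    """
    n_genes = data.shape[1]
    adj = ~np.eye(n_genes, dtype=bool)
    for order in range(max_order + 1):
        snapshot = adj.copy()  # neighbours are read from the graph at the start of this order
        for i, j in combinations(range(n_genes), 2):
            if not snapshot[i, j]:
                continue
            shared = [k for k in range(n_genes)
                      if k != i and k != j and snapshot[i, k] and snapshot[j, k]]
            if len(shared) < order:
                continue
            strongest = max(gaussian_cmi(data, i, j, cond)
                            for cond in combinations(shared, order))
            if strongest < threshold:
                adj[i, j] = adj[j, i] = False
    return adj
```

On data generated from a chain x → z → y, the x–y edge survives the order-0 pass because x and y are marginally dependent, and is removed at order 1 once z is conditioned on. The result is still undirected, which is the gap the node ordering is meant to close.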
With the development of gene chip technology, massive gene expression data can be combined with computational methods to reconstruct gene regulatory networks. Many models have been used for this purpose. Among them, the Bayesian network model, with its solid theoretical foundation, natural representation of knowledge structure, flexible reasoning ability, and efficient decision-making mechanism, is finding increasingly wide application and has become a powerful tool for building gene regulatory networks.
     The use of Bayesian methods to reconstruct gene regulatory networks has given rise to many research directions, such as information-theory-based constraint methods, prior knowledge integration, and scale-free network research. Constructing a gene regulatory network with mutual information theory can take into account the impact of other genes on a given gene, but it provides only the functional features of genes and cannot offer causal relationships between them. Prior knowledge integration can overcome the sparsity problem of a gene network, but experiments on time-series expression data are lacking, making it impossible to obtain information on the sensitivity to errors in the prior knowledge.
     This thesis summarizes the research status of Bayesian approaches to constructing gene regulatory networks and makes improvements that address the problems above. The following specific work was completed:
     1. Node ordering was combined with the path consistency algorithm based on conditional mutual information, solving the problem that the resulting network has no causal directions. To this end, the graph splitting method was improved: mutual information between pairs of nodes is first filtered, substructures are then arranged in descending order of Bayesian score, and finally the orientation of an edge between the same gene pair appearing in different substructures is chosen according to that arrangement;
     2. The Gibbs distribution method was used to integrate single-source and multi-source biological prior knowledge respectively, and different confidence indicators were applied to reduce the impact of inconsistencies between prior knowledge and data. Finally, the MCMC algorithm and the hill-climbing algorithm were applied to time-series expression data to build gene regulatory networks and verify the effectiveness of the approach;
     3. The first method was tested on the 10-gene and 50-gene yeast networks in DREAM3. The second method used 14 genes (including 3 transcription factors) selected from the KEGG database, with one set of prior knowledge drawn from the data proposed by Lee and another from the data proposed by Harbison. An experimental system for gene regulatory network construction was then built, which verified the effectiveness of the two methods.
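The prior-knowledge integration in item 2 can be sketched as follows. The abstract does not give the exact energy function, so this example assumes one common form: the energy of a candidate network is its total absolute disagreement with a matrix of prior edge beliefs in [0, 1], and each source k contributes with its own weight beta_k. Reading the "confidence indicators" as these weights is an assumption, and all names here are hypothetical.

```python
import numpy as np


def energy(graph, prior):
    """Total absolute disagreement between a 0/1 adjacency matrix and
    prior edge beliefs in [0, 1] (assumed form of the energy function)."""
    graph = np.asarray(graph, dtype=float)
    prior = np.asarray(prior, dtype=float)
    return float(np.abs(prior - graph).sum())


def log_gibbs_prior(graph, priors, betas):
    """Unnormalised log of a Gibbs prior over networks:
    log P(G | beta_1..beta_K) = -sum_k beta_k * E_k(G) + const.

    priors: one belief matrix per knowledge source.
    betas:  one non-negative weight per source (assumed to play the
            role of the confidence indicators).
    """
    return -sum(b * energy(graph, p) for p, b in zip(priors, betas))
```

In a score-based search such as hill climbing or MCMC, this term would be added to the log-likelihood of the expression data for each candidate network. Setting a weight to zero makes the corresponding source irrelevant, while a larger weight penalises networks that contradict that source more heavily.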
