Abstract
Gene expression data carry a large amount of biological information, and identifying differentially expressed genes, that is, genes whose expression levels change significantly, is a key problem in understanding how diseases develop and in supporting targeted drug research. Using gene expression data for acute myeloid leukemia (AML), we construct a sequence of per-gene mean differences and build a Bayesian hierarchical normal mixture model with the number of components fixed at three. The model parameters are given priors that reflect biological characteristics of genes, which makes the model more practical. The parameters are estimated with a Markov chain Monte Carlo (MCMC) algorithm, and the differentially expressed AML genes are then identified. For the real-data analysis, an AML data set was obtained from the high-throughput gene expression database of the U.S. National Center for Biotechnology Information (NCBI). Of the 14688 AML genes that remained after nonspecific filtering, 711 were identified as differentially expressed, which is only 4.84% of the total. This result is consistent with the biological principle that most genes are not differentially expressed.
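The abstract does not give the model equations, so the following is only a minimal sketch of the general technique it names: per-gene mean differences, a three-component normal mixture, and Gibbs sampling (one kind of MCMC). The function names, the priors (Dirichlet(8, 1, 1) weights, N(0, 10^2) means, Inverse-Gamma(2, 1) variances), the null mean fixed at zero, and the 0.5 posterior cutoff mentioned afterwards are all assumptions made for illustration, not the authors' specification.

```python
import math
import random


def mean_differences(disease, control):
    """Per-gene difference of group means; inputs are lists of per-gene sample lists."""
    return [sum(d) / len(d) - sum(c) / len(c) for d, c in zip(disease, control)]


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gibbs_three_component(d, n_iter=600, burn_in=200, seed=0):
    """Gibbs sampler for d_g ~ sum_k w_k N(mu_k, var_k), k = 0 (null), 1, 2.

    Assumed priors (illustrative only):
      w ~ Dirichlet(8, 1, 1)        most genes are expected to be unchanged
      mu_0 = 0; mu_1, mu_2 ~ N(0, 10^2)
      var_k ~ Inverse-Gamma(2, 1)
    Returns, per gene, the fraction of post-burn-in draws spent outside the null component.
    """
    rng = random.Random(seed)
    alpha = [8.0, 1.0, 1.0]
    prior_var, a0, b0 = 100.0, 2.0, 1.0
    spread = max(abs(x) for x in d)
    mu = [0.0, -0.5 * spread, 0.5 * spread]  # crude starting values
    var = [1.0, 1.0, 1.0]
    w = [0.8, 0.1, 0.1]
    hits = [0] * len(d)
    for it in range(n_iter):
        # 1. Draw each gene's component label from its full conditional.
        z = []
        for x in d:
            # The tiny constant guards against all three densities underflowing to 0.
            p = [w[k] * normal_pdf(x, mu[k], var[k]) + 1e-300 for k in range(3)]
            z.append(rng.choices(range(3), weights=p)[0])
        # 2. Draw the weights: a Dirichlet draw via normalised gamma variates.
        counts = [z.count(k) for k in range(3)]
        g = [rng.gammavariate(alpha[k] + counts[k], 1.0) for k in range(3)]
        w = [gk / sum(g) for gk in g]
        # 3. Draw the component means (non-null only) and variances.
        for k in range(3):
            xs = [x for x, zk in zip(d, z) if zk == k]
            n = len(xs)
            if k > 0:
                post_var = 1.0 / (n / var[k] + 1.0 / prior_var)
                post_mean = post_var * sum(xs) / var[k]
                mu[k] = rng.gauss(post_mean, math.sqrt(post_var))
            ss = sum((x - mu[k]) ** 2 for x in xs)
            # gammavariate takes (shape, scale); the precision has rate b0 + ss / 2.
            var[k] = 1.0 / rng.gammavariate(a0 + 0.5 * n, 1.0 / (b0 + 0.5 * ss))
        # 4. After burn-in, record which genes sit in a non-null component.
        if it >= burn_in:
            for i, zk in enumerate(z):
                if zk != 0:
                    hits[i] += 1
    return [h / (n_iter - burn_in) for h in hits]
```

Under these assumptions, a gene could be called differentially expressed when its returned probability exceeds a chosen cutoff such as 0.5. Components 1 and 2 are not constrained to be negative and positive, so the sketch reports only membership in either non-null component rather than the direction of change.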