Mining Transcriptional Regulatory Relationships Based on Gene Expression Profiles and Sequence Features
Abstract
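Transcriptional regulation of genes is ubiquitous in living organisms and is essential for normal physiological function. Studying gene regulatory networks helps improve our understanding of the characteristics of biological systems. Advances in experimental techniques, in particular the widespread use of high-throughput experiments such as microarrays, have produced biological data on a massive scale. Bioinformatic methods that mine transcriptional regulatory relationships from microarray and other high-throughput data have accordingly attracted growing attention in the research community.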
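     However, current bioinformatic studies of transcriptional regulation still have several problems. A considerable share of the work centers on one specific physiological or pathological question: targeted wet-lab experiments are designed and the resulting data are then mined, so these methods generalize poorly. Other work at a more global level tends to introduce complex models without further extracting or mining the features and properties of the expression profile data themselves. Some studies analyze the sequence features of the elements of transcriptional regulatory relationships, but consider only known patterns or feature structures, which biases the data mining. Still others use regulatory relationship strengths derived from microarray or chromatin immunoprecipitation chip (ChIP-chip) data, but these carry large errors because of the limited sensitivity of the data themselves.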
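     To address these problems, this thesis studies methods for mining transcriptional regulatory relationships based on microarray gene expression profiles and sequence features, and obtains the following results: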
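     First, we construct a new parameter system for mining regulatory relationships from microarray expression profiles. Starting from the gene expression levels measured by microarrays, we introduce and propose several parameters and methods, including expression profile correlation, dynamic range of variation, and expression level vectors. These describe the similarity in expression level between the elements of a transcriptional regulatory relationship, the difference in their dynamic ranges, the difference in their statistical properties, and the degree of consistency of their expression levels across conditions, and are used to analyze transcriptional regulatory relationships. Functional co-annotation analysis of transcription factors and target genes is combined to measure the functional consistency of the regulatory elements and to improve the prediction accuracy of the model. On this basis, we use a Bayesian model to integrate several groups of parameters and obtain the probability that a transcriptional regulatory relationship exists. To strengthen the power and reliability of the predictions, we also propose a joint likelihood ratio to describe the properties of paired parameters. Using the time-delay characteristics of perturbations in time-series microarray data, we select suitable parameters to help determine the direction of a regulatory relationship. This yields complete transcriptional regulatory relationships and lays a foundation for accurately constructing gene regulatory networks.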
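The pairwise expression parameters named above can be illustrated with a small sketch. It is not the thesis's implementation: the function names, the choice of Pearson correlation, the max-minus-min definition of dynamic range, and the lag search are all assumptions made for illustration, and the profiles in the usage lines are made-up numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A flat profile has no defined correlation; report 0.0 in that case.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def dynamic_range(x):
    """Assumed definition: spread between the highest and lowest expression level."""
    return max(x) - min(x)


def best_lag(tf, target, max_lag=3):
    """Return (lag, r) for the shift at which the regulator profile best
    matches later values of the target profile. A positive best lag is a
    hint that changes in `tf` precede changes in `target`."""
    best = (0, pearson(tf, target))
    for lag in range(1, max_lag + 1):
        r = pearson(tf[:-lag], target[lag:])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


if __name__ == "__main__":
    # Hypothetical time series: the target repeats the regulator one step later.
    tf = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
    target = [1, 0, 1, 2, 3, 2, 1, 0, 1, 2]
    print("correlation:", pearson(tf, target))
    print("range difference:", dynamic_range(tf) - dynamic_range(target))
    print("best lag:", best_lag(tf, target))
```

Searching only non-negative lags encodes the assumption that the first argument is the candidate regulator; swapping the arguments tests the opposite direction.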
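     Second, we propose an unsupervised machine learning and optimization method for microarray expression profile features. Parametric learning does yield intuitive parameters that ease subsequent analysis. However, extracting information from high-dimensional microarray data by parameterization may lose information or introduce preconceived bias. In addition, the large amount of noise in microarray data adversely affects the mining of regulatory relationships. We therefore replace empirical parameter selection with unsupervised dimensionality reduction algorithms, which extract representative expression information and exclude interfering information. We define an expression pattern parameter for transcriptional regulatory pairs and extract the main features of expression levels by non-negative matrix factorization and principal component analysis, which improves the accuracy of regulatory relationship prediction.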
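As a minimal sketch of the non-negative matrix factorization step, the code below applies the standard multiplicative update rules of Lee and Seung to a small genes-by-conditions matrix. The matrix values, the rank, the iteration count, and all function names are illustrative assumptions; the abstract does not say how the factorization was configured.

```python
import random


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x conditions) into W (genes x rank)
    and H (rank x conditions) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Hypothetical expression matrix: 4 genes x 5 conditions.
    v = [[1, 2, 3, 4, 5],
         [2, 4, 6, 8, 10],
         [5, 4, 3, 2, 1],
         [6, 6, 6, 6, 6]]
    w, h = nmf(v, rank=2)
    print("reconstruction error:", frob_error(v, w, h))
```

Each row of H can be read as a basic expression pattern across conditions, and each row of W as the non-negative weights with which one gene combines those patterns.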
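     Third, we propose an unbiased method for extracting sequence features of regulatory elements. Because of the limitations of the microarray principle, regulatory relationships involving genes whose expression changes little across conditions or time points are difficult to obtain from microarray expression profiles. It is therefore necessary to examine the sequence features of the elements of transcriptional regulatory relationships. We use features of amino acid sequences together with mathematical dimensionality reduction algorithms to extract the sequence features of regulatory elements. Combining prior knowledge and training the model parameters by machine learning, we propose an unbiased method for finding the characteristic sequences of regulatory elements. We also use spatial vectors as the mathematical representation of the characteristic sequences and build a suitable model that links sequence features to the presence or absence of a regulatory relationship. The results show that mining regulatory relationships from sequence is feasible. Further analysis shows that different feature selection methods and clustering methods have little effect on the results, and that further improving the feature extraction method can yield better prediction accuracy. In short, a vector space model built from sequence information can predict the existence of transcriptional regulatory relationships fairly effectively. The method is both important and feasible, and it can complement and cross-check microarray-based methods.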
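To make the vector space idea concrete, the sketch below turns an amino acid sequence into a fixed-length vector and compares two such vectors. The abstract does not state which sequence features the thesis uses, so plain amino acid composition is an assumption chosen for simplicity, as are the function names and the toy peptide strings.

```python
import math

# The 20 standard amino acids, one-letter codes, in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """20-dimensional vector of amino acid frequencies in `seq`.
    Characters outside the standard alphabet are ignored."""
    counts = [0] * len(AMINO_ACIDS)
    for ch in seq.upper():
        idx = AMINO_ACIDS.find(ch)
        if idx >= 0:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(AMINO_ACIDS)


def pair_vector(tf_seq, target_seq):
    """Assumed pair representation: regulator and target vectors concatenated."""
    return composition(tf_seq) + composition(target_seq)


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


if __name__ == "__main__":
    # Toy peptide strings, not real protein sequences.
    known_pair = pair_vector("MKKLAEELLK", "MSTAVLENPG")
    candidate = pair_vector("MKRLAEDLLK", "MSSAVLDNPG")
    print("similarity to known pair:", cosine(known_pair, candidate))
```

Vectors of this kind could then be passed to a classifier or a clustering step; a richer feature set would slot into `composition` without changing the rest.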
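     Unlike other methods that construct gene regulatory networks by global computation over microarray expression profiles, this thesis searches for multiple parameters, supplemented by other biological knowledge, to mine the connection between the elements of a regulatory relationship and their expression profiles, and so builds a finer and more accurate gene regulatory network. Together with the unbiased extraction of sequence features of transcription factors and target genes, it develops a new method for predicting regulatory relationships from sequence features. The end result is a comprehensive set of methods that combines different data sources and multiple strategies for mining transcriptional regulatory relationships. These methods can, to some extent, avoid or reduce the shortcomings of existing methods and improve the sensitivity and coverage of regulatory relationship mining, thereby advancing the understanding of biochemical networks, represented by gene regulatory networks, and of biological systems as a whole. The parts of the thesis build on and support one another.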
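     The main innovations of this thesis are: the construction of a new parameter system for mining transcriptional regulatory relationships from microarray expression profiles; unsupervised machine learning and optimization of microarray expression profile features; and the unbiased extraction of sequence features of the elements of transcriptional regulatory relationships. These lines of work support and complement one another in predicting and mining regulatory relationships. As methodology, the research is also broadly applicable and extensible. Moreover, genetic testing for disease is increasingly a research focus, and at present microarrays are the analytical tool best suited to this field. The fast, parametric expression profile analysis system established here should therefore help genotype research and analysis using microarrays in clinical diagnosis.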
The properties of a biological system include its structure, dynamics, control methods, and design methods. Biological systems can be depicted as various biological networks, such as metabolic networks, signal transduction networks, and regulatory networks.
     As a basic process of biological activity, gene regulation plays a dominant role in biological systems. By analyzing gene regulation with experimental and bioinformatic methods, we can extract the structural features of a biological system. Gene regulatory network analysis also lets us identify complex regulatory relationships, uncover regulatory patterns in the cell, and gain a systematic view of biological processes.
     With the deepening development and broadening application of high-throughput techniques in life science research, microarray data are emerging massively and rapidly, which has made the reconstruction of gene regulatory networks a research hotspot.
     Many algorithms have been developed to construct gene regulatory networks from microarray data. Unfortunately, most of this work focuses on a specific biological or pathological problem and mines precise wet-lab experimental data. In addition, most models do not produce intuitive parameters. One remaining question is whether microarray data have simple but informative basic characteristics that are yet to be uncovered.
     To overcome these shortcomings, we integrated multiple parameters that characterize expression profile features and combined them with other biological evidence. We also extracted sequence features of regulatory elements without relying on prior knowledge. Combining these different lines of evidence, we developed a new approach to predicting regulatory relationships.
     Our research is based on the model organism Saccharomyces cerevisiae. The first step is to select features that measure expression profiles. We then extract sequence features of the regulatory elements. Finally, a comprehensive method is constructed to infer gene regulatory relationships, which expands our knowledge of biological systems.
     Based on expression correlation, expression level variation, and vectors derived from microarray datasets, we first introduced several novel parameters to describe the characteristics of regulating gene pairs. We then used a naïve Bayesian network model to integrate these features with the functional co-annotation between transcription factors and their target genes. This model proved more effective than the previous individual-feature models. With this model, and using the time-delay characteristics of time-series microarray datasets, we can predict the existence and the direction of a regulatory relationship and evaluate the accuracy and coverage of each. This helps to build an integrated prediction and evaluation system.
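The integration step can be sketched in its generic naïve Bayes form: each feature contributes a likelihood ratio, and under the assumption that features are conditionally independent, the posterior odds of a regulatory relationship are the prior odds multiplied by all the ratios. The counts, the prior, the smoothing scheme, and the function names below are hypothetical; the abstract does not give the thesis's actual estimates.

```python
def likelihood_ratio(pos_count, neg_count, pos_total, neg_total,
                     n_bins=2, pseudo=1.0):
    """Likelihood ratio of one feature bin:
    P(bin | regulatory pair) / P(bin | non-regulatory pair).
    Counts come from reference sets of known positive and negative pairs;
    a pseudocount per bin keeps the ratio finite for empty bins."""
    p_pos = (pos_count + pseudo) / (pos_total + pseudo * n_bins)
    p_neg = (neg_count + pseudo) / (neg_total + pseudo * n_bins)
    return p_pos / p_neg


def posterior_probability(prior, ratios):
    """Combine a prior probability with independent likelihood ratios:
    posterior odds = prior odds * product of ratios."""
    odds = prior / (1.0 - prior)
    for lr in ratios:
        odds *= lr
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical evidence for one transcription factor / target pair.
    ratios = [
        likelihood_ratio(60, 20, 200, 200),  # high expression correlation
        likelihood_ratio(45, 30, 200, 200),  # similar dynamic range
        likelihood_ratio(80, 10, 200, 200),  # shared functional annotation
    ]
    print("posterior:", posterior_probability(0.01, ratios))
```

Working in odds keeps the update multiplicative, so adding another source of evidence only means appending one more ratio to the list.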
     The parametric approach has both pros and cons. A set of parameters can serve as intuitive indexes. However, extracting information through parameters may lose information or mislead. In addition, the noise in microarray data may disturb the results. We therefore chose a machine learning approach instead of manual selection. We introduced an expression pattern index, FAB. With this index, we extracted the main features of expression levels and excluded interfering elements via principal component analysis. This approach proved able to improve the accuracy of regulatory relationship prediction.
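The principal component step can be sketched as follows, using power iteration on the sample covariance matrix to find the direction of greatest variance. The abstract does not define the FAB index, so the input rows here are simply hypothetical feature vectors, one per gene pair, and the function names are illustrative.

```python
import math


def center(rows):
    """Subtract the column means so every feature has mean zero."""
    m = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of the columns."""
    x = center(rows)
    n, m = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
             for j in range(m)] for i in range(m)]


def first_component(rows, iters=500):
    """Leading principal direction and its variance, by power iteration."""
    c = covariance(rows)
    m = len(c)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0:
            break  # no variance at all in the data
        v = [a / norm for a in w]
    variance = sum(v[i] * sum(c[i][j] * v[j] for j in range(m))
                   for i in range(m))
    return v, variance


def project(rows, v):
    """Score of each row along direction v (after centering)."""
    return [sum(a * b for a, b in zip(r, v)) for r in center(rows)]


if __name__ == "__main__":
    # Hypothetical feature vectors for four gene pairs.
    rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
    direction, variance = first_component(rows)
    print("direction:", direction, "variance:", variance)
    print("scores:", project(rows, direction))
```

Keeping only the scores along the leading components is what discards the low-variance directions, which is one way to suppress noise before prediction.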
     Because of expression diversity, not all essential genes can be detected by knock-out or knock-down experiments. In such cases, sequence feature analysis should be considered. We used a dimensionality reduction algorithm to extract sequence features of the regulatory elements. With the help of prior knowledge, we adopted a support vector machine-based method to find the sequence features of regulatory elements. The results show that mining regulatory relationships from sequence features is feasible. The accuracy is stable when the clustering method and the clustering features are changed. The parameters extracted by tensor analysis were also verified to be acceptable. This approach may be a suitable complement to the microarray-based approach.
     Unlike other methods that compute over global expression profiles, our approach is based mainly on several novel parameters that can serve as intuitive indicators. Combined with some prior knowledge, our approach can improve the accuracy of regulatory relationship mining. The results of regulatory element feature selection show the advantages of using sequence features to mine regulatory relationships.
     To summarize, we first proposed a novel parametric approach to inferring gene regulatory relationships from microarray datasets. We then used machine learning methods to extract expression features and mine regulatory relationships. Finally, we developed a new strategy for mining gene regulatory relationships based on sequence feature analysis, which can greatly improve the sensitivity and coverage of transcriptional regulatory relationship mining.
     With the development of microarray technology, our approaches promise to contribute further to regulatory network research as well as to genotype analysis in clinical diagnosis.
