Research on Problems in the Analysis of Tumor DNA Microarray Expression Data
Abstract
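With the progress of the "Tumor Genomic Project", DNA microarray technology has been widely applied in tumor research. Tumor microarrays supply large volumes of transcript-level gene expression data for tumor genomics; these data reflect how gene expression levels change across the growth and developmental stages and the physiological states of different tissue cells. The corresponding data analysis techniques make it possible to reveal the nature of tumors at the genomic level, offer a new, systematic approach to studying tumor-related genes, and have attracted considerable attention in clinical tumor diagnosis and treatment. To date, a number of genes associated with tumor initiation and progression have been identified, their functions and regulatory mechanisms are partly understood, and some related knowledge has accumulated. These results, however, fall far short of what is needed to map the tumor genome and conquer cancer. How to analyze tumor microarray expression data effectively, and how to use existing knowledge to assist that analysis so as to find tumor-related genes and determine their functions and regulatory mechanisms, has therefore become a pressing problem in tumor genomics. Against this background, this dissertation takes the analysis of tumor microarray expression data as its subject and studies three problems: preprocessing of tumor gene expression data, cluster analysis, and construction of gene expression regulatory networks. Its main contents and contributions are as follows: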
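     (1) Missing value estimation and normalization methods. The study of missing value estimation found that the similarity between gene expression profiles strongly affects estimation accuracy, and that the spatial distribution of the expression data of the complete genes used for estimation is a good basis for estimating missing values. This dissertation therefore proposes a missing value estimation method based on KNN-SVR (K-nearest Neighbor and Support Vector Regression). The method takes the subset of complete genes most similar to the target gene as the training set and builds a regression model with the SVR algorithm to estimate the missing values, which improves the accuracy and stability of the estimates. The study of classification-based diagnosis and subtype identification from tumor gene expression profiles found that analyzing data processed with current normalization methods introduces class bias and leads to misclassified samples. This dissertation therefore extends the normalization methods to use class information during normalization, making the expression data better suited to classification-based diagnosis and subtype identification from tumor expression profiles.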
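     (2) Clustering methods for time-series tumor microarray expression data. To address the asynchronous and local regulatory relationships that are common among genes, this dissertation takes cell-cycle gene expression data as its object of study, introduces the concept of the local maximum correlation coefficient, and uses it to define the correlation between genes. It then gives the rules that should be followed when setting the maximum time-delay range and the minimum sample length for local correlation in identifying asynchronous and local regulation. Finally, it modifies the K-means algorithm on the basis of the local maximum correlation coefficient and proposes a clustering method based on that coefficient. The core of the method is the local maximum correlation coefficient, which identifies local and asynchronous correlations between expression profiles without destroying their overall correlation, and so provides a more effective similarity measure for clustering functionally similar genes and co-regulated genes.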
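     (3) Clustering methods for non-time-series tumor microarray expression data. To remove noise from non-time-series expression data and identify weakly differentially expressed genes, this dissertation proposes a denoising CICA (Constrained Independent Component Analysis) model and uses it to cluster non-time-series tumor gene expression data. The clustering method based on the denoising CICA model has two parts. First, with the Ljung-Box Q statistic as a constraint on "whiteness" and maximal Gaussianity as the objective, a Gaussian white noise component is extracted and used to denoise the expression data. Second, CICA is applied to cluster the denoised expression data: with the expression levels of the genes under study as the constraint and maximal non-Gaussianity as the objective, the related biological processes or functional classes are separated out. The method preserves the detail in gene expression data well while denoising it, and improves the ability to identify weakly differentially expressed genes.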
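     (4) Methods for constructing gene expression regulatory networks. To capture the multiple time delays of gene expression regulation, this dissertation first builds an N-order dynamic Bayesian network model. Then, because a satisfactory regulatory network cannot be obtained from gene expression data alone, it proposes, on the basis of the N-order dynamic Bayesian network, a method for constructing multi-time-delay gene regulatory networks that incorporates multi-source prior information. According to the characteristics of each prior source, the method converts it into a prior probability over network structures with a corresponding distribution, combines these priors with time-series microarray expression data, and learns the structure of the N-order dynamic Bayesian network by Markov Chain Monte Carlo (MCMC). On the assumption that the expression data and the prior information are mutually independent, the method also factorizes the computation of the structure acceptance probability during MCMC learning, which flexibly fuses the gene expression data with the multi-source prior information so that both contribute to learning the regulatory network. The method not only identifies multi-time-delay regulatory relationships between genes well, but also reduces the influence of noise in the data.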
With the development of the Tumor Genomic Project, DNA microarrays have become widely used in tumor research. Tumor DNA microarrays provide a large amount of gene expression data for tumor genomic research, reflecting fluctuations in gene expression level across the developmental stages and physiological states of different tissue cells. Because it can uncover the nature of tumors at the genomic level and offers a new, systematic method of study, the analysis of tumor gene expression data has received great attention. Researchers have so far confirmed some tumor genes and accumulated some knowledge about oncogenesis and the regulatory mechanisms of tumor genes, but these achievements are far from sufficient to understand and cure tumors. How to analyze tumor gene expression data effectively has thus become a problem that urgently needs to be solved. Taking tumor DNA microarray expression data analysis as its research topic, this dissertation studies the relevant preprocessing techniques, cluster analysis algorithms, and gene regulatory network modeling methods. Its main contents and contributions are summarized as follows:
     (1) Methods for missing value estimation and normalization of gene expression data. For missing value estimation, we found that the similarity between gene expression profiles influences estimation precision, and that the spatial distribution of the expression data of genes without missing values is a useful reference for estimating missing values. This dissertation therefore presents a new missing value estimation method based on K-nearest Neighbor and Support Vector Regression (KNN-SVR). The algorithm takes as its training set the genes that have no missing values and are highly similar to the gene whose missing values are to be estimated, and builds regression models through SVR to estimate the missing values, which improves accuracy and stability. In the classification and class discovery of tumor gene expression data, current normalization methods are likely to cause samples to be classified incorrectly. This dissertation therefore extends the normalization methods to use class information when normalizing gene expression data, which makes the data more suitable for classification and class discovery of tumor gene expression data.
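The KNN-SVR idea in (1) can be sketched roughly as follows. This is an illustration under stated assumptions, not the dissertation's implementation: the abstract does not give the similarity measure, kernel, or solver, so the sketch assumes Euclidean distance over the target gene's observed columns, a linear epsilon-insensitive SVR trained by plain subgradient descent, and arbitrary hyperparameters. The name `knn_svr_impute`, the use of `None` for a missing entry, and the toy matrix `expr` are all hypothetical.

```python
import math

def knn_svr_impute(matrix, target_row, k=4, eps=0.1, lam=1e-3, lr=0.005, epochs=4000):
    """Estimate the single missing entry (None) of matrix[target_row].

    Rows are genes, columns are samples. Assumed, not from the source:
    Euclidean similarity, linear kernel, subgradient solver, hyperparameters.
    """
    target = matrix[target_row]
    missing = [j for j, v in enumerate(target) if v is None]
    if len(missing) != 1:
        raise ValueError("this sketch handles exactly one missing value")
    observed = [j for j, v in enumerate(target) if v is not None]

    # Step 1 (KNN): keep the k complete genes closest to the target gene.
    complete = [g for i, g in enumerate(matrix)
                if i != target_row and None not in g]

    def distance(gene):
        return math.sqrt(sum((gene[j] - target[j]) ** 2 for j in observed))

    neighbours = sorted(complete, key=distance)[:k]

    # Step 2 (SVR): each neighbour gene is one training example.
    # Features: its values in the target's observed columns.
    # Label: its value in the column where the target is missing.
    X = [[g[j] for j in observed] for g in neighbours]
    y = [g[missing[0]] for g in neighbours]
    w, b = [0.0] * len(observed), 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            residual = yi - (sum(wi * v for wi, v in zip(w, xi)) + b)
            if abs(residual) <= eps:
                continue  # inside the epsilon tube: no loss, no gradient
            sign = -1.0 if residual > eps else 1.0
            for d, v in enumerate(xi):
                grad_w[d] += sign * v / len(X)
            grad_b += sign / len(X)
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b

    # Step 3: predict the missing entry from the target's observed values.
    x_target = [target[j] for j in observed]
    return sum(wi * v for wi, v in zip(w, x_target)) + b, neighbours

# Toy data: gene 0 is missing its third value; the last two genes are dissimilar.
expr = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.2],
    [0.9, 1.9, 2.8],
    [1.2, 1.8, 3.0],
    [0.8, 2.2, 3.0],
    [5.0, 5.0, 10.0],
    [-3.0, -4.0, -7.0],
]
estimate, neighbours = knn_svr_impute(expr, target_row=0)
print(round(estimate, 2), len(neighbours))
```

Restricting the training set to the nearest complete genes is the point of the KNN step: the two dissimilar genes in the toy matrix never reach the regression. A production version would more likely use an established SVR implementation with a tuned kernel.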
     (2) Methods for gene cluster analysis of tumor time-series microarray data. To identify asynchronous or local correlation in expression profiles, this dissertation presents the concept of the Local Maximum Correlation Coefficient (LMCC) and uses it to define the correlation between genes. The rules for setting the maximum time delay and the minimum local time segment are then studied. Finally, this dissertation presents a new clustering method that uses LMCC as the similarity measure in the K-means method, with corresponding modifications. The method identifies asynchronous or local correlation well, and LMCC provides a more effective similarity measure.
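The abstract does not give the formal definition of LMCC, so the sketch below shows only one plausible reading: the largest Pearson correlation found over all time shifts up to a maximum delay and all contiguous windows of at least a minimum length. The function names, the signed (rather than absolute) maximum, and the exhaustive search are assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def local_max_corr(x, y, max_delay, min_len):
    """Return (r, delay, start, length) for the best-correlated local, shifted window.

    A delay d pairs x[t] with y[t + d]; the window covers x[start:start + length].
    """
    n = len(x)
    best = (-2.0, 0, 0, 0)
    for d in range(-max_delay, max_delay + 1):
        lo, hi = max(0, -d), min(n, n - d)  # indices t where both x[t] and y[t + d] exist
        for start in range(lo, hi - min_len + 1):
            for end in range(start + min_len, hi + 1):
                r = pearson(x[start:end], y[start + d:end + d])
                if r > best[0]:
                    best = (r, d, start, end - start)
    return best

# Toy profiles: y repeats x two time points later, so the global correlation
# at delay 0 understates how related the two genes are.
x = [0.2, 1.7, 3.1, 2.4, 0.9, -1.3, -2.2, -0.6, 1.1, 2.8, 3.0, 1.4]
y = [0.5, -0.4] + x[:-2]
r, delay, start, length = local_max_corr(x, y, max_delay=3, min_len=6)
print(delay, length)
```

The two parameters correspond to the quantities the paragraph says must be chosen with care: a larger `max_delay` or a smaller `min_len` widens the search and makes spuriously high correlations more likely. To use such a score inside K-means it would still need to be turned into a dissimilarity, for example `1 - r`, which is again an assumption.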
     (3) Methods for gene cluster analysis of tumor non-time-series microarray data. To eliminate noise and identify weakly differentially expressed genes in microarray data, this dissertation presents a denoising Constrained Independent Component Analysis model (deCICA) and uses it to cluster tumor non-time-series microarray data. The clustering method based on the deCICA model has two parts. First, it extracts a Gaussian white noise component to remove the noise in the gene expression data, using the Ljung-Box Q statistic as the constraint on the 'white' character and Gaussianity maximization as the objective. Second, it uses the CICA model to cluster the denoised gene expression data, using the expression data of target genes as the constraint and non-Gaussianity maximization as the objective to separate the related biological processes or functional clusters. Because it removes part of the noise while retaining the detailed information in the expression data, the method can effectively identify weakly differentially expressed genes.
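The whiteness constraint in the first step relies on the Ljung-Box Q statistic, which in its standard form is Q = n(n+2) Σ_{k=1..h} ρ_k² / (n−k), where ρ_k is the sample autocorrelation at lag k. The sketch below computes only that statistic; how it is embedded as a constraint in the deCICA optimization is not described in the abstract and is not attempted here. The function names are illustrative.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((v - mean) ** 2 for v in series)
    ck = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return ck / c0

def ljung_box_q(series, max_lag):
    """Ljung-Box Q over lags 1..max_lag; small values are consistent with white noise."""
    n = len(series)
    total = sum(autocorrelation(series, k) ** 2 / (n - k)
                for k in range(1, max_lag + 1))
    return n * (n + 2) * total

# A strictly alternating signal is as far from "white" as possible at lag 1.
alternating = [1.0, -1.0] * 5
print(ljung_box_q(alternating, max_lag=1))
```

Under the null hypothesis of white noise, Q is approximately chi-squared distributed with `max_lag` degrees of freedom, so a component with a small Q passes the whiteness check while a strongly autocorrelated one, like the alternating series above, does not.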
     (4) Methods for constructing gene regulatory networks. This dissertation first builds the N-order Dynamic Bayesian Network (N-DBN) to model multiple time delays in gene regulation, and then presents a new method, N-DBN-MP, for constructing multi-time-delay gene regulatory networks with N-DBN by combining expression data with multiple independent sources of prior knowledge. To combine them with time-series microarray data, the method transforms the multiple independent sources of prior knowledge into different prior probability distributions according to their characteristics, and uses a Markov Chain Monte Carlo (MCMC) algorithm to learn the network structure of the N-DBN. During MCMC learning, the acceptance probability of a network structure is decomposed under the hypothesis that the microarray data are independent of the prior knowledge, which achieves the fusion of microarray data and prior knowledge. N-DBN-MP not only identifies the regulatory relationships between genes effectively, but also reduces the effect of noise in the microarray data.
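The decomposition of the acceptance probability can be illustrated with the generic Metropolis-Hastings rule for moving from a structure G to a proposed structure G'. If the data and the prior sources are treated as independent, the acceptance ratio factorizes into a likelihood ratio, one prior ratio per source, and a proposal (Hastings) ratio, which become sums in log space. The exact form used in N-DBN-MP is not given in the abstract, so the sketch below is the textbook rule under that independence assumption; the function name and the numbers in the example are hypothetical.

```python
import math

def acceptance_probability(loglik_new, loglik_old,
                           logpriors_new, logpriors_old,
                           log_hastings=0.0):
    """Metropolis-Hastings acceptance probability for a proposed network structure.

    loglik_*      log P(data | structure)
    logpriors_*   one log prior per knowledge source, in the same order for both
                  structures; the sources are assumed mutually independent
    log_hastings  log of Q(old | new) / Q(new | old), 0.0 for a symmetric proposal
    """
    if len(logpriors_new) != len(logpriors_old):
        raise ValueError("both structures need a prior term for every source")
    log_ratio = (
        (loglik_new - loglik_old)                                      # data term
        + sum(n - o for n, o in zip(logpriors_new, logpriors_old))     # prior terms
        + log_hastings                                                 # proposal term
    )
    return math.exp(min(0.0, log_ratio))

# Hypothetical numbers: the proposed structure fits the data twice as well,
# but each of two prior sources finds it half as plausible.
p_accept = acceptance_probability(
    loglik_new=-10.0 + math.log(2.0),
    loglik_old=-10.0,
    logpriors_new=[math.log(0.2), math.log(0.1)],
    logpriors_old=[math.log(0.4), math.log(0.2)],
)
print(round(p_accept, 3))
```

Because each source enters as its own additive term, a further source of prior knowledge can be added or dropped without touching the likelihood computation, which is one way to read the "flexible fusion" the paragraph describes.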
