Statistical Methods for Detecting Differential Gene Expression in Sample Subsets
Abstract
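Genes in living cells are normally expressed in a definite order. Under some circumstances, however, changes in environmental conditions and other factors can cause gene mutations that lead to abnormal phenotypic changes, which is referred to as differential gene expression (DGE). Statistical methods for detecting DGE in microarray data are among the frontier biological techniques that have developed rapidly in recent years. Their main purpose is to analyze the biological significance of gene expression profile data, using microarray technology to test simultaneously, rapidly, and accurately whether thousands of genes are differentially expressed.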
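     Microarray-based DGE detection studies gene expression profile data at the single-gene level. It uses statistical hypothesis testing to screen the expression profiles for potentially over-expressed cancer samples, and studies the related genes and gene groups in order to find cancer-specific genes. DGE detection has wide applications, such as studying the molecular mechanisms of drug action, finding drug targets as starting points for new drug development, screening multi-target high-throughput drugs, and evaluating drug activity and toxicity. It is of great significance for revealing the mechanisms of cancer and for developing anticancer drugs.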
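     The core methods of microarray DGE detection are usually based on statistical principles: hypothesis testing is used to screen gene expression profiles for potentially specific genes. Traditional DGE detection presupposes that expression in the entire cancer sample group is elevated relative to the normal sample group. In 2005, Tomlins et al. pointed out in Science that differential expression may occur only in a subset of the cancer sample group rather than in the whole group. In recent years a large body of work has addressed DGE in subsets of the cancer sample group, producing a variety of statistical methods for this problem.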
     This dissertation addresses DGE detection for subsets of the cancer sample group. The main work is as follows:
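     1) A comparative study was made of six widely used DGE detection methods: the T-statistic, PPST, COPA, OS, ORT, and MOST. The T-statistic is the traditional method; it assumes that the cancer group as a whole is over-expressed relative to the normal group, and computes the T value from the means of the normal and cancer groups and their pooled standard deviation. PPST detects differential genes by identifying genes whose expression intensity in the statistically meaningful samples of group A exceeds that in the statistically meaningful samples of group B by a specified percentage. COPA, OS, ORT, and MOST all transform the sample expression values using the sample median and median absolute deviation. OS builds on COPA by using the interquartile range, which measures the dispersion of the data. ORT differs from OS in that OS uses the data from the normal and cancer samples together, whereas ORT is defined relative to the normal samples only. MOST implicitly considers all possible values of the cutoff for differential expression intensity and determines the threshold by maximizing its statistic, thereby detecting differentially expressed genes. The six methods were compared and analyzed through simulation experiments and experiments on real data.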
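     2) Two statistics were proposed for DGE detection: the tri-mean and the tri-mean absolute deviation (tri-MAD). When microarray data contain differentially expressed values, the mean is easily affected by them, whereas the median is more robust and less affected. The tri-mean combines the upper quartile, the lower quartile, and the median, and is strongly resistant to abnormal data. When both full use of the sample information and robustness are required, the sample tri-mean and tri-MAD can describe the variation of the data without ignoring information far from the median, and so represent the sample comprehensively and stably.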
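     3) A DGE detection method for subsets of the cancer sample group was proposed. The TriORT method builds on ORT and uses the tri-mean and tri-MAD as the statistics for transforming the data. TriORT uses the median and a small number of other order statistics, so it reflects the characteristics of the sample data in the microarray well and is robust. Expression values are added according to a heuristic rule, and the interquartile range is used to identify abnormal differential values in the microarray data, from which differential expression is detected. Experimental results show that the proposed method based on the tri-mean and tri-MAD is effective for detecting genes that are over-expressed in a subset of cancer samples relative to normal samples, with good sensitivity and specificity.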
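     4) A second DGE detection method for subsets of the cancer sample group was proposed. The TriMOST method is a tri-mean-based method built on MOST. It introduces the tri-mean into the detection of genes over-expressed in a subset of cancer samples relative to normal samples, transforming the sample expression values with the tri-mean and tri-MAD. When the number of active differential genes is unknown, the mean and variance are also introduced, and the standardized, algebraic difference between expression values is used as the criterion for a differential gene. The method considers the possible thresholds as comprehensively as possible, taking each possible value as a candidate differential expression threshold, which yields good detection performance.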
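     5) The application of the improved methods and the existing DGE detection methods to real breast cancer data was discussed and analyzed. To further study the performance of the proposed methods, the simulation results of the improved methods were first compared with those of the existing methods. The improved methods were then applied to the real breast cancer data set of West (2001), the results were verified against the NCBI data repository, and the performance of the methods was compared on the basis of this verification. Detecting differentially expressed genes in breast cancer and identifying the corresponding gene groups provides useful auxiliary information for the treatment of breast cancer.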
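     In summary, for the case in which a subset of the cancer sample group is over-expressed relative to the normal sample group, this dissertation proposes two improved DGE detection methods. Simulation experiments show that the proposed methods have good sensitivity and specificity and detect better than the existing methods. In addition, the proposed and existing methods were applied to a real breast cancer data set and the results were verified. The experimental analysis shows that, for microarray DGE detection in which a subset of cancer samples is over-expressed relative to normal samples, the methods based on the tri-mean and tri-MAD reflect the characteristics of the microarray data and have good stability.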
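The pooled-variance T statistic described in item 1) can be sketched as follows. This is a minimal illustration for a single gene, not the dissertation's code; the function name `pooled_t_statistic` and the toy expression values are assumptions made for the example. It shows why a statistic built on group means responds strongly when the whole cancer group is shifted and only weakly when a single cancer sample is over-expressed.

```python
import math
import statistics


def pooled_t_statistic(normal, cancer):
    """Two-sample T statistic for one gene, using a pooled standard deviation."""
    n1, n2 = len(normal), len(cancer)
    mean1, mean2 = statistics.fmean(normal), statistics.fmean(cancer)
    var1, var2 = statistics.variance(normal), statistics.variance(cancer)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean2 - mean1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Toy data (hypothetical): one gene measured in five normal samples.
normal = [1.0, 2.0, 3.0, 4.0, 5.0]
# Every cancer sample is shifted upward by the same amount.
cancer_all_shifted = [x + 3.0 for x in normal]
# Only one cancer sample is strongly over-expressed.
cancer_subset_shifted = [1.0, 2.0, 3.0, 4.0, 11.0]

t_all = pooled_t_statistic(normal, cancer_all_shifted)
t_subset = pooled_t_statistic(normal, cancer_subset_shifted)
print(t_all, t_subset)
```

In the second case the single large value inflates the pooled variance as well as the mean, which is one reason outlier-oriented statistics are used for subset over-expression.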
Normally, genes in a cell are expressed according to specific temporal and spatial sequences. However, under the influence of environmental conditions or other factors, genes may mutate, causing abnormal changes of phenotype, which is called differential gene expression (DGE). Microarray technology is a cutting-edge technique that mainly serves to analyze the biological significance of gene expression profiles. With microarray technology, it is possible to detect DGE in thousands of genes simultaneously, rapidly, and accurately.
     DGE detection methods study gene expression profiles at the single-gene level through replicate experiments and identify potentially over-expressed cancer samples by statistical hypothesis testing. The detected genes can help identify cancer-related genes and gene clusters. DGE detection can be applied in many areas, such as studying the molecular mechanisms of drug action, finding new drug targets, high-throughput drug screening, and evaluating drug activity and toxicity. It is of great significance for revealing the mechanisms of cancer and developing anticancer drugs.
     The core algorithms of DGE detection in microarray data are normally based on statistics: potential DGE genes are screened out through hypothesis testing. Traditional DGE detection assumes that the entire cancer group is over-expressed compared with the normal group. However, in 2005, Tomlins et al. pointed out in Science that DGE might exist only in a subgroup of the cancer group rather than in the entire cancer group. In recent years, great effort has been devoted to DGE detection for over-expressed cancer subgroups, and various statistical methods have been proposed based on the assumption of Tomlins et al.
     This dissertation focuses on DGE detection under the assumption of an over-expressed cancer subgroup. Its main content includes:
     1) A comparison study was carried out on six popular DGE detection methods: T-statistic, PPST, COPA, OS, ORT, and MOST. T-statistic is a traditional detection method, which assumes that the entire cancer group is over-expressed compared with the normal group, and is calculated from the means and pooled standard deviation of the normal and cancer groups. PPST compares the expression levels of genes between the case group (A) and the control group (B), and targets genes whose difference exceeds a certain percentile. COPA is based on the median and median absolute deviation. Building on COPA, OS introduces the interquartile range to measure data dispersion. ORT is similar to OS in that both use quartiles to set the threshold. The difference is that, when calculating the quartiles, OS uses both the cancer group and the healthy group, whereas ORT uses only the healthy group. MOST considers all possible critical values of gene expression, and defines the detection threshold using the maximum value of its statistic. These six methods were tested and analyzed through a simulation study and real data experiments.
     2) Two statistics, tri-mean and tri-MAD, were proposed for DGE detection. When DGE exists in microarray data, the mean of the gene expression values is easily affected by the differentially expressed values, while the median is more robust and better resists such interference. Tri-mean combines the information of the upper quartile, lower quartile, and median, and can therefore describe the sample more comprehensively and stably, without neglecting data points distant from the group median.
     3) A new DGE detection method, TriORT, was proposed for over-expressed cancer subgroups. TriORT is based on ORT and defines DGE using tri-mean and tri-MAD. In addition, the median and a few other order statistics are used to represent the characteristics of microarray data more fully and more robustly. The threshold is decided according to the quartiles, with a heuristic rule supplying additional expression values. Experimental results indicate that the proposed method is effective for over-expressed cancer subgroups and has good sensitivity and specificity.
     4) A novel DGE detection method, TriMOST, was proposed for over-expressed cancer subgroups. TriMOST is based on MOST and introduces tri-mean and tri-MAD into the definition of the DGE value for a cancer subgroup that is over-expressed compared with the normal group. When the number of active DGE genes is unknown, mean and MAD values are used to give a more thorough search of all possible thresholds for screening out DGE genes. Experimental results indicate that the proposed method performs very promisingly in both the simulation study and the real data experiments.
     5) The proposed methods were compared with the six existing methods. We first carried out a simulation study to test all the discussed methods on simulated data. Then all the discussed methods were applied to the real data set provided by West. The results on the real data were checked against NCBI to verify the detected cancer genes. All eight methods were compared and analyzed based on their experimental results. Through DGE detection, we can obtain further knowledge of cancer-relevant gene groups, which can provide useful information for the treatment of breast cancer.
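The two statistics named in item 2) can be illustrated with a short sketch. The tri-mean below is Tukey's standard definition, (Q1 + 2 × median + Q3) / 4. The abstract does not give a formula for tri-MAD, so `trimean_abs_deviation` is an assumption made by analogy with the median absolute deviation (the median of absolute deviations from the tri-mean); the function names and data are hypothetical.

```python
import statistics


def trimean(values):
    """Tukey's tri-mean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


def trimean_abs_deviation(values):
    """ASSUMED definition of tri-MAD: median of |x - tri-mean|."""
    center = trimean(values)
    return statistics.median(abs(v - center) for v in values)


# Toy expression values (hypothetical) without and with one extreme value.
clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 90.0]

print(statistics.fmean(clean), statistics.fmean(with_outlier))  # mean moves a lot
print(trimean(clean), trimean(with_outlier))                    # tri-mean does not
print(trimean_abs_deviation(with_outlier))
```

The quartile convention (`method="inclusive"`) is also a choice made for this example; other conventions give slightly different quartiles on small samples.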
     In summary, we analyzed six methods for DGE detection and proposed two novel DGE detection methods based on the assumption of an over-expressed cancer subgroup. Through a simulation study and real data experiments, the proposed methods were shown to have good sensitivity and specificity, as well as better detection performance than the other six methods. Based on the experimental results, it can be concluded that, for over-expressed cancer subgroups in microarray data, detection methods based on tri-mean and tri-MAD reflect the microarray data more comprehensively and stably, which leads to better detection performance.
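To make the idea of a tri-mean-based outlier statistic concrete, the sketch below scores one gene by summing standardized cancer values that exceed a quartile-based cutoff computed from the normal group only, in the spirit of the ORT description above. It is a simplified illustration under stated assumptions, not the TriORT or ORT definition from the dissertation or the literature: the function `outlier_score`, the cutoff Q3 + IQR, the tri-MAD formula, and the data are all hypothetical choices.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the inclusive quantile convention."""
    return statistics.quantiles(values, n=4, method="inclusive")


def trimean(values):
    """Tukey's tri-mean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = quartiles(values)
    return (q1 + 2 * q2 + q3) / 4


def outlier_score(normal, cancer):
    """Hypothetical ORT-style score for one gene with tri-mean-based scaling."""
    center = trimean(normal)
    # ASSUMED tri-MAD: median absolute deviation taken around the tri-mean.
    scale = statistics.median(abs(v - center) for v in normal)
    if scale == 0:
        raise ValueError("normal group has zero spread; cannot standardize")
    q1, _, q3 = quartiles(normal)
    cutoff = q3 + (q3 - q1)  # Q3 + IQR, computed from the normal group only
    # Sum the standardized values of cancer samples above the cutoff.
    return sum((v - center) / scale for v in cancer if v > cutoff)


normal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
# Two of eight cancer samples are strongly over-expressed.
cancer_subset = [2.0, 4.0, 5.0, 6.0, 8.0, 5.0, 20.0, 22.0]
# No cancer sample is over-expressed.
cancer_null = [2.0, 4.0, 5.0, 6.0, 8.0, 5.0, 3.0, 7.0]

print(outlier_score(normal, cancer_subset))
print(outlier_score(normal, cancer_null))
```

Because only samples above the cutoff contribute, the score is driven by the over-expressed subset rather than by the group mean, which is the property the subset-oriented methods rely on.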