Preferred analysis methods for Affymetrix GeneChips. II. An expanded, balanced, wholly-defined spike-in dataset

Authors: Qianqian Zhu (1) (2) (7)
Jeffrey C Miecznikowski (2) (5)
Marc S Halfon (1) (3) (4) (6)
Journal: BMC Bioinformatics
Year: 2010
Publication date: December 2010
Volume: 11
Issue: 1
Full-text size: 5716 KB
Author affiliations:

1. Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY, 14214, USA
2. Department of Biostatistics, State University of New York at Buffalo, Buffalo, NY, 14214, USA
3. Department of Biology, State University of New York at Buffalo, Buffalo, NY, 14260, USA
4. New York State Center of Excellence in Bioinformatics and the Life Sciences, Buffalo, NY, 14203, USA
5. Department of Biostatistics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA
6. Department of Molecular and Cellular Biology, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA
7. Current address: Center for Human Genome Variation, Duke University, Durham, NC, 27708, USA
ISSN: 1471-2105

Abstract

Background: Concomitant with the rise in the popularity of DNA microarrays has been a surge of proposed methods for the analysis of microarray data. Fully controlled "spike-in" datasets are an invaluable but rare tool for assessing the performance of various methods.

Results: We generated a new wholly defined Affymetrix spike-in dataset consisting of 18 microarrays. Over 5700 RNAs are spiked in at relative concentrations ranging from 1- to 4-fold, and the arrays from each condition are balanced with respect to both total RNA amount and degree of positive versus negative fold change. We use this new "Platinum Spike" dataset to evaluate microarray analysis routes and contrast the results to those achieved using our earlier Golden Spike dataset.

Conclusions: We present updated best-route methods for Affymetrix GeneChip analysis and demonstrate that the degree of "imbalance" in gene expression has a significant effect on the performance of these methods.
