Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis
  • Authors: Arran K Turnbull (1)
    Robert R Kitchen (1) (2)
    Alexey A Larionov (1)
    Lorna Renshaw (1)
    J Michael Dixon (2)
    Andrew H Sims (1)
  • Journal: BMC Medical Genomics
  • Publication year: 2012
  • Publication date: December 2012
  • Volume: 5
  • Issue: 1
  • Full-text size: 1437 KB
  • Pre-publication history: http://www.biomedcentral.com/1755-8794/5/35/prepub
  • Author affiliations:
    1. Breakthrough Research Unit, University of Edinburgh, Crewe Road South, Edinburgh, EH4 2XR, UK
    2. Department of Molecular Biophysics & Biochemistry and Department of Psychiatry, Yale University School of Medicine, 266 Whitney Ave, New Haven, CT, 06511, USA
  • ISSN:1755-8794
Abstract
Background: Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single-channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Unfortunately, many of these studies are underpowered, and it is desirable to improve power by combining data from more than one study. We sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis.

Results: Using Affymetrix and Illumina data from the Microarray Quality Control (MAQC) project, from our own clinical samples, and from additional publicly available datasets, we evaluated several approaches to directly integrate intensity-level expression data from the two platforms. After mapping probe sequences to Ensembl genes, we demonstrate that ComBat and cross-platform normalisation (XPN) significantly outperform mean-centering and distance-weighted discrimination (DWD) in terms of minimising inter-platform variance. In particular, we observed that DWD, a popular method used in a number of previous studies, removed systematic bias at the expense of genuine biological variability, potentially removing legitimate biological differences from integrated datasets.

Conclusion: Normalised and batch-corrected intensity-level data from Affymetrix and Illumina microarrays can be directly combined to generate biologically meaningful results with improved statistical power for robust, integrated reanalysis.
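To make the idea of platform-level batch correction concrete, here is a minimal sketch of per-gene mean-centering, the simplest of the four methods named in the abstract. It is not the authors' pipeline: the function names, the gene label `gene_A`, and the intensity values are hypothetical toy choices, and the sketch assumes log-scale intensities already mapped to a common gene identifier. The abstract reports that ComBat and XPN outperformed this baseline.

```python
from statistics import mean


def mean_center_by_batch(expr, batches):
    """Subtract each gene's per-batch mean from its values.

    expr    -- dict mapping gene id -> list of log-intensities, one per sample
    batches -- list of batch (platform) labels, one per sample, same order
    """
    centered = {}
    for gene, values in expr.items():
        # Mean of this gene within each platform.
        batch_means = {
            b: mean(v for v, lab in zip(values, batches) if lab == b)
            for b in set(batches)
        }
        centered[gene] = [v - batch_means[lab] for v, lab in zip(values, batches)]
    return centered


def platform_gap(values, batches, a, b):
    """Difference between the mean of batch a and the mean of batch b."""
    mean_a = mean(v for v, lab in zip(values, batches) if lab == a)
    mean_b = mean(v for v, lab in zip(values, batches) if lab == b)
    return mean_a - mean_b


# Hypothetical toy data: one gene, two samples per platform.
batches = ["affy", "affy", "illumina", "illumina"]
expr = {"gene_A": [8.0, 10.0, 11.0, 13.0]}

corrected = mean_center_by_batch(expr, batches)
gap_before = platform_gap(expr["gene_A"], batches, "affy", "illumina")
gap_after = platform_gap(corrected["gene_A"], batches, "affy", "illumina")
print(corrected["gene_A"], gap_before, gap_after)
```

Mean-centering removes the platform offset only if both platforms contain a similar mix of biological groups. When the mix differs, the per-batch mean also absorbs genuine biological signal, which is one reason model-based methods may be preferred.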
