Workflows for microarray data processing in the Kepler environment
  • Authors: Thomas Stropp (1)
    Timothy McPhillips (2)
    Bertram Ludäscher (2)
    Mark Bieda (1)
  • Journal: BMC Bioinformatics
  • Publication year: 2012
  • Publication date: December 2012
  • Volume: 13
  • Issue: 1
  • Full-text size: 908 KB
  • Author affiliations: Thomas Stropp (1)
    Timothy McPhillips (2)
    Bertram Ludäscher (2)
    Mark Bieda (1)

    1. Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada
    2. Genome Center, University of California-Davis, Davis, CA, USA
  • ISSN:1471-2105
Abstract
Background: Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, which typically makes them difficult to modify, extend, and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for the design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services.

Results: We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects, and they are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and are therefore close to traditional shell scripting or R/Bioconductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems.

Conclusions: We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering a clear graphical display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
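
As an illustration of the kind of local GFF file processing step the abstract mentions, the following is a minimal Python sketch. It is not taken from the article or from Kepler: the names GffRecord, parse_gff, and filter_by_score are hypothetical, the sample values are invented for demonstration, and the only assumption about the input is the standard tab-separated GFF layout (seqname, source, feature, start, end, score, strand, frame, and an optional attributes column).

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class GffRecord(NamedTuple):
    """One feature line of a GFF file (hypothetical helper type)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]  # None when the file gives "." (no score)
    strand: str
    frame: str
    attributes: str


def parse_gff(lines: Iterable[str]) -> Iterator[GffRecord]:
    """Yield a GffRecord for every feature line, skipping comments and blanks."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError(f"expected at least 8 tab-separated fields: {line!r}")
        score = None if fields[5] == "." else float(fields[5])
        # The ninth (attributes) column is optional in GFF.
        attributes = fields[8] if len(fields) > 8 else ""
        yield GffRecord(fields[0], fields[1], fields[2],
                        int(fields[3]), int(fields[4]),
                        score, fields[6], fields[7], attributes)


def filter_by_score(records: Iterable[GffRecord],
                    min_score: float) -> Iterator[GffRecord]:
    """Keep only records whose score is present and at least min_score."""
    for rec in records:
        if rec.score is not None and rec.score >= min_score:
            yield rec


if __name__ == "__main__":
    # Invented sample lines, for demonstration only.
    sample = [
        "##gff-version 2",
        "chr1\texample_source\tprobe\t100\t150\t0.42\t.\t.\tid=P1",
        "chr1\texample_source\tprobe\t200\t250\t2.10\t.\t.\tid=P2",
    ]
    for rec in filter_by_score(parse_gff(sample), min_score=1.0):
        print(rec.seqname, rec.start, rec.end, rec.score)
```

In a workflow system, a step like this would typically be wrapped as a single component with the score threshold exposed as a visible parameter, which is the kind of parameter display the abstract describes.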
