Introducing W.A.T.E.R.S.: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences
  • Authors: Amber L Hartman (1) (3)
    Sean Riddle (2)
    Timothy McPhillips (2)
    Bertram Ludäscher (2)
    Jonathan A Eisen (1)
  • Journal: BMC Bioinformatics
  • Publication date: December 2010
  • Volume: 11
  • Issue: 1
  • Full-text size: 1342 KB
  • Author affiliations:
    1. Department of Medical Microbiology and Immunology and the Department of Evolution and Ecology, Genome Center, University of California Davis, One Shields Avenue, Davis, CA, 95616, USA
    2. Department of Computer Science, Genome Center, University of California Davis, One Shields Avenue, Davis, CA, 95616, USA
    3. Department of Biology, The Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
  • ISSN: 1471-2105
Abstract

Background
For more than two decades, microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16S rRNA, is encoded by ribosomal DNA (16S rDNA) and has provided microbial ecologists with a powerful comparative tool. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data paired with dozens of corresponding collection variables. As the complexity of data and tool sets has grown, the need for flexible automation and maintenance of the core processes of 16S rDNA sequence analysis has increased correspondingly.

Results
We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available 16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment, and phylogenetic tree construction, as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented "workflow" approach using the open-source Kepler system as a platform.

Conclusions
By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA analyses, especially for those without specialized bioinformatics or programming expertise. In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time spent on tedious informatics steps and to focus instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the workflow as desired, and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy-to-combine tools for asking increasingly complex microbial ecology questions.
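
The collection-oriented, actor-based design described in the abstract can be pictured with a small sketch. The Python below is purely illustrative: it does not use Kepler or WATERS code, and every name in it (Actor, Workflow, uppercase, drop_short, build_demo) is hypothetical. It assumes each step takes a collection of sequences and returns a new one, and it logs which sequence IDs entered and left each step as a simple stand-in for provenance recording.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not part of WATERS or Kepler.
Sequences = Dict[str, str]  # sequence id -> nucleotide string


@dataclass
class Actor:
    """One workflow step: a name plus a function from collection to collection."""
    name: str
    run: Callable[[Sequences], Sequences]


@dataclass
class Workflow:
    """An ordered chain of actors that records what each step consumed and produced."""
    actors: List[Actor] = field(default_factory=list)
    provenance: List[dict] = field(default_factory=list)

    def add(self, actor: Actor) -> "Workflow":
        # New actors can be appended without touching the existing ones.
        self.actors.append(actor)
        return self

    def execute(self, data: Sequences) -> Sequences:
        self.provenance.clear()
        for actor in self.actors:
            result = actor.run(data)
            # Record the ids seen before and after the step (toy provenance).
            self.provenance.append({
                "actor": actor.name,
                "inputs": sorted(data),
                "outputs": sorted(result),
            })
            data = result
        return data


def uppercase() -> Actor:
    """Toy normalisation step."""
    return Actor(
        name="uppercase",
        run=lambda seqs: {k: v.upper() for k, v in seqs.items()},
    )


def drop_short(min_len: int) -> Actor:
    """Toy length filter standing in for a real step such as chimera removal."""
    return Actor(
        name=f"drop_short(min_len={min_len})",
        run=lambda seqs: {k: v for k, v in seqs.items() if len(v) >= min_len},
    )


def build_demo() -> Workflow:
    return Workflow().add(uppercase()).add(drop_short(min_len=6))


if __name__ == "__main__":
    wf = build_demo()
    print(wf.execute({"seq1": "acgtacgt", "seq2": "acg"}))
    for record in wf.provenance:
        print(record)
```

In this sketch, the provenance log makes it possible to trace which step dropped a given sequence, which is the kind of question a provenance sub-system is meant to answer; a real workflow system would capture far more detail (parameters, tool versions, timestamps).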