MeSH ORA framework: R/Bioconductor packages to support MeSH over-representation analysis
  • Authors: Koki Tsuyuzaki (1) (5)
    Gota Morota (2) (3)
    Manabu Ishii (5)
    Takeru Nakazato (4)
    Satoru Miyazaki (1)
    Itoshi Nikaido (5)

    1. Department of Medical and Life Science, Faculty of Pharmaceutical Science, Tokyo University of Science, 2641 Yamazaki, Noda, 278-8510, Chiba, Japan
    2. Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE, USA
    3. Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, USA
    4. Database Center for Life Science (DBCLS), Research Organization of Information and Systems (ROIS), Faculty of Engineering Building 12, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, 113-0032, Tokyo, Japan
    5. Bioinformatics Research Unit, Advanced Center for Computing and Communication, RIKEN, 2-1 Hirosawa, Wako, 351-0198, Saitama, Japan
  • Keywords: MeSH; Over-representation analysis; Enrichment analysis; Annotation
  • Journal: BMC Bioinformatics
  • Publication year: 2015
  • Publication date: December 2015
  • Volume: 16
  • Issue: 1
  • Full-text size: 4,815 KB
  • Journal subjects: Bioinformatics; Microarrays; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Combinatorial Libraries; Algorithms
  • Publisher: BioMed Central
  • ISSN: 1471-2105
Abstract
Background: In genome-wide studies, over-representation analysis (ORA) against a set of genes is an essential step for biological interpretation. Many gene annotation resources and software platforms for ORA have been proposed. Recently, Medical Subject Headings (MeSH) terms, which are annotations of PubMed documents, have been used for ORA. MeSH enables the extraction of broader meaning from the gene lists and is expected to become an exhaustive annotation resource for ORA. However, the existing MeSH ORA software platforms are still not sufficient for several reasons.
Results: In this work, we developed an original MeSH ORA framework composed of six types of R packages, including MeSH.db, MeSH.AOR.db, MeSH.PCR.db, the org.MeSH.XXX.db-type packages, MeSHDbi, and meshr.
Conclusions: Using our framework, users can easily conduct MeSH ORA. By utilizing the enriched MeSH terms, related PubMed documents can be retrieved and saved on local machines within this framework.
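For readers unfamiliar with ORA, the core calculation can be sketched as a one-sided hypergeometric test: given a gene universe, the subset annotated with one term (for example, a MeSH term), and a selected gene list, it asks how likely an overlap at least this large would be by chance. The sketch below is written in Python for illustration only; it is not the implementation used by meshr or the other packages in the framework, and the function names `ora_p_value` and `ora_from_sets` and the toy gene identifiers are hypothetical.

```python
from math import comb


def ora_p_value(n_universe, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe  : number of genes in the background set
    n_annotated : background genes annotated with the term
    n_selected  : genes in the list under test
    n_overlap   : selected genes that carry the annotation
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def ora_from_sets(universe, term_genes, selected):
    """Convenience wrapper (hypothetical) that takes sets of gene IDs."""
    universe = set(universe)
    term_genes = set(term_genes) & universe   # restrict to the background
    selected = set(selected) & universe
    overlap = term_genes & selected
    return ora_p_value(len(universe), len(term_genes), len(selected), len(overlap))


# Toy data (made up): 20 background genes, 5 annotated, 4 selected, 3 overlapping.
universe = {f"g{i}" for i in range(20)}
term_genes = {"g0", "g1", "g2", "g3", "g4"}
selected = {"g0", "g1", "g2", "g10"}
p = ora_from_sets(universe, term_genes, selected)
print(f"p = {p:.4f}")
```

In a real analysis this test is repeated for every term, so the resulting p-values would normally be adjusted for multiple testing before any term is called enriched.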
