DupChecker: a Bioconductor package for checking high-throughput genomic data redundancy in meta-analysis
  • Authors: Quanhu Sheng (24) (25)
    Yu Shyr (25) (26)
    Xi Chen (25) (26)

    24. Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
    25. Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
    26. Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
  • Journal: BMC Bioinformatics
  • Publication year: 2014
  • Publication date: December 2014
  • Volume: 15
  • Issue: 1
  • Full-text size: 219 KB
  • Journal subjects: Bioinformatics; Microarrays; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Combinatorial Libraries; Algorithms
  • Publisher: BioMed Central
  • ISSN: 1471-2105
Abstract
Background: Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often significantly increase the power to detect biological signals or patterns in datasets. However, when publicly available databases are used for meta-analysis, duplicated samples are a frequently encountered problem, especially for gene expression data. Failing to remove duplicates can lead to false-positive findings, misleading clustering patterns, or model over-fitting in subsequent data analysis.

Results: We developed a Bioconductor package, DupChecker, that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data. A real data example demonstrates the usage and output of the package.

Conclusions: Researchers may not pay enough attention to checking for and removing duplicated samples, and the resulting data contamination can make the results or conclusions of a meta-analysis questionable. We suggest applying DupChecker to examine all gene expression data sets before any data analysis step.
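
The idea of detecting duplicated samples by MD5 fingerprints of raw data files can be illustrated with a minimal Python sketch. This is not the DupChecker API (DupChecker is an R/Bioconductor package); the function names `md5_fingerprint` and `find_duplicates` are hypothetical and serve only to show the general technique: files with byte-identical content produce the same digest, so grouping files by digest reveals duplicates.

```python
import hashlib
from collections import defaultdict


def md5_fingerprint(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group files by MD5 digest and return only the groups with more than one file.

    Hypothetical helper for illustration; it compares raw bytes only, so the
    same sample saved in a different file format would not be detected.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[md5_fingerprint(path)].append(str(path))
    return {digest: sorted(files) for digest, files in groups.items() if len(files) > 1}
```

Calling `find_duplicates` on the raw data files collected from several datasets returns a mapping from each shared digest to the files that carry it, which can then be reviewed before the datasets are merged.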
