Finding the active genes in deep RNA-seq gene expression studies
  • Authors: Traver Hart (1)
    H Kiyomi Komori (2)
    Sarah LaMere (2)
    Katie Podshivalova (2)
    Daniel R Salomon (2)
  • Journal: BMC Genomics
  • Publication date: December 2013
  • Volume: 14
  • Issue: 1
  • Affiliations:
    1. Donnelly Centre, Banting & Best Department of Medical Research, University of Toronto, Toronto, Canada
    2. Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA
  • ISSN:1471-2164
Abstract
Background: Early application of second-generation sequencing technologies to transcript quantitation (RNA-seq) has hinted at a vast mammalian transcriptome, including transcripts from nearly all known genes, which might be fully measured only by ultradeep sequencing. Subsequent studies suggested that low-abundance transcripts might be the result of technical or biological noise rather than active transcripts; moreover, most RNA-seq experiments did not provide enough read depth to generate high-confidence estimates of gene expression for low-abundance transcripts. As a result, the community adopted several heuristics for RNA-seq analysis, most notably an arbitrary expression threshold of 0.3 to 1 FPKM for downstream analysis. However, advances in RNA-seq library preparation, sequencing technology, and informatic analysis have addressed many of the systemic sources of uncertainty and undermined the assumptions that drove the adoption of these heuristics. We provide an updated view of the accuracy and efficiency of RNA-seq experiments, using genomic data from large-scale studies like the ENCODE project to provide orthogonal information against which to validate our conclusions.

Results: We show that a human cell’s transcriptome can be divided into active genes carrying out the work of the cell and other genes that are likely the by-products of biological or experimental noise. We use ENCODE data on chromatin state to show that ultralow-expression genes are predominantly associated with repressed chromatin; we provide a novel normalization metric, zFPKM, that identifies the threshold between active and background gene expression; and we show that this threshold is robust to experimental and analytical variations.

Conclusions: The zFPKM normalization method accurately separates the biologically relevant genes in a cell, which are associated with active promoters, from the ultralow-expression noisy genes that have repressed promoters. A read depth of twenty to thirty million mapped reads allows high-confidence quantitation of genes expressed at this threshold, providing important guidance for the design of RNA-seq studies of gene expression. Moreover, we offer an example for using extensive ENCODE chromatin state information to validate RNA-seq analysis pipelines.
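
The abstract names the zFPKM metric but does not give its computation. As a rough illustration only, the sketch below assumes one plausible reading: take log2(FPKM), locate the main peak of its distribution, fit a half-Gaussian to the values to the right of that peak, and express every gene as a z-score against that fit. The function name `zfpkm_transform`, the kernel bandwidth, the synthetic two-population data, and the cutoff of -3 are all assumptions made for this example, not values taken from the record above.

```python
import math
from statistics import NormalDist


def zfpkm_transform(fpkm, bandwidth=0.5, grid_step=0.05):
    """Return (z_scores, mu, sd) for a list of FPKM values.

    Assumed procedure (not spelled out in the abstract):
      1. work on log2(FPKM), ignoring genes with FPKM == 0;
      2. take the tallest peak of a Gaussian kernel density as mu;
      3. estimate sd only from values to the right of mu (half-Gaussian
         fit), so the low-expression shoulder does not inflate the spread.
    """
    logs = [math.log2(v) for v in fpkm if v > 0]
    lo, hi = min(logs), max(logs)
    grid = [lo + i * grid_step for i in range(int((hi - lo) / grid_step) + 1)]

    def density(x):
        # Unnormalized Gaussian kernel density estimate at x.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in logs)

    mu = max(grid, key=density)
    right = [v for v in logs if v > mu]
    # For a half-Gaussian, the mean distance from mu is sd * sqrt(2 / pi).
    sd = (sum(right) / len(right) - mu) * math.sqrt(math.pi / 2)
    z = [(math.log2(v) - mu) / sd if v > 0 else float("-inf") for v in fpkm]
    return z, mu, sd


# Synthetic, deterministic data: 600 "active" genes centered at log2 FPKM = 3
# and 200 "background" genes centered at log2 FPKM = -5 (both hypothetical).
active = [2 ** NormalDist(3.0, 1.5).inv_cdf((i + 0.5) / 600) for i in range(600)]
noise = [2 ** NormalDist(-5.0, 1.5).inv_cdf((i + 0.5) / 200) for i in range(200)]

z, mu, sd = zfpkm_transform(active + noise)

CUTOFF = -3.0  # illustrative cutoff, an assumption of this sketch
n_active = sum(score >= CUTOFF for score in z)
print(f"peak mu={mu:.2f}, sd={sd:.2f}, genes called active: {n_active}")
```

Fitting only the right half of the peak is the design choice that matters here: the left side of the distribution is where background and active genes overlap, so a symmetric fit would overestimate the spread and push the cutoff too low.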
