PEAT: an intelligent and efficient paired-end sequencing adapter trimming algorithm
  • Authors: Yun-Lung Li (1)
    Jui-Cheng Weng (1)
    Chiung-Chih Hsiao (1)
    Min-Te Chou (1)
    Chin-Wen Tseng (1)
    Jui-Hung Hung (1) (2)

    1. Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu, Taiwan
    2. Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu, Taiwan
  • Journal: BMC Bioinformatics
  • Publication year: 2015
  • Publication date: December 2015
  • Volume: 16
  • Issue: 1
  • Journal subjects: Bioinformatics; Microarrays; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Combinatorial Libraries; Algorithms
  • Publisher: BioMed Central
  • ISSN: 1471-2105
Abstract
Background: In modern paired-end sequencing protocols, short DNA fragments lead to adapter-appended reads. Current paired-end adapter removal approaches trim adapters by scanning for adapter fragments at the 3' end of the reads, which is inadequate for some applications.

Results: Here, we propose a fast and highly accurate adapter-trimming algorithm, PEAT, designed specifically for paired-end sequencing. PEAT requires no a priori adapter sequence, which is convenient for large-scale meta-analyses. We compared the performance of PEAT with that of many adapter trimmers on both simulated and real-life paired-end sequencing libraries. The importance of adapter trimming is illustrated by its influence on downstream analyses of RNA-seq, ChIP-seq and MNase-seq data. We also suggest several useful guidelines for combining adapter trimmers with aligners.

Conclusions: PEAT can be easily included in routine paired-end sequencing pipelines. The executable binaries and the standalone C++ source code package of PEAT are freely available online.
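
The abstract does not spell out how trimming can work without a known adapter sequence, so the following is only a minimal sketch of the general principle, not PEAT's published algorithm. It assumes that when a fragment is shorter than the read length, the first n bases of read 1 are the reverse complement of the first n bases of read 2, and everything beyond n is adapter. The function names, the `min_len` and `max_mismatch_rate` parameters, and the adapter strings in the usage example are all hypothetical.

```python
# Sketch of adapter-free paired-end trimming via reverse-complement overlap.
# Assumption: this illustrates the general idea only; it is NOT PEAT's method.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_pair(read1, read2, min_len=10, max_mismatch_rate=0.1):
    """Trim both reads to the inferred fragment length n, if one is found.

    Candidate lengths are tried from longest to shortest. A candidate n is
    accepted when read1[:n] and the reverse complement of read2[:n] differ
    at no more than max_mismatch_rate * n positions (hypothetical threshold).
    """
    limit = min(len(read1), len(read2))
    # n == limit would mean no adapter at all, so start one base shorter.
    for n in range(limit - 1, min_len - 1, -1):
        if hamming(read1[:n], reverse_complement(read2[:n])) <= max_mismatch_rate * n:
            return read1[:n], read2[:n]
    return read1, read2  # no short-fragment signature found; leave untouched


# Usage: a 20-base fragment sequenced with 28-base reads.
fragment = "ACGTTGCAAGGCTTACGATC"
adapter1 = "AGATCGGA"  # placeholder adapter bases, not real adapter sequences
adapter2 = "CTGTCTCT"
r1 = fragment + adapter1
r2 = reverse_complement(fragment) + adapter2
trimmed1, trimmed2 = trim_pair(r1, r2)
```

Note that `trim_pair` never receives the adapter strings; they appear only to build the example reads. A production trimmer would also need to handle base qualities and low-complexity sequence, which this sketch ignores.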
