Gap Filling as Exact Path Length Problem
详细信息    查看全文
  • 作者:Leena Salmela (5)
    Kristoffer Sahlin (6)
    Veli M盲kinen (5)
    Alexandru I. Tomescu (5)

    5. Helsinki Institute for Information Technology HIIT
    ; Department of Computer Science ; University of Helsinki ; Helsinki ; Finland
    6. Science for Life Laboratory
    ; School of Computer Science and Communication ; KTH Royal Institute of Technology ; Solna ; Sweden
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2015
  • 出版时间:2015
  • 年:2015
  • 卷:9029
  • 期:1
  • 页码:281-292
  • 全文大小:243 KB
  • 参考文献:1. Boetzer, M, Pirovano, W (2012) Toward almost closed genomes with GapFiller. Genome Biology 13: pp. R56 CrossRef
    2. Drezen, E (2014) GATB: genome assembly & analysis tool box. Bioinformatics 30: pp. 2959-2961 CrossRef
    3. Durbin, R., et al.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press (1998)
    4. Dyer, ME (1993) A mildly exponential time algorithm for approximating the number of solutions to a multidimensional knapsack problem. Combinatorics, Probability & Computing 2: pp. 271-284 CrossRef
    5. Garey, MR, Johnson, DS (1979) Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York
    6. Gnerre, S (2010) High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proceedings of the National Academy of Sciences 108: pp. 1513-1518 CrossRef
    7. Gurevich, A (2013) QUAST: quality assessment tool for genome assemblies. Bioinformatics 29: pp. 1072-1075 CrossRef
    8. Karp, RM Reducibility among combinatorial problems. In: Miller, RE, Thatcher, JW eds. (1972) Complexity of Computer Computations. Plenum Press, New York, pp. 85-103 CrossRef
    9. Kurtz, S (2004) Versatile and open software for comparing large genomes. Genome Biology 5: pp. R12 CrossRef
    10. Langmead, B (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10: pp. R25 CrossRef
    11. Li, H, Durbin, R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: pp. 1754-1760 CrossRef
    12. Luo, R., et al.: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1(18) (2012)
    13. Nadalin, F (2012) GapFiller: a de novo assembly approach to fill the gap within paired reads. BMC Bioinformatics 13: pp. S8 CrossRef
    14. Nyk盲nen, M, Ukkonen, E (2002) The exact path length problem. J. Algorithms 42: pp. 41-53 CrossRef
    15. Pabinger, S (2013) A survey of tools for variant analysis of next-generation genome sequencing data. Briefings in Bioinformatics 15: pp. 256-278 CrossRef
    16. Pevzner, PA, Tang, H (2001) Fragment assembly with double-barreled data. Bioinformatics 17: pp. S225-S233 CrossRef
    17. Salzberg, SL (2012) GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Research 22: pp. 557-567 CrossRef
    18. Simpson, JT, Durbin, R (2012) Efficient de novo assembly of large genomes using compressed data structures. Genome Research 22: pp. 549-556 CrossRef
    19. Simpson, J (2009) ABySS: A parallel assembler for short read sequence data. Genome Research 19: pp. 1117-1123 CrossRef
    20. Wetzel, J (2011) Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies. BMC Bioinformatics 12: pp. 95 CrossRef
  • 作者单位:Research in Computational Molecular Biology
  • 丛书名:978-3-319-16705-3
  • 刊物类别:Computer Science
  • 刊物主题:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:1611-3349
文摘
One of the last steps in a genome assembly project is filling the gaps between consecutive contigs in the scaffolds. This problem can be naturally stated as finding an \(s\) - \(t\) path in a directed graph whose sum of arc costs belongs to a given range (the estimate on the gap length). Here \(s\) and \(t\) are any two contigs flanking a gap. This problem is known to be NP-hard in general. Here we derive a simpler dynamic programming solution than already known, pseudo-polynomial in the maximum value of the input range. We implemented various practical optimizations to it, and compared our exact gap filling solution experimentally to popular gap filling tools. Summing over all the bacterial assemblies considered in our experiments, we can in total fill 28% more gaps than the best previous tool and the gaps filled by our method span 80% more sequence. Furthermore, the error level of the newly introduced sequence is comparable to that of the previous tools.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700