文摘
DNA sequencing is the process of determining the exact order of the nucleotide bases of an individual's genome in order to catalogue sequence variation and understand its biological implications. Whole-genome sequencing techniques produce masses of data in the form of short sequences known as reads. Assembling these reads into a whole genome constitutes a major algorithmic challenge. Most assembly algorithms utilise de Bruijn graphs constructed from reads for this purpose. A critical step of these algorithms is to detect typical motif structures in the graph caused by sequencing errors and genome repeats, and filter them out; one such complex subgraph class is a so-called superbubble . In this paper, we propose an class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397515009147&_mathId=si1.gif&_user=111111111&_pii=S0304397515009147&_rdoc=1&_issn=03043975&md5=d7419cdf8d583cc23353fabcfba59f4c" title="Click to view the MathML source">O(n+m)class="mathContainer hidden">class="mathCode">-time algorithm to detect all superbubbles in a directed acyclic graph with n vertices and m (directed) edges, improving the best-known class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397515009147&_mathId=si2.gif&_user=111111111&_pii=S0304397515009147&_rdoc=1&_issn=03043975&md5=25ad98842bd69916f8718fb36e5bc853" title="Click to view the MathML source">O(mlogm)class="mathContainer hidden">class="mathCode">-time algorithm by Sung et al.