Linear-size suffix tries
详细信息    查看全文
文摘
Suffix trees are highly regarded data structures for text indexing and string algorithms [MCreight 76, Weiner 73]. For any given string w   of length class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397516300305&_mathId=si1.gif&_user=111111111&_pii=S0304397516300305&_rdoc=1&_issn=03043975&md5=d2c419a27e9f8065d7299b2454cae023" title="Click to view the MathML source">n=|w|class="mathContainer hidden">class="mathCode">n=|w|, a suffix tree for w   takes class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397516300305&_mathId=si2.gif&_user=111111111&_pii=S0304397516300305&_rdoc=1&_issn=03043975&md5=670da1e52821dcfde6a8859ed1ebc55a" title="Click to view the MathML source">O(n)class="mathContainer hidden">class="mathCode">O(n) nodes and links. It is often presented as a compacted version of a suffix trie for w, where the latter is the trie (or digital search tree) built on the suffixes of w. Here the compaction process replaces each maximal chain of unary nodes with a single arc. For this, the suffix tree requires that the labels of its arcs are substrings encoded as pointers to w   (or equivalent information). On the contrary, the arcs of the suffix trie are labeled by single symbols but there can be class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397516300305&_mathId=si3.gif&_user=111111111&_pii=S0304397516300305&_rdoc=1&_issn=03043975&md5=0d59c7467904764f7bf168f29aae0203" title="Click to view the MathML source">Θ(n2)class="mathContainer hidden">class="mathCode">Θ(n2) nodes and links for suffix tries in the worst case because of their unary nodes. It is an interesting question if the suffix trie can be stored using class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397516300305&_mathId=si2.gif&_user=111111111&_pii=S0304397516300305&_rdoc=1&_issn=03043975&md5=670da1e52821dcfde6a8859ed1ebc55a" title="Click to view the MathML source">O(n)class="mathContainer hidden">class="mathCode">O(n) nodes. We present the linear-size suffix trie  , which guarantees class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397516300305&_mathId=si2.gif&_user=111111111&_pii=S0304397516300305&_rdoc=1&_issn=03043975&md5=670da1e52821dcfde6a8859ed1ebc55a" title="Click to view the MathML source">O(n)class="mathContainer hidden">class="mathCode">O(n) nodes. We use a new technique for reducing the number of unary nodes to class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397516300305&_mathId=si2.gif&_user=111111111&_pii=S0304397516300305&_rdoc=1&_issn=03043975&md5=670da1e52821dcfde6a8859ed1ebc55a" title="Click to view the MathML source">O(n)class="mathContainer hidden">class="mathCode">O(n), that stems from some results on antidictionaries. For instance, by using the linear-size suffix trie, we are able to check whether a pattern p   of length class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397516300305&_mathId=si4.gif&_user=111111111&_pii=S0304397516300305&_rdoc=1&_issn=03043975&md5=afe94764598f6221c831087a4cd5249d" title="Click to view the MathML source">m=|p|class="mathContainer hidden">class="mathCode">m=|p| occurs in w   in class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397516300305&_mathId=si5.gif&_user=111111111&_pii=S0304397516300305&_rdoc=1&_issn=03043975&md5=c9c3e5301095bba31718d9b27e441ef9" title="Click to view the MathML source">O(mlog⁡|Σ|)class="mathContainer hidden">class="mathCode">O(mlog|Σ|) time and we can find the longest common substring of two strings class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397516300305&_mathId=si6.gif&_user=111111111&_pii=S0304397516300305&_rdoc=1&_issn=03043975&md5=7edd7f1e0408b19bccb30f006d189a27" title="Click to view the MathML source">w1class="mathContainer hidden">class="mathCode">w1 and class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397516300305&_mathId=si7.gif&_user=111111111&_pii=S0304397516300305&_rdoc=1&_issn=03043975&md5=38c74ad882066c6d8135f9b1eea7a98d" title="Click to view the MathML source">w2class="mathContainer hidden">class="mathCode">w2 in class="mathmlsrc">class="formulatext stixSupport mathImg" data-mathURL="/science?_ob=MathURL&_method=retrieve&_eid=1-s2.0-S0304397516300305&_mathId=si8.gif&_user=111111111&_pii=S0304397516300305&_rdoc=1&_issn=03043975&md5=e2be16e406b7a51125a05fa50efcfabb" title="Click to view the MathML source">O((|w1|+|w2|)log⁡|Σ|)class="mathContainer hidden">class="mathCode">O((|w1|+|w2|)log|Σ|) time for an alphabet Σ.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700