A space-efficient alphabet-independent Four-Russians' lookup table and a multithreaded Four-Russians' edit distance algorithm
详细信息    查看全文
文摘
Given two strings X   (|X|=m|X|=m) and Y   (|Y|=n|Y|=n) over an alphabet Σ, the edit distance between X and Y   can be computed in O(mn/t)O(mn/t) time with the help of the Four-Russians' lookup table whose block size is t  . The Four-Russians' lookup table can be constructed in O((3|Σ|)2tt2)O((3|Σ|)2tt2) time using O((3|Σ|)2tt)O((3|Σ|)2tt) space. However, the construction time and space requirement of the lookup table grow very fast as the alphabet size increases and thus it has been used only when |Σ||Σ| is very small. For example, when a string is a protein sequence, |Σ|=20|Σ|=20 and thus it is almost impossible to use the Four-Russians' lookup table on typical workstations. In this paper, we present an efficient alphabet-independent Four-Russians' lookup table. It requires O(32t(2t)!t)O(32t(2t)!t) space and can be constructed in O(32t(2t)!t2)O(32t(2t)!t2) time. Thus, the Four-Russians' lookup table can be constructed and used irrespective of the alphabet size. The time and space complexity were achieved by compacting the lookup table using a clever encoding of the preprocessed strings. Experimental results show that the space requirement of the lookup table is reduced to about 1/5,172,030 of its original size when |Σ|=26|Σ|=26 and t=4t=4. Furthermore, we present efficient multithreaded parallel algorithms for edit distance computation using the Four-Russians' lookup table. The parallel algorithm for lookup table construction runs in O(t)O(t) time and the parallel algorithm for edit distance computation between X and Y   runs in O(m+n)O(m+n) time. Experiments performed on CUDA-supported GPU show that our algorithm runs about 942 times faster than the sequential version of the original Four-Russians' algorithm for 100 pairs of random strings of length approximately 1,000 when |Σ|=4|Σ|=4 and t=4t=4.
NGLC 2004-2010.National Geological Library of China All Rights Reserved.
Add:29 Xueyuan Rd,Haidian District,Beijing,PRC. Mail Add: 8324 mailbox 100083
For exchange or info please contact us via email.