Efficient computations of gapped string kernels based on suffix kernel
详细信息查看全文 | 推荐本文 |
摘要
In recent years, several approaches to computing the value of p-length gap-weighted kernel have been presented, such as trie-based approach, suffix kernel, and range sum technique. Although other approaches can achieve better performance in some cases, suffix kernel technique is still an efficient approach in this context. In this paper, we present a series of dynamic programming algorithms based on suffix kernel to compute gapped string kernels. Given strings s and t, and a gap penalty λ, all-length gap-weighted kernel can be calculated in time O(|s||t|) with our algorithms.

Moreover, some new string kernels belonging to the family of gapped string kernels are presented, including all-length and p-length match-weighted kernels, and their variants. Based on the suffix kernel technique, we can compute all-length match-weighted kernel in time O(|s||t|), and then p-length kernel in time O(p|s||t|) using the relationship between all-length and p-length kernels. Furthermore, for p-length match-weighted kernel and its variant, a bit-parallel technique is used to reduce the complexity from O(p|s||t|) to O(left ceilingpk/wright ceiling|s||t|), where w is the word size of the machine (e.g. 32 or 64 in practice) and k is determined by the longest matching subsequence of two strings s and t. The empirical results suggest that the suffix kernel technique is an important and useful approach to computing gapped string kernels.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700