CONTOUR: an efficient algorithm for discovering discriminating subsequences

Details

Authors: Jianyong Wang; Yuzhou Zhang; Lizhu Zhou; George Karypis; Charu C. Aggarwal
Keywords: Sequence mining; Discriminating subsequence; Summarization subsequence; Clustering
Journal: Data Mining and Knowledge Discovery
Publication date: February 2009
Publication year: 2009
Journal code: 24_13845810
Category: med
Volume: 18
Issue: 1
Pages: 1-29
Data source: sp

Abstract

In recent years we have witnessed several applications of frequent sequence mining, such as feature selection for protein sequence classification and mining block correlations in storage systems. In typical applications such as clustering, it is not the complete set but only a subset of discriminating frequent subsequences which is of interest. One approach to discovering the subset of useful frequent subsequences is to apply any existing frequent sequence mining algorithm to find the complete set of frequent subsequences. Then, a subset of interesting subsequences can be further identified. Unfortunately, it is very time consuming to mine the complete set of frequent subsequences for large sequence databases. In this paper, we propose a new algorithm, CONTOUR, which efficiently mines a subset of high-quality subsequences directly in order to cluster the input sequences. We mainly focus on how to design some effective search space pruning methods to accelerate the mining process and discuss how to construct an accurate clustering algorithm based on the result of CONTOUR. We conducted an extensive performance study to evaluate the efficiency and scalability of CONTOUR, and the accuracy of the frequent subsequence-based clustering algorithm.
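
The two-step baseline that the abstract contrasts CONTOUR with (mine the complete set of frequent subsequences, then pick a subset) can be illustrated with a small sketch. This is not the CONTOUR algorithm and is not taken from the paper: the function names, the brute-force enumeration, the `max_len` cap, and the "support fraction at most `max_fraction`" selection rule are all assumptions chosen only to make the idea concrete.

```python
from itertools import combinations


def frequent_subsequences(sequences, min_support, max_len):
    """Step 1 (hypothetical helper): count, for every subsequence of
    length <= max_len, how many input sequences contain it, and keep
    those with support >= min_support. Gaps between items are allowed."""
    support = {}
    for seq in sequences:
        seen = set()  # each subsequence counts once per input sequence
        for k in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), k):
                seen.add(tuple(seq[i] for i in idx))
        for sub in seen:
            support[sub] = support.get(sub, 0) + 1
    return {s: c for s, c in support.items() if c >= min_support}


def select_discriminating(frequent, n_sequences, max_fraction):
    """Step 2 (assumed selection rule): drop subsequences that occur in
    more than max_fraction of the sequences, since a pattern present
    almost everywhere cannot separate groups of sequences."""
    return {s: c for s, c in frequent.items()
            if c / n_sequences <= max_fraction}


# Toy database: two "ab..." sequences and two "x.z." sequences, all ending in d.
db = ["abcd", "abd", "xyzd", "xzd"]
freq = frequent_subsequences(db, min_support=2, max_len=2)
picked = select_discriminating(freq, len(db), max_fraction=0.5)
```

Even with this toy input, step 1 enumerates every index combination of every sequence, which grows combinatorially with sequence length and `max_len`. That cost is the motivation the abstract gives for mining a useful subset directly instead of filtering the complete set afterwards.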
