CVS: Fast cardinality estimation for large-scale data streams over sliding windows
详细信息    查看全文
文摘
Estimating the cardinality of data streams over a sliding window is an important problem in many applications, such as network traffic monitoring, web access log analysis and database. The problem becomes more difficult in large-scale data streams when time and space complexity is taken into account. In this paper, we present a novel randomized data structure to address the problem. The significant contributions are as follows: (1) A space-efficient counter vector sketch (CVS) are proposed, which extends the well-known bitmap sketch to sliding window settings. (2) Based on the CVS, a random update mechanism is introduced, whereby a small fixed number of entries are randomly chosen from CVS in a step and then updated. This means that the update procedure just costs constant time. (3) Furthermore, estimating cardinality by CVS just needs one-pass scan of the data. (4) Finally, a theoretical analysis is given to show the accuracy of CVS-based estimators. Our comprehensive experiments confirm that the CVS-based schema attains high accuracy, and demonstate its time efficiency in comparison with the Timestamp Vector (TSV) and the auxiliary indexing method.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700