Improved Weighted Bloom Filter and Space Lower Bound Analysis of Algorithms for Approximated Membership Querying
详细信息    查看全文
  • 作者:Xiujun Wang (17) (20)
    Yusheng Ji (18)
    Zhe Dang (19)
    Xiao Zheng (17)
    Baohua Zhao (20)

    17. Anhui University of Technology
    ; Ma鈥檃nshan ; China
    20. University of Science and Technology of China
    ; Hefei ; China
    18. National Institute of Informatics
    ; Tokyo ; Japan
    19. Washington State University
    ; Pullman ; USA
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2015
  • 出版时间:2015
  • 年:2015
  • 卷:9050
  • 期:1
  • 页码:346-362
  • 全文大小:333 KB
  • 参考文献:1. Ablayev, F (1996) Lower bounds for one-way probabilistic communication complexity and their application to space complexity. Theoretical Computer Science 157: pp. 139-159 CrossRef
    2. Bar-Yossef, Z., Jayram, T., Kumar, R., Sivakumar, D.: An information statistics approach to data stream and communication complexity, vol. 68, pp. 702鈥?32. Academic Press Inc. (2004)
    3. Berinde, R, Indyk, P, Cormode, G, Strauss, MJ (2010) Space-optimal heavy hitters with strong error bounds. ACM Transactions on Database Systems (TODS) 35: pp. 26 CrossRef
    4. Bloom, BH (1970) Space/time trade-offs in hash coding with allowable errors. Communications of the ACM 13: pp. 422-426 CrossRef
    5. Bonomi, F., Mitzenmacher, M., Panigrah, R., Singh, S., Varghese, G.: Beyond bloom filters: from approximate membership checks to approximate state machines. In: ACM SIGCOMM Computer Communication Review, vol.36, pp. 315鈥?26. ACM (2006)
    6. Bonomi, F, Mitzenmacher, M, Panigrahy, R, Singh, S, Varghese, G An improved construction for counting bloom filters. In: Azar, Y, Erlebach, T eds. (2006) Algorithms 鈥?ESA 2006. Springer, Heidelberg, pp. 684-695 CrossRef
    7. Broder, A, Mitzenmacher, M (2004) Network applications of bloom filters: A survey. Internet Mathematics 1: pp. 485-509 CrossRef
    8. Bruck, J., Gao, J., Jiang, A.: Weighted bloom filter. In: 2006 IEEE International Symposium on Information Theory, pp. 2304鈥?308. IEEE (2006). Extented version in http://www.paradise.caltech.edu/papers/etr072.pdf
    9. Carter, L., Floyd, R., Gill, J., Markowsky, G., Wegman, M.: Exact and approximate membership testers. In: Proceedings of the Tenth Annual ACM Symposium on Theory of Computing, pp. 59鈥?5. ACM (1978)
    10. Chakrabarti, K., Chaudhuri, S., Ganti, V., Xin, D.: An efficient filter for approximate membership checking. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 805鈥?18. ACM (2008)
    11. Chung, F, Lu, L (2006) Concentration inequalities and martingale inequalities: a survey. Internet Mathematics 3: pp. 79-127 CrossRef
    12. Cormode, G, Hadjieleftheriou, M (2010) Methods for finding frequent items in data streams. The VLDB Journal 19: pp. 3-20 CrossRef
    13. Cormode, G, Muthukrishnan, S (2005) What鈥檚 hot and what鈥檚 not: tracking most frequent items dynamically. ACM Transactions on Database Systems (TODS) 30: pp. 249-278 CrossRef
    14. Demaine, ED, L贸pez-Ortiz, A, Munro, JI Frequency estimation of internet packet streams with limited space. In: M枚hring, RH, Raman, R eds. (2002) Algorithms - ESA 2002. Springer, Heidelberg, pp. 348-360 CrossRef
    15. Deng, F., Rafiei, D.: Approximately detecting duplicates for streaming data using stable bloom filters. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pp. 25鈥?6. ACM (2006)
    16. Guo, D, Li, M (2013) Set reconciliation via counting bloom filters. IEEE Transactions on Knowledge and Data Engineering 25: pp. 2367-2380 CrossRef
    17. Guo, D, Liu, Y, Li, X, Yang, P (2010) False negative problem of counting bloom filter. IEEE Transactions on Knowledge and Data Engineering 22: pp. 651-664 CrossRef
    18. Guo, D, Wu, J, Chen, H, Yuan, Y, Luo, X (2010) The dynamic bloom filters. IEEE Transactions on Knowledge and Data Engineering 22: pp. 120-133 CrossRef
    19. Hua, Y, Xiao, B, Veeravalli, B, Feng, D (2012) Locality-sensitive bloom filter for approximate membership query. IEEE Transactions on Computers 61: pp. 817-830 CrossRef
    20. Jayram, T.: Information complexity: a tutorial. In: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 159鈥?68. ACM (2010)
    21. Karp, RM, Shenker, S, Papadimitriou, CH (2003) A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems (TODS) 28: pp. 51-55 CrossRef
    22. Kirsch, A, Mitzenmacher, M Less hashing, same performance: building a better bloom filter. In: Azar, Y, Erlebach, T eds. (2006) Algorithms 鈥?ESA 2006. Springer, Heidelberg, pp. 456-467 CrossRef
    23. Liu, Y., Chen, W., Guan, Y.: Near-optimal approximate membership query over time-decaying windows. In: 2013 Proceedings IEEE, INFOCOM, pp. 1447鈥?455. IEEE (2013)
    24. Metwally, A., Agrawal, D., El Abbadi, A.: Duplicate detection in click streams. In: Proceedings of the 14th International Conference on World Wide Web, pp. 12鈥?1. ACM (2005)
    25. Pagh, R, Rodler, FF (2004) Cuckoo hashing. Journal of Algorithms 51: pp. 122-144 CrossRef
    26. Zhong, M., Lu, P., Shen, K., Seiferas, J.: Optimizing data popularity conscious bloom filters. In: Proceedings of the Twenty-Seventh ACM Symposium on Principles of Distributed Computing, pp. 355鈥?64. ACM (2008)
  • 作者单位:Database Systems for Advanced Applications
  • 丛书名:978-3-319-18122-6
  • 刊物类别:Computer Science
  • 刊物主题:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:1611-3349
文摘
The elements in a large universe \(U\) have different membership likelihoods and query frequencies in many applications. Thus, the number of hash functions assigned to each element of \(U\) in the traditional Bloom filter can be further optimized to minimize the average false positive rate. We propose an improved weighted Bloom filter (IWBF) that assigns an optimal number of hash functions to each element and has a less average false positive rate compared to the weighted Bloom filter. We show a tight space lower bound for any approximated membership querying algorithm that represents a small subset \(S\) of \(U\) and answers membership queries with predefined false positive rates, when the query frequencies and membership likelihoods of the elements in \(U\) are known. We also provide an approximate space lower bound for approximated membership querying algorithms that have an average false positive rate, and show that the number of bits used in IWBF is within a factor of \(1.44\) of the approximate space lower bound.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700