Repairing Functional Dependency Violations in Distributed Data
详细信息    查看全文
  • 作者:Qing Chen (17)
    Zijing Tan (17)
    Chu He (17)
    Chaofeng Sha (17)
    Wei Wang (17)

    17. School of Computer Science Shanghai Key Laboratory of Data Science
    ; Fudan University ; Shanghai ; China
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2015
  • 出版时间:2015
  • 年:2015
  • 卷:9049
  • 期:1
  • 页码:441-457
  • 全文大小:909 KB
  • 参考文献:1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995)
    2. Bohannon, P., Fan, W., Flaster, M., Rastogi, R.: A cost based model and effective heuristic for repairing constraints by value modification. In: SIGMOD (2005)
    3. Beskales, G, Ilyas, I, Golab, L, Galiullin, A (2014) Sampling from repairs of conditional functional dependency violations. VLDB Journal 23: pp. 103-128 CrossRef
    4. Beskales, G., Ilyas, I., Golab, L., Galiullin, A.: On the relative trust between inconsistent data and inaccurate constraints. In: ICDE (2013)
    5. Cong, G., Fan, W., Geerts, F., Jia, X., Ma, S.: Improving data quality: Consistency and accuracy. In: VLDB (2007)
    6. Chu, X., Ilyas, I., Papotti, P.: Holistic data cleaning: Putting violations into context. In: ICDE (2013)
    7. Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms. MIT Press (2009)
    8. Chiang, F., Miller, R.: A unified model for data and constraint repair. In: ICDE (2011)
    9. Dallachiesa, M., Ebaid, A., Eldawy, A., Elmagarmid, A., Ilyas, I., Ouzzani, M., Tang, N.: NADEEF: a commodity data cleaning system. In: SIGMOD (2013)
    10. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: OSDI (2004)
    11. Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for capturing data inconsistencies. In: TODS 33(2) (2008)
    12. Fan, W., Geerts, F., Ma, S., Muller, H.: Detecting inconsistencies in distributed data. In: ICDE (2010)
    13. Fan, W, Li, J, Ma, S, Tang, N, Yu, W (2012) Towards certain fixes with editing rules and master data. VLDB Journal 21: pp. 213-238 CrossRef
    14. Fan, W, Li, J, Tang, N, Yu, W (2014) Incremental detection of inconsistencies in distributed data. TKDE 26: pp. 1367-1383
    15. Kolahi, S., Lakshmanan, L.: On approximating optimum repairs for functional dependency violations. In: ICDT (2009)
    16. Lynch, N.: Distributed Algorithms. Morgan Kaufmann (1996)
    17. Ozsu, M., Valduriez, P.: Principles of Distributed Database Systems (2nd edition). Prentice-Hall (1999)
    18. Song, S., Cheng, H., Yu, J., Chen, L.: Repairing vertex labels under neighborhood constraints. In: VLDB (2014)
    19. Wang, J., Tang, N.: Towards dependable data repairing with fixing rules. In: SIGMOD (2014)
    20. Yakout, M., Elmagarmid, A., Neville, J., Ouzzani, M., Ilyas, I.: Guided data repair. In: VLDB (2011)
    21. UIS data generator. http://www.cs.utexas.edu/users/ml/riddle/data.html
  • 作者单位:Database Systems for Advanced Applications
  • 丛书名:978-3-319-18119-6
  • 刊物类别:Computer Science
  • 刊物主题:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:1611-3349
文摘
One of the problems central to data consistency is data repairing. Given a database \(D\) violating a set \(\Sigma \) of data dependencies as data quality rules, it aims to modify \(D\) for a new relation \(D'\) satisfying \(\Sigma \) . When \(D\) is a centralized database, a host of methods have been provided to address this problem. In practice, a database may be fragmented and distributed to multiple sites, which is advocated by distributed systems for better scalability and is readily supported by commercial systems. This paper makes a first effort to develop techniques for repairing functional dependency violations in a horizontally partitioned database. (1) Based on a message-passing distributed computing model and two complexity measures (parallel time and data shipment) for distributed algorithms, we study data repairing with equivalence classes in the distributed setting. We show that it is NP-completeto build equivalence classes when the data is horizontally partitioned, and when we aim to minimize either data shipment or parallel computation time. (2) Despite the intractability, we propose efficient distributed algorithms and optimization techniques for data repairing based on equivalence classes. (3) We experimentally verify the effectiveness and efficiency of our algorithms, using both real-life and synthetic data.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700