One-Pass Inconsistency Detection Algorithms for Big Data
  • Keywords: Inconsistency detection; Big data; One-pass algorithm; Data quality
  • Journal: Lecture Notes in Computer Science
  • Year: 2016
  • Volume: 9642
  • Issue: 1
  • Pages: 82–98
  • Full-text size: 1,301 KB
  • Authors: Meifan Zhang, Hongzhi Wang, Jianzhong Li, Hong Gao
  • Affiliation: Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  • Series title: Database Systems for Advanced Applications
  • ISBN: 978-3-319-32025-0
  • Category: Computer Science
  • Subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1611-3349
Abstract
Data in the real world are often dirty, and inconsistency is an important kind of dirty data. Before inconsistencies can be repaired, they must first be detected. The time complexity of current inconsistency detection algorithms is super-linear in the size of the data, which makes them unsuitable for big data. To detect inconsistencies in big data, we develop an algorithm that finds inconsistencies with respect to both functional dependencies (FDs) and conditional functional dependencies (CFDs) within a single pass over the data. We compare our detection algorithm experimentally with existing approaches. Experimental results on real datasets show that our approach detects inconsistencies effectively and efficiently.
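To make the one-pass idea concrete, here is a minimal Python sketch of hash-based violation detection for a single FD or CFD. It is not the authors' algorithm, which this record does not describe. It assumes each dependency has a single right-hand-side attribute and uses `"_"` as the wildcard in pattern tuples; the function name `detect_violations` and the sample attributes (`CC`, `AC`, `city`) are hypothetical. Each tuple is read exactly once: a constant right-hand-side pattern is checked tuple by tuple, while a wildcard right-hand side is checked by grouping tuples on their left-hand-side values in a hash table. A plain FD is the special case in which every pattern entry is the wildcard.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Optional, Sequence, Set, Tuple

WILDCARD = "_"  # assumed marker for the unnamed variable in a CFD pattern tuple


def _matches(row: Mapping[str, Any], attrs: Sequence[str], pattern: Sequence[Any]) -> bool:
    """True if the row agrees with every constant in the pattern (wildcards match anything)."""
    return all(p == WILDCARD or row[a] == p for a, p in zip(attrs, pattern))


def detect_violations(
    rows: Sequence[Mapping[str, Any]],
    lhs: Sequence[str],
    rhs: str,
    lhs_pattern: Optional[Sequence[Any]] = None,
    rhs_pattern: Any = WILDCARD,
) -> Set[int]:
    """Hypothetical helper: indices of rows violating the CFD (lhs -> rhs, (lhs_pattern || rhs_pattern)).

    With every pattern entry set to WILDCARD this is the plain FD lhs -> rhs.
    The rows are scanned once; the final loop runs over the hash groups only.
    """
    if lhs_pattern is None:
        lhs_pattern = [WILDCARD] * len(lhs)
    violating: Set[int] = set()
    # LHS value -> RHS value -> indices of the rows carrying that combination.
    groups: Dict[Tuple[Any, ...], Dict[Any, List[int]]] = defaultdict(lambda: defaultdict(list))

    for i, row in enumerate(rows):  # the single pass over the data
        if not _matches(row, lhs, lhs_pattern):
            continue  # the pattern does not apply to this tuple
        if rhs_pattern != WILDCARD:
            # Constant RHS: each matching tuple is checked on its own.
            if row[rhs] != rhs_pattern:
                violating.add(i)
        else:
            # Wildcard RHS: file the tuple under its LHS key for the pairwise check.
            key = tuple(row[a] for a in lhs)
            groups[key][row[rhs]].append(i)

    # Tuples that share an LHS value but carry two or more distinct RHS values violate the dependency.
    for by_rhs in groups.values():
        if len(by_rhs) > 1:
            for idxs in by_rhs.values():
                violating.update(idxs)
    return violating


if __name__ == "__main__":
    sample = [
        {"CC": "44", "AC": "131", "city": "EDI"},
        {"CC": "44", "AC": "131", "city": "NYC"},
        {"CC": "01", "AC": "908", "city": "MH"},
        {"CC": "01", "AC": "908", "city": "MH"},
    ]
    # FD: [CC, AC] -> city
    print(sorted(detect_violations(sample, ["CC", "AC"], "city")))  # [0, 1]
    # CFD: ([CC, AC] -> city, (44, 131 || EDI))
    print(sorted(detect_violations(sample, ["CC", "AC"], "city", ["44", "131"], "EDI")))  # [1]
```

The hash table keeps row indices for every tuple matched by a wildcard right-hand side, so its memory grows linearly with the data in the worst case; a big-data implementation would need to bound or spill that state, which this sketch does not attempt.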
