A Pattern-Based Framework for Addressing Data Representational Inconsistency
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2016
  • Volume: 9877
  • Issue: 1
  • Pages: 395-406
  • Full-text size: 610 KB
  • Author affiliations: Bingyu Yi (16)
    Wen Hua (16)
    Shazia Sadiq (16)

    16. School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia
  • Book series: Databases Theory and Applications
  • ISBN:978-3-319-46922-5
  • Publication category: Computer Science
  • Publication subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN:1611-3349
Abstract
Data representational inconsistency, where data appears in diverse formats or structures, is a crucial data quality problem. Existing approaches to fixing it either target a specific domain or require extensive information from users. In this work, we propose a user-friendly pattern-based framework for addressing data representational inconsistency. The framework consists of three modules: pattern design, pattern detection, and pattern unification. We identify several challenges in all three tasks that must be met to handle an inconsistent dataset both accurately and efficiently. We propose techniques to tackle these challenges, and experimental results on real-life datasets show that our proposals outperform existing methods.
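
The abstract does not give the framework's pattern language or algorithms, so the following is only a minimal illustrative sketch of the three stages it names (design, detection, unification), not the authors' method. It assumes patterns can be written as ordinary regular expressions with named groups. The date formats, the names `PATTERNS`, `TARGET_TEMPLATE`, `detect`, and `unify`, and the sample records are all hypothetical.

```python
import re

# Pattern design (assumed): each representation is a regex with named fields.
# These three date layouts are hypothetical examples, not taken from the paper.
PATTERNS = {
    "iso": re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"),
    "slash_dmy": re.compile(r"(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})"),
    "dot_dmy": re.compile(r"(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{4})"),
}

# Target representation (assumed) that every matched value is rewritten into.
TARGET_TEMPLATE = "{year}-{month}-{day}"


def detect(value):
    """Pattern detection: return (pattern_name, fields) for the first
    pattern that matches the whole value, or None if nothing matches."""
    for name, regex in PATTERNS.items():
        match = regex.fullmatch(value)
        if match:
            return name, match.groupdict()
    return None


def unify(value):
    """Pattern unification: rewrite a matched value into the target format.
    Values that match no pattern are returned unchanged."""
    hit = detect(value)
    if hit is None:
        return value
    _, fields = hit
    return TARGET_TEMPLATE.format(**fields)


records = ["2016-09-28", "28/09/2016", "28.09.2016", "Sept 28"]
unified = [unify(r) for r in records]
```

In this sketch, a value that matches no designed pattern is passed through untouched, so it can be flagged for a new pattern rather than silently altered. How the actual framework handles such values is not stated in the abstract.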
