The similarity-aware relational database set operators


Authors: Wadha J. Al Marri^a (200450064@student.qu.edu.qa) ; Qutaibah Malluhi^a (qmalluhi@qu.edu.qa) ; Mourad Ouzzani^b (mouzzani@qf.org.qa) ; Mingjie Tang^c (tang49@purdue.edu) ; Walid G. Aref^c (aref@cs.purdue.edu)
Keywords: Similarity query processing ; Relational databases ; Set operators
Journal: Information Systems
Publication year: 2016
Publication date: July 2016
Volume: 59
Issue: Complete
Pages: 79-93
Full-text size: 1426 K

Abstract

Identifying similarities in large datasets is an essential operation in several applications such as bioinformatics, pattern recognition, and data integration. To make a relational database management system similarity-aware, the core relational operators have to be extended. While similarity-awareness has been introduced in database engines for relational operators such as joins and group-by, little has been achieved for relational set operators, namely Intersection, Difference, and Union. In this paper, we propose to extend the semantics of relational set operators to take into account the similarity of values. We develop efficient query processing algorithms for evaluating them, and implement these operators inside an open-source database system, namely PostgreSQL. By extending several queries from the TPC-H benchmark to include predicates that involve similarity-based set operators, we perform extensive experiments that demonstrate up to three orders of magnitude speedup in performance over equivalent queries that only employ regular operators.
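To make the idea of similarity-aware set operators concrete, the following is a minimal sketch and not the algorithms or semantics defined in the paper. It assumes that tuples are numeric and that two tuples count as similar when every attribute differs by at most a threshold eps; the names similar, sim_intersect, sim_difference, sim_union, and eps are hypothetical and chosen only for illustration. It uses naive nested loops, whereas the paper reports efficient algorithms implemented inside PostgreSQL.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def similar(a: Row, b: Row, eps: float) -> bool:
    """Assumed similarity predicate: same arity and every attribute within eps."""
    return len(a) == len(b) and all(abs(x - y) <= eps for x, y in zip(a, b))


def sim_intersect(r: Sequence[Row], s: Sequence[Row], eps: float) -> List[Row]:
    """Keep the tuples of r that have at least one similar tuple in s."""
    return [t for t in r if any(similar(t, u, eps) for u in s)]


def sim_difference(r: Sequence[Row], s: Sequence[Row], eps: float) -> List[Row]:
    """Keep the tuples of r that have no similar tuple in s."""
    return [t for t in r if not any(similar(t, u, eps) for u in s)]


def sim_union(r: Sequence[Row], s: Sequence[Row], eps: float) -> List[Row]:
    """Concatenate r and s, skipping any tuple similar to one already kept."""
    out: List[Row] = []
    for t in list(r) + list(s):
        if not any(similar(t, u, eps) for u in out):
            out.append(t)
    return out


if __name__ == "__main__":
    r = [(1.0,), (5.0,), (9.0,)]
    s = [(1.2,), (9.6,)]
    print(sim_intersect(r, s, 0.5))   # [(1.0,)]
    print(sim_difference(r, s, 0.5))  # [(5.0,), (9.0,)]
    print(sim_union(r, s, 0.5))       # [(1.0,), (5.0,), (9.0,), (9.6,)]
```

With eps set to 0 these functions reduce to the regular Intersection, Difference, and Union on distinct tuples, which is one way to see the similarity-aware operators as a generalization of the exact ones. Note that the union result in this sketch depends on input order, a design question that any real definition would need to settle.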
