Answering skyline queries on probabilistic data using the dominance of probabilistic skyline tuples
Abstract
Although skyline queries are very useful in areas such as decision support, market analysis and personalized services, they have not been extensively studied in the context of uncertain data. Existing work on answering probabilistic skyline queries either requires the user to define a threshold (Pei et al., 2007) or returns all probabilistic skyline objects (Atallah and Qi, 2009). However, the threshold is difficult to set: if it is too high, important results may be lost, whereas if it is too low, or if there is no threshold, many low-quality results may be returned (Hua et al., 2011; Le and Cao, 2012; Le et al., 2013). In this paper, we identify two main challenges in answering probabilistic skyline queries. The first is defining which probabilistic skyline tuples are interesting enough to return to users. The second is finding these tuples efficiently without enumerating all possible worlds. We overcome the first challenge by introducing the bestpro-skyline query, which extends the dominance principle to also include the skyline probability of the probabilistic skyline tuples. This approach prunes the result set to a very small number of the most interesting probabilistic skyline tuples without requiring any user-defined threshold. We overcome the second challenge by using formulas based on probability theory to calculate the skyline probabilities directly, without considering any possible worlds, and by developing algorithms to prune the search space. Experiments show that our solution finds the 17 interesting probabilistic skyline tuples among 13,095 tuples within 19 s on a real data set. Our solution outperforms a naive solution by up to three orders of magnitude in computation time.
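The two ideas in the abstract can be illustrated with a minimal sketch. The abstract gives no formal definitions, so everything below is an assumption made for illustration: each tuple exists independently with a known probability (tuple-level uncertainty), smaller attribute values are better, and "extended dominance" means ordinary dominance on the attributes plus a skyline probability that is at least as high. The function names `dominates`, `skyline_probabilities` and `extended_skyline` are hypothetical, and the quadratic loops are a brute-force illustration, not the pruning algorithms of the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probabilities(tuples):
    """Skyline probability of each tuple without enumerating possible worlds.

    tuples: list of (values, existence_probability).
    Assumption: tuples exist independently, so a tuple is in the skyline
    exactly when it exists and none of its dominators exists:
        P_sky(t) = P(t) * product over dominators s of (1 - P(s))
    """
    probs = []
    for i, (values_i, p_i) in enumerate(tuples):
        q = p_i
        for j, (values_j, p_j) in enumerate(tuples):
            if j != i and dominates(values_j, values_i):
                q *= 1.0 - p_j
        probs.append(q)
    return probs


def extended_skyline(tuples):
    """Indices of tuples not dominated once the skyline probability is
    treated as one more dimension (assumed reading of the abstract)."""
    probs = skyline_probabilities(tuples)
    # Negate the probability so that "smaller is better" holds in every dimension.
    extended = [tuple(values) + (-p,) for (values, _), p in zip(tuples, probs)]
    keep = []
    for i, e in enumerate(extended):
        dominated = any(dominates(o, e) for j, o in enumerate(extended) if j != i)
        if probs[i] > 0 and not dominated:
            keep.append(i)
    return keep, probs


# Hypothetical data: (attribute values, existence probability).
data = [
    ((1, 5), 0.2),  # a: dominates b but rarely exists
    ((2, 6), 0.9),  # b: dominated by a, yet has a higher skyline probability
    ((4, 2), 0.9),  # c: dominates d and exists with high probability
    ((5, 3), 0.8),  # d: dominated by c and has a lower skyline probability
]
kept, sky_probs = extended_skyline(data)
print(kept)  # [0, 1, 2]
```

In this toy data set, tuple d is dropped because c is better on both attributes and also has the higher skyline probability, while b survives even though a dominates it on the attributes, because a exists with low probability. No threshold is involved at any point.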
