摘要
Skyline group, also named as combinational skyline or group-based skyline, has attracted more attention recently. The concept of skyline groups is proposed to address the problem in the inadequacy of the traditional skyline to answer queries that need to analyze not only individual points but also groups of points. Skyline group algorithms aim at finding groups of points that are not dominated by any other same-size groups. Although two types of dominance relationship exist between the groups defined in existing works, they have not been compared systematically under the same experimental framework. Thus, practitioners face difficulty in selecting an appropriate definition. Furthermore, the experimental evaluation in most existing works features a weakness,that is, studies only experimented on small data sets or large data sets with small dimensions. For comprehensive comparisons of the two types of definition and existing algorithms, we evaluate each algorithm in terms of time and space on various synthetic and real data sets. We reveal the characteristics of existing algorithms and provide guidelines on selecting algorithms for different situations.
Skyline group, also named as combinational skyline or group-based skyline, has attracted more attention recently. The concept of skyline groups is proposed to address the problem in the inadequacy of the traditional skyline to answer queries that need to analyze not only individual points but also groups of points. Skyline group algorithms aim at finding groups of points that are not dominated by any other same-size groups. Although two types of dominance relationship exist between the groups defined in existing works, they have not been compared systematically under the same experimental framework. Thus, practitioners face difficulty in selecting an appropriate definition. Furthermore, the experimental evaluation in most existing works features a weakness,that is, studies only experimented on small data sets or large data sets with small dimensions. For comprehensive comparisons of the two types of definition and existing algorithms, we evaluate each algorithm in terms of time and space on various synthetic and real data sets. We reveal the characteristics of existing algorithms and provide guidelines on selecting algorithms for different situations.
引文
[1]H.Im and S.Park,Group skyline computation,Inf.Sci.vol.188,no.1,pp.151-169,2012.
[2]C.Li,N.Zhang,N.Hassan,S.Rajasekaran,and G.Das,On skyline groups,in 21st International Conference on Information and Knowledge Management,2012,pp.2119-2123.
[3]Y.Chung,I.Su,and C.Lee,Efficient computation of combinatorial skyline queries,Inf.Syst.,vol.38,no.3,pp.369-387,2013.
[4]J.Liu,L.Xiong,J.Pei,J.Luo,and H.Zhang,Finding pareto optimal groups:Group-based skyline,PVLDB,vol.8,no.13,pp.2086-2097,2015.
[5]N.Zhang,C.Li,N.Hassan,S.Rajasekaran,and G.Das,On skyline groups,IEEE Trans.Knowl.Data Eng.,vol.26,no.4,pp.942-956,2014.
[6]I.Su,Y.Chung,and C.Lee,Top-kcombinatorial skyline queries,in Database Systems for Advanced Applications,15th International Conference,2010,pp.79-93.
[7]H.Zhu,P.Zhu,X.Li,and Q.Liu,Top-k skyline groups queries,in Proceedings of the 20th International Conference on Extending Database Technology,2017,pp.442-445.
[8]X.Guo,H.Li,A.Wulamu,Y.Xie,and Y.Fu,Efficient processing of skyline group queries over a data stream,Tsinghua Science and Technology,vol.21,no.1,pp.29-39,2016.
[9]H.Zhu,P.Zhu,X.Li,and Q.Liu,Computing skyline groups:An experimental evaluation,in Proceedings of the ACM Turing 50th Celebration Conference,2017,pp.1-48.
[10]S.B¨orzs¨onyi,D.Kossmann,and K.Stocker,The skyline operator,in Proceedings of the 17th International Conference on Data Engineering,2001,pp.421-430.
[11]P.Zhang,C.Zhou,P.Wang,B.J.Gao,X.Zhu,and L.Guo,E-tree:An efficient indexing structure for ensemble models on data streams,IEEE Trans.Knowl.Data Eng.,vol.27,no.2,pp.461-474,2015.
[12]D.Kossmann,F.Ramsak,and S.Rost,Shooting stars in the sky:An online algorithm for skyline queries,in Proceedings of 28th International Conference on Very Large Data Bases,2002,pp.275-286.
[13]X.Li,Y.Wang,X.Li,X.Wang,and J.Yu,GDPS:An efficient approach for skyline queries over distributed uncertain data,Big Data Research,vol.1,no.1,pp.23-36,2014.
[14]X.Li,Y.Wang,X.Li,and Y.Wang,Parallel skyline queries over uncertain data streams in cloud computing environments,IJWGS,vol.10,no.1,pp.24-53,2014.
[15]K.C.K.Lee,B.Zheng,H.Li,and W.Lee,Approaching the skyline in Z order,in Proceedings of the 33rd International Conference on Very Large Data Bases,2007,pp.279-290.
[16]J.Chomicki,P.Godfrey,J.Gryz,and D.Liang,Skyline with presorting,in Proceedings of the 19th International Conference on Data Engineering,2003,pp.717-719.
[17]I.Bartolini,P.Ciaccia,and M.Patella,Efficient sort-based skyline evaluation,ACM Trans.Database Syst.,vol.33,no.4,pp.1-31,2008.
[18]S.Zhang,N.Mamoulis,and D.W.Cheung,Scalable skyline computation using object-based space partitioning,in Proceedings of the International Conference on Management of Data,2009,pp.483-494.
[19]J.Lee and S.Hwang,Scalable skyline computation using a balanced pivot selection technique,Inf.Syst.,vol.39,no1,pp.1-21,2014.
[20]J.Pei,B.Jiang,X.Lin,and Y.Yuan,Probabilistic skylines on uncertain data,in Proceedings of the 33rd Internationa Conference on Very Large Data Bases,2007,pp.15-26.
[21]H.Lu,C.S.Jensen,and Z.Zhang,Flexible and efficien resolution of skyline query size constraints,IEEE Trans Knowl.Data Eng.,vol.23,no.7,pp.991-1005,2011.
[22]Y.Yuan,X.Lin,Q.Liu,W.Wang,J.Yu,and Q.Zhang Efficient computation of the skyline cube,in Proceedings of the 31st International Conference on Very Large Data Bases,2005,pp.241-252.
[23]Q.Zhang,P.Ye,X.Lin,and Y.Zhang,Skyline probability over uncertain preferences,in Joint 2013 EDBT/ICDTConferences,2013,pp.395-405.
[24]X.Zhang,G.Li,and J.Feng,Crowd-sourced top-k algorithms:An experimental evaluation,PVLDB,vol.9,no.8,pp.612-623,2016.
[25]S.Chester,D.Sidlauskas,I.Assent,and K.S.B?gh,Scalable parallelization of skyline computation for multicore processors,in 31st IEEE International Conference on Data Engineering,2015,pp.1083-1094.