Affiliation:
1. Università di Bologna, Bologna, Italy
Abstract
In a probabilistic database, deciding whether a tuple u is better than another tuple v has no single answer; rather, it depends on the specific Probabilistic Ranking Semantics (PRS) one adopts to combine tuples' scores and probabilities.
In deterministic databases, skyline queries are known to be a remarkable alternative to (top-k) ranking queries, because they relieve the user of the burden of specifying a scoring function that combines values of different attributes into a single score. The skyline of a deterministic relation R is the set of undominated tuples in R: tuple u dominates tuple v iff, on all the attributes of interest, u is better than or equal to v, and strictly better on at least one attribute. Domination is equivalent to having s(u) ≥ s(v) for all monotone scoring functions s().
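The deterministic definition above can be illustrated with a minimal sketch. It assumes tuples are equal-length sequences of numbers and that higher values are better on every attribute; the names `dominates` and `skyline` are illustrative and do not come from the article, and the quadratic scan is the naive approach, not the algorithm the article details.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True iff u dominates v.

    Assumption: higher is better on every attribute. u must be at least
    as good as v everywhere and strictly better on at least one attribute.
    """
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def skyline(relation: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the undominated tuples of a deterministic relation.

    Naive O(n^2) pairwise comparison, kept simple for illustration.
    """
    return [u for u in relation if not any(dominates(v, u) for v in relation)]


if __name__ == "__main__":
    # Hypothetical two-attribute relation.
    R = [(1, 5), (3, 3), (2, 2), (5, 1)]
    print(skyline(R))
```

In this hypothetical relation, (2, 2) is dominated by (3, 3) and drops out, while the other three tuples are pairwise incomparable and remain in the skyline.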
The skyline of a probabilistic relation R^p can be similarly defined as the set of P-undominated tuples in R^p, where now u P-dominates v iff, whatever monotone scoring function one would use to combine the skyline attributes, u is deemed better than v by the PRS at hand. This definition, which is applicable to arbitrary ranking semantics and probabilistic correlation models, is parametric in the adopted PRS; it thus ensures that ranking and skyline queries always return consistent results.
In this article we provide an overview of the problem of computing the skyline of a probabilistic relation. We show how, under mild conditions that indeed hold for all known PRSs, checking P-domination can be cast into an optimization problem, whose complexity we characterize for a variety of combinations of ranking semantics and correlation models. For each analyzed case we also provide specific P-domination rules, which are exploited by the algorithm we detail for the case where the probabilistic model is known to the query processor. We also consider the case in which the probability of tuple events can only be obtained through an oracle, and describe another skyline algorithm for this loosely integrated scenario. Our experimental evaluation of P-domination rules and skyline algorithms confirms the theoretical analysis.
Publisher
Association for Computing Machinery (ACM)
Cited by
15 articles.
1. Probabilistic Reverse Top-k Query on Probabilistic Data;Lecture Notes in Computer Science;2023-11-07
2. Finding Best Tuple via Error-prone User Interaction;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04
3. Interactive mining with ordered and unordered attributes;Proceedings of the VLDB Endowment;2022-07
4. Top-K Deep Video Analytics;Proceedings of the 2021 International Conference on Management of Data;2021-06-09
5. Interactive Search for One of the Top-k;Proceedings of the 2021 International Conference on Management of Data;2021-06-09