Title: Incrementally Computing Ordered Answers of Acyclic Conjunctive Queries
1Incrementally Computing Ordered Answers of
Acyclic Conjunctive Queries
Benny Kimelfeld and Yehoshua Sagiv
The Selim and Rachel Benin School of Engineering
and Computer Science
The Hebrew University of Jerusalem
2Introduction
3Order in SQL
Properties of apartments in each city
SELECT DISTINCT A.city, bedrooms, price
FROM Apartments A, Climates C, Distances D
WHERE A.city = C.city AND C.city = fromCity AND toCity = 'London'
ORDER BY avgTemp*10 - distance + price/1000 DESC
Order of appearance
In this talk: Evaluation of queries with ORDER BY
- Some ORDER BY attributes are projected
- Some are not
4Naïve Evaluation
1. Compute without projection (FROM + WHERE)
  - Non-projected tuples are needed for sorting
2. Sort (ORDER BY)
3. Project (SELECT)
4. Remove duplicates (DISTINCT)
  - Only the first occurrence of each answer is left
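The four steps can be sketched in a few lines of Python. This is only an illustration: the toy rows, the column layout of each relation, and the scoring expression `example_score` are assumptions made for the sketch, not data from the talk.

```python
from itertools import product

# Hypothetical toy data. Assumed column layouts:
# Apartments(id, city, price, bedrooms), Climates(city, avgTemp),
# Distances(fromCity, toCity, distance).
apartments = [(51, "Leeds", 1020, 3), (52, "Leeds", 1020, 3), (53, "York", 900, 2)]
climates = [("Leeds", 10.1), ("York", 9.5)]
distances = [("Leeds", "London", 274), ("York", "London", 280)]


def example_score(a, c, d):
    # Assumed ORDER BY expression over non-projected attributes.
    return c[1] * 10 - d[2] + a[2] / 1000


def naive_evaluation(score):
    # 1. Compute without projection (FROM + WHERE).
    joined = [
        (a, c, d)
        for a, c, d in product(apartments, climates, distances)
        if a[1] == c[0] == d[0] and d[1] == "London"
    ]
    # 2. Sort (ORDER BY ... DESC); the key uses the full, unprojected tuples.
    joined.sort(key=lambda t: score(*t), reverse=True)
    # 3. Project (SELECT city, bedrooms, price).
    projected = [(a[1], a[3], a[2]) for a, c, d in joined]
    # 4. Remove duplicates (DISTINCT), keeping only the first occurrence.
    seen, answers = set(), []
    for row in projected:
        if row not in seen:
            seen.add(row)
            answers.append(row)
    return answers


answers = naive_evaluation(example_score)
```

Note that nothing is returned until the whole join has been computed and sorted, and that apartments 51 and 52 collapse into a single answer only at the very end.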
5Incremental Evaluation
- An incremental evaluation is needed
- Generate tuples in sorted order
- A small delay between successive tuples
- Frequently,
  - Using order indicates how tuples of the result will be processed by the application
    - For example, transforming chunks of tuples into pages of a Web browser
  - Users phrase queries that return many tuples, whereas only the first few tuples are actually needed
How much time is needed to get the next page?
How long does it take to generate the first k
tuples?
Total evaluation time
6The Naïve Evaluation is Inefficient
Generating the whole result before returning the
first (page of) tuples
Many duplicates to eliminate (e.g., many
apartments with similar properties)
7Existing Techniques
- Techniques (e.g., the threshold algorithm) for minimizing database accesses
  - Evaluating the query (naively) over increasingly larger parts of the relations
  - Duplicate elimination due to projection
- In worst-case scenarios, as inefficient as the naïve approach
  - Even for simple queries (e.g., acyclic) and orders
From a theoretical point of view, existing
approaches are heuristics
8The Questions we Addressed
Are there evaluation algorithms that are truly
(i.e., provably) incremental? (or is it necessary
to use heuristics?)
Which are the tractable cases?
9Our Results (Informally)
How long does it take to find just the first
tuple?
For conjunctive queries, that is all that matters!
If your setting allows the first tuple to be
found efficiently, then you can evaluate the
whole query incrementally (with small delays
between tuples)
10The Formal Setting
11Conjunctive Queries
We consider the class of conjunctive queries
Q(u) ← R1(u1), R2(u2), ..., Rk(uk)
Each Ri is a relation symbol
Each ui is a list of terms
u is a list of variables from the ui's
A term is either a constant value or a variable
Each Ri(ui) is a conjunct
12The Example as a Conjunctive Query
SELECT DISTINCT A.city, bedrooms, price
FROM Apartments A, Climates C, Distances D
WHERE A.city = C.city AND C.city = fromCity AND toCity = 'London'
ORDER BY avgTemp*10 - distance + price/1000 DESC
Q(C,B,P) ← Apartments(I,C,P,B), Climates(C,T), Distances(C,London,D)
ORDER BY is not modeled yet
13Homomorphisms
Q(C,B,P) ← Apartments(I,C,P,B), Climates(C,T), Distances(C,London,D)
- A homomorphism from the query to the database assigns a value to each variable
  - All resulting facts are contained in the database
Apartments(51,Leeds,1020,3)
Climates(Leeds,10.1)
Distances(Leeds,London,274)
- An answer is obtained from a homomorphism by applying the assignment to the head
(Leeds,3,1020)
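The definition can be checked mechanically. The sketch below is a minimal illustration; the encoding is an assumption (variables are strings starting with "?", facts are pairs of a relation name and a tuple), and the helper names `value_of`, `is_homomorphism` and `answer` are hypothetical.

```python
# Database: a set of facts (relation name, tuple of values).
database = {
    ("Apartments", (51, "Leeds", 1020, 3)),
    ("Climates", ("Leeds", 10.1)),
    ("Distances", ("Leeds", "London", 274)),
}

# Query: head variables and body conjuncts. Terms starting with "?" are
# variables (an encoding chosen for this sketch); other terms are constants.
HEAD = ("?C", "?B", "?P")
BODY = [
    ("Apartments", ("?I", "?C", "?P", "?B")),
    ("Climates", ("?C", "?T")),
    ("Distances", ("?C", "London", "?D")),
]


def value_of(assignment, term):
    # Variables are mapped by the assignment; constants map to themselves.
    if isinstance(term, str) and term.startswith("?"):
        return assignment[term]
    return term


def is_homomorphism(assignment, body, db):
    # Every conjunct must become a fact of the database.
    return all(
        (rel, tuple(value_of(assignment, t) for t in terms)) in db
        for rel, terms in body
    )


def answer(assignment, head):
    # Apply the assignment to the head.
    return tuple(value_of(assignment, v) for v in head)


H = {"?I": 51, "?C": "Leeds", "?P": 1020, "?B": 3, "?T": 10.1, "?D": 274}
```

With this data, `is_homomorphism(H, BODY, database)` holds and `answer(H, HEAD)` yields the answer shown on the slide.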
14Orders over Homomorphisms
- Order in SQL can be defined over attributes that are not in the result (not in the SELECT clause)
- Hence, for a proper model,
We assume an underlying order ⪰ over the homomorphisms from the query to the database (rather than over the answers)
⪰ is reflexive, transitive and total
For example, a lexicographic order
H(X1), then by H(X2), then by ..., then by H(Xk)
X1, ..., Xk are variables from the query (in some order)
15Orders Defined by Ranking
Orders can be obtained by ranking homomorphisms
H1 ⪰ H2 ⇔ rank(H1) ≥ rank(H2)
For example
- Linear combinations / monomials
rank(H) = a1·H(X1) + a2·H(X2) + ... + ak·H(Xk)
rank(H) = H(X1)^m1 · H(X2)^m2 · ... · H(Xk)^mk
rank(H) = max( f(H(X1)), f(H(X2)), ..., f(H(Xk)) )
rank(H) = min( f(H(X1)), f(H(X2)), ..., f(H(Xk)) )
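These ranking functions are straightforward to write down. In the sketch below a homomorphism is a Python dict from variable names to numeric values; the function names, the identity default for `f`, and the convention that a higher rank comes first are assumptions of the sketch.

```python
from math import prod


def linear_rank(H, coeffs):
    # rank(H) = a1*H(X1) + ... + ak*H(Xk); coeffs maps each Xi to ai.
    return sum(a * H[x] for x, a in coeffs.items())


def monomial_rank(H, exponents):
    # rank(H) = H(X1)^m1 * ... * H(Xk)^mk; exponents maps each Xi to mi.
    return prod(H[x] ** m for x, m in exponents.items())


def max_rank(H, variables, f=lambda v: v):
    # rank(H) = max(f(H(X1)), ..., f(H(Xk)))
    return max(f(H[x]) for x in variables)


def min_rank(H, variables, f=lambda v: v):
    # rank(H) = min(f(H(X1)), ..., f(H(Xk)))
    return min(f(H[x]) for x in variables)


def sort_by_rank(homomorphisms, rank):
    # Assumed convention: the homomorphism with the highest rank comes first.
    return sorted(homomorphisms, key=rank, reverse=True)


H = {"X1": 2, "X2": 3}
```

For example, `linear_rank(H, {"X1": 1, "X2": 10})` combines the two values with weights 1 and 10.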
16Ordering Answers
The order over the answers is the one obtained
from the following (inefficient) process
1. Generate all homomorphisms in a sorted order
2. Obtain an answer from each homomorphism
3. Remove duplicate answers
Only the first occurrence of each answer is left
17The Implied Order on Answers
The goal: Generate all the answers in the
implied order
- The process (from the previous slide) defines
how an order over homomorphisms implies an order
over answers
Given an order ⪰ over the homomorphisms and two
answers A1 and A2, A1 ⪰ A2 holds if for each
homomorphism H2 producing A2, there is a
homomorphism H1 producing A1, such that H1 ⪰ H2
In other words, A1 precedes A2 if the best
homomorphism producing A1 is better than the best
homomorphism producing A2
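The "best homomorphism per answer" reading can be compared against the sort-then-deduplicate process on a small example. This is a sketch under assumptions: homomorphisms are dicts, the order is given by a numeric `rank` with higher values first, and both function names are hypothetical.

```python
def sort_then_deduplicate(homomorphisms, rank, head):
    # The (inefficient) defining process: sort, project, keep first occurrences.
    seen, answers = set(), []
    for H in sorted(homomorphisms, key=rank, reverse=True):
        a = tuple(H[x] for x in head)
        if a not in seen:
            seen.add(a)
            answers.append(a)
    return answers


def order_by_best_homomorphism(homomorphisms, rank, head):
    # The implied order: rank each answer by its best homomorphism.
    best = {}
    for H in homomorphisms:
        a = tuple(H[x] for x in head)
        r = rank(H)
        if a not in best or r > best[a]:
            best[a] = r
    return sorted(best, key=lambda a: best[a], reverse=True)


homs = [{"X": 1, "Y": 5}, {"X": 2, "Y": 9}, {"X": 1, "Y": 10}]
by_process = sort_then_deduplicate(homs, lambda H: H["Y"], ("X",))
by_best = order_by_best_homomorphism(homs, lambda H: H["Y"], ("X",))
```

Both functions return the same sequence here; when ranks tie, the relative order of the tied answers may differ between the two.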
18The Formal Requirement for Efficiency
Yardstick of efficiency: Polynomial delay. That
is, polynomial time between generating successive
answers, under query-and-data complexity
Top-k Algorithm
19So what is the Problem?
Exponential number of answers
Exponential number of duplicates to eliminate
20Task Formulation
Input
- A database
- A conjunctive query
- An order over the homomorphisms
Goal
Enumerate all the answers in the implied order
Performance
Polynomial delay (under combined complexity)
21Our Results
22Intractable Cases
It is not always possible to obtain an efficient
ranked enumeration, for at least two reasons
- Sometimes, generating any answer is intractable (regardless of the order)
  - Non-emptiness of conjunctive queries is NP-complete
- For some ranking functions, finding just the first tuple is intractable (even if non-emptiness is tractable)
SELECT * FROM R1, R2, ..., Rn
ORDER BY ABS(R1.A1 + ... + Rn.An - K)
Cartesian product
(subset sum)
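The connection to subset sum can be made concrete. The sketch below assumes one particular encoding (not spelled out on the slide): each relation Ri holds the two values 0 and a_i, so a tuple of the Cartesian product picks a subset of the numbers, and the first tuple in ascending ABS order has rank 0 exactly when some subset sums to K. The brute-force `first_tuple` is exponential and only serves to illustrate the reduction.

```python
from itertools import product


def first_tuple(relations, K):
    # Brute force over the Cartesian product: the tuple minimizing ABS(sum - K).
    return min(product(*relations), key=lambda t: abs(sum(t) - K))


def subset_sum_via_ordering(numbers, K):
    # Assumed encoding: Ri = {0, a_i}; choosing a_i means "a_i is in the subset".
    relations = [(0, a) for a in numbers]
    best = first_tuple(relations, K)
    return sum(best) == K


has_11 = subset_sum_via_ordering([3, 5, 8], 11)  # 3 + 8
has_7 = subset_sum_via_ordering([3, 5, 8], 7)    # no subset sums to 7
```

An efficient procedure for the first tuple under this order would therefore decide subset sum.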
23Acyclic Conjunctive Queries
A conjunctive query is acyclic if the conjuncts
can be placed on some tree T, such that for each
variable X, the conjuncts containing X form a
subtree of T
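One standard way to test this property is the GYO reduction, which the talk does not mention and is used here only as an illustration. Each conjunct is represented by its set of variables (constants are left out); the function name `is_acyclic` is hypothetical.

```python
def is_acyclic(conjuncts):
    """GYO reduction: the query is acyclic iff the reduction leaves at most
    one (empty) conjunct. `conjuncts` is an iterable of variable collections."""
    edges = [set(c) for c in conjuncts]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop variables that occur in exactly one conjunct.
        for e in edges:
            lonely = {v for v in e if sum(v in other for other in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: drop a conjunct whose variables are contained in another one.
        for i, e in enumerate(edges):
            if any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return len(edges) <= 1


# Apartments(I,C,P,B), Climates(C,T), Distances(C,London,D)
apartments_query = [{"I", "C", "P", "B"}, {"C", "T"}, {"C", "D"}]
# A triangle R(X,Y), S(Y,Z), T(Z,X) is the classic cyclic example.
triangle_query = [{"X", "Y"}, {"Y", "Z"}, {"Z", "X"}]
```

The apartments query reduces completely, while the triangle cannot be reduced at all.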
24Tractability of Acyclic Queries
Recall that for general conjunctive queries,
testing non-emptiness (which is necessary for
efficient incremental evaluation) is intractable
Acyclic conjunctive queries are among the largest
known classes that can be evaluated in polynomial
total time [Yannakakis, 1981]
25Simplification
- For simplicity of presentation, the next theorem is less general than the one in the proceedings
  - In particular, only acyclic conjunctive queries are considered
  - Furthermore, we consider orders that are defined globally over all assignments of variables
    - And, in particular, on homomorphisms
26Characterizing Incremental Evaluation
Theorem: The following two are equivalent for an
order ⪰, in the case of acyclic conjunctive
queries
1. Given a database and a query, a maximal
homomorphism can be found in polynomial time
2. Given a database and a query, answers can be
enumerated in sorted order with polynomial delay
27Extending the Theorem
- In the proceedings, the theorem is stated for queries that are more general than just acyclic conjunctive queries
  - We only require closure under the (rather trivial) operation illustrated below
- Furthermore, the order can be defined per families of databases and queries, rather than on all assignments
  - Hence, more general types of orders are possible
28Specific Types of Orders
- Next, we identify orders for which the first
tuple can be computed efficiently, in the case of
acyclic conjunctive queries
By the theorem, for these types of orders,
answers of acyclic conjunctive queries can be
enumerated in sorted order with polynomial delay
29Monotonic Orders
- Intuitively, an order is monotonic if replacing a part of an assignment with a better part can only increase the rank
  - The exact definition is in the proceedings
Lemma: Monotonic orders satisfy the first
condition of the theorem
Monotonic orders have an efficient ordered
evaluation
30Examples of Monotonic Orders
H(X1), then by H(X2), then by ..., then by H(Xk)
- Linear combinations / monomials
rank(H) = a1·H(X1) + a2·H(X2) + ... + ak·H(Xk)
rank(H) = H(X1)^m1 · H(X2)^m2 · ... · H(Xk)^mk
rank(H) = max( f(H(X1)), f(H(X2)), ..., f(H(Xk)) )
rank(H) = min( f(H(X1)), f(H(X2)), ..., f(H(Xk)) )
31c-Determined Orders
- c is a fixed positive integer
- An order is c-determined if the rank of each assignment is determined by some c variables
  - The ranks of two different assignments are not necessarily determined by the same c variables
- Extends the ranking functions used by [Cohen & Sagiv, 2005] in the context of ranked full disjunctions
Lemma: c-determined orders satisfy the first
condition of the theorem
c-determined orders have efficient ordered
evaluation
32c-Determined Orders Examples
rank(H) = max( f(H(X1)), f(H(X2)), ..., f(H(Xk)) )
- Minimum is not c-determined for any constant c
rank(H) = min( f(H(X1)), f(H(X2)), ..., f(H(Xk)) )
- An example of a 3-determined order
H(X1), then by H(X2) / H(X3)
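As a small illustration (a sketch, with hypothetical function names and homomorphisms given as dicts), the 3-determined order can be written as a sort key over three variables, and for a maximum the single variable attaining it can be reported as the variable that fixes the rank of that assignment.

```python
def three_determined_key(H):
    # Lexicographic: first by H(X1), then by H(X2) / H(X3).
    # Assumes H(X3) is nonzero.
    return (H["X1"], H["X2"] / H["X3"])


def max_rank_witness(H, variables, f=lambda v: v):
    # For rank(H) = max(f(H(X1)), ..., f(H(Xk))), return the variable that
    # attains the maximum together with the rank it determines.
    x = max(variables, key=lambda v: f(H[v]))
    return x, f(H[x])


key = three_determined_key({"X1": 1, "X2": 6, "X3": 3})
witness = max_rank_witness({"X1": 4, "X2": 9, "X3": 2}, ["X1", "X2", "X3"])
```

Different assignments may have different witness variables, which matches the remark that the determining variables need not be the same for all assignments.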
33Proof Techniques (Overview)
Ranked evaluation with polynomial delay can be
obtained by adapting two different techniques
- Iterative Binding of Variables
  - Limited to lexicographic and some c-determined orders
  - All attributes determining the order must be in the result
  - More efficient w.r.t. space usage (does not collect info.)
- Lawler's Method [Management Science, 1972]
  - A general procedure for finding the top-k answers to discrete optimization problems
  - Need to fill in missing parts for the specific setting
  - Much more general than iterative binding of variables
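A Lawler-style enumeration can be sketched as follows. This is an illustration under assumptions, not the algorithm from the proceedings: the oracle `top` is a brute-force scan over an explicit list of homomorphisms (which may be exponentially long), whereas the setting of the talk would use a polynomial-time procedure for the first tuple; subspaces are described by a fixed prefix of head values plus a set of excluded values for the next head variable; a higher rank comes first.

```python
import heapq
from itertools import count


def top(homomorphisms, rank, head, prefix, excluded):
    # Oracle: best homomorphism whose answer starts with `prefix` and whose
    # next head value is not in `excluded` (brute force in this sketch).
    j = len(prefix)
    candidates = [
        H for H in homomorphisms
        if tuple(H[x] for x in head[:j]) == prefix
        and (j == len(head) or H[head[j]] not in excluded)
    ]
    return max(candidates, key=rank, default=None)


def enumerate_answers(homomorphisms, rank, head):
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = []

    def push(prefix, excluded):
        H = top(homomorphisms, rank, head, prefix, excluded)
        if H is not None:
            heapq.heappush(heap, (-rank(H), next(tie), H, prefix, excluded))

    push((), frozenset())
    while heap:
        _, _, H, prefix, excluded = heapq.heappop(heap)
        a = tuple(H[x] for x in head)
        yield a
        # Partition the rest of this subspace: answers that agree with `a`
        # on the first j positions and differ from it at position j.
        for j in range(len(prefix), len(head)):
            banned = (excluded if j == len(prefix) else frozenset()) | {a[j]}
            push(a[:j], banned)


homs = [
    {"X": 1, "Y": "a", "Z": 5},
    {"X": 1, "Y": "a", "Z": 7},
    {"X": 1, "Y": "b", "Z": 6},
    {"X": 2, "Y": "a", "Z": 9},
]
ordered = list(enumerate_answers(homs, lambda H: H["Z"], ("X", "Y")))
```

Because subspaces are defined on head values only, all homomorphisms producing the same answer fall into the same subspace, so each answer is emitted once and duplicates never have to be eliminated afterwards. Each emitted answer costs at most one oracle call per head variable.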
34Conclusion
35A Summary
- Evaluation of conjunctive queries with order has been considered
- Formal model
- Order over homomorphisms
- Implied order over answers
- Polynomial delay as a yardstick of efficiency
36A Summary (cont'd)
We have shown that for acyclic conjunctive queries:
Finding the first tuple in the given order in
polynomial time
⇔
Enumerating all answers in sorted order with
polynomial delay
As a corollary, acyclic conjunctive queries have
an efficient ordered evaluation if the order is
either monotonic or c-determined
In the proceedings, the result is extended to
more general queries and orders
37Ongoing and Future Work
- Practical considerations
  - Our algorithms require novel optimization techniques
  - Implementation of an algorithm for finding the top answer (the bottleneck of the computation)
- Querying XML
  - This work has been extended to effective querying of graph-structured XML by twig joins (Web and Databases, 2006)
38Thank You.