Title: Answering Queries using views: A survey
1Answering Queries using views A survey
2The problem
- We have a schema A and a set of materialized
views V1, V2,, Vn - We are given a query Q over schema A
- Can we answer Q using another query Q referring
only to (one or more) views? - Depending on the specific context
- Schema A can be either virtual or materialized
- Q may be required to be an equivalent rewriting
or a maximally contained rewriting of Q
3Equivalent and Containment rewritings
- Query Containment Q1 is contained in Q2 (Q1?Q2),
if for all db instances D, Q1(D)?Q2(D) - Query Containment Q1 is equivalent to Q2 if
Q1?Q2 and Q2 ? Q1 - Let A a schema and VV1, V2, , Vm a set of
views over A and Q a query over A - Q is a equivalent rewriting of Q, if
- Q refers only to views in V
- Q is equivalent to Q
- Q is a maximally contained rewriting of Q, if
- Q refers only to views in V
- Q ? Q
- There is no other rewriting Q1, such as Q ? Q1 ?
Q
4When the problem arises?
- Data integration
- The sources are described as views over the
mediated schema - Data warehouse
- The schema of the warehouse can be defined as
views over the schemata of the sources - Query Optimization
- Computing the query using previously materialized
views - semantic caching
- Database design
- Independence of the physical view of the data and
logical view - Rewriting of a query over the logical schema into
a query over the physical view
5The variations of the problem
6A running example
- Suppose we have the following schema
- Student(st_id, st_name, dp_id)
- Professor(pr_id, pr_name)
- Department(dp_id, dp_description)
- Course(cr_id, cr_title, cr_semester)
- Prof_Dep(pr_id, dp_id, university)
- Student_course(st_id, cr_id)
- Prof_course(pr_id, cr_id)
-
-
7Data Integration
Mediator
Student(st_id, st_name, dp_id) Professor(pr_id,
pr_name) Department(dp_id, dp_description) Course(
cr_id, cr_title, cr_semester, cr_field) Prof_Dep(p
r_id, dp_id, university) Student_course(st_id,
cr_id) Prof_course(pr_id, cr_id)
Create View AuebProfs As Select cr_title,
pr_name From Course, Prof_course, Professor Where
Course.cr_id Prof_course.cr_id and
Prof_course.pr_id Professor.pr_id and
Course.univ AUEB
Create View DBProfs As Select cr_title,
pr_name From Course, Prof_course, Professor Where
Course.cr_id Prof_course.cr_id and
Prof_course.pr_id Professor.pr_id and
Course.cr_field DB
Source 1 Professors Of Databases
Source 1 Professors Of AUEB
8Data Integration
Mediator
Student(st_id, st_name, dp_id) Professor(pr_id,
pr_name) Department(dp_id, dp_description) Course(
cr_id, cr_title, cr_semester, cr_field) Prof_Dep(p
r_id, dp_id, university) Student_course(st_id,
cr_id) Prof_course(pr_id, cr_id)
Query Q Select pr_name From Professor
Query Q Select pr_name From DBProfs Union Select
pr_name From AUEBProfs
Source 1 Professors Of Databases
Source 1 Professors Of AUEB
Q is a maximally-contained Rewriting of
Q (professors of Algorithms in the university of
Piraeus cannot be retrieved)
9Data Integration
Mediator
Query P Select pr_name From Course, Prof_course,
Professor Where Course.cr_id Prof_course.cr_id
and Prof_course.pr_id Professor.pr_id and
Course.cr_field DB And Course.univ AUEB
Student(st_id, st_name, dp_id) Professor(pr_id,
pr_name) Department(dp_id, dp_description) Course(
cr_id, cr_title, cr_semester, cr_field) Prof_Dep(p
r_id, dp_id, university) Student_course(st_id,
cr_id) Prof_course(pr_id, cr_id)
Query P Select pr_name From DBProfs,
AUEBProfs Where DBProfs.pr_name
AUEBProfs.pr_name
Source 1 Professors Of Databases
Source 1 Professors Of AUEB
P is an Equivalent Rewriting of P
10Query Optimization
- Suppose that we have a database with the schema
of our running example - Suppose that we have the following materialized
views - create view StudentCoursesView As
- Select st_name, cr_id
- from Student, Stud_course
- where Student.st_id Stud_course.st_id
- create view ProfCoursesView As
- Select pr_name, pr_id
- from Professor, Prof_course
- where Professor.pr_id Prof_course.pr_id
11Query Optimization
- Suppose we are given the query
- Select st_name, prof_name
- from Student, Stud_course, Prof_course, Professor
- where Student.st_id Stud_course.cr_id
- and Stud_course.cr_id Prof_course.cr_id
- and Prof_course.pr_id Professor.pr_id
12Query Optimization
- We can exploit the materialized views in order to
eliminate joins - There are many possible combinations
- One could be
- Select st_name, pr_name
- From StudentCoursesView, ProfessorCoursesView
- Where StudentCoursesView.cr_id
- ProfessorCoursesView.cr_id
13Query Optimization
- But
- Our aim here is not just to find an equivalent
rewriting - We have to find all rewritings and evaluate their
costs in order to select the cheapest plan - In some cases views may not help due to loss of
indexes - Therefore, we need to make a cost-based rewriting
- Similar case is the semantic caching
- Client-server architecture
- Query results can be temporary maintained as
relations in the client side to be exploited for
the evaluation of future queries
14A common question When a view V is useful for a
query Q?
- There must be a mapping µ from the relations of V
to the relations of Q - If so, if R1 is both in V and Q then
- If R1 participates in a join in Q with relation
R2, then - Either R1 and R2 must be joined in the same
attributes with the same conditions in V - Or R1 and R2 must be joined in V in the same
attributes but with weaker conditions than those
in Q, provided that the join attributes are
projected in V - Or only R1 is in V and the attributes of R1
participating in the join in Q, are projected in
V - If Q applies a selection predicate in an
attribute A of R1, then V - Either V must apply the same selection predicate
in attribute A of R1 - Or V must apply a weaker selection predicate,
provided that attribute A is projected in V - Or does not apply a predicate in attribute A of
R1 at all and attribute A is projected in V - V must project at least all attributes projected
in Q
15Examples
- Q Select st_name, prof_name
- from Student, Stud_course, Prof_course, Professor
- where Student.st_id Stud_course.st_id
- and Stud_course.cr_id Prof_course.cr_id
- and Prof_course.pr_id Professor.pr_id
- and Student.st_year gt 2000
V1 Select st_name from Student where
st_yeargt2000
V1 Select st_name, cr_name from Student,
Stud_course, Course where Student.st_id
Stud_course.st_id and Stud_course.cr_id
Course.cr_id
V1 Select st_name, st_id from Student
where st_yeargt 2000
V1 Select st_name, st_year, cr_name, cr_id
from Student, Stud_course, Course where
Student.st_id Stud_course.st_id and
Stud_course.cr_id Course.cr_id
V1 Select st_name, st_id from Student
where st_year gt1996
V1 Select st_name, st_id, ma from
Student where am gt1996
16Algorithms for finding query rewritings using
views
- In the query optimization context
- A variation of the Selinger Algorithm
- Cost-based equivalent rewritings
- In the data integration context
- The bucket algorithm
- The inverse-rules algorithm
- The MiniCon algorithm
Maximally-contained rewritings
17Query Optimization the Selinger Algorithm
- Recall the Selinger Algorithm chooses the best
join order - A bottom-up dynamic programming algorithm
- Each iteration explores all join combinations of
certain length - For each combination of joins, it keeps only the
cheapest plan, taking into account costs
estimated in the previous iteration
18A variation of the Selinger Algorithm
- The Selinger Algorithm can be modified in order
to choose the cheapest equivalent rewriting - We start with views that seem useful for a
certain query - We continue in a bottom-up fashion exploring all
join combinations - For each combination we prune plans estimated to
be more expensive or less contributing than others
19A variation of the Selinger Algorithm
- First Iteration
- Find all views that are relevant (useful) to the
query - Choose the best access path for each view and
evaluate its cost - Second Iteration
- Find all pair combinations
- For each pair find all possible Joins in all
possible attributes that lead to useful plans - Plans that are complete (i.e. are sufficient to
answer the query) are distinguished as final
candidates - For the rest plans do the following
- Compare pairs of plans
- For each pair (p, p), if p is cheaper and has
greater or equal contribution (covers more
relations), then p is pruned - Third Iteration
- Find all possible pairs of plans derived from the
previous iteration and the plans derived from the
first iteration -
20A variation of the Selinger Algorithm
- Query R1 R2 R3 R4
- Views V1R1 R2 , V2R2 R3 R4
- V3R2 R3, V4R3 R4,
(V1-V3)-V2 (V1-V3)-V4
V1-V2 V1-V3 V1-V4 V2-V3 V2-V4
V3-V4
V1 V2 V3 V4
21Algorithms in the Data Integration Context
- We will discuss 2 algorithms
- The bucket algorithm
- The inverse-rules algorithm
- But first a short introduction to conjunctive
queries
22Conjunctive Queries
- Conjunctive queries are able to express
select-project-join queries - h(X,) - a(Y,), b(Z,),
- Head and subgoals are atoms
- An atom consists of a predicate and a tuple of
arguments a(Y, Z, aueb, W, 2006) - Predicates stand for relations
body
subgoals
head
constants
variables
23Conjunctive Queries
- Subgoals can also be arithmetic comparisons
- h(X,) - a(Y,), b(Z,), Zgt3
- A variable that appears in the head is said to be
distinguished otherwise nondistinguished. - h(X,Y)-a(X,Z),b(Z,Y)
- X,Y distinguished, Znondestinguished
- project-select-join SQL queries as conjunctive
queries - Select pr_name, dep_name
- From Professor, Department
- Where Professor.pr_dep Department.dep_id
- and Department.dep_yeargt1996
- and Professor.field DB
- q(pr_name, dep_name) - Professor(pr_name,
dep_id, DB), - Department(dep_id, dep_name, dep_year),
- dep_yeargt1996.
24Conjunctive Queries
- The problem of rewriting queries using views in
the notation of conjunctive queries - He have
- a set of Database Relations
- a set of views declared as conjunctive queries
over the Database Relations - a query q expressed as a conjunctive query over
the Database Relations - We want to find
- A set of conjunctive queries over the view
relations, the disjunction of which being a
maximally contained rewriting of q
25example
- Relations
- R(S,C,Q), CO(C,T), TE(P,C,Q)
- Views
- V1(S, C, Q, T) - R(S, C, Q), CO(C, T), Cgt500,
Qgt97 - V2(S, P, C, Q) - R(S, C, Q), TE(P, C, Q)
- V3(S, C) - R(S, C, Q), Qlt94
- V4(P, C, T, Q) - R(S, C, Q), CO(C, T), TE(P, C,
Q), Qlt97 - Query
- Q(S, C, P) - TE(P,C,Q), R(S,C,Q), CO(C,T),
Cgt300, Qgt95
26The Bucket Algorithm
- Create a bucket for each subgoal of the query q
- Fill each bucket with views that cover the
subgoal - A view V covers a subgoal p(A,B) of q if there is
a subgoal in V with the same predicate p(X,Y)
and - there is a mapping from the variables A, B of the
subgoal of q to the variables X, Y of the subgoal
of the view - For variables of the subgoal of q appearing in
the head of q, their mapping variables in the
subgoal of the view must also appear in the head
of the view - For each variable of the subgoal of q that also
appear in arithmetic conditions C in q, if its
mapping variable in the subgoal of the view also
appear in arithmetic conditions C in the view,
then C and C should not be mutually exclusive - A solution must include q view from each bucket
27The Bucket Algorithm
- V1(S, C, Q, T) - R(S, C, Q), CO(C, T), Cgt500,
Qgt97 - V2(S, P, C, Q) - R(S, C, Q), TE(P, C, Q)
- V3(S, C) - R(S, C, Q), Qlt94
- V4(P, C, T, Q) - R(S, C, Q), CO(C, T), TE(P, C,
Q), Qlt97 - Q(St, Cn, Pr) - TE(Pr,Cn,Qu), R(St,Cn,Qu),
CO(Cn,Tl), Cngt300, Qugt95
V4(P,C,T, Q)
V2(S,P,C,Q)
V4(S, C, T, Q)
V2(S,P,C, Q)
V1(S,C,Q,T)
V1(S, C, Q, T)
CO
TE
R
28The Bucket Algorithm
- There are 8 combinations we have to check
- For each, we equaling appropriate variables to
enforce joins - We check for condition confligs
- If there are duplicate subgoals we try to
eliminate them
V4(P,C,T, Q)
V2(S,P,C,Q)
V4(S, C, T, Q)
V2(S,P,C, Q)
V1(S,C,Q,T)
V1(S, C, Q, T)
TE
R
CO
q(S,C,P)-V2(S,P,C,Q), V1(S,C,Q,T),
V1(S,C,Q,T)
The first solution
q(S,C,P)-V2(S,P,C,Q), V1(S,C,Q,T)
29The inverse-rules algorithm
- Key idea invert the view definitions to give the
global predicates (not only those appearing in
the query) in terms of views - For each nondistinguished variable X, pick a new
function symbol f - Example
- v(X,Y) - p(X,Z), p(Z,Y)
- Replace Z by f(X,Y)
- v(X,Y) - p(X,f(X,Y)), p(f(X,Y),Y)
-
30The inverse-rules algorithm
- Replace a view definition by rules with
- A subgoal as the head, and
- The view itself as the only subgoal of the body.
- Example
- v(X,Y) - p(X,f(X,Y)), p(f(X,Y),Y)
- becomes
- p(X,f(X,Y)) - v(X,Y)
- p(f(X,Y),Y) - v(X,Y)
31The inverse-rules algorithm - example
- View v(D,C)-m(S,D), r(S, C)
- Query q(D) - m(S,D), r(S, 444)
- After applying the inverse-rules, we have
- m(f1(D,C), D) - v(D, C)
- r(f1(D,C), C) - v(D,C)
32The inverse-rules algorithm - example
- Supose that v has the following tuples
- (CS, 444), (EE, 444), (CS, 333)
- Applying the inverse-rules on these tuples we
have - m (f1(CS, 444), CS), (f1(EE, 444),EE),
(f1(CS, 333), CS) - r (f1(CS, 444), 444), (f1(EE, 444),444),
(f1(CS, 333), 333)
33The inverse-rules algorithm - example
- We can now apply the query in this instance
- q (CS),(EE)
-
34The MiniCon algorithm
- A variation of the bucket algorithm
- It takes into account the non-distinguished
variables as well, which usually imply joins - If a view contains a target subgoal, which
participates in a join in the query, then the
join variables should appear in the head of the
view (unless the join takes place also within the
view) - It excludes much more views from participating in
buckets (here called MCDs) - There are fewer combination to check at the end
35Discussion