Answering Queries using views: A survey - PowerPoint PPT Presentation

1 / 35
About This Presentation
Title:

Answering Queries using views: A survey

Description:

Our aim here is not just to find an equivalent rewriting ... Conjunctive queries are able to express select-project-join queries. h(X,...) :- a(Y,...), b(Z, ... – PowerPoint PPT presentation

Number of Views:55
Avg rating:3.0/5.0
Slides: 36
Provided by: Har15
Category:

less

Transcript and Presenter's Notes

Title: Answering Queries using views: A survey


1
Answering Queries using views A survey
2
The problem
  • We have a schema A and a set of materialized
    views V1, V2,, Vn
  • We are given a query Q over schema A
  • Can we answer Q using another query Q referring
    only to (one or more) views?
  • Depending on the specific context
  • Schema A can be either virtual or materialized
  • Q may be required to be an equivalent rewriting
    or a maximally contained rewriting of Q

3
Equivalent and Containment rewritings
  • Query Containment Q1 is contained in Q2 (Q1?Q2),
    if for all db instances D, Q1(D)?Q2(D)
  • Query Containment Q1 is equivalent to Q2 if
    Q1?Q2 and Q2 ? Q1
  • Let A a schema and VV1, V2, , Vm a set of
    views over A and Q a query over A
  • Q is a equivalent rewriting of Q, if
  • Q refers only to views in V
  • Q is equivalent to Q
  • Q is a maximally contained rewriting of Q, if
  • Q refers only to views in V
  • Q ? Q
  • There is no other rewriting Q1, such as Q ? Q1 ?
    Q

4
When the problem arises?
  • Data integration
  • The sources are described as views over the
    mediated schema
  • Data warehouse
  • The schema of the warehouse can be defined as
    views over the schemata of the sources
  • Query Optimization
  • Computing the query using previously materialized
    views
  • semantic caching
  • Database design
  • Independence of the physical view of the data and
    logical view
  • Rewriting of a query over the logical schema into
    a query over the physical view

5
The variations of the problem
6
A running example
  • Suppose we have the following schema
  • Student(st_id, st_name, dp_id)
  • Professor(pr_id, pr_name)
  • Department(dp_id, dp_description)
  • Course(cr_id, cr_title, cr_semester)
  • Prof_Dep(pr_id, dp_id, university)
  • Student_course(st_id, cr_id)
  • Prof_course(pr_id, cr_id)

7
Data Integration
Mediator
Student(st_id, st_name, dp_id) Professor(pr_id,
pr_name) Department(dp_id, dp_description) Course(
cr_id, cr_title, cr_semester, cr_field) Prof_Dep(p
r_id, dp_id, university) Student_course(st_id,
cr_id) Prof_course(pr_id, cr_id)
Create View AuebProfs As Select cr_title,
pr_name From Course, Prof_course, Professor Where
Course.cr_id Prof_course.cr_id and
Prof_course.pr_id Professor.pr_id and
Course.univ AUEB
Create View DBProfs As Select cr_title,
pr_name From Course, Prof_course, Professor Where
Course.cr_id Prof_course.cr_id and
Prof_course.pr_id Professor.pr_id and
Course.cr_field DB
Source 1 Professors Of Databases
Source 1 Professors Of AUEB
8
Data Integration
Mediator
Student(st_id, st_name, dp_id) Professor(pr_id,
pr_name) Department(dp_id, dp_description) Course(
cr_id, cr_title, cr_semester, cr_field) Prof_Dep(p
r_id, dp_id, university) Student_course(st_id,
cr_id) Prof_course(pr_id, cr_id)
Query Q Select pr_name From Professor
Query Q Select pr_name From DBProfs Union Select
pr_name From AUEBProfs
Source 1 Professors Of Databases
Source 1 Professors Of AUEB
Q is a maximally-contained Rewriting of
Q (professors of Algorithms in the university of
Piraeus cannot be retrieved)
9
Data Integration
Mediator
Query P Select pr_name From Course, Prof_course,
Professor Where Course.cr_id Prof_course.cr_id
and Prof_course.pr_id Professor.pr_id and
Course.cr_field DB And Course.univ AUEB
Student(st_id, st_name, dp_id) Professor(pr_id,
pr_name) Department(dp_id, dp_description) Course(
cr_id, cr_title, cr_semester, cr_field) Prof_Dep(p
r_id, dp_id, university) Student_course(st_id,
cr_id) Prof_course(pr_id, cr_id)
Query P Select pr_name From DBProfs,
AUEBProfs Where DBProfs.pr_name
AUEBProfs.pr_name
Source 1 Professors Of Databases
Source 1 Professors Of AUEB
P is an Equivalent Rewriting of P
10
Query Optimization
  • Suppose that we have a database with the schema
    of our running example
  • Suppose that we have the following materialized
    views
  • create view StudentCoursesView As
  • Select st_name, cr_id
  • from Student, Stud_course
  • where Student.st_id Stud_course.st_id
  • create view ProfCoursesView As
  • Select pr_name, pr_id
  • from Professor, Prof_course
  • where Professor.pr_id Prof_course.pr_id

11
Query Optimization
  • Suppose we are given the query
  • Select st_name, prof_name
  • from Student, Stud_course, Prof_course, Professor
  • where Student.st_id Stud_course.cr_id
  • and Stud_course.cr_id Prof_course.cr_id
  • and Prof_course.pr_id Professor.pr_id

12
Query Optimization
  • We can exploit the materialized views in order to
    eliminate joins
  • There are many possible combinations
  • One could be
  • Select st_name, pr_name
  • From StudentCoursesView, ProfessorCoursesView
  • Where StudentCoursesView.cr_id
  • ProfessorCoursesView.cr_id

13
Query Optimization
  • But
  • Our aim here is not just to find an equivalent
    rewriting
  • We have to find all rewritings and evaluate their
    costs in order to select the cheapest plan
  • In some cases views may not help due to loss of
    indexes
  • Therefore, we need to make a cost-based rewriting
  • Similar case is the semantic caching
  • Client-server architecture
  • Query results can be temporary maintained as
    relations in the client side to be exploited for
    the evaluation of future queries

14
A common question When a view V is useful for a
query Q?
  • There must be a mapping µ from the relations of V
    to the relations of Q
  • If so, if R1 is both in V and Q then
  • If R1 participates in a join in Q with relation
    R2, then
  • Either R1 and R2 must be joined in the same
    attributes with the same conditions in V
  • Or R1 and R2 must be joined in V in the same
    attributes but with weaker conditions than those
    in Q, provided that the join attributes are
    projected in V
  • Or only R1 is in V and the attributes of R1
    participating in the join in Q, are projected in
    V
  • If Q applies a selection predicate in an
    attribute A of R1, then V
  • Either V must apply the same selection predicate
    in attribute A of R1
  • Or V must apply a weaker selection predicate,
    provided that attribute A is projected in V
  • Or does not apply a predicate in attribute A of
    R1 at all and attribute A is projected in V
  • V must project at least all attributes projected
    in Q

15
Examples
  • Q Select st_name, prof_name
  • from Student, Stud_course, Prof_course, Professor
  • where Student.st_id Stud_course.st_id
  • and Stud_course.cr_id Prof_course.cr_id
  • and Prof_course.pr_id Professor.pr_id
  • and Student.st_year gt 2000

V1 Select st_name from Student where
st_yeargt2000
V1 Select st_name, cr_name from Student,
Stud_course, Course where Student.st_id
Stud_course.st_id and Stud_course.cr_id
Course.cr_id
V1 Select st_name, st_id from Student
where st_yeargt 2000
V1 Select st_name, st_year, cr_name, cr_id
from Student, Stud_course, Course where
Student.st_id Stud_course.st_id and
Stud_course.cr_id Course.cr_id
V1 Select st_name, st_id from Student
where st_year gt1996
V1 Select st_name, st_id, ma from
Student where am gt1996
16
Algorithms for finding query rewritings using
views
  • In the query optimization context
  • A variation of the Selinger Algorithm
  • Cost-based equivalent rewritings
  • In the data integration context
  • The bucket algorithm
  • The inverse-rules algorithm
  • The MiniCon algorithm

Maximally-contained rewritings
17
Query Optimization the Selinger Algorithm
  • Recall the Selinger Algorithm chooses the best
    join order
  • A bottom-up dynamic programming algorithm
  • Each iteration explores all join combinations of
    certain length
  • For each combination of joins, it keeps only the
    cheapest plan, taking into account costs
    estimated in the previous iteration

18
A variation of the Selinger Algorithm
  • The Selinger Algorithm can be modified in order
    to choose the cheapest equivalent rewriting
  • We start with views that seem useful for a
    certain query
  • We continue in a bottom-up fashion exploring all
    join combinations
  • For each combination we prune plans estimated to
    be more expensive or less contributing than others

19
A variation of the Selinger Algorithm
  • First Iteration
  • Find all views that are relevant (useful) to the
    query
  • Choose the best access path for each view and
    evaluate its cost
  • Second Iteration
  • Find all pair combinations
  • For each pair find all possible Joins in all
    possible attributes that lead to useful plans
  • Plans that are complete (i.e. are sufficient to
    answer the query) are distinguished as final
    candidates
  • For the rest plans do the following
  • Compare pairs of plans
  • For each pair (p, p), if p is cheaper and has
    greater or equal contribution (covers more
    relations), then p is pruned
  • Third Iteration
  • Find all possible pairs of plans derived from the
    previous iteration and the plans derived from the
    first iteration

20
A variation of the Selinger Algorithm
  • Query R1 R2 R3 R4
  • Views V1R1 R2 , V2R2 R3 R4
  • V3R2 R3, V4R3 R4,

(V1-V3)-V2 (V1-V3)-V4
V1-V2 V1-V3 V1-V4 V2-V3 V2-V4
V3-V4
V1 V2 V3 V4
21
Algorithms in the Data Integration Context
  • We will discuss 2 algorithms
  • The bucket algorithm
  • The inverse-rules algorithm
  • But first a short introduction to conjunctive
    queries

22
Conjunctive Queries
  • Conjunctive queries are able to express
    select-project-join queries
  • h(X,) - a(Y,), b(Z,),
  • Head and subgoals are atoms
  • An atom consists of a predicate and a tuple of
    arguments a(Y, Z, aueb, W, 2006)
  • Predicates stand for relations

body
subgoals
head
constants
variables
23
Conjunctive Queries
  • Subgoals can also be arithmetic comparisons
  • h(X,) - a(Y,), b(Z,), Zgt3
  • A variable that appears in the head is said to be
    distinguished otherwise nondistinguished.
  • h(X,Y)-a(X,Z),b(Z,Y)
  • X,Y distinguished, Znondestinguished
  • project-select-join SQL queries as conjunctive
    queries
  • Select pr_name, dep_name
  • From Professor, Department
  • Where Professor.pr_dep Department.dep_id
  • and Department.dep_yeargt1996
  • and Professor.field DB
  • q(pr_name, dep_name) - Professor(pr_name,
    dep_id, DB),
  • Department(dep_id, dep_name, dep_year),
  • dep_yeargt1996.

24
Conjunctive Queries
  • The problem of rewriting queries using views in
    the notation of conjunctive queries
  • He have
  • a set of Database Relations
  • a set of views declared as conjunctive queries
    over the Database Relations
  • a query q expressed as a conjunctive query over
    the Database Relations
  • We want to find
  • A set of conjunctive queries over the view
    relations, the disjunction of which being a
    maximally contained rewriting of q

25
example
  • Relations
  • R(S,C,Q), CO(C,T), TE(P,C,Q)
  • Views
  • V1(S, C, Q, T) - R(S, C, Q), CO(C, T), Cgt500,
    Qgt97
  • V2(S, P, C, Q) - R(S, C, Q), TE(P, C, Q)
  • V3(S, C) - R(S, C, Q), Qlt94
  • V4(P, C, T, Q) - R(S, C, Q), CO(C, T), TE(P, C,
    Q), Qlt97
  • Query
  • Q(S, C, P) - TE(P,C,Q), R(S,C,Q), CO(C,T),
    Cgt300, Qgt95

26
The Bucket Algorithm
  • Create a bucket for each subgoal of the query q
  • Fill each bucket with views that cover the
    subgoal
  • A view V covers a subgoal p(A,B) of q if there is
    a subgoal in V with the same predicate p(X,Y)
    and
  • there is a mapping from the variables A, B of the
    subgoal of q to the variables X, Y of the subgoal
    of the view
  • For variables of the subgoal of q appearing in
    the head of q, their mapping variables in the
    subgoal of the view must also appear in the head
    of the view
  • For each variable of the subgoal of q that also
    appear in arithmetic conditions C in q, if its
    mapping variable in the subgoal of the view also
    appear in arithmetic conditions C in the view,
    then C and C should not be mutually exclusive
  • A solution must include q view from each bucket

27
The Bucket Algorithm
  • V1(S, C, Q, T) - R(S, C, Q), CO(C, T), Cgt500,
    Qgt97
  • V2(S, P, C, Q) - R(S, C, Q), TE(P, C, Q)
  • V3(S, C) - R(S, C, Q), Qlt94
  • V4(P, C, T, Q) - R(S, C, Q), CO(C, T), TE(P, C,
    Q), Qlt97
  • Q(St, Cn, Pr) - TE(Pr,Cn,Qu), R(St,Cn,Qu),
    CO(Cn,Tl), Cngt300, Qugt95

V4(P,C,T, Q)
V2(S,P,C,Q)
V4(S, C, T, Q)
V2(S,P,C, Q)
V1(S,C,Q,T)
V1(S, C, Q, T)
CO
TE
R
28
The Bucket Algorithm
  • There are 8 combinations we have to check
  • For each, we equaling appropriate variables to
    enforce joins
  • We check for condition confligs
  • If there are duplicate subgoals we try to
    eliminate them

V4(P,C,T, Q)
V2(S,P,C,Q)
V4(S, C, T, Q)
V2(S,P,C, Q)
V1(S,C,Q,T)
V1(S, C, Q, T)
TE
R
CO
q(S,C,P)-V2(S,P,C,Q), V1(S,C,Q,T),
V1(S,C,Q,T)
The first solution
q(S,C,P)-V2(S,P,C,Q), V1(S,C,Q,T)
29
The inverse-rules algorithm
  • Key idea invert the view definitions to give the
    global predicates (not only those appearing in
    the query) in terms of views
  • For each nondistinguished variable X, pick a new
    function symbol f
  • Example
  • v(X,Y) - p(X,Z), p(Z,Y)
  • Replace Z by f(X,Y)
  • v(X,Y) - p(X,f(X,Y)), p(f(X,Y),Y)

30
The inverse-rules algorithm
  • Replace a view definition by rules with
  • A subgoal as the head, and
  • The view itself as the only subgoal of the body.
  • Example
  • v(X,Y) - p(X,f(X,Y)), p(f(X,Y),Y)
  • becomes
  • p(X,f(X,Y)) - v(X,Y)
  • p(f(X,Y),Y) - v(X,Y)

31
The inverse-rules algorithm - example
  • View v(D,C)-m(S,D), r(S, C)
  • Query q(D) - m(S,D), r(S, 444)
  • After applying the inverse-rules, we have
  • m(f1(D,C), D) - v(D, C)
  • r(f1(D,C), C) - v(D,C)

32
The inverse-rules algorithm - example
  • Supose that v has the following tuples
  • (CS, 444), (EE, 444), (CS, 333)
  • Applying the inverse-rules on these tuples we
    have
  • m (f1(CS, 444), CS), (f1(EE, 444),EE),
    (f1(CS, 333), CS)
  • r (f1(CS, 444), 444), (f1(EE, 444),444),
    (f1(CS, 333), 333)

33
The inverse-rules algorithm - example
  • We can now apply the query in this instance
  • q (CS),(EE)

34
The MiniCon algorithm
  • A variation of the bucket algorithm
  • It takes into account the non-distinguished
    variables as well, which usually imply joins
  • If a view contains a target subgoal, which
    participates in a join in the query, then the
    join variables should appear in the head of the
    view (unless the join takes place also within the
    view)
  • It excludes much more views from participating in
    buckets (here called MCDs)
  • There are fewer combination to check at the end

35
Discussion
Write a Comment
User Comments (0)
About PowerShow.com