Materialized View Selection and Maintenance using Multi-Query Optimization - PowerPoint PPT Presentation

About This Presentation
Title:

Materialized View Selection and Maintenance using Multi-Query Optimization

Description:

ABC. CDE. BCDE. merge. merge. incremental refresh. recomputation ... Transient/Permanent materialized view selection. Setting Up the Maintenance Plan Space ... – PowerPoint PPT presentation

Number of Views:319
Avg rating:3.0/5.0
Slides: 36
Provided by: Prasa1
Category:

less

Transcript and Presenter's Notes

Title: Materialized View Selection and Maintenance using Multi-Query Optimization


1
Materialized View Selection and Maintenance using
Multi-Query Optimization
  • Hoshi Mistry
  • Prasan Roy
  • S. Sudarshan
  • Krithi Ramamritham

2
Materialized Views
  • Complex results materialized in order to speed up
    queries that depend on these results
  • Increasingly being supported by commercial
    database systems (e.g. Oracle8i)
  • Crucial in data warehousing environments

3
Materialized View Maintenance
  • As underlying data changes, the materialized
    views need to be refreshed
  • Efficient view maintenance crucial!
  • Need to provide up-to-date query responses
    growing
  • Amount of data added to data warehouses
    increasing
  • Maintenance time window shrinking

4
Focus
  • Efficient techniques for maintenance of a set of
    materialized views (MVs) by
  • Transient materialization of common
    subexpressions (CSEs)
  • Selection of additional MVs
  • Computation of the best maintenance policy and
    plan for each MV

5
Transient Materialization of Common Subexpressions
  • CSEs materialized to reduce maintenance cost by
    sharing computation, disposed after use
  • Motivated by Blakeley et al. SIGMOD86, Ross et
    al. SIGMOD96
  • Huge search space considered impractical
  • Earlier work by Sellis TODS88
  • Efficient heuristic algorithms proposed by Roy et
    al. SIGMOD00

6
Selection of Additional MVs
  • Additional views materialized permanently to
    reduce the overall maintenance cost
  • Motivated by Ross et al. SIGMOD96
  • restricted to incremental maintenance only
  • do not consider transient materialization
  • MV selection in general addressed in Roussopolous
    TODS82, Agrawal et al. VLDB00

7
Best Maintenance Policy and Plan Computation
  • For each MV,
  • Determine the best maintenance policy
    (incremental or recomputation)
  • Find the corresponding best plan
  • Earlier work by Vista EDBT98
  • Does not take into account transient
    materialization of CSEs or presence of other MVs
  • Current systems need manual specification of the
    maintenance policy

8
Contribution
  • A framework that consolidates the choice of
  • CSEs to be transiently materialized
  • Additional MVs
  • Best maintenance plan (incremental/recomputation)
  • Integrated with a state of the art query
    optimizer (Volcano ICDE93)

9
Example
initial set
incremental refresh
recomputation
recomputation
CDE
BCDE
ABC
permanent
permanent
permanent
merge
incremental refresh
DE
permanent
BC
merge
transient
dA
B
C
D
dE
10
Approach
  1. Setting up the search space of maintenance plans
  2. Best maintenance plan computation
  3. Transient/Permanent materialized view selection

11
Approach
  • Setting up the search space of maintenance plans
  • Best maintenance plan computation
  • Transient/Permanent materialized view selection

12
Setting Up the Maintenance Plan Space
  • The Query DAG representation for recomputation
    plans
  • Incorporating incremental plans

13
Representation of the Recomputation Plan Space
  • AND/OR Query DAG

BCD
ABC
Equivalence Class (OR node)
Operation (AND node)
BC
CD
AB
Best Plan
C
D
B
A
Additionally incorporates subsumption derivations
Details in Roy et al. SIGMOD00
14
Incremental PlansPropagation Based Differential
Generation
  • Differentials propagated one at a time
  • For each differential dR
  • Start at dR and compute node differentials
    bottom-up along the best plan in a topological
    order
  • Differential of a node computed as a function of
    its inputs and their differentials
  • e.g. d(E1E2) E1 dE2 U E2dE1 U dE1dE2 where
    dEi differential of Ei wrt dR
  • Refresh the relation R and the affected MVs wrt
    dR by merging with the differentials computed as
    above

Ross et al. SIGMOD96
15
Incorporating Incremental PlansPropagation
Based Differential Generation
  • Propagation of dA

BCdA
Equivalence Class (OR node)
Operation (AND node)
BC
BdA
Best Plan
C
B
dA
16
Incorporating Incremental PlansPropagation
Based Differential Generation
  • Propagation of dB

CDdB
ACdB
Equivalence Class (OR node)
Operation (AND node)
CdB
CD
AdB
Best Plan
C
D
dB
A
17
Incorporating Incremental PlansPropagation
Based Differential Generation
  • Propagation of dC

BDdC
ABdC
Equivalence Class (OR node)
Operation (AND node)
BdC
DdC
AB
Best Plan
dC
D
B
A
18
Incorporating Incremental PlansPropagation
Based Differential Generation
  • Propagation of dD

BCdD
Equivalence Class (OR node)
Operation (AND node)
BC
CdD
Best Plan
C
dD
B
19
Incorporating Incremental Plans
  • Logical representation

Merge operator
AB
incremental plan
recomputation plan
BdA
AdB
A
B
dB
dA
  • For each equiv node and each base differential
    affecting it
  • Introduce a new equiv node representing its
    differential
  • Populate with the differential plans
  • Maintain statistics for the full expression after
    successive merges
  • Large space overhead!

20
Incorporating Incremental Plans
  • Actual space-efficient representation

AB
BdA
AdB
B
dA
A
dB
  • Reuse the same structure for successive
    propagation cycles
  • separate best plan pointers for each cycle
  • separate statistics for the full expression after
    successive merges
  • Also incorporates sort-orders, indices, etc. Roy
    et al. SIGMOD00

21
Approach
  • Setting up the search space of maintenance plans
  • Best maintenance plan computation
  • Transient/Permanent materialized view selection

22
Maintenance Plan Computation
  • Given
  • Set of nodes Mt materialized transiently
  • can include full results as well as differentials
  • Set of nodes Mp materialized permanently
  • includes full results but not differentials
  • compute the best consolidated maintenance plan
    for Mp

23
Maintenance Plan Computation
  • Best plan computed using a query optimizer
    extended as follows
  • Plan accessing a materialized view (trans/perm)
    does not include its computation, only its use
  • Cost of a maintenance plan
  • totalcost(Mp, Mt) ?e?Mpmaintcost(e Mp, Mt)
    ?e?Mttrmatcost(e Mp, Mt)
  • where
  • maintcost(Mp, Mt) cost of cheapest maintenance
    plan for e (recomputation/incremental)
  • trmatcost(Mp, Mt) cost of computing and
    materializing e

24
Approach
  • Setting up the search space of maintenance plans
  • Best maintenance plan computation
  • Transient/Permanent materialized view selection

25
Transient/Permanent Materialized View Selection
  • Given set of MVs M already materialized,
    determine
  • Set of nodes Mt to materialize transiently
  • Set of nodes Mp (? M) to materialize permanently
  • such that totalcost(Mp, Mt) is minimized
  • Exhaustive approach too expensive. Need
    heuristics!

26
Transient/Permanent Materialized View SelectionA
Greedy Heuristic
  • Input Initial MVs M
  • Output Mp (? M) , Mt, corresp. best plan
  • Begin
  • Mp M Mt
  • S set of equivalence nodes in the DAG for M
  • While ( S ? )
  • Pick z ? S which maximizes Benefit(z Mp, Mt)
  • If ( Benefit(z Mp, Mt) ? 0 )
  • break
  • If ( z is a full result and
    maintcost(z Mp, Mt) lt
    trmatcost(z Mp, Mt) )
  • Mp Mp U z
  • else Mt Mt U z
  • S S z
  • Return (Mp, Mt)
  • End

How to compute Benefit(z Mp, Mt)?
27
Transient/Permanent Materialized View
SelectionBenefit Computation
  • Benefit(z Mp, Mt) gain(z Mp, Mt) -
    investment(z Mp, Mt)
  • where
  • gain(z Mp, Mt) ?e?Mp(maintcost(e Mp, Mt) -
    maintcost(e Mp, Mt U z))
  • ?e?Mt(trmatcost(e Mp, Mt) -
    trmatcost(e Mp, Mt U z))
  • and
  • investment(z Mp, Mt) min(maintcost(z Mp,
    Mt), trmatcost(z Mp, Mt))
  • if z is a full result
  • trmatcost(z Mp, Mt) if z is a
    differential
  • Benefit computation expensive. Need efficient
    techniques!

28
Transient/Permanent Materialized View
SelectionImproving Efficiency of the Greedy
Heuristic
  • Cost-propagation based incremental techniques to
    efficiently compute Benefit
  • Monotonicity assumption
  • Reduces the number of Benefit computations
  • Techniques to determine if a node can be shared
    across a given maintenance plan
  • Reduces the number of nodes considered for
    transient materialization
  • Adapted from Roy et al. SIGMOD00. See paper for
    details.

29
Benchmark
  • Single Views
  • Same views as above, refreshed separately
  • Set of Views
  • 10 views (5 with aggregates, 5 without) on 8
    distinct relations, refreshed together

30
Effect of Transient and Permanent Materialization
Single Views
Set of Views
31
Effect of Adaptive Maintenance Policy Selection
Single Views
Set of Views
32
Scalability Analysis
Optimization Memory Requirements
Optimization Time
Negligible one-time costs
33
Conclusion
  • Presented techniques
  • Automate sharing of computation
  • Automate view selection
  • Automate maintenance policy selection and plan
    computation
  • Do the above in an integrated manner
  • leading to benefits greater than could be
    achieved by considering each dimension
    individually
  • Are efficient and scalable
  • the overall benefits greatly outweigh the
    one-time cost
  • Integrate with state-of-the-art optimizers
    (e.g. MS SQL-Server)

34
Future Work
  • Extend presented techniques
  • To handle limited space
  • To speed up a workload of queries in addition to
    maintenance of a set of materialized views
  • To work in dynamic query result caching
    environments

35
Questions
Write a Comment
User Comments (0)
About PowerShow.com