Scaling Heterogeneous Databases and Design of DISCO

Learn more at: https://dsf.berkeley.edu

1
Scaling Heterogeneous Databases and Design of DISCO
  • Anthony Tomasic
  • Louiqa Raschid
  • Patrick Valduriez

2
DISCO Architecture
[Figure: DISCO architecture. Legend: A = Application, M = Mediator, C = Catalog, W = Wrapper, D = Data Source]

3
Problems with the Architecture
  • Fragile mediator problem: the mediator schema may
    have to be changed when a new source is added.
  • Source capability problem: different wrappers
    may have different functionality.
  • Graceless failure: the query cannot be
    processed when some data sources are unavailable.

4
Overview
  • Mediator Query Processing
  • Describing Source Capabilities
  • Mediator Cost Model
  • Partial Evaluation of Queries

5
Mediator Query Processing

6
Incorporating Source Capabilities
  • Describing the operators: the wrapper exports
    information about which operators it can execute
    and on which collections.
  • Example capability description exported by a wrapper:
      select publications 1 bind Author ()
                            bind KeywordTitle ()
      project publications 2 bind combine Author ()
                             bind combine Title ()
      scan ALL
  • Mediators can also accept a context-free grammar
    that describes the functionality of the wrapper.
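The slide does not show how the mediator uses these descriptions. A minimal Python sketch, assuming a capability table that simply maps each operator to the collections it may be applied to, shows the basic idea: operators the wrapper advertises are pushed down, and the rest stay at the mediator. `WRAPPER_CAPABILITIES`, `can_push` and `split_plan` are hypothetical names, and the attribute-binding details of the slide's example are ignored.

```python
# Hypothetical sketch; names are illustrative, not DISCO's API.
# Capability table: operator name -> collections it may be applied to.
WRAPPER_CAPABILITIES = {
    "select": {"publications"},
    "project": {"publications"},
    "scan": {"ALL"},
}

def can_push(op: str, collection: str, caps=WRAPPER_CAPABILITIES) -> bool:
    """Return True if the wrapper advertises `op` on `collection`."""
    allowed = caps.get(op, set())
    return "ALL" in allowed or collection in allowed

def split_plan(ops, collection, caps=WRAPPER_CAPABILITIES):
    """Partition a linear list of operators into those pushed to the wrapper
    and those the mediator must run. Once one operator cannot be pushed,
    every later operator stays at the mediator, since it consumes that result."""
    pushed, at_mediator = [], []
    for op in ops:
        if not at_mediator and can_push(op, collection, caps):
            pushed.append(op)
        else:
            at_mediator.append(op)
    return pushed, at_mediator
```

For instance, `split_plan(["scan", "select", "join"], "publications")` pushes the scan and the selection and leaves the join to the mediator.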

7
Mediator Cost Model
  • The mediator has a generic cost model.
  • Unary operators
      • sequential scan and index scan
      • cost formulae derived using a calibration approach
  • Binary operators
      • index join, nested loops and sort-merge join
      • if an index is available, index join is chosen;
        otherwise, the cheaper of the other two
  • The wrapper can override the mediator model by
    exporting statistics and/or cost formulae.
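The join-method rule above can be sketched as follows. The cost formulas are illustrative textbook estimates, not DISCO's calibrated formulae, and `choose_join` with its `wrapper_override` parameter is a hypothetical stand-in for a wrapper that exports its own cost formula.

```python
# Hypothetical sketch; formulas and names are illustrative, not DISCO's.
import math
from typing import Callable, Optional, Tuple

Choice = Tuple[str, float]  # (join method, estimated cost)

def nested_loops_cost(r: int, s: int) -> float:
    # Assumed estimate: compare every pair of tuples.
    return float(r * s)

def _sort_cost(n: int) -> float:
    # Assumed n log n cost for sorting one input.
    return n * math.log2(n) if n > 1 else 0.0

def sort_merge_cost(r: int, s: int) -> float:
    # Sort both inputs, then merge them in one pass.
    return _sort_cost(r) + _sort_cost(s) + r + s

def choose_join(r: int, s: int, has_index: bool,
                wrapper_override: Optional[Callable[[int, int], Choice]] = None) -> Choice:
    """Apply the slide's rule: index join when an index exists, otherwise the
    cheaper of nested loops and sort-merge. A wrapper-exported formula, when
    present, replaces the generic estimate entirely."""
    if wrapper_override is not None:
        return wrapper_override(r, s)
    if has_index:
        # Assumed: one index probe per tuple of the outer input.
        return ("index join", float(r))
    return min([("nested loops", nested_loops_cost(r, s)),
                ("sort-merge", sort_merge_cost(r, s))],
               key=lambda choice: choice[1])
```

With these toy formulas, nested loops wins only for very small inputs, which is the usual intuition behind keeping both methods in the generic model.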

8
Cost Communication
  • Exporting statistics: the wrapper can export
    statistics through two special methods, attribute
    and extent, attached to each interface
    description.
  • Exporting formulae: wrapper-specific cost
    formulae can be described using rules.
  • For example:
      select(C, A = V) <-
          CountObject = C.CountObject * selectivity(A, V)
          TotalSize = CountObject * C.ObjectSize
          TotalTime = C.TotalTime + C.TotalSize * 25
  • The mediator selects the most specific rule.
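A sketch of how such a rule might be evaluated, assuming the arithmetic operators of the rule as written above and a simplified notion of "most specific" (a rule bound to a named collection beats a generic one). `Stats`, `select_rule`, `generic_select`, `RULES` and `most_specific` are hypothetical names, and the generic fallback formula is invented for illustration.

```python
# Hypothetical sketch of wrapper cost rules; names are illustrative, not DISCO's API.
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

@dataclass
class Stats:
    count_object: float  # number of objects in the collection
    object_size: float   # average size of one object
    total_size: float    # size of the whole collection
    total_time: float    # time to produce the whole collection

def select_rule(c: Stats, selectivity: float) -> Stats:
    """select(C, A = V) as written on the slide. The constant 25 comes from
    the slide; the arithmetic operators are an assumption."""
    count = c.count_object * selectivity
    return Stats(count_object=count,
                 object_size=c.object_size,
                 total_size=count * c.object_size,
                 total_time=c.total_time + c.total_size * 25)

def generic_select(c: Stats, selectivity: float) -> Stats:
    """Invented fallback standing in for the mediator's generic formula."""
    count = c.count_object * selectivity
    return Stats(count, c.object_size, count * c.object_size, c.total_time)

Rule = Callable[[Stats, float], Stats]
# Rules keyed by (operator, collection); None means "any collection".
RULES: Dict[Tuple[str, Optional[str]], Rule] = {
    ("select", None): generic_select,
    ("select", "person0"): select_rule,
}

def most_specific(op: str, collection: str) -> Rule:
    """Prefer a rule bound to this collection over the generic one."""
    return RULES.get((op, collection)) or RULES[(op, None)]
```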

9
Partial Evaluation of Queries
  • If a data source is unavailable, DISCO evaluates
    as much of the query as possible and returns
    another query.
  • Example: consider the following query, run when
    person2 is unavailable:
      select x.name
      from x in person0, y in person1, z in person2
      where x.name = y.name and y.name = z.name
  • DISCO returns the following query (where t0 is the
    result of person0 join person1):
      select w.name
      from w in t0, z in person2
      where w.name = z.name
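The rewriting in this example can be mimicked on toy data. The sketch below assumes set semantics, sources that expose only a `name` attribute, and a query that is a chain of equality joins on `name`; `partial_eval` is a hypothetical helper, not DISCO's API.

```python
# Hypothetical sketch of partial evaluation; not DISCO's implementation.
from typing import Dict, List, Optional, Tuple

def partial_eval(sources: Dict[str, Optional[List[str]]],
                 order: List[str]) -> Tuple[List[str], Optional[str]]:
    """Evaluate a chain of equi-joins on `name` over the sources in `order`.
    An unavailable source is represented by None. The join of the available
    sources is materialized as t0, and a residual query over t0 and the
    missing sources is returned (None if nothing is missing)."""
    available = [s for s in order if sources[s] is not None]
    missing = [s for s in order if sources[s] is None]
    if not available:
        raise ValueError("no source is available; nothing can be evaluated")
    # With only `name` projected and set semantics, the equi-join is an intersection.
    t0 = set(sources[available[0]])
    for s in available[1:]:
        t0 &= set(sources[s])
    if not missing:
        return sorted(t0), None
    froms = ", ".join(["w in t0"] + [f"z{i} in {m}" for i, m in enumerate(missing)])
    conds = " and ".join(f"w.name = z{i}.name" for i in range(len(missing)))
    return sorted(t0), f"select w.name from {froms} where {conds}"
```

With person2 mapped to None, the residual query has the same shape as the slide's, up to the variable name `z0`.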

10
Extracting Information
  • Opaque partial answers: no extraction is possible.
  • Transparent partial answers: one can ask a
    parachute query, which is related to the
    original query.
  • For example, a parachute query for the earlier
    example can be:
      select x.name
      from x in person0, y in person1
      where x.name = y.name
  • The parachute query is evaluated by rewriting it over
    the materialized relations.
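A minimal sketch of the rewriting step, assuming the mediator keeps a catalog of materialized partial results keyed by the set of sources each one joins on `name`. `rewrite_parachute` and the catalog layout are hypothetical; only exact matches are reused here, a deliberately conservative choice.

```python
# Hypothetical sketch; the catalog layout and names are illustrative.
from typing import Dict, FrozenSet, List, Optional, Tuple

# Relation name -> (set of sources it joins on `name`, its rows).
Materialized = Dict[str, Tuple[FrozenSet[str], List[str]]]

def rewrite_parachute(parachute_sources: FrozenSet[str],
                      materialized: Materialized) -> Optional[Tuple[str, List[str]]]:
    """Rewrite a parachute query (a name-join over `parachute_sources`) to read
    a materialized relation that joins exactly those sources. A relation over
    more sources has already been filtered by extra joins and could miss
    answers, so it is not used."""
    for rel, (srcs, rows) in materialized.items():
        if srcs == parachute_sources:
            return f"select w.name from w in {rel}", rows
    return None
```

For the example above, t0 joins person0 and person1, so the parachute query becomes a plain scan of t0.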

11
Constrained Evaluation of Queries
  • The optimizer tries to ensure that parachute
    queries can always be evaluated (if at all
    possible) in case of failures.
  • For example, if the parachute query is (A join C),
    it will not be possible to evaluate it if B
    fails.
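The slide leaves the execution plan implicit. One reading, offered as an assumption: if the plan ships A and B to a wrapper as a single sub-query, a failure of B also loses the data from A, so (A join C) cannot be rebuilt. The sketch below encodes that reading, with plans as lists of sub-queries (sets of sources); `parachute_evaluable` and `choose_plan` are hypothetical names.

```python
# Hypothetical sketch of failure-aware plan choice; not DISCO's optimizer.
from typing import FrozenSet, List, Tuple

Plan = List[FrozenSet[str]]  # each unit is a sub-query fetched as a whole

def parachute_evaluable(plan: Plan, parachute: FrozenSet[str], failed: str) -> bool:
    """True if the parachute query can be rebuilt from units that survive the
    failure of `failed`. A unit touching the failed source is lost, and only
    units whose sources all belong to the parachute query are usable."""
    usable = [u for u in plan if failed not in u and u <= parachute]
    return frozenset().union(*usable) == parachute

def choose_plan(plans: List[Tuple[Plan, float]], parachute: FrozenSet[str],
                at_risk: FrozenSet[str]) -> Plan:
    """Cheapest plan that keeps the parachute query evaluable whichever
    at-risk source fails; falls back to the cheapest plan overall."""
    ok = [p for p in plans
          if all(parachute_evaluable(p[0], parachute, f) for f in at_risk)]
    return min(ok or plans, key=lambda p: p[1])[0]
```

Under this model the optimizer may pick a slightly costlier plan so that the parachute query stays answerable.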

12
Partial Evaluation of Queries
  • Open Issues
  • Semantics with updates to data sources
  • Tradeoffs between materializing partial answers
    and resubmitting the original queries
  • Aggregate queries?
  • Approximate queries?

13
The Good
  • It can handle wrappers with different
    capabilities.
  • Mediator uses a generic cost model which can be
    overridden by the wrapper.
  • Partial evaluation of queries and extraction of
    information from partial answers.

14
The Bad
  • Queries involving different wrappers have to be
    processed at the mediator.
  • Only a relational subset of the model was
    implemented.
  • Data replication is not addressed.

15
The Ugly
  • Arbitrary source capabilities cannot be easily
    handled.
  • A proliferation of wrapper-specific cost rules can
    make query optimization very expensive.
  • Centralized query optimization: wrappers don't
    have much control over it.
  • Autonomous data sources?

16
Mediator Query Processing
  • Reformulate the query into local schemas.
  • Transform the query into logical operator trees.
  • Decompose each query into wrapper sub-queries and
    a composition query.
  • Modify the wrapper sub-queries and the
    composition query to reflect the capabilities of
    the wrappers.
  • Generate distributed execution plans.
  • Select the minimum cost plan.
  • Send the wrapper sub-trees to the wrappers and
    execute the composition query on the results.
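The decompose and capability-adjustment steps can be illustrated with a small sketch. It assumes predicates arrive as text tagged with the sources they mention, and that a wrapper either supports selections or does not; `decompose` and `can_select` are hypothetical, and plan enumeration and costing are left out.

```python
# Hypothetical sketch of query decomposition; names are illustrative, not DISCO's API.
from typing import Dict, List, Set, Tuple

Predicate = Tuple[str, Set[str]]  # (predicate text, sources it mentions)

def decompose(preds: List[Predicate],
              can_select: Dict[str, bool]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split predicates into per-wrapper sub-queries and a mediator composition
    query. Every source gets a sub-query; an empty one means a plain scan."""
    sub: Dict[str, List[str]] = {s: [] for _, srcs in preds for s in srcs}
    composition: List[str] = []
    for text, srcs in preds:
        if len(srcs) == 1:
            (src,) = srcs
            if can_select.get(src, False):
                sub[src].append(text)  # selection pushed down to the wrapper
                continue
        # Cross-source joins and selections the wrapper cannot run stay here.
        composition.append(text)
    return sub, composition
```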

17
Mediator Data Model
  • Extensions to ODMG 2.0
  • Multiple extents per interface, using MetaExtents:
      interface MetaExtent {
          attribute String name;
          attribute Extent e;
          attribute Type interface;
          attribute Wrapper wrapper;
          attribute Map map;
      };
  • Type mapping
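To make the MetaExtent structure concrete, here is a Python mirror of it. This is an illustrative data structure, assuming types, extents and wrappers can be represented by their names; it is not DISCO code, and `extents_of` is a hypothetical helper.

```python
# Hypothetical Python mirror of MetaExtent; not DISCO code.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MetaExtent:
    """Field names follow the slide, except `mapping`, which stands in for the
    slide's `map` attribute to avoid shadowing Python's builtin."""
    name: str       # mediator extent name, e.g. "person0"
    e: str          # extent name inside the data source, e.g. "p0"
    interface: str  # mediator interface (type) of the extent, e.g. "Person"
    wrapper: str    # wrapper identifier, e.g. "w0"
    mapping: Dict[str, str] = field(default_factory=dict)  # mediator attr -> source attr

def extents_of(interface: str, metas: List[MetaExtent]) -> List[str]:
    """Every extent registered for one interface; the ODMG extension allows
    several extents (typically one per source) for the same interface."""
    return [m.name for m in metas if m.interface == interface]
```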

18
Accessing Data Sources
  • Define a wrapper object:
      wrapper w0 rmi://rodin.inria.fr/PersonWrapper
  • Define a wrapper schema:
      extent p0 of Person;
      interface Person {
          attribute String name;
          attribute Short salary;
      };
  • This is exported to the mediator.
  • Define the mediator schema.

19
Accessing Data Sources
  • Define the mediator extents:
      extent person0 of Person wrapper w0 extent p0
      extent person1 of Person wrapper w1 extent s1
          map (name = sname)
  • Subtyping and views can be used to define more
    complex transformations on the data sources:
      define double as
          select struct (name: x.name, salary: x.salary + y.salary)
          from x in person0, y in person1
          where x.name = y.name
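The effect of the `map` clause and of the `double` view can be sketched on in-memory rows. The sketch assumes `map (name = sname)` means that the mediator attribute `name` is stored as `sname` at the source, and that rows are plain dictionaries; `apply_map` and `double_view` are hypothetical helpers.

```python
# Hypothetical sketch of attribute mapping and the `double` view; not DISCO code.
from typing import Dict, List

Row = Dict[str, object]

def apply_map(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Rename source attributes to mediator attributes.
    `mapping` is mediator attr -> source attr, as in map (name = sname)."""
    inverse = {src: med for med, src in mapping.items()}
    return [{inverse.get(k, k): v for k, v in row.items()} for row in rows]

def double_view(person0: List[Row], person1: List[Row]) -> List[Row]:
    """The `double` view: join on name and add the two salaries."""
    return [{"name": x["name"], "salary": x["salary"] + y["salary"]}
            for x in person0 for y in person1 if x["name"] == y["name"]]
```

Renaming happens before the view is evaluated, so the view itself only ever sees mediator attribute names.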

20
Mediator Query Processing