Distributed Database Design - PowerPoint PPT Presentation

1 / 38

About This Presentation

Title:

Distributed Database Design

Description:

Distributed Database Design. What is to be Designed? ... The join can be processed in a genuinely distributed fashion. Join Graphs ... – PowerPoint PPT presentation

Number of Views:854

Avg rating:3.0/5.0

Slides: 39

Provided by: scie192

Category:

more less

Transcript and Presenter's Notes

Title: Distributed Database Design

1
Distributed Database Design
2
What is to be Designed?

In designing a distributed database, the same
issues are faced as for a centralized database
plus, in addition
Fragmentation (how relations are to be logically
divided up, if at all)
Allocation (how relations/fragments are to be
assigned to sites)
Replication (if, and where, relations/fragments
should be replicated)

3
Fragmentation

There are two basic ways in which a relation can
be fragmented
Horizontally (each fragment contains a subset of
the tuples/rows of the relation)
Vertically (each fragment contains a subset of
the attributes/columns of the relation)

4
Hybrid (Mixed) Fragmentation
Relation A
Fragment H1
Fragment H2V2
Fragment H2V1H1
Fragment H2V1H2
Fragment H2V1H3
5
Horizontal Fragmentation

Horizontal fragments are defined according to a
condition or predicate that the tuples in it
satisfy
For example, the team relation of the sports club
database might be horizontally fragmented
according to the values of the attribute club.

6
Vertical Fragmentation

Vertical fragments contain a subset of the
attributes of the original relation
Note, however, an important point Every vertical
fragment should contain all of the prime
attribute(s) - the primary key - of the original
relation
Why?

7
Correctness of Fragmentation

A fragmentation, if it is to be correct, should
obey certain conditions or rules
Completeness If it is in the original relation
then it is also in one of the fragments
Reconstruction The original relation can be
correctly rebuilt from the fragments
Disjointness Data in one fragment should not
also be present in another, except for the
primary key attributes in vertical fragmentation

8
Derived Horizontal Fragmentation

When a horizontal fragmentation is defined by a
predicate that is directly applicable to the
tuples of the relation, it is a primary
horizontal fragmentation
When the fragmentation is defined by a predicate
that is applied to the tuples in a different
relation, it is a derived horizontal fragmentation

9
Semi-Joins Derived Horizontal Fragmentation

A derived horizontal fragmentation can be
formally specified using a semi-join
A semi-join is a natural join that retains the
attributes of only one of the participating
relations
It is denoted by the symbol
For example

10
Degree of Fragmentation

What determines how far we should go in
fragmenting relations?
The answer lies not in the data, but in the
applications (i.e. queries) that access it
We want to fragment to get the best possible
result for all applications/queries but what
does best possible mean?

11
Factors Affecting Fragmentation

The two primary factors affecting how we fragment
are
The database structure (global conceptual schema)
The application characteristics (predicates used,
locations and frequencies of use)
Other important aspects are the network
characteristics and computer processing power

12
Applications Simple Predicates

Applications (think of them as queries) that
access a database often select out a subset of
the data according to a predicate a Boolean
expression
A simple predicate pj is one of the form
pj Ai ? Value
where Ai is an attribute, ? is a comparison
operator, and Value is a domain value

13
Horizontal Fragmentation Minterm Predicates

In designing horizontal fragmentation we need to
identify the chunks of tuples in the database
that are accessed as a unit by applications/querie
s
Each chunk is defined by a minterm predicate,
which is a combination of simple predicates
These chunks are then candidates for horizontal
fragments

14
Minterm Predicates

It is sometimes (often?) possible to eliminate
some minterm predicates, based on our
understanding of the database semantics
This occurs when one minterm predicate implies
another

15
Minterm Predicates (cont)

As well as knowing what the minterm predicates
are, it is important to know (typically) how many
tuples would be selected by each, and
How frequently the tuples selected by each are
accessed (this is closely related to query access
frequency)

16
Derived Horizontal Fragmentation Again

It often happens that a relation can have
alternate, and conflicting, possible derived
fragmentations
This may occur when it is related to two (or
more) parent relations through foreign keys

17
Choosing a Derived Fragmentation

Given two (or more) possible derived
fragmentations, how to choose between them?
The two factors are
Application usage (the more frequently accessed
minterm is generally preferable)
Join characteristics (the design that gives
better overall join performance is preferable)

18
Join Characteristics

The performance of a join in a distributed
database benefits when
The relations, or fragments, to be joined are
small (i.e. have relatively few tuples)
The join can be processed in a genuinely
distributed fashion

19
Join Graphs

A join graph is a pictorial way of showing which
fragments of two relations participating in a
join might produce non-empty results
Each fragment is represented by a node and those
fragments that produce a non-empty join result
are connected by an edge (line)

20
A Simple Join Graph
T1
PF1
T2
PF2
Fragments of team
Fragments of plays_for
T3
PF3
T4
PF4
21
A Total Join Graph
P1
PF1
P2
PF2
Fragments of player
Fragments of plays_for
P3
PF3
P4
PF4
P5
22
A Partitioned Join Graph
P1
PF1
P2
PF2
Fragments of player
Fragments of plays_for
P3
PF3
P4
PF4
P5
23
Vertical Fragmentation

Analogously to horizontal fragmentation, the
purpose of vertical fragmentation is to group
together attributes into chunks that are
usually accessed together by user
applications/queries
Vertical fragmentation is more complex to design
than horizontal fragmentation

24
Approaches to Vertical Fragmentation

There are two basic approaches to vertical
fragmentation
Splitting originally designed relations
Grouping individual attributes from scratch
The first is preferred since
The original relations (should) be well designed
Appropriate fragments are likely to be closer to
relation size than individual attribute size

25
Application Information for Vertical Fragmentation

The application information needed for vertical
fragmentation design is
Which applications/queries use which attributes
The frequencies with which the various
applications/queries are executed
The first of these can be captured in an
attribute usage matrix, then use of the second
results in a attribute affinity matrix

26
The Player Relation

Suppose we wish to determine how the
relationplayer(Name, Height, Gender, Address,
Weight, DOB, Telephone)should be vertically
fragmented
The queries that access player are shown on the
next slide

27
Queries on Player

q1 SELECT Name, DOB, Address, Telephone FROM
player WHERE Gender
q2 SELECT avg(Height), avg(Weight) FROM player
WHERE Gender
q3 SELECT Name, Height, Weight, DOB FROM player
WHERE Name LIKE
q4 SELECT Name, Address, Telephone FROM player
WHERE Name

28
Attribute Usage Matrix

This is a matrix that has attributes along one
axis and queries along the other
It contains a 1 in positions for which the
relevant query accesses the attribute of
interest, and 0 otherwise
The matrix for player and queries q1 to q4 is
shown on the next slide

29
Attribute Usage Matrix
30
Attribute Affinity Matrix

This is a matrix that has the attributes of the
relation we are dealing with on both axes
To construct the matrix we need to know the
frequency of accesses (across all sites) to each
query
Then we can work out how often (relatively) each
pair of attributes are accessed together

31
Query Characteristics

Suppose there are three sites accessing the
player relation, with these relative
frequencies

32
Attribute Affinity Matrix
33
Attribute Clustering

The next step is to group together the
attribute(s) that have an affinity for each other
this results in a clustered affinity matrix
There is a formal, so-called bond energy,
algorithm for this (given in Özsu Valduriez)
but we present an informal version that is easier
to understand although it is not, of course,
suitable for realistically complex situations

34
Clustered Attribute Affinity Matrix
35
Hybrid Fragmentation

The nature of the applications/queries may
require a more complex fragmentation then just a
simple horizontal or vertical one
The technique of fragmentation is, however,
basically identical (e.g. apply the horizontal
fragmentation technique followed by the vertical
fragmentation method to one or more of the
horizontal fragments)

36
Allocation

The general problem of determining the optimal
site(s) to allocate the various fragments to is
impossibly complex
Clearly each fragment has to be stored at one
site at least however, there is also the choice
of replicating it at one or more other sites as
well

37
Aim of Allocation

The fundamental aim of allocation is to minimize
some cost function that may include
Inter-site communication costs
Processing costs
Data retrieval and updating costs
Data storage costs

38
Factors Affecting Allocation