Title: Distributed Database Design
1Distributed Database Design
2What is to be Designed?
- In designing a distributed database, the same
issues are faced as for a centralized database
plus, in addition - Fragmentation (how relations are to be logically
divided up, if at all) - Allocation (how relations/fragments are to be
assigned to sites) - Replication (if, and where, relations/fragments
should be replicated)
3Fragmentation
- There are two basic ways in which a relation can
be fragmented - Horizontally (each fragment contains a subset of
the tuples/rows of the relation) - Vertically (each fragment contains a subset of
the attributes/columns of the relation)
4Hybrid (Mixed) Fragmentation
Relation A
Fragment H1
Fragment H2V2
Fragment H2V1H1
Fragment H2V1H2
Fragment H2V1H3
5Horizontal Fragmentation
- Horizontal fragments are defined according to a
condition or predicate that the tuples in it
satisfy - For example, the team relation of the sports club
database might be horizontally fragmented
according to the values of the attribute club.
6Vertical Fragmentation
- Vertical fragments contain a subset of the
attributes of the original relation - Note, however, an important point Every vertical
fragment should contain all of the prime
attribute(s) - the primary key - of the original
relation - Why?
7Correctness of Fragmentation
- A fragmentation, if it is to be correct, should
obey certain conditions or rules - Completeness If it is in the original relation
then it is also in one of the fragments - Reconstruction The original relation can be
correctly rebuilt from the fragments - Disjointness Data in one fragment should not
also be present in another, except for the
primary key attributes in vertical fragmentation
8Derived Horizontal Fragmentation
- When a horizontal fragmentation is defined by a
predicate that is directly applicable to the
tuples of the relation, it is a primary
horizontal fragmentation - When the fragmentation is defined by a predicate
that is applied to the tuples in a different
relation, it is a derived horizontal fragmentation
9Semi-Joins Derived Horizontal Fragmentation
- A derived horizontal fragmentation can be
formally specified using a semi-join - A semi-join is a natural join that retains the
attributes of only one of the participating
relations - It is denoted by the symbol
- For example
-
10Degree of Fragmentation
- What determines how far we should go in
fragmenting relations? - The answer lies not in the data, but in the
applications (i.e. queries) that access it - We want to fragment to get the best possible
result for all applications/queries but what
does best possible mean?
11Factors Affecting Fragmentation
- The two primary factors affecting how we fragment
are - The database structure (global conceptual schema)
- The application characteristics (predicates used,
locations and frequencies of use) - Other important aspects are the network
characteristics and computer processing power
12Applications Simple Predicates
- Applications (think of them as queries) that
access a database often select out a subset of
the data according to a predicate a Boolean
expression - A simple predicate pj is one of the form
- pj Ai ? Value
- where Ai is an attribute, ? is a comparison
operator, and Value is a domain value
13Horizontal Fragmentation Minterm Predicates
- In designing horizontal fragmentation we need to
identify the chunks of tuples in the database
that are accessed as a unit by applications/querie
s - Each chunk is defined by a minterm predicate,
which is a combination of simple predicates - These chunks are then candidates for horizontal
fragments
14Minterm Predicates
- It is sometimes (often?) possible to eliminate
some minterm predicates, based on our
understanding of the database semantics - This occurs when one minterm predicate implies
another
15Minterm Predicates (cont)
- As well as knowing what the minterm predicates
are, it is important to know (typically) how many
tuples would be selected by each, and - How frequently the tuples selected by each are
accessed (this is closely related to query access
frequency)
16Derived Horizontal Fragmentation Again
- It often happens that a relation can have
alternate, and conflicting, possible derived
fragmentations - This may occur when it is related to two (or
more) parent relations through foreign keys
17Choosing a Derived Fragmentation
- Given two (or more) possible derived
fragmentations, how to choose between them? - The two factors are
- Application usage (the more frequently accessed
minterm is generally preferable) - Join characteristics (the design that gives
better overall join performance is preferable)
18Join Characteristics
- The performance of a join in a distributed
database benefits when - The relations, or fragments, to be joined are
small (i.e. have relatively few tuples) - The join can be processed in a genuinely
distributed fashion
19Join Graphs
- A join graph is a pictorial way of showing which
fragments of two relations participating in a
join might produce non-empty results - Each fragment is represented by a node and those
fragments that produce a non-empty join result
are connected by an edge (line)
20A Simple Join Graph
T1
PF1
T2
PF2
Fragments of team
Fragments of plays_for
T3
PF3
T4
PF4
21A Total Join Graph
P1
PF1
P2
PF2
Fragments of player
Fragments of plays_for
P3
PF3
P4
PF4
P5
22A Partitioned Join Graph
P1
PF1
P2
PF2
Fragments of player
Fragments of plays_for
P3
PF3
P4
PF4
P5
23Vertical Fragmentation
- Analogously to horizontal fragmentation, the
purpose of vertical fragmentation is to group
together attributes into chunks that are
usually accessed together by user
applications/queries - Vertical fragmentation is more complex to design
than horizontal fragmentation
24Approaches to Vertical Fragmentation
- There are two basic approaches to vertical
fragmentation - Splitting originally designed relations
- Grouping individual attributes from scratch
- The first is preferred since
- The original relations (should) be well designed
- Appropriate fragments are likely to be closer to
relation size than individual attribute size
25Application Information for Vertical Fragmentation
- The application information needed for vertical
fragmentation design is - Which applications/queries use which attributes
- The frequencies with which the various
applications/queries are executed - The first of these can be captured in an
attribute usage matrix, then use of the second
results in a attribute affinity matrix
26The Player Relation
- Suppose we wish to determine how the
relationplayer(Name, Height, Gender, Address,
Weight, DOB, Telephone)should be vertically
fragmented - The queries that access player are shown on the
next slide
27Queries on Player
- q1 SELECT Name, DOB, Address, Telephone FROM
player WHERE Gender - q2 SELECT avg(Height), avg(Weight) FROM player
WHERE Gender - q3 SELECT Name, Height, Weight, DOB FROM player
WHERE Name LIKE - q4 SELECT Name, Address, Telephone FROM player
WHERE Name
28Attribute Usage Matrix
- This is a matrix that has attributes along one
axis and queries along the other - It contains a 1 in positions for which the
relevant query accesses the attribute of
interest, and 0 otherwise - The matrix for player and queries q1 to q4 is
shown on the next slide
29Attribute Usage Matrix
30Attribute Affinity Matrix
- This is a matrix that has the attributes of the
relation we are dealing with on both axes - To construct the matrix we need to know the
frequency of accesses (across all sites) to each
query - Then we can work out how often (relatively) each
pair of attributes are accessed together
31Query Characteristics
- Suppose there are three sites accessing the
player relation, with these relative
frequencies
32Attribute Affinity Matrix
33Attribute Clustering
- The next step is to group together the
attribute(s) that have an affinity for each other
this results in a clustered affinity matrix - There is a formal, so-called bond energy,
algorithm for this (given in Özsu Valduriez)
but we present an informal version that is easier
to understand although it is not, of course,
suitable for realistically complex situations
34Clustered Attribute Affinity Matrix
35Hybrid Fragmentation
- The nature of the applications/queries may
require a more complex fragmentation then just a
simple horizontal or vertical one - The technique of fragmentation is, however,
basically identical (e.g. apply the horizontal
fragmentation technique followed by the vertical
fragmentation method to one or more of the
horizontal fragments)
36Allocation
- The general problem of determining the optimal
site(s) to allocate the various fragments to is
impossibly complex - Clearly each fragment has to be stored at one
site at least however, there is also the choice
of replicating it at one or more other sites as
well
37Aim of Allocation
- The fundamental aim of allocation is to minimize
some cost function that may include - Inter-site communication costs
- Processing costs
- Data retrieval and updating costs
- Data storage costs
38Factors Affecting Allocation
- Factors that will influence the final allocation
include - Expected fragment sizes
- Query selectivity
- Ratio of retrievals to updates for each fragment
- The originating site(s) of queries/updates
- Site storage and processing capabilities
- Inter-site network communication capacities