Title: Distribution Transparency and Fragmentation
1Distribution Transparency and Fragmentation
- Introduction to distribution transparency and
reference architecture - Levels of transparency
- Fragmentation
- Relational algebra
- Fragmentation of relations
- Horizontal fragmentation
- Derived horizontal fragmentation
- Vertical fragmentation
2Distribution Transparency and Fragmentation
- Distribution and Fragmentation (of data or
records) - Data distribution data items are distributed at
several sites (for reliability, performance,
security, etc..) - Fragmentation Divide a global table into several
fragments (smaller tables) - How? Normally, based on how the applications at
each site access the data items (close to the
applications) - Performance objectives are to minimize the access
delay and data management cost. - I.e., if a transaction wants to access (read or
write) several fields of a record and the fields
are distributed at several sites, the access cost
will be heavy - Cost for maintaining database integrity
3Introduction to Distribution Transparency
- A distributed database system should look like a
centralized database system to the applications - Distribution transparency
- The applications do not
- need to know how the database is partitioned
(fragmented) and - how they are distributed when they access the
database - How to achieve this? Similar to the default path
in a file system (I.e., define a link) - Advantages of distribution transparency
- Programs can be written as if the database is not
distributed (more flexible as the mapping can be
changed) (similar to the situation of using a
variable for referring to a file) - Reallocation the correctness of a program is
unaffected by the reallocation of data from one
site to another site (a system administration job)
4Introduction to Distribution Transparency
- Separation of the concept of data fragmentation
from the concept of data location (fragmentation
transparency and location transparency) - A fragment can have multiple physical storages
- Explicit control of redundancy at fragment level
(replication transparency)
5Data Fragmentation
- Disadvantages of data fragmentation
- If the data are not fragmented appropriately, the
benefits may become problems (higher access cost
and management cost) (Good methods are needed) - If the applications have conflicting requirements
which prevent decomposition of the relation into
mutually exclusive fragments, those applications
whose views are defined on more than one
fragments may suffer performance degradation - The problems in semantic data control, integrity
checking (data management) etc.. - More fragments, more overheads have to be spent
on that issues. - Consider the update problem and referential
integrity problem
6Reference architecture for distribution
transparency
Global Schema
Site Independent Schemas - do not depend on the
data model of the local DBSs
Fragmentation Schema
Allocation Schema
Site 1
Site n
...
Local Mapping Schema n
Local Mapping Schema 1
DBS
DBS
Local Database
Local Database
7Reference architecture for distribution
transparency
- Global Schema
- defines the data which are maintained by the
system as they are not distributed at all - consists of a set of global relations (from
applications) - Fragmentation Schema
- split each global relation into several
non-overlapping fragment(s) (1 to 1 / 1 to many
relation) - fragments are logical portions of global
relations which are physically located at one or
several sites - ith fragment of global relation R Ri
8Reference architecture for Distribution
Transparency
- Allocation Schema
- defines the location of a fragment, at which
site(s) - redundant a fragment maps to several sites
- non-redundant a fragment maps to 1 site
- physical image of global relation R at site I
- all the fragments which correspond to the same
global relation R at site j - Rji (j - site number, i - ith fragment of R)
9Reference architecture for Distributed
Database(DDB)
- Local Mapping Schema
- map the physical images to the data which are
manipulated by the local DBSs - may have different types of local mappings at
different sites (depends on the database type and
implementation, heterogeneity)
10Levels of Transparency
- Level 1 Fragmentation Transparency
- All the three parameters (fragmentation,
allocation, mapping) are transparent to
applications - Applications consider all relations as
centralized and are unaware of their
fragmentations, allocation and local mapping - Level 2 Location Transparency
- Allocation and local mapping of relations are
transparent, but fragmentation is not - Applications must specify the referred fragments
11Levels of Transparency
- Level 3 Local Transparency
- Only local mapping of relations are transparent.
- Applications have to define specific fragments
and their precise locations. - Level 4 No Transparency
- Application programmers must take responsibility
in defining all the three parameters -
12Levels of Transparency
Level 1 Fragmentation Transparency
Site 1
No Special Specification
SUPPLIER1
DDBS
read (terminal, SNUM) select NAME into
NAME from SUPPLIER where SNUM
SNUM write(terminal, NAME)
Site 2
SUPPLIER2
Site 3
SUPPLIER2
13Levels of Transparency
Level 2 Location Transparency
Site 1
Fragment Specification
DDBS
SUPPLIER1
read (terminal, SNUM) select NAME into
NAME from SUPPLIER1 where
SNUM SNUM if not (FOUND) then
select NAME into NAME from
SUPPLIER2 where SNUM SNUM
endif write(terminal, NAME)
Site 2
SUPPLIER2
Site 3
SUPPLIER2
14Levels of Transparency
Level 3 Local Mapping Transparency
Site 1
Fragment and Location Specification
SUPPLIER1
read (terminal, SNUM) select NAME into
NAME from SUPPLIER1 at Site 1
where SNUM SNUM if not (FOUND) then
select NAME into NAME from
SUPPLIER2 at Site 3 where SNUM
SNUM endif write(terminal, NAME)
DDBS
Site 3
SUPPLIER2
15Review of Relational Model
- Data are stored in tables, called relations
- Each relation has a fixed number of columns
(attributes) and a dynamic number of rows
(tuples) - The relation name and the names of attributes
appearing in it is called the relation schema of
the relation - e.g. EMP (EMPNUM, NAME, AGE, DEPTNUM)
- No two identical tuples in the same relation
- No defined order on the tuples in a relation
EMPNUM NAME AGE DEPTNUM 3
Jones 27
1 7 Smith 34
2 11 Mary 31
1
16Relational Algebra
- Relational algebra A collection of operations on
relation, each of which takes one or two
relations as operands and produce one relation as
result - Unary Select, Projection
- Binary Union, Difference, Cartesian Product,
Join, Semi-Join, Natural Join, Natural Semi-Join
17Relational Algebra
Selection SL ltconditiongt R e.g. SL Aa R
A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a a 1 d
R
18Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
Projection PJ ltattribute1,attribute2...gt
R e.g. PJA,B R
A B a 1 b 1 b 2
R
- Duplicated tuples (a, 1, d) are eliminated
19Relational Algebra
R UN S
A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a a 3 f
A B C a 1 a b 1 b a 1 d b 2 f a 3 f
S
R
Union R1 UN R2 R1 and R2 must be same
relation schema e.g. R UN S duplicate tuples
are eliminated
20Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a a 3 f
A B C b 1 b a 1 d b 2 f
S
R DF S
R
Difference R1 DF R2 R1 and R2 must be same
relation schema e.g. R DF S same
relation schema, all tuples appearing in R but
not in S
21A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a a 3 f
R CP S
S
R.A R.B R.C S.A S.B S.C a
1 a a 1 a
b 1 b a 1
a a 1 d a 1
a b 2 f a
1 a a 1 a a
3 f b 1 b
a 3 f a 1 d
a 3 f b 2
f a 3 f
R
Cartesian Product R1 CP R2 if two
attributes with same name in R1 and R2, they are
nevertheless considered different attributes in
order to avoid ambiguity. The name of its each
attribute is prefixed with the name of its
original relation
22Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
A R.B R.C T.B T.C D a
1 a 1 a 1 a
1 a 2 a
3 b 1 b 3 b
1 a 1 d 1
d 4
R
B C D 1 a 1 3 b 1 3 c 2 1 d 4 2 a 3
R JN R.C T.C T
Join R1 JN ltjoin predicategt R2 SL ltjoin
predicategt (R1 CP R2)
T
23Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a b 1 b a 1 d
R SJ R.C T.C T
R
Semi-Join R1 SJ ltjoin predicategt R2 PJ
attribute(R1) (R1 JN ltjoin predicategt
R2) where attribute(R1) the set of all
attributes of R1
B C D 1 a 1 3 b 1 3 c 2 1 d 4 2 a 3
T
24Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
A B C D a 1 a 1 a 1 d 4
R
R NJN T
B C D 1 a 1 3 b 1 3 c 2 1 d 4 2 a 3
Natural Join R1 NJN R2 values with same
attribute name in R1 and R2 must be equal one set
of attributes is omitted from the result
T
25Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a a 1 d
R NSJ T
R
Natural Semi-Join R1 NSJ R2 PJ
attribute(R1) (R1 NJN R2) where
attribute(R1) the set of all attributes of R1
B C D 1 a 1 3 b 1 3 c 2 1 d 4 2 a 3
T
26Fragmentation of Relations
- Rules for defining fragments
- Completeness
- each data item in the global relation must appear
in at least one of the fragments - Reconstruction
- it must always be possible to reconstruct a
global relation from its fragments - Disjointness
- fragments have to be disjoint
27Fragmentation of Relations
- Types of fragmentations
- horizontal fragmentation
- Select operation
- derived horizontal fragmentation
- Semi-Join operation with another relation
- vertical fragmentation
- Project operation
28Horizontal Fragmentation
- Partition the tuples of a global relation into
(disjoint) subsets - Usually, each subset contains the data which have
common (geographical) properties, I.e., grouping
the suppliers in the same city into a fragment - It can be defined by using Select operation on
the global relation
SUPPLIER(SNUM, NAME, CITY) SUPPLIER1 SL
CITYSF (SUPPLIER) SUPPLIER2 SL CITYLA
(SUPPLIER)
29Horizontal Fragmentation
- Assumption SF and LA are the only possible
values of CITY attribute - Completeness
- all tuples belong to either SUPPLIER1 or
SUPPLIER2 - Reconstruction
- SUPPLIER SUPPLIER1 UN SUPPLIER2
- Disjointness
- SUPPLIER1 ? SUPPLIER2 ?
30Horizontal Fragmentation
In general completeness of k - fragments of R
Reconstruction of R R R1 UN R2 UN .. UN
Rk
Update problem If the update is performed on an
attribute determining a horizontal
fragmentation, then it may happen that the
update tuple has to be moved to another fragment
31Derived Horizontal Fragmentation
- In some cases, horizontal fragmentation may be
derived from the horizontal fragmentation of
another relation
SUPPLIER(SNUM, NAME, CITY) SUPPLY(SNUM, PNUM,
DEPTNUM, QUAN)
- To partition SUPPLY so that a fragment contains
the tuples for suppliers which are in a given
city. However, city is not an attribute of the
SUPPLY relation. hint using SNUM and join
SUPPLY1 SUPPLY SJSNUM SUPPLIER1 SUPPLY2
SUPPLY SJSNUM SUPPLIER2
32Derived Horizontal Fragmentation
- Completeness is implied by referential integrity
- every supplier number (SNUM) appearing in SUPPLY
appears in SUPPLIER as well - So, when you update a record in supplier, you may
also need to check whether you need to move
supply also - Reconstruction is maintained by UN
- Disjointness can be provided in all cases (in
general situations). In this example, SNUM is a
unique key of SUPPLIER
33Vertical Fragmentation
- Subdivision of a global relations attributes
into groups - Fragments are obtained by projecting the global
relation over each group - Fragmentation is correct if each attribute maps
to at least one attribute of the fragments
(completeness)
EMP(EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM) EMP1
PJ EMPNUM, NAME, MGRNUM, DEPTNUM EMP EMP2 PJ
EMPNUM, SAL, TAX EMP
34Vertical Fragmentation
- Completeness of k fragments of R
- Reconstruction
- Must be lossless (discuss later)
- EMP EMP1 NJN EMP2
- Disjointness
- practically irrelevant (not necessary and usually
not required)
35Correctness of vertical decomposition
A B C 1 2 3 4 2 5
R
R1 PJ A,B (R)
A B 1 2 4 2
R ? R3
A B C 1 2 3 1 2 5 4 2 3 4 2 5
R3 R1 NJN R2
R2 PJ B,C (R)
B C 2 3 2 5
36Lossless Decomposition
- Theorem
- Given a relation R with a schema S and a
decomposition ? S1, S2, ? is lossless iff S1
? S2 is a key of R1 or R2 - Example
- EMPLOYEE(NAME, POSITION, SALARY)
- NAME ? POSITION? SALARY
- Lossless decomposition
- ?1 R1(NAME, POSITION), R2(POSITION,
SALARY) - EMPLOYEE R1 NJN R2
37Lossless Decomposition
- Lossy decomposition
- ?2 R3(NAME, SALARY), R4(POSITION,
SALARY) - EMPLOYEE ? R3 NJN R4
- S3 ? S4 is not a key of R3 or R4
38Updating fragments
EMPLOYEE(NAME, POSITION, SALARY)
(Smith, Programmer, 12000)
PJ
R1 (NAME, POSITION) (Smith, Programmer)
(Smith, Senior Programmer)
R2 (NAME, SALARY) (Smith, 12000)
updating
JN
EMPLOYEE (Smith, Senior Programmer, 12000)
F NAME ? POSITION, POSITION ? SALARY F1
NAME ? POSITION F2 NAME ? SALARY
39Updating fragments
EMPLOYEE(NAME, POSITION, SALARY)
(Smith, Programmer, 12000)
PJ
R1 (NAME, POSITION) (Smith, Programmer)
(Smith, Senior Programmer)
R2 (POSITION, SALARY) (Programmer, 12000)
(Senior Programmer, 20000)
updating
NJN
EMPLOYEE (Smith, Senior Programmer, 20000)
40Mixed Fragmentation
- EMP(EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)
vertical fragment - EMPNUM, NAME, SAL, TAX
EMP
- EMPNUM, NAME, MGRNUM, DEPTNUM
EMP4
DEPTNUM
horizontal fragment
EMP1
EMP2
EMP3
DEPTNUM gt 20
10 lt DEPTNUM lt 20
41Mixed Fragmentation
- EMP1 SL DNUM lt 10 PJ ENUM, NAME, MGRNUM, DNUM
(EMP) - EMP2 SL 10 lt DNUM lt 20 PJ ENUM, NAME, MGRNUM,
DNUM (EMP) - EMP3 SL DNUM gt 20 PJ ENUM, NAME, MGRNUM, DNUM
(EMP) - EMP4 PJ ENUM, NAME, SAL, TAX (EMP)
- Reconstruction
- EMP UN (EMP1, EMP2, EMP3) NJNENUM PJENUM,
SAL, TAX EMP4
42Fragmentation Example
- EXAMPLE_DDB
- Global Schema
- EMP(EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNM)
- DEPT(DEPTNUM, NAME, AREA, MGRNUM)
- SUPPLIER(SNUM, NAME, CITY)
- SUPPLY(SNUM, PNUM, DEPTNUM, QUAN)
- DEPT relation, horizontal fragmented into 3
fragments on the value of the DEPTNUM attribute
43Fragmentation Example
- Fragmentation Schema
- EMP1 SL DNUM lt 10 PJ ENUM, NAME, MGRNUM, DNUM
(EMP) - EMP2 SL 10 lt DNUM lt 20 PJ ENUM, NAME, MGRNUM,
DNUM (EMP) - EMP3 SL DNUM gt 20 PJ ENUM, NAME, MGRNUM, DNUM
(EMP) - EMP4 PJ ENUM, NAME, SAL, TAX (EMP)
- DEPT1 SL DNUM lt 10 (DEPT)
- DEPT2 SL 10 lt DNUM lt 20 (DEPT)
- DEPT3 SL DNUM gt 20 (DEPT)
- SUPPLIER1 SL CITYSF (SUPPLIER)
- SUPPLIER2 SL CITYLA (SUPPLIER)
- SUPPLY1 SUPPLY SJSNUM SUPPLIER1
- SUPPLY2 SUPPLY SJSNUM SUPPLIER2