Objectives of these slides: - PowerPoint PPT Presentation

1 / 49

About This Presentation

Title:

Objectives of these slides:

Description:

A distributed database dictionary, directory and catalog are essential, probably ... how to bring a server back online so that it contains the same version of its ... – PowerPoint PPT presentation

Number of Views:22

Avg rating:3.0/5.0

Slides: 50

Provided by: AndrewD86

Category:

more less

Transcript and Presenter's Notes

Title: Objectives of these slides:

1
12. Distributed Data Management

Objectives of these slides
examine the main issues related to data
distribution

2
Overview

1. Data Distribution
2. The Relational Data Model
3. Methods of Data Distribution
4. Data Placement Considerations
5. Distributed Data Access
6. Database Transaction Management

3
1. Data Distribution

Data is distributed to multiple sites (servers)
main aim is to make the distribution transparent
to the end users and the business logic inside
applications
The DBMS (database management system) also should
be distributed.
How and when should data be distributed?
What are the best locations for data?

4
Advantages

Efficient data access
Application load balancing
Application and end-user scalability
Matches international business models
Replication of data makes failure less critical

5
Disadvantages

Management complexity
Data synchronization
transaction processing

6
2. The Relational Data Model

SQL (Structured Query Language)
DDL (Data Definition Language)
DML (Data Manipulation Language)

Customer Table
SS
Saving Acct
Check Acct
Name
ad
12345
67823
3442

Savings
Acct
Bal
SS
67823
3442
23

7
3. Methods of Data Distribution

3.1. Manual Extraction (read-only)

Branch B1
Customer
Exchange
Savings
copy (push or pull)
Exchange
Branch B2
Exchange
Central Office
8
3.2. Snapshot (read-only)

create snapshot lttable-namegt as select time
lthhmmssgt, interval lthhmmssgt
Frequency?
e.g. exchange rate changes once per day

9
3.3. Replication

How is data consistency maintained between the
original and its copies?
synchronous or asynchronous updates
Replication management
Location transparency
the aim is to make the user/application think
that there is a single table at a single site

10
3.4. Fragmentation
Customer Table
SS
Saving Acct
Name

Horizontal

ad
12345
3442

Customer Table

set
34256
5001
SS
Saving Acct
Name

ad
12345
3442

set
34256
5001
po
00912

5002

jim
78123

9887
11

Vertical

Accounts
Saving Acct

Customer Table

SS
Saving Acct
12345
Name

ad
12345
3442

34256

00912

set
34256
5001

po
00912

5002
78123

jim
78123

9887
12
4. Data Placement Considerations

Increase local autonomy
Decrease network traffic (to/from original)
Data type
read/write, size, local use
Cost of distribution/replication
for data and application code
Data management

13
Data Location Analysis

Assign costs to site read and update messages
per day.
Distribute the data to reduce the cost of the
network message traffic.

14
Read/Update Messages/day
Site
S1
B1
2000
1000
Read
500
100
Update
15
Data Location Analysis Matrix
Location of Data
MessageTraffic
S1
B1
Configuration
?
1100
1
?
2500
?
?
2
?
600
?
3
16
5. Distributed Data Access

Answer queries by accessing data located on other
machines.
The (distributed) DBMS tries to hide the physical
location of the data.

17
5.1. Remote Request

The client application issues a single data
request to be processed at a single remote server.

SQL query
Customer
Client
Server1
continued
18

SQL query
select from server1.bankDB.customer where
server1.bankDB.customer.city Bangkok
Queries like this can be used for manual
extraction and snapshots.
User-level queries should not need to specify
server1.

19
5.2. Remote Transaction

The client application issues multiple data
requests (all part of a single transaction) to a
single remote server.

begin work
SQL query 1
Customer
Client

SQL query 2
Branch
commit
Server1
continued
20

SQL query
begin work select from server1.bankDB.custome
r where server1.bankDB.customer.city
Bangkok update server1.bankDB.branch
set posted_ind yescommit work

21
5.3. Distributed Transaction

The client application issues multiple data
requests (all part of one distributed
transaction) to multiple remote servers.

begin work
SQL query 1
Employee
Medical
SQL query 2
Client
ServerB1
ServerM
commit
continued
22

SQL query
begin work select from serverB1.bankDB.employ
ee where serverB1.bankDB.employee.edLevel
mba select from
serverM.medDB.medical where
serverM.medDB.medical.branch B1 commit
work

23
5.4. Distributed Request

The client application issues multiple data
requests (all part of one distributed
transaction) to multiple remote servers and each
request can refer to multiple sites.

continued
24
Employee
begin work
Client
SQL query 1
ServerB1
join
commit
SQL query 2
Medical
Branch
ServerM
Server1
continued
25

SQL query
begin work select from serverB1.bankDB.employ
ee B1, serverM.bankDB.medical M where
B1.ID M.ID and B1.edLevel mba
update server1.bankDB.branch set
reported yes where server1.bankDB.branch
B1 commit work

26
6. Database Transaction Management

A database transaction should possess the
properties
atomicity
consistency
isolation
durability
serializability
A transaction is an all or nothing task.

continued
27

SQL contains the commands COMMIT and ROLLBACK for
explicit transaction management.
Implicit transaction management
transaction completion ? COMMIT
transaction failure ? ROLLBACK

28
The Two-phase Commit Protocol

A coordinator (which may be the client) sends its
queries to the servers, and waits for replies
the prepare phase
If a server can answer its query then it responds
with an ok (or similar).

continued
29

If all the servers respond with ok, then the
coordinator tells them to commit
the commit phase
If any of the servers do not respond with ok,
then the coordinator tells all the servers to
rollback.

30
Distributed Two-phase Commit
Tree master(coordinator)
Syncpointresponses
Syncpointrequest
Slave/Master
Slave/Master
Slave
Slave
31
Deadlock

Updates are usually carried out by locking the
record (or group of records) being changed.
Two (or more) applications can become deadlocked
if they wait for each other to release a lock.

db1
c1
1. lock
2. wait
lock record
2. wait
db2
c2
lock record
1. lock
32
Deadlock Detection

Automatic time-out followed by transaction
failure (and retry).
Asynchronous updates.
Deadlock avoidance algorithms.

33
13. Distributed Data Management System Design

Objectives of these slides
examine some of the issues related to distributed
database management design

34
The Main Problem

How to obtain information from a system that is
distributed?
e.g. where is the customer database?
e.g. what queries does the customer database
understand?

Overview
1. Metadata 2. Dates DDBMS Rules 3. Other
Issues
35
1. Metadata

Information about a databases resources
the data types, attributes, keys, indexes, etc.
stored in a database dictionary
Information about database location, machine, and
network characteristics
stored in a database directory
Relational DBMS sometimes hold information about
the relationships between databases
stored in a system catalog.

36
Benefits of Metadata

improved project management
reduced maintenance costs
automated information sharing
queries can be made data location independent

37
Drawbacks

the difficulty of creating metadata for legacy
systems
heterogeneous metadata formats

38
Centralised or Distributed Metadata?

Centralised
simple to manage, maintain, etc.
a bottleneck
a single point of failure
Distributed
distribution issues (e.g. how to divide, where to
locate)
data synchronisation

39
2. Dates DDBMS Rules

2.1. Local Autonomy
The local DBMS should manage its own security,
data integrity, data consistency, locking, and
recovery.
2.2. No Reliance on a Central Site
But an occasional coordinator is still
required, for example for two-phase commits.
2.3. Continuous Operation
This includes online backups, archiving, recovery.

2.4. Data Location Independence
Users and business logic should be unaware that
the data is distributed (they see a single local
database).
A distributed database dictionary, directory and
catalog are essential, probably replicated at
every site and synchronised.

2.5. Data Fragmentation Independence
Users and business logic should be unaware of
fragmentation (they see a single database).
At the DBMS level, SQL queries may require joins
(to combine vertical fragments) and unions (to
combine horizontal fragments).

2.6. Data Replication Independence
Replication is transparent to users and business
logic.
This requires the DBMS to use a hidden two-phase
commit protocol (or similar) when performing
transactions.

continued
43

Code transformation
update bankDB.savings set amt amt -
withdrawal where bankDB.savings.acctno
12345
becomes
begin work update serverB1.bankDB.savings
set amt amt - withdrawal where
serverB1.bankDB.savings.acctno 12345
update serverB2.bankDB.savings set amt amt
- withdrawal where serverB2.bankDB.savings.a
cctno 12345commit work

2.7. Distributed Query Processing
The DBMS optimises the users query
rule-based optimisations
cost-based optimisations
Example query
select serverS1.bankDB.employees E,
serverS2.bankDB.managers M where E.ID M.ID
and E.edLevel MBA and
M.edLevel MBA

continued
45
Server S1
DDM
employees
catalog
100,000 rows
SQLrequest
managers
client
catalog
Server S2
1,000 rows
continued
46

Rule-based Optimisation
specify an execution order for the parts of a
query
e.g. search employees first (100,000 records) and
then compare with managers (1,000 records)
Cost-based Optimisation
Estimate the cost of a query by considering
the number of rows being processed
the data properties (e.g. keys, types)
machine/network speeds
I/O operations
e.g. copy the managers DB to the employees server
and join the databases

2.8. Distributed Transaction Management
ACIDS properties
distributed locking
deadlock detection
Some commercial products (see chapter 13, Bersen
for details)
IBM CICS (Customer Information Control System)
Novell TUXEDO
Transarc Encina

2.9. Hardware Independence
2.10. OS Indepedence
2.11. Network Independence
2.12. DBMS Independence

49
3. Other Issues

Administration
backups, recovery, DB creation and modification
Security
protection of data from illegal disclosure,
modification, or destruction
access rights, passwords, encryption
very much harder in a distributed environment
Currency control
how to bring a server back online so that it
contains the same version of its database as on
other servers?
the aim is to make the database current with its
other versions