Distributed Database Management Systems

About This Presentation

Title:

Distributed Database Management Systems

Description:

Asha Haji. Introduction. Dan Freiman. Definitions. Distributed Database. A logically interrelated collection of shared data physically distributed over a ... – PowerPoint PPT presentation

Number of Views:415

Avg rating:3.0/5.0

Slides: 82

Provided by: userfsC

Category:

more less

Transcript and Presenter's Notes

Title: Distributed Database Management Systems

1
Distributed Database Management Systems

Dan Freiman
Jey Ping
Adam Greenbaum
Josh Simon
Asha Haji

2
Introduction

Dan Freiman

3
Definitions

Distributed Database
A logically interrelated collection of shared
data physically distributed over a network.
Data split into fragments (segment of data).
Distributed DBMS (DDBMS)
The software system that permits the management
of the distributed database and makes the
distribution transparent to users.
Fundamental Principle
Transparency in distributed system makes the
system appear centralized.

4
DDBMS Overview

Each site has a DBMS
Fragments (replicated or unique).
Linked by network.
Can handle local users.
Participates in at least one global requests.

5
Distributed Processing

A centralized database that can be accessed over
a computer network.
Data is not physically distributed.
Ex. webdev.

6
Multidatabase System (MDBS)

A DDBMS in which each site maintains complete
autonomy.
Export schema defines data accessible to remote
users.
Unfederated have no local users.
Federated have local and remote users.
MDBA maintains global schema, global transaction
management, and gateways.

7
DDBMS Types

Homogeneous
All sites use same DBMS.
Heterogeneous
Sites run different DBMSs.
DMBSs can have different data models.
Relational, Network, Object-Oriented, etc.

8
Homogeneous DDBMS

Easier to design and manage.
Allows for easy incremental growth.
Ex. Addition of a new site of existing nature.
Allows for easier parallel processing.

9
Heterogeneous DDBMS

Allows multiple independent databases to be
integrated into on distributed database.
Must provide transparency of heterogeneity.
Gateways
Translates all requests made in local language.
Translates the model of all data transmitted.
Do not support transaction management.
Difficult to homogenize schemas from different
models.

10
Database Interoperability Consortium (DBIOP)

Formed by Open Group as Specification Working
Group to develop
A common SQL API for programmers to interact with
any DBMS.
A common database protocol that enables universal
inter-DBMS communication without a gateway.
A common network protocol that allows connections
between any two DBMS.

11
Advantages of DDBMS

Location of Data
Reflects Organizational Structure
Each site has its own database containing
information about that site.
Improved Shareability and Local Autonomy
Users can access data at other sites.
Local users have control of local data.
Local policies are easier to enforce.

12
Advantages of DDBMS

Improved Operation
Improved Performance
Data is located near users of frequent use which
decreases communication costs.
Inherent parallel processing.
Improved Availability
Site failure only matters if system is trying to
access that site.
Improved Reliability
If site fails, request can be rerouted to
replicated data.

13
Advantages of DDBMS

Business Advantages
Economics
Several smaller computers may be cheaper than a
large mainframe.
Processing at remote site require less data in
communications.
Modular Growth
Very easy to add an office with a standard office
database.
Integration
Allows for combining of several legacy databases
into one DDBMS.
Different DBMS have different strengths best
suited for different requirements.
Remaining Competitive

14
Disadvantages of DDBMS

Operation and Maintenance
Complexity
Must handle fragmentation/replication data.
Database Design More Complex
Must design for fragmentation/replication data.
Integrity Control More Difficult
Constraints based on remote data has
communication costs and risks of communication
failure.

15
Disadvantages of DDBMS

Lack of Standards
No widely used communication standard for
databases.
No standardized tools or methodologies or convert
a centralized DBMS to a DDBMS.
Lack of Experience
Little industry experience to examine before
deciding when to use a DDBMS.

16
Disadvantages of DDBMS

Security
Replicated data must be controlled in multiple
locations.
Communications must be secure.
Cost
Higher maintenance, software, and possibly
hardware costs because of complexity.
Communication costs.
Higher labor costs.

17
DDBMS

Advantages
Reflects Organizational Structure
Improved Shareability and Local Autonomy
Improved Availability
Improved Reliability
Improved Performance
Economics
Modular Growth
Integration
Remaining Competitive

Disadvantages
Complexity
Cost
Security
Integrity control more difficult
Lack of Standards
Lack of Experience
Database Design More Complex

18
Architecture and Design

Jey Ping

19
Functions of a DDBMS

Extensions of the functions of a centralized DBMS
to handle distributed data
Data communication - between sites
Data storage, retrieval, update store
distribution details
User-accessible catalog distributed queries,
remote access
Authorization services
Concurrency control
Recovery services recover from failures of
individual sites or communication links

20
Reference Architecture
Global external schema
Global external schema
Global external schema
Global conceptual schema
Fragmentation schema
Allocation schema
Local mapping schema
Local mapping schema
Local mapping schema
Local conceptual schema
Local conceptual schema
Local conceptual schema
Local internal schema
Local internal schema
Local internal schema
21
Component Architecture

Each site has
Local DBMS component (LDBMS)
Local data storage, retrieval, update
Data Communications (DC)
Global System Catalog (GSC)
Distributed DBMS component (DDBMS)

22
Distributed Database Design Issues

Fragmentation
Allocation
Replication

23
Design information

Frequency transactions are run
The site from which a transaction is run
Performance criteria
The relations, attributes, tuples accessed
The type of access (read/write)
Predicates of read operations

24
Design objectives

Locality of reference
Improved reliability and availability
Acceptable performance
Balance storage capacities and costs
Minimal communication costs

25
Data Allocation Strategies

Centralized distributed processing
Fragmented
Complete replication
Selective replication

26
Centralized Design

Locality of reference Lowest
Reliability Availability Lowest
Performance Unsatisfactory
Storage Costs Lowest
Comm. Costs Highest

27
Fragmented Design

Locality of reference High
Reliability Availability Low/High
Performance Satisfactory
Storage Costs Lowest
Comm. Costs Low

28
Complete Replication Design

Locality of reference Highest
Reliability Availability Highest
Performance Read-optimized
Storage Costs Highest
Comm. Costs Read-low, update-high

29
Selective Replication Design

Locality of reference High
Reliability Availability Low/High
Performance Satisfactory
Storage Costs Average
Comm. Costs Low

30
Fragmentation

Advantages
Usage
Efficiency
Parallelism
Security
Disadvantages
Performance
Integrity

31
Fragmentation

Correctness rules
Completeness each item is found in at least one
fragment
Reconstruction it is possible to use a
relational operation to reconstruct the relation
Disjointness each item is found in only one
fragment to minimize redundancy

32
Dilberts Insight ?
33
Transparencies and Date's Rules

Adam Greenbaum

34
Dates 12 Rules
To the user, a distributed system shall looketh
exactly like a non-distributed system
35
Dates 12 Rules

Physical Independence
Data Independence
Functional Requirements

36
Physical Independence

1) Local autonomy local admin controls the
local DB completely
2) No central server no single point of failure
4) Location independence all sites look the
same from the users perspective
9) Hardware independence
10) OS independence

37
Data Independence

5) Fragmentation independence data divided
between sites arbitrarily
6) Replication independence data duplicated
arbitrarily
11) Network independence
12) Database independence supports different
DBMSs at each site

38
Functional Requirements

3) Continuous operation no downtime
7) Distributed query processing queries access
different sites simultaneously
8) Distributed transaction processing all sites
and system conform to ACID

39
Database Transparency

Transparency hide implementation details
Distributed DBMS looks exactly like a
non-distributed DBMS

40
Types of Transparencies

Distribution transparency
Transaction transparency
Performance transparency
DBMS transparency
DDBMS can span multiple vendor DBMSs (Oracle,
Postgre, etc)

All transparencies rarely implemented in single
distributed DBMS
41
Distribution Transparency

Database is a single, logical entity
Fragmentation transparency user does not know
the data is split up
Location transparency user does not know where
the data is

staffNo fname lname branchNo
Branch B001
Central Branch
staffNo position fname lname branchNo
staffNo fname lname branchNo
Branch B002
staffNo position
42
Location Transparency Example
Central Branch
SELECT fName, lName FROM Staff WHERE position
Manager
staffNo position
SELECT fName, lName FROM (SELECT FROM Staff
WHERE branchNo B001) WHERE staffNo IN SELECT
staffNo FROM (SELECT staffNo, position FROM Staff
WHERE position Manager) UNION SELECT fName,
lName FROM (SELECT FROM Staff WHERE branchNo
B002) WHERE staffNo IN SELECT staffNo
FROM (SELECT staffNo, position FROM Staff WHERE
position Manager)
Branch B001
Branch B002
staffNo fname lname branchNo
staffNo fname lname branchNo
43
Local Mapping Transparency
Central Branch
SELECT fName, lName FROM SELECT FROM Staff
WHERE branchNo B001 WHERE staffNo IN SELECT
staffNo FROM (SELECT staffNo, position FROM
Staff WHERE position Manager) UNION SELECT
fName, lName FROM SELECT FROM Staff WHERE
branchNo B002 WHERE staffNo IN SELECT
staffNo FROM (SELECT staffNo, position FROM
Staff WHERE position Manager)
AT Branch001
staffNo position
Branch B001
AT CentralBranch
Branch B002
AT Branch002
staffNo fname lname branchNo
staffNo fname lname branchNo
AT CentralBranch
44
Distribution Transparency, Cont.

Replication transparency user does not know
data is duplicated
Naming transparency each sub-DB must have
unique name
Map with name server (much like DNS)
Add aliases with scope
Local name
Ex LocalBranch
Globally unique identifier
Ex Manager_at_London.LocalBranch_at_Glasgow

45
Transaction Transparency

Maintain DB integrity and consistency
Atomicity is still the key
Divided into subtransactions represented by an
agent

staffNo fname lname branchNo
Branch B001
Central Branch
staffNo position fname lname branchNo
staffNo fname lname branchNo
Branch B002
staffNo position
46
Concurrency transaction

Each subtransaction occurs
concurrently
Each subtransaction is independent
Replication (need to propagate changes to all
copies)
Synchronization

47
Failure Transaction

Failure transaction
If one subtransaction fails and rolls back, what
happens to entire transaction/other
subtransactions?
Loss of a message
Failure of a communications link
Failure of entire site
Network partitioning

48
Performance Transparency

DDBMS must respond like a non-distributed system
Similar to query optimizing by query processor,
except need to consider
Which fragment?
Which copy of the fragment, if replication?
Which location?
Must take into account
I/O bottlenecks of each system
CPU time on each
Communication cost over links (dominant)

49
Transparency Overview
staffNo fname lname position
staffNo salary bonus
DDBMS
staffNo spouse dependent policyNum
50
Problems with Transparency

Gray (1989)
Poor message performance
Poor management
Poor modularity

51
Advanced Topics

Josh Simon

52
Wait! Ive seen all this before!

This section covers things we have done before,
but with a distributed aspect to it
Distributed Transaction Management
Distributed Currency Control
Distributed Deadlock Management
Distributed Query Optimization

53
Distributed Transaction Management

Just like standalone, we have
Transaction manager, scheduler, recovery manager,
buffer manager
Unlike standalone, we have
Global transaction manager (Transaction
Coordination)

54
Distributed Currency Control

Objectives
Be resilient to site and communication failure
Permit parallelism to satisfy performance reqs
Incur modest computational/storage overhead
Perform in a network with a lot of network delay
Place few constraints on the structure of atomic
actions

55
Old Stuff and New Stuff

Lost Update Problem
Uncommitted Dependency
Inconsistent Analysis
Multiple Copy Consistency Problem
Can occur if the DB is replicated
Updates can be done synchronously or
asynchronously

56
Distributed Serializeability

We use a global schedule
Union of all of the local schedules
Locking guarantees
Timestamping guarantees
If the DB is fragmented, but not replicated then
all of the protocols for a single DB can be used

57
Locking Protocols

Centralized 2PL
Primary Copy 2PL
Distributed 2PL
Majority Rule

58
Distributed 2PL

Each site has its own lock mananger
If data is not replicated, then is the same as
primary copy 2PL
Else, 2PL implements Read-One-Write-All Protocol
(ROWA)
Any copy can be read, but all copies must be
locked before an item can be updated
This avoid the drawbacks of centralized control
Basically an extension of Primary Copy 2PL
Deadlock handling is more complex, and
communication costs are higher than primary copy
2PL

59
Majority Locking

When a transaction wants to read or write, it
must obtain the lock on more than 50 of the
sites
Essentially the same as Distributed 2PL, but it
does not rely on ROWA, so it is more efficient.
This plan avoid the drawbacks of centralized
control, but is a more complicated protocol, with
more complicated deadlock.

60
Timestamp Protocols

Exactly the same as before, except we add a
location identifier to the stamp.
Timestamps with older timestamps get priority
In a conflict (two different sites have the same
timestamp), tie goes to the earlier location

61
Distributed Deadlock Management

Centralized Deadlock Detection
A single site is appointed the Deadlock Detection
Coordinator (DDC)
Hierarchical Deadlock Detection
Every site sends its local WFG to the deadlock
detection side above it.
Distributed Deadlock Detection

62
Hierarchical Deadlock Detection
1234
34
12
1
2
3
4
63
Distributed Database Recovery

Why things fail
Loss of message
Failure of communications link
Failure of a site
Choosing the correct time-out value is difficult

64
How Failure Affects Recovery

Abort any transactions affected by failure
Flag the site as failed
Check periodically to see if site has recovered
On restart, the site must initiate recovery
procedures
After local recovery, the site must update its
copy to make it consistent with the rest of the
system
If a network partition occurs, the DDBMS must
ensure that communication is reestablished

65
Distributed Recovery Protocols

A failure at one site should not leave other
sites hanging.
This is called non-blocking
Every global transaction should have a
coordinator or transaction manager
All of the sites at which the transaction occurs,
are called participants

66
Two-Phase Commit

Voting Phase and a Decision Phase
The coordinator asks all of the sites if they are
ready to commit. Any single site can abort
(unilateral abort), which aborts for everyone.
Coordinator sends PREPARE. Everyone responds
either READY_COMMIT or ABORT
If ABORT is received, Coordinator sends a
GLOBAL ABORT
If everyone sends READY_COMMIT, Coordinator
sends a GLOBAL COMMIT
If a participant times out, it defaults to abort
At this point, termination and recovery protocols
are implemented.
3 PHASE COMMIT

67
Protocols

Termination Protocols
Coordinator or Participant can time out
Recovery Protocols
Coordinator can fail in Initial, Waiting, Decided
state
Participant can fail in Initial, Prepared,
Aborted/Committed state
If the participants lose their coordinator, they
can elect a new one

68
Network Partitioning

Problems of identifying updates
Maintaining integrity
2 separate groups that cant talk to each other

69
Distributed Query Processing

Query Decomposition
Takes a query expressed on the global relation
and performs partial opimization
Data Localization
Takes into account how the data is distributed.
Replaces global relations with their
reconstruction algorithms
Global Optimization
Optimization strategy which uses the DDBMS to
make an execution strategy based on the
fragments. Parts of the query are sent to the
local DBMSs
Local Optimization
Optimization described in the last lecture done
at the local site level.

70
Special TopicMobile Agents for Wireless Access
to Distributed Databases

Asha Haji

71
Setbacks of Wireless for DDBMS

With wireless, DDBMS no longer necessarily
provides
Fast, reliable, cheap communication
Robust and research rich servers
Stationary and fixed locations of participants

72
What are mobile agents?

Encapsulated processes dispatched from a source
computer to accomplish a specific task
data and execution state
autonomous from sender
can self-transport, spawn, interact

73
And theyre so good, because?

Computation pushed from host to fixed network
Migrate to mobile clients
Flexibility in shifting load to and from mobile
host depending on bandwidth

74
How can I use mobile agents?

Proposed models for mobile computation
client/server (no mobile agents!)
mobile agent model
client/agent/server model
client/intercept/server model

75
The Bad Client-Server Model

What weve always seen!
Client downloads JDBC driver with every single
database request
Ridiculous amount of overhead!
A JDBC driver can range from 300-500Kb!

76
The Not as Bad Mobile Agent Model

Client deploys mobile agent instead
Mobile agent requests database access on
server-side
Brings back the goods to the client
Still overhead generate agent with every
request!

77
The Better Client/Server/Agent Model

Deploy a database agent to server, have all
traffic go through it
Serve multiple clients
Connection for duration of application
You can also
Have a messenger agent go back and forth between
client and database agent with results and
requests
Or send two types of messages results and
requests

78
The Also Better Client/Server/Intercept Model

Deploy database agent to network, but also create
client-side agent
Whats the point of the client-side agent?
intercept and control communication to reduce
network traffic and query processing

79
The Benefits of Mobile Agents

Significant increase in performance compared to
client/server model
Increased ability to follow mobile clients
Dual role as computation mechanism but also
network communication aide

80
Mobile agent platforms out there

Non-Java and Java-based platforms
Some available platforms include
IBMAglets Workbench
ObjectSpaces Voyager
Mitsubishis Concordia
IKV Grasshopper
General Magics Odyssey
TACOMA
Agent Tcl

81
References

Connolly T. and Begg C. (2005). Database Systems
A Practical Approach to Design, Implementation,
and Management 4th ed. New York, NY Addison
Wesley. Chapters 22-24.
Database Interoperability Consortium
http//www.opengroup.org/dbiop/
Multidatabase Systems
Sheth A. and Larson J.L. (1990). Federated
Databases Architectures and Integration. ACM
Computing Surv., Special Issue on Heterogeneous
Databases, 22(3), 183-236.
Allocation
Ozsu M. and Valduriez P. (1999). Principles of
Distributed Systems 2nd ed. Englewood Cliffs, NJ
Prentice-Hall.
Disadvantages
Gray, Jim. TRANSPARENCY IN ITS PLACE -- The Case
Against Transparent Access to Geographically
Distributed Data http//research.microsoft.com/g
ray/papers/Transparency.doc
Özsu, M. and Patrick Valduriez. Distributed Data
Management Unsolved Problems and New Issues.
http//db.uwaterloo.ca/ddbms/publications/ozsu/di
stdb/UnsolvedIssues.ps