Distributed Database Management Systems - PowerPoint PPT Presentation

1 / 81
About This Presentation
Title:

Distributed Database Management Systems

Description:

Asha Haji. Introduction. Dan Freiman. Definitions. Distributed Database. A logically interrelated collection of shared data physically distributed over a ... – PowerPoint PPT presentation

Number of Views:415
Avg rating:3.0/5.0
Slides: 82
Provided by: userfsC
Category:

less

Transcript and Presenter's Notes

Title: Distributed Database Management Systems


1
Distributed Database Management Systems
  • Dan Freiman
  • Jey Ping
  • Adam Greenbaum
  • Josh Simon
  • Asha Haji

2
Introduction
  • Dan Freiman

3
Definitions
  • Distributed Database
  • A logically interrelated collection of shared
    data physically distributed over a network.
  • Data split into fragments (segment of data).
  • Distributed DBMS (DDBMS)
  • The software system that permits the management
    of the distributed database and makes the
    distribution transparent to users.
  • Fundamental Principle
  • Transparency in distributed system makes the
    system appear centralized.

4
DDBMS Overview
  • Each site has a DBMS
  • Fragments (replicated or unique).
  • Linked by network.
  • Can handle local users.
  • Participates in at least one global requests.

5
Distributed Processing
  • A centralized database that can be accessed over
    a computer network.
  • Data is not physically distributed.
  • Ex. webdev.

6
Multidatabase System (MDBS)
  • A DDBMS in which each site maintains complete
    autonomy.
  • Export schema defines data accessible to remote
    users.
  • Unfederated have no local users.
  • Federated have local and remote users.
  • MDBA maintains global schema, global transaction
    management, and gateways.

7
DDBMS Types
  • Homogeneous
  • All sites use same DBMS.
  • Heterogeneous
  • Sites run different DBMSs.
  • DMBSs can have different data models.
  • Relational, Network, Object-Oriented, etc.

8
Homogeneous DDBMS
  • Easier to design and manage.
  • Allows for easy incremental growth.
  • Ex. Addition of a new site of existing nature.
  • Allows for easier parallel processing.

9
Heterogeneous DDBMS
  • Allows multiple independent databases to be
    integrated into on distributed database.
  • Must provide transparency of heterogeneity.
  • Gateways
  • Translates all requests made in local language.
  • Translates the model of all data transmitted.
  • Do not support transaction management.
  • Difficult to homogenize schemas from different
    models.

10
Database Interoperability Consortium (DBIOP)
  • Formed by Open Group as Specification Working
    Group to develop
  • A common SQL API for programmers to interact with
    any DBMS.
  • A common database protocol that enables universal
    inter-DBMS communication without a gateway.
  • A common network protocol that allows connections
    between any two DBMS.

11
Advantages of DDBMS
  • Location of Data
  • Reflects Organizational Structure
  • Each site has its own database containing
    information about that site.
  • Improved Shareability and Local Autonomy
  • Users can access data at other sites.
  • Local users have control of local data.
  • Local policies are easier to enforce.

12
Advantages of DDBMS
  • Improved Operation
  • Improved Performance
  • Data is located near users of frequent use which
    decreases communication costs.
  • Inherent parallel processing.
  • Improved Availability
  • Site failure only matters if system is trying to
    access that site.
  • Improved Reliability
  • If site fails, request can be rerouted to
    replicated data.

13
Advantages of DDBMS
  • Business Advantages
  • Economics
  • Several smaller computers may be cheaper than a
    large mainframe.
  • Processing at remote site require less data in
    communications.
  • Modular Growth
  • Very easy to add an office with a standard office
    database.
  • Integration
  • Allows for combining of several legacy databases
    into one DDBMS.
  • Different DBMS have different strengths best
    suited for different requirements.
  • Remaining Competitive

14
Disadvantages of DDBMS
  • Operation and Maintenance
  • Complexity
  • Must handle fragmentation/replication data.
  • Database Design More Complex
  • Must design for fragmentation/replication data.
  • Integrity Control More Difficult
  • Constraints based on remote data has
    communication costs and risks of communication
    failure.

15
Disadvantages of DDBMS
  • Lack of Standards
  • No widely used communication standard for
    databases.
  • No standardized tools or methodologies or convert
    a centralized DBMS to a DDBMS.
  • Lack of Experience
  • Little industry experience to examine before
    deciding when to use a DDBMS.

16
Disadvantages of DDBMS
  • Security
  • Replicated data must be controlled in multiple
    locations.
  • Communications must be secure.
  • Cost
  • Higher maintenance, software, and possibly
    hardware costs because of complexity.
  • Communication costs.
  • Higher labor costs.

17
DDBMS
  • Advantages
  • Reflects Organizational Structure
  • Improved Shareability and Local Autonomy
  • Improved Availability
  • Improved Reliability
  • Improved Performance
  • Economics
  • Modular Growth
  • Integration
  • Remaining Competitive
  • Disadvantages
  • Complexity
  • Cost
  • Security
  • Integrity control more difficult
  • Lack of Standards
  • Lack of Experience
  • Database Design More Complex

18
Architecture and Design
  • Jey Ping

19
Functions of a DDBMS
  • Extensions of the functions of a centralized DBMS
    to handle distributed data
  • Data communication - between sites
  • Data storage, retrieval, update store
    distribution details
  • User-accessible catalog distributed queries,
    remote access
  • Authorization services
  • Concurrency control
  • Recovery services recover from failures of
    individual sites or communication links

20
Reference Architecture
Global external schema
Global external schema
Global external schema
Global conceptual schema
Fragmentation schema
Allocation schema
Local mapping schema
Local mapping schema
Local mapping schema
Local conceptual schema
Local conceptual schema
Local conceptual schema
Local internal schema
Local internal schema
Local internal schema
21
Component Architecture
  • Each site has
  • Local DBMS component (LDBMS)
  • Local data storage, retrieval, update
  • Data Communications (DC)
  • Global System Catalog (GSC)
  • Distributed DBMS component (DDBMS)

22
Distributed Database Design Issues
  • Fragmentation
  • Allocation
  • Replication

23
Design information
  • Frequency transactions are run
  • The site from which a transaction is run
  • Performance criteria
  • The relations, attributes, tuples accessed
  • The type of access (read/write)
  • Predicates of read operations

24
Design objectives
  • Locality of reference
  • Improved reliability and availability
  • Acceptable performance
  • Balance storage capacities and costs
  • Minimal communication costs

25
Data Allocation Strategies
  • Centralized distributed processing
  • Fragmented
  • Complete replication
  • Selective replication

26
Centralized Design
  • Locality of reference Lowest
  • Reliability Availability Lowest
  • Performance Unsatisfactory
  • Storage Costs Lowest
  • Comm. Costs Highest

27
Fragmented Design
  • Locality of reference High
  • Reliability Availability Low/High
  • Performance Satisfactory
  • Storage Costs Lowest
  • Comm. Costs Low

28
Complete Replication Design
  • Locality of reference Highest
  • Reliability Availability Highest
  • Performance Read-optimized
  • Storage Costs Highest
  • Comm. Costs Read-low, update-high

29
Selective Replication Design
  • Locality of reference High
  • Reliability Availability Low/High
  • Performance Satisfactory
  • Storage Costs Average
  • Comm. Costs Low

30
Fragmentation
  • Advantages
  • Usage
  • Efficiency
  • Parallelism
  • Security
  • Disadvantages
  • Performance
  • Integrity

31
Fragmentation
  • Correctness rules
  • Completeness each item is found in at least one
    fragment
  • Reconstruction it is possible to use a
    relational operation to reconstruct the relation
  • Disjointness each item is found in only one
    fragment to minimize redundancy

32
Dilberts Insight ?
33
Transparencies and Date's Rules
  • Adam Greenbaum

34
Dates 12 Rules
To the user, a distributed system shall looketh
exactly like a non-distributed system
35
Dates 12 Rules
  • Physical Independence
  • Data Independence
  • Functional Requirements

36
Physical Independence
  • 1) Local autonomy local admin controls the
    local DB completely
  • 2) No central server no single point of failure
  • 4) Location independence all sites look the
    same from the users perspective
  • 9) Hardware independence
  • 10) OS independence

37
Data Independence
  • 5) Fragmentation independence data divided
    between sites arbitrarily
  • 6) Replication independence data duplicated
    arbitrarily
  • 11) Network independence
  • 12) Database independence supports different
    DBMSs at each site

38
Functional Requirements
  • 3) Continuous operation no downtime
  • 7) Distributed query processing queries access
    different sites simultaneously
  • 8) Distributed transaction processing all sites
    and system conform to ACID

39
Database Transparency
  • Transparency hide implementation details
  • Distributed DBMS looks exactly like a
    non-distributed DBMS

40
Types of Transparencies
  • Distribution transparency
  • Transaction transparency
  • Performance transparency
  • DBMS transparency
  • DDBMS can span multiple vendor DBMSs (Oracle,
    Postgre, etc)

All transparencies rarely implemented in single
distributed DBMS
41
Distribution Transparency
  • Database is a single, logical entity
  • Fragmentation transparency user does not know
    the data is split up
  • Location transparency user does not know where
    the data is

staffNo fname lname branchNo
Branch B001
Central Branch
staffNo position fname lname branchNo
staffNo fname lname branchNo
Branch B002
staffNo position
42
Location Transparency Example
Central Branch
SELECT fName, lName FROM Staff WHERE position
Manager
staffNo position
SELECT fName, lName FROM (SELECT FROM Staff
WHERE branchNo B001) WHERE staffNo IN SELECT
staffNo FROM (SELECT staffNo, position FROM Staff
WHERE position Manager) UNION SELECT fName,
lName FROM (SELECT FROM Staff WHERE branchNo
B002) WHERE staffNo IN SELECT staffNo
FROM (SELECT staffNo, position FROM Staff WHERE
position Manager)
Branch B001
Branch B002
staffNo fname lname branchNo
staffNo fname lname branchNo
43
Local Mapping Transparency
Central Branch
SELECT fName, lName FROM SELECT FROM Staff
WHERE branchNo B001 WHERE staffNo IN SELECT
staffNo FROM (SELECT staffNo, position FROM
Staff WHERE position Manager) UNION SELECT
fName, lName FROM SELECT FROM Staff WHERE
branchNo B002 WHERE staffNo IN SELECT
staffNo FROM (SELECT staffNo, position FROM
Staff WHERE position Manager)
AT Branch001
staffNo position
Branch B001
AT CentralBranch
Branch B002
AT Branch002
staffNo fname lname branchNo
staffNo fname lname branchNo
AT CentralBranch
44
Distribution Transparency, Cont.
  • Replication transparency user does not know
    data is duplicated
  • Naming transparency each sub-DB must have
    unique name
  • Map with name server (much like DNS)
  • Add aliases with scope
  • Local name
  • Ex LocalBranch
  • Globally unique identifier
  • Ex Manager_at_London.LocalBranch_at_Glasgow

45
Transaction Transparency
  • Maintain DB integrity and consistency
  • Atomicity is still the key
  • Divided into subtransactions represented by an
    agent

staffNo fname lname branchNo
Branch B001
Central Branch
staffNo position fname lname branchNo
staffNo fname lname branchNo
Branch B002
staffNo position
46
Concurrency transaction
  • Each subtransaction occurs
  • concurrently
  • Each subtransaction is independent
  • Replication (need to propagate changes to all
    copies)
  • Synchronization

47
Failure Transaction
  • Failure transaction
  • If one subtransaction fails and rolls back, what
    happens to entire transaction/other
    subtransactions?
  • Loss of a message
  • Failure of a communications link
  • Failure of entire site
  • Network partitioning

48
Performance Transparency
  • DDBMS must respond like a non-distributed system
  • Similar to query optimizing by query processor,
    except need to consider
  • Which fragment?
  • Which copy of the fragment, if replication?
  • Which location?
  • Must take into account
  • I/O bottlenecks of each system
  • CPU time on each
  • Communication cost over links (dominant)

49
Transparency Overview
staffNo fname lname position
staffNo salary bonus
DDBMS
staffNo spouse dependent policyNum
50
Problems with Transparency
  • Gray (1989)
  • Poor message performance
  • Poor management
  • Poor modularity

51
Advanced Topics
  • Josh Simon

52
Wait! Ive seen all this before!
  • This section covers things we have done before,
    but with a distributed aspect to it
  • Distributed Transaction Management
  • Distributed Currency Control
  • Distributed Deadlock Management
  • Distributed Query Optimization

53
Distributed Transaction Management
  • Just like standalone, we have
  • Transaction manager, scheduler, recovery manager,
    buffer manager
  • Unlike standalone, we have
  • Global transaction manager (Transaction
    Coordination)

54
Distributed Currency Control
  • Objectives
  • Be resilient to site and communication failure
  • Permit parallelism to satisfy performance reqs
  • Incur modest computational/storage overhead
  • Perform in a network with a lot of network delay
  • Place few constraints on the structure of atomic
    actions

55
Old Stuff and New Stuff
  • Lost Update Problem
  • Uncommitted Dependency
  • Inconsistent Analysis
  • Multiple Copy Consistency Problem
  • Can occur if the DB is replicated
  • Updates can be done synchronously or
    asynchronously

56
Distributed Serializeability
  • We use a global schedule
  • Union of all of the local schedules
  • Locking guarantees
  • Timestamping guarantees
  • If the DB is fragmented, but not replicated then
    all of the protocols for a single DB can be used

57
Locking Protocols
  • Centralized 2PL
  • Primary Copy 2PL
  • Distributed 2PL
  • Majority Rule

58
Distributed 2PL
  • Each site has its own lock mananger
  • If data is not replicated, then is the same as
    primary copy 2PL
  • Else, 2PL implements Read-One-Write-All Protocol
    (ROWA)
  • Any copy can be read, but all copies must be
    locked before an item can be updated
  • This avoid the drawbacks of centralized control
  • Basically an extension of Primary Copy 2PL
  • Deadlock handling is more complex, and
    communication costs are higher than primary copy
    2PL

59
Majority Locking
  • When a transaction wants to read or write, it
    must obtain the lock on more than 50 of the
    sites
  • Essentially the same as Distributed 2PL, but it
    does not rely on ROWA, so it is more efficient.
  • This plan avoid the drawbacks of centralized
    control, but is a more complicated protocol, with
    more complicated deadlock.

60
Timestamp Protocols
  • Exactly the same as before, except we add a
    location identifier to the stamp.
  • Timestamps with older timestamps get priority
  • In a conflict (two different sites have the same
    timestamp), tie goes to the earlier location

61
Distributed Deadlock Management
  • Centralized Deadlock Detection
  • A single site is appointed the Deadlock Detection
    Coordinator (DDC)
  • Hierarchical Deadlock Detection
  • Every site sends its local WFG to the deadlock
    detection side above it.
  • Distributed Deadlock Detection

62
Hierarchical Deadlock Detection
1234
34
12
1
2
3
4
63
Distributed Database Recovery
  • Why things fail
  • Loss of message
  • Failure of communications link
  • Failure of a site
  • Choosing the correct time-out value is difficult

64
How Failure Affects Recovery
  • Abort any transactions affected by failure
  • Flag the site as failed
  • Check periodically to see if site has recovered
  • On restart, the site must initiate recovery
    procedures
  • After local recovery, the site must update its
    copy to make it consistent with the rest of the
    system
  • If a network partition occurs, the DDBMS must
    ensure that communication is reestablished

65
Distributed Recovery Protocols
  • A failure at one site should not leave other
    sites hanging.
  • This is called non-blocking
  • Every global transaction should have a
    coordinator or transaction manager
  • All of the sites at which the transaction occurs,
    are called participants

66
Two-Phase Commit
  • Voting Phase and a Decision Phase
  • The coordinator asks all of the sites if they are
    ready to commit. Any single site can abort
    (unilateral abort), which aborts for everyone.
  • Coordinator sends PREPARE. Everyone responds
    either READY_COMMIT or ABORT
  • If ABORT is received, Coordinator sends a
    GLOBAL ABORT
  • If everyone sends READY_COMMIT, Coordinator
    sends a GLOBAL COMMIT
  • If a participant times out, it defaults to abort
  • At this point, termination and recovery protocols
    are implemented.
  • 3 PHASE COMMIT

67
Protocols
  • Termination Protocols
  • Coordinator or Participant can time out
  • Recovery Protocols
  • Coordinator can fail in Initial, Waiting, Decided
    state
  • Participant can fail in Initial, Prepared,
    Aborted/Committed state
  • If the participants lose their coordinator, they
    can elect a new one

68
Network Partitioning
  • Problems of identifying updates
  • Maintaining integrity
  • 2 separate groups that cant talk to each other

69
Distributed Query Processing
  • Query Decomposition
  • Takes a query expressed on the global relation
    and performs partial opimization
  • Data Localization
  • Takes into account how the data is distributed.
    Replaces global relations with their
    reconstruction algorithms
  • Global Optimization
  • Optimization strategy which uses the DDBMS to
    make an execution strategy based on the
    fragments. Parts of the query are sent to the
    local DBMSs
  • Local Optimization
  • Optimization described in the last lecture done
    at the local site level.

70
Special TopicMobile Agents for Wireless Access
to Distributed Databases
  • Asha Haji

71
Setbacks of Wireless for DDBMS
  • With wireless, DDBMS no longer necessarily
    provides
  • Fast, reliable, cheap communication
  • Robust and research rich servers
  • Stationary and fixed locations of participants

72
What are mobile agents?
  • Encapsulated processes dispatched from a source
    computer to accomplish a specific task
  • data and execution state
  • autonomous from sender
  • can self-transport, spawn, interact

73
And theyre so good, because?
  • Computation pushed from host to fixed network
  • Migrate to mobile clients
  • Flexibility in shifting load to and from mobile
    host depending on bandwidth

74
How can I use mobile agents?
  • Proposed models for mobile computation
  • client/server (no mobile agents!)
  • mobile agent model
  • client/agent/server model
  • client/intercept/server model

75
The Bad Client-Server Model
  • What weve always seen!
  • Client downloads JDBC driver with every single
    database request
  • Ridiculous amount of overhead!
  • A JDBC driver can range from 300-500Kb!

76
The Not as Bad Mobile Agent Model
  • Client deploys mobile agent instead
  • Mobile agent requests database access on
    server-side
  • Brings back the goods to the client
  • Still overhead generate agent with every
    request!

77
The Better Client/Server/Agent Model
  • Deploy a database agent to server, have all
    traffic go through it
  • Serve multiple clients
  • Connection for duration of application
  • You can also
  • Have a messenger agent go back and forth between
    client and database agent with results and
    requests
  • Or send two types of messages results and
    requests

78
The Also Better Client/Server/Intercept Model
  • Deploy database agent to network, but also create
    client-side agent
  • Whats the point of the client-side agent?
  • intercept and control communication to reduce
    network traffic and query processing

79
The Benefits of Mobile Agents
  • Significant increase in performance compared to
    client/server model
  • Increased ability to follow mobile clients
  • Dual role as computation mechanism but also
    network communication aide

80
Mobile agent platforms out there
  • Non-Java and Java-based platforms
  • Some available platforms include
  • IBMAglets Workbench
  • ObjectSpaces Voyager
  • Mitsubishis Concordia
  • IKV Grasshopper
  • General Magics Odyssey
  • TACOMA
  • Agent Tcl

81
References
  • Connolly T. and Begg C. (2005). Database Systems
    A Practical Approach to Design, Implementation,
    and Management 4th ed. New York, NY Addison
    Wesley. Chapters 22-24.
  • Database Interoperability Consortium
  • http//www.opengroup.org/dbiop/
  • Multidatabase Systems
  • Sheth A. and Larson J.L. (1990). Federated
    Databases Architectures and Integration. ACM
    Computing Surv., Special Issue on Heterogeneous
    Databases, 22(3), 183-236.
  • Allocation
  • Ozsu M. and Valduriez P. (1999). Principles of
    Distributed Systems 2nd ed. Englewood Cliffs, NJ
    Prentice-Hall.
  • Disadvantages
  • Gray, Jim. TRANSPARENCY IN ITS PLACE -- The Case
    Against Transparent Access to Geographically
    Distributed Data http//research.microsoft.com/g
    ray/papers/Transparency.doc
  • Özsu, M. and Patrick Valduriez. Distributed Data
    Management Unsolved Problems and New Issues.
    http//db.uwaterloo.ca/ddbms/publications/ozsu/di
    stdb/UnsolvedIssues.ps
Write a Comment
User Comments (0)
About PowerShow.com