Distributed Databases

Provided by: isabellebi

1
Distributed Databases
2
Learning Objectives
  • What a distributed database management system
    (DDBMS) is and what its components are
  • How database implementation is affected by
    different levels of data and process distribution
  • How transactions are managed in a distributed
    database environment
  • How database design is affected by the
    distributed database environment

3
DDBMS
  • A distributed database management system (DDBMS)
    governs the storage and processing of logically
    related data over interconnected computer systems
    in which both data and processing functions reside
    on multiple sites.

4
Evolution of DDBMS
  • Distributed database management system (DDBMS)
    • Interconnected computer systems
    • Data and processing functions reside on multiple
      sites
  • 1970s: centralized DBMS
  • 1980s: social and technical changes
    • Ad hoc capability required
    • Decentralized management structure common
  • 1990s: new forces
    • Internet and the World Wide Web used for data
      access and distribution
    • Data analysis through data mining and data
      warehousing

5
DDBMS Advantages
  • Data located near site with greatest demand
  • Faster data access
  • Faster data processing
  • Growth facilitation
  • Improved communications
  • Reduced operating costs
  • User-friendly interface
  • Less danger of single-point failure
  • Processor independence

6
DDBMS Disadvantages
  • Complexity of management and control
  • Security
  • Lack of standards
  • Increased storage requirements
  • Greater difficulty in managing data environment
  • Increased training costs

7
Distributed Processing
  • Shares the database's logical processing among
    physically independent, networked sites

Figure 10.1
8
Distributed Database
  • Stores a logically related database over
    physically independent sites

Figure 10.2
9
Distributed Database vs. Distributed Processing
  • Distributed processing
    • Does not require a distributed database
    • May be based on a single database on a single
      computer
    • Copies or parts of the database processing
      functions must be distributed to all data
      storage sites
  • Distributed database
    • Requires distributed processing
  • Both
    • Require a network to connect components

10
Functions of DDBMS
  • Application/end user interface
  • Validation to analyze data requests
  • Transformation to determine request components
  • Query optimization to find the best access
    strategy
  • Mapping to determine the data location
  • I/O interface to read or write data
  • Formatting to prepare the data for presentation
  • Security to provide data privacy
  • Backup and recovery
  • DB Administration
  • Concurrency Control
  • Transaction Management

11
Centralized Database
Figure 10.3
12
Fully Distributed Database Management System
Figure 10.4
13
DDBMS Components
  • Computer workstations
  • Network hardware and software components
  • Communications media
  • Transaction processor (TP)
    • Also called application processor (AP) or
      transaction manager (TM)
  • Data processor (DP)
    • Also called data manager (DM)

14
Distributed Database Components
Figure 10.5
15
DDBMS Protocols
  • Interface with the network to transport data and
    commands between DPs and TPs
  • Synchronize data received from DPs and route it
    to the appropriate TPs
  • Ensure common database functions
    • Security
    • Concurrency control
    • Backup and recovery

16
Levels of Data and Process Distribution
  • Database systems can be classified based on
    process distribution and data distribution

Table 10.1
17
Single-Site Processing, Single-Site Data (SPSD)
  • All processing on single CPU or host computer
  • All data are stored on host computer disk
  • DBMS located on the host computer
  • DBMS accessed by dumb terminals
  • Typical of mainframe and minicomputer DBMSs
  • Typical of the first generation of single-user
    microcomputer databases

18
Single-Site Processing, Single-Site Data (cont.)
Figure 10.6
19
Multiple-Site Processing, Single-Site Data (MPSD)
  • Requires network file server
  • Applications accessed through LAN
  • Variation known as client/server architecture

Figure 10.7
20
Multiple-Site Processing, Multiple-Site Data (MPMD)
  • Fully distributed DDBMS with support for multiple
    DPs and TPs at multiple sites
  • Homogeneous
    • Integrates one type of centralized DBMS over
      the network
  • Heterogeneous
    • Integrates different types of centralized DBMSs
      over a network

21
Heterogeneous Distributed Database Scenario
Figure 10.8
22
Distributed DB Transparency
  • Allows end users to feel like the database's only
    user
  • Hides the complexities of the distributed database
  • Transparency features
    • Distribution
    • Transaction
    • Failure
    • Performance
    • Heterogeneity

23
Distribution Transparency
  • Allows management of a physically dispersed
    database as though it were centralized
  • Three levels
    • Fragmentation transparency
    • Location transparency
    • Local mapping transparency

Table 10.2
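
The three levels differ in how much the person writing a request has to know about where the data lives. The sketch below is only an illustration under stated assumptions: the EMPLOYEE table, its fragments E1 to E3, the site names, and all function names are hypothetical, and plain dictionaries stand in for the distributed data dictionary.

```python
# Hypothetical fragments of an EMPLOYEE table, stored per site.
SITES = {
    "NY": {"E1": [{"name": "Ann", "dob": "1955-03-01"}]},
    "ATL": {"E2": [{"name": "Bob", "dob": "1975-07-12"}]},
    "MIA": {"E3": [{"name": "Cy", "dob": "1958-11-30"}]},
}

# Stand-in for a distributed data dictionary (assumed layout).
FRAGMENTS_OF = {"EMPLOYEE": ["E1", "E2", "E3"]}
SITE_OF = {"E1": "NY", "E2": "ATL", "E3": "MIA"}


def read_local(site, fragment, predicate):
    """Local mapping transparency: caller names both fragment and site."""
    return [row for row in SITES[site][fragment] if predicate(row)]


def read_fragment(fragment, predicate):
    """Location transparency: caller names the fragment, not the site."""
    return read_local(SITE_OF[fragment], fragment, predicate)


def read_table(table, predicate):
    """Fragmentation transparency: caller names only the logical table."""
    rows = []
    for fragment in FRAGMENTS_OF[table]:
        rows.extend(read_fragment(fragment, predicate))
    return rows


def born_before_1960(row):
    # ISO-formatted dates compare correctly as strings.
    return row["dob"] < "1960-01-01"


names = [r["name"] for r in read_table("EMPLOYEE", born_before_1960)]
```

Each function asks the caller for one less piece of placement information than the one above it, which is the sense in which the levels are ordered.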
24
Transaction Transparency
  • Ensures transactions maintain integrity and
    consistency
  • Completed only if all involved database sites
    complete their part of the transaction
  • Management mechanisms
    • Remote request
    • Remote transaction
    • Distributed transaction
    • Distributed request
25
Remote Request
Figure 10.10
26
Remote Transaction
Figure 10.11
27
Distributed Transaction
Figure 10.12
28
Distributed Requests
Figure 10.13
29
Distributed Requests (cont.)
Figure 10.14
30
Distributed Concurrency Control
  • Multisite, multiple-process operations more
    likely to create data inconsistencies and
    deadlocked transactions
  • Problems
    • Transaction committed by local DPs
    • One DP could not commit the transaction's result
    • Yields an inconsistent database

31
Two-Phase Commit Protocol
  • DO-UNDO-REDO protocol
  • Write-ahead protocol
  • Two kinds of nodes
    • Coordinator
    • Subordinates
  • Phases
    • Preparation
      • Coordinator sends message to all subordinates
      • Confirms all are ready to commit or abort
    • Final commit
      • Ensures all subordinates have committed or
        aborted
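
The two phases can be sketched in a few lines. This is a simplified, in-memory illustration, not a production protocol: the Subordinate class, its method names, and the log entries are assumptions made for the example, and the sketch leaves out network messages, timeouts, and crash recovery.

```python
class Subordinate:
    """A participating site (hypothetical stand-in for a DP)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []  # simplified write-ahead log: record first, then act
        self.state = "ACTIVE"

    def prepare(self):
        # Phase 1: log the vote before replying to the coordinator.
        self.log.append("PREPARED" if self.can_commit else "NOT PREPARED")
        return self.can_commit

    def commit(self):
        self.log.append("COMMIT")
        self.state = "COMMITTED"

    def abort(self):
        self.log.append("ABORT")
        self.state = "ABORTED"


def two_phase_commit(subordinates):
    """Coordinator logic: return True if every site committed."""
    # Phase 1 (preparation): ask every subordinate whether it can commit.
    votes = [sub.prepare() for sub in subordinates]
    decision = all(votes)
    # Phase 2 (final commit): send the same global decision to every site.
    for sub in subordinates:
        if decision:
            sub.commit()
        else:
            sub.abort()
    return decision


ok = two_phase_commit([Subordinate("A"), Subordinate("B")])
sites = [Subordinate("A"), Subordinate("B", can_commit=False)]
failed = two_phase_commit(sites)
```

In the second run a single negative vote causes every site to abort, so no site is left holding a committed result that the others lack.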

32
Performance Transparency and Query Optimization
  • Objective: minimize the total cost associated with
    the execution of a request
  • Main costs
    • Access time (I/O)
    • Communication
    • CPU time
  • Basis for query optimization algorithms
    • Optimum execution order
    • Sites accessed to minimize communication costs
  • Dynamic or static optimization
  • Statistically based vs. rule-based query
    optimization algorithms
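
A cost-based choice between execution strategies can be illustrated as follows. This is a minimal sketch that assumes the three costs can simply be added; the Plan class, the plan descriptions, and all cost figures are made up for the example and do not come from any real optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """One candidate execution strategy with made-up cost estimates."""
    name: str
    access_cost: float         # I/O cost of reading the data
    communication_cost: float  # cost of shipping data between sites
    cpu_cost: float            # processing cost

    def total_cost(self):
        return self.access_cost + self.communication_cost + self.cpu_cost


def cheapest(plans):
    """Pick the plan with the lowest estimated total cost."""
    return min(plans, key=Plan.total_cost)


plans = [
    Plan("ship whole table to site A", 4.0, 50.0, 2.0),
    Plan("filter at site B, ship result", 4.0, 5.0, 3.0),
]
best = cheapest(plans)
```

With these illustrative numbers the second plan wins because shipping only the filtered rows cuts the communication cost, which is the largest term in the first plan.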

33
Distributed Database Design
  • Partition database into fragments
    • Horizontal
    • Vertical
    • Mixed
  • Which fragments to replicate
    • Storage of data copies at multiple sites
    • Fully replicated, partially replicated, or
      unreplicated databases
  • Data allocation
    • Where to locate data
    • Centralized, partitioned, or replicated
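
Horizontal and vertical fragmentation can be shown on a small in-memory table. The sketch below is illustrative only: the CUSTOMER rows, column names, and helper functions are hypothetical, and it assumes every vertical fragment repeats the key column so that the original rows can be rebuilt.

```python
CUSTOMER = [
    {"id": 1, "name": "Ann", "state": "TN", "balance": 120.0},
    {"id": 2, "name": "Bob", "state": "FL", "balance": 0.0},
    {"id": 3, "name": "Cy", "state": "TN", "balance": 75.5},
]


def horizontal_fragment(rows, predicate):
    """Subset of rows; every fragment keeps all columns."""
    return [dict(row) for row in rows if predicate(row)]


def vertical_fragment(rows, columns, key="id"):
    """Subset of columns; the key is repeated so fragments can be rejoined."""
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in rows]


def rejoin(left, right, key="id"):
    """Rebuild full rows from two vertical fragments by matching keys."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


tn_rows = horizontal_fragment(CUSTOMER, lambda r: r["state"] == "TN")
contact = vertical_fragment(CUSTOMER, ["name", "state"])
billing = vertical_fragment(CUSTOMER, ["balance"])
restored = rejoin(contact, billing)
```

Applying vertical_fragment to tn_rows would give a mixed fragment: a subset of rows that also carries only a subset of columns.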

34
Client/Server Advantages Over DDBMS
  • Client/server less expensive
  • Client/server solutions allow use of the
    microcomputer's GUI
  • More people with PC skills than mainframe skills
  • PC is well established in workplace
  • Numerous data analysis and query tools exist
  • Considerable cost advantages to off-loading
    application development

35
Client/Server Disadvantages
  • Creates more complex environment with different
    platforms
  • Increased number of users and sites creates
    security problems
  • Training issues become more complex and expensive

36
Date's 12 Commandments for Distributed Databases
  • 1. Local Site Independence
  • 2. Central Site Independence
  • 3. Failure Independence
  • 4. Location Transparency
  • 5. Fragmentation Transparency
  • 6. Replication Transparency

37
Date's 12 Commandments for Distributed Databases (cont.)
  • 7. Distributed Query Processing
  • 8. Distributed Transaction Processing
  • 9. Hardware Independence
  • 10. Operating System Independence
  • 11. Network Independence
  • 12. Database Independence