Title: Distributed Databases
Learning Objectives
- What a distributed database management system (DDBMS) is and what its components are
- How database implementation is affected by different levels of data and process distribution
- How transactions are managed in a distributed database environment
- How database design is affected by the distributed database environment
DDBMS
- Distributed database management system (DDBMS)
- A distributed database is a collection of logically related data stored over interconnected computer systems, in which both data and processing functions reside on multiple sites.
Evolution of DDBMS
- Distributed database management systems (DDBMS)
  - Interconnected computer systems
  - Data/processing functions reside on multiple sites
- 1970s: Centralized DBMS
- 1980s: Social and technical changes
  - Ad hoc capability required
  - Decentralized management structure common
- 1990s: New forces
  - Internet and the World Wide Web used for data access and distribution
  - Data analysis through data mining and data warehousing
DDBMS Advantages
- Data located near site with greatest demand
- Faster data access
- Faster data processing
- Growth facilitation
- Improved communications
- Reduced operating costs
- User-friendly interface
- Less danger of single-point failure
- Processor independence
DDBMS Disadvantages
- Complexity of management and control
- Security
- Lack of standards
- Increased storage requirements
- Greater difficulty in managing data environment
- Increased training costs
Distributed Processing
- Shares the database's logical processing among physically independent sites connected through a network
Figure 10.1
Distributed Database
- Stores a logically related database over physically independent sites
Figure 10.2
Distributed Database vs. Distributed Processing
- Distributed processing
  - Does not require a distributed database
  - May be based on a single database on a single computer
  - Copies or parts of the database processing functions must be distributed to all data storage sites
- Distributed database
  - Requires distributed processing
- Both
  - Require a network to connect components
Functions of DDBMS
- Application/end user interface
- Validation to analyze data requests
- Transformation to determine request components
- Query optimization to find the best access strategy
- Mapping to determine the data location
- I/O interface to read or write data
- Formatting to prepare the data for presentation
- Security to provide data privacy
- Backup and recovery
- Database administration
- Concurrency control
- Transaction management
Centralized Database
Figure 10.3
Fully Distributed Database Management System
Figure 10.4
DDBMS Components
- Computer workstations
- Network hardware and software components
- Communications media
- Transaction processor (TP)
  - Also called application processor (AP) or transaction manager (TM)
- Data processor (DP)
  - Also called data manager (DM)
Distributed Database Components
Figure 10.5
DDBMS Protocols
- Interface with the network to transport data and commands between DPs and TPs
- Synchronize data received from DPs and route it to the appropriate TPs
- Ensure common database functions
  - Security
  - Concurrency control
  - Backup and recovery
Levels of Data and Process Distribution
- Database systems can be classified based on
process distribution and data distribution
Table 10.1
Single-Site Processing, Single-Site Data (SPSD)
- All processing on single CPU or host computer
- All data are stored on host computer disk
- DBMS located on the host computer
- DBMS accessed by dumb terminals
- Typical of mainframe and minicomputer DBMSs
- Typical of the first generation of single-user microcomputer databases
Single-Site Processing, Single-Site Data (cont.)
Figure 10.6
Multiple-Site Processing, Single-Site Data (MPSD)
- Requires network file server
- Applications accessed through LAN
- Variation known as client/server architecture
Figure 10.7
Multiple-Site Processing, Multiple-Site Data (MPMD)
- Fully distributed DDBMS with support for multiple DPs and TPs at multiple sites
- Homogeneous
  - Integrates one type of centralized DBMS over the network
- Heterogeneous
  - Integrates different types of centralized DBMSs over a network
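
The three classes above (SPSD, MPSD, MPMD) follow from two counts: how many sites do processing and how many sites hold data. A minimal sketch of that classification is shown here; the function name `classify` and the "not applicable" label for the single-process, multiple-data combination are assumptions made for illustration, not part of the slides.

```python
def classify(processing_sites: int, data_sites: int) -> str:
    """Return the distribution class for the given site counts.

    Hypothetical helper: the acronyms come from the slides, the function
    itself is only an illustration.
    """
    if processing_sites < 1 or data_sites < 1:
        raise ValueError("site counts must be at least 1")
    if data_sites == 1:
        # Single-site data: the class depends only on where processing runs.
        return "SPSD" if processing_sites == 1 else "MPSD"
    if processing_sites == 1:
        # Assumption: data at several sites needs a process at each of them,
        # so this combination is treated as not applicable.
        return "not applicable"
    return "MPMD"


if __name__ == "__main__":
    for p, d in [(1, 1), (4, 1), (4, 3)]:
        print(p, d, classify(p, d))
```

A mainframe with dumb terminals would be `classify(1, 1)`, a LAN file server with several client applications `classify(n, 1)`, and a fully distributed DDBMS `classify(n, m)` with both counts above one.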
Heterogeneous Distributed Database Scenario
Figure 10.8
Distributed DB Transparency
- Allows end users to feel like the database's only user
- Hides complexities of distributed database
- Transparency features
  - Distribution
  - Transaction
  - Failure
  - Performance
  - Heterogeneity
Distribution Transparency
- Allows management of a physically dispersed database as though it were centralized
- Three levels
  - Fragmentation transparency
  - Location transparency
  - Local mapping transparency
Table 10.2
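
The three levels differ in how much the user must know to run the same query. The sketch below models that with an in-memory catalog: under local mapping transparency the caller names fragments and sites, under location transparency only fragments, and under fragmentation transparency neither. The catalog contents (fragments `E1` to `E3`, the site names, the sample rows) and all function names are hypothetical.

```python
# Hypothetical catalog: fragment name -> (site, rows stored in that fragment).
CATALOG = {
    "E1": ("NY",  [{"name": "Ann", "dob": "1975-03-02"}]),
    "E2": ("ATL", [{"name": "Bob", "dob": "1985-07-19"}]),
    "E3": ("MIA", [{"name": "Cy",  "dob": "1969-11-30"}]),
}


def read_fragment(fragment, site):
    """Read one fragment; the caller must supply the correct site."""
    stored_site, rows = CATALOG[fragment]
    if site != stored_site:
        raise LookupError(f"{fragment} is not stored at {site}")
    return rows


def local_mapping_query(targets, predicate):
    """Local mapping transparency: caller names fragments AND sites."""
    return [row
            for fragment, site in targets
            for row in read_fragment(fragment, site)
            if predicate(row)]


def location_query(fragments, predicate):
    """Location transparency: caller names fragments; sites are looked up."""
    targets = [(f, CATALOG[f][0]) for f in fragments]
    return local_mapping_query(targets, predicate)


def fragmentation_query(predicate):
    """Fragmentation transparency: caller names neither fragments nor sites."""
    return location_query(sorted(CATALOG), predicate)


if __name__ == "__main__":
    born_before_1980 = lambda row: row["dob"] < "1980-01-01"
    print(fragmentation_query(born_before_1980))
    print(location_query(["E1", "E2", "E3"], born_before_1980))
    print(local_mapping_query(
        [("E1", "NY"), ("E2", "ATL"), ("E3", "MIA")], born_before_1980))
```

All three calls return the same rows; only the amount of distribution detail supplied by the caller changes.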
Transaction Transparency
- Ensures transactions maintain integrity and consistency
- Completed only if all involved database sites complete their part of the transaction
- Management mechanisms
  - Remote request
  - Remote transaction
  - Distributed transaction
  - Distributed request
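
The four mechanisms can be told apart by two questions: does any single request touch more than one site, and does the transaction as a whole touch more than one site? The sketch below encodes that rule under the assumption that a transaction is represented as a list of requests, each given as the set of remote sites it references; the representation and the name `classify_transaction` are illustrative only.

```python
def classify_transaction(requests):
    """Classify a transaction by the sites its requests reference.

    requests: list of sets; each set holds the remote sites that one
    SQL request touches (hypothetical representation).
    """
    if not requests or any(not sites for sites in requests):
        raise ValueError("each request must reference at least one site")
    if any(len(sites) > 1 for sites in requests):
        # A single request spans several sites.
        return "distributed request"
    all_sites = set().union(*requests)
    if len(all_sites) > 1:
        # Every request uses one site, but the sites differ across requests.
        return "distributed transaction"
    # Everything runs at one remote site.
    return "remote request" if len(requests) == 1 else "remote transaction"


if __name__ == "__main__":
    print(classify_transaction([{"B"}]))
    print(classify_transaction([{"B"}, {"B"}]))
    print(classify_transaction([{"B"}, {"C"}]))
    print(classify_transaction([{"B", "C"}]))
```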
Remote Request
Figure 10.10
Remote Transaction
Figure 10.11
Distributed Transaction
Figure 10.12
Distributed Requests
Figure 10.13
Distributed Requests (cont.)
Figure 10.14
Distributed Concurrency Control
- Multisite, multiple-process operations are more likely to create data inconsistencies and deadlocked transactions
- Problems
  - A transaction is committed by the local DPs
  - One DP cannot commit its part of the transaction's result
  - The outcome is an inconsistent database
Two-Phase Commit Protocol
- DO-UNDO-REDO protocol
- Write-ahead protocol
- Two kinds of nodes
  - Coordinator
  - Subordinates
- Phases
  - Preparation
    - Coordinator sends message to all subordinates
    - Confirms all are ready to commit or abort
  - Final commit
    - Ensures all subordinates have committed or aborted
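
The two phases above can be sketched as a coordinator that first collects a vote from every subordinate and only then issues one global decision. This is a simplified, single-process illustration: the `Subordinate` class, its `can_commit` flag, and the in-memory `log` list are assumptions, and the sketch leaves out timeouts, network failures, and recovery from the log, all of which a real implementation needs.

```python
class Subordinate:
    """Hypothetical subordinate node with an in-memory log."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"
        self.log = []

    def prepare(self):
        # Write-ahead idea: record the vote in the log before replying.
        self.log.append("PREPARED" if self.can_commit else "NOT PREPARED")
        return self.can_commit

    def commit(self):
        self.log.append("COMMIT")
        self.state = "committed"

    def abort(self):
        self.log.append("ABORT")
        self.state = "aborted"


def two_phase_commit(subordinates):
    """Coordinator logic: returns 'committed' or 'aborted'."""
    # Phase 1 (preparation): ask every subordinate whether it can commit.
    votes = [node.prepare() for node in subordinates]
    decision = all(votes)
    # Phase 2 (final commit): apply the same decision at every subordinate.
    for node in subordinates:
        if decision:
            node.commit()
        else:
            node.abort()
    return "committed" if decision else "aborted"


if __name__ == "__main__":
    nodes = [Subordinate("A"), Subordinate("B", can_commit=False)]
    print(two_phase_commit(nodes), [n.state for n in nodes])
```

Because the decision is taken only after all votes are in, a single subordinate that cannot commit causes every site to abort, which avoids the case where some DPs commit and others do not.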
Performance Transparency and Query Optimization
- Objective: minimize the total cost associated with the execution of a request
- Main costs
  - Access time
  - Communication
  - CPU time
- Basis for query optimization algorithms
  - Optimum execution order
  - Sites accessed to minimize communication costs
- Dynamic or static optimization
- Statistically based vs. rule-based query optimization algorithms
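
As a rough illustration of the objective above, the total cost of a candidate execution plan can be treated as the sum of its access, communication, and CPU costs, with the optimizer picking the plan whose sum is lowest. The `Plan` class, the plan names, and the cost figures below are made up for the example; real optimizers estimate these costs from statistics or rules rather than receiving them as inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical execution plan with pre-estimated costs (arbitrary units)."""
    name: str
    access_cost: float
    communication_cost: float
    cpu_cost: float

    @property
    def total_cost(self) -> float:
        return self.access_cost + self.communication_cost + self.cpu_cost


def cheapest(plans):
    """Return the plan with the lowest total cost."""
    return min(plans, key=lambda plan: plan.total_cost)


if __name__ == "__main__":
    candidates = [
        Plan("ship whole table to site A", 10.0, 50.0, 5.0),
        Plan("filter at site B, ship result", 12.0, 8.0, 6.0),
    ]
    best = cheapest(candidates)
    print(best.name, best.total_cost)
```

In this made-up case the second plan wins because filtering before shipping cuts the communication cost, even though its access and CPU costs are slightly higher.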
Distributed Database Design
- Partition database into fragments
  - Horizontal
  - Vertical
  - Mixed
- Fragments to replicate
  - Storage of data copies at multiple sites
  - Fully replicated, partially replicated, or unreplicated databases
- Data allocation
  - Where to locate data
  - Centralized, partitioned, replicated
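
The three fragmentation strategies can be illustrated on a small in-memory table: horizontal fragmentation splits rows, vertical fragmentation splits attributes while repeating the key, and mixed fragmentation applies both. The `CUSTOMERS` rows, the column names, and the helper functions are hypothetical and only meant as a sketch.

```python
# Hypothetical sample table.
CUSTOMERS = [
    {"id": 1, "name": "Ann", "state": "TN", "balance": 120.0},
    {"id": 2, "name": "Bob", "state": "FL", "balance": 0.0},
    {"id": 3, "name": "Cy",  "state": "TN", "balance": 75.5},
]


def horizontal_fragments(rows, key):
    """Split rows into subsets by the value of `key`; all attributes are kept."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


def vertical_fragment(rows, attributes, primary_key="id"):
    """Keep a subset of attributes; the key is repeated so fragments can be rejoined."""
    columns = [primary_key] + [a for a in attributes if a != primary_key]
    return [{c: row[c] for c in columns} for row in rows]


def mixed_fragments(rows, key, attributes):
    """Horizontal split first, then a vertical cut of each subset."""
    return {value: vertical_fragment(subset, attributes)
            for value, subset in horizontal_fragments(rows, key).items()}


if __name__ == "__main__":
    print(horizontal_fragments(CUSTOMERS, "state"))
    print(vertical_fragment(CUSTOMERS, ["name"]))
    print(mixed_fragments(CUSTOMERS, "state", ["balance"]))
```

Allocation and replication would then decide which site stores each fragment and how many copies exist; that step is not modeled here.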
Client/Server Advantages Over DDBMS
- Client/server less expensive
- Client/server solutions allow use of the microcomputer's GUI
- More people have PC skills than mainframe skills
- PC is well established in the workplace
- Numerous data analysis and query tools exist
- Considerable cost advantages to off-loading application development
Client/Server Disadvantages
- Creates more complex environment with different platforms
- Increased number of users and sites creates security problems
- Training issues become more complex and expensive
Date's 12 Commandments for Distributed Databases
- 1. Local Site Independence
- 2. Central Site Independence
- 3. Failure Independence
- 4. Location Transparency
- 5. Fragmentation Transparency
- 6. Replication Transparency
- 7. Distributed Query Processing
- 8. Distributed Transaction Processing
- 9. Hardware Independence
- 10. Operating System Independence
- 11. Network Independence
- 12. Database Independence