Title: Chapter 10 Distributed Database Management System
1Chapter 10Distributed Database Management System
Database Systems Design, Implementation, and
Management 4th Edition
2The Evolution of Distributed DBMS
- Centralized DBMS in the 1970s
- Support for structured information needs.
- Regularly issued formal reports in standard
formats. - Prepared by specialist using 3GL in response to
precisely channeled request. - Centrally stored corporate data.
- Data access through dumb terminals.
- Incapable of providing quick, unstructured, and
ad hoc information for decision makers in a
dynamic business environment.
3The Evolution of Distributed DBMS
- Social and Technical Changes in the 1980s
- Business operations became more decentralized
geographically. - Competition increased at the global level.
- Customer demands and market needs favored a
decentralized management style. - Rapid technological change created low-cost
microcomputers. The LANs became the basis for
computerized solutions. - The large number of applications based on DBMSs
and the need to protect investments in
centralized DBMS software made the notion of data
sharing attractive.
4The Evolution of Distributed DBMS
- Two Database Requirements in a Dynamic Business
Environment - Quick ad hoc data access became crucial in the
quick-response decision making environment. - The decentralization of management structure
based on the decentralization of business units
made decentralized multiple-access and
multiple-location databases a necessity. - Developments in the 1990s affecting DBMS
- The growing acceptance of the Internet and the
World Wide Web as the platform for data access
and distribution. - The increased focus on data analysis that led to
data mining and data warehousing.
5The Evolution of Distributed DBMS
- DDBMS Advantages
- Data are located near the greatest demand site.
- Faster data access
- Faster data processing
- Growth facilitation
- Improved communications
- Reduced operating costs
- User-friendly interface
- Less danger of a single-point failure
- Processor independence
- DDBMS Disadvantages
- Complexity of management and control
- Security
- Lack of standards
- Increased storage requirements
6Distributed Processingand Distributed Database
- Distributed processing shares the databases
logical processing among two or more physically
independent sites that are connected through a
network. (See Figure 10.1) - Distributed database stores a logically related
database over two or more physically independent
sites connected via a computer network. (See
Figure 10.2)
7Distributed Processing Environment
Figure 10.1
8Distributed Database Environment
Figure 10.2
9Distributed Processingand Distributed Database
- Distributed processing does not require a
distributed database, but a distributed database
requires distributed processing. - Distributed processing may be based on a single
database located on a single computer. In order
to manage distributed data, copies or parts of
the database processing functions must be
distributed to all data storage sites. - Both distributed processing and distributed
databases require a network to connect all
components.
10What Is A Distributed DBMS?
- A distributed database management system (DDBMS)
governs the storage and processing of logically
related data over interconnected computer systems
in which both data and processing functions are
distributed among several sites.
11What Is A Distributed DBMS?
- Functions of a DDBMS
- Application interface
- Validation to analyze data requests
- Transformation to determine requests components
- Query-optimization to find the best access
strategy - Mapping to determine the data location
- I/O interface to read or write data
- Formatting to prepare the data for presentation
- Security to provide data privacy
- Backup and recovery
- Database administration
- Concurrency control
- Transaction management
12Centralized Database Management System
Figure 10.3
13Fully Distributed Database Management System
Figure 10.4
14DDBMS Components
- Computer workstations that form the network
system. - Network hardware and software components that
reside in each workstation. - Communications media that carry the data from one
workstation to another. - Transaction processor (TP) receives and processes
the applications data requests. - Data processor (DP) stores and retrieves data
located at the site. Also known as data manager
(DM).
15Distributed Database System Components
Figure 10.5
16DDBMS Components
- DDBMS protocol determines how the DDBMS will
- Interface with the network to transport data and
commands between DPs and TPs. - Synchronize all data received from DPs (TP side)
and route retrieved data to the appropriate TPs
(DP side). - Ensure common database functions in a distributed
system -- security, concurrency control, backup,
and recovery.
17Levels of Data Process Distribution
- Single-Site Processing, Single-Site Data (SPSD)
- All processing is done on a single CPU or host
computer. - All data are stored on the host computers local
disk. - The DBMS is located on the host computer.
- The DBMS is accessed by dumb terminals.
- Typical of most mainframe and minicomputer DBMSs.
- Typical of the 1st generation of single-user
microcomputer database.
Table 10.1
18Nondistributed (Centralized) DBMS
Figure 10.6
19Levels of Data Process Distribution
- Multiple-Site Processing, Single-Site Data (MPSD)
- Typically, MPSD requires a network file server on
which conventional applications are accessed
through a LAN. - A variation of the MPSD approach is known as a
client/server architecture. (Chapter 12)
Figure 10.7
20Levels of Data Process Distribution
- Multiple-Site Processing, Multiple-Site Data
(MPMD) - Fully distributed DBMS with support for multiple
DPs and TPs at multiple sites. - Homogeneous DDMS integrate only one type of
centralized DBMS over the network. - Heterogeneous DDBMS integrate different types of
centralized DBMSs over a network. (See Figure
10.8)
21Figure 10.8 Heterogeneous Distributed Database
Scenario
22Distributed DB Transparency
- DDBMS transparency features have the common
property of allowing the end users to think that
he is the databases only user. - Distribution transparency
- Transaction transparency
- Failure transparency
- Performance transparency
- Heterogeneity transparency
23Distribution Transparency
- Distribution transparency allows us to manage a
physically dispersed database as though it were a
centralized database. - Three Levels of Distribution Transparency
- Fragmentation transparency
- Location transparency
- Local mapping transparency
Table 10.2
24Distribution Transparency
- Example (Figure 10.9) Employee data (EMPLOYEE)
are distributed over three locations New York,
Atlanta, and Miami.Depending on the level of
distribution transparency support, three
different cases of queries are possible
Figure 10.9 Fragment Locations
25Distribution Transparency
- Case 1 DB Supports Fragmentation Transparency
- SELECT FROM EMPLOYEEWHERE EMP_DOB lt
01-JAN-1940
26Distribution Transparency
- Case 2 DB Supports Location Transparency
- SELECT FROM E1WHERE EMP_DOB lt 01-JAN-1940
- UNION
- SELECT FROM E2WHERE EMP_DOC lt 01-JAN-1940
- UNION
- SELECT FROM E3WHERE EMP_DOC lt 01-JAN-1940
27Distribution Transparency
- Case 3 DB Supports Local Mapping Transparency
- SELECT FROM E1 NODE NYWHERE EMP_DOB lt
01-JAN-1940 - UNION
- SELECT FROM E2 NODE ATLWHERE EMP_DOC lt
01-JAN-1940 - UNION
- SELECT FROM E3 NODE MIAWHERE EMP_DOC lt
01-JAN-1940
28Distribution Transparency
- Distribution transparency is supported by
distributed data dictionary (DDD) or a
distributed data catalog (DDC). - The DDC contains the description of the entire
database as seen by the database administrator. - The database description, known as the
distributed global schema, is the common database
schema used by local TPs to translate user
requests into subqueries.
29Transaction Transparency
- Transaction transparency ensures that database
transactions will maintain the databases
integrity and consistency. The transaction will
be completed only if all database sites involved
in the transaction complete their part of the
transaction. - Related Concepts
- Remote Requests
- Remote Transactions
- Distributed Transactions
- Distributed Requests
30Transaction Transparency
- Distributed Requests and Distributed Transactions
- A remote request allows us to access data to be
processed by a single remote database processor.
(Figure 10.10) - A remote transaction, composed of several
requests, may access data at only a single site.
(Figure 10.11) - A distributed transaction allows a transaction to
reference several different (local or remote) DP
sites. (Figure 10.12) - A distributed request lets us reference data from
several remote DP sites. (Figure 10.13) It also
allows a single request to reference a physically
partitioned table. (Figure 10.14)
31A Remote Request
Figure 10.10
32A Remote Transaction
Figure 10.11
33A Distributed Transaction
Figure 10.12
34A Distributed Request
Figure 10.13
35Another Distributed Request
Figure 10.14
36Figure 10.15
37Transaction Transparency
- Two-Phase Commit Protocol
- The two-phase commit protocol guarantees that, if
a portion of a transaction operation cannot be
committed, all changes made at the other sites
participating in the transaction will be undone
to maintain a consistent database state. - Each DP maintains its own transaction log. The
two-phase protocol requires that each individual
DPs transaction log entry be written before the
database fragment is actually updated. - The two-phase commit protocol requires a
DO-UNDO-REDO protocol and a write-ahead protocol.
38Transaction Transparency
- Two-Phase Commit Protocol
- The DO-UNDO-REDO protocol is used by the DP to
roll back and/or roll forward transactions with
the help of the systems transaction log entries. - DO performs the operation and records the
before and after values in the transaction
log. - UNDO reverses an operation, using the log entries
written by the DO portion of the sequence. - REDO redoes an operation, using the log entries
written by DO portion of the sequence. - The write-ahead protocol forces the log entry to
be written to permanent storage before the actual
operation takes place.
39Transaction Transparency
- Two-Phase Commit Protocol
- Two-phase commit protocol defines the operations
between two types of nodes the coordinator and
one or more subordinates or cohorts. The protocol
is implemented in two phases - Phase 1 Preparation
- The coordinator sends a PREPARE TO COMMIT message
to all subordinates. - The subordinates receive the message, write the
transaction log using the write-ahead protocol,
and send an acknowledgement message to the
coordinator. - The coordinator makes sure that all nodes are
ready to commit, or it aborts the transaction.
40Transaction Transparency
- Phase 2 The Final Commit
- The coordinator broadcasts a COMMIT message to
all subordinates and waits for the replies. - Each subordinate receives the COMMIT message then
updates the database, using the DO protocol. - The subordinates reply with a COMMITTED or NOT
COMMITTED message to the coordinator. - If one or more subordinates did not commit, the
coordinator sends an ABORT message, thereby
forcing them to UNDO all changes.
41Performance Transparency andQuery Optimization
- The objective of a query optimization routine is
to minimize the total cost associated with the
execution of a request. The costs associated with
a request are a function of the - Access time (I/O) cost involved in accessing the
physical data stored on disk. - Communication cost associated with the
transmission of data among nodes in distributed
database systems. - CPU time cost associated with the processing
overhead of managing distributed transactions.
42Performance Transparency andQuery Optimization
- Query optimization must provide distribution
transparency as well as replica transparency. - Replica transparency refers to the DDBMSs ability
to hide the existence of multiple copies of data
from the user. - Most of the query optimization algorithms are
based on two principles - Selection of the optimum execution order
- Selection of sites to be accessed to minimize
communication costs
43Performance Transparency andQuery Optimization
- Operation Modes of Query Optimization
- Automatic query optimization means that the DDBMS
finds the most cost-effective access path without
user intervention. - Manual query optimization requires that the
optimization be selected and scheduled by the end
user or programmer. - Timing of Query Optimization
- Static query optimization takes place at
compilation time. - Dynamic query optimization takes place at
execution time.
44Performance Transparency andQuery Optimization
- Optimization Techniques by Information Used
- Statistically based query optimization uses
statistical information about the database. - In the dynamic statistical generation mode, the
DDBMS automatically evaluates and updates the
statistics after each access. - In the manual statistical generation mode, the
statistics must be updated periodically through a
user-selected utility. - Rule-based query optimization algorithm is based
on a set of user-defined rules to determine the
best query access strategy.
45Distributed Database Design
- The design of a distributed database introduces
three new issues - How to partition the database into fragments.
- Which fragments to replicate.
- Where to locate those fragments and replicas.
46Data Fragmentation
- Data fragmentation allows us to break a single
object into two or more segments or fragments. - Each fragment can be stored at any site over a
computer network. - Data fragmentation information is stored in the
distributed data catalog (DDC), from which it is
accessed by the transaction processor (TP) to
process user requests. - Three Types of Fragmentation Strategies
- Horizontal fragmentation
- Vertical fragmentation
- Mixed fragmentation
47A Sample CUSTOMER Table
Figure 10.16
48Data Fragmentation
- Horizontal FragmentationDivision of a relation
into subsets (fragments) of tuples (rows). Each
fragment is stored at a different node, and each
fragment has unique rows. Each fragment
represents the equivalent of a SELECT statement,
with the WHERE clause on a single attribute.
Table 10.3 Horizontal Fragmentation of the
CUSTOMER Table By State
49Table Fragments In Three Locations
Figure 10.17
50Data Fragmentation
- Vertical FragmentationDivision of a relation
into attribute (column) subsets. Each subset
(fragment) is stored at a different node, and
each fragment has unique columns -- with the
exception of the key column. This is the
equivalent of the PROJECT statement.
Table 10.4 Vertical Fragmentation of the
CUSTOMER Table
51Vertically Fragmented Table Contents
Figure 10.18
52Data Fragmentation
- Mixed FragmentationCombination of horizontal and
vertical strategies. A table may be divided into
several horizontal subsets (rows), each one
having a subset of the attributes (columns).
53Table 10.5 Mixed Fragmentation of the CUSTOMER
Table
54Figure 10.19
55Data Replication
- Data replication refers to the storage of data
copies at multiple sites served by a computer
network. - Fragment copies can be stored at several sites to
serve specific information requirements. - The existence of fragment copies can enhance data
availability and response time, reducing
communication and total query costs.
Figure 10.20
56Data Replication
- Mutual Consistency Rule
- Replicated data are subject to the mutual
consistency rule, which requires that all copies
of data fragments be identical. - DDBMS must ensure that a database update is
performed at all sites where replicas exist. - Data replication imposes additional DDBMS
processing overhead.
57Data Replication
- Replication Conditions
- A fully replicated database stores multiple
copies of all database fragments at multiple
sites. - A partially replicated database stores multiple
copies of some database fragments at multiple
sites. - Factors for Data Replication Decision
- Database Size
- Usage Frequency
58Data Allocation
- Data allocation describes the processing of
deciding where to locate data. - Data Allocation Strategies
- CentralizedThe entire database is stored at one
site. - PartitionedThe database is divided into several
disjoint parts (fragments) and stored at several
sites. - ReplicatedCopies of one or more database
fragments are stored at several sites.
59Data Allocation
- Data allocation algorithms take into
consideration a variety of factors - Performance and data availability goals
- Size, number of rows, the number of relations
that an entity maintains with other entities. - Types of transactions to be applied to the
database, the attributes accessed by each of
those transactions.
60Client/Server vs. DDBMS
- Client/server architecture refers to the way in
which computers interact to form a system. - It features a user of resources or a client and a
provider of resources or a server. - The architecture can be used to implement a DBMS
in which the client is the transaction processor
(TP) and the server is the data processor (DP).
61Client/Server Architecture
- Client/Server Advantages
- Client/server solutions tend to be less
expensive. - Client/server solutions allow the end user to use
the microcomputers graphical user interface
(GUI), thereby improving functionality and
simplicity. - There are more people with PC skills than with
mainframe skills. - The PC is well established in the workplace.
- Numerous data analysis and query tools exist to
facilitate interaction with many of the DBMSs. - There are considerable cost advantages to
off-loading application development from the
mainframe to PCs.
62Client/Server Architecture
- Client/Server Disadvantages
- The client/server architecture creates a more
complex environment with different platforms. - An increase in the number of users and processing
sites often paves the way for security problems. - The burden of training a wider circle of users
and computer personnel increases the cost of
maintaining the environment.
63C. J. Dates 12 Commandments for Distributed
Database
- 1. Local Site Independence
- 2. Central Site Independence
- 3. Failure Independence
- 4. Location Transparency
- 5. Fragmentation Transparency
- 6. Replication Transparency
- 7. Distributed Query Processing
- 8. Distributed Transaction Processing
- 9. Hardware Independence
- 10. Operating System Independence
- 11. Network Independence
- 12. Database Independence