Chapter 10 Distributed Database Management System - PowerPoint PPT Presentation

1 / 63
About This Presentation
Title:

Chapter 10 Distributed Database Management System

Description:

DDBMS transparency features have the common property of allowing the end users ... Distribution transparency allows us to manage a physically dispersed database as ... – PowerPoint PPT presentation

Number of Views:622
Avg rating:3.0/5.0
Slides: 64
Provided by: all1101
Category:

less

Transcript and Presenter's Notes

Title: Chapter 10 Distributed Database Management System


1
Chapter 10Distributed Database Management System
Database Systems Design, Implementation, and
Management 4th Edition
2
The Evolution of Distributed DBMS
  • Centralized DBMS in the 1970s
  • Support for structured information needs.
  • Regularly issued formal reports in standard
    formats.
  • Prepared by specialist using 3GL in response to
    precisely channeled request.
  • Centrally stored corporate data.
  • Data access through dumb terminals.
  • Incapable of providing quick, unstructured, and
    ad hoc information for decision makers in a
    dynamic business environment.

3
The Evolution of Distributed DBMS
  • Social and Technical Changes in the 1980s
  • Business operations became more decentralized
    geographically.
  • Competition increased at the global level.
  • Customer demands and market needs favored a
    decentralized management style.
  • Rapid technological change created low-cost
    microcomputers. The LANs became the basis for
    computerized solutions.
  • The large number of applications based on DBMSs
    and the need to protect investments in
    centralized DBMS software made the notion of data
    sharing attractive.

4
The Evolution of Distributed DBMS
  • Two Database Requirements in a Dynamic Business
    Environment
  • Quick ad hoc data access became crucial in the
    quick-response decision making environment.
  • The decentralization of management structure
    based on the decentralization of business units
    made decentralized multiple-access and
    multiple-location databases a necessity.
  • Developments in the 1990s affecting DBMS
  • The growing acceptance of the Internet and the
    World Wide Web as the platform for data access
    and distribution.
  • The increased focus on data analysis that led to
    data mining and data warehousing.

5
The Evolution of Distributed DBMS
  • DDBMS Advantages
  • Data are located near the greatest demand site.
  • Faster data access
  • Faster data processing
  • Growth facilitation
  • Improved communications
  • Reduced operating costs
  • User-friendly interface
  • Less danger of a single-point failure
  • Processor independence
  • DDBMS Disadvantages
  • Complexity of management and control
  • Security
  • Lack of standards
  • Increased storage requirements

6
Distributed Processingand Distributed Database
  • Distributed processing shares the databases
    logical processing among two or more physically
    independent sites that are connected through a
    network. (See Figure 10.1)
  • Distributed database stores a logically related
    database over two or more physically independent
    sites connected via a computer network. (See
    Figure 10.2)

7
Distributed Processing Environment
Figure 10.1
8
Distributed Database Environment
Figure 10.2
9
Distributed Processingand Distributed Database
  • Distributed processing does not require a
    distributed database, but a distributed database
    requires distributed processing.
  • Distributed processing may be based on a single
    database located on a single computer. In order
    to manage distributed data, copies or parts of
    the database processing functions must be
    distributed to all data storage sites.
  • Both distributed processing and distributed
    databases require a network to connect all
    components.

10
What Is A Distributed DBMS?
  • A distributed database management system (DDBMS)
    governs the storage and processing of logically
    related data over interconnected computer systems
    in which both data and processing functions are
    distributed among several sites.

11
What Is A Distributed DBMS?
  • Functions of a DDBMS
  • Application interface
  • Validation to analyze data requests
  • Transformation to determine requests components
  • Query-optimization to find the best access
    strategy
  • Mapping to determine the data location
  • I/O interface to read or write data
  • Formatting to prepare the data for presentation
  • Security to provide data privacy
  • Backup and recovery
  • Database administration
  • Concurrency control
  • Transaction management

12
Centralized Database Management System
Figure 10.3
13
Fully Distributed Database Management System
Figure 10.4
14
DDBMS Components
  • Computer workstations that form the network
    system.
  • Network hardware and software components that
    reside in each workstation.
  • Communications media that carry the data from one
    workstation to another.
  • Transaction processor (TP) receives and processes
    the applications data requests.
  • Data processor (DP) stores and retrieves data
    located at the site. Also known as data manager
    (DM).

15
Distributed Database System Components
Figure 10.5
16
DDBMS Components
  • DDBMS protocol determines how the DDBMS will
  • Interface with the network to transport data and
    commands between DPs and TPs.
  • Synchronize all data received from DPs (TP side)
    and route retrieved data to the appropriate TPs
    (DP side).
  • Ensure common database functions in a distributed
    system -- security, concurrency control, backup,
    and recovery.

17
Levels of Data Process Distribution
  • Single-Site Processing, Single-Site Data (SPSD)
  • All processing is done on a single CPU or host
    computer.
  • All data are stored on the host computers local
    disk.
  • The DBMS is located on the host computer.
  • The DBMS is accessed by dumb terminals.
  • Typical of most mainframe and minicomputer DBMSs.
  • Typical of the 1st generation of single-user
    microcomputer database.

Table 10.1
18
Nondistributed (Centralized) DBMS
Figure 10.6
19
Levels of Data Process Distribution
  • Multiple-Site Processing, Single-Site Data (MPSD)
  • Typically, MPSD requires a network file server on
    which conventional applications are accessed
    through a LAN.
  • A variation of the MPSD approach is known as a
    client/server architecture. (Chapter 12)

Figure 10.7
20
Levels of Data Process Distribution
  • Multiple-Site Processing, Multiple-Site Data
    (MPMD)
  • Fully distributed DBMS with support for multiple
    DPs and TPs at multiple sites.
  • Homogeneous DDMS integrate only one type of
    centralized DBMS over the network.
  • Heterogeneous DDBMS integrate different types of
    centralized DBMSs over a network. (See Figure
    10.8)

21
Figure 10.8 Heterogeneous Distributed Database
Scenario
22
Distributed DB Transparency
  • DDBMS transparency features have the common
    property of allowing the end users to think that
    he is the databases only user.
  • Distribution transparency
  • Transaction transparency
  • Failure transparency
  • Performance transparency
  • Heterogeneity transparency

23
Distribution Transparency
  • Distribution transparency allows us to manage a
    physically dispersed database as though it were a
    centralized database.
  • Three Levels of Distribution Transparency
  • Fragmentation transparency
  • Location transparency
  • Local mapping transparency

Table 10.2
24
Distribution Transparency
  • Example (Figure 10.9) Employee data (EMPLOYEE)
    are distributed over three locations New York,
    Atlanta, and Miami.Depending on the level of
    distribution transparency support, three
    different cases of queries are possible

Figure 10.9 Fragment Locations
25
Distribution Transparency
  • Case 1 DB Supports Fragmentation Transparency
  • SELECT FROM EMPLOYEEWHERE EMP_DOB lt
    01-JAN-1940

26
Distribution Transparency
  • Case 2 DB Supports Location Transparency
  • SELECT FROM E1WHERE EMP_DOB lt 01-JAN-1940
  • UNION
  • SELECT FROM E2WHERE EMP_DOC lt 01-JAN-1940
  • UNION
  • SELECT FROM E3WHERE EMP_DOC lt 01-JAN-1940

27
Distribution Transparency
  • Case 3 DB Supports Local Mapping Transparency
  • SELECT FROM E1 NODE NYWHERE EMP_DOB lt
    01-JAN-1940
  • UNION
  • SELECT FROM E2 NODE ATLWHERE EMP_DOC lt
    01-JAN-1940
  • UNION
  • SELECT FROM E3 NODE MIAWHERE EMP_DOC lt
    01-JAN-1940

28
Distribution Transparency
  • Distribution transparency is supported by
    distributed data dictionary (DDD) or a
    distributed data catalog (DDC).
  • The DDC contains the description of the entire
    database as seen by the database administrator.
  • The database description, known as the
    distributed global schema, is the common database
    schema used by local TPs to translate user
    requests into subqueries.

29
Transaction Transparency
  • Transaction transparency ensures that database
    transactions will maintain the databases
    integrity and consistency. The transaction will
    be completed only if all database sites involved
    in the transaction complete their part of the
    transaction.
  • Related Concepts
  • Remote Requests
  • Remote Transactions
  • Distributed Transactions
  • Distributed Requests

30
Transaction Transparency
  • Distributed Requests and Distributed Transactions
  • A remote request allows us to access data to be
    processed by a single remote database processor.
    (Figure 10.10)
  • A remote transaction, composed of several
    requests, may access data at only a single site.
    (Figure 10.11)
  • A distributed transaction allows a transaction to
    reference several different (local or remote) DP
    sites. (Figure 10.12)
  • A distributed request lets us reference data from
    several remote DP sites. (Figure 10.13) It also
    allows a single request to reference a physically
    partitioned table. (Figure 10.14)

31
A Remote Request
Figure 10.10
32
A Remote Transaction
Figure 10.11
33
A Distributed Transaction
Figure 10.12
34
A Distributed Request
Figure 10.13
35
Another Distributed Request
Figure 10.14
36
Figure 10.15
37
Transaction Transparency
  • Two-Phase Commit Protocol
  • The two-phase commit protocol guarantees that, if
    a portion of a transaction operation cannot be
    committed, all changes made at the other sites
    participating in the transaction will be undone
    to maintain a consistent database state.
  • Each DP maintains its own transaction log. The
    two-phase protocol requires that each individual
    DPs transaction log entry be written before the
    database fragment is actually updated.
  • The two-phase commit protocol requires a
    DO-UNDO-REDO protocol and a write-ahead protocol.

38
Transaction Transparency
  • Two-Phase Commit Protocol
  • The DO-UNDO-REDO protocol is used by the DP to
    roll back and/or roll forward transactions with
    the help of the systems transaction log entries.
  • DO performs the operation and records the
    before and after values in the transaction
    log.
  • UNDO reverses an operation, using the log entries
    written by the DO portion of the sequence.
  • REDO redoes an operation, using the log entries
    written by DO portion of the sequence.
  • The write-ahead protocol forces the log entry to
    be written to permanent storage before the actual
    operation takes place.

39
Transaction Transparency
  • Two-Phase Commit Protocol
  • Two-phase commit protocol defines the operations
    between two types of nodes the coordinator and
    one or more subordinates or cohorts. The protocol
    is implemented in two phases
  • Phase 1 Preparation
  • The coordinator sends a PREPARE TO COMMIT message
    to all subordinates.
  • The subordinates receive the message, write the
    transaction log using the write-ahead protocol,
    and send an acknowledgement message to the
    coordinator.
  • The coordinator makes sure that all nodes are
    ready to commit, or it aborts the transaction.

40
Transaction Transparency
  • Phase 2 The Final Commit
  • The coordinator broadcasts a COMMIT message to
    all subordinates and waits for the replies.
  • Each subordinate receives the COMMIT message then
    updates the database, using the DO protocol.
  • The subordinates reply with a COMMITTED or NOT
    COMMITTED message to the coordinator.
  • If one or more subordinates did not commit, the
    coordinator sends an ABORT message, thereby
    forcing them to UNDO all changes.

41
Performance Transparency andQuery Optimization
  • The objective of a query optimization routine is
    to minimize the total cost associated with the
    execution of a request. The costs associated with
    a request are a function of the
  • Access time (I/O) cost involved in accessing the
    physical data stored on disk.
  • Communication cost associated with the
    transmission of data among nodes in distributed
    database systems.
  • CPU time cost associated with the processing
    overhead of managing distributed transactions.

42
Performance Transparency andQuery Optimization
  • Query optimization must provide distribution
    transparency as well as replica transparency.
  • Replica transparency refers to the DDBMSs ability
    to hide the existence of multiple copies of data
    from the user.
  • Most of the query optimization algorithms are
    based on two principles
  • Selection of the optimum execution order
  • Selection of sites to be accessed to minimize
    communication costs

43
Performance Transparency andQuery Optimization
  • Operation Modes of Query Optimization
  • Automatic query optimization means that the DDBMS
    finds the most cost-effective access path without
    user intervention.
  • Manual query optimization requires that the
    optimization be selected and scheduled by the end
    user or programmer.
  • Timing of Query Optimization
  • Static query optimization takes place at
    compilation time.
  • Dynamic query optimization takes place at
    execution time.

44
Performance Transparency andQuery Optimization
  • Optimization Techniques by Information Used
  • Statistically based query optimization uses
    statistical information about the database.
  • In the dynamic statistical generation mode, the
    DDBMS automatically evaluates and updates the
    statistics after each access.
  • In the manual statistical generation mode, the
    statistics must be updated periodically through a
    user-selected utility.
  • Rule-based query optimization algorithm is based
    on a set of user-defined rules to determine the
    best query access strategy.

45
Distributed Database Design
  • The design of a distributed database introduces
    three new issues
  • How to partition the database into fragments.
  • Which fragments to replicate.
  • Where to locate those fragments and replicas.

46
Data Fragmentation
  • Data fragmentation allows us to break a single
    object into two or more segments or fragments.
  • Each fragment can be stored at any site over a
    computer network.
  • Data fragmentation information is stored in the
    distributed data catalog (DDC), from which it is
    accessed by the transaction processor (TP) to
    process user requests.
  • Three Types of Fragmentation Strategies
  • Horizontal fragmentation
  • Vertical fragmentation
  • Mixed fragmentation

47
A Sample CUSTOMER Table
Figure 10.16
48
Data Fragmentation
  • Horizontal FragmentationDivision of a relation
    into subsets (fragments) of tuples (rows). Each
    fragment is stored at a different node, and each
    fragment has unique rows. Each fragment
    represents the equivalent of a SELECT statement,
    with the WHERE clause on a single attribute.

Table 10.3 Horizontal Fragmentation of the
CUSTOMER Table By State
49
Table Fragments In Three Locations
Figure 10.17
50
Data Fragmentation
  • Vertical FragmentationDivision of a relation
    into attribute (column) subsets. Each subset
    (fragment) is stored at a different node, and
    each fragment has unique columns -- with the
    exception of the key column. This is the
    equivalent of the PROJECT statement.

Table 10.4 Vertical Fragmentation of the
CUSTOMER Table
51
Vertically Fragmented Table Contents
Figure 10.18
52
Data Fragmentation
  • Mixed FragmentationCombination of horizontal and
    vertical strategies. A table may be divided into
    several horizontal subsets (rows), each one
    having a subset of the attributes (columns).

53
Table 10.5 Mixed Fragmentation of the CUSTOMER
Table
54
Figure 10.19
55
Data Replication
  • Data replication refers to the storage of data
    copies at multiple sites served by a computer
    network.
  • Fragment copies can be stored at several sites to
    serve specific information requirements.
  • The existence of fragment copies can enhance data
    availability and response time, reducing
    communication and total query costs.

Figure 10.20
56
Data Replication
  • Mutual Consistency Rule
  • Replicated data are subject to the mutual
    consistency rule, which requires that all copies
    of data fragments be identical.
  • DDBMS must ensure that a database update is
    performed at all sites where replicas exist.
  • Data replication imposes additional DDBMS
    processing overhead.

57
Data Replication
  • Replication Conditions
  • A fully replicated database stores multiple
    copies of all database fragments at multiple
    sites.
  • A partially replicated database stores multiple
    copies of some database fragments at multiple
    sites.
  • Factors for Data Replication Decision
  • Database Size
  • Usage Frequency

58
Data Allocation
  • Data allocation describes the processing of
    deciding where to locate data.
  • Data Allocation Strategies
  • CentralizedThe entire database is stored at one
    site.
  • PartitionedThe database is divided into several
    disjoint parts (fragments) and stored at several
    sites.
  • ReplicatedCopies of one or more database
    fragments are stored at several sites.

59
Data Allocation
  • Data allocation algorithms take into
    consideration a variety of factors
  • Performance and data availability goals
  • Size, number of rows, the number of relations
    that an entity maintains with other entities.
  • Types of transactions to be applied to the
    database, the attributes accessed by each of
    those transactions.

60
Client/Server vs. DDBMS
  • Client/server architecture refers to the way in
    which computers interact to form a system.
  • It features a user of resources or a client and a
    provider of resources or a server.
  • The architecture can be used to implement a DBMS
    in which the client is the transaction processor
    (TP) and the server is the data processor (DP).

61
Client/Server Architecture
  • Client/Server Advantages
  • Client/server solutions tend to be less
    expensive.
  • Client/server solutions allow the end user to use
    the microcomputers graphical user interface
    (GUI), thereby improving functionality and
    simplicity.
  • There are more people with PC skills than with
    mainframe skills.
  • The PC is well established in the workplace.
  • Numerous data analysis and query tools exist to
    facilitate interaction with many of the DBMSs.
  • There are considerable cost advantages to
    off-loading application development from the
    mainframe to PCs.

62
Client/Server Architecture
  • Client/Server Disadvantages
  • The client/server architecture creates a more
    complex environment with different platforms.
  • An increase in the number of users and processing
    sites often paves the way for security problems.
  • The burden of training a wider circle of users
    and computer personnel increases the cost of
    maintaining the environment.

63
C. J. Dates 12 Commandments for Distributed
Database
  • 1. Local Site Independence
  • 2. Central Site Independence
  • 3. Failure Independence
  • 4. Location Transparency
  • 5. Fragmentation Transparency
  • 6. Replication Transparency
  • 7. Distributed Query Processing
  • 8. Distributed Transaction Processing
  • 9. Hardware Independence
  • 10. Operating System Independence
  • 11. Network Independence
  • 12. Database Independence
Write a Comment
User Comments (0)
About PowerShow.com