Objectives of these slides: - PowerPoint PPT Presentation

1 / 49
About This Presentation
Title:

Objectives of these slides:

Description:

A distributed database dictionary, directory and catalog are essential, probably ... how to bring a server back online so that it contains the same version of its ... – PowerPoint PPT presentation

Number of Views:22
Avg rating:3.0/5.0
Slides: 50
Provided by: AndrewD86
Category:

less

Transcript and Presenter's Notes

Title: Objectives of these slides:


1
12. Distributed Data Management
  • Objectives of these slides
  • examine the main issues related to data
    distribution

2
Overview
  • 1. Data Distribution
  • 2. The Relational Data Model
  • 3. Methods of Data Distribution
  • 4. Data Placement Considerations
  • 5. Distributed Data Access
  • 6. Database Transaction Management

3
1. Data Distribution
  • Data is distributed to multiple sites (servers)
  • main aim is to make the distribution transparent
    to the end users and the business logic inside
    applications
  • The DBMS (database management system) also should
    be distributed.
  • How and when should data be distributed?
  • What are the best locations for data?

4
Advantages
  • Efficient data access
  • Application load balancing
  • Application and end-user scalability
  • Matches international business models
  • Replication of data makes failure less critical

5
Disadvantages
  • Management complexity
  • Data synchronization
  • transaction processing

6
2. The Relational Data Model
  • SQL (Structured Query Language)
  • DDL (Data Definition Language)
  • DML (Data Manipulation Language)

Customer Table
SS
Saving Acct
Check Acct
Name
ad
12345
67823
3442




Savings
Acct
Bal
SS
67823
3442
23



7
3. Methods of Data Distribution
  • 3.1. Manual Extraction (read-only)

Branch B1
Customer
Exchange
Savings
copy (push or pull)
Exchange
Branch B2
Exchange
Central Office
8
3.2. Snapshot (read-only)
  • create snapshot lttable-namegt as select time
    lthhmmssgt, interval lthhmmssgt
  • Frequency?
  • e.g. exchange rate changes once per day

9
3.3. Replication
  • How is data consistency maintained between the
    original and its copies?
  • synchronous or asynchronous updates
  • Replication management
  • Location transparency
  • the aim is to make the user/application think
    that there is a single table at a single site

10
3.4. Fragmentation
Customer Table
SS
Saving Acct
Name
  • Horizontal


ad
12345
3442




Customer Table

set
34256
5001
SS
Saving Acct
Name


ad
12345
3442





set
34256
5001
po
00912

5002




jim
78123

9887
11
  • Vertical

Accounts
Saving Acct

Customer Table

SS
Saving Acct
12345
Name




ad
12345
3442

34256




00912


set
34256
5001


po
00912

5002
78123





jim
78123

9887
12
4. Data Placement Considerations
  • Increase local autonomy
  • Decrease network traffic (to/from original)
  • Data type
  • read/write, size, local use
  • Cost of distribution/replication
  • for data and application code
  • Data management

13
Data Location Analysis
  • Assign costs to site read and update messages
    per day.
  • Distribute the data to reduce the cost of the
    network message traffic.

14
Read/Update Messages/day
Site
S1
B1
2000
1000
Read
500
100
Update
15
Data Location Analysis Matrix
Location of Data
MessageTraffic
S1
B1
Configuration
?
1100
1
?
2500
?
?
2
?
600
?
3
16
5. Distributed Data Access
  • Answer queries by accessing data located on other
    machines.
  • The (distributed) DBMS tries to hide the physical
    location of the data.

17
5.1. Remote Request
  • The client application issues a single data
    request to be processed at a single remote server.

SQL query
Customer
Client
Server1
continued
18
  • SQL query
  • select from server1.bankDB.customer where
    server1.bankDB.customer.city Bangkok
  • Queries like this can be used for manual
    extraction and snapshots.
  • User-level queries should not need to specify
    server1.

19
5.2. Remote Transaction
  • The client application issues multiple data
    requests (all part of a single transaction) to a
    single remote server.

begin work
SQL query 1
Customer
Client

SQL query 2
Branch
commit
Server1
continued
20
  • SQL query
  • begin work select from server1.bankDB.custome
    r where server1.bankDB.customer.city
    Bangkok update server1.bankDB.branch
    set posted_ind yescommit work

21
5.3. Distributed Transaction
  • The client application issues multiple data
    requests (all part of one distributed
    transaction) to multiple remote servers.

begin work
SQL query 1
Employee
Medical
SQL query 2
Client
ServerB1
ServerM
commit
continued
22
  • SQL query
  • begin work select from serverB1.bankDB.employ
    ee where serverB1.bankDB.employee.edLevel
    mba select from
    serverM.medDB.medical where
    serverM.medDB.medical.branch B1 commit
    work

23
5.4. Distributed Request
  • The client application issues multiple data
    requests (all part of one distributed
    transaction) to multiple remote servers and each
    request can refer to multiple sites.

continued
24
Employee
begin work
Client
SQL query 1
ServerB1
join
commit
SQL query 2
Medical
Branch
ServerM
Server1
continued
25
  • SQL query
  • begin work select from serverB1.bankDB.employ
    ee B1, serverM.bankDB.medical M where
    B1.ID M.ID and B1.edLevel mba
    update server1.bankDB.branch set
    reported yes where server1.bankDB.branch
    B1 commit work

26
6. Database Transaction Management
  • A database transaction should possess the
    properties
  • atomicity
  • consistency
  • isolation
  • durability
  • serializability
  • A transaction is an all or nothing task.

continued
27
  • SQL contains the commands COMMIT and ROLLBACK for
    explicit transaction management.
  • Implicit transaction management
  • transaction completion ? COMMIT
  • transaction failure ? ROLLBACK

28
The Two-phase Commit Protocol
  • A coordinator (which may be the client) sends its
    queries to the servers, and waits for replies
  • the prepare phase
  • If a server can answer its query then it responds
    with an ok (or similar).

continued
29
  • If all the servers respond with ok, then the
    coordinator tells them to commit
  • the commit phase
  • If any of the servers do not respond with ok,
    then the coordinator tells all the servers to
    rollback.

30
Distributed Two-phase Commit
Tree master(coordinator)
Syncpointresponses
Syncpointrequest
Slave/Master
Slave/Master
Slave
Slave
31
Deadlock
  • Updates are usually carried out by locking the
    record (or group of records) being changed.
  • Two (or more) applications can become deadlocked
    if they wait for each other to release a lock.

db1
c1
1. lock
2. wait
lock record
2. wait
db2
c2
lock record
1. lock
32
Deadlock Detection
  • Automatic time-out followed by transaction
    failure (and retry).
  • Asynchronous updates.
  • Deadlock avoidance algorithms.

33
13. Distributed Data Management System Design
  • Objectives of these slides
  • examine some of the issues related to distributed
    database management design

34
The Main Problem
  • How to obtain information from a system that is
    distributed?
  • e.g. where is the customer database?
  • e.g. what queries does the customer database
    understand?

Overview
1. Metadata 2. Dates DDBMS Rules 3. Other
Issues
35
1. Metadata
  • Information about a databases resources
  • the data types, attributes, keys, indexes, etc.
  • stored in a database dictionary
  • Information about database location, machine, and
    network characteristics
  • stored in a database directory
  • Relational DBMS sometimes hold information about
    the relationships between databases
  • stored in a system catalog.

36
Benefits of Metadata
  • improved project management
  • reduced maintenance costs
  • automated information sharing
  • queries can be made data location independent

37
Drawbacks
  • the difficulty of creating metadata for legacy
    systems
  • heterogeneous metadata formats

38
Centralised or Distributed Metadata?
  • Centralised
  • simple to manage, maintain, etc.
  • a bottleneck
  • a single point of failure
  • Distributed
  • distribution issues (e.g. how to divide, where to
    locate)
  • data synchronisation

39
2. Dates DDBMS Rules
  • 2.1. Local Autonomy
  • The local DBMS should manage its own security,
    data integrity, data consistency, locking, and
    recovery.
  • 2.2. No Reliance on a Central Site
  • But an occasional coordinator is still
    required, for example for two-phase commits.
  • 2.3. Continuous Operation
  • This includes online backups, archiving, recovery.

40
  • 2.4. Data Location Independence
  • Users and business logic should be unaware that
    the data is distributed (they see a single local
    database).
  • A distributed database dictionary, directory and
    catalog are essential, probably replicated at
    every site and synchronised.

41
  • 2.5. Data Fragmentation Independence
  • Users and business logic should be unaware of
    fragmentation (they see a single database).
  • At the DBMS level, SQL queries may require joins
    (to combine vertical fragments) and unions (to
    combine horizontal fragments).

42
  • 2.6. Data Replication Independence
  • Replication is transparent to users and business
    logic.
  • This requires the DBMS to use a hidden two-phase
    commit protocol (or similar) when performing
    transactions.

continued
43
  • Code transformation
  • update bankDB.savings set amt amt -
    withdrawal where bankDB.savings.acctno
    12345
  • becomes
  • begin work update serverB1.bankDB.savings
    set amt amt - withdrawal where
    serverB1.bankDB.savings.acctno 12345
    update serverB2.bankDB.savings set amt amt
    - withdrawal where serverB2.bankDB.savings.a
    cctno 12345commit work

44
  • 2.7. Distributed Query Processing
  • The DBMS optimises the users query
  • rule-based optimisations
  • cost-based optimisations
  • Example query
  • select serverS1.bankDB.employees E,
    serverS2.bankDB.managers M where E.ID M.ID
    and E.edLevel MBA and
    M.edLevel MBA

continued
45
Server S1
DDM
employees
catalog
100,000 rows
SQLrequest
managers
client
catalog
Server S2
1,000 rows
continued
46
  • Rule-based Optimisation
  • specify an execution order for the parts of a
    query
  • e.g. search employees first (100,000 records) and
    then compare with managers (1,000 records)
  • Cost-based Optimisation
  • Estimate the cost of a query by considering
  • the number of rows being processed
  • the data properties (e.g. keys, types)
  • machine/network speeds
  • I/O operations
  • e.g. copy the managers DB to the employees server
    and join the databases

47
  • 2.8. Distributed Transaction Management
  • ACIDS properties
  • distributed locking
  • deadlock detection
  • Some commercial products (see chapter 13, Bersen
    for details)
  • IBM CICS (Customer Information Control System)
  • Novell TUXEDO
  • Transarc Encina

48
  • 2.9. Hardware Independence
  • 2.10. OS Indepedence
  • 2.11. Network Independence
  • 2.12. DBMS Independence

49
3. Other Issues
  • Administration
  • backups, recovery, DB creation and modification
  • Security
  • protection of data from illegal disclosure,
    modification, or destruction
  • access rights, passwords, encryption
  • very much harder in a distributed environment
  • Currency control
  • how to bring a server back online so that it
    contains the same version of its database as on
    other servers?
  • the aim is to make the database current with its
    other versions
Write a Comment
User Comments (0)
About PowerShow.com