Parallel and distributed databases - PowerPoint PPT Presentation

Slides: 37
Provided by: Yah97
1
Parallel and distributed databases
  • R & G Chapter 22

2
What is a distributed database?
3
Why distribute a database
  • Scalability and performance
  • Resilience to failures

[Figure labels: Throughput, Data size; X versus X]
4
Why distribute a database
  • Data is already distributed
  • Or needs to be distributed
  • Data is in multiple systems

5
Why not distribute a database
  • You must earn your complexity!
  • Communication needed
      • Must build a complex infrastructure
      • Unpredictable latencies must be masked
  • More types of failures
      • More components to fail
      • Network failures
      • Congestion, timeouts
  • More complex planning
      • Communication cost plus I/O cost
  • May have to deal with heterogeneity
      • Different types of systems
      • Different schemas, possibly incompatible
      • Different administrative domains

6
Types of distributed databases
7
The old days: mainframes
Definitely not distributed!
8
Client-server
[Figure labels: User interaction, Network, Data processing]
9
Parallel database
10
Primary/secondary
11
Multidatabase
12
How do they work?
  • What is shared?
  • How to distribute the data?
  • How to process the data?
  • How to update the data?

13
What is shared?
  • Memory

Example: most modern DBMSs (the CPUs share RAM and disk)
14
What is shared?
  • Disk

Example: Oracle RAC (each node has its own RAM)
15
What is shared?
  • Nothing

Examples: search engines, Teradata (each node has its own RAM and disk)
16
How to distribute the data?
17
How to distribute the data?
  • Hash partitioning: each (key, value) pair is routed by Hash(key)
  • Range partitioning: each (key, value) pair is routed by key < X or key > X
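The two partitioning schemes on this slide can be sketched in a few lines. This is a minimal illustration, not taken from the slides: the function names, the CRC32 hash, and the split points are all assumptions chosen for the example.

```python
import bisect
import zlib


def hash_partition(key: str, num_nodes: int) -> int:
    """Route a key to a node by hashing it.

    CRC32 is used (an arbitrary choice) because it is stable across runs,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def range_partition(key: str, split_points: list) -> int:
    """Route a key to a node by comparing it with sorted split points.

    Keys below split_points[0] go to node 0, keys at or above it go to
    node 1, and so on.
    """
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    rows = [("apple", 1), ("mango", 2), ("zebra", 3)]
    for key, value in rows:
        print(key, hash_partition(key, 4), range_partition(key, ["h", "q"]))
```

Hash partitioning tends to spread keys evenly but scatters adjacent keys; range partitioning keeps adjacent keys together, which helps range scans but can skew load if the split points are chosen badly.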
18
How to distribute the data?
19
Query processing
  • Intra-operator parallelism
  • Inter-operator parallelism

20
Parallel scanning
[Figure: six parallel filter operators feeding a single Result]
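A parallel scan of this shape could look roughly like the sketch below: every partition is filtered independently and the partial results are concatenated. The names `scan_partition` and `parallel_scan` are hypothetical, and threads stand in for separate nodes, so this shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, predicate):
    """Apply the filter to one partition, as a single node would."""
    return [row for row in partition if predicate(row)]


def parallel_scan(partitions, predicate, max_workers=4):
    """Filter every partition concurrently and concatenate the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input partitions.
        partials = pool.map(lambda p: scan_partition(p, predicate), partitions)
        return [row for partial in partials for row in partial]


if __name__ == "__main__":
    data = [[1, 8, 3], [10, 2], [7, 12]]
    print(parallel_scan(data, lambda row: row > 5))
```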
21
Sorting
22
Sorting
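The transcript keeps only the title of the sorting slides. One common approach, assumed here rather than confirmed by the slides, is to range-partition the data, sort each partition locally, and concatenate the partitions in range order. The function name and split points below are hypothetical.

```python
import bisect


def parallel_sort(values, split_points):
    """Range-partition values, sort each partition locally, then concatenate.

    split_points must be sorted; partition i holds values that fall between
    split_points[i - 1] (inclusive) and split_points[i] (exclusive).
    """
    partitions = [[] for _ in range(len(split_points) + 1)]
    for v in values:
        partitions[bisect.bisect_right(split_points, v)].append(v)
    # Each local sort is independent, so real nodes could run them at the same time.
    sorted_parts = [sorted(p) for p in partitions]
    # Partitions are already in range order, so no merge step is needed.
    return [v for part in sorted_parts for v in part]


if __name__ == "__main__":
    print(parallel_sort([5, 1, 9, 3, 7, 2], [3, 6]))
```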
23
Parallel hash join
Hash()
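A parallel hash join can be sketched as follows, assuming rows are tuples and both inputs are repartitioned on the join column with the same hash function, so matching rows always land on the same node. All names are hypothetical, and the per-node joins run in a loop here rather than truly in parallel.

```python
from collections import defaultdict


def partition_by_hash(rows, key_index, num_nodes):
    """Send each row to the node chosen by hashing its join column."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key_index]) % num_nodes].append(row)
    return parts


def local_hash_join(left, right, left_key, right_key):
    """Ordinary hash join on one node: build on left, probe with right."""
    table = defaultdict(list)
    for l_row in left:
        table[l_row[left_key]].append(l_row)
    return [l_row + r_row
            for r_row in right
            for l_row in table.get(r_row[right_key], [])]


def parallel_hash_join(left, right, left_key, right_key, num_nodes=3):
    """Repartition both inputs, then join each pair of co-located partitions."""
    left_parts = partition_by_hash(left, left_key, num_nodes)
    right_parts = partition_by_hash(right, right_key, num_nodes)
    result = []
    for lp, rp in zip(left_parts, right_parts):  # one iteration per node
        result.extend(local_hash_join(lp, rp, left_key, right_key))
    return result
```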
24
Join
25
Semi-join
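A semi-join reduces the data shipped between sites: only the join column of one relation is sent, and only the matching rows of the other come back. A minimal sketch, assuming rows are tuples and using hypothetical names:

```python
def semi_join_reduce(r_rows, s_rows, r_key, s_key):
    """Join R (at site A) with S (at site B) while shipping as little of S as possible."""
    # Step 1 (site A): project R onto the join column; only this set is shipped to B.
    join_values = {row[r_key] for row in r_rows}
    # Step 2 (site B): keep only the S rows that can possibly join, and ship them to A.
    s_reduced = [row for row in s_rows if row[s_key] in join_values]
    # Step 3 (site A): join R with the reduced S.
    joined = [r + s for r in r_rows for s in s_reduced if r[r_key] == s[s_key]]
    return s_reduced, joined
```

The technique pays off when the projected join column is much smaller than the rows of S that would otherwise be shipped.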
26
Inter-operator parallelism
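Inter-operator parallelism is often realized as pipelining: each operator runs concurrently and passes rows to the next one as they are produced. The sketch below is one assumed way to show this, with one thread per operator connected by queues; the names `run_stage` and `pipeline` are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the row stream


def run_stage(func, inbox, outbox):
    """Run one operator: consume rows, apply func, forward non-None results."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        result = func(item)
        if result is not None:  # None means the operator dropped the row
            outbox.put(result)


def pipeline(rows, stages):
    """Run every stage in its own thread so operators overlap in time."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=run_stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(stages)]
    for t in threads:
        t.start()
    for row in rows:
        queues[0].put(row)
    queues[0].put(_DONE)
    out = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out
```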
27
Updating distributed data
  • Synchronous read-any-write-all

Reads are fast
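Read-any-write-all can be sketched with a toy in-memory model: a write must reach every replica, while a read can be served by any single one. The classes below are hypothetical and ignore networking, concurrency, and recovery.

```python
class Replica:
    """One copy of the data; `up` simulates whether the node is reachable."""

    def __init__(self):
        self.data = {}
        self.up = True


class RawaStore:
    """Toy read-any-write-all store over a list of replicas."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # A write must reach every replica, so one unreachable node blocks it.
        if not all(r.up for r in self.replicas):
            raise RuntimeError("write rejected: a replica is unreachable")
        for r in self.replicas:
            r.data[key] = value

    def read(self, key):
        # Any single reachable replica holds the latest value.
        for r in self.replicas:
            if r.up:
                return r.data.get(key)
        raise RuntimeError("no replica reachable")
```

The sketch makes the trade-off visible: reads need one replica, but a single unreachable replica stops all writes.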
28
Updating distributed data
  • Synchronous voting

29
Updating distributed data
  • Synchronous voting

Writes are tolerant to disconnection
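Voting is commonly implemented with version numbers and quorums. The sketch below assumes N replicas, a write quorum w > N/2, and a read quorum r with r + w > N, so every read quorum overlaps the latest write quorum. The class and function names are hypothetical, and concurrency control is left out.

```python
class VersionedReplica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self):
        self.store = {}
        self.up = True


def quorum_write(replicas, key, value, w):
    """Write succeeds if at least w replicas are reachable (assumes w > N/2)."""
    reachable = [rep for rep in replicas if rep.up]
    if len(reachable) < w:
        raise RuntimeError("write quorum not reached")
    # Any majority overlaps the previous write quorum, so it sees the latest version.
    version = 1 + max(rep.store.get(key, (0, None))[0] for rep in reachable)
    for rep in reachable:
        rep.store[key] = (version, value)


def quorum_read(replicas, key, r):
    """Read r reachable replicas and return the value with the highest version."""
    reachable = [rep for rep in replicas if rep.up][:r]
    if len(reachable) < r:
        raise RuntimeError("read quorum not reached")
    version, value = max((rep.store.get(key, (0, None)) for rep in reachable),
                         key=lambda pair: pair[0])
    return value
```

With three replicas and w = r = 2, a write still succeeds when one replica is disconnected, and a later read that includes the stale replica still returns the newest value.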
30
Consistency of distributed data
  • Should provide ACID

31
Primary/secondary
32
Two-phase commit
[Figure: coordinator sends PREPARE; both participants reply PREPARED; coordinator sends COMMIT]
33
Two-phase commit
[Figure: coordinator sends PREPARE; one participant replies PREPARED, the other replies ABORT; coordinator sends ABORT]
34
Two-phase commit
[Figure messages: PREPARE, PREPARED, ABORT]
35
Two-phase commit
[Figure messages: PREPARE, PREPARED, PREPARED; one element is marked X]
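The message flow on the two-phase commit slides can be sketched as a small simulation. This is a simplified model under stated assumptions: the `Participant` class and its `can_commit` flag are hypothetical, and logging, timeouts, and coordinator failure are left out.

```python
class Participant:
    """A site taking part in the transaction; can_commit decides its vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        # Phase 1: vote PREPARED (a promise to commit if asked) or ABORT.
        if self.can_commit:
            self.state = "PREPARED"
            return "PREPARED"
        self.state = "ABORTED"
        return "ABORT"

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes PREPARED."""
    votes = [p.prepare() for p in participants]      # phase 1: PREPARE
    if all(vote == "PREPARED" for vote in votes):
        for p in participants:                       # phase 2: COMMIT
            p.commit()
        return "COMMIT"
    for p in participants:                           # phase 2: ABORT
        p.abort()
    return "ABORT"
```

A single ABORT vote is enough to abort the whole transaction, which is what keeps all sites in agreement on the outcome.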
36
Conclusion
  • Parallelism and distribution are very useful
      • Performance
      • Fault tolerance
      • Scale
  • But complex!
      • Rethink lots of aspects of the system
      • Must earn the complexity