Parallel Database System - PowerPoint PPT Presentation



1
Parallel Database System
  • David DeWitt & Jim Gray

presented by Ming Hao
2
Why parallel databases?
  • Dominance of the relational data model
  • 1. Large, uniform data records
  • 2. A query can be decomposed into relational
    operators; each operator takes a relation as
    input and outputs a new relation
  • 3. This structure exposes built-in parallelism
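Because every operator maps a relation to a new relation, a query plan is simply a composition of operators. A minimal sketch, assuming relations are modeled as Python lists of dicts; `select` and `project` are hypothetical helpers for illustration, not any database's API:

```python
from typing import Callable

# Assumption for illustration: a relation is a list of dicts, one dict per tuple.
Relation = list[dict]

def select(rel: Relation, pred: Callable[[dict], bool]) -> Relation:
    """Relational selection: keep only the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(rel: Relation, cols: list[str]) -> Relation:
    """Relational projection: keep only the named columns of each tuple."""
    return [{c: t[c] for c in cols} for t in rel]

emp = [
    {"name": "ann", "dept": "db", "salary": 120},
    {"name": "bob", "dept": "os", "salary": 90},
    {"name": "cal", "dept": "db", "salary": 80},
]

# Relation in, relation out: the output of select feeds straight into project.
result = project(select(emp, lambda t: t["dept"] == "db"), ["name"])
print(result)  # [{'name': 'ann'}, {'name': 'cal'}]
```

Since no operator depends on hidden state, each one can be run on a different processor or on a different slice of its input, which is where the parallelism comes from.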

3
  • 1. Pipelined parallelism: streaming the output
    of one operator into the input of another
    operator
  • 2. Partitioned parallelism: partitioning the
    data and the execution across processors
  • 3. Inter-query parallelism: many independent
    queries running concurrently (e.g., OLTP)
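Pipelined parallelism can be sketched with one thread per operator and a queue between them, so the downstream operator starts working on the first tuple while the upstream one is still producing. This is a toy under stated assumptions: `scan`, `filter_op`, and `run_pipeline` are hypothetical names, and because of CPython's global interpreter lock the threads illustrate the dataflow structure rather than real CPU speedup:

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a tuple stream

def scan(rows, out_q):
    # Producer operator: emits tuples downstream one at a time.
    for r in rows:
        out_q.put(r)
    out_q.put(DONE)

def filter_op(pred, in_q, out_q):
    # Runs concurrently with scan: consumes each tuple as soon as it arrives.
    while (r := in_q.get()) is not DONE:
        if pred(r):
            out_q.put(r)
    out_q.put(DONE)

def run_pipeline(rows, pred):
    # Bounded queues block a fast producer, giving simple flow control.
    q1, q2 = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
    stages = [
        threading.Thread(target=scan, args=(rows, q1)),
        threading.Thread(target=filter_op, args=(pred, q1, q2)),
    ]
    for t in stages:
        t.start()
    results = []
    while (r := q2.get()) is not DONE:  # the caller is the last pipeline stage
        results.append(r)
    for t in stages:
        t.join()
    return results

print(run_pipeline(range(10), lambda x: x % 2 == 0))  # [0, 2, 4, 6, 8]
```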
4
Hardware support available
  • High-speed networks
  • Message-passing, client-server operating
    systems
  • Cheap and powerful PCs/workstations

5
Hardware architecture
1. Shared memory
  a. cannot scale up to many disks and processors
     (interconnect bandwidth is the bottleneck)
  b. interference between processors; private
     caches do not solve the problem
6
Hardware architecture
2. Shared disk
  a. same scaling problem as shared memory
  b. interference when updating data
7
Hardware architecture
3. Shared nothing
  a. linear scaleup and speedup
  b. less interference
  c. exploits commodity processors and memory
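A toy simulation of the shared-nothing idea, assuming threads stand in for nodes and `queue.Queue` stands in for the interconnect: each hypothetical `Node` owns a private partition and cooperates only by exchanging messages, never by touching another node's data. The names `Node` and `parallel_count` are illustrative, not from the slides:

```python
import queue
import threading

class Node(threading.Thread):
    """Hypothetical shared-nothing node: private data, message-passing only."""

    def __init__(self, partition, outbox):
        super().__init__(daemon=True)
        self.partition = list(partition)  # private copy; no other node reads it
        self.inbox = queue.Queue()        # messages from the coordinator
        self.outbox = outbox              # replies to the coordinator

    def run(self):
        while True:
            pred = self.inbox.get()
            if pred is None:              # shutdown message
                return
            # Work only on the local partition, then reply with a partial result.
            self.outbox.put(sum(1 for r in self.partition if pred(r)))

def parallel_count(partitions, pred):
    replies = queue.Queue()
    nodes = [Node(p, replies) for p in partitions]
    for n in nodes:
        n.start()
    for n in nodes:
        n.inbox.put(pred)                 # broadcast the query
    total = sum(replies.get() for _ in nodes)  # gather and combine partial counts
    for n in nodes:
        n.inbox.put(None)
        n.join()
    return total

data = list(range(100))
parts = [data[i::4] for i in range(4)]    # 4 simulated nodes
print(parallel_count(parts, lambda x: x % 10 == 0))  # 10
```

Adding nodes here means adding partitions and queues, with no shared structure that every node must contend for.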
8
Parallelism metrics
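  • Speedup (same problem, bigger system):
    $\text{speedup} = \dfrac{\text{small-system elapsed time}}{\text{big-system elapsed time}}$
  • Scaleup (problem grows with the system):
    $\text{scaleup} = \dfrac{\text{small-system elapsed time on small problem}}{\text{big-system elapsed time on big problem}}$
  • Linear speedup: an N-times larger system gives
    a speedup of N; linear scaleup: the ratio stays
    at 1

[Figure: Speedup]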
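The two ratios above can be computed directly. A small sketch with hypothetical timings (the numbers are made up for illustration, not measurements from the paper):

```python
def speedup(small_system_time: float, big_system_time: float) -> float:
    """Same problem, run on a small system and on an N-times bigger one."""
    return small_system_time / big_system_time

def scaleup(small_time_small_problem: float, big_time_big_problem: float) -> float:
    """Problem size grows in proportion to system size."""
    return small_time_small_problem / big_time_big_problem

# Hypothetical timings: 1 node vs. 8 nodes.
print(speedup(small_system_time=400.0, big_system_time=50.0))                 # 8.0 -> linear for 8x
print(scaleup(small_time_small_problem=400.0, big_time_big_problem=500.0))    # 0.8 -> below linear (1.0)
```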
9
Barriers to linear speedup and scaleup
  • Startup: time needed to start a parallel
    operation
  • Interference: critical sections,
    synchronization, cache coherence
  • Skew: poor load balance (the slowest
    partition determines the elapsed time)
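A toy cost model showing how startup and skew cap speedup. The model is an assumption for illustration, not from the slides: elapsed time is a startup cost proportional to the number of nodes plus the time of the most heavily loaded node. Interference is not modeled:

```python
def parallel_elapsed(total_work: float, shares: list[float], startup_per_node: float) -> float:
    """Hypothetical model: startup grows with node count, and the job
    finishes only when the node with the largest share finishes (skew)."""
    assert abs(sum(shares) - 1.0) < 1e-9, "shares must sum to 1"
    return startup_per_node * len(shares) + total_work * max(shares)

work = 1000.0
serial = parallel_elapsed(work, [1.0], startup_per_node=0.0)

balanced = parallel_elapsed(work, [0.25] * 4, startup_per_node=5.0)
skewed = parallel_elapsed(work, [0.55, 0.15, 0.15, 0.15], startup_per_node=5.0)

print(round(serial / balanced, 2))  # 3.7: startup alone keeps 4 nodes below speedup 4
print(round(serial / skewed, 2))    # 1.75: one overloaded node dominates
```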

10
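Pipeline or Partitioning?
  • Limits of pipelined parallelism:
  • relational operator chains are rarely very long
  • some operators cannot be pipelined because they
    emit no output until all input is consumed
    (e.g., aggregate)
  • skew: one operator often costs much more than
    the others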
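Python generators make the pipeline-breaking behavior of an aggregate visible. In this sketch (hypothetical operator names, with a shared `log` list added only to record event order), the scan and filter interleave tuple by tuple, while the SUM operator cannot emit anything until its whole input is exhausted:

```python
log = []  # records the order in which operators do work

def scan(rows):
    for r in rows:
        log.append(f"scan {r}")
        yield r

def filter_even(rows):
    # Streaming operator: emits each qualifying tuple immediately.
    for r in rows:
        if r % 2 == 0:
            log.append(f"filter emits {r}")
            yield r

def total(rows):
    # Blocking operator: SUM has no output until all input is consumed.
    s = 0
    for r in rows:
        s += r
    log.append(f"sum emits {s}")
    yield s

out = list(total(filter_even(scan(range(5)))))
print(out)  # [6]
print(log)
```

The log shows `scan` and `filter` alternating, and `sum emits 6` appearing only after the last scan, so any operator after the aggregate would sit idle until then.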

11
Data Partitioning
  • Round-robin
  • + good for accessing data by sequential scan
  • - poor when queries frequently want associative
    access to records
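A minimal sketch of round-robin partitioning (the function names are hypothetical): tuple i goes to partition i mod n, which spreads a sequential scan evenly, but nothing ties a value to a partition, so a lookup must search all of them:

```python
def round_robin_partition(rows, n):
    """Tuple i is placed in partition i mod n."""
    parts = [[] for _ in range(n)]
    for i, r in enumerate(rows):
        parts[i % n].append(r)
    return parts

def rr_lookup(parts, value):
    # No placement rule relates a value to a partition: every partition is searched.
    return [r for p in parts for r in p if r == value]

parts = round_robin_partition(["a", "b", "c", "d", "e"], 2)
print(parts)                  # [['a', 'c', 'e'], ['b', 'd']]
print(rr_lookup(parts, "d"))  # ['d']
```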

12
Data Partitioning
  • Hash partitioning
  • + good for accessing data by sequential scan
  • + good for associative access to records
    (exact-match lookups)
  • - poor for clustering (range queries)

13
Data Partitioning
  • Range partitioning
  • + good for accessing data by sequential scan
  • + good for associative access to records
  • + good for clustering
  • - risk of data skew and execution skew
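A sketch of range partitioning using the standard `bisect` module; `range_partition` and `partitions_for_range` are hypothetical names. Sorted split points decide the partition, so a range query touches only the partitions that overlap it, but unevenly distributed keys produce badly unbalanced partitions:

```python
import bisect

def range_partition(rows, key, bounds):
    """bounds are sorted split points; partition i holds keys in
    [bounds[i-1], bounds[i]), giving len(bounds) + 1 partitions."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for r in rows:
        parts[bisect.bisect_right(bounds, r[key])].append(r)
    return parts

def partitions_for_range(bounds, lo, hi):
    # Clustering: only partitions that can hold keys in [lo, hi] are touched.
    return list(range(bisect.bisect_right(bounds, lo), bisect.bisect_right(bounds, hi) + 1))

rows = [{"k": k} for k in [1, 2, 3, 4, 5, 6, 7, 8, 25, 30]]
parts = range_partition(rows, "k", bounds=[10, 20])
print([len(p) for p in parts])             # [8, 0, 2] -> data skew
print(partitions_for_range([10, 20], 12, 18))  # [1]
```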

14
Using existing sequential operators
  • Merge operator: focuses multiple input streams
    onto one spot
  • Split operator: partitions an output stream
    among consumers; used to connect multiple
    parallel stages
  • Both provide flow control and buffering
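The split/merge idea lets an unmodified sequential operator run in parallel. In this sketch, a hypothetical `split` routes tuples to partitions, Python's ordinary `sorted` runs unchanged on each partition, and `merge` combines the sorted streams with `heapq.merge`. Assumption: plain lists serve as unbounded buffers here, so this sketch omits flow control; bounded queues would add it:

```python
import heapq

def split(stream, n, route):
    """Split operator: sends each tuple to one of n consumer streams."""
    outs = [[] for _ in range(n)]
    for t in stream:
        outs[route(t) % n].append(t)
    return outs

def merge(streams):
    """Merge operator: focuses several sorted input streams into one."""
    return list(heapq.merge(*streams))

data = [7, 3, 9, 1, 8, 2, 6, 5, 4, 0]
parts = split(data, 3, route=lambda x: x)        # route by x % 3
sorted_parts = [sorted(p) for p in parts]        # existing sequential sort, per partition
print(merge(sorted_parts))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

None of the per-partition work knows it is part of a parallel plan; only `split` and `merge` deal with parallelism.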

15
Better algorithms
  • Goals: minimize data flow; tolerate data skew
    and execution skew
  • Join algorithms:
  • 1. Sort-merge join: O(n log n)
  • 2. Hash join: linear cost, O(n)
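Both join methods from the slide, in a minimal single-node sketch (function names are hypothetical; tuples are dicts, and on a column-name clash the right input's value wins). The hash join builds a table on one input and probes it with the other, for expected linear cost; the sort-merge join's cost is dominated by the O(n log n) sorts:

```python
from collections import defaultdict

def hash_join(r, s, key):
    table = defaultdict(list)
    for t in r:                           # build phase: hash r on the join key
        table[t[key]].append(t)
    out = []
    for u in s:                           # probe phase: one lookup per s tuple
        for t in table.get(u[key], []):
            out.append({**t, **u})
    return out

def sort_merge_join(r, s, key):
    r = sorted(r, key=lambda t: t[key])   # the sorts cost O(n log n)
    s = sorted(s, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        a, b = r[i][key], s[j][key]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i2, j2 = i, j
            while i2 < len(r) and r[i2][key] == a:
                i2 += 1
            while j2 < len(s) and s[j2][key] == a:
                j2 += 1
            out.extend({**t, **u} for t in r[i:i2] for u in s[j:j2])
            i, j = i2, j2
    return out

r = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 2, "a": "z"}]
s = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]
print(hash_join(r, s, "id"))  # [{'id': 2, 'a': 'y', 'b': 'p'}, {'id': 2, 'a': 'z', 'b': 'p'}]
```

In a parallel setting, both inputs would first be hash-partitioned on the join key so that each node joins only its own pair of partitions.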

16
Summary
  • Commodity components, not special hardware
  • shared nothing architecture
  • data partition, data flow
  • only choice for some applications
  • some remaining problems