1
CS632: Parallel Database Systems
  • 6 April 1999
  • Rimon Barr

2
Background
  • Database machine
    • not economical
  • Parallelism
  • Software database machine
    • off-the-shelf components
    • economy of scale

3
Software database architectures
  • Shared memory
    • processors on a shared bus
    • very fast, efficient communication
    • not scalable due to resource contention
    • easy to program

4
Software database architectures
  • Shared memory
  • Shared disk
    • private memory
    • massively parallel architecture
    • Thinking Machines, Intel, nCUBE, VAXcluster
    • loses at both ends of the spectrum

5
Software database architectures
  • Shared memory
  • Shared disk
  • Shared nothing
    • highly scalable
    • processors with private memory and disks
    • communication via interconnect

6
Parallel computation metrics
  • Scaleup
    • add processors in proportion to problem size
  • Speedup
    • add processors to a fixed-size problem
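As a rough sketch of how the two metrics are computed (the function and variable names are illustrative, not from the paper): linear speedup means n nodes run a fixed job n times faster, while linear scaleup means an n-times-larger job on n nodes takes the same elapsed time.

```python
def speedup(t_one_node: float, t_n_nodes: float) -> float:
    # same problem, more processors; linear speedup == n
    return t_one_node / t_n_nodes

def scaleup(t_small_job_one_node: float, t_big_job_n_nodes: float) -> float:
    # problem grows with processor count; linear scaleup == 1.0
    return t_small_job_one_node / t_big_job_n_nodes
```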

7
The Gamma Database Machine Project
  • DeWitt, Ghandeharizadeh, Schneider, Bricker,
    Hsiao, Rasmussen.

8
History - DIRECT
  • early database machine project
  • showed parallelism useful for DB applications
  • flaws curtailed scalability:
    • shared memory
    • central control of execution

9
Gamma hardware (v1.0)
  • 17 VAX 11/750 processors
  • 2 MB RAM per node
  • 80 Mb/s token ring
  • 8 x 333 MB Fujitsu drives
  • Unix

10
Gamma hardware (v1.0) issues
  • 2 KB DB pages (limit imposed by the token ring)
  • Unibus congestion (network faster than the bus)
  • corrected with a backplane card
  • VAX obsolete
  • 2 MB with no virtual memory was tight

11
Gamma hardware (v2.0)
  • Intel iPSC/2 hypercube
  • 32 × 386 processors
  • 8 MB of memory per node
  • 330 MB Maxtor drive per node (45 KB cache)
  • 8 routing modules
  • 2.8 MB/s links
  • full duplex, serial, reliable

12
Gamma software (v2.0)
  • OS: NOSE
  • entire DB in one NX/2 process
  • details: renaming nodes
  • 10% of CPU used for copying
  • excessive interrupts during I/O

13
Data storage
  • Horizontal partitioning (sketch below)
    • round-robin
    • hashed
    • range partitioned
  • should partition relations based on heat (access frequency)
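A minimal sketch of the three declustering strategies, with hypothetical names (NUM_NODES, RANGE_BOUNDS) and Python's built-in hash standing in for Gamma's actual hash function:

```python
import bisect

NUM_NODES = 8                                 # hypothetical cluster size

def round_robin_node(tuple_index: int) -> int:
    # the i-th inserted tuple goes to node i mod n
    return tuple_index % NUM_NODES

def hashed_node(key) -> int:
    # a hash of the partitioning attribute picks the node
    return hash(key) % NUM_NODES

RANGE_BOUNDS = [100, 200, 300, 400, 500, 600, 700]   # hypothetical split points

def range_node(key) -> int:
    # binary search over the split points picks one of 8 ranges
    return bisect.bisect_right(RANGE_BOUNDS, key)
```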

14
Components
  • Catalog manager
    • repository for DB schema and meta-information
  • Query manager
    • one associated with each user
  • Scheduler processes
    • coordinate multi-site queries
  • Operator processes
    • each executes a single relational operator

15
Components
16
Query processing
  • Ad-hoc and embedded query interfaces
  • standard parsing, optimization, code generation
  • left-deep trees only
  • hash joins only
  • at most two join operators active simultaneously
  • split tables
  • new relational operators

17
Split table
  • Directs operator output to appropriate node
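A reduced sketch of the idea, under assumed names (make_split_table, destinations); in Gamma the routing function is typically a hash or range test on the join or partitioning attribute:

```python
# Hypothetical sketch: a split table routes each output tuple of an
# operator to the process running the next operator on some node.
def make_split_table(destinations):
    def route(tup: dict, attr: str):
        # pick a destination node by hashing the relevant attribute
        return destinations[hash(tup[attr]) % len(destinations)]
    return route

route = make_split_table(["node0", "node1", "node2"])
print(route({"emp_id": 42, "dept": 7}, "dept"))  # one of the three nodes
```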

18
Query processing: an example
19
Query processing: an example
20
Algorithms: selection
  • start a selection operator on each node
  • semantic exclusion of nodes possible for hash and range partitioning
    (sketch below)
  • throughput considerations
    • transfer tuples in blocks
    • one-page read-ahead
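A minimal sketch of the exclusion logic for an equality selection on the partitioning attribute (all names are illustrative); round-robin placement gives no information, so every node must scan:

```python
import bisect

def nodes_for_equality_select(key, partitioning: str, num_nodes: int,
                              range_bounds=None):
    """Which nodes must run a selection operator for `attr = key`?"""
    if partitioning == "hash":
        return [hash(key) % num_nodes]        # only one node can hold it
    if partitioning == "range" and range_bounds is not None:
        return [bisect.bisect_right(range_bounds, key)]  # one node again
    return list(range(num_nodes))             # round-robin: all nodes scan
```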

21
Algorithms: join
  • Partition into buckets, join buckets (sketch below)
  • Implemented: sort-merge, Grace, Simple, Hybrid
  • equi-joins only
  • centralized hash join (discuss)
  • parallel hash join (discuss)
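A heavily reduced sketch of the partition-then-join idea, using a plain in-memory build/probe per bucket; in Gamma each bucket pair would be joined on a different node, with hybrid hashing to handle memory overflow:

```python
from collections import defaultdict

def hash_join(build_rel, probe_rel, key):
    # classic build/probe on one bucket pair
    table = defaultdict(list)
    for r in build_rel:                       # build on the smaller input
        table[r[key]].append(r)
    return [{**r, **s} for s in probe_rel for r in table.get(s[key], [])]

def partition(rel, key, n):
    # the split-table step: hash the join attribute into n buckets
    buckets = [[] for _ in range(n)]
    for t in rel:
        buckets[hash(t[key]) % n].append(t)
    return buckets

def parallel_hash_join(R, S, key, n=4):
    # matching buckets join independently (on separate nodes in Gamma)
    Rb, Sb = partition(R, key, n), partition(S, key, n)
    return [t for i in range(n) for t in hash_join(Rb[i], Sb[i], key)]
```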

22
Algorithms: aggregation
  • compute partial results for each partition
  • collect each group-by partition at a single node, using a hash function
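A minimal sketch of the two-phase scheme for an AVG aggregate (names are illustrative): every node computes per-group [sum, count] partials over its partition, and partials for the same group are combined at one node chosen by hashing the group value:

```python
from collections import defaultdict

def local_partials(tuples, group_attr, agg_attr):
    # phase 1, on every node: per-group [sum, count] for the local partition
    partial = defaultdict(lambda: [0, 0])
    for t in tuples:
        p = partial[t[group_attr]]
        p[0] += t[agg_attr]
        p[1] += 1
    return partial

def merge_partials(all_partials):
    # phase 2: partials for the same group meet at one node (chosen by hash)
    final = defaultdict(lambda: [0, 0])
    for partial in all_partials:
        for g, (s, c) in partial.items():
            final[g][0] += s
            final[g][1] += c
    return {g: s / c for g, (s, c) in final.items()}    # AVG per group
```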

23
Algorithms: update
  • standard techniques
  • exception: an update to the partitioning attribute may cause the tuple
    to move
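A sketch of that exception under assumed names (node_of maps a partitioning-attribute value to a node): apply the update, and if the new value maps to a different node, delete the tuple locally and re-insert it at the destination:

```python
def apply_update(tup: dict, new_values: dict, part_attr: str,
                 node_of, local_node: int):
    tup = {**tup, **new_values}
    dest = node_of(tup[part_attr])
    if dest != local_node:
        # partitioning attribute changed: delete here, insert at dest
        return ("move", dest, tup)
    return ("in_place", local_node, tup)
```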

24
Concurrency control
  • 2PL (two-phase locking)
  • Granularity: file and page
  • Lock modes: S, X, IS, IX, SIX (compatibility matrix below)
  • local wait-for graphs
  • centralized multi-site deadlock detector
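For reference, the standard compatibility matrix for these five hierarchical lock modes; this is textbook multi-granularity 2PL, not anything specific to Gamma:

```python
# True means a newly requested mode is compatible with a held mode.
COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested: str, held_modes) -> bool:
    # grant only if the request is compatible with every held lock
    return all(COMPATIBLE[requested][h] for h in held_modes)
```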

25
Logging and recovery
  • Log sequence numbers (LSNs)
  • Processor i directs log records to log manager i mod M, where M is the
    number of log managers (sketch below)
  • standard WAL protocol
  • ARIES
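The routing rule itself, as a one-line sketch (names assumed):

```python
def log_manager_for(processor_id: int, num_log_managers: int) -> int:
    # query processor i sends its log records to log manager i mod M
    return processor_id % num_log_managers
```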

26
Node failure
  • availability in spite of processor or disk failure
  • mirrored disks (Tandem)
  • interleaved declustering (Teradata)
  • chained declustering (discuss)
  • load redirection results in only a 1/n load increase per surviving node
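A small arithmetic sketch of the chained-declustering claim (names are illustrative): when one of n + 1 nodes fails, reads are shifted along the chain so each of the n survivors absorbs an equal 1/n share of the failed node's work:

```python
def surviving_node_load(n_survivors: int, base_load: float = 1.0) -> float:
    # the failed node's load is spread evenly over the n survivors,
    # so each one carries base_load * (1 + 1/n)
    return base_load * (1 + 1 / n_survivors)

print(surviving_node_load(7))   # 8-node cluster, one failure -> ~1.14x load
```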

27
Node failure - declustering
28
Node failure - load redirection
29
Performance experiments
  • Selection
    • relation size
    • speedup
    • scaleup
  • Join (similar)
  • Aggregate
  • Update (no recovery)

30
Discussion: The Future of High Performance Database Systems
  • David DeWitt, Jim Gray.

31
Current day benchmarks
  • TPC-C: Online Transaction Processing
  • TPC-D: Decision Support
  • IBM DB2 v5.2: TPC-D, 1000 GB, NT

32
CS632: Parallel Database Systems