Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

1
Dryad: Distributed Data-Parallel Programs from
Sequential Building Blocks
  • Michael Isard, Mihai Budiu, Yuan Yu,
  • Andrew Birrell, Dennis Fetterly
  • Microsoft Research, Silicon Valley

EuroSys 2007
2
Overview
  • Mechanism to express parallel computation
    • Large-scale internet services
    • Chip multiprocessors
  • Drawing on previous work
    • Condor
    • GPU shader languages
    • MapReduce
    • Parallel databases

3
Main Idea
  • Represent the computation as a DAG of
    communicating sequential processes
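
The main idea can be sketched in a few lines: each vertex is an ordinary sequential function, and the edges of the DAG decide whose output feeds whose input. This is only an illustrative, single-process sketch; `run_dag` and the vertex names are hypothetical and are not part of Dryad.

```python
from graphlib import TopologicalSorter


def run_dag(vertices, edges, inputs):
    """Run every vertex once, in dependency order (hypothetical helper).

    vertices: name -> sequential function taking a list of input values
    edges:    list of (producer, consumer) pairs that form a DAG
    inputs:   name -> list of external input values for source vertices
    """
    preds = {name: [] for name in vertices}
    for src, dst in edges:
        preds[dst].append(src)

    results = {}
    # static_order() yields a vertex only after all of its predecessors.
    for name in TopologicalSorter(preds).static_order():
        args = list(inputs.get(name, [])) + [results[p] for p in preds[name]]
        results[name] = vertices[name](args)
    return results


# Two readers feed a merge vertex, which feeds a final sum.
vertices = {
    "read_a": lambda args: args[0],
    "read_b": lambda args: args[0],
    "merge": lambda args: sorted(args[0] + args[1]),
    "total": lambda args: sum(args[0]),
}
edges = [("read_a", "merge"), ("read_b", "merge"), ("merge", "total")]
out = run_dag(vertices, edges, {"read_a": [[3, 1]], "read_b": [[2]]})
```

In a real deployment the vertices would run as separate processes on different machines; here they run one after another in a single process just to show the dataflow.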

4
Claims
  • The Dryad execution engine
    • Schedules across resources
    • Optimizes the level of concurrency in a node
    • Manages failure resilience
    • Delivers data where needed
  • Performance is good and scales
  • The programming abstraction is at the right level
    • API mastery only takes a couple of weeks
    • Higher-level abstractions have been built on Dryad

5
Dryad System
  • Name server enumerates all resources
    • Including location relative to other resources
  • Daemon running on each machine for vertex dispatch
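
As a rough illustration of why the name server reports location, the sketch below keeps a registry of machines and their racks and picks the candidate closest to where the data lives. The `NameServer` class, its methods, and the three-level distance (same machine, same rack, elsewhere) are assumptions made for this example, not Dryad's actual interface.

```python
class NameServer:
    """Hypothetical registry of machines and their rack locations."""

    def __init__(self):
        self._rack_of = {}

    def register(self, machine, rack):
        self._rack_of[machine] = rack

    def machines(self):
        """Enumerate all known machines."""
        return sorted(self._rack_of)

    def distance(self, a, b):
        """0 = same machine, 1 = same rack, 2 = different racks."""
        if a == b:
            return 0
        return 1 if self._rack_of[a] == self._rack_of[b] else 2

    def closest(self, data_machine, candidates):
        """Pick the candidate nearest to the data; ties break by name."""
        return min(candidates, key=lambda m: (self.distance(m, data_machine), m))


ns = NameServer()
ns.register("m1", "rackA")
ns.register("m2", "rackA")
ns.register("m3", "rackB")
chosen = ns.closest("m1", ["m2", "m3"])  # m2 shares a rack with m1
```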

6
Communication
7
Constructing the Job
  • Use graph operators implemented in C++ to
    describe the graph.
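
To give a feel for describing a graph with operators, here is a small Python sketch of the idea using operator overloading. It is not Dryad's API: the `Graph` class, the choice of `*` for replication and `>>` for all-to-all connection, and the `pointwise` method are all assumptions made for illustration.

```python
class Graph:
    """A DAG fragment with designated input and output vertices."""

    def __init__(self, vertices, edges=(), inputs=None, outputs=None):
        self.vertices = list(vertices)
        self.edges = list(edges)
        self.inputs = list(vertices) if inputs is None else list(inputs)
        self.outputs = list(vertices) if outputs is None else list(outputs)

    def __mul__(self, n):
        """Replication: n numbered copies of a single-vertex graph."""
        if len(self.vertices) != 1:
            raise ValueError("this sketch only replicates single vertices")
        name = self.vertices[0]
        return Graph([f"{name}{i}" for i in range(n)])

    def _join(self, other, new_edges):
        return Graph(
            self.vertices + other.vertices,
            self.edges + other.edges + new_edges,
            inputs=self.inputs,
            outputs=other.outputs,
        )

    def pointwise(self, other):
        """Connect the i-th output to the i-th input, round-robin if uneven."""
        n = max(len(self.outputs), len(other.inputs))
        new_edges = [
            (self.outputs[i % len(self.outputs)], other.inputs[i % len(other.inputs)])
            for i in range(n)
        ]
        return self._join(other, new_edges)

    def __rshift__(self, other):
        """Connect every output of self to every input of other."""
        new_edges = [(o, i) for o in self.outputs for i in other.inputs]
        return self._join(other, new_edges)


readers = Graph(["read"]) * 3
sorters = Graph(["sort"]) * 3
merger = Graph(["merge"])
job = readers.pointwise(sorters) >> merger
```

Here `job` has seven vertices and six edges: three reader-to-sorter edges and three sorter-to-merger edges. Composing small fragments this way keeps the job description short even when the graph has thousands of vertices.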

8
Database Query Example
9
Execution
  • Job manager not currently fault tolerant
  • Vertices may be scheduled multiple times
  • Each execution versioned
  • Execution record kept, including versions of
    incoming vertices
  • Outputs are uniquely named (versioned)
  • Final outputs selected if job completes
  • Non-file communication may cascade failures
  • Vertices specify hard constraints or preferences
    for placement
  • Scheduling is greedy and assumes only one job is running
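
The versioning bullets above can be illustrated with a minimal sketch: every attempt at a vertex gets its own version number and its own uniquely named output, and the final output is taken from an attempt that succeeded. The function name, record fields, and output naming scheme are hypothetical, chosen only for this example.

```python
def run_vertex(name, work, input_versions, max_attempts=3):
    """Run `work(version)` until it succeeds; return all execution records.

    input_versions records which version of each incoming vertex was read,
    so a later re-execution can tell which inputs an attempt depended on.
    """
    records = []
    for version in range(1, max_attempts + 1):
        record = {
            "version": version,
            "inputs": dict(input_versions),
            "output": f"{name}.v{version}.out",  # unique per attempt
        }
        try:
            record["value"] = work(version)
            record["status"] = "ok"
        except Exception as exc:  # simulated vertex or machine failure
            record["status"] = "failed"
            record["error"] = str(exc)
        records.append(record)
        if record["status"] == "ok":
            return records
    raise RuntimeError(f"vertex {name} failed {max_attempts} times")


def flaky(version):
    # Fails on the first attempt only, to exercise re-execution.
    if version == 1:
        raise IOError("simulated machine failure")
    return 42


records = run_vertex("sort3", flaky, {"read3": 1})
final = next(r for r in records if r["status"] == "ok")
```

Because each attempt writes to a differently named output, a failed or duplicate execution never overwrites the result that is finally selected.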

10
Run-time Graph Refinement
11
Results I
  • SQL Query
  • 10 Machines
    • 2 dual-core 2 GHz processors
    • 8 GB memory
    • 1 Gb Ethernet
    • 4 x 400 GB disks
    • Windows Server 2003

12
Results II
  • Map then reduce style
  • Builds histogram of MSN Search query frequency
  • 1800 Machines
  • 10.2 TB source data
  • 11072 Vertices
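
A "map then reduce" histogram of query frequency can be sketched as follows: each input partition is counted locally, the partial counts are bucketed by a hash of the query, and each reducer sums the partials for its bucket. This is a toy, in-memory illustration; the function names and the use of CRC32 for partitioning are assumptions of this sketch, not details taken from the experiment.

```python
import zlib
from collections import Counter


def map_partition(queries, num_reducers):
    """Count queries locally, then bucket the partial counts by reducer."""
    buckets = [Counter() for _ in range(num_reducers)]
    for query, n in Counter(queries).items():
        # crc32 is deterministic across runs, unlike Python's str hash.
        r = zlib.crc32(query.encode("utf-8")) % num_reducers
        buckets[r][query] += n
    return buckets


def reduce_bucket(partials):
    """Sum the partial counts that were routed to one reducer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def query_histogram(partitions, num_reducers=2):
    mapped = [map_partition(p, num_reducers) for p in partitions]
    reduced = [
        reduce_bucket(m[r] for m in mapped) for r in range(num_reducers)
    ]
    histogram = Counter()
    for part in reduced:
        histogram.update(part)
    return histogram


partitions = [["dryad", "weather", "dryad"], ["weather", "dryad"]]
hist = query_histogram(partitions)
```

Counting locally before the shuffle keeps the data sent to each reducer small, which matters when the source data is far larger than the final histogram.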

13
Refinement
14
Dryad