1
Distributed ComputationsMapReduce/Dryad
  • M/R slides adapted from those of Jeff Dean
  • Dryad slides adapted from those of Michael Isard

2
What we've learnt so far
  • Basic distributed systems concepts
  • Consistency (sequential, eventual)
  • Concurrency
  • Fault tolerance (recoverability, availability)
  • What are distributed systems good for?
  • Better fault tolerance
  • Better security?
  • Increased storage/serving capacity
  • Storage systems, email clusters
  • Parallel (distributed) computation (today's topic)

3
Why distributed computations?
  • How long to sort 1 TB on one computer?
  • One computer can read about 60 MB/s from disk
  • Sorting 1 TB takes about a day!
  • Google indexes 100 billion web pages
  • 100 × 10^9 pages × 20 KB/page ≈ 2 PB
  • The Large Hadron Collider is expected to produce 15 PB every year!

4
Solution: use many nodes!
  • Cluster computing
  • Hundreds or thousands of PCs connected by high
    speed LANs
  • Grid computing
  • Hundreds of supercomputers connected by high
    speed net
  • 1000 nodes potentially give 1000X speedup

5
Distributed computations are difficult to program
  • Sending data to/from nodes
  • Coordinating among nodes
  • Recovering from node failure
  • Optimizing for locality
  • Debugging

6
MapReduce
  • A programming model for large-scale computations
  • Process large amounts of input, produce output
  • No side-effects or persistent state (unlike file
    system)
  • MapReduce is implemented as a runtime library
  • automatic parallelization
  • load balancing
  • locality optimization
  • handling of machine failures

7
MapReduce design
  • Input data is partitioned into M splits
  • Map: extract information from each split
  • Each Map produces R partitions
  • Shuffle and sort
  • Bring M partitions (one from each map task) to the same reducer
  • Reduce: aggregate, summarize, filter or transform
  • Output is in R result files

8
More specifically
  • Programmer specifies two methods
  • map(k, v) → <k', v'>
  • reduce(k', <v'>) → <k', v'>
  • All v' with the same k' are reduced together, in order.
  • Usually also specify:
  • partition(k, total partitions) → partition for k
  • often a simple hash of the key
  • allows reduce operations for different k to be parallelized

9
Example: count word frequencies in web pages
  • Input is files with one doc per record
  • Map parses documents into words
  • key = document URL
  • value = document contents
  • Output of map:

"doc1", "to be or not to be"  →  ("to",1), ("be",1), ("or",1), ("not",1), ("to",1), ("be",1)
10
Example: word frequencies (continued)
  • Reduce computes the sum for a key
  • Output of reduce is saved

key = "be", values = (1, 1)  →  ("be", 2)
Final output: ("be", 2), ("not", 1), ("or", 1), ("to", 2)
11
Example: pseudo-code
  • Map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1")
  • Reduce(String key, Iterator intermediate_values):
      // key: a word, same for input and output
      // intermediate_values: a list of counts
      int result = 0
      for each v in intermediate_values:
        result += ParseInt(v)
      Emit(AsString(result))
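The pseudo-code above translates directly into ordinary code. Below is a minimal, single-process Python sketch of the same word count (the names map_fn, reduce_fn and run_word_count are ours, not part of any MapReduce API); it illustrates only the map → shuffle → reduce data flow, not the distributed runtime.

  from collections import defaultdict

  def map_fn(doc_name, doc_contents):
      # Emit (word, 1) for every word in the document.
      for word in doc_contents.split():
          yield word, 1

  def reduce_fn(key, values):
      # Sum all the counts emitted for this word.
      return key, sum(values)

  def run_word_count(documents):
      # "Shuffle": group intermediate pairs by key.
      groups = defaultdict(list)
      for name, contents in documents.items():
          for k, v in map_fn(name, contents):
              groups[k].append(v)
      # Reduce each group of values.
      return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

  print(run_word_count({"doc1": "to be or not to be"}))
  # -> {'be': 2, 'not': 1, 'or': 1, 'to': 2}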

12
MapReduce is widely applicable
  • Distributed grep
  • Document clustering
  • Web link graph reversal
  • Detecting approx. duplicate web pages

13
MapReduce implementation
  • Input data is partitioned into M splits
  • Map: extract information from each split
  • Each Map produces R partitions
  • Shuffle and sort
  • Bring M partitions (one from each map task) to the same reducer
  • Reduce: aggregate, summarize, filter or transform
  • Output is in R result files

14
MapReduce scheduling
  • One master, many workers
  • Input data split into M map tasks (e.g. 64 MB)
  • R reduce tasks
  • Tasks are assigned to workers dynamically
  • Often M = 200,000, R = 4,000, workers = 2,000

15
MapReduce scheduling
  • Master assigns a map task to a free worker
  • Prefers close-by workers when assigning tasks
  • Worker reads task input (often from local disk!)
  • Worker produces R local files containing intermediate k/v pairs
  • Master assigns a reduce task to a free worker
  • Worker reads intermediate k/v pairs from the map workers
  • Worker sorts them and applies the user's Reduce op to produce the output (a toy sketch of this flow follows)
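As a concrete picture of the shuffle described above, here is a toy single-process Python sketch (all names are ours): each simulated map worker buckets its intermediate pairs into R local partitions with a hash partition function, and each simulated reduce task r pulls partition r from every map output, merges and sorts it, and applies the reduce operation.

  from collections import defaultdict

  R = 2  # number of reduce tasks

  def map_task(doc, R):
      # One map task: emit (word, 1) pairs and bucket them into R local partitions.
      local = [defaultdict(list) for _ in range(R)]
      for word in doc.split():
          local[hash(word) % R][word].append(1)
      return local  # stands in for the R local files of this worker

  map_outputs = [map_task(d, R) for d in ("to be or not to be", "to do or not to do")]

  for r in range(R):
      # Reduce task r: fetch partition r from every map worker, merge, sort, reduce.
      merged = defaultdict(list)
      for out in map_outputs:
          for k, vs in out[r].items():
              merged[k].extend(vs)
      for k in sorted(merged):
          print("reducer", r, ":", k, sum(merged[k]))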

16
Parallel MapReduce
(diagram: input data is split across parallel Map workers; the Master coordinates task assignment; the shuffled intermediate data is reduced into the partitioned output)
17
WordCount Internals
  • Input data is split into M map jobs
  • Each map job generates R local partitions

"doc1", "to be or not to be"  →  ("to",1), ("be",1), ("or",1), ("not",1), ("to",1), ("be",1)
18
WordCount Internals
  • Shuffle brings same partitions to same reducer

(diagram: each map task produces R local partitions, e.g. {("be",1), ("to",1,1)} and {("not",1), ("or",1)}; the shuffle routes the matching partition from every map task to the same reducer)
19
WordCount Internals
  • Reduce aggregates sorted key/value pairs

(diagram: the reducers receive sorted groups such as ("be",1,1), ("do",1), ("not",1,1), ("or",1), ("to",1,1) and aggregate each group)
20
The importance of the partition function
  • partition(k, total partitions) → partition for k
  • e.g. hash(k) mod R
  • What is the partition function for sort? (see the sketch below)
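For sort, a hash partition would scatter adjacent keys across reducers; the partition function instead has to assign contiguous key ranges so that reducer i holds only keys smaller than those of reducer i+1, and concatenating the R sorted outputs gives a globally sorted result. A minimal sketch, assuming the R-1 split points have already been chosen (in practice by sampling the input, much like the data-dependent re-partitioning shown later for Dryad):

  import bisect

  def make_range_partitioner(split_points):
      # split_points: sorted list of R-1 keys separating the R ranges.
      def partition(key, num_partitions):
          assert len(split_points) == num_partitions - 1
          return bisect.bisect_right(split_points, key)
      return partition

  part = make_range_partitioner(["g", "p"])  # 3 ranges: < "g", ["g","p"), >= "p"
  print(part("apple", 3), part("kiwi", 3), part("zebra", 3))  # 0 1 2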

21
Load Balance and Pipelining
  • Fine-granularity tasks: many more map tasks than machines
  • Minimizes time for fault recovery
  • Can pipeline shuffling with map execution
  • Better dynamic load balancing
  • Often use 200,000 map/5000 reduce tasks w/ 2000
    machines

22
Fault tolerance via re-execution
  • On worker failure:
  • Re-execute completed and in-progress map tasks (see the sketch below)
  • Re-execute in-progress reduce tasks
  • Task completion is committed through the master
  • On master failure:
  • State is checkpointed to GFS; a new master recovers and continues
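A sketch of the bookkeeping this policy implies (the data structure is our own, not from the MapReduce paper): when a worker dies, its in-progress tasks and its completed map tasks go back to the idle pool, because map output sits on the failed machine's local disk, while a completed reduce task's output already lives in the global file system and stays done.

  def handle_worker_failure(tasks, failed_worker):
      # tasks: dict task_id -> {"type": "map" or "reduce", "state": ..., "worker": ...}
      for t in tasks.values():
          if t["worker"] != failed_worker:
              continue
          if t["type"] == "map" or t["state"] == "in_progress":
              # Lost local map output or lost in-progress work: reschedule.
              t["state"], t["worker"] = "idle", None
          # A completed reduce task's output is in GFS, so it stays completed.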

23
Avoiding stragglers using backup tasks
  • Slow workers significantly lengthen completion time
  • Other jobs consuming resources on the machine
  • Bad disks with soft errors transfer data very slowly
  • Weird things: processor caches disabled (!!)
  • An unusually large reduce partition?
  • Solution: near the end of a phase, spawn backup copies of tasks (see the sketch below)
  • Whichever copy finishes first "wins"
  • Effect: dramatically shortens job completion time
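A minimal sketch of the backup-task idea (it says nothing about Google's actual scheduler): near the end of a phase, submit a second copy of each still-running task and take whichever copy finishes first.

  import concurrent.futures as cf
  import random, time

  def task(task_id, copy):
      # Simulate a task whose speed depends on the machine it landed on.
      time.sleep(random.uniform(0.1, 1.0))
      return f"task {task_id} finished by its {copy} copy"

  with cf.ThreadPoolExecutor(max_workers=4) as pool:
      primary = pool.submit(task, 7, "primary")
      backup = pool.submit(task, 7, "backup")   # speculative duplicate
      done, _ = cf.wait([primary, backup], return_when=cf.FIRST_COMPLETED)
      print(next(iter(done)).result())          # whichever copy "wins"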

24
MapReduce Sort Performance
  • 1 TB of data (100-byte records) to be sorted
  • 1,700 machines
  • M = 15,000, R = 4,000

25
MapReduce Sort Performance
When can shuffle start?
When can reduce start?
26
Dryad
  • Slides adapted from those of Yuan Yu and Michael
    Isard

27
Dryad
  • Similar goals to MapReduce
  • focus on throughput, not latency
  • Automatic management of scheduling, distribution, fault tolerance
  • Computations are expressed as a graph (see the sketch after this list)
  • Vertices are computations
  • Edges are communication channels
  • Each vertex can have several input and output edges
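A toy Python sketch of this graph model (all names are ours, not Dryad's API): vertices wrap arbitrary code, edges are channels, and a vertex may consume several inputs. A real runtime would execute the vertices on many machines and stream data over files or TCP pipes instead of calling functions in one process.

  class Vertex:
      def __init__(self, name, fn):
          self.name, self.fn = name, fn
          self.inputs = []  # upstream vertices (incoming channels)

  def run(outputs):
      # Execute the DAG by pulling each vertex's inputs before running it.
      cache = {}
      def value(v):
          if v not in cache:
              cache[v] = v.fn(*[value(u) for u in v.inputs])
          return cache[v]
      return [value(v) for v in outputs]

  # Tiny example: two "parse" vertices feeding one "count" vertex.
  p1 = Vertex("parse1", lambda: "to be or not".split())
  p2 = Vertex("parse2", lambda: "to be".split())
  count = Vertex("count", lambda a, b: len(a) + len(b))
  count.inputs = [p1, p2]
  print(run([count]))  # [6]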

28
WordCount in Dryad
(diagram: each input partition is processed by a Count Word(n) vertex, the counts are hash-distributed by word (Distribute Word(n)), merge-sorted (MergeSort Word(n)), and summed by a final Count Word(n) vertex)
29
Why use a dataflow graph?
  • Many programs can be represented as a distributed
    dataflow graph
  • The programmer may not have to know this
  • SQL-like queries: LINQ
  • Dryad will run them for you

30
Runtime
  • Vertices (V) run arbitrary app code
  • Vertices exchange data through files, TCP pipes, etc.
  • Vertices communicate with the JM to report status
  • Daemon process (D) executes vertices
  • Job Manager (JM) consults the name server (NS) to discover available machines
  • JM maintains the job graph and schedules vertices

31
Job = directed acyclic graph

(diagram: inputs at the bottom feed processing vertices connected by channels (file, pipe, shared memory), producing the outputs at the top)
32
Scheduling at JM
  • General scheduling rules:
  • A vertex can run anywhere once all its inputs are ready
  • Prefer executing a vertex near its inputs
  • Fault tolerance:
  • If A fails, run it again
  • If A's inputs are gone, run the upstream vertices again (recursively; see the sketch below)
  • If A is slow, run another copy elsewhere and use the output from whichever finishes first
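A sketch of that re-execution rule, assuming the graph is held as a dict mapping each vertex to its upstream vertices and we can test whether a vertex's output is still available:

  def rerun(vertex, upstream, output_available, execute):
      # If the vertex's inputs are gone, recursively re-run their producers first.
      for u in upstream[vertex]:
          if not output_available(u):
              rerun(u, upstream, output_available, execute)
      execute(vertex)  # all inputs are now ready

  # Hypothetical chain C -> B -> A where B's output was lost with a failed machine:
  upstream = {"A": ["B"], "B": ["C"], "C": []}
  lost = {"B"}
  rerun("A", upstream, lambda v: v not in lost, lambda v: print("running", v))
  # -> running B, then running A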

33
Advantages of DAG over MapReduce
  • Big jobs are more efficient with Dryad
  • MapReduce: a big job runs as >1 MR stages
  • the reducers of each stage write to replicated storage
  • output of a reduce: 2 network copies, 3 disks
  • Dryad: each job is represented as a single DAG
  • intermediate vertices write to local files

34
Advantages of DAG over MapReduce
  • Dryad provides explicit join
  • MapReduce: a mapper (or reducer) needs to read from shared table(s) as a substitute for a join
  • Dryad: an explicit join combines inputs of different types
  • Dryad: Split produces outputs of different types
  • e.g. parse a document, output text and references

35
DAG optimizations: merge tree
36
DAG optimizations: merge tree
Dryad optimizations: data-dependent re-partitioning

(diagram: randomly partitioned inputs are sampled to estimate a key histogram, and the data is then distributed into equal-sized ranges)
38
Dryad example 1: SkyServer query
  • 3-way join to find the gravitational lens effect
  • Table U (objID, color): 11.8 GB
  • Table N (objID, neighborID): 41.8 GB
  • Find neighboring stars with similar colors:
  • Join U and N to find T = (N.neighborID, U.color) where U.objID = N.objID
  • Join U and T to find U.objID where U.objID = T.neighborID and U.color ≈ T.color (see the toy example below)
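To make the query concrete, here is what the two joins compute, written as plain Python over toy in-memory tables (the color-similarity threshold d is our own placeholder; the real query streams tens of GB through a Dryad graph rather than fitting in memory):

  U = [(1, 0.30), (2, 0.31), (3, 0.90)]   # (objID, color)
  N = [(1, 2), (1, 3), (2, 1)]            # (objID, neighborID)
  d = 0.05                                # similarity threshold (assumed)

  color = dict(U)
  # Join U and N on objID: T carries each neighborID plus the object's color.
  T = [(nbr, color[obj]) for obj, nbr in N if obj in color]
  # Join U and T on U.objID = T.neighborID, keeping pairs with similar colors.
  result = [nbr for nbr, c in T if nbr in color and abs(color[nbr] - c) < d]
  print(result)  # [2, 1]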

39
SkyServer query
40
(figure-only slide)
41
Dryad example 2: query histogram computation
  • Input: a log file (n partitions)
  • Extract queries from the log partitions
  • Re-partition by hash of the query (k buckets)
  • Compute a histogram within each bucket (see the sketch below)
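A single-process Python sketch of those three steps (the log format, query-extraction logic, and bucket count are assumptions made for illustration):

  from collections import Counter

  log_partitions = [
      "GET /search?q=dryad\nGET /search?q=mapreduce",
      "GET /search?q=dryad\nGET /search?q=sort",
  ]                                        # n input partitions
  k = 2                                    # number of buckets

  # 1. Extract queries from each log partition.
  queries = [line.split("q=")[1] for part in log_partitions
             for line in part.splitlines() if "q=" in line]
  # 2. Re-partition by hash of the query into k buckets.
  buckets = [[] for _ in range(k)]
  for q in queries:
      buckets[hash(q) % k].append(q)
  # 3. Compute a histogram within each bucket.
  for b in buckets:
      print(Counter(b))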

42
Naïve histogram topology
(P = parse lines, D = hash distribute, S = quicksort, C = count occurrences, MS = merge sort)
43
Efficient histogram topology
(P = parse lines, D = hash distribute, S = quicksort, C = count occurrences, MS = merge sort, M = non-deterministic merge)

(diagram: each Q', T, and R vertex is itself a pipeline of the operators above, replicated n and k ways across the stages)
44
(diagram, animated across slides 44-49: the refined job graph in execution. Each R vertex runs MS then C; each T vertex runs MS, C, then D; each Q vertex runs M, P, S, then C.)

(P = parse lines, D = hash distribute, S = quicksort, MS = merge sort, C = count occurrences, M = non-deterministic merge)
50
Final histogram refinement
(1,800 computers; 43,171 vertices; 11,072 processes; 11.5 minutes)