1 MapReduce
- Source: MapReduce: Simplified Data Processing on Large Clusters - Jeffrey Dean and Sanjay Ghemawat, Google Inc.
- (Wim Bohm, cs.colostate.edu)
Except as otherwise noted, the content of this
presentation is licensed under the Creative
Commons Attribution 2.5 license.
2 MapReduce Concept
- Simple, implicitly parallel (//) programming model
- Based on Lisp's Map and Reduce higher order functions
- Lisp: Map(fM,L) = cons(fM(first(L)), Map(fM, rest(L)))
- Lisp: Reduce(fR,L) = fR(first(L), Reduce(fR, rest(L)))
- Lisp: MapReduce(fM,fR,L) = Reduce(fR, Map(fM,L))
- Lisp = Lots of Irritating Superfluous Parentheses
- (base cases left out)
- Very savvy implementation
- High throughput, high performance, rack aware
- Functional: the runtime system (RTS) takes care of fault tolerance (FT), restart, distribution (//ism)
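- As a point of reference, Reduce(fR, Map(fM, L)) can be written with Java streams; a minimal sketch (illustrative only, not part of the paper or of Hadoop):

    import java.util.List;

    public class LispStyleMapReduce {
        public static void main(String[] args) {
            List<Integer> l = List.of(1, 2, 3, 4);
            // MapReduce(fM, fR, L) = Reduce(fR, Map(fM, L)):
            // fM squares each element, fR sums the mapped list.
            int result = l.stream()
                          .map(x -> x * x)           // Map(fM, L)
                          .reduce(0, Integer::sum);  // Reduce(fR, ...)
            System.out.println(result);              // prints 30
        }
    }

- Because fM is applied to each element independently and fR is associative, the same pipeline runs in parallel with l.parallelStream(); this is exactly the //ism MapReduce exploits.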
3 Introduction
- Data center apps are a special type of // program, processing large amounts of data on large clusters.
- Complexity: much of this complexity is NOT in the actual computation, but in the data distribution, replication, access, FT, restart, etc. These issues arise for ALL data center apps.
- This has given rise to the MapReduce abstraction and its implementation.
4 Map and Reduce
- Map: take a set of (key,value) pairs and generate a set of intermediate (key,value) pairs by applying some function f to all these pairs
- Reduce: merge all pairs with the same key, applying a reduction function R to the values
- f and R are user defined
- All implemented in a non-functional language such as Java, C, Python
5 Wordcount

    Map(String key, String value):
      // key: doc name, value: doc contents
      for each word w in value:
        EmitIntermediate(w, "1")

    Reduce(String key, Iterator values):
      // key: word, values: list of counts
      int sum = 0
      for each v in values:
        sum += ParseInt(v)
      Emit(AsString(sum))
6 Types
- Map: (keytype1, valuetype1) -> list((keytype2, valuetype2))
- Reduce: (keytype2, list(valuetype2)) -> list(valuetype2)
- Types 1 and 2 passed between the user functions can be any valid (e.g. Java) type
- Communication goes through files; the types are e.g. LongWritable (see examples)
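- For instance, the word count from slide 5 with these types made explicit; a minimal sketch against the classic org.apache.hadoop.mapred API (keytype1 = LongWritable byte offset, valuetype1 = Text line; keytype2 = Text word, valuetype2 = IntWritable count):

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Map: (LongWritable, Text) -> list((Text, IntWritable))
    class WordCountMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            for (String w : line.toString().split("\\s+")) {
                if (w.isEmpty()) continue;
                word.set(w);
                out.collect(word, ONE);   // EmitIntermediate(w, 1)
            }
        }
    }

    // Reduce: (Text, list(IntWritable)) -> list(IntWritable)
    class WordCountReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text word, Iterator<IntWritable> counts,
                           OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (counts.hasNext()) sum += counts.next().get();
            out.collect(word, new IntWritable(sum));   // Emit(sum)
        }
    }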
7 Example: Pi-Estimator
- Idea: generate random points in a square
- Count how many are inside the circle and how many are in the square (producing area estimates)
- Square area As = (2r)^2 = 4r^2 -> r^2 = As/4
- Circle area Ac = pi*r^2 -> pi = Ac/r^2
- -> pi = 4*Ac/As
- Example of a Monte Carlo method: simulating a physical phenomenon using many random samples
8 Worker / Multi-threading view

    Master:
      get input params (nWorkers, nPoints)
      for (i = 0; i < nWorkers; i++) thrCreate(i, nPoints)
      for (i = 0; i < nWorkers; i++) join
      As = 0; Ac = 0
      for (i = 0; i < nWorkers; i++) { As += nPoints; Ac += cPoints[i] }
      piEst = 4*Ac/As

    Slave i:
      cPoints[i] = 0
      for (j = 0; j < nPoints; j++)
        create 2 random pts x,y in (-.5 .. .5)
        if (sqrt(x*x + y*y) < .5) cPoints[i]++
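- A runnable Java rendering of this master/slave sketch (nWorkers and nPoints are hard-coded here purely for illustration):

    import java.util.concurrent.ThreadLocalRandom;

    public class PiEstimatorThreads {
        public static void main(String[] args) throws InterruptedException {
            final int nWorkers = 4, nPoints = 1_000_000;   // input params
            final long[] cPoints = new long[nWorkers];     // one slot per slave

            Thread[] workers = new Thread[nWorkers];
            for (int i = 0; i < nWorkers; i++) {           // thrCreate(i, nPoints)
                final int id = i;
                workers[i] = new Thread(() -> {
                    long hits = 0;
                    ThreadLocalRandom rnd = ThreadLocalRandom.current();
                    for (int j = 0; j < nPoints; j++) {
                        double x = rnd.nextDouble() - 0.5; // point in (-.5 .. .5)
                        double y = rnd.nextDouble() - 0.5;
                        if (x * x + y * y < 0.25) hits++;  // same test as sqrt(..) < .5
                    }
                    // each slave writes only its own slot; join() below makes
                    // the write visible to the master, so no mutex is needed
                    cPoints[id] = hits;
                });
                workers[i].start();
            }
            long As = 0, Ac = 0;
            for (int i = 0; i < nWorkers; i++) {           // join, then combine
                workers[i].join();
                As += nPoints;
                Ac += cPoints[i];
            }
            System.out.println("pi estimate: " + 4.0 * Ac / As); // pi = 4*Ac/As
        }
    }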
9 Multithreading vs Lisp functional
- The multithreading view assumes:
- We can spawn threads and join them back
- We have shared memory
- If there are read/write hazards, we use explicit mutex locks
- Therefore we have parallelism
- The Lisp functional view uses map/reduce over lists:
- The list of worker numbers is MAPped to a list of cPoints
- The list of cPoints is REDUCEd to sumCPoints
- sumCPoints is used to estimate pi
- The lists make this inherently SEQUENTIAL
10 We want MapReduce to be parallel!
- Just like in multithreading, we need some kind of spawn(id, func, data) construct
- In Lisp the spawn is taken care of by the higher order function mechanism: reduce(rFun, map(mFun, inList))
- In MapReduce we use method override to define our specific versions of map and reduce, and we have a Driver that creates a Job Configuration to provide parallelism.
11 We need to communicate results
- Somehow the map processes need input (key1,val1) pairs and need to produce intermediate (key2,val2) pairs that the reduce processes can pick up.
- But we are in a distributed environment:
- What provides a shared name space?
- The file system!
- A functional (write-once) HDFS allows for parallelism
12 What about parallel writes?
- HDFS: no parallel writes
- GFS: parallel append-type writes
- MapReduce: parallel processes doing potentially parallel writes; writes are guaranteed to be atomic operations. If process 1 writes aaaaa and process 2 writes bbbbb, we get aaaaabbbbb or bbbbbaaaaa, never something like ababababab.
- The data written by one process occurs in the order written by that process
13 Parallel writes vs multithreading
- Parallel writes are like multiple threads appending to a mutex-lock protected list.
- The list is just a collection of unordered records.
- The reducer has to be aware of this:
- Either it can impose an order,
- Or it can make sure the reduction function is associative and commutative.
- Take // grep: if you want outcomes sorted by line, make the line position part of the key, and sort (see the sketch below).
14 MapReduce for PiEstimator
- MapReduce is integrated into Eclipse
- We need the MapReduce plugins to create a MapReduce Eclipse perspective.
- MapReduce projects contain three classes:
- 1. A Driver (like the master in the multithreading case), creating a configuration, defining mappers and reducers, starting the app, and dealing with the final result gathering (a sketch follows this list).
- 2. A Mapper (inherited class implementing the mapper interface), getting data from files in a directory specified by the driver.
- 3. A Reducer (inherited class implementing the reducer interface), getting data from files in a directory specified by the driver, produced by the mappers.
15 Pi
- Two versions:
- 1. mypi: nMaps, nSamples
- Each of the nMaps maps does nSamples samples
- More maps, more work, hopefully a better result.
- 2. mypi2: nMaps, nSamples, nReps
- Each of the nMaps maps does nSamples/nMaps samples, so the total amount of work is always the same. Done nReps times for the speedup experiment.
- (A sketch of such a mapper and reducer follows.)
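- What the mypi mapper and reducer might look like (hypothetical names and configuration key; same mapred API as above, and the real course code may differ). Each map task draws its samples and emits one (key, hits) pair; a single reducer sums the hits Ac, and the driver then forms pi = 4*Ac/As with As = nMaps*nSamples:

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.Random;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    class PiMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {
        private long nSamples;

        public void configure(JobConf job) {
            // "mypi.nSamples" is a hypothetical key the driver would set
            nSamples = job.getLong("mypi.nSamples", 1_000_000);
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, LongWritable> out, Reporter reporter)
                throws IOException {
            long hits = 0;
            Random rnd = new Random();
            for (long j = 0; j < nSamples; j++) {
                double x = rnd.nextDouble() - 0.5, y = rnd.nextDouble() - 0.5;
                if (x * x + y * y < 0.25) hits++;    // inside the circle
            }
            out.collect(new Text("inside"), new LongWritable(hits));
        }
    }

    class PiReducer extends MapReduceBase
            implements Reducer<Text, LongWritable, Text, LongWritable> {
        public void reduce(Text key, Iterator<LongWritable> values,
                           OutputCollector<Text, LongWritable> out, Reporter reporter)
                throws IOException {
            long Ac = 0;
            while (values.hasNext()) Ac += values.next().get();  // total hits
            out.collect(key, new LongWritable(Ac));  // driver reads Ac, forms 4*Ac/As
        }
    }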
16 Mypi2 on Laptop
[Chart] Multiple sequential mappers do not bring the performance down.
17 Mypi2 on Hadoop cluster
[Chart] Twenty parallel mappers give a five-fold speedup; twelve seems better.
18 Other example: grep
- Input DIRECTORY to output DIRECTORY
- Whole app written in one class
- Not 3 (driver, mapper, reducer)
- Uses a lot of support code: sort, regular expression scanner
- Deals with regular expressions like (app|ban|coc).*
19 MapReduce: Google implementation
- Large clusters of commodity PCs connected with switched Ethernet.
- Luiz A. Barroso, Jeffrey Dean, and Urs Hölzle. Web search for a planet: the Google cluster architecture. IEEE Micro, 23(2):22-28, April 2003.
- Nodes: dual-processor x86, Linux, 2-4 GB of memory
- Storage: local disks on the individual nodes
- GFS (Google's file system, the model for HDFS)
- Jobs (sets of tasks) are submitted to a scheduler and IMPLICITLY mapped to the set of available nodes
20 Execution overview
[Figure: execution overview, after Fig. 1 of the paper. (1) The user program forks the master and the workers. (2) The master assigns map tasks and reduce tasks. (3) Map workers read their input splits (split0 .. split4). (4) Map workers write buffered pairs to local files. (5) Reduce workers do remote reads of these intermediate files. (6) Reduce workers write the output files (output file 0, output file 1). Phases: Input files -> Map phase -> Intermediate local files -> Reduce phase -> Output files.]
21 Execution overview
- 1. Input files are split into M pieces (16 to 64 MB). Many worker copies of the program are forked.
- 2. One special copy, the master, assigns map and reduce tasks to idle slave workers.
- 3. Map workers read input splits, parse (key,value) pairs, apply the map function, and create buffered output pairs.
22 Execution overview (cont.)
- 4. Buffered output pairs are periodically written to local disk, partitioned into R regions; the locations of the regions are passed back to the master.
- 5. The master notifies reduce workers about the locations. The worker uses remote procedure calls to read the data from the local disks of the map workers, and sorts it by intermediate key to group records with the same key together.
23 Execution overview (cont.)
- 6. The reduce worker passes each key plus the corresponding set of all intermediate data to the reduce function. The output of the reduce function is appended to the final output file.
- 7. When all map and reduce tasks are completed, the master wakes up the user program, which resumes the user code.
24 Fault Tolerance: workers
- The master pings workers periodically. No response: the worker is marked as failed. Its completed map tasks are reset to idle state so they can be restarted, because their results (local to the failed worker) are lost.
- Completed reduce tasks do not need to be restarted (their output is stored in the global file system). Reduce tasks are notified of the re-executed map tasks, so they can read not-yet-read data from the new locations.
25 Fault Tolerance: Master
- The master writes checkpoints.
- There is only one master, so less chance of failure.
- If the master fails, the MapReduce task aborts.
26 Backup tasks
- A common cause of slowdown: a straggler, a machine that takes a lot of time because it is very busy.
- The master schedules backup executions of the remaining in-progress tasks. A task is marked completed when whichever copy of it finishes first.
- Smart mechanism, but it needs tuning.
- E.g. sort is 44% slower if the backup mechanism is not used.
27 File names as keys
- MapReduce programs take an input DIRECTORY and produce an output DIRECTORY.
- The files in the input directory are broken into almost equal shards and handed to the mappers.
- The default key value pair is (byte offset of the first char of the line, line content).
- The byte offset allows quick file access.
- What if we want the file name as key?
- We have to write our own RecordReader.
28 Steps towards an ls in MapReduce
- Created WholeFileRecordReader.java
- It implements RecordReader<Text,Text>; Text implements both Writable and WritableComparable.
- The user driver (here ls_driver.java) calls the runJob driver that, in order to put shards together, calls the RecordReader.
- ls_driver specifies the inputFormat to be MultiFileContentInputFormat, which specifies Text for the input and output format and returns our RecordReader, WholeFileRecordReader.
- Eclipse produced the method stubs.
- Most methods are straightforward.
- The interesting one is next (produce the next record), sketched below.
- Our next produces <fileName, fileSize> or <fileName, content>.
- Probably better: <fileName, path>, so the parallel mappers read the content themselves.
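- A sketch of what such a next might look like (old mapred API; assumes each split covers one whole, unsplit file; the class name follows the slide, but the body is illustrative, not the course code):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.RecordReader;

    // Emits exactly one <fileName, fileContent> record per input split.
    public class WholeFileRecordReader implements RecordReader<Text, Text> {
        private final FileSplit split;
        private final Configuration conf;
        private boolean processed = false;

        public WholeFileRecordReader(FileSplit split, Configuration conf) {
            this.split = split;
            this.conf = conf;
        }

        public Text createKey() { return new Text(); }
        public Text createValue() { return new Text(); }
        public long getPos() { return processed ? split.getLength() : 0; }
        public float getProgress() { return processed ? 1.0f : 0.0f; }
        public void close() throws IOException { }

        public boolean next(Text key, Text value) throws IOException {
            if (processed) return false;               // one record per file
            Path file = split.getPath();
            key.set(file.getName());                   // key = file name
            byte[] contents = new byte[(int) split.getLength()];
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = fs.open(file);
            try {
                IOUtils.readFully(in, contents, 0, contents.length);
            } finally {
                in.close();
            }
            value.set(contents, 0, contents.length);   // value = whole file content
            processed = true;
            return true;
        }
    }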