1
MapReduce
  • Source:
  • MapReduce: Simplified Data Processing on Large
    Clusters
  • Jeffrey Dean and Sanjay Ghemawat, Google Inc.
  • (wim bohm, cs.colostate.edu)

Except as otherwise noted, the content of this
presentation is licensed under the Creative
Commons Attribution 2.5 license.
2
MapReduce Concept
  • Simple implicitly parallel programming model
  • Based on Lisp's Map and Reduce higher order
    functions
  • Lisp: Map(fM,L) = cons(fM(first(L)), Map(fM, rest(L)))
  • Lisp: Reduce(fR,L) = fR(first(L), Reduce(fR, rest(L)))
  • Lisp: MapReduce(fM,fR,L) = Reduce(fR, Map(fM,L))
  • Lisp: Lots of Irritating Superfluous Parentheses
  • (base cases left out)
  • Very savvy implementation
  • High throughput, high performance, rack aware
  • Functional runtime system takes care of fault
    tolerance, restart, distribution (parallelism)
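The Lisp-style composition above can be sketched in Python with the built-in map and functools.reduce (a minimal illustration of Reduce(fR, Map(fM, L)); the function names here are ours, not from the slides):

```python
from functools import reduce

def map_reduce(f_map, f_reduce, xs):
    # Lisp-style MapReduce: Reduce(fR, Map(fM, L))
    return reduce(f_reduce, map(f_map, xs))

# Map each word to its length, then reduce the lengths by addition.
total = map_reduce(len, lambda a, b: a + b, ["map", "reduce"])
```

Note that, exactly as the next slides argue, this formulation is inherently sequential: reduce walks the mapped list one element at a time.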

3
Introduction
  • Data center apps: a special type of parallel
    program, processing large amounts of data on
    large clusters. Complexity!
  • Much of this complexity is NOT in the actual
    computation, but in the data distribution,
    replication, access, in fault tolerance, restart
    etc. These issues arise for ALL data center apps.
  • This has given rise to the MapReduce abstraction
    and implementation

4
Map and Reduce
  • Map: take a set of (key,value) pairs and
    generate a set of intermediate (key,value) pairs
    by applying some function f to all these pairs
  • Reduce: merge all pairs with the same key,
    applying a reduction function R to the values
  • f and R are user defined
  • All implemented in a non-functional language such
    as Java, C, or Python

5
Wordcount
  • Map(String key, String value):
  •   // key: doc name, value: doc contents
  •   for each word w in value:
  •     EmitIntermediate(w, "1")
  • Reduce(String key, Iterator values):
  •   // key: a word, values: a list of counts
  •   int sum = 0
  •   for each v in values: sum += ParseInt(v)
  •   Emit(AsString(sum))
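The wordcount pseudocode above can be simulated in plain Python (a sketch; the in-memory grouping step stands in for Hadoop's shuffle, and the names wc_map, wc_reduce, and run are ours):

```python
from collections import defaultdict

def wc_map(doc_name, contents):
    # Emit an intermediate (word, 1) pair for every word in the document
    for word in contents.split():
        yield (word, 1)

def wc_reduce(word, counts):
    # Sum all the counts emitted for this word
    return (word, sum(counts))

def run(docs):
    groups = defaultdict(list)          # "shuffle": group values by key
    for name, text in docs.items():
        for k, v in wc_map(name, text):
            groups[k].append(v)
    return dict(wc_reduce(k, vs) for k, vs in groups.items())

counts = run({"d1": "to be or not to be"})
```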

6
Types
  • Map: (keytype1, valuetype1) ->
    list( (keytype2, valuetype2) )
  • Reduce: (keytype2, list(valuetype2)) ->
    list( valuetype2 )
  • Types 1 and 2 passed between user functions can
    be any valid type (e.g. any Java type)
  • Communication goes through files; the types are
    e.g. LongWritable (see examples)
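The two signatures above can be written down as Python type aliases (an illustrative sketch only; the alias names Mapper and Reducer are ours):

```python
from typing import Callable, Iterable, TypeVar

K1 = TypeVar("K1"); V1 = TypeVar("V1")
K2 = TypeVar("K2"); V2 = TypeVar("V2")

# Map:    (keytype1, valuetype1) -> list( (keytype2, valuetype2) )
Mapper = Callable[[K1, V1], Iterable[tuple[K2, V2]]]
# Reduce: (keytype2, list(valuetype2)) -> list( valuetype2 )
Reducer = Callable[[K2, Iterable[V2]], Iterable[V2]]

# A function matching the Map signature: (doc, word) -> [(word, doc)]
swap: Mapper = lambda k, v: [(v, k)]
pairs = list(swap("d1", "cat"))
```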

7
Example: Pi-Estimator
  • Idea: generate random points in a square
  • Count how many fall inside the inscribed circle,
    and how many in the square (producing area
    estimates)
  • Square area: As = 4r², so r² = As / 4
  • Circle area: Ac = πr², so π = Ac / r²
  • Therefore π = 4 Ac / As
  • Example of a Monte Carlo method: simulating a
    physical phenomenon using many random samples
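The derivation above, π = 4 Ac / As, turns directly into a sequential Monte Carlo estimator (a sketch with our own names; the sample count and seed are arbitrary):

```python
import random

def estimate_pi(n_samples, seed=0):
    # Draw points uniformly in a unit square centred at the origin and
    # count those inside the inscribed circle of radius r = 0.5.
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random() - 0.5, rng.random() - 0.5
        if x * x + y * y <= 0.25:        # inside circle: x² + y² <= r²
            inside += 1
    return 4.0 * inside / n_samples      # π ≈ 4 * Ac / As

pi_est = estimate_pi(100_000)
```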

8
Worker / Multi-threading view
  • Master:
  •   get input params (nWorkers, nPoints)
  •   for(i=0; i<nWorkers; i++) thrCreate(i, nPoints)
  •   for(i=0; i<nWorkers; i++) join
  •   As = 0; Ac = 0
  •   for(i=0; i<nWorkers; i++) { As += nPoints;
        Ac += cPoints[i]; }
  •   piEst = 4*Ac / As
  • Slave i:
  •   cPoints[i] = 0
  •   for(j=0; j<nPoints; j++)
  •     create 2 random pts x,y in (-.5 .. .5)
  •     if (sqrt(x*x+y*y) < .5) cPoints[i]++
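The master/slave pseudocode above can be sketched with Python threads (an illustration of the spawn/join pattern; function names are ours, and each worker writes only its own slot of cPoints, so no mutex is needed):

```python
import math
import random
import threading

def worker(i, n_points, c_points):
    # Slave i: count hits in its own slot; no shared mutable state.
    rng = random.Random(i)
    hits = 0
    for _ in range(n_points):
        x, y = rng.random() - 0.5, rng.random() - 0.5
        if math.sqrt(x * x + y * y) < 0.5:
            hits += 1
    c_points[i] = hits

def pi_threads(n_workers, n_points):
    c_points = [0] * n_workers
    threads = [threading.Thread(target=worker, args=(i, n_points, c_points))
               for i in range(n_workers)]
    for t in threads:                    # spawn
        t.start()
    for t in threads:                    # join
        t.join()
    a_s = n_workers * n_points           # total samples ("square area")
    a_c = sum(c_points)                  # total hits ("circle area")
    return 4.0 * a_c / a_s

pi_est = pi_threads(4, 50_000)
```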

9
Multithreading vs Lisp functional
  • The multithreading view assumes:
  •   We can spawn threads and join them back
  •   We have shared memory
  •   If there are read/write hazards, we use
      explicit mutex locks
  •   Therefore we have parallelism
  • The Lisp functional view uses map/reduce on lists:
  •   List of worker numbers, MAPped to a list of cPoints
  •   List of cPoints REDUCEd to sumCPoints
  •   sumCPoints used to estimate pi
  •   The lists make this inherently SEQUENTIAL

10
We want MapReduce to be parallel!
  • Just like in multithreading, we need some kind of
    spawn(id, func, data) construct
  • In Lisp the spawn is taken care of by the higher
    order function mechanism:
  • reduce(rFun, map(mFun, inList))
  • In MapReduce we use method overriding to define
    our specific versions of map and reduce, and we
    have a Driver that creates a Job Configuration to
    provide parallelism.

11
We need to communicate results
  • Somehow the map processes need input
    (key1,val1) pairs and need to produce
    intermediate (key2,val2) pairs that the reduce
    processes can pick up.
  • But we are in a distributed environment:
  • What provides a shared name space?
  • The file system!
  • The functional style plus HDFS allows for
    parallelism

12
What about parallel writes?
  • HDFS: no parallel writes
  • GFS: parallel append-type writes
  • MapReduce: parallel processes doing potentially
    parallel writes; writes are guaranteed to be
    atomic ops. If process 1 writes aaaaa, and
    process 2 writes bbbbb, we get aaaaabbbbb or
    bbbbbaaaaa, never something like ababababab.
  • The data written by one process occurs in the
    order written by that process

13
Parallel writes vs multithreading
  • Parallel writes are like multiple threads
    appending to a mutex-lock protected list.
  • The list is just a collection of unordered
    records.
  • The reducer has to be aware of this:
  • Either it can impose an order,
  • Or it can make sure the reduction function is
    associative and commutative
  • Take parallel grep: if you want outcomes sorted
    by line, make the line number part of the key,
    and sort
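Both options above can be shown in a few lines of Python (an illustration with made-up records, not code from the slides):

```python
import random

# Records arrive from parallel writers in arbitrary interleaving order.
# A sum is associative and commutative, so the order does not matter:
records = [("wc", 3), ("wc", 1), ("wc", 5)]
shuffled = records[:]
random.shuffle(shuffled)
assert sum(v for _, v in shuffled) == sum(v for _, v in records)

# Alternatively, impose an order: make the (file, line number) pair part
# of the key and sort before reducing, as in parallel grep.
hits = [(("f.txt", 7), "bar"), (("f.txt", 2), "foo")]
ordered = [line for _, line in sorted(hits)]
```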

14
MapReduce for PiEstimator
  • MapReduce is integrated into Eclipse
  • We need the MapReduce plugins to create a
    MapReduce Eclipse perspective.
  • MapReduce projects contain three classes:
  • 1. A Driver (like the master in the
    multithreading case):
      Creating a configuration, defining mappers
      and reducers, starting the app, dealing with
      the final result gathering.
  • 2. A Mapper (inherited class implementing the
    mapper interface):
      Getting data from files in a directory
      specified by the driver.
  • 3. A Reducer (inherited class implementing the
    reducer interface):
      Getting data from files in a directory
      specified by the driver, produced by mappers.

15
Pi
  • Two versions:
  • 1. mypi(nMaps, nSamples)
      Each of the nMaps maps does nSamples samples.
      More maps, more work, hopefully a better
      result.
  • 2. mypi2(nMaps, nSamples, nReps)
      Each of the nMaps maps does nSamples/nMaps
      samples, so always the same total amount of
      work. Done nReps times for a speedup
      experiment.

16
Mypi2 on Laptop
Multiple sequential mappers do not bring the
performance down.
17
Mypi2 on Hadoop cluster
Twenty parallel mappers: five-fold speedup.
Twelve seems even better.
18
Other example: grep
  • Input DIRECTORY to output DIRECTORY
  • Whole app written in one class
  • Not 3 (driver, mapper, reducer)
  • Uses a lot of support code: sort, regular
    expression scanner
  • Deals with regular expressions like
  • (app|ban|coc).*
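A parallel grep mapper can be sketched in Python (our own illustrative function and pattern; emitting the (file, line number) pair as the key is what lets a later sort restore file order, as slide 13 suggested):

```python
import re

def grep_map(file_name, lines, pattern):
    # Emit ((file, line number), line) for each matching line, so the
    # reducer can sort matches back into file order.
    rx = re.compile(pattern)
    for n, line in enumerate(lines, start=1):
        if rx.search(line):
            yield ((file_name, n), line)

hits = list(grep_map("fruit.txt",
                     ["apple pie", "cherry", "banana split"],
                     r"(app|ban)"))
```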

19
MapReduce: Google implementation
  • Large clusters of commodity PCs connected with
    switched Ethernet.
  • Luiz A. Barroso, Jeffrey Dean, and Urs
    Hölzle. Web search for a planet: the Google
    cluster architecture. IEEE Micro, 23(2):22-28,
    April 2003.
  • Nodes: dual-processor x86, Linux, 2-4 GB of memory
  • Storage: local disks on individual nodes
  • GFS (Google's original file system, the model
    for HDFS)
  • Jobs (sets of tasks) submitted to a scheduler,
    IMPLICITLY mapped to the set of available nodes

20
Execution overview
[Diagram: the user program forks a master and many
worker copies (1); the master assigns map and
reduce tasks to workers (2); map workers read
input splits 0-4 (3) and write buffered output to
local disk (4); reduce workers remote-read those
local files (5) and write output file 0 and
output file 1 (6). Columns: Input files, Map
phase, Intermediate local files, Reduce phase,
Output files.]
21
Execution overview
  • 1. Input files are split into M pieces (16 to 64
    MB). Many worker copies of the program are
    forked.
  • 2. One special copy, the master, assigns map and
    reduce tasks to idle slave workers.
  • 3. Map workers read input splits, parse
    (key,value) pairs, apply the map function, and
    create buffered output pairs.

22
Execution overview cont
  • 4. Buffered output pairs are periodically written
    to local disk, partitioned into R regions,
    locations of regions are passed back to the
    master.
  • 5. Master notifies reduce worker about locations.
    Worker uses remote procedure calls to read data
    from local disks of the map workers, sorts by
    intermediate keys to group same key records
    together.

23
Execution overview cont
  • 6. Reduce worker passes key plus corresponding
    set of all intermediate data to reduce function.
    The output of the reduce function is appended to
    the final output file.
  • 7. When all map and reduce tasks are completed
    the master wakes up the user program, which
    resumes the user code.
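Steps 3 to 6 above can be compressed into a single-process simulation (a sketch with our own helper names; in the real system each map task writes its R regions to its own local disk and reduce workers fetch them by RPC, whereas here everything lives in one process):

```python
from collections import defaultdict

def run_job(splits, map_fn, reduce_fn, R):
    # Steps 3-4: each map task's output is partitioned into R regions,
    # chosen by hash(key) mod R, so all pairs for one key land in the
    # same region.
    regions = [defaultdict(list) for _ in range(R)]
    for split in splits:
        for k, v in map_fn(split):
            regions[hash(k) % R][k].append(v)
    # Steps 5-6: each reduce task reads its region, with values already
    # grouped by key, and applies the reduce function per key.
    out = {}
    for r in range(R):
        for k, vs in regions[r].items():
            out[k] = reduce_fn(k, vs)
    return out

# Wordcount over M=2 splits with R=2 reduce partitions.
result = run_job(["a b", "b c"],
                 lambda s: [(w, 1) for w in s.split()],
                 lambda k, vs: sum(vs),
                 R=2)
```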

24
Fault Tolerance: workers
  • Master pings workers periodically. No response:
    the worker is marked as failed. Its completed map
    tasks are reset to idle state so that they can be
    restarted, because their results (local to the
    failed worker) are lost.
  • Completed reduce tasks do not need to be
    restarted (output is stored in the global file
    system). Reduce tasks are notified of the
    re-executed map tasks, so they can read unread
    data from the new locations.

25
Fault Tolerance: Master
  • Master writes checkpoints
  • Only one master, so less chance of failure
  • If the master fails, the MapReduce task aborts.

26
Backup tasks
  • Common cause of slowdown: one straggler, a
    machine that takes a lot of time because it is
    very busy.
  • Master schedules backup executions of remaining
    in-progress tasks. A task is marked completed
    when whichever execution finishes first.
  • Smart mechanism, needs tuning.
  • E.g. Sort is 44% slower if the backup mechanism
    is not used.

27
File names as keys
  • MapReduce programs take an input DIRECTORY and
    produce an output DIRECTORY.
  • The files in the input directory are broken into
    almost equal shards and handed to mappers.
  • The default key/value pair is
  • <byte offset of first char in line, line
    content>
  • The byte offset allows quick file access
  • What if we want the file name as key?
  • We have to write our own RecordReader
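The default record format above can be reproduced in a few lines of Python (a sketch of the behavior, not Hadoop code; the function name is ours):

```python
import os
import tempfile

def default_records(path):
    # Default-style records: the byte offset of the first character of
    # each line, paired with the line content.
    pairs = []
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            pairs.append((offset, line.rstrip(b"\n").decode()))
            offset += len(line)          # offsets count raw bytes
    return pairs

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("ab\ncd\n")
records = default_records(f.name)
os.unlink(f.name)
```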

28
Steps towards an ls in MapReduce
  • Created WholeFileRecordReader.java, which
    implements RecordReader<Text,Text>. Text
    implements both Writable and WritableComparable.
  • The user driver (here ls_driver.java) calls the
    runJob driver that, in order to put shards
    together, calls the record reader.
  • ls_driver specifies the inputFormat to be
    MultiFileContentInputFormat, which specifies Text
    for the input and output format and returns our
    RecordReader: WholeFileRecordReader
  • Eclipse produced the method stubs
  • Most methods are straightforward
  • The interesting one: next (produce the next
    record)
  • Our next produces <fileName, fileSize> or
    <fileName, content>
  • Probably better: <fileName, path>, so parallel
    mappers read the content themselves
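The behavior of such a whole-file record reader can be sketched in Python (our own illustrative function, not the Java class from the slides):

```python
import os
import tempfile

def whole_file_records(input_dir):
    # Whole-file record reader sketch: one (fileName, content) record per
    # file, instead of one record per line. Emitting (fileName, path)
    # instead would let parallel mappers read the content themselves.
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        if os.path.isfile(path):
            with open(path) as f:
                yield (name, f.read())

d = tempfile.mkdtemp()
for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
    with open(os.path.join(d, name), "w") as f:
        f.write(text)
records = list(whole_file_records(d))
```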