HaLoop: Efficient Iterative Data Processing On Large Scale Clusters

1
HaLoop: Efficient Iterative Data Processing On
Large Scale Clusters
  • Yingyi Bu, UC Irvine
  • Bill Howe, UW
  • Magda Balazinska, UW
  • Michael Ernst, UW

http://clue.cs.washington.edu/
Award IIS 0844572, Cluster Exploratory (CluE)
http://escience.washington.edu/
VLDB 2010, Singapore
2
Thesis in one slide
  • Observation: MapReduce has proven successful as a
    common runtime for non-recursive declarative
    languages
      • HIVE (SQL)
      • Pig (RA with nested types)
  • Observation: Many people roll their own loops
      • Graphs, clustering, mining, recursive queries
      • Iteration managed by an external script
  • Thesis: With minimal extensions, we can provide
    an efficient common runtime for recursive
    languages
      • Map, Reduce, Fixpoint

3
Related Work: Twister [Ekanayake, HPDC 2010]
  • Redesigned evaluation engine using pub/sub
  • Termination condition evaluated by main()

    while (!complete) {
        monitor = driver.runMapReduceBCast(cData);
        monitor.monitorTillCompletion();
        DoubleVectorData newCData =
            ((KMeansCombiner) driver.getCurrentCombiner()).getResults();
        totalError = getError(cData, newCData);
        cData = newCData;
        if (totalError < THRESHOLD) {
            complete = true;
            break;
        }
    }
4
In Detail: PageRank (Twister)

    while (!complete) {
        // start the pagerank map reduce process
        monitor = driver.runMapReduceBCast(
            new BytesValue(tmpCompressedDvd.getBytes()));
        monitor.monitorTillCompletion();
        // get the result of process
        newCompressedDvd =
            ((PageRankCombiner) driver.getCurrentCombiner()).getResults();
        // decompress the compressed pagerank values
        newDvd = decompress(newCompressedDvd);
        tmpDvd = decompress(tmpCompressedDvd);
        totalError = getError(tmpDvd, newDvd);
        // get the difference between new and old pagerank values
        if (totalError < tolerance) {
            complete = true;
        }
        tmpCompressedDvd = newCompressedDvd;
    }

Callouts: "run MR" (the runMapReduceBCast call) and "term. cond." (the totalError test)
5
Related Work: Spark [Zaharia, HotCloud 2010]
  • Reduction output collected at driver program
  • "does not currently support a grouped reduce
    operation as in MapReduce"

All output is sent to the driver.

    val spark = new SparkContext(<Mesos master>)
    var count = spark.accumulator(0)
    for (i <- spark.parallelize(1 to 10000, 10)) {
      val x = Math.random * 2 - 1
      val y = Math.random * 2 - 1
      if (x*x + y*y < 1) count += 1
    }
    println("Pi is roughly " + 4 * count.value / 10000.0)
6
Related Work: Pregel [Malewicz, PODC 2009]
  • Graphs only
      • clustering: k-means, canopy, DBScan
  • Assumes each vertex has access to outgoing edges
  • So an edge representation, Edge(from, to),
      • requires offline preprocessing
      • perhaps using MapReduce
7
Related Work: Piccolo [Power, OSDI 2010]
  • Partitioned table data model, with user-defined
    partitioning
  • Programming model
      • message-passing with global synchronization
        barriers
  • User can give locality hints, e.g.
    GroupTables(curr, next, graph)
  • Worth exploring a direct comparison
8
Related Work: BOOM [cf. Alvaro, EuroSys 10]
  • Distributed computing based on Overlog (Datalog
    + temporal logic + more)
  • Recursion supported naturally
      • app: API-compliant implementation of MR
  • Worth exploring a direct comparison

9
Details
  • Architecture
  • Programming Model
  • Caching (and Indexing)
  • Scheduling

10
Example 1: PageRank

Rank Table R0
url         rank
www.a.com   1.0
www.b.com   1.0
www.c.com   1.0
www.d.com   1.0
www.e.com   1.0

Linkage Table L
url_src     url_dest
www.a.com   www.b.com
www.a.com   www.c.com
www.c.com   www.a.com
www.e.com   www.c.com
www.d.com   www.b.com
www.c.com   www.e.com
www.e.com   www.c.com
www.a.com   www.d.com

Loop body computing Ri+1 from Ri and L (γ denotes grouping with aggregation):
  1. Join Ri and L on Ri.url = L.url_src
  2. Ri.rank = Ri.rank / γ_url COUNT(url_dest)
  3. Ri+1 = π(url_dest, γ_url_dest SUM(rank))

Rank Table R3
url         rank
www.a.com   2.13
www.b.com   3.89
www.c.com   2.60
www.d.com   2.60
www.e.com   2.13
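A minimal Python sketch of one pass through the loop body described above: join the rank table with the linkage table on url = url_src, divide each source's rank by its out-degree, then sum the contributions per url_dest. The function name `pagerank_step` and the three-page graph are illustrative assumptions, not taken from the slides. The sketch has no damping or carry-over term, so it is not expected to reproduce the R3 values shown.

```python
from collections import defaultdict

def pagerank_step(ranks, links):
    """One join-then-aggregate step over a rank table and a linkage table.

    ranks: dict url -> rank (the table Ri)
    links: list of (url_src, url_dest) pairs (the loop-invariant table L)
    """
    # COUNT(url_dest) grouped by url_src: the out-degree of each source page.
    out_degree = defaultdict(int)
    for src, _ in links:
        out_degree[src] += 1

    # Join Ri with L on Ri.url = L.url_src, then SUM(rank) grouped by url_dest.
    new_ranks = defaultdict(float)
    for src, dest in links:
        if src in ranks:
            new_ranks[dest] += ranks[src] / out_degree[src]
    return dict(new_ranks)

# Hypothetical three-page graph: a -> b, a -> c, b -> c, c -> a.
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
r0 = {"a": 1.0, "b": 1.0, "c": 1.0}
r1 = pagerank_step(r0, links)
```

Note that `links` is read in full on every call, which is exactly the loop-invariant input the later slides propose to cache. Pages with no incoming links drop out of the result in this simplified version.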
11
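A MapReduce Implementation
[Diagram: one iteration as a pipeline of map (M) and reduce (r) tasks in three stages: "Join and compute rank" (inputs Ri, L-split0, L-split1), "Aggregate", and "fixpoint evaluation". The client checks "Converged?" and either sets i = i + 1 and runs another iteration, or is done.]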
12
What's the problem?
[Diagram: the same pipeline over Ri, L-split0 and L-split1, with callouts 1 to 3 marking the points listed below]
L is loop invariant, but
  1. L is loaded on each iteration
  2. L is shuffled on each iteration
  3. Plus: the fixpoint is evaluated as a separate
    MapReduce job per iteration
13
Example 2: Transitive Closure
Find all transitive friends of Eric in the Friend table (semi-naïve evaluation)
  • R0: (Eric, Eric)
  • R1: (Eric, Elisa)
  • R2: (Eric, Tom), (Eric, Harry)
  • R3: (empty)
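A minimal Python sketch of semi-naive evaluation for this query, assuming a small hypothetical Friend table (the slide's own table contents are not reproduced here). Each round joins only the names found in the previous round against Friend, then removes names already seen. The function name `transitive_friends` is illustrative.

```python
def transitive_friends(friend, start):
    """Semi-naive evaluation: each round joins only the newly found names."""
    seen = {start}      # everything discovered so far
    delta = {start}     # the frontier for the first round
    generations = [delta]
    while delta:
        # Join step: friends of the names discovered in the previous round.
        candidates = {b for (a, b) in friend if a in delta}
        # Dupe-elim step: drop the names we have already seen.
        delta = candidates - seen
        seen |= delta
        if delta:
            generations.append(delta)
    return generations

# Hypothetical Friend table; the Tom -> Eric edge creates a cycle on purpose.
friend = [("Eric", "Elisa"), ("Elisa", "Tom"), ("Elisa", "Harry"), ("Tom", "Eric")]
gens = transitive_friends(friend, "Eric")
```

The loop ends when a round discovers nothing new, which is the "Anything new?" test a client-side driver would otherwise run after every iteration.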
14
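Example 2 in MapReduce
[Diagram: one iteration as two MapReduce jobs. "Join" (compute next generation of friends) reads Si, Friend0 and Friend1; "Dupe-elim" (remove the ones we've already seen) follows. The client asks "Anything new?" and either sets i = i + 1 and repeats, or is done.]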
15
What's the problem?
[Diagram: the same Join and Dupe-elim jobs over Si, Friend0 and Friend1, with callouts 1 and 2 marking the points listed below]
Friend is loop invariant, but
  1. Friend is loaded on each iteration
  2. Friend is shuffled on each iteration

16
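Example 3: k-means
ki = k centroids at iteration i
[Diagram: in each iteration, map tasks (M) read the partitions P0, P1 and P2 together with ki, and reduce tasks (r) produce ki+1. The client checks "ki - ki+1 < threshold?" and either sets i = i + 1 and repeats, or is done.]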
17
What's the problem?
ki = k centroids at iteration i
[Diagram: the same k-means iteration over P0, P1 and P2, with callout 1 marking the point listed below]
P is loop invariant, but
  1. P is loaded on each iteration
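A minimal Python sketch of the client-driven k-means loop, using one-dimensional points to keep it short. The points stay fixed while only the centroids change between iterations. The function name `kmeans_1d`, the sample data and the sum-of-absolute-differences convergence measure are assumptions made for illustration.

```python
def kmeans_1d(points, centroids, threshold=1e-9, max_iterations=100):
    """Client-side loop: points are invariant, only centroids change."""
    for _ in range(max_iterations):
        # "Map": assign every point to the index of its nearest centroid.
        groups = {i: [] for i in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        # "Reduce": the new centroid is the mean of its group
        # (an empty group keeps its old centroid).
        new_centroids = [
            sum(g) / len(g) if g else centroids[i] for i, g in groups.items()
        ]
        # Termination check done by the client: total centroid movement.
        moved = sum(abs(a - b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if moved < threshold:
            break
    return centroids

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
result = kmeans_1d(points, centroids=[0.0, 5.0])
```

Every pass re-reads `points` in full, which mirrors the observation that P is reloaded on each iteration even though it never changes.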

18
Approach: Inter-iteration caching
[Diagram: the loop body, annotated with four cache locations]
  • Reducer output cache (RO)
  • Reducer input cache (RI)
  • Mapper output cache (MO)
  • Mapper input cache (MI)
19
RI: Reducer Input Cache
  • Provides
      • Access to loop-invariant data without map/shuffle
  • Used by
      • Reducer function
  • Assumes
      • Mapper output for a given table is constant across
        iterations
      • Static partitioning (implies no new nodes)
  • PageRank
      • Avoids shuffling the network at every step
  • Transitive Closure
      • Avoids shuffling the graph at every step
  • K-means
      • No help
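A minimal Python sketch of the idea behind a reducer input cache, not HaLoop's implementation: the loop-invariant table is grouped by key once, on the first iteration, and later iterations join only the changing tuples against the cached groups. The class name `ReducerInputCache`, the `join_step` helper and the `stats` counter are hypothetical.

```python
from collections import defaultdict

class ReducerInputCache:
    """Keeps loop-invariant reducer input, grouped by key, across iterations."""

    def __init__(self):
        self.groups = defaultdict(list)
        self.loaded = False

    def load(self, invariant_tuples):
        # Paid once, on the first iteration: map and shuffle of the invariant table.
        for key, value in invariant_tuples:
            self.groups[key].append(value)
        self.loaded = True


def join_step(cache, variant_tuples, invariant_tuples, stats):
    """Reduce-side join of variant tuples against the cached invariant table."""
    if not cache.loaded:
        cache.load(invariant_tuples)
        stats["invariant_tuples_shuffled"] += len(invariant_tuples)
    # Only the variant tuples need to be shuffled on every iteration.
    return [(key, v, w)
            for key, v in variant_tuples
            for w in cache.groups.get(key, [])]


links = [("a", "b"), ("a", "c"), ("c", "a")]   # hypothetical invariant table
stats = {"invariant_tuples_shuffled": 0}
cache = ReducerInputCache()
first = join_step(cache, [("a", 1.0)], links, stats)
second = join_step(cache, [("c", 0.5)], links, stats)
```

After two iterations the counter still shows a single pass over the invariant table. The cache is only valid while the partitioning stays fixed, which matches the static-partitioning assumption listed on the slide.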


20
Reducer Input Cache Benefit
Transitive Closure, Billion Triples Dataset (120GB), 90 small instances on EC2
[Chart: overall run time]
21
Reducer Input Cache Benefit
Transitive Closure, Billion Triples Dataset (120GB), 90 small instances on EC2
[Chart: join step only]
22
Reducer Input Cache Benefit
Transitive Closure, Billion Triples Dataset (120GB), 90 small instances on EC2
[Chart: reduce and shuffle of join step]
23
[Diagram repeated from slide 11: the PageRank pipeline with stages "Join and compute rank", "Aggregate" and "fixpoint evaluation" over Ri, L-split0 and L-split1]
24
RO: Reducer Output Cache
  • Provides
      • Distributed access to output of previous
        iterations
  • Used by
      • Fixpoint evaluation
  • Assumes
      • Partitioning constant across iterations
      • Reducer output key functionally determines
        reducer input key
  • PageRank
      • Allows distributed fixpoint evaluation
      • Obviates extra MapReduce job
  • Transitive Closure
      • No help
  • K-means
      • No help
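A minimal Python sketch of a distributed fixpoint check, assuming each reducer partition keeps its previous output locally: every partition computes the distance between its old and new output, and only those per-partition sums are combined. The function names and the Manhattan distance are assumptions; the slides do not say which distance measure is used.

```python
def local_distance(previous, current):
    """Distance between two iterations of one reducer partition's output."""
    keys = set(previous) | set(current)
    return sum(abs(current.get(k, 0.0) - previous.get(k, 0.0)) for k in keys)


def converged(cached_partitions, new_partitions, threshold):
    """Each partition compares against its own cached output; only the
    per-partition sums are combined into the global decision."""
    total = sum(local_distance(old, new)
                for old, new in zip(cached_partitions, new_partitions))
    return total < threshold


# Hypothetical reducer outputs from two consecutive iterations, two partitions.
cached = [{"a": 1.0, "b": 2.0}, {"c": 3.0}]
fresh = [{"a": 1.0, "b": 2.5}, {"c": 3.0}]
```

Because a key always lands in the same partition, no partition needs another partition's previous output, which is why the partitioning must stay constant across iterations.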


25
Reducer Output Cache Benefit
[Two charts: fixpoint evaluation time (s) per iteration, one for the Livejournal dataset (50 EC2 small instances) and one for the Freebase dataset (90 EC2 small instances)]
26
MI: Mapper Input Cache
  • Provides
      • Access to non-local mapper input on later
        iterations
  • Used
      • During scheduling of map tasks
  • Assumes
      • Mapper input does not change
  • PageRank
      • Subsumed by use of Reducer Input Cache
  • Transitive Closure
      • Subsumed by use of Reducer Input Cache
  • K-means
      • Avoids non-local data reads on iterations > 0


27
Mapper Input Cache Benefit
5% non-local data reads, 5% improvement
28
Conclusions (last slide)
  • Relatively simple changes to MapReduce/Hadoop can
    support arbitrary recursive programs
      • TaskTracker (cache management)
      • Scheduler (cache awareness)
      • Programming model (multi-step loop bodies, cache
        control)
  • Optimizations
      • Caching loop-invariant data realizes the largest gain
      • Good to eliminate the extra MapReduce step for
        termination checks
      • Mapper input cache benefit inconclusive; need a
        busier cluster
  • Future Work
      • Analyze expressiveness of Map Reduce Fixpoint
      • Consider a model of Map (Reduce) Fixpoint

29
Data-Intensive Scalable Science
http://escience.washington.edu
Award IIS 0844572, Cluster Exploratory (CluE)
http://clue.cs.washington.edu
30
Motivation in One Slide
  • MapReduce can't express recursion/iteration
  • Lots of interesting programs need loops
      • graph algorithms
      • clustering
      • machine learning
      • recursive queries (CTEs, datalog, WITH clause)
  • Dominant solution: Use a driver program outside
    of MapReduce
  • Hypothesis: making MapReduce loop-aware affords
    optimization
      • and lays a foundation for scalable
        implementations of recursive languages

31
Experiments
  • Amazon EC2
      • 20, 50, 90 default small instances
  • Datasets
      • Billions of Triples (120GB): 1.5B nodes, 1.6B
        edges
      • Freebase (12GB): 7M nodes, 154M edges
      • Livejournal social network (18GB): 4.8M nodes,
        67M edges
  • Queries
      • Transitive Closure
      • PageRank
      • k-means

32
HaLoop Architecture
33
Scheduling Algorithm
  • Input: Node node
  • Global variables: HashMap<Node, List<Partition>> last,
    HashMap<Node, List<Partition>> current

    if (iteration == 0) {
        // The same as MapReduce
        Partition part = StandardMapReduceSchedule(node);
        current.add(node, part);
    } else {
        if (node.hasFullLoad()) {
            // Find a substitution
            Node substitution = findNearbyNode(node);
            last.get(substitution).addAll(last.remove(node));
            return;
        }
        if (last.get(node).size() > 0) {
            // Iteration-local schedule
            Partition part = last.get(node).get(0);
            schedule(part, node);
            current.get(node).add(part);
            last.get(node).remove(part);
        }
    }
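A Python sketch of the per-node scheduling logic above, written so it can be run in isolation. The cluster-specific pieces (`standard_schedule`, `has_full_load`, `find_nearby_node`) are passed in as plain functions, and the function returns the chosen partition instead of calling a scheduler; both are assumptions made for illustration, not HaLoop's actual interfaces.

```python
def schedule_task(node, iteration, last, current,
                  standard_schedule, has_full_load, find_nearby_node):
    """Return the partition assigned to `node`, or None if none was assigned.

    last / current: dict node -> list of partitions handled in the
    previous / current iteration.
    """
    if iteration == 0:
        # First iteration: the same as MapReduce.
        part = standard_schedule(node)
        current.setdefault(node, []).append(part)
        return part
    if has_full_load(node):
        # Find a substitution: hand this node's partitions to a nearby node.
        substitution = find_nearby_node(node)
        last.setdefault(substitution, []).extend(last.pop(node, []))
        return None
    if last.get(node):
        # Iteration-local schedule: reuse a partition this node handled before.
        part = last[node].pop(0)
        current.setdefault(node, []).append(part)
        return part
    return None


# Hypothetical two-node cluster in iteration 1; node n2 is fully loaded.
last = {"n1": ["p0", "p1"], "n2": ["p2"]}
current = {}
a = schedule_task("n1", 1, last, current, None, lambda n: False, None)
b = schedule_task("n2", 1, last, current, None, lambda n: True, lambda n: "n1")
```

In this run `n1` is given `p0`, the partition it processed last time, while the overloaded `n2` passes `p2` on to `n1` for a later scheduling call.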
34
Programming Interface

    Job job = new Job();
    // Define loop body
    job.AddMap(Map Rank, 1);
    job.AddReduce(Reduce Rank, 1);
    job.AddMap(Map Aggregate, 2);
    job.AddReduce(Reduce Aggregate, 2);
    // Declare an input as invariant
    job.AddInvariantTable(1);
    // Specify loop body input, parameterized by iteration
    job.SetInput(IterationInput);
    // Termination condition
    job.SetFixedPointThreshold(0.1);
    job.SetDistanceMeasure(ResultDistance);
    job.SetMaxNumOfIterations(10);
    // Turn on caches
    job.SetReducerInputCache(true);
    job.SetReducerOutputCache(true);
    job.Submit();
35
Cache Infrastructure Details
  • Programmer control
  • Architecture for cache management
  • Scheduling for inter-iteration locality
  • Indexing the values in the cache

36
Other Extensions and Experiments
  • Distributed databases and Pig/Hadoop for
    Astronomy [IASDS 09]
  • Efficient Friends of Friends in Dryad [SSDBM
    2010]
  • SkewReduce: Automated skew handling [SOCC 2010]
  • Image Stacking and Mosaicing with Hadoop [Hadoop
    Summit 2010]
  • HaLoop: Efficient iterative processing with
    Hadoop [VLDB 2010]

37
MapReduce: Broadly Applicable
  • Biology
      • Schatz 08, 09
  • Astronomy
      • IASDS 09, SSDBM 10, SOCC 10, PASP 10
  • Oceanography
      • UltraVis 09
  • Visualization
      • UltraVis 09, EuroVis 10

38
Key idea
  • When the loop output is large
      • transitive closure
      • connected components
      • PageRank (with a convergence test as the
        termination condition)
  • need a distributed fixpoint operator
      • typically implemented as yet another MapReduce
        job, on every iteration

39
Background
  • Why is MapReduce popular?
      • Because it's fast?
      • Because it scales to 1000s of commodity nodes?
      • Because it's fault tolerant?
  • Witness
      • MapReduce on GPUs
      • MapReduce on MPI
      • MapReduce in main memory
      • MapReduce on <10 nodes

40
So why is MapReduce popular?
  • The programming model
      • Two serial functions, parallelism for free
      • Easy and expressive
  • Compare this with MPI
      • 70 operations
  • But it can't express recursion
      • graph algorithms
      • clustering
      • machine learning
      • recursive queries (CTEs, datalog, WITH clause)

41
Fixpoint
  • A fixpoint of a function f is a value x such that
    f(x) = x
  • The fixpoint queries FIX can be expressed with
    the relational algebra plus a fixpoint operator
  • Map - Reduce - Fixpoint
      • Hypothesis: a sufficient model for all recursive
        queries
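A minimal Python sketch of fixpoint iteration as defined above: apply f repeatedly until the value stops changing. The helper name `fixpoint`, the iteration cap and the reachability example are assumptions chosen for illustration.

```python
def fixpoint(f, x, max_iterations=1000):
    """Apply f repeatedly until f(x) == x, then return x."""
    for _ in range(max_iterations):
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt
    raise RuntimeError("no fixpoint reached within max_iterations")


# Hypothetical edge relation; step adds every node reachable in one more hop.
edges = {("a", "b"), ("b", "c")}

def step(reached):
    return reached | frozenset(d for (s, d) in edges if s in reached)

closure = fixpoint(step, frozenset({"a"}))
```

Here `step` plays the role of the map and reduce stages, and `fixpoint` is the loop that a fixpoint operator would take over from the external driver script.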