Job Scheduling in MapReduce - PowerPoint PPT Presentation

Transcript
1
Job Scheduling in MapReduce
  • Matei Zaharia, Dhruba Borthakur,
  • Joydeep Sen Sarma, Scott Shenker, Ion Stoica

RAD Lab (UC Berkeley), Facebook Inc
2
Motivation
  • Hadoop was designed for large batch jobs
  • FIFO queue with a locality optimization
  • At Facebook, we saw a different workload:
  • Many users want to share a cluster
  • Many jobs are small (10-100 tasks)
  • Sampling, ad-hoc queries, periodic reports, etc.
  • How should we schedule tasks in a shared
    MapReduce cluster?

3
Benefits of Sharing
  • Higher utilization due to statistical
    multiplexing
  • Data consolidation (ability to query disjoint
    data sets together)

4
Why is it Interesting?
  • Data locality is crucial for performance
  • Conflict between locality and fairness
  • 70% throughput gain from a simple algorithm

5
Outline
  • Two problems
  • Head-of-line scheduling
  • Slot stickiness
  • A solution (global scheduling)
  • Lots more problems (future work)

6
Problem 1: Poor Locality for Small Jobs
(Chart: Job Sizes at Facebook)
7
Problem 1: Poor Locality for Small Jobs
8
Cause
  • Only head-of-queue job is schedulable on each
    heartbeat
  • Chance of heartbeat node having local data is low
  • Jobs with blocks on X% of nodes get only ~X%
    locality
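The cause above can be sketched with a small simulation (names and parameters are illustrative, not from the original deck): if a job's input blocks live on a fraction f of the cluster's nodes and only the head-of-queue job can launch on each heartbeat, then each heartbeat comes from a node with local data with probability roughly f.

```python
import random

def simulate_locality(num_nodes, frac_with_data, heartbeats, seed=0):
    """Estimate how often the heartbeating node holds local data for a
    job whose input blocks live on `frac_with_data` of the nodes."""
    rng = random.Random(seed)
    nodes_with_data = set(range(int(num_nodes * frac_with_data)))
    local = sum(
        1 for _ in range(heartbeats)
        if rng.randrange(num_nodes) in nodes_with_data
    )
    return local / heartbeats

# A small job with blocks on 10% of a 100-node cluster: only ~10% of
# heartbeats can launch a node-local task for it.
print(simulate_locality(num_nodes=100, frac_with_data=0.1, heartbeats=100_000))
```

This is why small jobs, whose blocks cover few nodes, see the worst locality under head-of-line scheduling.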

9
Problem 2: Sticky Slots
  • Suppose we do fair sharing as follows:
  • Divide task slots equally between jobs
  • When a slot becomes free, give it to the job that
    is farthest below its fair share
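The fair-sharing rule on this slide can be sketched as follows (the dict-based job representation is an assumption for illustration):

```python
def pick_job(jobs, fair_share):
    """When a slot frees up, give it to the job farthest below its
    fair share, i.e. with the most negative (running - share) gap."""
    return min(jobs, key=lambda j: j["running"] - fair_share[j["name"]])

jobs = [
    {"name": "A", "running": 5},
    {"name": "B", "running": 2},
]
fair_share = {"A": 4, "B": 4}  # equal split of 8 slots between 2 jobs
print(pick_job(jobs, fair_share)["name"])  # prints "B" (2 below its share)
```

Note that this rule says nothing about *where* the freed slot is, which is exactly what leads to the sticky-slot behavior shown next.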

10
Problem 2: Sticky Slots
(Diagram: master assigning task slots across slave nodes)
11
Problem 2: Sticky Slots
(Diagram: master and slave nodes; each job's tasks keep landing in the same slots)
Problem: Jobs never leave their original slots
12
Calculations
13
Solution: Locality Wait
  • Scan through job queue in order of priority
  • Jobs must wait before they are allowed to run
    non-local tasks
  • If wait < T1, only allow node-local tasks
  • If T1 < wait < T2, also allow rack-local tasks
  • If wait > T2, also allow off-rack tasks
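The two-threshold rule above can be sketched directly (treating the boundaries as inclusive; the function and level names are illustrative):

```python
NODE_LOCAL, RACK_LOCAL, OFF_RACK = "node-local", "rack-local", "off-rack"

def allowed_levels(wait, t1, t2):
    """Locality levels a job may use after waiting `wait` seconds,
    given the two locality-wait thresholds T1 and T2 (t1 < t2)."""
    levels = [NODE_LOCAL]          # always allowed
    if wait >= t1:
        levels.append(RACK_LOCAL)  # unlocked after the first threshold
    if wait >= t2:
        levels.append(OFF_RACK)    # unlocked after the second threshold
    return levels

print(allowed_levels(wait=0, t1=5, t2=10))   # only node-local
print(allowed_levels(wait=7, t1=5, t2=10))   # node- and rack-local
print(allowed_levels(wait=12, t1=5, t2=10))  # all three levels
```

A job that keeps being skipped relaxes its locality requirement over time instead of immediately grabbing a non-local slot.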

14
Locality Wait Example
(Diagram: master and slave nodes; with the locality wait, tasks move to nodes holding their data)
Jobs can now shift between slots
15
Evaluation: Locality Gains
16
Throughput Gains
17
Network Traffic Reduction
(Chart: Network Traffic in Sort Workload, with vs. without locality wait)
18
Further Analysis
  • When is it worthwhile to wait, and how long?
  • For throughput
  • Always worth it, unless there's a hotspot
  • If hotspot, prefer to run IO-bound tasks on the
    hotspot node and CPU-bound tasks remotely

19
Further Analysis
  • When is it worthwhile to wait, and how long?
  • For response time
  • E(gain) = (1 - e^(-w/t)) (D - t)
  • Worth it if E(wait) < cost of running non-locally
  • Optimal wait time is infinity
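Taking the slide's formula at face value, the gain can be computed numerically; the interpretation of the symbols here is an assumption (w = wait threshold, t = mean time until a local slot frees, D = extra cost of running non-locally):

```python
import math

def expected_gain(w, t, D):
    """E(gain) = (1 - e^(-w/t)) * (D - t): the probability that a
    local slot frees up within the wait w, times the net time saved."""
    return (1 - math.exp(-w / t)) * (D - t)

# The gain grows monotonically in w toward its limit D - t, which is
# why the slide concludes the optimal wait is unbounded.
for w in (1, 5, 20, 100):
    print(w, round(expected_gain(w, t=5, D=30), 2))
```

In practice the wait is still capped, since an unbounded wait only optimizes expected response time under these assumptions and ignores cluster-wide effects.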

20
Conclusion
  • Simple idea improves throughput by 70%
  • Lots of future work
  • Reduce scheduling
  • Memory-aware scheduling
  • Intermediate-data-aware scheduling
  • Using past history
  • Evaluation using richer benchmarks