Title: Improving MapReduce Performance in Large Virtualized Environments
1. Improving MapReduce Performance in Large Virtualized Environments
- Andy Konwinski, Matei Zaharia, Anthony Joseph, Randy Katz, Ion Stoica
- RAD Lab, June 2008
2. RAD Lab Overview
[Diagram: RAD Lab architecture. A compiler turns a high-level spec into a low-level spec; a Director applies new apps, equipment, and global policies (e.g., SLAs), driven by offered load, resource utilization, etc. from an Instrumentation Backplane; Log Mining turns training data into performance and cost models.]
4. Motivation and Objectives
- Motivation
  - MapReduce growing in popularity
  - Virtualized computing services like Amazon EC2 provide on-demand computing power
  - But MapReduce assumes a homogeneous environment
- Objectives
  - Study the impact of virtualization on Hadoop
  - Improve Hadoop performance on EC2
5. Methodology
- Measured EC2 performance using low-level and application benchmarks
- Identified two challenges
  - EC2 effects
  - Heterogeneity
- Designed a heterogeneity-aware scheduler that reduces response times by up to 2x
6. Outline
- Experience running Hadoop on EC2
  - Choosing EC2 instance types
  - Disk and network cold-start effects
- The challenge of heterogeneity
- LATE, a heterogeneity-aware scheduler
- Lessons about virtualized environments
7. Choosing EC2 Instance Types
- 3 VM types: small ($0.10/h, 1 disk), large ($0.40/h, 2 disks), xlarge ($0.80/h, 4 disks)
- Goal: maximize computing power per dollar
- Observation: small VMs offer the same CPU and RAM per dollar as large, but more network/disk bandwidth when there is no contention (worked check below)
- Environment chosen:
  - 100-900 small VMs for slaves
  - 1 large VM for the master
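As a rough worked check (disk counts and prices are from the slide; the compute and memory figures are assumptions about the 2008 instance specs, small = 1 ECU / 1.7 GB and large = 4 ECU / 7.5 GB): small buys 1 ECU and 1.7 GB per $0.10/h, i.e. 10 ECU and 17 GB per dollar-hour, while large buys 4 ECU and 7.5 GB per $0.40/h, i.e. the same 10 ECU and roughly 19 GB. Disks do not scale the same way: small gives 1 disk per $0.10 (10 disks per dollar-hour) versus large's 2 per $0.40 (5 per dollar-hour), which is where the extra uncontended I/O bandwidth per dollar comes from.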
8. Disk Cold Start
- First write to each disk location is slow (possibly due to allocation or security?)
- Recommendation: warm up nodes with dd (see the sketch below)
[Figure: bytes written vs. time (s)]
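A minimal warm-up sketch of the dd recommendation (the target path and size are assumptions; the layout of an instance's ephemeral disks varies):

    import os
    import subprocess

    # Hypothetical file on the instance's ephemeral disk; adjust to wherever
    # the disks Hadoop will use are mounted.
    WARMUP_FILE = "/mnt/warmup"

    # Write a large file once so the first-write penalty is paid before the
    # real job runs; a full warm-up would touch every block the job will use.
    subprocess.check_call([
        "dd", "if=/dev/zero", "of=" + WARMUP_FILE,
        "bs=1M", "count=10240",  # ~10 GB; the size is a judgment call
    ])
    os.remove(WARMUP_FILE)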
9. Network Cold Start
- The first time two machines communicate incurs a 2-5 s delay (maybe firewall setup?)
- Effect on Hadoop: sorting 200 GB on 200 nodes takes 20 min the first time, 4 min afterwards
- Also makes cluster setup slow
- Recommendation: warm up the network with a short MapReduce job; automate setup (sketch below)
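The slide's recommendation is a short MapReduce job; an equivalent minimal sketch of the idea is to force every node pair to exchange one packet before the real job starts, so the 2-5 s first-contact delay is paid up front (hostnames and port are assumptions):

    import socket

    HOSTS = ["node%03d" % i for i in range(200)]  # hypothetical slave hostnames
    PORT = 50060  # TaskTracker HTTP port in 0.x-era Hadoop; any open port works

    def touch(host, port, timeout=10.0):
        """Open and close one TCP connection to pay the first-contact cost."""
        try:
            socket.create_connection((host, port), timeout).close()
        except socket.error:
            pass  # warm-up is best-effort

    # Run this on every node (e.g., over ssh) so all node pairs talk once.
    for host in HOSTS:
        touch(host, PORT)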
10. Heterogeneity in Virtualized Environments
- Virtual machine technology isolates CPU and memory effectively
- However, disk and network are shared
  - Full bandwidth when there is no contention
  - Fair sharing if there is contention
11. Disk I/O Heterogeneity on EC2
- Result: I/O bandwidth per VM varies by 2.5x
- Same effect seen at the application level
[Figure: effect of contention on I/O performance on EC2]
12. Speculative Scheduling in MapReduce
- Hadoop assumes nodes have similar performance and launches backup copies of slow tasks to speed up response times
- This is critical for short interactive jobs, which are a large fraction of the workload
  - E.g., the average job at Google is 395 s
- Problem: how to select backup tasks in a heterogeneous environment?
13. Hadoop's Scheduler
- Hadoop's scheduler uses a fixed threshold: create backup copies of all tasks with progress < averageProgress - 20% (sketched below)
- Problems
  - Too many tasks may be backed up, thrashing shared resources like the network
  - The wrong tasks may be backed up if they are only slightly slow
  - Tasks are never backed up if progress > 80%
  - Tasks may be restarted on slow nodes
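A minimal sketch of the fixed-threshold rule and why the 80% problem falls out of it (names are illustrative, not Hadoop's actual code):

    # Progress scores are in [0, 1]; Hadoop backs up any task more than
    # 20% below the average progress of its category.
    THRESHOLD = 0.2

    def needs_backup(task_progress, average_progress):
        """Hadoop's rule: progress < averageProgress - 20%."""
        return task_progress < average_progress - THRESHOLD

    # Why tasks past 80% are never backed up: average_progress <= 1.0, so
    # average_progress - 0.2 <= 0.8, and a task with progress > 0.8 can
    # never fall below the threshold no matter how slowly it is running.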
14. Example: 900-Node Run
- 80% of reduce tasks had backups started
- 1.3 hours to sort 100 GB
15. New Scheduler: LATE
- Longest Approximate Time to End
- Back up the task with the largest estimated finish time
- Cap backup tasks at 10%
- Only launch backups on fast nodes
- Only back up tasks that are slow enough
16. LATE Details
- Estimating finish time: progress rate = progressScore / executionTime, so estimated time left = (1 - progressScore) / progressRate (see the sketch below)
- Thresholds
  - 25th percentiles work well for the slow-node and slow-task cutoffs
  - The backup task cap can be tuned to trade off throughput vs. response time (currently 10%)
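A minimal sketch of the LATE heuristic under the details above (the record type and names are illustrative; the 10% cap and 25th-percentile cutoff follow the slide):

    from collections import namedtuple

    # Illustrative task record: progress score in [0, 1], seconds running.
    Task = namedtuple("Task", ["progress", "elapsed"])

    def time_left(t):
        """LATE's estimate: (1 - progressScore) / progressRate."""
        if t.progress == 0:
            return float("inf")           # no progress yet: worst case
        rate = t.progress / t.elapsed     # progress per second so far
        return (1.0 - t.progress) / rate  # estimated seconds remaining

    def pick_backup(tasks, slow_task_cutoff, backups_running, total_slots):
        """Back up the running task expected to finish farthest in the future."""
        if backups_running >= 0.10 * total_slots:  # cap backups at 10% of slots
            return None
        # Only tasks below the 25th-percentile progress-rate cutoff (computed
        # by the caller) are slow enough to be worth backing up.
        slow = [t for t in tasks
                if t.elapsed > 0 and t.progress / t.elapsed <= slow_task_cutoff]
        return max(slow, key=time_left) if slow else None

Per the "only launch backups on fast nodes" rule, the caller should also refuse to place the chosen backup on a node whose measured speed falls below the slow-node percentile.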
17. Evaluation
- 2 environments
  - 200 EC2 VMs, with 1-7 VMs per physical host, emulating the mix seen in production
  - A smaller, controlled testbed (RECC: RADLab Elastic Compute Cloud)
- 3 job types: Sort, Grep, WordCount
- 2 metrics: response time (primary) and throughput (which should not be degraded)
18. EC2 Sort with No Stragglers
- Average 27% gain over native, 31% gain over no speculation
19. EC2 Sort with Stragglers
- Average 58% gain over native, 220% gain over no speculation
- 93% max gain over native
20. EC2 Sort Throughput
- Jobs/second for three simultaneously submitted Sorts
- Average 5.1% gain over native, 18% gain over no speculation
21. EC2 Grep and WordCount
- Grep: average 36% gain over native, 57% gain over no speculation
- WordCount: average 8.5% gain over native, 179% gain over no speculation
22. RECC Sort with No Stragglers
- Average 162% gain over native, 104% gain over no speculation
- 261% max speedup over native
23. RECC Sort with Stragglers
- Average 53% gain over native, 121% gain over no speculation
- 100% max speedup over native
24. RECC WordCount
- Average 14% gain over native, 113% gain over no speculation
25. Summary of Results
[Table: summary of the gains reported on the preceding slides]
26. Lessons for Virtualized Environments
- Heterogeneity matters
  - CPU and memory isolation work well; disk and network isolation, not so much
- Ideally, the environment should minimize variance
  - Otherwise, modify the application to tolerate variance
- Document best practices and quirks
- Provide better visibility and monitoring tools
27. Conclusion
- Analyzed EC2 with a real application
- Identified heterogeneity as a challenge and addressed it in Hadoop
  - Up to 2x better response times through LATE
- Demonstrated X-Trace at 900 nodes
28. Future Work
- Integrate LATE into the Hadoop codebase
- Run LATE on more realistic jobs (Hadoop GridMix benchmark?)
- Scheduling among different jobs
- Scheduling tasks on the same node
29. Questions?
30. Improving MapReduce Performance in Large Virtualized Environments
- Andy Konwinski and Matei Zaharia
31. Tips for Hadoop on EC2
- Automate cluster setup
- Warm up the system with dd and a small job
- Choose instance type based on needs
  - Small instances usually have more I/O bandwidth per dollar
  - A large instance is necessary for the Hadoop master
  - EC2 has now added high-CPU instances too!
- Consider turning down DFS replication for a significant performance gain (config sketch below)
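For the DFS replication tip, the knob is HDFS's dfs.replication property; a minimal sketch for the 0.x-era hadoop-site.xml (the value 2 is an assumption for illustration: fewer replicas mean fewer copies written per block, at the cost of durability):

    <!-- hadoop-site.xml: lower HDFS replication from its default of 3 -->
    <property>
      <name>dfs.replication</name>
      <value>2</value> <!-- illustrative; 1 is faster still but riskier -->
    </property>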
32. Tracing Hadoop on EC2
[Diagram: the Hadoop master and slaves send trace data over TCP to an X-Trace backend, which stores it in a Derby DB and the filesystem; a trace-analysis web UI queries the backend and serves results to the user over HTTP]
33. Scaling up X-Trace
- X-Trace backend improved to use asynchronous I/O and a hybrid filesystem + RDBMS data store; still a single server
- Found the overhead of tracing to be negligible in Hadoop, since operations are large
- 1-2 MB of trace data generated per node per hour; compresses by a factor of 20