Title: Characterizing Workloads and Provisioning for Scale Up
1. Characterizing Workloads and Provisioning for Scale Up
- Archana Ganapathi, Matei Zaharia, Armando Fox, Dave Patterson. RAD Lab, June 2008
2. RAD Lab Overview
[Architecture diagram: a compiler turns a high-level spec into a low-level spec; a Director reacts to new apps, equipment, and global policies (e.g., SLAs); an Instrumentation Backplane collects offered load, resource utilization, etc.; Log Mining turns those logs into training data for performance and cost models.]
3. Goals
[Diagram: Workload → SYSTEM → Behavior]
- Given a system (e.g., a DBMS or EC2), its workload, and performance metrics, answer what-if questions for:
  - Changes in workload (mix/rate/distribution)
  - Changes in system configuration (hardware/software)
  - Changes in both
4. Recap: Last Retreat
Predict multiple performance metrics simultaneously for multiple new workloads, based on previously observed workload performance.
[Diagram: Queries → Parallel DBMS → Behavior]
5. Recap: Last Retreat
- Your feedback:
  - Other systems will be more predictable than the DBMS query optimizer
  - Try Map-Reduce systems as the next test case
6. Motivating Scenario
[Diagram: Jobs → Hadoop → Behavior]
- How many machines do I need to run my job in < X minutes?
- How long will my job take given N heterogeneous machines?
- How should I co-schedule jobs to avoid resource contention?
7. Map-Reduce Overview
8. Configuration and Jobs
- Up to 100 machines on EC2
- X-Trace to record interactions
- Hadoop example jobs:
  - Sort: sort a binary file and write to an output file
  - Pi: estimate pi using the Monte Carlo method
  - Word Count: count occurrences of words in a text file (a minimal sketch follows this list)
  - Random Writer: generate a binary file with random <key, value> pairs
  - Grep: search for a regex in a text file
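To make the job list concrete, here is a minimal sketch of the Word Count job written as a Hadoop Streaming mapper and reducer in Python. The talk used the standard Hadoop example jobs, not these scripts; the file names and the streaming form are illustrative assumptions.

#!/usr/bin/env python
# wordcount_mapper.py (hypothetical) -- emit (word, 1) for each word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))

#!/usr/bin/env python
# wordcount_reducer.py (hypothetical) -- sum counts per word. Hadoop
# Streaming delivers mapper output sorted by key, so equal words arrive
# adjacently and a single pass suffices.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

These would run via the Hadoop Streaming jar as the -mapper and -reducer commands; the framework handles the shuffle and sort between the two phases.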
9. Multiple Linear Regression
Regress job performance on the number of nodes, maps per node, total number of maps, and total amount of data.
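A minimal sketch of that regression using ordinary least squares in numpy; the sample rows and the choice of job time as the response variable are illustrative assumptions, not the talk's measurements.

import numpy as np

# Feature rows: (nodes, maps per node, total maps, total input bytes).
# Values are hypothetical placeholders, not data from these experiments.
X = np.array([
    [10, 2,  20, 1e9],
    [20, 2,  40, 2e9],
    [40, 4, 160, 4e9],
    [80, 4, 320, 8e9],
], dtype=float)
y = np.array([610.0, 640.0, 705.0, 890.0])  # job times in seconds (made up)

# Prepend an intercept column and solve beta = argmin ||A beta - y||_2.
A = np.hstack([np.ones((X.shape[0], 1)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(nodes, maps_per_node, total_maps, total_bytes):
    return beta @ np.array([1.0, nodes, maps_per_node, total_maps, total_bytes])

print(predict(60, 4, 240, 6e9))  # predicted time for an unseen configuration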
11. Time per Job
[Chart; a "Cold Start" region is annotated.]
12. Time per Job
[Chart]
13. Grep
[Charts: 60 nodes vs. 80 nodes]
The search parameter impacts the slope of the line!
14. Scaling Behavior
15. Reads and Writes
16. Bytes Transferred
17. Inter-node Communication
[Diagrams: communication pattern for pi vs. for sort]
Same number of bytes transferred, but different communication patterns.
18. Lessons
- Each job has a different behavioral signature
- Input parameters affect the scaling slopes of jobs
- Map performance is linear with respect to the number of nodes
- Reduce complexity depends on the job
- What are the best features to describe the uniqueness of each job?
  - Usual suspects: cluster size, amount of data, maps per node
  - Micro-benchmarks + regression to capture reduce complexity (sketched after this list)
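One hedged reading of "micro-benchmarks + regression to capture reduce complexity" is to fit reduce time against a few candidate growth terms and keep the one with the lowest residual; the candidate set and data below are assumptions.

import numpy as np

def fit_complexity(sizes, times):
    # Fit time ~ a + b*f(n) for each candidate growth term f and
    # return the name of the best-fitting term.
    candidates = {
        "n": lambda n: n,
        "n log n": lambda n: n * np.log(n),
        "n^2": lambda n: n ** 2,
    }
    best = None
    for name, f in candidates.items():
        A = np.column_stack([np.ones(len(sizes)), f(sizes)])
        coef, res, *_ = np.linalg.lstsq(A, times, rcond=None)
        rss = res[0] if res.size else float(np.sum((A @ coef - times) ** 2))
        if best is None or rss < best[1]:
            best = (name, rss)
    return best[0]

# Hypothetical micro-benchmark: reduce times at growing input sizes.
sizes = np.array([1e6, 2e6, 4e6, 8e6, 16e6])
times = np.array([12.0, 26.0, 57.0, 124.0, 270.0])  # made-up seconds
print(fit_complexity(sizes, times))  # -> "n log n" for this data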
19. Extrapolating Time (fixed nodes, variable input size)
1. Build a regression from the first few input sizes per job
2. Extrapolate the time for the max input size for the job
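A minimal sketch of those two steps, assuming job time grows linearly in input size at a fixed cluster size; the data points are made up.

import numpy as np

# Step 1: fit time = a + b * size from the first few input sizes
# observed for one job at a fixed number of nodes (hypothetical data).
small_sizes = np.array([1e9, 2e9, 4e9])       # input bytes
small_times = np.array([95.0, 160.0, 290.0])  # seconds

b, a = np.polyfit(small_sizes, small_times, deg=1)  # slope, intercept

# Step 2: extrapolate to the largest input size run for that job.
max_size = 40e9
print("predicted time at 40 GB: %.0f s" % (a + b * max_size))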
20. Extrapolating Time (for each input size, variable nodes)
1. Build a regression from runs with < 60 nodes
2. Extrapolate the time for 100 nodes
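One hedged way to run this extrapolation is to regress time on 1/nodes using the sub-60-node runs, which assumes a fixed per-job cost plus perfectly parallelizable (map-dominated) work; that model is an assumption, not the talk's.

import numpy as np

# Runs of one job at a fixed input size on clusters below 60 nodes.
nodes = np.array([10.0, 20.0, 40.0])
times = np.array([620.0, 330.0, 185.0])  # hypothetical seconds

# Model: time ~ fixed + work / nodes.
work, fixed = np.polyfit(1.0 / nodes, times, deg=1)

print("predicted time at 100 nodes: %.0f s" % (fixed + work / 100.0))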
21.
- Collect Ganglia statistics for jobs
  - CPU utilization, memory, disk I/Os, etc.
- Kernel functions (one is sketched below):
  - Similarity of micro-benchmark curves for pairs of jobs
  - Similarity of Ganglia statistics for jobs
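A minimal sketch of one such kernel function: Gaussian (RBF) similarity over per-job summaries of Ganglia statistics. The chosen statistics, values, and bandwidth are assumptions for illustration; in practice the features would be normalized first.

import numpy as np

def rbf_kernel(u, v, bandwidth):
    # Gaussian similarity: 1.0 for identical vectors, decaying toward
    # 0 as the jobs' resource-usage vectors diverge.
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * bandwidth ** 2)))

# Hypothetical per-job summaries: (mean CPU utilization,
# mean memory in GB, disk I/Os per second).
sort_job = [0.45, 3.1, 900.0]
grep_job = [0.45, 3.1, 905.0]
pi_job   = [0.95, 0.4,   5.0]

print(rbf_kernel(sort_job, grep_job, bandwidth=10.0))  # high: similar jobs
print(rbf_kernel(sort_job, pi_job, bandwidth=10.0))    # ~0: dissimilar jobs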
22. Future Work
- KCCA + nearest neighbors for performance prediction of unseen jobs (a sketch follows this list)
- Farthest strangers for workload management and scheduling?
  - Far apart in raw data space →? least similar resource requirements
  - Farthest strangers in projected space →? far strangers in raw data space
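As one hedged reading of the first bullet, the sketch below substitutes scikit-learn's linear CCA for kernel CCA: project job features and performance metrics into correlated subspaces, then predict an unseen job's performance by averaging its nearest neighbors in the projected feature space. All data is synthetic.

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Synthetic training jobs: features (nodes, maps per node, input GB)
# and performance metrics (time in s, shuffled GB). Made-up relationships.
X = rng.uniform([10, 1, 1], [100, 8, 50], size=(40, 3))
Y = np.column_stack([
    5.0 * X[:, 2] * X[:, 1] / X[:, 0] + rng.normal(0, 0.5, 40),  # time
    0.8 * X[:, 2] + rng.normal(0, 0.5, 40),                      # shuffle
])

cca = CCA(n_components=2).fit(X, Y)
X_proj = cca.transform(X)

def predict(x_new, k=3):
    # Project the unseen job's features, find its k nearest training
    # jobs in the projected space, and average their raw performance.
    p = cca.transform(x_new.reshape(1, -1))[0]
    nearest = np.argsort(np.linalg.norm(X_proj - p, axis=1))[:k]
    return Y[nearest].mean(axis=0)

print(predict(np.array([60.0, 4.0, 20.0])))  # (time, shuffled GB) estimate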
23. We Value Your Feedback
- What are other interesting job characteristics to capture?
- How do we account for non-homogeneity in nodes?
  - Normalize performance metrics with respect to node characteristics?
- What are real-world Map-Reduce examples (and scales)?
- Got Data?