Title: Characterizing Workloads and Provisioning for Scale Up
1. Characterizing Workloads and Provisioning for Scale Up
- Archana Ganapathi, Matei Zaharia, Armando Fox, Dave Patterson. RAD Lab, June 2008
2. RAD Lab Overview
[Architecture diagram: a compiler turns a high-level spec into a low-level spec; a Director reacts to new apps, equipment, and global policies (e.g., SLAs); an Instrumentation Backplane collects offered load, resource utilization, etc.; Log Mining turns those logs into training data for performance and cost models.]
3. Goals
[Diagram: Workload → SYSTEM → Behavior]
- Given a system (e.g., a DBMS or EC2), its workload, and performance metrics, answer what-if questions for:
  - Changes in workload (mix/rate/distribution)
  - Changes in system configuration (hardware/software)
  - Changes in both
4. Recap: Last Retreat
Predict multiple performance metrics simultaneously for multiple new workloads, based on previously observed workload performance.
[Diagram: Queries → Parallel DBMS → Behavior]
5. Recap: Last Retreat
- Your feedback:
  - Other systems will be more predictable than the DBMS query optimizer
  - Try Map-Reduce systems as the next test case
6. Motivating Scenario
[Diagram: Jobs → Hadoop → Behavior]
- How many machines do I need to run my job in < X minutes?
- How long will my job take given N heterogeneous machines?
- How should I co-schedule jobs to avoid resource contention?
7. Map-Reduce Overview
8. Configuration and Jobs
- Up to 100 machines on EC2
- X-Trace to record interactions
- Hadoop example jobs:
  - Sort: sort a binary file and write to an output file
  - Pi: estimate pi using the Monte Carlo method
  - Word Count: count occurrences of words in a text file (a minimal sketch follows this list)
  - Random Writer: generate a binary file with random <key, value> pairs
  - Grep: search for a regex in a text file
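To make the job list concrete, here is a minimal sketch of the Word Count job written as a Hadoop Streaming mapper and reducer in Python. The talk used the standard Hadoop example jobs, not these scripts; the file names and the streaming form are illustrative assumptions.

#!/usr/bin/env python
# wordcount_mapper.py (hypothetical) -- emit (word, 1) for each word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))

#!/usr/bin/env python
# wordcount_reducer.py (hypothetical) -- sum counts per word. Hadoop
# Streaming delivers mapper output sorted by key, so equal words arrive
# adjacently and a single pass suffices.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

These would run via the Hadoop Streaming jar as the -mapper and -reducer commands; the framework handles the shuffle and sort between the two phases.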
9. Multiple Linear Regression
Regress job performance on the number of nodes, maps per node, total number of maps, and total amount of data.
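A minimal sketch of that regression using ordinary least squares in numpy; the sample rows and the choice of job time as the response variable are illustrative assumptions, not the talk's measurements.

import numpy as np

# Feature rows: (nodes, maps per node, total maps, total input bytes).
# Values are hypothetical placeholders, not data from these experiments.
X = np.array([
    [10, 2,  20, 1e9],
    [20, 2,  40, 2e9],
    [40, 4, 160, 4e9],
    [80, 4, 320, 8e9],
], dtype=float)
y = np.array([610.0, 640.0, 705.0, 890.0])  # job times in seconds (made up)

# Prepend an intercept column and solve beta = argmin ||A beta - y||_2.
A = np.hstack([np.ones((X.shape[0], 1)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(nodes, maps_per_node, total_maps, total_bytes):
    return beta @ np.array([1.0, nodes, maps_per_node, total_maps, total_bytes])

print(predict(60, 4, 240, 6e9))  # predicted time for an unseen configuration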
11. Time per Job
[Chart; a "Cold Start" region is annotated.]
12. Time per Job
[Chart]
13. Grep
[Charts: 60 nodes vs. 80 nodes]
The search parameter impacts the slope of the line!
14. Scaling Behavior
15. Reads and Writes
16. Bytes Transferred
17. Inter-node Communication
[Diagrams: communication pattern for pi vs. for sort]
Same number of bytes transferred, but different communication patterns.
18. Lessons
- Each job has a different behavioral signature
- Input parameters affect the scaling slopes of jobs
- Map performance is linear with respect to the number of nodes
- Reduce complexity depends on the job
- What are the best features to describe the uniqueness of each job?
  - Usual suspects: cluster size, amount of data, maps per node
  - Micro-benchmarks + regression to capture reduce complexity (sketched after this list)
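One hedged reading of "micro-benchmarks + regression to capture reduce complexity" is to fit reduce time against a few candidate growth terms and keep the one with the lowest residual; the candidate set and data below are assumptions.

import numpy as np

def fit_complexity(sizes, times):
    # Fit time ~ a + b*f(n) for each candidate growth term f and
    # return the name of the best-fitting term.
    candidates = {
        "n": lambda n: n,
        "n log n": lambda n: n * np.log(n),
        "n^2": lambda n: n ** 2,
    }
    best = None
    for name, f in candidates.items():
        A = np.column_stack([np.ones(len(sizes)), f(sizes)])
        coef, res, *_ = np.linalg.lstsq(A, times, rcond=None)
        rss = res[0] if res.size else float(np.sum((A @ coef - times) ** 2))
        if best is None or rss < best[1]:
            best = (name, rss)
    return best[0]

# Hypothetical micro-benchmark: reduce times at growing input sizes.
sizes = np.array([1e6, 2e6, 4e6, 8e6, 16e6])
times = np.array([12.0, 26.0, 57.0, 124.0, 270.0])  # made-up seconds
print(fit_complexity(sizes, times))  # -> "n log n" for this data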
19. Extrapolating Time (fixed nodes, variable input size)
1. Build a regression from the first few input sizes per job
2. Extrapolate the time for the max input size for the job
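A minimal sketch of those two steps, assuming job time grows linearly in input size at a fixed cluster size; the data points are made up.

import numpy as np

# Step 1: fit time = a + b * size from the first few input sizes
# observed for one job at a fixed number of nodes (hypothetical data).
small_sizes = np.array([1e9, 2e9, 4e9])       # input bytes
small_times = np.array([95.0, 160.0, 290.0])  # seconds

b, a = np.polyfit(small_sizes, small_times, deg=1)  # slope, intercept

# Step 2: extrapolate to the largest input size run for that job.
max_size = 40e9
print("predicted time at 40 GB: %.0f s" % (a + b * max_size))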
20. Extrapolating Time (for each input size, variable nodes)
1. Build a regression from runs with < 60 nodes
2. Extrapolate the time for 100 nodes
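One hedged way to run this extrapolation is to regress time on 1/nodes using the sub-60-node runs, which assumes a fixed per-job cost plus perfectly parallelizable (map-dominated) work; that model is an assumption, not the talk's.

import numpy as np

# Runs of one job at a fixed input size on clusters below 60 nodes.
nodes = np.array([10.0, 20.0, 40.0])
times = np.array([620.0, 330.0, 185.0])  # hypothetical seconds

# Model: time ~ fixed + work / nodes.
work, fixed = np.polyfit(1.0 / nodes, times, deg=1)

print("predicted time at 100 nodes: %.0f s" % (fixed + work / 100.0))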
21.
- Collect Ganglia statistics for jobs
  - CPU utilization, memory, disk I/Os, etc.
- Kernel functions (one is sketched below):
  - Similarity of micro-benchmark curves for pairs of jobs
  - Similarity of Ganglia statistics for jobs
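A minimal sketch of one such kernel function: Gaussian (RBF) similarity over per-job summaries of Ganglia statistics. The chosen statistics, values, and bandwidth are assumptions for illustration; in practice the features would be normalized first.

import numpy as np

def rbf_kernel(u, v, bandwidth):
    # Gaussian similarity: 1.0 for identical vectors, decaying toward
    # 0 as the jobs' resource-usage vectors diverge.
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * bandwidth ** 2)))

# Hypothetical per-job summaries: (mean CPU utilization,
# mean memory in GB, disk I/Os per second).
sort_job = [0.45, 3.1, 900.0]
grep_job = [0.45, 3.1, 905.0]
pi_job   = [0.95, 0.4,   5.0]

print(rbf_kernel(sort_job, grep_job, bandwidth=10.0))  # high: similar jobs
print(rbf_kernel(sort_job, pi_job, bandwidth=10.0))    # ~0: dissimilar jobs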
22. Future Work
- KCCA + nearest neighbors for performance prediction of unseen jobs (a sketch follows this list)
- Farthest strangers for workload management and scheduling?
  - Far apart in raw data space →? least similar resource requirements
  - Farthest strangers in projected space →? far strangers in raw data space
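As one hedged reading of the first bullet, the sketch below substitutes scikit-learn's linear CCA for kernel CCA: project job features and performance metrics into correlated subspaces, then predict an unseen job's performance by averaging its nearest neighbors in the projected feature space. All data is synthetic.

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Synthetic training jobs: features (nodes, maps per node, input GB)
# and performance metrics (time in s, shuffled GB). Made-up relationships.
X = rng.uniform([10, 1, 1], [100, 8, 50], size=(40, 3))
Y = np.column_stack([
    5.0 * X[:, 2] * X[:, 1] / X[:, 0] + rng.normal(0, 0.5, 40),  # time
    0.8 * X[:, 2] + rng.normal(0, 0.5, 40),                      # shuffle
])

cca = CCA(n_components=2).fit(X, Y)
X_proj = cca.transform(X)

def predict(x_new, k=3):
    # Project the unseen job's features, find its k nearest training
    # jobs in the projected space, and average their raw performance.
    p = cca.transform(x_new.reshape(1, -1))[0]
    nearest = np.argsort(np.linalg.norm(X_proj - p, axis=1))[:k]
    return Y[nearest].mean(axis=0)

print(predict(np.array([60.0, 4.0, 20.0])))  # (time, shuffled GB) estimate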
23. We Value Your Feedback
- What are other interesting job characteristics to capture?
- How do we account for non-homogeneity in nodes?
  - Normalize performance metrics with respect to node characteristics?
- What are real-world Map-Reduce examples (and scales)?
- Got Data?