Characterizing Workloads and Provisioning for Scale Up - PowerPoint PPT Presentation


1
Characterizing Workloads and Provisioning for Scale Up
  • Archana Ganapathi, Matei Zaharia, Armando Fox,
    Dave Patterson, RAD Lab, June 2008

2
RAD Lab Overview
[Figure: RAD Lab architecture diagram. Components: High-level spec, Compiler, Low-level spec, Director, Instrumentation Backplane, Log Mining, Training data, Performance/cost models. Labeled flows: "New apps, equipment, global policies (e.g., SLA)" and "Offered load, resource utilization, etc."]
3
Goals
[Diagram: Workload, System, Behavior]
  • Given a system (e.g., DBMS, EC2), its workload,
    and performance metrics
  • Answer what-if questions for
      • Changes in workload (mix/rate/distribution)
      • Changes in system configuration (hw/sw)
      • Changes in both

4
Recap: Last Retreat
Predict multiple performance metrics
simultaneously for multiple new workloads based
on previously observed workload performance
[Diagram: Queries, Parallel DBMS, Behavior]
5
Recap: Last Retreat
  • Your feedback
      • Other systems will be more predictable than the
        DBMS query optimizer
      • Try Map-Reduce systems as the next test case

6
Motivating Scenario
[Diagram: Jobs, Hadoop, Behavior]
  • How many machines do I need to run my job in under
    X minutes?
  • How long will my job take given N heterogeneous
    machines?
  • How should I co-schedule jobs to avoid resource
    contention?

7
Map-Reduce overview
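The overview figure itself did not survive in the transcript. As a rough illustration of the Map-Reduce data flow (map, shuffle/group by key, reduce), here is a minimal single-process Python sketch of the Word Count job. It only simulates the three phases in memory; it is not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every whitespace-separated token of one input record.
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all counts emitted for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
```

In a real cluster the map calls run in parallel on different input splits and the shuffle moves data between nodes, which is where the network traffic discussed in later slides comes from.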
8
Configuration and Jobs
  • Up to 100 machines on EC2
  • X-Trace to record interactions
  • Hadoop example jobs
      • Sort: sort a binary file and write it to an output file
      • Pi: estimate pi using a Monte Carlo method
      • Word Count: count occurrences of words in a text
        file
      • Random Writer: generate a binary file with random
        <key, value> pairs
      • Grep: search for a regex in a text file
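The Pi job is the easiest of these to reason about. The following is a minimal sketch, assuming plain pseudo-random Monte Carlo sampling split across hypothetical map tasks and summed by one reduce step; Hadoop's bundled example may use a different sampling scheme, and none of these names come from Hadoop.

```python
import math
import random
from typing import List, Tuple


def map_task(task_id: int, samples: int, seed: int = 42) -> Tuple[int, int]:
    # One hypothetical map task: draw points uniformly in the unit square and
    # count how many land inside the quarter circle x^2 + y^2 <= 1.
    rng = random.Random(seed + task_id)  # separate, reproducible stream per task
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_counts(partials: List[Tuple[int, int]]) -> float:
    # Reduce: sum per-task counts; the inside fraction approximates pi / 4.
    inside = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return 4.0 * inside / total


def estimate_pi(num_maps: int, samples_per_map: int) -> float:
    partials = [map_task(i, samples_per_map) for i in range(num_maps)]
    return reduce_counts(partials)


if __name__ == "__main__":
    est = estimate_pi(num_maps=10, samples_per_map=20_000)
    print(f"estimate={est:.4f} error={abs(est - math.pi):.4f}")
```

Each map task emits only two integers, so in this sketch almost nothing needs to be shuffled to the reducer, regardless of how many samples are drawn.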

9
Multiple Linear Regression
Regress job time on number of nodes, maps per node, total
number of maps, and total amount of data
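A minimal ordinary-least-squares sketch of this regression, using NumPy's `lstsq`. It assumes one row per observed run, with an intercept plus the four features named on the slide; the column order, the units (e.g., GB of data), and the function names are hypothetical.

```python
import numpy as np


def design_matrix(nodes, maps_per_node, total_maps, data_gb):
    # One row per run: [1, nodes, maps/node, total maps, total data].
    # The leading column of ones lets the fit learn an intercept.
    cols = [np.asarray(c, dtype=float) for c in (nodes, maps_per_node, total_maps, data_gb)]
    return np.column_stack([np.ones_like(cols[0])] + cols)


def fit_mlr(X, y):
    # Ordinary least squares: coefficients b minimizing ||X b - y||^2.
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict(coef, X):
    # Predicted job time for each row of X.
    return X @ coef
```

Usage: build `X` from past runs, call `fit_mlr(X, observed_times)`, then `predict(coef, design_matrix(...))` for a configuration not yet run. Features that move together (for example, total maps and total data) can make individual coefficients unstable even when predictions are fine.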
10
(No Transcript)
11
Time per Job
Cold Start
12
Time per Job
13
Grep
[Figure panels: 60 nodes, 80 nodes]
Search parameter impacts slope of line!
14
Scaling Behavior
15
Reads and Writes
16
Bytes Transferred
17
Inter-node communication
[Figure panels: e.g., Pi; e.g., Sort]
Same number of bytes transferred, different
communication patterns
18
Lessons
  • Each job has a different behavioral signature
  • Input parameters affect scaling slopes of jobs
  • Map performance is linear with respect to the number of nodes
  • Reduce complexity depends on the job
  • What are the best features to describe the
    uniqueness of each job?
      • Usual suspects: cluster size, amount of data,
        maps per node
      • Micro-benchmarks: regression to capture reduce
        complexity

19
Extrapolating Time (fixed nodes, variable input size)
1. Build regression from first few input sizes per job
2. Extrapolate time for max input size for job
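The two steps above can be sketched for a single job as follows. This assumes a straight-line model of time versus input size fitted with `numpy.polyfit`; the slide does not state the model form, and the function names and the `num_train` cutoff are hypothetical.

```python
import numpy as np


def extrapolate_time(input_sizes, times, num_train, target_size):
    """Fit time ~ a + b * input_size on the `num_train` smallest input sizes
    of one job, then predict the time at `target_size`."""
    sizes = np.asarray(input_sizes, dtype=float)
    t = np.asarray(times, dtype=float)
    order = np.argsort(sizes)          # "first few" = smallest input sizes
    train = order[:num_train]
    if len(np.unique(sizes[train])) < 2:
        raise ValueError("need at least two distinct input sizes to fit a line")
    slope, intercept = np.polyfit(sizes[train], t[train], deg=1)
    return slope * target_size + intercept


def relative_error(predicted, observed):
    # How far the extrapolation is from the measured time at the max size.
    return abs(predicted - observed) / observed
```

Comparing `extrapolate_time(...)` against the measured time at the largest input size with `relative_error` gives one number per job for how well the small-input runs predict the big one.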
20
Extrapolating Time (for each input size, variable nodes)
1. Build regression from runs on < 60 nodes
2. Extrapolate time for 100 nodes
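A minimal sketch of this per-input-size procedure, assuming runs are recorded as hypothetical `(input_size, nodes, time)` tuples and that time is modeled as a straight line in the node count. The slides do not state the model form; swapping in a feature such as `1 / nodes` would be a one-line change.

```python
from collections import defaultdict

import numpy as np


def extrapolate_by_input_size(runs, max_train_nodes=60, target_nodes=100):
    """runs: iterable of (input_size, nodes, time) tuples.
    For each input size, fit time ~ a + b * nodes on runs with fewer than
    `max_train_nodes` nodes and predict the time at `target_nodes`."""
    by_size = defaultdict(list)
    for size, nodes, time in runs:
        if nodes < max_train_nodes:      # training set: strictly fewer than 60 nodes
            by_size[size].append((nodes, time))

    predictions = {}
    for size, points in by_size.items():
        n, t = np.array(points, dtype=float).T
        if len(np.unique(n)) < 2:        # a line needs two distinct node counts
            continue
        slope, intercept = np.polyfit(n, t, deg=1)
        predictions[size] = slope * target_nodes + intercept
    return predictions
```

Input sizes with too few training cluster sizes are skipped rather than extrapolated from a single point.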
21
  • Collect Ganglia statistics for jobs
      • CPU utilization, memory, disk I/O, etc.
  • Kernel functions
      • Similarity of micro-benchmark curves for pairs
        of jobs
      • Similarity of Ganglia statistics for jobs
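The slide does not say which kernel is used. As one common choice, here is a sketch of a Gaussian (RBF) kernel matrix over per-job statistic vectors, assuming one row per job and one column per Ganglia statistic (a hypothetical layout), with columns z-scored first so that differently scaled metrics contribute comparably. The width `sigma` is an assumed tuning parameter.

```python
import numpy as np


def standardize(X):
    # Z-score each statistic so that, e.g., bytes and CPU % are on comparable scales.
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                   # constant columns end up all zeros
    return (X - X.mean(axis=0)) / std


def gaussian_kernel_matrix(X, sigma=1.0):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)): 1 for identical jobs,
    # approaching 0 for very dissimilar ones.
    X = np.asarray(X, dtype=float)
    sq = np.sum(X * X, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # clip round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

The same function applies to micro-benchmark curves if each curve is sampled at a fixed set of points and stored as a row.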

22
Future Work
  • KCCA and nearest neighbors for performance
    prediction of unseen jobs
  • Farthest strangers for workload management and
    scheduling?
      • Far in raw data space ⇒ least similar resource
        requirements?
      • Farthest strangers in projected space ⇒ far
        strangers in raw data space?
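A sketch of the two distance queries above, assuming job coordinates have already been projected (for example by KCCA, which is not implemented here) and that Euclidean distance and an unweighted mean over neighbors are acceptable. Function names are hypothetical.

```python
import numpy as np


def knn_predict(train_coords, train_metrics, query, k=3):
    """Predict performance metrics for an unseen job as the mean metrics of
    its k nearest training jobs, measured in the projected space."""
    train_coords = np.asarray(train_coords, dtype=float)
    train_metrics = np.asarray(train_metrics, dtype=float)
    dists = np.linalg.norm(train_coords - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    return train_metrics[nearest].mean(axis=0)


def farthest_stranger(coords, i):
    """Index of the job farthest from job i: a hypothetical candidate for
    co-scheduling if distance tracks dissimilar resource needs."""
    coords = np.asarray(coords, dtype=float)
    dists = np.linalg.norm(coords - coords[i], axis=1)
    return int(np.argmax(dists))
```

Whether the farthest stranger in the projected space is also far in the raw data space is exactly the open question on the slide; comparing `farthest_stranger` on both coordinate sets is one way to check it.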

23
We Value Your Feedback
  • What are other interesting job characteristics to
    capture?
  • How to account for non-homogeneity in nodes?
      • Normalize performance metrics with respect to
        node characteristics?
  • What are real world Map-Reduce examples (and
    scale)?
  • Got Data?