Directing a Datacenter: Predicting Resource Utilization from Workload



1
Directing a Datacenter: Predicting Resource
Utilization from Workload
  • Peter Bodík
  • RAD Lab, UC Berkeley

2
Overview slide
3
Introduction
  • resource allocation: main task of the director
  • satisfy SLA requirements, but don't waste
    power/resources
  • minimize cost
  • need to model/predict resource utilization
  • depends on workload, changes with the app
  • modeling resource utilization
  • input: workload
  • output: CPU utilization, disk, network, memory
    activity
  • this talk: 3 models
  • linear regression
  • quantile regression
  • sampling method

4
The datacenter cost function
[Diagram: cost function pipeline]
  • input: workload, proposed physical config
  • intermediate: per-tier resource utilization, server resource
    utilization, performance, availability, server power consumption,
    server temperature, HVAC power consumption
  • output: cost
  • diagram label: Jeff Chen
5
Experimental setup
  • RUBiS: PHP application, like eBay
  • 2 tiers: Apache/PHP and MySQL, running on VMware
    ESX
  • just one web server
  • 27 types of requests, but only using 10
  • 3 HTML files, 7 PHP (dynamic) requests
  • no writes, all reads cached in memory
  • measure workload and metrics every 20 seconds
  • workload model:
  • vector of request rates for 10 classes of
    requests
  • workload generation:
  • not using the RUBiS workload generator
  • use exponential interarrival times
  • we use interarrivals from World Cup 98 and
    Ebates.com traces
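
The workload model and the exponential-interarrival generator described on this slide can be sketched as follows. This is only an illustration, not the generator used in the experiments: the function names, the request class names, and the rates are hypothetical, and each request class is assumed to be an independent Poisson process. The 20-second interval matches the measurement period stated above.

```python
import random


def generate_arrivals(rates, duration, seed=0):
    """Return a time-sorted list of (timestamp, request_class) pairs.

    rates maps a request class to its mean request rate (requests/second).
    Each class gets exponentially distributed interarrival times.
    """
    rng = random.Random(seed)
    arrivals = []
    for cls, rate in rates.items():
        t = rng.expovariate(rate)  # time of the first request of this class
        while t < duration:
            arrivals.append((t, cls))
            t += rng.expovariate(rate)  # exponential gap to the next request
    arrivals.sort()
    return arrivals


def workload_vectors(arrivals, classes, duration, interval=20.0):
    """Turn arrivals into one request-rate vector per measurement interval."""
    n_bins = int(duration // interval)
    index = {cls: i for i, cls in enumerate(classes)}
    counts = [[0] * len(classes) for _ in range(n_bins)]
    for t, cls in arrivals:
        b = int(t // interval)
        if b < n_bins:
            counts[b][index[cls]] += 1
    # Convert counts per interval into requests per second.
    return [[c / interval for c in row] for row in counts]


# Hypothetical example: two request classes observed for 200 seconds.
example_rates = {"home.html": 5.0, "search.php": 2.0}
example_arrivals = generate_arrivals(example_rates, duration=200.0)
example_vectors = workload_vectors(example_arrivals, list(example_rates), 200.0)
```

Each row of `example_vectors` is one workload observation: the request rate of every class during one 20-second interval.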

6
Model 1: linear regression
  • assumption: utilization is a linear function of
    workload
  • 1 request -> 1M CPU cycles, 10 requests -> 10M
    cycles
  • same for disk, network, memory
  • reasonable when requests are independent
  • web server, Ruby on Rails, MySQL
  • linear regression
  • train and evaluate on same data: 2-4% error (CPU
    and net)
  • extrapolation: two experiments
  • increase workload magnitude
  • different workload
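
A minimal sketch of Model 1, assuming an ordinary least-squares fit of utilization against the vector of request rates. The intercept term (idle utilization), the function names, and the relative-error metric are assumptions made for illustration; the slides do not say how the fit or the error was computed.

```python
def fit_linear(workloads, utilization):
    """Least-squares fit of utilization ~ b0 + sum_j b_j * rate_j.

    workloads: list of request-rate vectors; utilization: list of measurements.
    Solves the normal equations (X^T X) b = X^T y by Gaussian elimination.
    """
    X = [[1.0] + list(w) for w in workloads]  # leading 1.0 is the intercept
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * y for row, y in zip(X, utilization)) for i in range(p)]
    for col in range(p):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        s = sum(A[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (b[i] - s) / A[i][i]
    return coef


def predict(coef, workload):
    """Predicted utilization for one request-rate vector."""
    return coef[0] + sum(c * r for c, r in zip(coef[1:], workload))


def mean_relative_error(coef, workloads, utilization):
    """Average of |prediction - actual| / actual over a data set."""
    errs = [abs(predict(coef, w) - y) / y for w, y in zip(workloads, utilization)]
    return sum(errs) / len(errs)
```

The two extrapolation experiments then amount to calling `mean_relative_error` on data whose request rates are larger than, or mixed differently from, the training data.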

7
Increasing workload magnitude
  • training evaluation
  • used 10 different workloads
  • web CPU: 3-10% error
  • larger increase in magnitude -> larger error
  • web network, DB net/CPU: 3-4% error

[Figure: request rates]
8
Changing workload
  • train on one workload, evaluate on a different
    workload
  • training evaluation
  • results: 6-8% error
  • web server and DB, net and CPU

9
What else do we need?
  • what's the variance of the resource utilization?
  • mean not really useful
  • what's the error of the predictions?
  • can compute, but only by assuming a Normal distribution
  • how much data do we need?
  • what's the effect of workload variation?

10
Model 2: quantile regression
  • estimate 95th percentile, not mean
  • formulate as linear program
  • assumption: P( cpu | w ) same for different
    workloads
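
For reference, the standard linear-program form of quantile regression is given below; the slide does not show which formulation was used, so this is the textbook version. The symbols are assumptions: x_i is the i-th workload vector, y_i the measured utilization, beta the coefficients, tau the target quantile (0.95 here), and u_i, v_i the positive and negative parts of the i-th residual.

```latex
\begin{aligned}
\min_{\beta,\,u,\,v}\quad & \sum_{i=1}^{n} \bigl(\tau\,u_i + (1-\tau)\,v_i\bigr) \\
\text{subject to}\quad & y_i - x_i^{\top}\beta = u_i - v_i, \qquad i = 1,\dots,n, \\
& u_i \ge 0,\quad v_i \ge 0, \qquad i = 1,\dots,n.
\end{aligned}
```

With tau = 0.95, under-predictions (u_i > 0) cost 19 times more than over-predictions, which pushes the fitted line up to the 95th percentile.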

11
Estimating error -- bootstrap
  • resample the data, fit QR, repeat 100x
  • estimate error from bootstrap samples
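
The bootstrap loop on this slide can be sketched as below. To stay self-contained, the quantile-regression fit is a deliberately simplified stand-in: a single request rate and a line through the origin, cpu = beta * rate, for which the quantile-loss minimizer is a weighted quantile of the ratios cpu/rate. The general multi-class fit would use a linear-program solver instead. All function names and the synthetic data are hypothetical.

```python
import random
import statistics


def fit_quantile_slope(points, tau=0.95):
    """Quantile regression for cpu = beta * rate with a single rate > 0.

    points: list of (rate, cpu) pairs. In this special case the minimizer of
    the quantile loss is the weighted tau-quantile of cpu/rate, weighted by rate.
    """
    ratios = sorted((cpu / rate, rate) for rate, cpu in points)
    total = sum(weight for _, weight in ratios)
    cumulative = 0.0
    for ratio, weight in ratios:
        cumulative += weight
        if cumulative >= tau * total:
            return ratio
    return ratios[-1][0]  # guard against floating-point round-off


def bootstrap_predictions(points, rate, n_boot=100, tau=0.95, seed=0):
    """Resample the data, refit, and predict at `rate`; repeat n_boot times."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_boot):
        resample = [rng.choice(points) for _ in points]  # with replacement
        predictions.append(fit_quantile_slope(resample, tau) * rate)
    return predictions


# Synthetic data (an assumption): cpu is about 2 * rate with +/-10% noise.
data_rng = random.Random(42)
points = []
for _ in range(200):
    r = data_rng.uniform(50.0, 150.0)
    points.append((r, 2.0 * r * (1.0 + data_rng.uniform(-0.1, 0.1))))

preds = bootstrap_predictions(points, rate=100.0)
estimate = statistics.mean(preds)  # predicted 95th percentile at rate 100
error = statistics.stdev(preds)    # bootstrap estimate of its error
```

The spread of `preds` is the error estimate; rerunning with fewer points (for example 40) gives a wider spread.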

12
Estimating error (2)
13
Just 40 points -- larger error
14
How much data do we need?
  • use 95th perc. + 2 stdev as prediction
  • model gets more accurate with more data

15
Model 3: sampling
  • previous models estimated CPU at a fixed
    workload
  • however, workload fluctuates
  • need to estimate:
  • workload distribution
  • P( cpu | w )
  • generate samples of CPU:
  • sample workload w
  • sample cpu from P( cpu | w )
  • repeat
  • compute sample 95th percentile
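
The sampling steps above can be sketched as follows. The slides do not say how the workload distribution or P( cpu | w ) were estimated, so this sketch makes two assumptions: the workload distribution is the empirical distribution of observed request rates, and P( cpu | w ) is a linear mean plus a residual drawn from the observed residuals. The function name and its `slope` and `residuals` inputs are hypothetical.

```python
import random


def sample_cpu_percentile(observed_rates, slope, residuals,
                          n_samples=10000, q=0.95, seed=0):
    """Estimate the q-th percentile of CPU utilization under workload fluctuation."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        w = rng.choice(observed_rates)           # sample workload w
        cpu = slope * w + rng.choice(residuals)  # sample cpu from P(cpu | w)
        samples.append(cpu)
    samples.sort()
    # Sample percentile: the value at position q in the sorted samples.
    return samples[min(len(samples) - 1, int(q * len(samples)))]
```

For example, `sample_cpu_percentile([80.0, 100.0, 120.0], slope=2.0, residuals=[-5.0, 0.0, 5.0])` returns an estimate that reflects both the spread of the workload and the spread of CPU at a given workload.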

[Figure: CPU utilization vs. request rate]
16
Estimating workload distribution
17
Estimating P( CPU | w )
18
Sampling workload and CPU
19
Results
  • actual 95th percentile: 347.5 ± 4.2 (stdev)
  • quantile regression: 339.8 ± 2.6
  • workload and CPU sampling: 345.3 ± 3.9
  • quantile regression already pretty close
  • but sampling can accommodate any workload
    fluctuation

20
Comparison of algorithms
  • linear regression
  • assumption: error from Normal distribution
  • ignores workload fluctuation
  • running time: fast
  • quantile regression
  • assumption: arbitrary error distribution, same for
    different workloads
  • ignores workload fluctuation
  • running time: fast, but needs bootstrap to
    estimate error
  • workload and CPU sampling
  • assumption: arbitrary error distribution, same for
    different workloads
  • running time: slower, sampling plus bootstrap to
    estimate error

21
Future work
  • model lifecycle
  • when to retrain the model?
  • workload characterization
  • real web app: 100s-1000s of request classes
  • what are the important request classes?
  • modeling response time, non-linear resource
    utilization

22
Summary
  • use algorithms that make few assumptions
  • linear regression: simple to analyze, but not
    useful in practice
  • similar to M/M/1 queue
  • quantile regression, sampling
  • harder to analyze/estimate
  • but work for broader class of web apps and
    workloads
  • estimate the prediction error of the model
  • model still useful even with large error