TEQUILA: T____ E____ QUeuing I___ L____ A_____

1
TEQUILA: T____ E____ QUeuing I___ L____ A_____
TEQUILA: Toward Evidence-based QUeueing Inference for Latency Analysis
Charles Sutton, RAD Lab, Jan 7, 2009

2
Transient Diagnostic Questions
  • Performance models answer "What if?" questions.
  • This talk: they can also answer "What happened?"
  • Ex: What fraction of response time is caused by load?
  • Ex: What caused a temporary performance degradation 1 hr ago?
  • Ex: What are the bottlenecks of the 1% slowest requests?
3
Measurement
  • Many questions can be answered by measurement.
  • Measurements include
  • Response time
  • Workload

4
Types of Measurement
5
Talk Road Map
  • Interpret queueing network as a probabilistic
    model.
  • Observe subset of arrivals, departures from
    running system.
  • Reconstruct unobserved measurements
  • Compute posterior distribution
  • Answer diagnostic questions from reconstructed
    data

6
Generative Probabilistic Models
[Figure: HIDDEN variables generate the OBSERVATIONS]
  • Define a probability distribution:
    $p(\text{Burglary}, \text{Earthquake}, \text{Alarm}, \text{JohnCalls}, \text{MaryCalls})$
  • Compute the posterior distribution:
    $p(\text{Plague} \mid \text{Buboes}, \text{No Runny Nose})$

7
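Graphical Models
  • Idea: represent how hidden variables generate the input.

$$p(\text{Burglary}, \text{EQ}, \text{Alarm}, \text{JohnCalls}, \text{MaryCalls}) = p(\text{Burglary})\, p(\text{EQ})\, p(\text{Alarm} \mid \text{Burglary}, \text{EQ})\, p(\text{JohnCalls} \mid \text{Alarm})\, p(\text{MaryCalls} \mid \text{Alarm})$$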
8
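Inference
  • Problem: compute marginal probabilities given evidence.

$$p(\text{Burglary} \mid \text{John}, \neg\text{Mary}) \propto \sum_{\text{EQ},\,\text{Alarm}} p(\text{Burglary}, \text{EQ}, \text{Alarm}, \text{John}, \neg\text{Mary})$$
This is a posterior distribution.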
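The sum over EQ and Alarm can be carried out by brute-force enumeration. This is a minimal sketch: the conditional probability tables are assumed, textbook-style numbers that do not appear in the talk, and the helper names (`joint`, `posterior_burglary`) are hypothetical. `joint` multiplies the five factors from the graphical model, and `posterior_burglary` sums out the hidden variables and normalizes over Burglary.

```python
from itertools import product

# Assumed, textbook-style conditional probability tables (not from the talk).
P_BURGLARY = 0.001
P_EQ = 0.002
P_ALARM = {  # p(Alarm = True | Burglary, EQ)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_JOHN = {True: 0.90, False: 0.05}  # p(JohnCalls = True | Alarm)
P_MARY = {True: 0.70, False: 0.01}  # p(MaryCalls = True | Alarm)


def bern(p_true, value):
    """Probability that a binary variable with p(True) = p_true equals `value`."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Factorized joint p(B) p(EQ) p(A | B, EQ) p(J | A) p(M | A)."""
    return (bern(P_BURGLARY, b) * bern(P_EQ, e) * bern(P_ALARM[(b, e)], a)
            * bern(P_JOHN[a], j) * bern(P_MARY[a], m))


def posterior_burglary(john, mary):
    """p(Burglary = True | evidence): sum out EQ and Alarm, then normalize over Burglary."""
    unnormalized = {
        b: sum(joint(b, e, a, john, mary) for e, a in product([True, False], repeat=2))
        for b in (True, False)
    }
    return unnormalized[True] / sum(unnormalized.values())


print(f"p(Burglary | John, not Mary) = {posterior_burglary(john=True, mary=False):.4f}")
```

With these assumed numbers the posterior comes out near 0.005: John's call raises the probability of a burglary roughly fivefold over the 0.001 prior, while Mary's silence keeps it low. Enumeration is exponential in the number of hidden variables, so it only illustrates what the posterior means, not how one would compute it for a large model.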
9
Queueing Networks
  • Model each component as a queue

[Figure: a processor modeled as an M/M/1 queue]
  • For each task k:
  • Arrival time
  • Departure time
  • Service time
  • Waiting time
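The four per-task quantities are tied together by the FIFO rule: a task starts service once it has arrived and the previous task has departed, so departure = max(arrival, previous departure) + service, and waiting = start - arrival. A minimal simulation sketch of that recursion for a single-processor M/M/1 queue follows; the rates and the function name `simulate_mm1` are illustrative assumptions.

```python
import random


def simulate_mm1(n_tasks, arrival_rate, service_rate, seed=0):
    """Simulate one single-processor FIFO queue with Poisson arrivals and
    exponential service (M/M/1). Returns, per task k, its arrival, departure,
    service, and waiting times."""
    rng = random.Random(seed)
    tasks, arrival, prev_departure = [], 0.0, 0.0
    for _ in range(n_tasks):
        arrival += rng.expovariate(arrival_rate)   # exponential interarrival gap
        service = rng.expovariate(service_rate)    # exponential service time
        start = max(arrival, prev_departure)       # FIFO: wait for the previous task to leave
        departure = start + service
        tasks.append({"arrival": arrival, "departure": departure,
                      "service": service, "waiting": start - arrival})
        prev_departure = departure
    return tasks


tasks = simulate_mm1(10_000, arrival_rate=0.8, service_rate=1.0)
mean_response = sum(t["departure"] - t["arrival"] for t in tasks) / len(tasks)
print(f"mean response time: {mean_response:.2f}")
```

For a stable M/M/1 queue (arrival rate below service rate) the long-run mean response time is 1 / (service_rate - arrival_rate), which is 5 for these parameters; a finite run only approximates it.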

10
Queueing Networks
  • Model distributed system as network of queues
  • Example: two-tier web application

[Figure: web servers and a DB, connected by a network, each modeled as a queue]
11
Queueing Networks
A finite state machine describes a task's path through the system.
[Figure: a task's path through the Web Server, Network, and DB queues]
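A routing finite state machine can be sketched as a transition table whose states are queues. In this hypothetical version the transition probabilities are made up for illustration, the network queue is omitted for brevity, and a task's path is generated by walking the machine until it reaches an `end` state; the talk does not specify how its FSMs are encoded.

```python
import random

# Hypothetical routing FSM for the two-tier example. Each state lists
# (next_state, probability); the probabilities are illustrative only.
TRANSITIONS = {
    "start": [("web", 1.0)],
    "web": [("db", 0.6), ("end", 0.4)],  # web server either queries the DB or responds
    "db": [("web", 1.0)],                # DB results always return to the web server
}


def sample_path(rng):
    """Sample the sequence of queues one task visits by walking the FSM until 'end'."""
    state, path = "start", []
    while True:
        options = TRANSITIONS[state]
        state = rng.choices([s for s, _ in options],
                            weights=[p for _, p in options])[0]
        if state == "end":
            return path
        path.append(state)


rng = random.Random(1)
for _ in range(3):
    print(" -> ".join(sample_path(rng)))
```

Each sampled path says which queues a task's arrivals and departures belong to; the per-queue timing then follows each queue's own service rule.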
12
Probabilistic Modeling
  • Now we have a probability distribution over arrivals, departures, service times, and waiting times.

[Figure: the two-tier network (Web Server, Network, and DB queues)]
13
Probabilistic Modeling
  • Arrival and departure times can be instrumented.
  • Question: can we reconstruct missing arrivals?
  • Answer: use the posterior distribution $p(\text{hidden arrivals} \mid \text{arrivals I observed})$

[Figure: the two-tier network (Web Server, Network, and DB queues)]
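For one single-processor FIFO queue, arrivals and departures determine every service time, s_k = d_k - max(a_k, d_{k-1}), so the density of a complete trace is the product of interarrival and service densities. The slides do not say how TEQUILA computes the posterior, so as a stand-in this sketch uses a brute-force grid over a single hidden arrival, assuming exponential interarrival and service times; all function names and the tiny trace are hypothetical.

```python
import math


def exp_logpdf(x, rate):
    """Log density of an Exponential(rate) variable at x."""
    return math.log(rate) - rate * x if x >= 0 else -math.inf


def trace_log_density(arrivals, departures, arrival_rate, service_rate):
    """log p(arrivals, departures) for one single-processor FIFO queue with
    exponential interarrival and service times (an M/M/1 assumption)."""
    logp, prev_a, prev_d = 0.0, 0.0, 0.0
    for a, d in zip(arrivals, departures):
        service = d - max(a, prev_d)  # service starts once the task arrived and the processor is free
        if a < prev_a or service <= 0:
            return -math.inf          # impossible under FIFO
        logp += exp_logpdf(a - prev_a, arrival_rate) + exp_logpdf(service, service_rate)
        prev_a, prev_d = a, d
    return logp


def posterior_hidden_arrival(arrivals, departures, k, arrival_rate, service_rate, n_grid=2000):
    """Grid approximation to p(a_k | other arrivals, all departures).
    arrivals[k] is ignored (e.g. None) because it is the hidden value."""
    lo = arrivals[k - 1] if k > 0 else 0.0
    hi = departures[k] if k + 1 == len(arrivals) else min(departures[k], arrivals[k + 1])
    grid = [lo + (hi - lo) * (i + 0.5) / n_grid for i in range(n_grid)]
    logps = []
    for g in grid:
        trial = list(arrivals)
        trial[k] = g  # fill in the hidden arrival with a candidate value
        logps.append(trace_log_density(trial, departures, arrival_rate, service_rate))
    top = max(logps)
    if top == -math.inf:
        raise ValueError("no feasible value for the hidden arrival")
    weights = [math.exp(lp - top) for lp in logps]
    total = sum(weights)
    return grid, [w / total for w in weights]


arrivals = [1.0, None, 4.0]  # task 1's arrival was not logged
departures = [1.5, 3.0, 4.5]
grid, post = posterior_hidden_arrival(arrivals, departures, k=1,
                                      arrival_rate=1.0, service_rate=1.0)
mean = sum(g * p for g, p in zip(grid, post))
print(f"posterior mean of hidden arrival: {mean:.3f}")
```

In this example the posterior mean lands near 2.28, later than the midpoint 2.0 of the feasible interval (1.0, 3.0): an arrival closer to the observed departure at 3.0 implies a shorter, more probable service time. A grid only works for one or two hidden values; reconstructing many hidden arrivals at once needs a sampler or another approximate method.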
14
Progress from Last Retreat
  • Last retreat:
  • All queues single-processor
  • FIFO
  • Service times exponential
  • This retreat:
  • Arbitrarily many processors per queue
  • FIFO and random service order
  • Arbitrary service distributions
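The multi-processor FIFO case can be sketched by keeping a min-heap of the times at which each processor becomes free: tasks start in arrival order on whichever processor frees up first. This sketch assumes log-normal service times with illustrative parameters and shows only the FIFO discipline, not random-order service; `simulate_fifo_multiserver` is a hypothetical name.

```python
import heapq
import random


def simulate_fifo_multiserver(arrivals, n_servers, mu, sigma, seed=0):
    """One FIFO queue with n_servers identical processors and log-normal
    service times. Returns (arrival, waiting, service, departure) per task."""
    rng = random.Random(seed)
    free_at = [0.0] * n_servers  # min-heap: when each processor next becomes free
    results = []
    for a in sorted(arrivals):
        earliest = heapq.heappop(free_at)          # first processor to free up
        start = max(a, earliest)
        service = rng.lognormvariate(mu, sigma)    # log(service) ~ Normal(mu, sigma)
        departure = start + service
        heapq.heappush(free_at, departure)
        results.append((a, start - a, service, departure))
    return results


arrivals = [0.1 * i for i in range(50)]
for c in (1, 5):
    total_wait = sum(w for _, w, _, _ in simulate_fifo_multiserver(arrivals, c, mu=0.0, sigma=0.5))
    print(f"{c} processor(s): total waiting {total_wait:.2f}")
```

Because the same seed yields the same sequence of service times, the two runs differ only in the number of processors, so the comparison isolates the effect of adding capacity.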

15
Programmer's Perspective
  • Programmer supplies:
  • 1. The structure of the queueing network

[Figure: Rails 1, Rails 2, Rails 3, and DB queues. DB: 5 processors, FIFO, log-normal service distribution]
16
Programmer's Perspective
  • Programmer supplies:
  • 2. Arrivals and departures measured in production
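The two inputs the programmer supplies could be represented as plain data. The format below is hypothetical, not TEQUILA's actual input format: `QueueSpec` records processors, discipline, and service family per queue (the DB entry mirrors the slide; the Rails settings are assumptions), and `Event` records one task's visit to one queue, with `None` for a time that was not logged. The small trace is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueueSpec:
    """One queue in the network (hypothetical format)."""
    name: str
    processors: int
    discipline: str  # "FIFO" or "random"
    service: str     # service-time family, e.g. "exponential" or "lognormal"


@dataclass
class Event:
    """One task's visit to one queue; None marks a time that was not logged."""
    task: int
    queue: str
    arrival: Optional[float]
    departure: Optional[float]


# 1. Structure: the DB entry follows the slide; the Rails settings are assumed.
NETWORK = {q.name: q for q in [
    QueueSpec("Rails 1", 1, "FIFO", "lognormal"),
    QueueSpec("Rails 2", 1, "FIFO", "lognormal"),
    QueueSpec("Rails 3", 1, "FIFO", "lognormal"),
    QueueSpec("DB", 5, "FIFO", "lognormal"),
]}

# 2. Measurements: a made-up production trace in which some times are missing.
TRACE = [
    Event(0, "Rails 1", 0.00, 0.42),
    Event(0, "DB", 0.05, None),
    Event(1, "Rails 2", 0.10, None),
    Event(1, "DB", None, 0.31),
]


def observed_fraction(network, trace):
    """Validate queue names, then return the fraction of times that were logged."""
    for ev in trace:
        if ev.queue not in network:
            raise ValueError(f"unknown queue {ev.queue!r}")
    times = [t for ev in trace for t in (ev.arrival, ev.departure)]
    return sum(t is not None for t in times) / len(times)


print(f"observed: {observed_fraction(NETWORK, TRACE):.0%} of arrival/departure times")
```

Keeping missing times as explicit `None` values, rather than dropping events, preserves which quantities inference has to reconstruct.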

17
Reconstruction Accuracy
[Figure: reconstruction accuracy for service and waiting times, under exponential and log-normal service distributions]
18
Example: Cloudstone
  • Cloudstone running on EC2
  • 5 VMs, load up to 20 req/s
  • Model: each thread (Thin) is a single-processor queue
  • Data: for each request, log:
  • Time spent in Rails
  • Time spent in database

19
Example: Cloudstone
  • Cloudstone running on EC2
  • 5 VMs, load up to 20 req/s
  • Workload

20
1. Visualization
  • Performance bottlenecks over time

21
2. Hidden Resources
  • Common performance bug:
  • Blocking on a resource you shouldn't
  • Approach: model selection

[Figure: MODEL 2 (PERFORMANCE BUG): Rails queues on VM1 and VM2, plus the DB]
22
2. Hidden Resources
[Figure: MODEL 2 (PERFORMANCE BUG): Rails, DB, and VM]
23
2. Hidden Resources
[Figure: MODEL 1 (NORMAL): Rails and DB queues. MODEL 2 (PERFORMANCE BUG): Rails and DB queues with VM1]
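Model selection here means asking which network structure makes the logged times more probable. The talk does not detail its procedure, so this is a deliberately simplified stand-in: MODEL 1 treats each Rails thread as its own FIFO queue, while MODEL 2 approximates the hidden resource as a single FIFO queue that all requests share. Each model has one exponential service rate fitted by maximum likelihood, so comparing maximized log-likelihoods is fair; models with different parameter counts would need a penalty such as BIC. The function names and traces are hypothetical.

```python
import math


def fifo_services(visits):
    """Service times implied if (arrival, departure) pairs share one
    single-processor FIFO queue; None if the trace is impossible for it."""
    services, free_at = [], 0.0
    for a, d in sorted(visits):
        s = d - max(a, free_at)
        if s <= 0:
            return None  # e.g. a later arrival left first: not FIFO on one processor
        services.append(s)
        free_at = d
    return services


def exp_max_loglik(services):
    """Exponential log-likelihood of service times at the MLE rate n / sum."""
    n, total = len(services), sum(services)
    return n * math.log(n / total) - n


def select_model(requests):
    """requests: (thread, arrival, departure) tuples for Rails requests."""
    by_thread = {}
    for thread, a, d in requests:
        by_thread.setdefault(thread, []).append((a, d))
    per_thread = [fifo_services(v) for v in by_thread.values()]
    ll1 = -math.inf if None in per_thread else exp_max_loglik(
        [s for ss in per_thread for s in ss])
    shared = fifo_services([(a, d) for _, a, d in requests])
    ll2 = -math.inf if shared is None else exp_max_loglik(shared)
    label = "MODEL 2 (PERFORMANCE BUG)" if ll2 > ll1 else "MODEL 1 (NORMAL)"
    return label, ll1, ll2


# Requests on two threads that finish one at a time look serialized.
print(select_model([(1, 0.0, 1.0), (2, 0.1, 2.0), (1, 0.2, 3.0), (2, 0.3, 4.0)])[0])
# A later request finishing first rules out one shared FIFO resource.
print(select_model([(1, 0.0, 3.0), (2, 1.0, 2.0)])[0])
```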
24
Summary
  • Model-based diagnosis
  • WHY: a model lets you reason about aspects of the system state that you can't measure.
  • HOW: do that reasoning using algorithms from machine learning.
  • Accurate reconstruction from 10% of the possible log data

25
What Next?
  • Modeling different applications
  • Distributed file systems (e.g., Hadoop DFS)
  • Network traffic
  • SCADS?
  • Feedback between queues
  • Online, distributed inference
  • Converting code → performance model