Title: Dynamic Resource Management in Internet Hosting Platforms
1Dynamic Resource Management in Internet Hosting Platforms
- Ph.D. Thesis Defense
- Bhuvan Urgaonkar
- Advisor: Prashant Shenoy
2Internet Applications
- Proliferation of Internet applications
- E.g., auction site, online game, online retail store
- Growing significance in personal, business affairs
- Focus: Internet server applications
3Hosting Platforms
- Data Centers
- Clusters of servers
- Storage devices
- High-speed interconnect
- Hosting platforms
- Rent resources to third-party applications
- Performance guarantees in return for revenue
- Benefits
- Applications don't need to maintain their own infrastructure
- Rent server resources, possibly on demand
- Platform provider generates revenue by renting resources
4Goals of a Hosting Platform
- Meet service-level agreements
- Satisfy application performance guarantees
- E.g., average response time, throughput
- Maximize revenue
- E.g., maximize the number of hosted applications
- Question: How should a hosting platform manage its resources to meet these goals?
5Challenge 1 Dynamic Workloads
- Multi-time-scale variations
- Time-of-day, hour-of-day
- Overloads
- E.g., Flash crowds
- User threshold for response time: 8-10 s
- Key issue: How to provide good response time under varying workloads?
[Figures: workload plots of arrivals per min; one spans Time (days) 0 to 5 with a y-axis up to 1200, the other spans Time (hours) 0 to 24 with a y-axis up to 140K]
6Challenge 2 Complexity of Applications
- Complex software architecture
- Diverse software components
- Web servers, Java application servers, databases
- Multiple classes of clients
- How to provide differentiated service?
- Replicable components
- How many replicas to have?
- Tunable configuration parameters
- E.g., MaxClients in Apache
- How to set these parameters?
- Key issue: How to capture all this complexity?
7Talk Outline
- Motivation
- Thesis contributions
- Application modeling
- Dynamic provisioning
- Scalable request policing
- Conclusions
8Hosting Platform Models
- Small applications
- Require only a fraction of a server
- Shared Web hosting, $20/month to run own Web site
- Shared hosting: multiple applications on a server
- Co-located applications compete for server resources
9Hosting Platform Models
- Large applications
- May span multiple servers
- eBay site uses thousands of servers!
- Dedicated hosting: at most one application per server
- Allocation at the granularity of a single server
10Thesis Contributions
- Dynamic resource management in hosting platforms
- Shared Hosting
- Statistical multiplexing and under-provisioning [OSDI 2002]
- Application placement [PDCS 2004]
- Dedicated Hosting
- Analytical model for an Internet application [SIGMETRICS 2005]
- Dynamic provisioning [Autonomic Computing 2005]
- Scalable request policing [PODC 2004, WWW 2005]
11Talk Outline
- Motivation
- Thesis contributions
- Application modeling
- Dynamic provisioning
- Scalable request policing
- Conclusions
12Internet Application Architecture
[Figure: request processing in an online bookstore. The query "search moby" passes through the HTTP, J2EE, and Database tiers; the response lists Melville's Moby Dick and music CDs by Moby.]
- Multi-tier architecture
- Each tier uses services provided by its successor
- Session-based workloads
13Baseline Application Model
[SIGMETRICS 2005]
[Figure: clients connected to the application]
- Model consists of two components
- Sub-system to capture behavior of clients
- Sub-system to capture request processing inside the application
14Modeling Clients
[Figure: clients 1 to N, each with think time Z, form queue Q0 in front of the application]
- Clients think between successive requests
- Infinite-server system to capture think time Z
- Captures independence of Z from processing in application
15Modeling Request Processing
[Figure: N sessions circulate through tiers 1 to M, modeled as queues Q1, Q2, ..., QM with service times S1, S2, ..., SM and transition probabilities p1, p2, p3, ...]
- Transitions defined to capture circulation of requests
- Request may move to next queue or previous queue
- Multiple requests are processed concurrently at tiers
- Processor-sharing scheduling discipline
- Caching effects get captured implicitly!
16Putting It All Together
[Figure: the client queue Q0 (think time Z) joined to the tier queues Q1, Q2, ..., QM (service times S1, S2, ..., SM), with N sessions circulating]
- A closed-queuing model that captures a given number of simultaneous sessions being served
17Mean-value Analysis
[Figure: client n+1 arrives at queues Q1, Q2, ..., QM of a network that already holds clients 1 to n; it sees A1(n+1), A2(n+1), ..., AM(n+1), and the queue lengths are L1(n), L2(n), ..., LM(n)]
- Product-form closed queuing network
- $L_m$: average length of $Q_m$
- $A_m$: average number of clients in $Q_m$ seen by arriving client
- $A_m(n+1) = L_m(n)$
- Iterative algorithm to compute mean queue lengths, sojourn times
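The iteration named above can be sketched as follows. This is a minimal sketch of textbook exact mean-value analysis for a closed network with one think-time (delay) centre and M queues, not necessarily the exact variant used in the thesis. The function name `mva` and its argument names are illustrative assumptions; visit ratios and service times are taken as given inputs.

```python
def mva(visit_ratios, service_times, think_time, num_sessions):
    """Exact MVA for a closed network: one delay centre (think time Z)
    plus M queues Q1..QM. Returns (response_time, throughput, queue_lengths)."""
    num_tiers = len(visit_ratios)
    queue_len = [0.0] * num_tiers  # L_m(0) = 0: an empty network has empty queues
    response_time = 0.0
    throughput = 0.0
    for n in range(1, num_sessions + 1):
        # Arrival theorem: the n-th client sees L_m(n-1) clients already at Q_m,
        # so its residence time at Q_m is V_m * S_m * (1 + L_m(n-1)).
        residence = [
            visit_ratios[m] * service_times[m] * (1.0 + queue_len[m])
            for m in range(num_tiers)
        ]
        response_time = sum(residence)
        # Little's law over the whole cycle (think time + response time).
        throughput = n / (think_time + response_time)
        # Little's law per queue gives L_m(n) for the next iteration.
        queue_len = [throughput * r for r in residence]
    return response_time, throughput, queue_len
```

For example, `mva([1.0], [1.0], 1.0, 2)` models two sessions, one tier, and a think time equal to the service time. The loop runs once per session, so the cost grows linearly with the number of sessions and tiers.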
18 Parameter Estimation
- Visit ratios
- Equivalent to transition probabilities for MVA
- $V_i \approx \lambda_i / \lambda_{req}$; $\lambda_{req}$ measured at sentry, $\lambda_i$ from logs
- Service times
- Use residence time $X_i$ logged at tier i
- For last tier, $S_M = X_M$
- $S_i = X_i - (V_{i+1} / V_i) \, X_{i+1}$
- Think time
- Measured at the application sentry
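The estimation rules on this slide can be sketched as below, assuming the service-time rule reads S_i = X_i - (V_{i+1} / V_i) X_{i+1} with S_M = X_M for the last tier. The function name `estimate_parameters` and the sample numbers in the usage note are hypothetical, not measurements from the thesis.

```python
def estimate_parameters(request_rate, tier_rates, residence_times):
    """Estimate visit ratios V_i and service times S_i for tiers 1..M.

    request_rate    -- request arrival rate measured at the sentry
    tier_rates      -- per-tier request rates taken from the tier logs
    residence_times -- per-tier residence times X_i taken from the tier logs
    """
    visit = [rate / request_rate for rate in tier_rates]
    num_tiers = len(visit)
    service = [0.0] * num_tiers
    # The last tier calls no further tier, so its residence time is its service time.
    service[-1] = residence_times[-1]
    # Working backwards, remove the time spent waiting on tier i+1,
    # weighted by how often tier i+1 is visited per visit to tier i.
    for i in range(num_tiers - 2, -1, -1):
        downstream = (visit[i + 1] / visit[i]) * residence_times[i + 1]
        service[i] = residence_times[i] - downstream
    return visit, service
```

With made-up inputs `estimate_parameters(100.0, [100.0, 50.0, 100.0], [0.10, 0.08, 0.02])`, the middle tier's residence time of 0.08 s shrinks to a service time of 0.04 s, because each of its visits triggers two visits to the last tier.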
19Evaluation of Baseline Model
- Auction site: RUBiS
- One server per tier
[Figure labels: Apache, JBOSS, Mysql; 75, 150]
- Concurrency limits not captured
20Handling Concurrency Limits
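[Figure: the closed queuing model (Q0 with think time Z; Q1, Q2, ..., QM with service times S1, S2, ..., SM; N sessions) with dropped requests leaving the tiers]
- Requests may be dropped due to concurrency limits
- Need to model the finiteness of queues!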
21Handling Concurrency Limits
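[Figure: the closed queuing model extended with a drop subsystem per tier; a request at tier i enters drop queue $Q_i^{drop}$ (delay $S_i^{drop}$) with probability $p_i^{drop}$]
- Approach: Subsystems to capture dropped requests
- Distinguish the processing of dropped requests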
22Estimating Drop Probabilities and Delay Values
- Drop probability
- Step 1: Estimate throughput using MVA assuming no concurrency limits
- Step 2: Estimate $p_i^{drop}$ as the drop probability of an M/M/1/$K_i$ queue
- Delay value for tier i
- Subject the application to offline workload that causes limit to be exceeded only at tier i; record response time of failed requests
[Figure: tier i as a queue with limit $K_i$; of the throughput $\tau$, $\tau (1 - p_i^{drop})$ is served and $\tau \, p_i^{drop}$ is dropped. Panel labels: High limit, Low limit, High limit]
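Step 2 above relies on the standard blocking probability of an M/M/1/K queue, which can be sketched as follows. The function and argument names are illustrative assumptions. Using the Step 1 throughput as the arrival rate, and the reciprocal of a tier's service time as its service rate, is one plausible reading of the slide rather than a confirmed detail.

```python
def mm1k_drop_probability(arrival_rate, service_rate, limit):
    """Probability that an arrival finds an M/M/1/K queue full.

    limit is K, the maximum number of requests in the system
    (the one in service plus those waiting).
    """
    rho = arrival_rate / service_rate
    if rho == 1.0:
        # All K+1 states are equally likely when offered load equals capacity.
        return 1.0 / (limit + 1)
    return (1.0 - rho) * rho ** limit / (1.0 - rho ** (limit + 1))
```

For instance, `mm1k_drop_probability(1.0, 2.0, 1)` describes a tier at half load that can hold only one request at a time. Raising `limit` drives the drop probability toward zero whenever the arrival rate is below the service rate.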
23Response Time Prediction
- Enhanced model can capture concurrency limits
24Replication and Load Imbalances
[Figure labels: Apache, JBOSS, Mysql]
- Causes of imbalance
- Sticky sessions
- Variation in session durations and resource requirements
- Imbalance factor for jth most-loaded replica of tier i
- $imbalance(i, j) = num\_arrivals(i, j) / num\_arrivals(i)$
- Scale visit ratio
- $V_{i,j} = V_i \cdot imbalance(i, j)$
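The imbalance factor and the scaled visit ratio can be sketched as below, assuming num_arrivals(i) is the total number of arrivals across all replicas of tier i. The function names and the sample arrival counts are hypothetical.

```python
def imbalance_factors(arrivals_per_replica):
    """Return imbalance(i, j) for j = 1..R, where j = 1 is the most-loaded replica."""
    total = sum(arrivals_per_replica)
    ranked = sorted(arrivals_per_replica, reverse=True)  # most loaded first
    return [arrivals / total for arrivals in ranked]


def scaled_visit_ratios(visit_ratio, arrivals_per_replica):
    """Split a tier's visit ratio V_i across its replicas: V_ij = V_i * imbalance(i, j)."""
    return [visit_ratio * f for f in imbalance_factors(arrivals_per_replica)]
```

Calling `scaled_visit_ratios(2.0, [200, 500, 300])` ranks the replicas by load and assigns half of the tier's visit ratio to the most-loaded one. The factors always sum to 1, so the per-replica visit ratios add back up to the tier's visit ratio.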
25Capturing Load Imbalance
[Figure panels: "Number of requests (per-replica)" and "Response times (based on load)". Axes: Number of requests, Avg. resp. time (msec), Time (sec). Legends: Replica 1, Replica 2, Replica 3, Average; Least loaded, Medium loaded, Most loaded; Observed, Perfect Load balancing, Enhanced Model. Tier labels: Apache, JBOSS, Mysql]
- Session affinity causes load imbalance
- Imbalance shifts among replicas
- Our enhancement helps improve response time prediction
26Talk Outline
- Motivation
- Thesis contributions
- Application modeling
- Dynamic provisioning
- Scalable request policing
- Conclusions
27Dynamic Provisioning
[Autonomic Computing 2005]
[Figure: control loop: Monitor workload, Compute current/future demand, Adjust allocation]
- Key idea: increase or decrease allocated servers to handle workload fluctuations
- Monitor incoming workload
- Compute current or future demand
- Match number of allocated servers to demand
28Dynamic Provisioning at Multiple Time-scales
- Predictive provisioning
- Certain Internet workload patterns can be predicted
- E.g., time-of-day effects, increased workload during Thanksgiving
- Provision using model at time-scale of hours or days
- Reactive provisioning
- Applications may see unpredictable fluctuations
- E.g., increased workload to news sites after an earthquake
- Detect such anomalies and react fast (minutes)
29Request Policing
[Figure: sentry policing incoming requests and dropping the excess]
- Key idea: If incoming request rate > current capacity
- Turn away excess requests
- Why police when you can provision?
- Provisioning is not instantaneous
- Residual sessions on reallocated server
- Application and OS installation and configuration overheads
- Overhead of several (5-30) minutes
30Existing Work
- Lots of existing work on request policing
- Kanodia00, Li00, Verma03, Welsh03, Abdelzaher99,
- Shortcomings of existing work
- Does not attempt to integrate policing and provisioning
- Does not address scalability of the policer!
- The policer itself may become the bottleneck during overloads
31Policer Design Goals
- Each class should sustain its guaranteed admission rate
- Class-based differentiation and revenue maximization
- Challenging due to online nature of the problem
- An admitted request may cause a more important request arriving later to be dropped
- Approach: Preferential admission to higher-class requests
- Scalability
- The policer should remain operational even under extremely high arrival rates
32Overview of Policer Design
[PODC 2004 / WWW 2005]
[Figure: a classifier and per-class leaky buckets feed class-specific queues for Class gold, Class silver, and Class bronze, with delays dgold, dsilver, dbronze; admission control then marks each request admitted or dropped]
- Our policer has three components
- Request classifier and per-class leaky buckets
- Class-specific queues
- Admission control
33Class-based Differentiation
- Each incoming request undergoes classification
- Per-class leaky buckets used to ensure that rates guaranteed in SLA are admitted
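A per-class leaky bucket of the kind named above can be sketched as a simple meter. This is a generic leaky-bucket sketch, not the thesis implementation: the class name `LeakyBucket`, the burst `capacity` parameter, and the use of caller-supplied timestamps are all assumptions, and how the policer treats non-conforming requests is not stated on the slide.

```python
class LeakyBucket:
    """Meters one request class against its guaranteed admission rate."""

    def __init__(self, rate, capacity):
        self.rate = rate          # guaranteed rate from the SLA, in requests/sec
        self.capacity = capacity  # assumed burst allowance, in requests
        self.level = 0.0          # current bucket fill
        self.last = 0.0           # timestamp of the previous request

    def conforms(self, now):
        """Return True if a request arriving at time `now` is within the guaranteed rate."""
        # The bucket drains at the guaranteed rate between arrivals.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0  # the request fits, so it counts against the bucket
            return True
        return False


# One bucket per class; the rates here are made-up examples.
buckets = {"gold": LeakyBucket(100.0, 20.0), "silver": LeakyBucket(50.0, 10.0)}
```

A classifier would look up `buckets[request_class]` and call `conforms(arrival_time)` for each request. Passing the timestamp in explicitly keeps the meter deterministic and easy to test.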
34Revenue Maximization
- Idea: Different delays in processing requests of different classes
- More important requests processed more frequently
- Methodology to compute delay values in online manner
- Bounds probability of a request denying admission to a more important request [Appendix B of thesis]
35Admission Control
- Goal: Ensure that an admitted request meets its response time target
- Measurement-based admission control algorithm
- Use information about current load on servers and estimated size of new request to make decision
36Scalability of Admission Control
- Idea 1: Reduce the per-request admission control cost
- Admission control on every request may be expensive
- Bursty arrivals during overloads => batches get formed
- Delays for class-based differentiation => batches get formed
- Admission control that operates on batches instead of requests
- Idea 2: Sacrifice accuracy to reduce computational overhead
- When batch-based processing becomes prohibitive
- Threshold-based scheme
- E.g., Admit all Gold requests, drop all Silver and Bronze requests
- Thresholds chosen based on observed arrival rates and service times
- Extremely efficient
- Wrong threshold => bad response times or fewer requests admitted
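One way a threshold could be derived from observed arrival rates and service times is sketched below. The rule used here (admit the most important classes whose cumulative offered load fits within capacity) is an assumption made for illustration; the slide does not give the actual rule. The function name `choose_threshold` and the sample numbers are hypothetical.

```python
def choose_threshold(classes, capacity):
    """Pick the classes to admit wholesale.

    classes  -- list of (name, arrival_rate, mean_service_time), most important first
    capacity -- server capacity, in server-seconds of work per second
    Returns the names of the classes to admit; every other class is dropped.
    """
    admitted = []
    load = 0.0
    for name, arrival_rate, service_time in classes:
        load += arrival_rate * service_time  # offered load of this class
        if load > capacity:
            break  # this class and all less important ones are dropped
        admitted.append(name)
    return admitted
```

For example, `choose_threshold([("gold", 100.0, 0.01), ("silver", 200.0, 0.01), ("bronze", 500.0, 0.01)], 4.0)` admits Gold and Silver and drops Bronze. Once the threshold is set, each request costs only a class lookup, which is why the scheme is cheap but sensitive to a badly chosen threshold.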
37Scaling Even Further
- Protocol processing overheads will saturate sentry resources at extremely high arrival rates
- Indiscriminate dropping of requests will occur
- Important requests may be turned away without even undergoing the admission control test
- Loss in revenue!
- Sentry should still be able to process each arriving request!
- Idea: Dynamic capacity provisioning for sentry
- Pull in an additional sentry if CPU utilization of existing sentries exceeds a threshold (e.g., 90%)
- Round-robin DNS to load balance among sentries
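The sentry scale-out rule can be sketched as below. Whether the threshold applies to the average utilization or to each sentry individually is not stated on the slide; this sketch assumes the average and at least one existing sentry. The function name, sentry names, and utilization figures are hypothetical, and `itertools.cycle` only imitates the rotation that round-robin DNS would provide.

```python
from itertools import cycle


def sentries_needed(cpu_utilizations, threshold=0.90):
    """Return the sentry count to run next, given per-sentry CPU utilization (0..1)."""
    current = len(cpu_utilizations)
    average = sum(cpu_utilizations) / current
    # Assumption: pull in one more sentry when the average exceeds the threshold.
    return current + 1 if average > threshold else current


# Rotate incoming clients across sentries, as round-robin DNS would.
rotation = cycle(["sentry-1", "sentry-2"])
first_three = [next(rotation) for _ in range(3)]
```

Here `sentries_needed([0.95, 0.93])` asks for a third sentry, while `sentries_needed([0.50, 0.60])` keeps two.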
38Class-based Differentiation
- Three classes of requests: Gold, Silver, Bronze
- Policer successful in providing preferential admission to important requests
39Threshold-based Higher Scalability
- Threshold-based processing allows the policer to handle up to 4 times higher arrival rate
- Single sentry can handle about 19000 req/s
40Threshold-based Loss of Accuracy
- Higher scalability comes at a loss in accuracy of admission control
- More violations of response time targets
41Talk Outline
- Motivation
- Thesis contributions
- Application modeling
- Dynamic provisioning
- Scalable request policing
- Summary and Future Research
42Thesis Contributions
- Dynamic resource management in hosting platforms
- Shared Hosting
- Statistical multiplexing and under-provisioning [OSDI 2002]
- Application placement [PDCS 2004]
- Dedicated Hosting
- Analytical model for Internet applications [SIGMETRICS 2005]
- Dynamic provisioning [Autonomic Computing 2005]
- Scalable request policing [PODC 2004, WWW 2005]
43Future Research Directions
- Virtual machine based hosting
- Recent research has shown feasibility of migrating VMs across nodes
- Adds a new dimension to the capacity provisioning problem
- Characterizing multi-tier workloads
- Workloads for standalone Web servers are well-characterized
- E.g., typical service times at Java tier or query processing times?
- Offshoot of this study: workload generators for multi-tier applications
- Automated determination of provisioning parameters
- Predictor and reactor invoked based on manually chosen frequencies
- System administrators use rules-of-thumb => error-prone
44Thanks to
- Advisor
- Prashant Shenoy
- Thesis committee
- Emery Berger, Jim Kurose, Don Towsley, Tilman Wolf
- Collaborators
- Abhishek Chandra, Pawan Goyal, Giovanni Pacifici, Timothy Roscoe, Arnold Rosenberg, Mike Spreitzer, Asser Tantawi
- All my teachers
- Paul Cohen, Mani Krishna, Don Towsley
- Friends and family
46Query Caching at the Database
- Caching effects
- Captured by tuning Vi and/or Si
- Bulletin-board site: RUBBoS
- 50 sessions
- SELECT SQL_NO_CACHE causes Mysql to not cache the response to a query
47Agile Switching Using Virtual Machine Monitors
- VMMs allow multiple virtual machines on a server
- E.g., Xen, VMware,
[Figure: two servers, each running VM1, VM2, and VM3 on a VMM, with VMs marked dormant or active]
- Use VMMs to enable fast switching of servers
- Switching time only limited by residual sessions
48Prototype Data Center
[Figure labels: Server Node (Application capsules, Sentries, Resource monitoring, Parameter estimation); Control Plane (Application placement, Dynamic provisioning)]
- 40 Linux servers
- Gigabit switches
- Multi-tier applications
- Auction (RUBiS)
- Bulletin-board (RUBBoS)
- Apache, JBOSS (replicable)
- Mysql database
49Sentry Provisioning (XXX)
50System Overview
- Control Plane
- Centralized resource manager
- Nucleus
- Per-server measurements and resource management
- Sentry
- Per-application admission control
- Capsule
- Component of an application running on a server
51 Existing Application Models
- Models for Web servers [Chandra03, Doyle03]
- Do not model Java server, database, etc.
- Black-box models [Kamra04, Ranjan02]
- Unaware of bottleneck tier
- Extensions of single-tier models [Welsh03]
- Fail to capture interactions between tiers
- Existing models inadequate for multi-tier Internet applications
52Existing Work
- Predictable resource management within a single server
- Proportional-share schedulers for CPU, network [Duda, Goyal, Waldspurger]
- Multi-processors [Chandra]
- Memory management [Berger, Waldspurger]
- Disk scheduling [Shenoy]
- Hosting platforms and Internet applications
- Rice, Duke, Penn State: shared platforms for Web servers
- IBM, HP Labs: shared platforms, workload prediction
- Berkeley: novel architecture for Internet applications
- Main shortcomings
- Possible statistical multiplexing gains in shared platforms unexplored
- Most work assumes simplistic applications (e.g., only Web servers)
- Provisioning either purely reactive or purely predictive
- Handling of extreme overloads not addressed satisfactorily
53Predictive Provisioning
Servers
54Reactive Provisioning
[Figure: a time series of $\lambda_{pred}$ and $\lambda_{actual}$ yields the prediction error $\lambda_{error}$; if $\lambda_{error} > \tau$, invoke reactor and allocate servers]
- Idea: react to current conditions
- Useful for capturing significant short-term fluctuations
- Can correct errors in predictions
- Track error between long-term predictions and actual
- Allocate additional servers if error exceeds a threshold
- Can be invoked if request drop rate exceeds a threshold
- Operates over time scale of a few minutes
- Pure reactive provisioning lags workload
- Reactive + predictive more effective!
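The two reactor triggers listed above can be sketched as a single check. Treating the prediction error as a relative shortfall (actual minus predicted, divided by predicted) is an assumption, as are the function name and both default threshold values; the slide only says that each quantity is compared against a threshold.

```python
def should_invoke_reactor(predicted_rate, actual_rate, drop_rate,
                          error_threshold=0.2, drop_threshold=0.05):
    """Decide whether to invoke the reactor and allocate additional servers.

    predicted_rate -- arrival rate from the long-term predictor (must be > 0)
    actual_rate    -- arrival rate observed over the last few minutes
    drop_rate      -- fraction of requests currently being dropped
    """
    # Assumed error metric: how far the actual rate exceeds the prediction.
    error = (actual_rate - predicted_rate) / predicted_rate
    return error > error_threshold or drop_rate > drop_threshold
```

For example, `should_invoke_reactor(100.0, 130.0, 0.0)` fires because the workload runs 30% above the prediction, and `should_invoke_reactor(100.0, 100.0, 0.1)` fires on the drop rate alone.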
55Dynamic Capacity Provisioning
- Auction application: RUBiS
- Factor of 4 increase in 30 min
[Figure panels: Server allocations, Workload, Response time]
- Server allocations increased to match increased workload
- Response time kept below 2 seconds