Title: Capsule Placement in the Service Platform
1Capsule Placement in the Service Platform
- Bhuvan Urgaonkar
- Timothy Roscoe
- Systems Group, Sprint ATL
2Service Platform: an overview
[Figure labels: Processors, High-speed interconnect, Management/Control Unit, Internet]
3Service Platform Goals
- Sell the platform's resources
- Manage the resources efficiently
- Provide performance guarantees to customers
- Start or stop services within minutes
4Services and Capsules
- Services
- web/game/streaming servers
- service provider pays the platform
- Capsules
- Def: a component of a service that should run on a single node
- e.g., consider a replicated web server
5Nucleus
- Node specific control/management software
- Capsule creation, destruction
- Health information (process liveness)
- Resource parameters (memory, CPU, network bandwidth, etc.)
6Control Plane
- Capsule Placement
- Flow Placement
- Node, network, service monitoring
- Deployed Service Database
- Billing
7Outline of this talk
- Service Platform: an overview
- Quality of Service
- Capsule Placement
- Design of the Placement unit
- Conclusions and future work
8QoS Representations
- Application level
- e.g., 50 transactions per sec
- Contract level
- e.g., something like a 300 MHz Pentium II
- Platform level
- e.g., ?
- Node level
- e.g., weights, priorities etc.
9Translation between QoS levels
- Application level -> Contract level
- Application specific, customer's problem
- Contract level -> Platform level
- More a business problem
- Platform level -> Node level
- OS dependent
10Capsule Placement Desirables
- Maximize revenue!
- Aware of the importance of services.
- Overbooking.
- Exploit known workload characteristics.
- Adapt to changes in workload?
- Fast.
11Stages in hosting a service
- Requirement specification
- Placement
- Deployment
- Activation
12Requirement Specification
- Contract level representation
- Many possibilities: 300 MHz PII, best effort, or a CPU instruction token bucket.
- Platform level representation
- Must be uniform across the platform.
- (rate, burst, overbooking tolerance, arch, OS)
13Translation to Node level
- Reservation based scheduler
- map (rate, burst) to (period, slice)
- bigger burst -> bigger period
- Proportional share scheduler
- burst ?
- weight in proportion to rate
- Priority based scheduler
- no easy mapping
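A minimal sketch of the two easier mappings above. It assumes rate is a CPU fraction in (0, 1] and burst is given in milliseconds of CPU time. The rule period = burst / rate is only one choice under which a bigger burst gives a bigger period; the slides do not give formulas, and the function names are hypothetical.

```python
def to_reservation(rate, burst_ms):
    """Map (rate, burst) to (period, slice) for a reservation-based scheduler.

    Assumption: slice / period == rate, and the period grows with the burst.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be a CPU fraction in (0, 1]")
    if burst_ms <= 0:
        raise ValueError("burst must be positive")
    period_ms = burst_ms / rate      # bigger burst -> bigger period
    slice_ms = rate * period_ms      # CPU time reserved in each period
    return period_ms, slice_ms


def to_weights(rates):
    """Proportional-share weights in proportion to rate; burst is ignored."""
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    return {name: r / total for name, r in rates.items()}
```

For a priority based scheduler no comparable function is sketched, since the slide notes there is no easy mapping.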
14Placement
- Find the set of feasible nodes
- Compatible architecture and OS
- No overbooking tolerances violated
- Pick one node from this set
- Best Fit
- Worst Fit
- Random Select
- Close Overbooking
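The two steps above can be sketched as follows. The class names, the field names, and the definition of overbooking (excess load as a fraction of node capacity, compared against every resident capsule's tolerance) are assumptions made for illustration. Close Overbooking is left out because the slides do not define it.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Capsule:
    name: str
    rate: float           # CPU requirement, same units as Node.capacity
    ovb_tolerance: float  # assumed: tolerated overbooking as a fraction of capacity
    arch: str
    os: str


@dataclass
class Node:
    name: str
    capacity: float
    arch: str
    os: str
    capsules: list = field(default_factory=list)

    def booked(self):
        return sum(c.rate for c in self.capsules)


def is_feasible(node, capsule):
    """Compatible arch/OS, and no capsule's overbooking tolerance is violated."""
    if (node.arch, node.os) != (capsule.arch, capsule.os):
        return False
    residents = node.capsules + [capsule]
    load = node.booked() + capsule.rate
    overbooking = max(0.0, load - node.capacity) / node.capacity
    return all(overbooking <= c.ovb_tolerance for c in residents)


def place(capsule, nodes, heuristic="best_fit", rng=random):
    """Pick one feasible node and record the capsule on it, or return None."""
    feasible = [n for n in nodes if is_feasible(n, capsule)]
    if not feasible:
        return None

    def spare(n):
        return n.capacity - n.booked()

    if heuristic == "best_fit":
        chosen = min(feasible, key=spare)   # least spare capacity
    elif heuristic == "worst_fit":
        chosen = max(feasible, key=spare)   # most spare capacity
    elif heuristic == "random_select":
        chosen = rng.choice(feasible)
    else:
        raise ValueError(f"unknown heuristic: {heuristic}")
    chosen.capsules.append(capsule)
    return chosen
```

Separating `is_feasible` from the selection heuristic keeps the heuristics interchangeable, which is what a comparison between them needs.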
15Placement Example
[Figure: capsules a, b, c and nodes N1, N2, N3, N4; values shown for capsules: 10, 30, 20; values shown for nodes: 30, 10, 20, 10]
One possible placement: (a, N1), (b, N2), (c, N3)
16Deployment and Activation
- Deployment: the process of preparing a capsule for execution on a node.
- Why?
- e.g., need to download some files before starting
- the control plane sends all information needed to deploy the capsule
- Activation: starting a deployed service
17Capsule State Diagram
[Figure: state diagram with states undeployed, deploying, deployed, activating, active, deactivating, undeploying]
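A minimal sketch of the capsule states as a table-driven state machine. The state names come from this slide, but the arrows did not survive in the text, so the transition table below is an assumption inferred from the names. The class and method names are hypothetical.

```python
# Assumed transitions: the slide lists the states, not the arrows between them.
TRANSITIONS = {
    "undeployed":   {"deploying"},
    "deploying":    {"deployed"},
    "deployed":     {"activating", "undeploying"},
    "activating":   {"active"},
    "active":       {"deactivating"},
    "deactivating": {"deployed"},
    "undeploying":  {"undeployed"},
}


class CapsuleState:
    """Tracks one capsule's state and rejects transitions not in the table."""

    def __init__(self):
        self.state = "undeployed"

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
```

Failure transitions (for example, a deployment that never completes) are not modelled here.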
18Example Message Exchange
Control Plane -> Nucleus: deployed svc cap
Control Plane: Instruct nucleus to deploy a capsule, start timer
Nucleus: Starts deploying the capsule
Control Plane: No response! Send again
Control Plane -> Nucleus: deployed svc cap
Nucleus: Still deploying
Nucleus -> Control Plane: state svc cap deployed
Nucleus: Done deploying, send status message
Control Plane: Deployed before timeout, instruct nucleus to activate
Control Plane -> Nucleus: activated svc cap
Nucleus: Starts activating the capsule
. . .
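The resend-on-timeout behaviour in this exchange can be sketched with a stand-in nucleus and a logical clock instead of real timers. `FakeNucleus`, `deploy_with_retry`, the tick-based timeout, and the tuple used for the status message are all hypothetical; the slides do not describe the wire format or the retry limits.

```python
class FakeNucleus:
    """Stand-in for a nucleus: deployment finishes after `ticks_to_deploy` polls."""

    def __init__(self, ticks_to_deploy):
        self.remaining = ticks_to_deploy
        self.requests_seen = 0

    def deploy(self, svc, cap):
        # Repeated requests for the same capsule are treated as idempotent.
        self.requests_seen += 1

    def poll(self, svc, cap):
        """Return a status message once deployment is done, else None."""
        if self.requests_seen == 0:
            return None
        if self.remaining > 0:
            self.remaining -= 1       # still deploying
            return None
        return ("state", svc, cap, "deployed")


def deploy_with_retry(nucleus, svc, cap, timeout_ticks=2, max_attempts=3):
    """Send the deploy request, resend after each timeout, then give up."""
    for attempt in range(1, max_attempts + 1):
        nucleus.deploy(svc, cap)            # instruct nucleus, start timer
        for _ in range(timeout_ticks):      # wait for a status message
            reply = nucleus.poll(svc, cap)
            if reply is not None:
                return attempt, reply       # deployed before the timeout
    return None                             # no response after all attempts
```

Because the nucleus may still be deploying when the second request arrives, the sketch makes the deploy request idempotent rather than restarting the work.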
19Placement Unit Architecture
[Figure: placement unit architecture. Components: Listen for new requests, Listen to nuclei, Dispatch Events, Event Queue, Message Queue. Flows: events due to new requests, events due to msgs from nuclei, messages from nuclei]
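A single-threaded sketch of the queues named in this figure. The figure does not show exactly how the Message Queue feeds the Event Queue, so the design below (messages from nuclei are drained into events before dispatch) is an assumption, as are all function names and the event tuple layout.

```python
import queue

event_queue = queue.Queue()     # events awaiting dispatch
message_queue = queue.Queue()   # raw messages received from nuclei


def on_new_request(svc):
    """Called by the request listener: queue an event for a new request."""
    event_queue.put(("new_request", svc))


def on_nucleus_message(msg):
    """Called by the nuclei listener: queue the raw message."""
    message_queue.put(msg)


def drain_messages():
    """Turn every queued nucleus message into an event."""
    while True:
        try:
            msg = message_queue.get_nowait()
        except queue.Empty:
            return
        event_queue.put(("nucleus_msg", msg))


def dispatch(handlers):
    """Run the handler for each queued event; return how many were handled."""
    handled = 0
    drain_messages()
    while True:
        try:
            kind, payload = event_queue.get_nowait()
        except queue.Empty:
            return handled
        handlers[kind](payload)
        handled += 1
```

In a real placement unit the two listeners would presumably run concurrently; `queue.Queue` is thread-safe, so the same structure would carry over to listener threads.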
20Database Consistency
- Transactions and exceptions
- e.g.,
    try:
        transaction_begin()
        deploy_service(svc)
        transaction_commit()
    except Exception:
        # any failure during deployment undoes the database changes
        transaction_abort()
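The try/except pattern above can be tried out with Python's standard `sqlite3` module, whose connection object commits on success and rolls back on an exception when used as a context manager. The table layout, the function names, and the use of SQLite are assumptions for illustration; the slides do not say which database the Deployed Service Database uses.

```python
import sqlite3


def deploy_service(conn, svc, capsules):
    """Record a service and its capsule placements; raise if one has no node."""
    conn.execute("INSERT INTO services(name) VALUES (?)", (svc,))
    for cap, node in capsules:
        if node is None:
            raise RuntimeError(f"no feasible node for capsule {cap}")
        conn.execute(
            "INSERT INTO placements(service, capsule, node) VALUES (?, ?, ?)",
            (svc, cap, node),
        )


def deploy_atomically(conn, svc, capsules):
    """Commit all database changes for a service, or none of them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            deploy_service(conn, svc, capsules)
        return True
    except RuntimeError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services(name TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE placements(service TEXT, capsule TEXT, node TEXT)")
conn.commit()
```

A failed deployment then leaves no partial rows behind, so the database never lists a service with only some of its capsules placed.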
21Performance
- Time to compute placement: 1-2 sec
- -> time to deploy usually much larger
- Comparison of heuristics
- experiments with the following workloads
- 1-3 capsules, CPU requirement 0-10, wide range of overbooking tolerances
- Random Select admitted most services, Best Fit admitted least
- But more investigation needed
22Summary
- QoS representation for CPU requirements of services.
- Implementation of placement unit.
- Some simple experiments to deploy and activate services.
23Unfinished ...
- Experiments
- heuristics better suited to specific workloads.
- Scalability and efficiency of the system.
- Integration of placement unit with rest of the Control Plane
- Handling various failures
- Extend to multiple resources: much harder than a single resource!