Title: Availability and Performance in Wide-Area Service Composition
1Availability and Performance in Wide-Area Service
Composition
- Bhaskaran Raman
- EECS, U.C.Berkeley
- June 2002
2Problem Statement
10% of paths have only 95% availability
3Problem Statement (Continued)
BGP recovery can take several tens of seconds
Poor availability of wide-area (inter-domain)
Internet paths
4Why does it matter?
- Streaming applications
- Real-time
- Session-oriented applications
- Client sessions lasting several minutes to hours
- Composed applications
5Service Composition Motivation
[Figure: example service-level paths across providers. Labels: Cellular Phone, Thin Client, Video-on-demand server, Email repository, Text to speech, Transcoder, Service-Level Path; Providers A, B, Q, R]
Reuse, Flexibility
Other examples: ICEBERG, IETF OPES'00
6Solution Approach: Alternate Services and Alternate Paths
7Goals, Assumptions and Non-goals
- Goals
- Availability: Detect and handle failures quickly
- Performance: Choose set of service instances
- Scalability: Internet-scale operation
- Operational model
- Service providers deploy different services at various network locations
- Next generation portals compose services
- Code is NOT mobile (mutually untrusting service providers)
- We do not address service interface issue
- Assume that service instances have no persistent state
- Not very restrictive [OPES'00]
8Related Work
- Other efforts have addressed
- Semantics and interface definitions
- OPES (IETF), COTS (Stanford)
- Fault tolerant composition within a single cluster
- TACC (Berkeley)
- Performance constrained choice of service, but not for composed services
- SPAND (Berkeley), Harvest (Colorado), Tapestry/CAN (Berkeley), RON (MIT)
- None address wide-area network performance or failure issues for long-lived composed sessions
9Outline
- Architecture for robust service-composition
- Failure detection in wide-area Internet paths
- Evaluation of effectiveness/overheads
- Scaling
- Algorithms for load-balancing
- Wide-area experiments demonstrating availability
- Text-to-speech composed application
10Requirements to achieve goals
- Failure detection/liveness tracking
- Server, Network failures
- Performance information collection
- Load, Network characteristics
- Service location
- Global information is required
- Hop-by-hop approach will not work
11Design challenges
- Scalability and Global information
- Information about all service instances, and network paths in-between should be known
- Quick failure detection and recovery
- Internet dynamics -> intermittent congestion
12Failure detection trade-off
- What is a failure on an Internet path?
- Outage periods happen for varying durations
Monitoring for liveness of path using keep-alive heartbeat
[Figure: heartbeat timelines. Failure detected by timeout, after the timeout period. False positive: failure detected incorrectly within the timeout period -> unnecessary overhead]
There's a trade-off between time-to-detection and rate of false-positives
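The timeout-based detection described above can be sketched in a few lines. This is a minimal illustration, not the system's implementation: the function name `detect_failures`, the millisecond timestamps, and the sample trace are all assumptions made for the example.

```python
def detect_failures(receive_ms, timeout_ms):
    """Return (declared_at_ms, gap_ms) for every silence longer than timeout_ms.

    receive_ms: sorted arrival times of heartbeats, in milliseconds.
    A failure is declared timeout_ms after the last heartbeat that was seen.
    """
    events = []
    for prev, cur in zip(receive_ms, receive_ms[1:]):
        gap = cur - prev
        if gap > timeout_ms:
            events.append((prev + timeout_ms, gap))
    return events


# Hypothetical trace: 300 ms heartbeats with gaps of 1.2 s, 2.4 s and 26.4 s.
receive_ms = [0, 300, 600, 1800, 2100, 4500, 4800, 31200]

for timeout_ms in (900, 1800, 3000):
    print(timeout_ms, detect_failures(receive_ms, timeout_ms))
```

On this made-up trace, a 900 ms timeout flags all three gaps, while a 3,000 ms timeout flags only the long outage but takes longer to notice it. That is the trade-off the slide names: a shorter timeout detects sooner and also reacts to brief gaps.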
13Is quick failure detection possible?
- Study outage periods using traces
- 12 pairs of hosts
- Berkeley, Stanford, UIUC, CMU, TU-Berlin, UNSW
- Some trans-oceanic links, some within US
(including Internet2 links)
- Periodic UDP heart-beat, every 300 ms
- Measure gaps between receive-times: outage periods
- Plot CDF of gap periods
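As a rough sketch of the measurement steps above, the gaps between receive-times and their empirical CDF could be computed as follows. The helper names and the sample receive-times are hypothetical; the study's actual trace-processing code is not shown in these slides.

```python
from bisect import bisect_right


def gap_durations(receive_ms):
    """Gaps between consecutive heartbeat receive-times (milliseconds)."""
    return [b - a for a, b in zip(receive_ms, receive_ms[1:])]


def empirical_cdf(gaps, thresholds_ms):
    """Fraction of gaps that are <= each threshold."""
    if not gaps:
        raise ValueError("need at least one gap")
    ordered = sorted(gaps)
    return {t: bisect_right(ordered, t) / len(ordered) for t in thresholds_ms}


# Hypothetical receive-times for a 300 ms heartbeat with three outages.
receive_ms = [0, 300, 600, 1800, 2100, 4500, 4800, 31200]
gaps = gap_durations(receive_ms)
cdf = empirical_cdf(gaps, [300, 1800, 30000])
for threshold, fraction in cdf.items():
    print(threshold, round(fraction, 3))
```

A flat region in such a CDF would mean that few gaps end in that range of durations, so a timeout chosen there rarely fires on an outage that is about to end by itself.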
14CDF of gap durations
Ideal case for failure detection
15CDF of gap distributions (continued)
- Failure detection close to ideal case
- For a timeout of about 1.8-2 sec
- False-positive rate is about 50%
- Is this bad?
- Depends on
- Effect on application
- Effect on system stability, absolute rate of
occurrence
16Rate of occurrence of outages
Timeout for failure detection
17Towards an Architecture
- Service execution platforms
- For providers to deploy services
- First-party, or third-party service platforms
- Overlay network of such execution platforms
- Collect performance information
- Exploit redundancy in Internet paths
18Architecture
- Overlay size: how many nodes?
- Akamai: O(10,000) nodes
- Cluster -> process/machine failures handled within
19Key Design Points
- Overlay size
- Could grow much slower than services, or clients
- How many nodes?
- A comparison: Akamai cache servers
- O(10,000) nodes for Internet-wide operation
- Overlay network is virtual-circuit based
- Switching-state at each node
- E.g. Source/Destination of RTP stream, in transcoder
- Failure information need not propagate for recovery
- Problem of service-location separated from that of performance and liveness
- Cluster -> process/machine failures handled within
20Software Architecture
Service-Level Path Creation, Maintenance, Recovery
Service-Composition Layer
Link-State Propagation
Finding Overlay Entry/Exit
Location of Service Replicas
Link-State Layer
At-least-once UDP
Perf. Meas.
Liveness Detection
Peer-Peer Layer
Functionalities at the Cluster-Manager
21Layers of Functionality
- Why Link-State?
- Need full graph information
- Also, quick propagation of failure information
- Link-state flood overheads?
- Service-Composition layer
- Algorithm for service-composition
- Modified version of Dijkstra's
- To accommodate constraints in service-level path
- Additive metric (latency)
- Load-balancing metric
- Computational overheads?
- Signaling for path creation, recovery
- Downstream to upstream
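The slides do not spell out how Dijkstra's algorithm is modified. One common way to handle an ordered list of required services is to search over (node, services-applied-so-far) states, which the sketch below illustrates. It is an assumption made for illustration, not necessarily the author's algorithm; the function name, the graph encoding, and the zero cost for applying a service are all hypothetical.

```python
import heapq


def service_level_path(links, hosts, required, src, dst):
    """Lowest-latency path from src to dst that applies `required` in order.

    links:    {node: [(neighbor, latency), ...]}  (directed overlay links)
    hosts:    {node: set of service names available at that node}
    required: ordered list of service names to apply along the path
    Returns (total_latency, [(node, stage), ...]) or None if no path exists.
    """
    k = len(required)
    start, goal = (src, 0), (dst, k)
    dist, prev = {start: 0}, {}
    heap = [(0, src, 0)]
    while heap:
        d, node, stage = heapq.heappop(heap)
        if d > dist.get((node, stage), float("inf")):
            continue  # stale heap entry
        if (node, stage) == goal:
            break
        # Move along an overlay link without changing the stage.
        moves = [((nbr, stage), lat) for nbr, lat in links.get(node, [])]
        # Or apply the next required service here (assumed zero cost).
        if stage < k and required[stage] in hosts.get(node, ()):
            moves.append(((node, stage + 1), 0))
        for state, cost in moves:
            nd = d + cost
            if nd < dist.get(state, float("inf")):
                dist[state] = nd
                prev[state] = (node, stage)
                heapq.heappush(heap, (nd, state[0], state[1]))
    if goal not in dist:
        return None
    path, state = [goal], goal
    while state != start:
        state = prev[state]
        path.append(state)
    return dist[goal], path[::-1]


links = {"A": [("B", 10), ("C", 5)], "B": [("D", 10)], "C": [("D", 30)]}
hosts = {"B": {"tts"}, "C": {"tts"}}
print(service_level_path(links, hosts, ["tts"], "A", "D"))
```

With k required services the search runs over k+1 copies of the overlay graph, so its cost grows roughly linearly in k on top of the usual Dijkstra bound.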
22Link-State Overheads
- Link-state floods
- Twice for each failure
- For a 1,000-node graph
- Estimated number of edges: 10,000
- Failures (>1.8 sec outage): O(once an hour) in the worst case
- Only about 6 floods/second in the entire network!
- Graph computation
- O(k*E*log(N)) computation time, for k services composed
- For 6,510-node network, this takes 50ms
- Huge overhead, but path caching helps
- Memory: a few MB
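The flood-rate estimate above follows from simple arithmetic, reproduced here as a small check. The function name is made up for the example; the inputs are the figures on the slide (10,000 edges, one failure per edge per hour in the worst case, two floods per failure).

```python
def floods_per_second(num_edges, failures_per_edge_per_hour, floods_per_failure=2):
    """Network-wide link-state flood rate, in floods per second."""
    floods_per_hour = num_edges * failures_per_edge_per_hour * floods_per_failure
    return floods_per_hour / 3600.0


rate = floods_per_second(num_edges=10_000, failures_per_edge_per_hour=1)
print(round(rate, 2))  # 5.56, i.e. about 6 floods/second
```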
23Evaluation: Scaling
- Scaling bottleneck
- Simultaneous recovery of all client sessions on a failed overlay link
- Parameter
- Load: number of client sessions with a single overlay node as exit node
- Metric
- Average time-to-recovery of all paths failed and recovered
24Evaluation: Emulation Testbed
- Idea: Use real implementation, emulate the wide-area network behavior (NistNET)
- Opportunity: Millennium cluster
[Figure: emulation testbed. Labels: App, Lib, Emulator, Node 1 to Node 4; emulator rules for 1->2, 1->3, 3->4, 4->3]
25Scaling Evaluation Setup
- 20-node overlay network
- Created over 6,510 node physical network
- Physical network generated using GT-ITM
- Latency variation according to Acharya & Saltz 1995
- Load per cluster-manager (CM)
- Vary from 25 to 500
- Paths setup using latency metric
- 12 different runs
- Deterministic failure of link with maximum client paths
- Worst-case in single-link failure
26Average Time-to-Recovery vs. Load
27CDF of recovery times of all failed paths
28Path creation: load-balancing metric
- So far used a latency metric
- In combination with modified Dijkstra's algorithm
- Not good for balancing load
- How to balance load across service instances?
- During path creation and path recovery
- QoS literature
- Sum(1/available-bandwidth) for bandwidth balancing
- Applying this for server load balancing
- Metric: Sum(1/(max_load - curr_load))
- Study interaction with
- Link-state update interval
- Failure recovery
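To make the load-balancing metric concrete, here is a small sketch that scores two candidate paths by summing 1/(max_load - curr_load) over the service instances they use. The function names and load figures are invented for the example, and treating a saturated instance as infinite cost is an assumption.

```python
def instance_cost(curr_load, max_load):
    """Cost of one service instance: 1 / remaining capacity."""
    headroom = max_load - curr_load
    if headroom <= 0:
        return float("inf")  # assumption: a saturated instance is unusable
    return 1.0 / headroom


def path_cost(instances):
    """Sum of instance costs; instances is a list of (curr_load, max_load)."""
    return sum(instance_cost(curr, mx) for curr, mx in instances)


# Two hypothetical paths carrying the same total load (100 sessions).
uneven = [(10, 100), (90, 100)]   # one instance nearly full
even = [(50, 100), (50, 100)]     # load spread equally
print(round(path_cost(uneven), 4), round(path_cost(even), 4))
```

The cost of an instance rises sharply as it nears capacity, so the path with evenly loaded instances scores lower here even though both paths carry the same total load.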
29Load variation across replicas
30Dealing with load variation
- Decreasing link-state update interval
- More messages
- Could lead to instability
- Use path-setup messages to update load
- Do it all along the path
- Each node that sees the path setup message
- Adds its load info to the message
- Records all load info collected so far
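The piggy-backing idea above can be sketched as a path-setup message that accumulates load information as it travels along the path. The class, field, and function names below are hypothetical, and the sketch ignores the real signaling protocol; it only shows the "add own load, record everything seen so far" behaviour.

```python
class OverlayNode:
    def __init__(self, name, load):
        self.name = name
        self.load = load
        self.known_loads = {}  # this node's view of loads, learned from messages

    def handle_path_setup(self, message):
        # Add this node's load to the message ...
        message["loads"][self.name] = self.load
        # ... and record all load info collected so far.
        self.known_loads.update(message["loads"])
        return message


def setup_path(nodes_in_signaling_order):
    """Pass one path-setup message through the nodes, in signaling order."""
    message = {"loads": {}}
    for node in nodes_in_signaling_order:
        node.handle_path_setup(message)
    return message


a, b, c = OverlayNode("a", 5), OverlayNode("b", 20), OverlayNode("c", 7)
final = setup_path([a, b, c])
print(c.known_loads)  # the last node has learned the loads of a, b and c
```

Nodes later in the signaling order learn more than earlier ones, so in this sketch fresh load information comes at no extra message cost only for nodes that share paths.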
31Load variation with piggy-back
32Load-balancing effect on path length
33Fixing the long-path effect
Metric: Sum_services(1/(max_load-curr_load)) + Sum_noop(0.1/(max_load-curr_load))
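One reading of this metric is that pass-through ("noop") overlay hops also contribute to path cost, at one tenth the weight of a hop that runs a service, so long detours are no longer free. The sketch below follows that reading; the hop encoding, function name, and load values are assumptions for illustration.

```python
NOOP_WEIGHT = 0.1  # weight of a pass-through hop, taken from the slide


def weighted_path_cost(hops, noop_weight=NOOP_WEIGHT):
    """hops: list of (kind, curr_load, max_load), kind is 'service' or 'noop'."""
    total = 0.0
    for kind, curr_load, max_load in hops:
        headroom = max_load - curr_load
        if headroom <= 0:
            return float("inf")  # assumption: saturated node is unusable
        weight = 1.0 if kind == "service" else noop_weight
        total += weight / headroom
    return total


# Hypothetical choice: a direct path to a busier replica vs. a detour
# through ten pass-through nodes to a lightly loaded replica.
direct = [("service", 60, 100)]
detour = [("service", 20, 100)] + [("noop", 50, 100)] * 10

print(round(weighted_path_cost(direct), 4))                  # 0.025
print(round(weighted_path_cost(detour), 4))                  # 0.0325
print(round(weighted_path_cost(detour, noop_weight=0), 4))   # 0.0125
```

With the noop term set to zero the detour looks cheaper than the direct path; with the 0.1 weight the ten extra hops outweigh the lighter replica, which is the long-path effect the metric is meant to curb.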
34Fixing the long-path effect
35Wide-Area experiments setup
- 8 nodes
- Berkeley, Stanford, UCSD, CMU
- Cable modem (Berkeley)
- DSL (San Francisco)
- UNSW (Australia), TU-Berlin (Germany)
- Text-to-speech composed sessions
- Half with destinations at Berkeley, CMU
- Half with recovery algo enabled, other half disabled
- 4 paths in system at any time
- Duration of session: 2min 30sec
- Run for 4 days
- Metric: loss-rate measured in 5sec intervals
36Loss-rate for a pair of paths
37CDF of loss-rates of all failed paths
38CDF of gaps seen at client
39Split of recovery time
- Text-to-Speech application
- Two possible places of failure
40Split of Recovery Time (continued)
- Recovery time
- Failure detection time
- Signaling time to setup alternate path
- State restoration time
- Experiment using tts application, using emulation
- Recovery time: 3,300ms
- 1,800ms failure detection time
- 700ms signaling
- 450ms for state restoration
- New tts engine has to re-process current sentence
41Summary
- Wide-area Internet paths have poor availability
- Availability issues in composed sessions
- Architecture based on overlay network of service clusters
- Failure detection feasible in about 2 sec
- Software-arch scales with clients
- WA experiments show improvement in availability
- Further scaling experiments in overlay nodes
needed