Title: Wide-Area Service Composition: Performance, Availability and Scalability
1 Wide-Area Service Composition: Performance, Availability and Scalability
- Bhaskaran Raman
- SAHARA, EECS, U.C.Berkeley
- Presentation at Ericsson, Jan 2002
2 Service Composition: Motivation
[Figure: service-level paths composed across Providers A, B, Q and R. Labels: Video-on-demand server, Email repository, Transcoder, Text to speech, Thin Client, Cellular Phone, Service-Level Path]
Reuse, Flexibility
Other examples: ICEBERG, IETF [OPES00]
3 In this work: Goals
- Performance: Choice of Service Instances
- Availability: Detecting and Handling Failures
- Scalability: Internet-scale operation
4 In this work: Assumptions and Non-goals
- Operational model
- Service providers deploy different services at various network locations
- Next generation portals compose services
- Code is NOT mobile (mutually untrusting service providers)
- We do not address the service interface issue
- Assume that service instances have no persistent state
- Not very restrictive [OPES00]
5 Solution: Requirements
- Performance information collection
- Failure detection/liveness tracking
- Service location
- Global information is required
- Hop-by-hop approach will not work
[Figure: hop-by-hop approach]
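To illustrate why a hop-by-hop approach can fall short, the following minimal sketch compares greedy per-hop selection of service instances with a search that uses global information. The topology, the latency values, and the names (LATENCY, STAGES, hop_by_hop, global_best) are invented for illustration and do not come from the talk.

```python
from itertools import product

# Hypothetical one-way latencies (ms) between locations; all values are invented.
LATENCY = {
    ("src", "t1"): 10, ("src", "t2"): 30,
    ("t1", "s1"): 80,  ("t1", "s2"): 90,
    ("t2", "s1"): 20,  ("t2", "s2"): 25,
    ("s1", "dst"): 40, ("s2", "dst"): 10,
}

# Candidate instances (replicas) for each service, in composition order.
STAGES = [["t1", "t2"], ["s1", "s2"]]


def path_cost(path):
    """Total latency of a path given as a list of node names."""
    return sum(LATENCY[(a, b)] for a, b in zip(path, path[1:]))


def hop_by_hop(src, dst):
    """Greedy: at each stage pick the instance closest to the current node."""
    path = [src]
    for instances in STAGES:
        path.append(min(instances, key=lambda n: LATENCY[(path[-1], n)]))
    path.append(dst)
    return path


def global_best(src, dst):
    """With global information: evaluate every combination of instances."""
    candidates = ([src, *combo, dst] for combo in product(*STAGES))
    return min(candidates, key=path_cost)


greedy = hop_by_hop("src", "dst")
best = global_best("src", "dst")
print(greedy, path_cost(greedy))  # ['src', 't1', 's1', 'dst'] 130
print(best, path_cost(best))      # ['src', 't2', 's2', 'dst'] 65
```

In this made-up topology the greedy choice commits to the nearest first instance and ends up on a path twice as long as the one found with global information. Exhaustive enumeration is used only to keep the sketch short.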
6 Challenges
- Scalability and Global information
- Information about all service instances, and network paths in-between should be known
- Quick failure detection and recovery
- Internet dynamics → intermittent congestion
- System evaluation
- Simulation?
- Real implementation?
7 Architecture
- Overlay can grow slowly
- Amortization of overhead
- Hierarchical monitoring
8 Software Architecture
[Figure: Functionalities at the Cluster-Manager]
- Service-Composition Layer: Service-Level Path Creation, Maintenance, Recovery
- Link-State Layer: Link-State Propagation
- Peer-Peer Layer: At-least-once UDP, Performance Measurement, Liveness Detection
- Finding Overlay Entry/Exit
- Location of Service Replicas
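As a rough illustration of the liveness detection listed for the peer-peer layer, the sketch below tracks heartbeats and declares a peer dead after a silence longer than a timeout. The class name LivenessTracker, the node names, and the heartbeat schedule are hypothetical. The 1800 ms timeout is an assumption chosen to echo the detection time reported later in the talk, not a documented parameter of the system.

```python
class LivenessTracker:
    """Declares a peer dead when no heartbeat arrived within timeout_ms."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.last_seen = {}  # peer id -> time (ms) of the last heartbeat

    def heartbeat(self, peer, now_ms):
        """Record that a heartbeat from `peer` arrived at time now_ms."""
        self.last_seen[peer] = now_ms

    def dead_peers(self, now_ms):
        """Peers whose silence exceeds the timeout at time now_ms."""
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now_ms - seen > self.timeout_ms
        )


tracker = LivenessTracker(timeout_ms=1800)  # assumed timeout value
tracker.heartbeat("node-2", 0)
tracker.heartbeat("node-3", 0)              # last heartbeat from node-3
tracker.heartbeat("node-2", 1500)
early = tracker.dead_peers(now_ms=1500)     # node-3 silent for 1500 ms
tracker.heartbeat("node-2", 3000)
late = tracker.dead_peers(now_ms=3000)      # node-3 silent for 3000 ms
print(early, late)                          # [] ['node-3']
```

The timeout is the main design knob: a shorter value detects failures sooner, but intermittent congestion is then more likely to be mistaken for a failure.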
9 Evaluation: Emulation Testbed
- Idea: Use real implementation, emulate the wide-area network behavior (NIST Net)
- Opportunity: Millennium cluster
[Figure: emulation testbed. Labels: App, Lib, Emulator, Node 1 to Node 4; emulator rules for 1→2, 1→3, 3→4, 4→3]
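The figure labels suggest that the emulator holds one rule per directed node pair. The following sketch shows one way such a rule table could be modeled. It is not NIST Net's interface or configuration format; the Rule fields, the emulate function, and every delay and loss value are assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    delay_ms: float   # one-way latency added to each packet
    loss_prob: float  # probability that a packet is dropped


# Directed rules for the pairs named in the figure; the numbers are invented.
RULES = {
    (1, 2): Rule(delay_ms=40.0, loss_prob=0.0),
    (1, 3): Rule(delay_ms=75.0, loss_prob=0.0),
    (3, 4): Rule(delay_ms=20.0, loss_prob=1.0),  # emulated link failure
    (4, 3): Rule(delay_ms=20.0, loss_prob=0.0),
}


def emulate(src, dst, send_time_ms, rng=random):
    """Return the delivery time in ms, or None if the packet is dropped."""
    rule = RULES.get((src, dst))
    if rule is None:
        raise KeyError(f"no rule for {src}->{dst}")
    if rng.random() < rule.loss_prob:
        return None
    return send_time_ms + rule.delay_ms


print(emulate(1, 2, send_time_ms=0.0))  # 40.0
print(emulate(3, 4, send_time_ms=0.0))  # None (link emulated as failed)
print(emulate(4, 3, send_time_ms=5.0))  # 25.0
```

Keeping separate rules for 3→4 and 4→3 lets the two directions of a link behave differently, which is how a controlled one-way failure could be expressed in this model.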
10 Evaluation: Recovery of Application Session
- Text-to-Audio application
- Two possible places of failure
- Setup
- 20-node overlay network
- No service instance replicas
- Deterministic failure for 10 sec during session
- Metric: gap between arrival of successive audio packets at the client
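The gap metric can be computed from a list of packet arrival timestamps. The sketch below uses an invented arrival trace, and the helper names gaps_ms and cdf_of_large_gaps are hypothetical. The 100 ms threshold follows the title of the results slide.

```python
def gaps_ms(arrivals_ms):
    """Gaps between successive packet arrival times (ms)."""
    return [b - a for a, b in zip(arrivals_ms, arrivals_ms[1:])]


def cdf_of_large_gaps(gaps, threshold_ms=100):
    """Empirical CDF over gaps above threshold_ms, as (gap, fraction <= gap)."""
    large = sorted(g for g in gaps if g > threshold_ms)
    return [(g, (i + 1) / len(large)) for i, g in enumerate(large)]


# Invented trace: packets every 20 ms, one short stall, one long outage.
arrivals = [0, 20, 40, 60, 180, 200, 220, 3020, 3040]
gaps = gaps_ms(arrivals)
print(gaps)                     # [20, 20, 20, 120, 20, 20, 2800, 20]
print(cdf_of_large_gaps(gaps))  # [(120, 0.5), (2800, 1.0)]
```

Restricting the CDF to gaps above the threshold keeps the regular inter-packet spacing from dominating the distribution, so the stalls caused by failure and recovery stand out.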
11 Recovery of Application Session: CDF of gaps > 100 ms
[Figure: CDF of gaps > 100 ms, with annotations]
Recovery time: 822 ms
Recovery time: 2963 ms (Detection: 1800 ms, Alternate path setup: 1163 ms)
Recovery time: 10,000 ms
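The 2963 ms figure is consistent with the sum of the two components listed with it. A quick check, using only the numbers on the slide (the variable names are illustrative):

```python
detection_ms = 1800
alternate_path_setup_ms = 1163

recovery_ms = detection_ms + alternate_path_setup_ms
detection_share = round(100 * detection_ms / recovery_ms)

print(recovery_ms)      # 2963
print(detection_share)  # 61 (percent of recovery time spent on detection)
```

Under this reading, failure detection accounts for roughly three fifths of the recovery time in that case.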
12 Evaluation: Scaling
- Scaling bottleneck
- Simultaneous recovery of all client sessions on a failed overlay link
- Parameter: load on the failed link (number of paths to be recovered)
- Metric: Time to recovery
13 Average Time-to-Recovery
- Total of 5000 paths in 20-node overlay network
- Two services in each path
- Two replicas per service
- Each data-point is a separate run
- Controlled link-failure
14 Evaluation: Scaling
- At a load of 695 paths on the failed edge
- Average path recovery time: 575 ms
- All paths recover within 1.5 sec
- Back calculation of the number of simultaneous clients a cluster manager can support: 350
- Okay for heavy-weight services (text-to-speech)
15 Summary
- Service Composition: flexible service creation
- We address performance, availability, scalability
- Results so far
- Good recovery time for real-time applications: O(3 sec)
- Good scalability: minimal additional provisioning for cluster managers
- Ongoing work
- Overlay topology issues: how many nodes
- Stability issues
- Trade-offs in recovery mechanisms
Feedback, Questions?
Presentation made using VMWare
16 References
- [OPES00] A. Beck et al., Example Services for Network Edge Proxies, Internet Draft, draft-beck-opes-esfnep-01.txt, Nov 2000