Experimentally inferring runtime datacenter dependencies with X-Trace

1
Experimentally inferring runtime datacenter
dependencies with X-Trace
  • George Porter - Winter 2007 retreat

2
Relevance to RAD Lab
(datacenter cluster or RAMP)
[Figure: per-node SW stack. A Policy Maker issues actuator commands and consumes sensor data plus path traces and statistics. The stack layers Web 2.0 applications, a Ruby on Rails interpreter, and Web Svc APIs over a Load-Balancer (IDLB), Intrusion-Detection Service (IDIDS), and Firewall (IDF), an Identity-based Routing Layer, local OS functions, and a Virtual Machine Monitor; tracing hooks include trace, X-Trace, lib log, and D-trigger.]
1. Energy? 2. 1 person run Killer Apps?
3
Simple datacenter scenario
  • Applications
  • Web
  • Email
  • Network services
  • Remote file storage (NFS)
  • Naming (DNS)
  • Composition
  • Service path
  • Multilayer
  • Task tree

4
Example scenario: observed task tree with X-Trace
  • Multilayer service path or task tree
  • Static dependencies
  • Shared nodes and edges
  • Runtime dependencies
  • Concurrent requests sharing a node or edge
  • Typically in a way that affects throughput,
    delay, or performance

5
Flash traffic effect on dependent services
  • DNS dependency example
  • Email server checks for spam by examining
    hostnames
  • Web server uses the client's hostname for access
    control
  • Surge of junk email arrives
  • Spam checker floods requests to the DNS server
  • DNS server becomes overloaded; DNS latency
    increases
  • Web authentication latency increases
  • Web throughput decreases

Different applications (web, email) interfere
with each other; the source of the degraded
performance is non-obvious.
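The cascade above is essentially a queueing effect. As a back-of-envelope illustration (assumed M/M/1 numbers, not from the talk), a shared DNS server's mean response time blows up as spam-driven lookups push its utilization toward 1, and the web tier inherits that delay:

```python
# Illustrative only: treat the DNS server as an M/M/1 queue.
# Service time and utilization values below are invented.
def mm1_latency_ms(service_ms, rho):
    """Mean response time of an M/M/1 queue at utilization rho < 1."""
    return service_ms / (1.0 - rho)

dns_service_ms = 2.0  # assumed per-lookup service time
for rho in (0.5, 0.9, 0.98):
    print(f"rho={rho}: DNS latency ~{mm1_latency_ms(dns_service_ms, rho):.0f} ms")
```

At rho = 0.98 the same 2 ms lookup takes ~100 ms, which then shows up as web authentication latency even though the web service itself is healthy.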
6
Relevance of problem to RAD Lab
  • Flash traffic / unexpected traffic patterns
  • What will happen during site growth?
  • Migration for power savings
  • Two perspectives on mechanism
  • Reduce offered capacity to save power
  • vs increasing offered capacity to handle
    increased demands
  • Virtualization (Identity-based routing layer)
  • Independent, virtualized services might be
    co-located
  • However, this layer may help if it provides a
    service path
  • This work is in its initial stages; comments and
    feedback are greatly appreciated

7
Assumptions
  • X-Trace deployed at least partially, creating
    regions of network observations
  • Can measure req/sec, transaction and operation
    rates, latencies, flow rates
  • Interested in inferring dependencies on
    unobserved resources
  • links, back-end servers, etc.
  • Can generate and/or delay network traffic at key
    points in the network
  • Used for proactive analysis discussed later

8
Summary of Approach
  • Observe
  • Dynamic, path-based data
  • Network policy, SLAs, QoS, service topology, etc.
  • Analyze
  • Model expected service performance based on path
    observations
  • Use deviations from models to infer dependencies
    and find correlations
  • Act
  • Modify network (inject or delay traffic) to
    test correlations
  • e.g., Delay selected traffic and look for effect
    elsewhere
  • Identify dependencies before demands affect
    service behavior

9
System Observations
  • Macro-level connectivity behavior
  • A layer above X-Trace: from individual paths to
    operations/sec, latencies over time, flow rates,
    and service topology
  • Views from multiple network locations
  • Develop algorithms
  • Inspect and measure relevant paths
  • Start with detailed service semantics, then try
    to generalize

10
Managing observations
  • Semantic complexity
  • Host
  • Naming, directory services, remote file storage,
    authentication
  • Middleware
  • DB pooling (Tomcat), container-managed
    persistence, RoR
  • Application
  • Need application knowledge
  • Most variability
  • Difficult to predict behavior given request(s)
  • Integrating dependency information with policy
  • Instrumentation backplane
  • Cross-layer visibility as a service [Kompella05]

Increasing difficulty
11
Model extraction and update
  • Modeling based on observations
  • Representative timeframe and workload?
  • Stationary: for how long?
  • Non-stationary: changes over time, day-of-week,
    holidays, etc.
  • Updating the model
  • Frequency, triggering action (new hw, links,
    software, O/S versions)
  • Modeling based on workload
  • AWE-gen tie-in
  • Modeling based on policy
  • SLA, QoS parameters, Middlebox policies
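One simple way to realize "updating the model" under non-stationary behavior is an exponentially weighted moving average, so the baseline tracks slow drift without storing full history. This is an illustrative sketch only (class name and alpha are assumptions, not the talk's mechanism):

```python
# Hypothetical per-service latency model: an EWMA baseline that is
# refreshed with each observation, so day-of-week drift is absorbed
# while sudden surges still stand out. alpha is an assumed constant.
class LatencyModel:
    def __init__(self, alpha=0.2):
        self.alpha = alpha  # weight given to the newest observation
        self.mean = None    # current baseline estimate (ms)

    def update(self, observed_ms):
        """Fold a new latency observation into the baseline."""
        if self.mean is None:
            self.mean = observed_ms
        else:
            self.mean = self.alpha * observed_ms + (1 - self.alpha) * self.mean
        return self.mean

    def deviation(self, observed_ms):
        """Distance of a new sample from the modeled baseline."""
        return 0.0 if self.mean is None else observed_ms - self.mean

model = LatencyModel()
for sample in [10.0, 11.0, 9.5, 10.5]:  # made-up quiet-period latencies
    model.update(sample)
print(round(model.deviation(25.0), 2))  # prints 14.85: a surge stands out
```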

12
Performance anomaly detection
  • Goal: detect dependencies
  • Based on determining when expected behavior
    differs from observed
  • Across services (E-mail volume affecting web
    authentication latency, thus throughput)
  • Despite typical service variability
  • Develop algorithms to detect whether deviations
  • are related
  • are strong enough
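One way to test whether deviations in two services "are related" is to correlate their residuals (observed minus modeled values) per time window, e.g. DNS load driven by e-mail vs. web authentication latency. The sketch below is illustrative, not the talk's algorithm, and the data is made up:

```python
# Correlate residual time series from two services; a high Pearson
# coefficient during the same windows hints at a shared dependency.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)  # assumes neither series is constant

# Residuals per window (invented): spam surge hits windows 3-5.
dns_residuals      = [0.1, 0.0, 2.5, 3.1, 2.8, 0.2]
web_auth_residuals = [0.0, 0.1, 1.9, 2.4, 2.2, 0.1]

r = pearson(dns_residuals, web_auth_residuals)
if r > 0.8:  # assumed strength threshold
    print("deviations strongly correlated: suspect a shared dependency")
```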

13
Deviation from expected signature: shared
back-end server
14
Deviation from expected signature: shared link
15
Link vs server contention?
16
Proactive dependency discovery
  • Selectively inject or delay traffic and observe
    the result
  • From source → sink over a shared link
  • Result
  • Alternative
  • Delay web1 → NFS1 and observe the result

[Figure: clients connect to web1 and web2, which use back-end servers NFS1 and NFS2; annotation (1) marks delaying web1 → NFS1 traffic, annotation (2) marks injecting source → sink traffic over the shared link.]
17
Proactive dependency discovery
  • Automatically determining experimental plan
  • Dynamically on-demand, or based on policy
  • Intervention intensity and duration
  • Too much → disrupt traffic
  • Too little → miss dependency
  • Detecting effect via change point detection
  • Stationarity test, measure means
  • More at poster session
  • Delay rate R1, measure rate R2 and latency u2
  • Treat u2 as a time series, look for deviations
    during experiment
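The change-point idea on this slide can be sketched simply: compare the mean of the u2 latency series before and during the experiment, and flag a dependency when the shift exceeds a few baseline standard deviations. Thresholds and data below are assumptions, not the poster's method:

```python
# Crude mean-shift detector: treat u2 as a time series and flag a
# change when the experiment-window mean moves > k sigma from baseline.
from statistics import mean, stdev

def mean_shift(baseline, during, k=3.0):
    """True if `during` deviates from `baseline` by more than k sigma."""
    base_mu, base_sigma = mean(baseline), stdev(baseline)
    return abs(mean(during) - base_mu) > k * base_sigma

u2_before = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8]  # latency (ms) before delaying R1
u2_during = [9.7, 10.2, 9.9, 10.4]          # latency while R1 is delayed

print(mean_shift(u2_before, u2_during))  # prints True: dependency suspected
```

A real detector would also check that the baseline passes a stationarity test first, as the slide notes, so the threshold comparison is meaningful.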

18
Summary / Discussion
  • Detect runtime dependencies using an
    observe/analyze/act approach
  • Based on path traces collected with X-Trace
  • Uses
  • Handling unexpected traffic, understanding
    service behavior despite virtualization and
    migration
  • Initial stages, welcome feedback

19
Backup slides
20
X-Trace overview and status
  • Collect path-based traces
  • Across layers, devices, and ASes
  • Design principles
  • Trace request sent in-band
  • Trace data collected out of band
  • Decouple trace requestor from trace receiver
  • Components
  • Per-layer metadata
  • Host and server modification to propagate
    metadata
  • Reporting and aggregation framework
  • OpenDHT, I3, SQL
  • Progress
  • New implementation
  • HTTP, IP, I3, SQL, Chord
  • C and Java
  • Early adopters
  • DONA, Coral CDN
  • To appear at NSDI '07
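The metadata propagation above can be caricatured in a few lines. This sketch only loosely follows the X-Trace design (the propagation operations are conventionally called pushNext and pushDown); the field sizes and class shape here are assumptions:

```python
# Simplified X-Trace-style metadata: every operation carries the same
# TaskID (so reports can be stitched into one task tree) plus a fresh
# OpID. push_next propagates within a layer; push_down crosses layers.
import os

class XTraceMetadata:
    def __init__(self, task_id, op_id):
        self.task_id = task_id  # identifies the whole task (request)
        self.op_id = op_id      # identifies this operation/event

    def _next(self):
        # Same task, fresh operation id (assumed 32-bit, hex-encoded).
        return XTraceMetadata(self.task_id, os.urandom(4).hex())

    def push_next(self):
        """Propagate to the next operation at the same layer."""
        return self._next()

    def push_down(self):
        """Propagate to an operation at the layer below."""
        return self._next()

root = XTraceMetadata(task_id=os.urandom(8).hex(), op_id="0" * 8)
http_op = root.push_next()    # e.g. HTTP -> HTTP (next hop, same layer)
tcp_op = http_op.push_down()  # e.g. HTTP -> TCP (layer below)
print(root.task_id == tcp_op.task_id)  # prints True: one task tree
```

The in-band part is carrying this metadata inside each protocol's messages; reports keyed by (task_id, op_id) then flow out of band to the collection framework.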

21
Role of workload
  • Increasing the request rate may not affect the
    service under test
  • Due to caching, fast paths, middlebox
    interception, etc.
  • E.g. workload consisting of a single page served
    from RAM
  • Services often optimized for certain requests
  • SQL requests to indexed vs non-indexed data
  • Router fast path, vs slow-path

22
Remote file storage behavior under load
  • File storage: an application with well-known
    semantics
  • Absent contention, we would expect this behavior
    at runtime
  • NFS write performance
  • File size ↑, throughput ↑
  • Deviation from expected
  • File size ↑, throughput ↓
  • This could be an indication of resource
    contention/dependency
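The expected signature above can be encoded as a tiny check (function name and numbers are invented for illustration): throughput should not fall as file size grows, and a run that violates this hints at contention:

```python
# Flag NFS write runs whose throughput drops as file size grows,
# contradicting the expected contention-free signature.
def expected_trend(sizes_mb, throughputs_mbps):
    """True if throughput is non-decreasing as file size grows."""
    pairs = sorted(zip(sizes_mb, throughputs_mbps))
    return all(b[1] >= a[1] for a, b in zip(pairs, pairs[1:]))

# Observed run (made-up numbers): throughput collapses for large files.
sizes = [1, 4, 16, 64]
tput  = [20.0, 45.0, 60.0, 22.0]

if not expected_trend(sizes, tput):
    print("deviation from expected signature: possible contention")
```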

[Figure: clients drive web and email services, which share back-end NFS and DNS servers.]