Experimentally inferring runtime datacenter dependencies with X-Trace

1
Experimentally inferring runtime datacenter
dependencies with X-Trace
  • George Porter - Winter 2007 retreat

2
Relevance to RAD Lab
(datacenter cluster or RAMP)
[Figure: per-node SW stack. A Policy Maker issues actuator commands and consumes sensor data plus path traces and statistics. The stack layers Web 2.0 applications, a Ruby on Rails interpreter, and Web Svc APIs over a Load-Balancer (IDLB), Intrusion-Detection Service (IDIDS), and Firewall (IDF), an Identity-based Routing Layer, local OS functions, and a Virtual Machine Monitor; tracing hooks include trace, X-Trace, lib log, and D-trigger.]
1. Energy? 2. 1 person run Killer Apps?
3
Simple datacenter scenario
  • Applications
  • Web
  • Email
  • Network services
  • Remote file storage (NFS)
  • Naming (DNS)
  • Composition
  • Service path
  • Multilayer
  • Task tree

4
Example scenario: observed task tree with X-Trace
  • Multilayer service path or task tree
  • Static dependencies
  • Shared nodes and edges
  • Runtime dependencies
  • Concurrent requests sharing a node or edge
  • Typically in a way that affects throughput,
    delay, or performance

5
Flash traffic effect on dependent services
  • DNS dependency example
  • Email server checks for spam by examining
    hostnames
  • Web server uses the client's hostname for access
    control
  • Surge of junk email arrives
  • Spam checker floods requests to the DNS server
  • DNS server becomes overloaded; DNS latency
    increases
  • Web authentication latency increases
  • Web throughput decreases

Different applications (web, email) interfere
with each other; the source of the degraded
performance is non-obvious.
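The cascade above is essentially a queueing effect. As a back-of-envelope illustration (assumed M/M/1 numbers, not from the talk), a shared DNS server's mean response time blows up as spam-driven lookups push its utilization toward 1, and the web tier inherits that delay:

```python
# Illustrative only: treat the DNS server as an M/M/1 queue.
# Service time and utilization values below are invented.
def mm1_latency_ms(service_ms, rho):
    """Mean response time of an M/M/1 queue at utilization rho < 1."""
    return service_ms / (1.0 - rho)

dns_service_ms = 2.0  # assumed per-lookup service time
for rho in (0.5, 0.9, 0.98):
    print(f"rho={rho}: DNS latency ~{mm1_latency_ms(dns_service_ms, rho):.0f} ms")
```

At rho = 0.98 the same 2 ms lookup takes ~100 ms, which then shows up as web authentication latency even though the web service itself is healthy.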
6
Relevance of problem to RAD Lab
  • Flash traffic / unexpected traffic patterns
  • What will happen during site growth?
  • Migration for power savings
  • Two perspectives on mechanism
  • Reduce offered capacity to save power
  • vs increasing offered capacity to handle
    increased demands
  • Virtualization (Identity-based routing layer)
  • Independent, virtualized services might be
    co-located
  • However, this layer may help if it provides a
    service path
  • This work is in its initial stages; comments and
    feedback are greatly appreciated

7
Assumptions
  • X-Trace deployed at least partially, creating
    regions of network observations
  • Can measure req/sec, transaction and operation
    rates, latencies, flow rates
  • Interested in inferring dependencies on
    unobserved resources
  • links, back-end servers, etc.
  • Can generate and/or delay network traffic at key
    points in the network
  • Used for proactive analysis discussed later

8
Summary of Approach
  • Observe
  • Dynamic, path-based data
  • Network policy, SLAs, QoS, service topology, etc.
  • Analyze
  • Model expected service performance based on path
    observations
  • Use deviations from models to infer dependencies
    and find correlations
  • Act
  • Modify network (inject or delay traffic) to
    test correlations
  • e.g., Delay selected traffic and look for effect
    elsewhere
  • Identify dependencies before demands affect
    service behavior

9
System Observations
  • Macro-level connectivity behavior
  • A layer above X-Trace: from individual paths to
    operations/sec, latencies over time, flow rates,
    and service topology
  • Views from multiple network locations
  • Develop algorithms
  • Inspect and measure relevant paths
  • Start with detailed service semantics, then try
    to generalize

10
Managing observations
  • Semantic complexity
  • Host
  • Naming, directory services, remote file storage,
    authentication
  • Middleware
  • DB pooling (Tomcat), container-managed
    persistence, RoR
  • Application
  • Need application knowledge
  • Most variability
  • Difficult to predict behavior given request(s)
  • Integrating dependency information with policy
  • Instrumentation backplane
  • Cross-layer visibility as a service [Kompella05]

Increasing difficulty
11
Model extraction and update
  • Modeling based on observations
  • Representative timeframe and workload?
  • Stationary: for how long?
  • Non-stationary: changes over time, day-of-week,
    holidays, etc.
  • Updating the model
  • Frequency, triggering action (new hw, links,
    software, O/S versions)
  • Modeling based on workload
  • AWE-gen tie-in
  • Modeling based on policy
  • SLA, QoS parameters, Middlebox policies
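One simple way to realize "updating the model" under non-stationary behavior is an exponentially weighted moving average, so the baseline tracks slow drift without storing full history. This is an illustrative sketch only (class name and alpha are assumptions, not the talk's mechanism):

```python
# Hypothetical per-service latency model: an EWMA baseline that is
# refreshed with each observation, so day-of-week drift is absorbed
# while sudden surges still stand out. alpha is an assumed constant.
class LatencyModel:
    def __init__(self, alpha=0.2):
        self.alpha = alpha  # weight given to the newest observation
        self.mean = None    # current baseline estimate (ms)

    def update(self, observed_ms):
        """Fold a new latency observation into the baseline."""
        if self.mean is None:
            self.mean = observed_ms
        else:
            self.mean = self.alpha * observed_ms + (1 - self.alpha) * self.mean
        return self.mean

    def deviation(self, observed_ms):
        """Distance of a new sample from the modeled baseline."""
        return 0.0 if self.mean is None else observed_ms - self.mean

model = LatencyModel()
for sample in [10.0, 11.0, 9.5, 10.5]:  # made-up quiet-period latencies
    model.update(sample)
print(round(model.deviation(25.0), 2))  # prints 14.85: a surge stands out
```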

12
Performance anomaly detection
  • Goal: detect dependencies
  • Based on determining when expected behavior
    differs from observed
  • Across services (E-mail volume affecting web
    authentication latency, thus throughput)
  • Despite typical service variability
  • Develop algorithms to detect whether deviations
  • are related
  • are strong enough
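One way to test whether deviations in two services "are related" is to correlate their residuals (observed minus modeled values) per time window, e.g. DNS load driven by e-mail vs. web authentication latency. The sketch below is illustrative, not the talk's algorithm, and the data is made up:

```python
# Correlate residual time series from two services; a high Pearson
# coefficient during the same windows hints at a shared dependency.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)  # assumes neither series is constant

# Residuals per window (invented): spam surge hits windows 3-5.
dns_residuals      = [0.1, 0.0, 2.5, 3.1, 2.8, 0.2]
web_auth_residuals = [0.0, 0.1, 1.9, 2.4, 2.2, 0.1]

r = pearson(dns_residuals, web_auth_residuals)
if r > 0.8:  # assumed strength threshold
    print("deviations strongly correlated: suspect a shared dependency")
```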

13
Deviation from expected signature: shared
back-end server
14
Deviation from expected signature: shared link
15
Link vs server contention?
16
Proactive dependency discovery
  • Selectively inject or delay traffic and observe
    the result
  • From source → sink over a shared link
  • Result
  • Alternative
  • Delay web1 → NFS1 and observe the result

[Figure: clients connect to web1 and web2, which use back-end servers NFS1 and NFS2; annotation (1) marks delaying web1 → NFS1 traffic, annotation (2) marks injecting source → sink traffic over the shared link.]
17
Proactive dependency discovery
  • Automatically determining experimental plan
  • Dynamically on-demand, or based on policy
  • Intervention intensity and duration
  • Too much → disrupt traffic
  • Too little → miss dependency
  • Detecting effect via change point detection
  • Stationarity test, measure means
  • More at poster session
  • Delay rate R1, measure rate R2 and latency u2
  • Treat u2 as a time series, look for deviations
    during experiment
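The change-point idea on this slide can be sketched simply: compare the mean of the u2 latency series before and during the experiment, and flag a dependency when the shift exceeds a few baseline standard deviations. Thresholds and data below are assumptions, not the poster's method:

```python
# Crude mean-shift detector: treat u2 as a time series and flag a
# change when the experiment-window mean moves > k sigma from baseline.
from statistics import mean, stdev

def mean_shift(baseline, during, k=3.0):
    """True if `during` deviates from `baseline` by more than k sigma."""
    base_mu, base_sigma = mean(baseline), stdev(baseline)
    return abs(mean(during) - base_mu) > k * base_sigma

u2_before = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8]  # latency (ms) before delaying R1
u2_during = [9.7, 10.2, 9.9, 10.4]          # latency while R1 is delayed

print(mean_shift(u2_before, u2_during))  # prints True: dependency suspected
```

A real detector would also check that the baseline passes a stationarity test first, as the slide notes, so the threshold comparison is meaningful.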

18
Summary / Discussion
  • Detect runtime dependencies using an
    observe/analyze/act approach
  • Based on path traces collected with X-Trace
  • Uses
  • Handling unexpected traffic, understanding
    service behavior despite virtualization and
    migration
  • Initial stages, welcome feedback

19
Backup slides
20
X-Trace overview and status
  • Collect path-based traces
  • Across layers, devices, and ASes
  • Design principles
  • Trace request sent in-band
  • Trace data collected out of band
  • Decouple trace requestor from trace receiver
  • Components
  • Per-layer metadata
  • Host and server modification to propagate
    metadata
  • Reporting and aggregation framework
  • OpenDHT, I3, SQL
  • Progress
  • New implementation
  • HTTP, IP, I3, SQL, Chord
  • C and Java
  • Early adopters
  • DONA, Coral CDN
  • To appear at NSDI '07
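The metadata propagation above can be caricatured in a few lines. This sketch only loosely follows the X-Trace design (the propagation operations are conventionally called pushNext and pushDown); the field sizes and class shape here are assumptions:

```python
# Simplified X-Trace-style metadata: every operation carries the same
# TaskID (so reports can be stitched into one task tree) plus a fresh
# OpID. push_next propagates within a layer; push_down crosses layers.
import os

class XTraceMetadata:
    def __init__(self, task_id, op_id):
        self.task_id = task_id  # identifies the whole task (request)
        self.op_id = op_id      # identifies this operation/event

    def _next(self):
        # Same task, fresh operation id (assumed 32-bit, hex-encoded).
        return XTraceMetadata(self.task_id, os.urandom(4).hex())

    def push_next(self):
        """Propagate to the next operation at the same layer."""
        return self._next()

    def push_down(self):
        """Propagate to an operation at the layer below."""
        return self._next()

root = XTraceMetadata(task_id=os.urandom(8).hex(), op_id="0" * 8)
http_op = root.push_next()    # e.g. HTTP -> HTTP (next hop, same layer)
tcp_op = http_op.push_down()  # e.g. HTTP -> TCP (layer below)
print(root.task_id == tcp_op.task_id)  # prints True: one task tree
```

The in-band part is carrying this metadata inside each protocol's messages; reports keyed by (task_id, op_id) then flow out of band to the collection framework.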

21
Role of workload
  • Increasing the request rate may not affect the
    service under test
  • Due to caching, fast paths, middlebox
    interception, etc.
  • E.g. workload consisting of a single page served
    from RAM
  • Services often optimized for certain requests
  • SQL requests to indexed vs non-indexed data
  • Router fast path, vs slow-path

22
Remote file storage behavior under load
  • File storage: an application with well-known
    semantics
  • Absent contention, we would expect this behavior
    at runtime
  • NFS write performance
  • File size ↑, throughput ↑
  • Deviation from expected
  • File size ↑, throughput ↓
  • This could be an indication of resource
    contention/dependency
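The expected signature above can be encoded as a tiny check (function name and numbers are invented for illustration): throughput should not fall as file size grows, and a run that violates this hints at contention:

```python
# Flag NFS write runs whose throughput drops as file size grows,
# contradicting the expected contention-free signature.
def expected_trend(sizes_mb, throughputs_mbps):
    """True if throughput is non-decreasing as file size grows."""
    pairs = sorted(zip(sizes_mb, throughputs_mbps))
    return all(b[1] >= a[1] for a, b in zip(pairs, pairs[1:]))

# Observed run (made-up numbers): throughput collapses for large files.
sizes = [1, 4, 16, 64]
tput  = [20.0, 45.0, 60.0, 22.0]

if not expected_trend(sizes, tput):
    print("deviation from expected signature: possible contention")
```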

[Figure: clients drive web and email services, which share back-end NFS and DNS servers.]