Title: Quality of Service vs. Any Service at All, IWQoS 2005, Passau, Germany
1Quality of Service vs. Any Service at All
IWQoS 2005, Passau, Germany
- Randy H. Katz
- Computer Science Division
- Electrical Engineering and Computer Science Department
- University of California, Berkeley
- Berkeley, CA 94720-1776
2Presentation Outline
- The Problem
- System and Network Trends
- Checking-Observing-Protecting Services
- Inspection-and-Action Boxes
- Annotation Layer
- Scenario
- Call to Action
4Some Observations
- Internet reasonably robust to point problems like link and router failures (fail stop)
- Successfully operates under a wide range of loading conditions and over diverse technologies
- During 9/11/01, Internet worked reasonably well, under heavy traffic conditions and with some major facilities failures in Lower Manhattan
5The Problem
- Networks awash in illegitimate traffic: port scans, propagating worms, p2p file swapping
- Legitimate traffic starved for bandwidth
- Essential network services (e.g., DNS, NFS) compromised
- Needed: better network management of services/applications to achieve good performance and resilience even in the face of network stress
- Self-aware network environment
- Observing and responding to traffic changes
- While sustaining the ability to control the network
6From the Frontlines
- Berkeley Campus Network
- Unanticipated traffic surges render the network unmanageable (and may cause routers to fail)
- Denial of service attacks, latest worm, or the newest file sharing protocol largely indistinguishable
- In-band control channel is starved, making it difficult to manage and recover the network
- Berkeley EECS Department Network (12/04)
- Suspected denial-of-service attack against DNS
- Poorly implemented/configured spam appliance adds to DNS overload
- Traffic surges render it impossible to access Web or mount file systems
- Network problems contribute to brittleness of distributed systems
7Why and How Networks Fail
- Complex phenomenology of failure
- Recent Berkeley experience suggests that traffic surges render enterprise networks unusable
- Indirect effects of DoS traffic on network infrastructure: role of unexpected traffic patterns
- Cisco Express Forwarding: random IP addresses flood route cache, forcing all traffic to go through router slow path; high CPU utilization yields inability to manage router table updates
- Route Summarization: powerful misconfigured peer overwhelms weaker peer with too many router table entries
- SNMP DoS attack: overwhelm SNMP ports on routers
- DNS attack: response-response loops in DNS queries generate traffic overload
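The route-cache effect listed above can be illustrated with a small simulation. This is a toy sketch, not a model of Cisco Express Forwarding itself: it assumes a hypothetical fixed-size LRU cache of destination addresses, where a hit stands for the fast path and a miss stands for the slow path. The cache size, packet counts, and address ranges are made-up parameters.

```python
import random
from collections import OrderedDict


def fast_path_fraction(destinations, cache_size):
    """Return the fraction of packets served from a toy LRU route cache.

    A cache hit stands in for the router fast path; a miss stands in for
    the slow path (a full lookup handled by the router CPU).
    """
    cache = OrderedDict()
    hits = 0
    for dst in destinations:
        if dst in cache:
            hits += 1
            cache.move_to_end(dst)         # mark as most recently used
        else:
            cache[dst] = True              # slow path, then install entry
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(destinations)


rng = random.Random(0)  # fixed seed so runs are repeatable

# Hypothetical "normal" traffic: many packets to a small set of destinations.
normal = [rng.randrange(200) for _ in range(10_000)]

# Hypothetical scan or DoS traffic: destinations drawn from the whole IPv4 space.
flood = [rng.randrange(2**32) for _ in range(10_000)]

normal_rate = fast_path_fraction(normal, cache_size=1_000)
flood_rate = fast_path_fraction(flood, cache_size=1_000)
print(f"normal traffic on fast path: {normal_rate:.1%}")
print(f"random-destination traffic on fast path: {flood_rate:.1%}")
```

With destinations that almost never repeat, nearly every packet misses the cache, which is the mechanism by which the slow path and the router CPU become the bottleneck.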
8Possible Approach
- New technology: packet flow manipulations at L4-L7 made possible by new PNEs and stateful routers
- Enables identification/segregation of traffic
- Good: protect it
- Bad: block it
- Suspicious: slow it
- Check/Observe/Protect Services (COPS)
- Inspection-and-Action Boxes (iBoxes)
- Annotation layer between routing and transport
- Yielding new service building blocks
- Beyond packet marking and annotation
- To flow extraction and path-oriented statistics collection
- Based on traffic analysis, model extraction, statistical correlation, and causality testing
10Managing Edge Network Services and Applications
- Not shrink-wrap software, but cascaded appliances
- Data Center in-a-box: blade servers, network storage
- Brittle to traffic surges and shifts, yielding network disruption
[Figure: Edge Network Middleboxes. Labels: Wide Area Network, Firewall, IDS, Traffic Shaper, Load Balancer, Egress Checker, Servers, Blades, Edge Network]
11Appliances Proliferate: Management Nightmare!
12Network Support for Tiered Applications
13The Computer is the Network
- Emergence of Programmable Network Elements (PNEs)
- Network components where net services/applications execute
- Virtualization (hosts, storage, nets) and flow filtering (blocking, delaying)
- Computation-in-the-Network is NOT Unlimited
- Packet handling complexity limited by latency/processing overhead
- NOT arbitrary per packet programming (aka active networking)
- Allocate general computation like proxies to network blades
- Beyond Per Packet Processing: Network Flows
- Managing/configuring network for performance and resilience
- Emergence of stateful routers for flow-based management
- Adaptation based on Observe (Monitor), Analyze (Detect, Diagnose), Act (Redirect, Reallocate, Balance, Throttle)
15Check
- Checkable Protocols: Maintain invariants and techniques for checking and enforcing protocols
- Listen & Whisper: well-formed BGP behavior
- Traffic Rate Control: Self-Verifiable Core Stateless Fair Queuing (SV-CSFQ)
- Existing work requires changes to protocol end points or routers on the path
- Difficult to retrofit checkability to existing protocols without embedded processing in PNEs
- Develop building blocks for new protocols
- Observable protocol behavior
- Cryptographic techniques
- Statistical methods
16Observe
- Observation and Action Points
- Network points where control is exercised, traffic classified, resources allocated
- In the datapath: statistical collection, annotating, prioritizing, shaping, blocking
- Inspection-and-Action Boxes (iBoxes)
- Prototyped on commercial PNEs
- Placed at Internet and Server edges of enterprise net
- Cascaded with existing routers to extend their functionality
- Migration into (some current and) future router architectures
17Protect
- Protect Crucial Services
- Minimize and mitigate effects of attacks and traffic surges
- Classify traffic into good, bad, and ugly (suspicious)
- Good: standing patterns and operator-tunable policies
- Bad: evolves faster, harder to characterize
- Ugly: cannot immediately be determined as good or bad
- Filter the bad, slow the suspicious, maintain resources for the good (e.g., control traffic)
- Sufficient to reduce false positives
- Some suspicious-looking good traffic may be slowed down, but won't be blocked
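The good/bad/ugly policy above can be sketched as a small classifier. Everything here is an illustrative assumption: the `Flow` fields, the port set, the rate threshold, and the rule ordering are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PROTECT = "protect"  # good: keep resources reserved
    SLOW = "slow"        # ugly (suspicious): rate-limit, never block
    BLOCK = "block"      # bad: filter


@dataclass(frozen=True)
class Flow:
    src: str
    dst_port: int
    packets_per_sec: float


# Hypothetical operator-tunable policy values.
GOOD_PORTS = {53, 161, 179}          # DNS, SNMP, BGP
KNOWN_BAD_SOURCES = {"203.0.113.7"}  # documentation address as a stand-in
SUSPICIOUS_RATE = 1_000.0            # packets per second


def classify(flow: Flow) -> Verdict:
    """Map a flow to an action using the good / bad / ugly split."""
    if flow.src in KNOWN_BAD_SOURCES:
        return Verdict.BLOCK
    if flow.dst_port in GOOD_PORTS and flow.packets_per_sec <= SUSPICIOUS_RATE:
        return Verdict.PROTECT
    # Cannot yet tell good from bad: slow it down instead of blocking,
    # so a false positive only delays legitimate traffic.
    return Verdict.SLOW


print(classify(Flow("10.0.0.5", 53, 20.0)))
print(classify(Flow("10.0.0.9", 53, 5_000.0)))
print(classify(Flow("203.0.113.7", 80, 1.0)))
```

Making "slow" the default for anything unrecognized reflects the slide's point that misclassified good traffic is delayed rather than lost.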
19iBoxes: Observe, Analyze, Act
[Figure: Enterprise Network Architecture]
- Inspection-and-Action Boxes: deep multiprotocol packet inspection
- No routing: observation and marking
- Policing points: drop, fence, block
20Generic Network Element Architecture
[Figure: generic network element architecture. Surviving labels: Tag Mem, Rules, Programs]
21RouterVM
- High-level specification environment for describing packet processing
- Virtualized, abstracted view of underlying hardware resources of target PNEs
- Portable across diverse architectures
- Simulate before deployment
- Services, policies, and standard routing functions managed thru composed packet filters
- Generalized packet filter: trigger plus action bundle, cascaded, allows new services and policies to be implemented / configured thru GUI
- New services can be implemented without new code through library building blocks
Mel Tsai
22Extended Router Architecture
- Virtualized components representative of a common router implementation
- Structure independent of particular hardware
- Virtual backplane shuttles packets between line cards
- Virtual line card instantiated for every port required by application
- CPU handles routing protocols and mgmt tasks
- Blue: standard components. Yellow: components added and configured per-application
- Compute engines perform complex, high-latency processing on flows
- Filters are key to flexibility
Mel Tsai
23GPF Fill-in Specification
RouterVM Generalized Packet Filter (type L7)
- Packet filter as high-level, programmable building-block for network appliance apps
[Figure label: Traditional Filter]
24GPF Action Language
- Basic set of assignments, evaluations, expressions, control-flow ops, physical actions on packets/flows
- Control-flow: If-then-else, if-not
- Evaluation: <, >, !
- Packet flow control: Allow, unallow, drop, skip filter, jump filter
- Packet manipulation: Field rewriting (ip_src blah, tcp_src blah), truncation, header additions
- Actions: NAT, loadbalance, ratelimit, (perhaps others)
- Meta actions: packet generation, logging, statistics gathering
- Basic Filter: simple L2-L4 header classifications; any RouterVM actions
- L7 Filter: REs, TCP termination, ADU recon
- NAT Filter: capabilities beyond simple NAT action available to all GPFs
- Content Caching: builds on L7 filter functionality
- WAN Link Compression: simple to specify, but requires lots of computation
- IP-to-FC Gateway: requires own table format and processing
- XML Preprocessing: not very well documented, and difficulty is unknown
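The trigger/action structure of a generalized packet filter can be sketched in a few lines. This is not RouterVM's actual language or API, which the slides do not show; the `Filter` class, the dict-based packet, and the example filters below are hypothetical stand-ins for a cascade of filters with drop and field-rewrite actions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Packet = dict  # hypothetical packet representation: header field -> value


@dataclass
class Filter:
    name: str
    trigger: Callable[[Packet], bool]             # when does the filter fire?
    action: Callable[[Packet], Optional[Packet]]  # None means "drop"


def drop(pkt: Packet) -> Optional[Packet]:
    """Action that discards the packet."""
    return None


def rewrite(field: str, value) -> Callable[[Packet], Optional[Packet]]:
    """Build an action that rewrites one header field."""
    def action(pkt: Packet) -> Optional[Packet]:
        return {**pkt, field: value}
    return action


def run_cascade(filters: list, pkt: Packet) -> Optional[Packet]:
    """Pass a packet through the filters in order; stop if one drops it."""
    for f in filters:
        if f.trigger(pkt):
            pkt = f.action(pkt)
            if pkt is None:
                return None
    return pkt


# Two made-up filters: block telnet, then rewrite private source addresses.
cascade = [
    Filter("block-telnet", lambda p: p.get("tcp_dst") == 23, drop),
    Filter("nat-outbound",
           lambda p: str(p.get("ip_src", "")).startswith("10."),
           rewrite("ip_src", "192.0.2.1")),
]

print(run_cascade(cascade, {"ip_src": "10.0.0.8", "tcp_dst": 80}))
print(run_cascade(cascade, {"ip_src": "10.0.0.8", "tcp_dst": 23}))
```

Keeping triggers and actions as separate, composable pieces is what lets a new service be assembled from existing building blocks rather than written as new code.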
26Network-Level Observe-Analyze-Act
- Observe
- Packet, path, protocol, service invocation statistical collection and sampling: frequencies, latencies, completion rates
- Construct the collection infrastructure
- Analyze
- Determine correlations among observations
- Normal model discovery and anomaly detection
- Exploit SLT
- Act
- Experiment to test correlations
- Prioritize and throttle
- Mark and annotate
- Control theory? Distributed analyses and actions
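The "normal model discovery and anomaly detection" step can be sketched with a simple statistical baseline. This is a minimal illustration that assumes a z-score test against a mean and standard deviation learned from earlier samples; the talk does not specify a model, and the threshold and sample values are invented.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observations, threshold=3.0):
    """Return observations more than `threshold` standard deviations
    away from the mean of the baseline samples."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Degenerate baseline: anything different from it is anomalous.
        return [x for x in observations if x != mu]
    return [x for x in observations if abs(x - mu) / sigma > threshold]


# Invented DNS request rates (requests per second), for illustration only.
baseline_rates = [100, 104, 98, 101, 97, 103, 99, 102]
new_rates = [101, 96, 450, 105]

print(find_anomalies(baseline_rates, new_rates))
```

A real deployment would need a richer model of normal behavior, but the shape is the same: learn a baseline from observations, then flag departures from it for the Act stage.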
27Network Layer Mechanism: Annotations
- Enhance network visibility: disseminate observations, communicate actions, provide in-band network management actions, iBox-to-iBox communications
- iBoxes label packets at annotation layer but do not rewrite packet contents
- Annotations stack, must be removed from packets before delivery to A-layer unaware end nodes
- Expose annotations to application layer?
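The stacking and stripping rules above can be sketched as follows. The `Packet` class and the annotation format are hypothetical; the slides do not define a wire format. The sketch only shows that annotations accumulate in a separate stack, leave the payload untouched, and are removed before delivery to an end node that does not understand the annotation layer.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: bytes
    annotations: list = field(default_factory=list)  # the annotation stack


def annotate(pkt: Packet, ibox_id: str, label: str) -> None:
    """An iBox pushes a label; the payload is never rewritten."""
    pkt.annotations.append((ibox_id, label))


def deliver(pkt: Packet, a_layer_aware: bool) -> Packet:
    """Strip all annotations before delivery to an unaware end node."""
    if a_layer_aware:
        return pkt
    return Packet(payload=pkt.payload)


pkt = Packet(payload=b"GET / HTTP/1.1")
annotate(pkt, "I-I", "source=external")  # hypothetical label strings
annotate(pkt, "I-S", "priority=low")

delivered = deliver(pkt, a_layer_aware=False)
print(pkt.annotations)        # stack as seen inside the network
print(delivered.annotations)  # empty at an unaware end node
```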
28Annotation Layer: Simple Marking Example
- Marking vs. rewriting approach
- E.g., mark packets as internally vs. externally sourced using IP header options
- Prioritize internal vs. external access to services: solves some but not all traffic surge problems
29Annotation Layer: iBox Piggybacked Control Plane
- Problem: Control plane starvation
- Use A-layer for iBox-to-iBox communication
- Passively piggyback on existing flows
- Busy parts of network have lots of control plane b/w
- Actively inject control packets to less active parts
- Embedded control info authenticated and sent redundantly
- Priority given to packets w/control when net under stress
- Network monitoring and statistics collection and dissemination subsystem
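The passive and active modes above can be sketched as a small queue of pending control messages. The `ControlChannel` class, the dict-based packets, and the `max_idle` timer are assumptions for illustration; authentication and redundant transmission, which the slide also calls for, are left out of this sketch.

```python
from collections import deque


class ControlChannel:
    """Toy iBox-to-iBox control channel toward one neighbor."""

    def __init__(self, max_idle: float):
        self.pending = deque()    # control messages waiting to be sent
        self.max_idle = max_idle  # seconds to wait for data traffic
        self.last_sent = 0.0      # time the last control message left

    def queue(self, message: str) -> None:
        self.pending.append(message)

    def on_data_packet(self, now: float, packet: dict) -> dict:
        """Passive mode: piggyback a pending message on a passing packet."""
        if self.pending:
            packet = {**packet, "control": self.pending.popleft()}
            self.last_sent = now
        return packet

    def on_timer(self, now: float):
        """Active mode: inject a control-only packet on a quiet link."""
        if self.pending and now - self.last_sent >= self.max_idle:
            self.last_sent = now
            return {"control": self.pending.popleft()}
        return None


channel = ControlChannel(max_idle=1.0)
channel.queue("stats-1")
channel.queue("stats-2")

carried = channel.on_data_packet(0.2, {"dst": "10.0.0.1"})
too_soon = channel.on_timer(0.5)
injected = channel.on_timer(1.5)
print(carried, too_soon, injected)
```

On busy links the control messages ride along for free; the timer only matters where there is too little traffic to carry them.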
31Scenario: Traffic Surge Inhibiting Network Services
[Figure: enterprise network with Internet Edge, Distribution Tier, Access Edge, and Server Edge; iBoxes I-I, I-A, and I-S; nodes labeled R, S, and E; Primary and Secondary DNS Servers, Mail Server, and Spam Appliance]
- DNS Server swamped by excessive request traffic
- Observe: DNS time outs, Web access traffic slowed, but also higher than normal mail delivery latency implying busy server edge (correlation between Mail Server and DNS Server utilization?)
- Root Cause: High DNS request rates generated by Spam Appliance triggered by mail surge
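The suspected correlation between Mail Server and DNS Server utilization could be checked with a Pearson correlation coefficient. This is a sketch with invented utilization samples; the talk reports no measurements, and a high coefficient alone would not establish that the mail surge causes the DNS load.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented utilization samples (percent), one per measurement interval.
mail_util = [20, 25, 40, 70, 90, 85]
dns_util = [15, 18, 35, 65, 95, 88]

r = pearson(mail_util, dns_util)
print(f"mail/DNS utilization correlation: {r:.2f}")
```

Because correlation is only a hint, the deck's later "Experiment" step (redirecting or throttling traffic and watching the result) is what would actually test the suspected cause.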
32Scenario Continued
[Figure: enterprise network diagram as on the previous slide, with iBoxes I-I, I-A, and I-S]
- How Diagnosed?
- I-S detects high link utilization but abnormally high DNS traffic
- Stats from I-I: high mail traffic, low outgoing web traffic, in traffic high but link utilization not high
- Stats from I-A: lower web traffic, no unusual mail origination
- Problem localized to Server edge, but visibility limited
33Scenario Continued
[Figure: enterprise network diagram as on the previous slide, with iBoxes I-I, I-A, and I-S]
- Possible Action Responses
- Experiment: Redirect local DNS requests to Secondary DNS server; if these complete, can infer the server is the problem, not the network
- Throttle: Due to MS-DNS correlation, block/slow email traffic at Server Edge; should expect reduced DNS server utilization
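The Throttle response could be implemented with a rate limiter at the Server Edge. A token bucket is one common choice; the slides do not say which mechanism an iBox would use, so the class below and its rate and burst parameters are assumptions for illustration. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Allow at most `rate` packets per second, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` may pass."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limit on mail traffic: 2 packets per second, burst of 3.
bucket = TokenBucket(rate=2.0, burst=3.0)
arrivals = [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 2.0]
decisions = [bucket.allow(t) for t in arrivals]
print(decisions)
```

Packets that find the bucket empty can be delayed or dropped, which corresponds to the "block/slow" choice in the Throttle bullet.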
35System Perspective Needed!
[Figure: system architecture. Labels: Operator; User; Prototype Applications; Programming Abstractions for Roll-back and wide-area distributed computations; SLT Services; Crash-only services; Observation Infrastructure for System SLT; Application-Specific Overlay Network; Checkable Protocols; Fast Detection and Route Recovery; Observation Infrastructure for network SLT; iBoxes at two Edge Networks; Commodity Internet]
36Hope for Emerging Platforms
- iBoxes implemented on commercial PNEs
- Don't route or implement (full) protocol stacks
- Do protect routers and shield network services
- Classify packets
- Extract flows
- Redirect traffic
- Log, count, collect stats
- Filter/shape traffic
37Summary and Conclusions
- Processing-in-the-Network is real
- Networking plus processing in switched and routed infrastructures
- Configuration and management of packet processing cast onto PNEs (network appliances, blades, stateful routers)
- Needed: Unifying Framework
- Methods to specify functionality and processing
- RouterVM: Filtering, Redirecting, Transformation
- Map from policy intentions to network actions?
- Local observations/correct global behavior?
- Application-specific network processing based on session extraction
38Summary and Conclusions
- PNEs: foundation of a pervasive infrastructure for observation and action at the network level
- iBoxes: Observation and Action points
- Annotation Layer for marking and control
- Check-Observe-Protect paradigm for protecting critical resources when network is under stress
- Functionality eventually migrates into future generations of routers
- E.g., Blades embedded in routers
39Quality of Service vs. Any Service at All
Randy H. Katz
Thank You!