Title: Introspective Networks
1Introspective Networks
- George Varghese
- University of California, San Diego
2Network Evolution?
- 1. Basic: stateless, transparent
- Tools: protocol design (e.g., soft state)
- 2. Active: customizable, reconfigurable
- Tools: code safety (e.g., sandboxing)
- 3. Cognitive: intelligent, reasoning
- Tools: AI (e.g., multi-agent systems)
- 4. Introspective: pattern detection and response
- Tools: streaming algorithms, statistical inference (e.g., Bloom filters, sampling)
3What is Introspection?
- Detecting patterns in data traffic, either in real time or from packet logs. Examples:
- Measurement introspection: identify resource usage patterns for better resource management.
- Security introspection: identify attack patterns to mitigate or prevent attacks.
- Fault introspection: identify fault or anomaly patterns to allow automated fault repair.
- Motivated by market pull and technology push.
4Market Pull 1: Better ROI for ISPs
[Figure: an ISP connecting Customer Sites 1, 2, and 3; annotation: reroute or add B/W]
- Better ROI: optimize resources (BGP policy, OSPF weights, light up fibers, add bandwidth) based on resource usage patterns.
- Better isolation: better QoS during attacks (200 msec versus 2000 msec delay during Slammer) is a major differentiator.
- Competitive edge: just as banks use data mining to better manage loan portfolios, ISPs can better manage their bandwidth portfolio.
Sprint Monitoring Proposal, IETF BOF 2003
5Market Pull 2: Costs of (In)Security
[Figure: an attacker, Zombies 1 to N, an ISP, and a victim; labels: IDS, firewall, (patches), traceback]
- Cost: too many isolated perimeter solutions (firewalls, IDS devices, patches). Total cost of ownership (TCO) is very high.
- Delay: by the time the perimeter detects an attack, the damage is already done.
- Complexity: solutions either rely on end users finding and installing patches, or require router support for traceback, support that could be used for detection instead.
Gartner Research: security solutions deployed within enterprises by 2004 and within ISPs by 2006
6Technology Push: Streaming Algorithms and Hardware Gates
- Algorithms: recent major thrust in streaming algorithms in databases, web analysis, theory, and networks.
- Hardware: memory accesses remain expensive (< 100) and SRAM is not scaling as fast as the number of connections (< 32 Mbits), but gates are plentiful.
- Mapping: many randomized streaming algorithms (e.g., Bloom filters, min-wise hashing) developed to find patterns in disk logs map well to network ASICs.
- Opportunity: invent or adapt streaming algorithms for networking patterns.
7Concerns about Network Introspection
- Speed: can hardware run fast enough?
- Recall IP lookups in the 1990s; surprisingly complex things (branch predictors, TCP offload) are done routinely today.
- Even if not, algorithms can mine packet logs offline for insight.
- Inflexible: hardware is not easy to change.
- Design hardware to identify useful primitive patterns that can be combined.
- Network processors (ISCA 2003) can offer flexibility and speed.
- End-to-end argument: this is not a simple, stateless core.
- Not required for correctness of basic forwarding, but only as an optimization or value-add.
8Introspection as Pattern Detection
[Figure: a router and a packet stream with sources S1, S2, S2, S5, S2, S1]
- Within-packet patterns: prefix matches, classification, signature detection (e.g., Code Red payload).
- Across-packet patterns: scheduling, timing, heavy-hitters, large flows, partial completion.
9Pattern Detection Algorithm Requirements
- Low memory: on-chip SRAM is limited to around 32 Mbits. This is not constant, but it is not scaling with the number of concurrent conversations.
- Small processing: for wire speed at 40 Gbps with 40-byte packets, there are 8 nsec per packet. With 1 nsec SRAM, that allows 8 memory accesses. A factor of 30 in parallelism buys 240 accesses.
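The processing budget on this slide is plain arithmetic, reproduced here as a short Python sketch. The function name and its parameters are illustrative and do not come from the talk.

```python
def memory_access_budget(link_gbps, packet_bytes, sram_ns, parallelism=1):
    """Return (time per packet in ns, SRAM accesses available per packet)."""
    # bits divided by Gbit/s gives nanoseconds
    packet_time_ns = packet_bytes * 8 / link_gbps
    accesses = packet_time_ns / sram_ns * parallelism
    return packet_time_ns, accesses


# The slide's numbers: 40 Gbps link, 40-byte packets, 1 nsec SRAM.
packet_time_ns, serial_accesses = memory_access_budget(40, 40, 1)
_, parallel_accesses = memory_access_budget(40, 40, 1, parallelism=30)
print(packet_time_ns, serial_accesses, parallel_accesses)
```

Changing `link_gbps` or `packet_bytes` shows how quickly the per-packet budget shrinks on faster links or with smaller packets.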
10Talk Outline
- Part 1: Motivation
- Part 2: Basic patterns and algorithms (heavy-hitters, many flows, partial completion)
- Part 3: Combining patterns to solve useful application problems
- Part 4: Conclusions
11Pattern 1: Heavy-hitters
- Heavy-hitters: in a measurement interval (e.g., 10 minutes), detect the flows (e.g., sources) on a link that send more than a threshold (say 1% of the traffic).
[Figure: packet sequence with sources S1, S6, S2, S5, S2, S2]
Source S2 is 30 percent of traffic sequence
Estan, Varghese, ACM TOCS 2003
12Heavy-hitters via Multistage Filters
[Figure: multistage filter diagram; label: Increment]
13Multistage filters in Action
[Figure: counters in Stages 1, 2, and 3 compared against a threshold. Legend: grey = other flows, yellow = small flow, green = large flow]
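A minimal Python sketch of the filter pictured on these two slides. It assumes the parallel variant, in which every packet increments one hashed counter per stage and a flow is reported only when all of its counters have reached the threshold. The class name, the BLAKE2b-based hashing, and the parameter values are illustrative choices, not details from the talk.

```python
import hashlib


class MultistageFilter:
    """Parallel multistage filter: one hashed counter per stage per flow."""

    def __init__(self, stages, buckets, threshold):
        self.stages = stages
        self.buckets = buckets
        self.threshold = threshold
        self.counters = [[0] * buckets for _ in range(stages)]
        self.large_flows = set()  # flows that passed every stage

    def _bucket(self, stage, flow_id):
        # One independent hash per stage, obtained by salting with the stage index.
        digest = hashlib.blake2b(flow_id.encode(), digest_size=8,
                                 salt=stage.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.buckets

    def add_packet(self, flow_id, size):
        passed_all = True
        for stage in range(self.stages):
            b = self._bucket(stage, flow_id)
            self.counters[stage][b] += size
            if self.counters[stage][b] < self.threshold:
                passed_all = False
        if passed_all:
            self.large_flows.add(flow_id)


# Packet sequence from the heavy-hitter slide, one unit of traffic per packet.
mf = MultistageFilter(stages=3, buckets=1000, threshold=3)
for source in ["S1", "S6", "S2", "S5", "S2", "S2"]:
    mf.add_packet(source, size=1)
print(sorted(mf.large_flows))
```

A flow that really sends at least the threshold is always reported, because its own traffic alone pushes every one of its counters over the threshold. A small flow is reported only if it lands in a crowded bucket in every stage, which is the false-positive case the analysis on the next slide bounds.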
14Multistage Filter Analysis
- Assume a 1 percent threshold. Bound the probability that a flow F of 0.1 percent or less gets through 6 stages of 1000 buckets each.
- Why trouble? F can fall into a "hot" bucket if and only if the sum of the traffic of all other flows in that bucket is more than 0.9 percent.
- Single-stage probability: at most 100/0.9 ≈ 111 buckets can be over 0.9 percent before we bring on F. Thus the probability that F falls in a "hot" bucket is less than 111/1000 = 0.111.
- Multistage probability: to be branded, F must be unlucky in all 6 stages, with a probability of no more than 0.111^6, which is very small. Thus at most 1000 false positives with very high probability.
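The bound above can be checked numerically. The sketch below only restates the slide's arithmetic; the function name and its parameters are illustrative.

```python
def false_positive_bound(threshold_pct, flow_pct, stages, buckets):
    """Upper bound on the chance that a small flow passes every stage."""
    # A bucket is "hot" for F when the other flows in it carry more than
    # (threshold - flow) percent of the traffic; at most 100 / that many
    # buckets per stage can be hot.
    hot_buckets = 100 / (threshold_pct - flow_pct)
    single_stage = hot_buckets / buckets
    return single_stage, single_stage ** stages


single, multi = false_positive_bound(1.0, 0.1, stages=6, buckets=1000)
print(f"single stage: {single:.3f}, all stages: {multi:.2e}")
```

Each added stage multiplies the bound by another factor of about 0.111, which is why a handful of stages is enough to make false positives rare.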
15Pattern 2: Partial Completion
- Partial completion: in a measurement interval, detect the flows (e.g., destinations) that have several start packets (e.g., SYN) without the corresponding end packets (e.g., FIN).
[Figure: packet sequence SYN X, SYN Y, SYN Z, FIN Y, SYN X, SYN X, FIN Z]
Destination X has 3 partial completions in the sequence
16Partial Completion Filters
Increment for SYN, Decrement for FIN
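A minimal Python sketch of a partial completion filter. It assumes several stages of hashed counters indexed by destination, and flags a destination when all of its counters reach a threshold. The class name, hashing scheme, and threshold are illustrative assumptions; the slide itself only states the increment and decrement rule.

```python
import hashlib


class PartialCompletionFilter:
    """Hashed counters per stage: +1 on a SYN, -1 on a FIN."""

    def __init__(self, stages, buckets, threshold):
        self.stages = stages
        self.buckets = buckets
        self.threshold = threshold
        self.counters = [[0] * buckets for _ in range(stages)]

    def _bucket(self, stage, key):
        digest = hashlib.blake2b(key.encode(), digest_size=8,
                                 salt=stage.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.buckets

    def add_packet(self, dest, kind):
        delta = {"SYN": 1, "FIN": -1}[kind]  # KeyError on any other packet type
        for stage in range(self.stages):
            self.counters[stage][self._bucket(stage, dest)] += delta

    def is_suspect(self, dest):
        # Suspect only if the counter is high in every stage.
        return all(self.counters[s][self._bucket(s, dest)] >= self.threshold
                   for s in range(self.stages))


# Sequence from the Partial Completion slide.
pcf = PartialCompletionFilter(stages=3, buckets=1000, threshold=3)
for kind, dest in [("SYN", "X"), ("SYN", "Y"), ("SYN", "Z"), ("FIN", "Y"),
                   ("SYN", "X"), ("SYN", "X"), ("FIN", "Z")]:
    pcf.add_packet(dest, kind)
print(pcf.is_suspect("X"))
```

Completed connections cancel out, so destinations Y and Z contribute nothing to any counter by the end of the sequence, while X's three unmatched SYNs remain.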
17Analysis 1: Benign but Malformed Connections
[Figure: timeline across Intervals 1 to 4; labels: SYN X, FIN X, long-lived connection, SYN Y retransmissions, FIN Z retransmissions]
Model benign but malformed connections as adding an extra SYN or FIN to an interval with probability 0.5
18Analysis 2: Using a Gaussian Approximation
[Figure: Gaussian distribution of counter values (x-axis: counter values, y-axis: probability); annotation: Greater than 6?]
Probability of false positives: 0.0013
Probability of false negatives: 0.0013
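The value 0.0013 matches the one-sided tail of a normal distribution beyond three standard deviations. The slide does not state where the threshold sits, so reading it as a three-sigma cutoff is an assumption; the sketch below only checks that the numbers agree.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


tail = upper_tail(3.0)
print(round(tail, 4))
```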
19Pattern 3: Many Flows
- Many flows: in a measurement interval, find whether the number of flows exceeds a threshold.
[Figure: packet sequence with sources S1, S6, S2, S5, S2, S2]
6 packets but only 4 distinct sources
20Simple Bitmap counting
[Figure: bitmap with several bits set to 1; flow F hashes to one position]
Hash based on flow identifier
Estimate based on the number of bits set
Problem: bitmap takes too much memory to count a large number of flows
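A minimal Python sketch of bitmap counting. The slide says only that the estimate is based on the number of bits set. The formula used here, bits × ln(bits / zero bits), is the standard linear-counting estimator that corrects for hash collisions, and is assumed rather than quoted from the talk. The function name and bitmap size are illustrative.

```python
import hashlib
import math


def bitmap_estimate(flow_ids, bits):
    """Estimate the number of distinct flow identifiers with a hashed bitmap."""
    bitmap = [False] * bits
    for flow_id in flow_ids:
        digest = hashlib.blake2b(flow_id.encode(), digest_size=8).digest()
        bitmap[int.from_bytes(digest, "little") % bits] = True
    zeros = bits - sum(bitmap)
    if zeros == 0:
        return float("inf")  # bitmap saturated: too many flows for this size
    return bits * math.log(bits / zeros)


# Packet sequence from the Many Flows slide: 6 packets, 4 distinct sources.
estimate = bitmap_estimate(["S1", "S6", "S2", "S5", "S2", "S2"], bits=1024)
print(round(estimate, 2))
```

Repeated packets from the same flow set the same bit, so the estimate depends only on the distinct flows. It comes out close to 4 here unless two sources happen to hash to the same position.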
21Sampled Bitmap counting
[Figure: a sampled portion of the bitmap with two bits set]
Solution: keep only a sample of the bitmap
Estimate: scale up the sampled count
Problem: inaccurate if there are too few or too many flows
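A minimal Python sketch of sampled bitmap counting. It assumes the flow hash addresses a larger virtual bitmap of which only the first `bits` positions are stored. The function name, the sampling fraction, and the linear-counting correction are illustrative assumptions.

```python
import hashlib
import math


def sampled_bitmap_estimate(flow_ids, bits, sample_fraction):
    """Store `bits` bits that cover only `sample_fraction` of the hash space."""
    virtual_bits = round(bits / sample_fraction)  # size of the full, unstored bitmap
    bitmap = [False] * bits
    for flow_id in flow_ids:
        digest = hashlib.blake2b(flow_id.encode(), digest_size=8).digest()
        position = int.from_bytes(digest, "little") % virtual_bits
        if position < bits:  # only this slice of the virtual bitmap is kept
            bitmap[position] = True
    zeros = bits - sum(bitmap)
    if zeros == 0:
        return float("inf")  # sample saturated: too many flows
    # Count within the sample, then scale up by 1 / sample_fraction.
    return virtual_bits * math.log(bits / zeros)


flows = [f"flow-{i}" for i in range(10000)]
estimate = sampled_bitmap_estimate(flows, bits=1024, sample_fraction=0.1)
print(round(estimate))
```

With only a handful of flows, few or none fall into the stored slice, and with very many flows the slice saturates. These are the two failure modes named on the slide.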
22Multi-resolution Bitmap counting
[Figure: three bitmaps covering 1-10, 10-100, and 100-1000 flows]
Solution: multiple bitmaps, each covering a different range
Estimate: use the first bitmap that has less than 93.1% of its bits set, count, and scale
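A simplified Python sketch of the multi-resolution idea. It assumes three independent sampled bitmaps with sampling fractions 1, 1/10, and 1/100, and picks the finest one that is less than 93.1% full. This is a simplification for illustration and may differ from the memory layout used in the actual algorithm. The class name, the fractions, and the bitmap size are hypothetical.

```python
import hashlib
import math


class MultiResolutionBitmap:
    FILL_LIMIT = 0.931  # fill level quoted on the slide

    def __init__(self, bits, fractions=(1.0, 0.1, 0.01)):
        self.bits = bits
        # Each resolution samples a smaller fraction of the hash space.
        self.virtual_bits = [round(bits / f) for f in fractions]
        self.bitmaps = [[False] * bits for _ in fractions]

    def add(self, flow_id):
        digest = hashlib.blake2b(flow_id.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "little")
        for bitmap, virtual in zip(self.bitmaps, self.virtual_bits):
            position = h % virtual
            if position < self.bits:
                bitmap[position] = True

    def estimate(self):
        # Use the first (finest) bitmap that is not too full, then scale.
        for bitmap, virtual in zip(self.bitmaps, self.virtual_bits):
            set_bits = sum(bitmap)
            if set_bits < self.FILL_LIMIT * self.bits:
                return virtual * math.log(self.bits / (self.bits - set_bits))
        return float("inf")  # every resolution is saturated


small, large = MultiResolutionBitmap(256), MultiResolutionBitmap(256)
for i in range(50):
    small.add(f"flow-{i}")
for i in range(2000):
    large.add(f"flow-{i}")
print(round(small.estimate()), round(large.estimate()))
```

The same 256-bit budget per resolution gives usable estimates for both 50 and 2000 flows, because the estimator falls back to a coarser sample once the finer bitmap fills up.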
23Outline of Talk
- Part 1: Motivation
- Part 2: Basic patterns and algorithms
- Part 3: Combining base patterns to solve useful application problems (traffic matrix, DoS, worms)
- Part 4: Conclusions
24Application 1: Traffic Matrix
[Figure: an ISP connecting Customer Sites 1, 2, and 3; annotation: reroute or add B/W]
- Each entry router uses a multistage filter on traffic to destination prefixes to isolate the subnets that receive large traffic.
- Aggregating across all entry routers gives the dominant part of the traffic matrix. AT&T reports an 80-20 rule for prefixes.
25Application 2: Process Logs to Find Large Bandwidth Usage Patterns
Multidimensional analysis via our tool
Old methods look at a single dimension at a time
Estan, Savage, Varghese, SIGCOMM 2003
26Application 3: DoS Attacks
- Bandwidth attacks (e.g., Smurf): pound the victim with a large volume of traffic of a certain type.
- Use the heavy-hitter pattern relative to traffic type (e.g., ICMP) to find attacked destinations.
- Partial completion attacks (e.g., TCP SYN flood): may not use unusual bandwidth, but are characterized by partial connections.
- Use the partial completion pattern?
27Syn-Flood Detection Options
[Figure: Attackers 1 to n in attacker ISPs, the network core, and the victim in the victim ISP. Existing options shown: Syn-Dog, backscatter detection, traceback, Syn-Kill, Syn-Defender, MULTOPS, SYN cookie/cache]
Our solution: partial completion filters in the network
28PCF Deployment Options
[Figure: Attackers 1 to n in attacker ISPs, the network core, a backscatter vantage point, and the victim in the victim ISP]
Destination-based SYN-FIN PCF for detection and defense (can be spoofed)
Source-based SYN-ACK/FIN PCF for backscatter detection (spoof-proof)
29Application 4: Worm Detection
[Figure: labels: Infected 1 to Infected N, New Victim, Inactive Address, ISP]
- Concrete approaches to worm containment: routers block packets with a specific code signature.
- Manual signature extraction is slow and takes enormous effort for each new worm.
- Automatic signature extraction for a specific worm, by automatically detecting an abstract worm.
30Abstract Worm Definition
- F1, content repetition: the payload of the worm is seen frequently at the router.
- F2, increasing infection levels: the same content is sent to an increasing number of distinct source-destination pairs.
- O1, random probing: the worm replicates by probing random IP addresses.
- O2, code fragments: the worm payload contains content that has some resemblance to code.
31Abstract Worm Detection
- F1, content repetition: use the heavy-hitter pattern with a hash H of the content as index.
- F2, increasing infection levels: use the many-flows pattern with the content hash H as index.
- O1, random probing: count the destinations sent content with hash H in a sample of unused address space (Telescope, Moore et al.).
- O2, code fragments: simple offline tests that check, say, for 8086 control-transfer opcodes.
- The first 3 tests need low memory and small processing.
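A small Python sketch of how tests F1 and F2 can share the content hash H as their index. For readability it uses exact dictionaries and sets as stand-ins for the heavy-hitter and many-flows structures, so it shows only the indexing idea and not the low-memory implementation. The class name, thresholds, addresses, and payloads are hypothetical.

```python
import hashlib
from collections import defaultdict


class ContentTracker:
    def __init__(self, repeat_threshold, pair_threshold):
        self.repeat_threshold = repeat_threshold
        self.pair_threshold = pair_threshold
        self.repeats = defaultdict(int)  # F1: exact stand-in for a heavy-hitter filter
        self.pairs = defaultdict(set)    # F2: exact stand-in for a many-flows counter

    def add_packet(self, src, dst, payload):
        h = hashlib.blake2b(payload, digest_size=8).hexdigest()  # content hash H
        self.repeats[h] += 1
        self.pairs[h].add((src, dst))
        return h

    def suspicious(self):
        # Content that is both frequent and spread over many source-destination pairs.
        return [h for h in self.repeats
                if self.repeats[h] >= self.repeat_threshold
                and len(self.pairs[h]) >= self.pair_threshold]


tracker = ContentTracker(repeat_threshold=5, pair_threshold=3)
for i in range(5):  # same payload between five different pairs
    worm_hash = tracker.add_packet(f"10.0.0.{i}", f"10.0.1.{i}", b"worm-like payload")
for _ in range(5):  # frequent payload, but always between the same pair
    tracker.add_packet("10.0.0.9", "10.0.1.9", b"GET /robots.txt")
print(tracker.suspicious() == [worm_hash])
```

Requiring both conditions filters out content that is merely popular between one pair of hosts, which repetition alone would flag.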
32Spectre of Polymorphism
- Syntactic polymorphism: fragmentation on links with different MTU sizes, offsets, no-ops. (Use Rabin fingerprints at sampled offsets, but this does not help in the case of encryption.)
- Semantic polymorphism: code rewriting at each new source. (Hard to detect, but raises the bar: the worm payload must include a small compiler.)
33EarlyBird Experience
- System: uses 39-byte Rabin fingerprints on tcpdump output and looks for content repetition above a low threshold; currently uses a large amount of memory.
- Deployment: sniffs on the uplink of a lab switch. 9-day period between May 2nd and May 10th, 2003. 4 million packets.
- Latent worms found:
- (742 pairs) TCP/139 NetBIOS attack
- (51 pairs) Code Red TCP/80 GET /default.ida
- Linux Slapper and 1 Unicode exploit
- False positives: "robots.txt", "SSH-1.99-3.1.1 SSH Secure Shell for Windows", some VNC strings
34Recent Experience with EarlyBird
- On Monday afternoon, Aug 11th, found 133 repetitions of content for an RPC service. Lab machines stayed up but received many infection attempts.
- Major security companies were already on the lookout for this, so MSBlaster was detected quickly.
- On the evening of Monday, Aug 11th, my home computer began rebooting every few minutes, saying "mumble RPC mumble".
35Conclusions
- Measurement introspection can improve ISP ROI, and security introspection can reduce TCO.
- Can implement base patterns at high speeds.
- Base patterns can be combined to solve useful application issues (traffic matrix, DoS, worms, etc.).
- Only scratching the surface: fault introspection, etc.
36Joint work with collaborators
- Stefan Savage (AutoFocus, EarlyBird)
- Students in Internet Algorithmics Lab: Ramana Kompella, Cristian Estan, Sumeet Singh