Title: Measuring Adversaries
1 Measuring Adversaries
- Vern Paxson
- International Computer Science Institute / Lawrence Berkeley National Laboratory
- vern_at_icir.org
- June 15, 2004
2 80% growth/year
Data courtesy of Rick Adams
3 60% growth/year
4 596% growth/year
5 The Point of the Talk
- Measuring adversaries is fun
- Increasingly of pressing interest
- Involves misbehavior and sneakiness
- Includes true Internet-scale phenomena
- Under-characterized
- The rules change
6 The Point of the Talk, cont.
- Measuring adversaries is challenging
- Spans very wide range of layers, semantics, scope
- New notions of active and passive measurement
- Extra-thorny dataset problems
- Very rapid evolution: arms race
7 Adversaries & Evasion
- Consider passive measurement: scanning traffic for a particular string ("USER root")
- Easiest: scan for the text in each packet
- No good: text might be split across multiple packets
- Okay, remember text from previous packet
- No good: out-of-order delivery
- Okay, fully reassemble byte stream
- Costs state ...
- ... and still evadable
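The escalation on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not any real monitor's code: packets are modeled as hypothetical (stream offset, payload) pairs, and the function names are made up for the example.

```python
PATTERN = b"USER root"

def scan_per_packet(packets):
    """Easiest: look for the pattern inside each payload separately."""
    return any(PATTERN in payload for _, payload in packets)

def scan_with_carry(packets):
    """Remember the tail of the previous packet (assumes in-order arrival)."""
    tail = b""
    for _, payload in packets:
        window = tail + payload
        if PATTERN in window:
            return True
        # Keep just enough bytes to complete a match in the next packet.
        tail = window[-(len(PATTERN) - 1):]
    return False

def scan_reassembled(packets):
    """Reassemble the byte stream by offset, then scan. Costs state."""
    stream = bytearray()
    for offset, payload in sorted(packets):
        end = offset + len(payload)
        if end > len(stream):
            stream.extend(b"\x00" * (end - len(stream)))
        # Assumption: on overlap, the later-sorted segment wins. A real
        # receiver may choose differently, which is why this stays evadable.
        stream[offset:end] = payload
    return PATTERN in bytes(stream)

split = [(0, b"USER "), (5, b"root\r\n")]
reordered = [(5, b"root\r\n"), (0, b"USER ")]
```

With these inputs, the per-packet scan misses `split`, the carry-over scan catches `split` but misses `reordered`, and only full reassembly catches both.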
8 Evading Detection Via Ambiguous TCP Retransmission
9 The Problem of Evasion
- Fundamental problem passively measuring traffic on a link: network traffic is inherently ambiguous
- Generally not a significant issue for traffic characterization
- But is in the presence of an adversary: attackers can craft traffic to confuse/fool monitor
10 The Problem of Crud
- There are many such ambiguities attackers can leverage
- A type of measurement vantage-point problem
- Unfortunately, these occur in benign traffic, too
- Legitimate tiny fragments, overlapping fragments
- Receivers that acknowledge data they did not receive
- Senders that retransmit different data than originally
- In a diverse traffic stream, you will see these
- What is the intent?
11 Countering Evasion-by-Ambiguity
- Involve end-host: have it tell you what it saw
- Probe end-host in advance to resolve vantage-point ambiguities (active mapping)
- E.g., how many hops to it?
- E.g., how does it resolve ambiguous retransmissions?
- Change the rules: perturb
- Introduce a network element that normalizes the traffic passing through it to eliminate ambiguities
- E.g., regenerate low TTLs (dicey!)
- E.g., reassemble streams & remove inconsistent retransmissions
12 Adversaries & Identity
- Usual notions of identifying services by port numbers and users by IP addresses become untrustworthy
- E.g., backdoors installed by attackers on non-standard ports to facilitate return / control
- E.g., P2P traffic tunneled over HTTP
- General measurement problem: inferring structure
13 Adversaries & Identity: Measuring Packet Origins
- Muscular approach (Burch/Cheswick)
- Recursively pound upstream routers to see which ones perturb flooding stream
- Breadcrumb approach
- ICMP ISAWTHIS
- Relies on high volume
- Packet marking
- Lower volume, but intensive post-processing
- Yaar's PI scheme yields general tomography utility
- Yields general technique: power of introducing small amount of state inside the network
14 Adversaries & Identity: Measuring User Origins
- Internet attacks invariably do not come from the attacker's own personal machine, but from a stepping stone: a previously compromised intermediary.
- Furthermore, via a chain of stepping stones.
- Manually tracing attacker back across the chain is virtually impossible.
- So want to detect that a connection going into a site is closely related to one going out of the site.
- Active techniques? Passive techniques?
15 Measuring User Origins, cont.
- Approach 1 (SH94; passive): look for similar text
- For each connection, generate a 24-byte thumbprint summarizing per-minute character frequencies
- Approach 2 (USAF94): particularly vigorous active measurement
- Break in to upstream attack site
- Rummage through its logs
- Recurse
16 Measuring User Origins, cont.
- Approach 3 (ZP00; passive): leverage unique on/off pattern of user login sessions
- Look for connections that end idle periods at the same time.
- Two idle periods correlated if ending times differ by ≤ δ sec.
- If enough periods coincide ⇒ stepping stone pair.
- For A → B → C stepping stone, just 2 correlations suffice
- (For A → B → … → C → D, 4 suffice.)
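A minimal sketch of the idle-period correlation idea, under stated assumptions: each connection is reduced to a sorted list of packet timestamps in seconds, and the idle threshold, the tolerance `delta`, and the required number of coincidences are illustrative parameters chosen for the example, not values taken from the slides.

```python
def idle_end_times(packet_times, idle_threshold=0.5):
    """Timestamps at which a connection resumes after an idle gap."""
    ends = []
    for prev, cur in zip(packet_times, packet_times[1:]):
        if cur - prev >= idle_threshold:
            ends.append(cur)
    return ends

def count_coincidences(ends_a, ends_b, delta=0.08):
    """Count idle-period ends in A matched by an end in B within delta sec."""
    return sum(any(abs(a - b) <= delta for b in ends_b) for a in ends_a)

def looks_like_stepping_stone(times_a, times_b, min_coincidences=2,
                              idle_threshold=0.5, delta=0.08):
    """Flag a connection pair whose idle periods end together often enough."""
    ends_a = idle_end_times(times_a, idle_threshold)
    ends_b = idle_end_times(times_b, idle_threshold)
    return count_coincidences(ends_a, ends_b, delta) >= min_coincidences

inbound = [0.0, 0.1, 2.0, 2.1, 5.0]
outbound = [0.02, 0.12, 2.03, 2.13, 5.04]   # relayed keystrokes, slight delay
unrelated = [0.0, 1.0, 3.5, 7.0]
```

Only timing is inspected, never payload, which is why this style of test still works on encrypted sessions.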
17 Measuring User Origins, cont.
- Works very well, even for encrypted traffic
- But easy to evade, if attacker cognizant of algorithm
- C'est la arms race
- And also turns out there are frequent legit stepping stones
- Untried active approach: imprint traffic with low-frequency timing signature unique to each site (breadcrumb). Deconvolve recorded traffic to extract.
18 Global-scale Adversaries: Worms
- Worm: self-replicating/self-propagating code
- Spreads across a network by exploiting flaws in open services, or fooling humans (viruses)
- Not new --- Morris Worm, Nov. 1988
- 6-10% of all Internet hosts infected
- Many more small ones since, but came into its own July 2001
19 Code Red
- Initial version released July 13, 2001.
- Exploited known bug in Microsoft IIS Web servers.
- 1st through 20th of each month: spread. 20th through end of each month: attack.
- Spread via random scanning of 32-bit IP address space.
- But failure to seed random number generator ⇒ linear growth ⇒ reverse engineering enables forensics
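The effect of the missing seed can be illustrated with a small sketch. It uses Python's own `random.Random` as a stand-in generator, not Code Red's actual one, and the function names are hypothetical. With a fixed seed every infected host walks the same target list, so new infections add no new coverage, and an analyst who knows the generator can replay the list.

```python
import random

def scan_targets(seed, count):
    """First `count` 32-bit addresses a scanner with this seed would probe."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(count)]

def to_dotted_quad(addr):
    """Render a 32-bit integer as a dotted-quad IPv4 address."""
    return ".".join(str((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0))

# Unseeded case: every instance starts from the same fixed state.
host_a = scan_targets(seed=0, count=1000)
host_b = scan_targets(seed=0, count=1000)

# Correctly seeded case: each instance gets its own sequence.
host_c = scan_targets(seed=1, count=1000)
host_d = scan_targets(seed=2, count=1000)

same_coverage = len(set(host_a) | set(host_b))    # no gain from second host
mixed_coverage = len(set(host_c) | set(host_d))   # roughly doubles
```

Comparing `same_coverage` with `mixed_coverage` shows why the correctly seeded revision spread so much faster.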
20 Code Red, cont.
- Revision released July 19, 2001.
- Payload: flooding attack on www.whitehouse.gov.
- Bug led to it dying for dates ≥ 20th of the month.
- But this time random number generator correctly seeded. Bingo!
21 Worm dies on July 20th, GMT
22 Measuring Internet-Scale Activity: Network Telescopes
- Idea: monitor a cross-section of Internet address space to measure network traffic involving wide range of addresses
- Backscatter from DoS floods
- Attackers probing blindly
- Random scanning from worms
- LBNL's cross-section: 1/32,768 of Internet
- Small enough for appreciable telescope lag
- UCSD, UWisc's cross-section: 1/256.
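Telescope lag can be made concrete with a back-of-the-envelope sketch. It assumes a source that probes uniformly random 32-bit addresses at a constant rate; the rate of 10 probes per second is an illustrative assumption, not a figure from the talk.

```python
def telescope_addresses(fraction):
    """Number of IPv4 addresses covered by a telescope of this fraction."""
    return int(2 ** 32 * fraction)

def expected_seconds_to_first_hit(fraction, probes_per_second):
    """Mean wait until a uniform random scanner first hits the telescope.

    Each probe lands in the telescope with probability `fraction`, so the
    number of probes to the first hit is geometric with mean 1 / fraction.
    """
    return 1.0 / (fraction * probes_per_second)

def prob_seen_within(fraction, probes_per_second, seconds):
    """Probability that at least one probe has hit the telescope by then."""
    return 1.0 - (1.0 - fraction) ** (probes_per_second * seconds)

small = 1 / 32768   # cross-section quoted for LBNL
large = 1 / 256     # cross-section quoted for UCSD and UWisc
lag_small = expected_seconds_to_first_hit(small, probes_per_second=10)
lag_large = expected_seconds_to_first_hit(large, probes_per_second=10)
```

Under that assumed rate the small telescope waits about 55 minutes on average to see a given scanner, while the /8-sized one waits under half a minute.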
23 Spread of Code Red
- Network telescopes give lower bound on infected hosts: 360K.
- Course of infection fits classic logistic.
- That night (⇒ 20th), worm dies
- except for hosts with inaccurate clocks!
- It just takes one of these to restart the worm on August 1st
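A small sketch of what "classic logistic" means here, assuming the standard simple-epidemic model in which the infected fraction a grows as da/dt = rate * a * (1 - a). The rate and midpoint values are arbitrary illustrations, not fitted Code Red parameters.

```python
import math

def infected_fraction(t, rate, midpoint):
    """Closed-form logistic curve: fraction of vulnerable hosts infected."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))

def simulate_fraction(rate, a0, dt, steps):
    """Euler-integrate da/dt = rate * a * (1 - a) starting from a0."""
    a = a0
    for _ in range(steps):
        a += rate * a * (1.0 - a) * dt
    return a

rate, midpoint = 1.0, 10.0                      # illustrative values only
start = infected_fraction(0.0, rate, midpoint)  # tiny initial infection
halfway = simulate_fraction(rate, start, dt=0.001, steps=10000)  # t = 10
late = simulate_fraction(rate, start, dt=0.001, steps=20000)     # t = 20
```

Growth is near-exponential while few hosts are infected and flattens as the vulnerable population is exhausted, which is the shape the telescope data traced out.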
24 Could parasitically analyze sample of 100K's of clocks!
25 The Worms Keep Coming
- Code Red 2
- August 4th, 2001
- Localized scanning: prefers nearby addresses
- Payload: root backdoor
- Programmed to die Oct 1, 2001.
- Nimda
- September 18, 2001
- Multi-mode spreading, including via Code Red 2 backdoors!
26 Code Red 2 kills off Code Red 1
Nimda enters the ecosystem
CR 1 returns thanks to bad clocks
Code Red 2 settles into weekly pattern
Code Red 2 dies off as programmed
29 Detecting Internet-Scale Activity
- Telescopes can measure activity, but what does it mean?
- Need to respond to traffic to ferret out intent
- Honeyfarm: a set of honeypots fed by a network telescope
- Active measurement w/ an uncooperative (but stupid) remote endpoint
30 Internet-Scale Adversary Measurement via Honeyfarms
- Spectrum of response, ranging from simple/cheap auto-SYN acking to faking higher levels to truly executing higher levels
- Problem 1: Bait
- Easy for random-scanning worms, auto-rooters
- But for topological or contagion worms, need to seed honeyfarm into application network
- Huge challenge
- Problem 2: Background radiation
- Contemporary Internet traffic rife with endemic malice. How to ignore it?
31 Measuring Internet Background Radiation -- 2004
- For good-sized telescope, must filter
- E.g., UWisc /8 telescope sees 30 Kpps of traffic heading to non-existing addresses
- Would like to filter by intent, but initially don't know enough
- Schemes (per source):
- Take first N connections
- Take first N connections to K different ports
- Take first N different payloads
- Take all traffic source sends to first N destinations
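Two of the per-source schemes listed above can be sketched as follows. This is a minimal illustration with hypothetical class names; sources and destinations are plain strings here, and a real telescope filter would also need to expire state.

```python
from collections import defaultdict

class FirstNConnectionsFilter:
    """Admit only the first N connections seen from each source."""

    def __init__(self, n):
        self.n = n
        self.seen = defaultdict(int)

    def admit(self, src):
        self.seen[src] += 1
        return self.seen[src] <= self.n

class FirstNDestinationsFilter:
    """Admit all traffic a source sends to its first N distinct destinations."""

    def __init__(self, n):
        self.n = n
        self.dests = defaultdict(set)

    def admit(self, src, dst):
        known = self.dests[src]
        if dst in known:
            return True            # already-tracked destination: keep it all
        if len(known) < self.n:
            known.add(dst)         # still room to track a new destination
            return True
        return False

conn_filter = FirstNConnectionsFilter(n=2)
conn_results = [conn_filter.admit("10.0.0.1") for _ in range(4)]

dest_filter = FirstNDestinationsFilter(n=1)
dest_results = [
    dest_filter.admit("10.0.0.1", "192.0.2.5"),
    dest_filter.admit("10.0.0.1", "192.0.2.6"),
    dest_filter.admit("10.0.0.1", "192.0.2.5"),
]
```

The second scheme keeps whole conversations with a few destinations, which preserves multi-connection behavior that a plain connection cap would cut off.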
32 Responding to Background Radiation
33 Hourly Background Radiation Seen at a 2,560-address Telescope
35 Measuring Internet-scale Adversaries: Summary
- New tools & forms of measurement
- Telescopes, honeypots, filtering
- New needs to automate measurement
- Worm defense must be faster-than-human
- The lay of the land has changed
- Endemic worms, malicious scanning
- Majority of Internet connection (attempts) are hostile (80% at LBNL)
- Increasing requirement for application-level analysis
36 The Huge Dataset Headache
- Adversary measurement particularly requires packet contents
- Much analysis is application-layer
- Huge privacy/legal/policy/commercial hurdles
- Major challenge: anonymization/agents technologies
- E.g., PP03: semantic trace transformation
- Use intrusion detection system's application analyzers to anonymize trace at semantic level (e.g., filenames vs. users vs. commands)
- Note: general measurement increasingly benefits from such application analyzers, too
37 Attacks on Passive Monitoring
- State-flooding
- E.g., if tracking connections, each new SYN requires state; each undelivered TCP segment requires state
- Analysis flooding
- E.g., stick, snot, trichinosis
- But surely, just peering at the adversary, we're ourselves safe from direct attack?
38 Attacks on Passive Monitoring
- Exploits for bugs in passive analyzers!
- Suppose protocol analyzer has an error parsing unusual type of packet
- E.g., tcpdump and malformed options
- Adversary crafts such a packet, overruns buffer, causes analyzer to execute arbitrary code
- E.g., Witty vs. BlackIce: packets sprayed to random UDP ports
- 12,000 infectees in < 60 minutes!
39 Summary
- The lay of the land has changed
- Ecosystem of endemic hostility
- Traffic characterization of adversaries as ripe as characterizing regular Internet traffic was 10 years ago
- People care
- Very challenging
- Arms race
- Heavy on application analysis
- Major dataset difficulties
40 Summary, cont.
- Revisit passive measurement
- evasion
- telescopes/Internet scope
- no longer isolated observer, but vulnerable
- Revisit active measurement
- perturbing traffic to unmask hiding & evasion
- engaging attacker to discover intent
- IMHO, this is "where the action is"
- And the fun!