Title: Reactive Patching: a viable worm defence strategy
1Reactive Patching: a viable worm defence strategy?
- Milan Vojnovic Ayalvadi Ganesh
- Microsoft Research
- Cambridge, United Kingdom
Tutorial, Performance 2005, Juan-les-Pins, France, Oct 4, 2005
2Who is this tutorial intended for?
- Security non-specialist
- Learn some strategies of worm spread and the effectiveness of some countermeasures
- Security specialist
- Fundamental limits of patching
- Whom is it not for?
- Those interested in the gory details of particular worms and the vulnerabilities they exploit
3What is a Worm?
- Self-replicating malicious code that
- Exploits a known or unknown software vulnerability (e.g. buffer overflow)
- To gain (partial?) control over the host
- Uses the host to propagate copies of itself
- Typically, does not require human intervention
- Unlike viruses
- Contributes to speed of spread
4Buffer overflow vulnerability
[Figure: memory layout before the overflow (Name, Data, Program) and after it (Name, worm data that overwrites the program).]
5Motivation for studying worms
- Self-replicating malicious code spreads very quickly
- Code Red v2: 360,000 hosts in 24 hours
- Slammer: 75,000 hosts in 10 minutes
- Causes huge economic damage
- Backbone saturation, cleanup
- Things could be worse!
- Hard or impossible to eliminate all
vulnerabilities in code
6Roadmap
- Worm spread strategies
- Target discovery mechanisms
- SI epidemic model of worm spread
- Patch spread strategies
- Analysis of patching
- Patching
- Patching with filtering
- Candidate strategies PUSH, P2P
- Summary and conclusions
7Target discovery (1)
- No a priori knowledge of vulnerable hosts
- Random scanning
- Generate IP address at random
- If a vulnerable host is found at that address, infect it
- Commonly used in the current generation of worms, e.g., Code Red I, Slammer
- Not very smart, but can still be fast
8Host population types
I (infected)
P (patched)
9Random scanning worm
[Figure: an infected host scans a random IP address in the address space; a scan that lands on a vulnerable host is a hit.]
10Random scanning worm
[Figure: random scanning over the address space, continued.]
11Target discovery (2)
- Local preference / subnet-biased scanning
- Ex. Code Red II, Zotob
- Infected hosts split their scanning effort between
- Local subnet (IP addresses with the same 1-, 2- or 3-octet prefix)
- Global Internet
- Why?
12Subnet preference Code Red II
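[Figure: Code Red II scan targets: a random address with probability 1/8, an address in the same /8 with probability 1/2, an address in the same /16 with probability 3/8.]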
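The split in the figure can be sketched in a few lines. This is a minimal illustration, assuming the commonly reported Code Red II probabilities (1/8 fully random, 1/2 within the own /8, 3/8 within the own /16). The function name `code_red_ii_target` and the example address are hypothetical, and details of the real worm's address generator are not modelled.

```python
import ipaddress
import random


def code_red_ii_target(own_ip: str, rng: random.Random) -> str:
    """Pick one scan target with a Code Red II style subnet preference."""
    own = int(ipaddress.IPv4Address(own_ip))
    rand = rng.getrandbits(32)
    u = rng.random()
    if u < 1 / 8:
        target = rand                                      # anywhere in IPv4 space
    elif u < 1 / 8 + 1 / 2:
        target = (own & 0xFF000000) | (rand & 0x00FFFFFF)  # same /8 prefix
    else:
        target = (own & 0xFFFF0000) | (rand & 0x0000FFFF)  # same /16 prefix
    return str(ipaddress.IPv4Address(target))


rng = random.Random(1)
targets = [code_red_ii_target("192.0.2.10", rng) for _ in range(20000)]
same_8 = sum(t.startswith("192.") for t in targets) / len(targets)
same_16 = sum(t.startswith("192.0.") for t in targets) / len(targets)
print(f"share of scans in own /8: {same_8:.3f}, in own /16: {same_16:.3f}")
```

Most scans stay close to the infected host, which pays off when vulnerable hosts cluster in the same networks.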
13Subnet preference Zotob
- Scans local /16 address space until
- 512 consecutive scans miss, or
- Until 32 scans miss if there has been no success
- Then switches to random scanning of entire IP
address space
14Target discovery (3)
- Hit lists: worm seeded with a list of vulnerable targets identified in advance
- Carried in worm payload, or
- Looked up from an external server, e.g., a meta-server for games
- Objective: accelerate initial spread
15Target discovery (4)
- Topological worms
- Target lists obtained from data residing on the host, e.g.,
- Local DNS cache
- Instant messenger contact lists
- Neighbour lists of P2P applications
- Potentially very fast, and hard to distinguish
from legitimate use
16Roadmap
- Worm spread strategies
- Target discovery mechanisms
- SI epidemic model of worm spread
- Patch spread strategies
- Analysis of patching
- Patching
- Patching with filtering
- Candidate strategies PUSH, P2P
- Summary and conclusions
17Model of random scanning
- Address space of size $\Omega = 2^{32}$
- N vulnerable hosts, occupying a fraction $N/\Omega$ of the address space
- Infected hosts scan addresses randomly at rate $\eta$
- Code Red: $\eta \approx 360$ per minute
- Slammer (UDP): $\eta \approx 4000$ per second
- If a scan locates a vulnerable host, that host is infected
18Random scanning (in pictures)
[Figure: random scans landing in the address space $1, \dots, \Omega$.]
19Stochastic epidemic model
- Infected hosts scan the IP address space at the points of a Poisson process of rate $\eta$
- Independent at distinct hosts
- Rate at which scans hit vulnerable hosts: $\beta = \eta N / \Omega$
- I(t): number of infected hosts, evolves as a Markov process
- High-level model: ignores network congestion, latency
20Deterministic epidemic model
- Large population limit
- $N \to \infty$, with $\beta = \eta N/\Omega$ held fixed
- $i(t) = I(t)/N$: fraction of hosts infected
- i(t) is a density-dependent Markov chain
- Converges to the limit deterministic ODE
- $\frac{d}{dt} i(t) = \beta\, i(t)\,(1 - i(t))$
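The limit ODE is the logistic equation, whose closed-form solution is $i(t) = i(0) e^{\beta t} / (1 - i(0) + i(0) e^{\beta t})$. The sketch below checks that formula against a plain Euler integration; the values of `beta` and `i0` are hypothetical and chosen only for illustration.

```python
import math


def logistic(t, beta, i0):
    """Closed-form solution of di/dt = beta * i * (1 - i) with i(0) = i0."""
    growth = math.exp(beta * t)
    return i0 * growth / (1.0 - i0 + i0 * growth)


def euler(t_end, beta, i0, dt=1e-3):
    """Forward-Euler integration of the same ODE, for comparison."""
    i = i0
    for _ in range(round(t_end / dt)):
        i += dt * beta * i * (1.0 - i)
    return i


beta, i0 = 1.0, 1e-4  # hypothetical infection rate and initial infected fraction
for t in (5.0, 10.0, 15.0):
    print(f"t={t:5.1f}  closed form={logistic(t, beta, i0):.4f}  Euler={euler(t, beta, i0):.4f}")
```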
21Modelling Code Red
22Characteristic time scale
- Epidemic growth follows logistic curve
- Initially exponential, then saturates
- Time to infect most of the susceptible population is a small multiple of $1/\beta$
- 40 minutes for Code Red
- 10 seconds for Slammer
- Time scale for network-wide infection is hours
for Code Red, minutes for Slammer
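As a rough plausibility check of these time scales, the sketch below computes $\beta = \eta N / \Omega$ and the logistic time to reach 90% infection. It assumes that the host counts quoted earlier in the deck (360,000 and 75,000) can stand in for the vulnerable populations and that the outbreak starts from a single infected host; both are simplifications, and the function names are hypothetical.

```python
import math

OMEGA = 2 ** 32  # size of the IPv4 address space


def infection_rate(scan_rate, vulnerable_hosts, address_space=OMEGA):
    """beta = eta * N / Omega, in the time unit of scan_rate."""
    return scan_rate * vulnerable_hosts / address_space


def time_to_fraction(beta, i0, target):
    """Time for the logistic curve to grow from fraction i0 to fraction target."""
    return math.log(target * (1.0 - i0) / (i0 * (1.0 - target))) / beta


beta_code_red = infection_rate(360, 360_000)  # per minute
beta_slammer = infection_rate(4000, 75_000)   # per second

print(f"Code Red: 1/beta = {1 / beta_code_red:.1f} min, "
      f"90% after {time_to_fraction(beta_code_red, 1 / 360_000, 0.9) / 60:.1f} h")
print(f"Slammer:  1/beta = {1 / beta_slammer:.1f} s, "
      f"90% after {time_to_fraction(beta_slammer, 1 / 75_000, 0.9) / 60:.1f} min")
```

With these inputs $1/\beta$ comes out at roughly 33 minutes for Code Red and roughly 14 seconds for Slammer, the same order as the figures on the slide, and the time to 90% infection is hours and minutes respectively.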
23Not considered: non-constant per-infective scan rate
- Some random scanning worms result in a non-constant per-infective scan rate
- Example: Slammer (2003)
- plausible cause: bandwidth saturation
- see extras
[Figure: per-infective scan rate (observed scans per unit time divided by the number of infected hosts) for Slammer; data from Moore04.]
24Roadmap
- Worm spread strategies
- target discovery mechanisms
- SI epidemic model of worm spread
- Patch spread strategies
- Analysis of patching
- patching
- patching with filtering
- candidate strategies PUSH, P2P
- Summary and conclusions
25Patching strategies
- Patching: identify vulnerabilities and develop patches
- Filtering: detect and quarantine infected hosts / subnets
- Other: ensure code has no vulnerabilities (non-trivial in practice)
26Current approaches
- Vulnerability is found; patch is developed first
- Patch is released
- Worm is subsequently reverse engineered from the patch
- Patch needs to be installed before the worm is released: hours to days
27Example: Zotob
- Aug 9, 2005: MS05-039 public disclosure
- Plug-and-play vulnerability affecting mostly Win2k
- Aug 12, 2005: exploit code released
- Aug 14, 2005: Zotob worm discovered
- Followed by > 10 variants
28Deficiencies
- Zero-day worms: vulnerability not known or patch not yet available
- Requires automatic response, involving
- Detection of worm spread
- Automatic patch generation
- Automatic patch dissemination
- Human reaction times are too slow
29Example: Vigilante (Costa05)
- Detectors distributed through the network
- Detect worms by analysis of the stack in code execution
- Can be combined with honeypots etc.
- Generate self-certifying alerts (SCAs) proving the vulnerability
- Disseminate to hosts, which verify the SCA and create filters (patches)
30Problems we address
- Architecture for alert dissemination
- Vigilante uses a structured overlay interconnecting all end hosts
- We propose a hierarchical scheme
- Analysis of the competing spread of worm and patch
- To establish if patching is feasible, and
- quantify system requirements
31Roadmap
- Worm spread strategies
- target discovery mechanisms
- SI epidemic model of worm spread
- Patch spread strategies
- Analysis of patching
- patching
- patching with filtering
- candidate strategies PUSH, P2P
- Summary and conclusions
32Patching
- Hierarchical dissemination
- Phase 1: among patching servers
- Phase 2: patching servers to hosts
33Network partitioned into subnets
[Figure: the address space partitioned into subnets 1, 2, ..., j, ..., J.]
34Patching server in each subnet
[Figure: one patching server in each of the subnets 1, 2, ..., j, ..., J.]
- patching servers termed superhosts
35Superhosts interconnected by an overlay
[Figure: superhosts interconnected by an overlay network.]
- alerts or patches disseminate over the overlay
- with alerts, the patch is generated at the superhosts
- the distinction is not essential for the modelling
36PULL
- hosts poll a superhost with unit rate
- superhost service rate $\mu$
- a poll results in a patched host if the polling host was susceptible
- s(t): fraction of susceptible hosts at time t
- patching rate at time t: $\mu\, s(t)$
37Host population dynamics
- patch susceptible hosts only
- assumes worm prevents patching an infected host
- plausible assumption for automatic patching (no
human intervention)
Patching system
- if $\mu = 0$, standard logistic
- in general
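One way to write a patching system consistent with the bullets above, offered as a sketch rather than as the slide's exact formulation: assume infections occur at rate $\beta\, i(t)\, s(t)$ as in the SI model, and susceptibles are patched at rate $\mu\, s(t)$.

```latex
\begin{align*}
\frac{di}{dt} &= \beta\, i(t)\, s(t), \\
\frac{ds}{dt} &= -\beta\, i(t)\, s(t) - \mu\, s(t).
\end{align*}
% With \mu = 0 and s = 1 - i this reduces to the logistic ODE.
% A conserved quantity follows by adding the two equations and using
% \frac{d}{dt}\log i(t) = \beta\, s(t):
\begin{equation*}
\frac{d}{dt}\Big[\, i(t) + s(t) + \frac{\mu}{\beta}\,\log i(t) \Big] = 0 .
\end{equation*}
```

Under these assumptions, letting $t \to \infty$ (where $s \to 0$) gives an implicit equation for the final infected fraction, $i(\infty) = i(0)\exp\big(\tfrac{\beta}{\mu}(i(0) + s(0) - i(\infty))\big)$.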
38Limit host population
- Result
- Implication
- Tight bound whenever the infection rate $\beta$ is sufficiently small (final fraction of infectives small)
- Exponential in the infection-to-patch rate ratio!
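The exponential dependence can be illustrated numerically. The sketch below assumes the dynamics $di/dt = \beta i s$, $ds/dt = -\beta i s - \mu s$ (an assumption consistent with the PULL description, not a formula preserved in these slides), for which the final infected fraction is bounded by $i(0)\, e^{\beta/\mu}$ when all hosts start susceptible or infected. The parameter values are hypothetical.

```python
import math


def final_infected(beta, mu, i0, dt=0.01, s_min=1e-9):
    """Euler integration of di/dt = beta*i*s, ds/dt = -beta*i*s - mu*s
    until essentially no susceptible hosts remain."""
    i, s = i0, 1.0 - i0
    while s > s_min:
        new_infections = beta * i * s * dt
        i += new_infections
        s -= new_infections + mu * s * dt
    return i


I0, BETA = 1e-3, 0.1  # hypothetical initial infected fraction and infection rate
results = {}
for ratio in (0.5, 1.0, 2.0, 4.0):  # ratio = beta / mu
    mu = BETA / ratio
    results[ratio] = (final_infected(BETA, mu, I0), I0 * math.exp(ratio))
    print(f"beta/mu={ratio:3.1f}  final infected={results[ratio][0]:.5f}  "
          f"bound i(0)*exp(beta/mu)={results[ratio][1]:.5f}")
```

The bound is close to the numerical value while the final infected fraction stays small and loosens as it grows.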
39Limit host population (example)
- 10,000 vulnerable hosts
- $\beta = 0.1$
- dots: Monte Carlo simulation
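A Monte Carlo experiment of this kind can be sketched as follows. The sketch assumes a Markov model in which infections occur at total rate $\beta I S / N$ and patches at total rate $\mu S$; only the order of events matters for the final state, so the embedded jump chain is simulated. The patching rate `MU`, the number of initially infected hosts `I0`, and the number of runs are hypothetical choices, not values taken from the slide.

```python
import math
import random


def simulate_final_infected(n_hosts, beta, mu, infected0, rng):
    """Run the jump chain until no susceptible host remains; return I/N."""
    infected = infected0
    susceptible = n_hosts - infected0
    while susceptible > 0:
        # Infection rate beta*I*S/N versus patch rate mu*S: S cancels out.
        p_infect = beta * infected / (beta * infected + mu * n_hosts)
        if rng.random() < p_infect:
            infected += 1     # a susceptible host gets infected
        susceptible -= 1      # either way, one susceptible host is gone
    return infected / n_hosts


rng = random.Random(2005)
N, BETA, MU, I0 = 10_000, 0.1, 0.05, 10
runs = [simulate_final_infected(N, BETA, MU, I0, rng) for _ in range(50)]
mean_final = sum(runs) / len(runs)
bound = (I0 / N) * math.exp(BETA / MU)
print(f"mean final infected fraction over {len(runs)} runs: {mean_final:.5f}")
print(f"deterministic bound i(0)*exp(beta/mu):            {bound:.5f}")
```

Individual runs scatter noticeably because the outbreak starts from only a handful of infected hosts; the average is what should be compared with the deterministic curve.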
40Subnets (contd)
[Figure: subnets 1, 2, ..., j, ..., J, with the alerted subnets highlighted.]
- patching with rate $\mu$ only in alerted subnets
- g(t): fraction of alerted subnets at time t
41Broadcast curve
- Natural candidate: logistic function
- g(t): fraction of alerted subnets at time t
- T: broadcast time
[Figure: g(t) against t, rising from 0 to 1, with the broadcast time T marked on the time axis.]
- Many-superhosts limit for random gossip: $T = O(\log J)$
- Same order for standard overlays
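The logarithmic broadcast time of random gossip is easy to observe in simulation. The sketch below uses synchronous push gossip (in each round every alerted superhost alerts one superhost chosen uniformly at random), which is one common variant and an assumption here; rounds stand in for the continuous broadcast time T.

```python
import math
import random


def gossip_rounds(n_superhosts, rng):
    """Rounds of push gossip until every superhost is alerted."""
    alerted = {0}  # superhost 0 raises the first alert
    rounds = 0
    while len(alerted) < n_superhosts:
        # Every alerted superhost pushes the alert to one random superhost.
        alerted |= {rng.randrange(n_superhosts) for _ in range(len(alerted))}
        rounds += 1
    return rounds


rng = random.Random(7)
results = {j: gossip_rounds(j, rng) for j in (64, 512, 4096)}
for j, rounds in results.items():
    print(f"J={j:5d}  rounds={rounds:3d}  log2(J)={math.log2(j):.0f}")
```

The number of rounds grows by a roughly constant amount each time J is multiplied by 8, as expected for growth proportional to log J.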
42Broadcast curve (example)
- Example
- Pastry overlay of J superhosts
- Topology: GaTech
- Broadcasting: flooding
- Exhibits logistic growth
- (Such overlays are randomly constructed and locally tree-like)
43Minimum Broadcast Curve
- A curve m(t) such that, at any time t, the fraction of alerted superhosts is $\ge m(t)$
- Comparison: a minimum broadcast curve yields an upper bound on the fraction of infectives
- Example: flooding on a Pastry overlay has a logistic minimum broadcast curve
[Figure: minimum broadcast curve m(t).]
44Host population dynamics
- $i_1(t)$: fraction of infectives in alerted subnets
- $s_1(t)$: fraction of susceptibles in alerted subnets
45The migrations
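- $g(t)\,(i(t) - i_1(t)) = J\,g(t)\,(1 - g(t)) \cdot \dfrac{i(t) - i_1(t)}{J\,(1 - g(t))}$
- Assume at time t, $J(1 - g(t)) = 5$
- Pick a subnet at random
- M: number of infectives in the randomly picked subnet
- $E(M) = (I_1 + \dots + I_5)/5$
[Figure: five non-alerted subnets holding $I_1, \dots, I_5$ infectives.]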
46Host population dynamics
familiar one-subnet patching system, but with patching rate $\mu\, w(t)$
- The last ODE is a Riccati equation
- Use the substitution $w = 1 + 1/z$ ($w = 1$ is a particular solution)
47Per-susceptible patching rate
bottleneck is patching within subnets
bottleneck is alerts over the overlay
48Per-susceptible patching rate
bottleneck: patching within subnets
bottleneck: alerts over the overlay
49Overlays that satisfy a logistic minimum broadcast curve
- Fast-overlay asymptotic: small $\mu$, $\beta$, with $\mu/\beta$ fixed
- Intuition: T replaced with $\log(1/g(0))$ (overlay diameter)
- Heuristic: take $g(0) = 1/J$, so $\log(1/g(0)) = \log J$
50Known broadcast time T
- If $T = 0$, then consistent with one-subnet patching
- Uses the minimum broadcast curve $m(t) = 1_{\{t \ge T\}}$
- No patching until all subnets are alerted
[Figure: g(t) as a step from 0 to 1 at time T.]
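The effect of the broadcast delay can be sketched numerically. Assume the worm spreads logistically and unpatched until time T, after which the one-subnet dynamics $di/dt = \beta i s$, $ds/dt = -\beta i s - \mu s$ apply (an assumption, since the slide's own formula is not preserved). The conserved quantity of that system then gives the final infected fraction x as the root of $x = i(T)\exp\big(\tfrac{\beta}{\mu}(1 - x)\big)$. The function names and parameter values are hypothetical.

```python
import math


def logistic(t, beta, i0):
    """Infected fraction at time t of an unchecked logistic epidemic."""
    growth = math.exp(beta * t)
    return i0 * growth / (1.0 - i0 + i0 * growth)


def final_infected_after_delay(beta, mu, i0, broadcast_time):
    """Final infected fraction when patching at rate mu starts only at time T."""
    i_at_t = logistic(broadcast_time, beta, i0)
    k = beta / mu
    lo, hi = i_at_t, 1.0
    for _ in range(100):  # bisection on x = i(T) * exp(k * (1 - x))
        mid = 0.5 * (lo + hi)
        if mid < i_at_t * math.exp(k * (1.0 - mid)):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


BETA, MU, I0 = 0.1, 0.2, 1e-3
for T in (0.0, 10.0, 20.0, 40.0):
    print(f"T={T:5.1f}  final infected fraction="
          f"{final_infected_after_delay(BETA, MU, I0, T):.5f}")
```

Every unit of delay before patching begins multiplies the early infected fraction by about $e^{\beta}$, so the broadcast time enters the final outcome exponentially.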
51Roadmap
- Worm spread strategies
- target discovery mechanisms
- SI epidemic model of worm spread
- Patch spread strategies
- Analysis of patching
- patching
- patching with filtering
- candidate strategies PUSH, P2P
- Summary and conclusions
52Patching with filtering
- Assume each subnet applies edge filtering
- An alerted subnet blocks scans in and out
- Hence a scan between two distinct subnets can succeed only if both subnets are non-alerted
[Figure: scans into or out of an alerted subnet are blocked; a scan between non-alerted subnets succeeds if it hits a susceptible host.]
53Host population dynamics (1)
- Population dynamics in non-alerted subnets
- $i_0(t)$: fraction of infectives in non-alerted subnets
- $s_0(t)$: fraction of susceptibles in non-alerted subnets
54Host population dynamics (2)
- Result, with
- $u(t) = g(t)/g(0)$
- $b = \beta\,(i_0(0) + s_0(0))/(1 - g(0))$
55Fraction of infected hosts in non alerted subnets
56Further example with random realizations
- 100 superhosts
- Pastry overlay
- Broadcast: flooding
- Topology: GaTech
- $i(0) = 1/1000$
- hosts per subnet: 1000
- $\beta = 0.1$
57Patching vs. Patching + Filtering with fast patching within subnets
- Bottleneck is the overlay
- Within subnets, patching is assumed instantaneous
[Figure: comparison of patching and patching + filtering, marking where the two differ.]
58Continued
- For patching + filtering: a closed form
- Binomial integral with a series solution
- Simple bound, in terms of the diameter of the overlay
59Example
- $i(0) = 10^{-5}$
- $i_1(0) = r\, i(0)$
- $s_0(0) = 1 - g(0) - (1 - r)\, i(0)$
- $g(0) = 10^{-3}$
- Note: $i_1(0) + p(0) = g(0)$
60Roadmap
- Worm spread strategies
- target discovery mechanisms
- SI epidemic model of worm spread
- Patch spread strategies
- Analysis of patching
- patching
- patching with filtering
- candidate strategies PUSH, P2P
- Summary and conclusions
61PUSH
- superhost maintains inventory list of hosts
- superhost service rate $\mu$
- serves hosts in order
- s(t): fraction of susceptible hosts at time t
- patching rate at time t: $\dfrac{\mu}{1 - \mu t}\, s(t)$
62Host population dynamics
$t < 1/\mu$
- By comparison: the patching rate per susceptible is larger, so starting from the same initial value, the fraction of infectives is smaller than with PULL
- We claim no estimate of the ultimate infectives; see the numerics
63Patching rate
- Address space: $1, 2, \dots, \Omega$
- Susceptibles: $S(1), S(2), \dots, S(\Omega)$
- Probability to pick a susceptible at the k-th push:
- k = 1: $S(1)/\Omega$
- k = 2: $S(2)/(\Omega - 1)$
- ...
- k = $\Omega$: $S(\Omega)/1$
64PULL vs. PUSH
$i(0) = 10^{-5}$
- Yes, PUSH superior to PULL
- But less than an order of magnitude
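The comparison can be reproduced with a small numerical sketch. It assumes infections at rate $\beta i s$ in both schemes, a PULL patching rate of $\mu s$ and a PUSH patching rate of $\mu s / (1 - \mu t)$ for $t < 1/\mu$; the parameter values are hypothetical. The patching step uses the exact decay factor over each time step, which avoids trouble near $t = 1/\mu$.

```python
import math


def final_infected(beta, mu, i0, push, dt=1e-3):
    """Final infected fraction under PULL (push=False) or PUSH (push=True)."""
    i, s, t = i0, 1.0 - i0, 0.0
    while s > 1e-9:
        if push:
            unserved = 1.0 - mu * t      # share of the inventory list not yet served
            if unserved <= mu * dt:      # list exhausted: every host has been visited
                break
            keep = (unserved - mu * dt) / unserved
        else:
            keep = math.exp(-mu * dt)    # each susceptible is patched at rate mu
        new_infections = beta * i * s * dt
        i += new_infections
        s = (s - new_infections) * keep
        t += dt
    return i


BETA, MU, I0 = 0.1, 0.1, 1e-5
pull = final_infected(BETA, MU, I0, push=False)
push = final_infected(BETA, MU, I0, push=True)
print(f"PULL: {pull:.3e}   PUSH: {push:.3e}   ratio PULL/PUSH: {pull / push:.2f}")
```

While the infected fraction stays small, these assumed dynamics give roughly $i(0)e^{\beta/\mu}$ for PULL and $i(0)e^{\beta/(2\mu)}$ for PUSH, so PUSH behaves like PULL with twice the patching rate.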
65PULL vs. PUSH (same but wider range)
66Worm-like patch dissemination (1)
67Worm-like patch dissemination (2)
- Two epidemics
- Patch epidemic with the larger spread rate $\mu$
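A minimal sketch of two competing epidemics, assuming both the worm and the patch spread as SI epidemics over the same susceptible pool: $di/dt = \beta i s$, $dp/dt = \mu p s$, $s = 1 - i - p$. These equations are an assumption made for illustration, since the slide's own formulas are not preserved. Under them $\log i$ and $\log p$ grow in the fixed proportion $\beta : \mu$, so $i(t)/i(0) = (p(t)/p(0))^{\beta/\mu}$ at all times.

```python
def two_epidemics(beta, mu, i0, p0, t_end, dt=1e-3):
    """Euler integration of a worm (rate beta) and a patch (rate mu)
    competing for the same susceptible hosts."""
    i, p = i0, p0
    for _ in range(round(t_end / dt)):
        s = 1.0 - i - p
        i, p = i + dt * beta * i * s, p + dt * mu * p * s
    return i, p


BETA, MU, I0, P0 = 0.1, 0.2, 1e-4, 1e-4  # hypothetical rates and seeds
i_end, p_end = two_epidemics(BETA, MU, I0, P0, t_end=200.0)
predicted = I0 * (p_end / P0) ** (BETA / MU)
print(f"final infected={i_end:.5f}  final patched={p_end:.5f}  "
      f"invariant prediction={predicted:.5f}")
```

With the patch spreading twice as fast from an equally small seed, the worm ends up with about one percent of the hosts in this example.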
68Roadmap
- Worm spread strategies
- target discovery mechanisms
- SI epidemic model of worm spread
- Patch spread strategies
- Analysis of patching
- patching
- patching with filtering
- candidate strategies PUSH, P2P
- Summary and conclusions
69Conclusions (1)
- Containment of random scanning worms
- Can be effective by patching only if the patching rate is sufficiently larger than the worm infection rate
- Achievable at a reasonable patching rate if scan rates are constrained
- No feasibility problems, but largely engineering and security issues
- Smarter worms
- ?
70Conclusions (2)
- Looking ahead
- Worms evolve, so must network immune system
- Analysis in its infancy
- Need solid theoretical understanding of worm strategies
- Informs design of countermeasures
- lets us know their limitations
- Beyond dynamics description and numerical solving
- Analytical estimates
71This slide deck: related references
- http://research.microsoft.com/milanv/immunology.htm
72References
- Moore04: Inside the Slammer Worm, D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, N. Weaver, IEEE Security & Privacy, 2004.
- Costa05: Vigilante: End-to-End Containment of Internet Worms, M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, P. Barham, SOSP'05, 2005.
- Kesidis04: Coupled Kermack-McKendrick Models for Randomly Scanning and Bandwidth Saturating Internet Worms, QoS-IP, Feb 2005.
- V-G05a: On the Effectiveness of Automatic Patching, V.-G., WORM'05, Nov 2005.
- V-G05b: On the Race of Worms, Alerts, and Patches, V.-G., Microsoft Research TR-2005-13, Feb 2005.
- many others (see some on the website given on the previous slide)
73Extras
- Why did the per-infective scan rate of Slammer decrease?
- Comparison result for the patching system
- Patching + filtering
74Why did per-infective scan rate of Slammer decrease? (1)
- Bandwidth-saturation model (Kesidis04)
- N subnets, each with at most K vulnerable hosts
- Each subnet with outbound rate $\sigma$
- A1: one infective saturates the outbound link
- A2: ignore worm scans from the local subnet
- okay for many fixed-size subnets
- $n_k(t)$: fraction of subnets with k infectives
- per-infective scan rate
75Why did per-infective scan rate of Slammer decrease? (2)
- Dynamics
-
-
- Non-linear system, but
76Why did per-infective scan rate of Slammer decrease? (3)
- Closed-form solution
- change of time: $du(t) = (1 - n_0(t))\,dt$
- makes it a system of linear ODEs
77Why did per-infective scan rate of Slammer decrease? (4)
- Special case: $n_1(0) = 1 - n_0(0)$
- Number of infectives at time t
- Per-infective scan rate
- Similarly for heterogeneous outbound links; see Kesidis04
[Figure: fit to Slammer data as in Kesidis04.]
78Comparison for patching system
- Result: if $f_1(t) \le f_2(t)$ for all t, then, starting from the same initial point, the fraction of infectives with $f_2$ is at most that with $f_1$
- See V-G05b for a precise statement
79Patching + filtering
- Note
- Follows with
- (i0): generalized logistic ODE
- closed-form solution: next slide
80A general result for generalized logistic ODE
- Generalized logistic ODE
- Assume $A(t) < \infty$
- Result
- Proof: look at the ODE for Y, a primitive of y; use separation of variables
81Ultimately infected hosts (1)
- Consider a subnet j alerted at a time Tj
- Before subnet j is alerted
82Ultimately infected hosts (2)
- After subnet j is alerted
- Familiar patching system