Fast Port Scan Detection Using Sequential Hypotheses Testing

Provided by: csU73

Learn more at: http://www.cs.ucf.edu


1
Fast Port Scan Detection Using Sequential
Hypotheses Testing

Authors: Jaeyeon Jung, Vern Paxson, Arthur W.
Berger, and Hari Balakrishnan
IEEE Symposium on Security and Privacy, 2004.

Presenter: Tai Do, CAP 6938, Jan. 18, 2007
2
Introduction

Problem: Random portscans of IP addresses are a
popular method for attackers to find vulnerable
machines in the reconnaissance phase.
Threshold Random Walk: an online detection
algorithm.
Motivation: Early detection allows some form of
protective response to mitigate or fully prevent
damage.
Three quantities of interest for a detection
problem:
Detection accuracy: false alarm rate (false positive)
and misdetection rate (false negative)
Detection delay time

3
Challenges

No crisp definition of the activity:
An attempted HTTP connection to the site's main
Web server is OK.
A sweep through the entire address space looking
for HTTP servers is NOT OK.
But how about connections to a few addresses,
some of which succeed and some of which fail?
The granularity of identity:
Are probes from adjacent remote addresses part of
a single reconnaissance activity?
What about probes from nearby addresses which
together form a clear coverage pattern?
The locality of the addresses to which the probes
are directed might be tight or scattered.
Temporal vs. spatial considerations:
For how much time do we track activity? Do we factor
in the rate at which connections are made?
Intent:
Not all scans are necessarily hostile (search
engine crawlers, P2P applications).

4
Assumptions

Focus only on TCP scanners.
Identity: single remote IP addresses. No
distributed scans. No vertical scans of a single
host.
Does not assume a particular scanning rate from a
remote host.

5
Outline

Existing Works
Data Analysis
Online Detection Algorithm Threshold Random Walk
Performance Evaluation
Concluding Remarks

6
Existing Works

Counting Models: Network Security Monitor, Snort,
and Bro.
Probabilistic Models: [LeckieK02] and SPICE.

7
Counting Models

Network Security Monitor, Snort: detect N events
within a time interval of T seconds.
Bro: treats connections differently depending on
their services. For services in a configurable list,
only failed attempts are counted; for other services,
all attempts are counted. Raises a flag if the number
of distinct destination addresses reaches a
configurable parameter.
Disadvantage: threshold selection.
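
The counting idea above can be sketched as a sliding-window counter. This is a minimal illustration only, not the actual logic of Network Security Monitor, Snort, or Bro: the class name `CountingDetector`, the choice to count distinct destinations, and the parameter names `n` and `t` are all assumptions made for the example.

```python
from collections import defaultdict, deque


class CountingDetector:
    """Flag a source once it contacts n distinct destinations within t seconds."""

    def __init__(self, n, t):
        self.n = n  # hypothetical threshold on distinct destinations
        self.t = t  # hypothetical window length in seconds
        self.events = defaultdict(deque)  # source -> deque of (timestamp, destination)

    def observe(self, timestamp, src, dst):
        """Record one connection attempt; return True if src should be flagged."""
        window = self.events[src]
        window.append((timestamp, dst))
        # Drop events that have fallen out of the time window.
        while window and timestamp - window[0][0] > self.t:
            window.popleft()
        distinct = {d for _, d in window}
        return len(distinct) >= self.n
```

A source that probes more slowly than one destination per `t / n` seconds is never flagged by this sketch, which is one way to see why threshold selection is listed as the disadvantage.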

8
Probabilistic Models

[LeckieK02]
An access probability distribution for each local
IP address, computed across all remote source IP
addresses that access that destination.
Also considers the number of distinct local IP
addresses that a given remote source has accessed
so far.
Scanners are modeled as accessing each
destination address with equal probability.
Flaws:
Many false positives.
No confidence levels to assess whether the
difference is large enough.
How to assign an a priori probability to
destination addresses that have never been
accessed?

9
Probabilistic Models

SPICE [StanifordHM00]
Detects stealthy scans (very low rates, and spread
across multiple source addresses).
Assigns anomaly scores to packets based on
conditional probabilities derived from the source
and destination addresses and ports.
Collects packets over long intervals (days or
weeks) and then clusters them using simulated
annealing to find correlations that are then
reported as anomalous events.
Disadvantages:
Significantly more run-time processing.
More complex.
Off-line method.

10
Outline

Existing Works
Data Analysis
Online Detection Algorithm Threshold Random Walk
Performance Evaluation
Concluding Remarks

11
Initial Data Sets

Two research labs: LBL and ICSI.
Bro NIDS is used.
8 data sets (6 + 2).
24-hour period.

known_bad = scanner + HTTP worms + other_bad
HTTP worms: Code Red or Nimda.
other_bad: hosts that send packets to 135/tcp, 139/tcp,
445/tcp, or 1433/tcp, corresponding to Windows
RPC, NetBIOS, SMB, and SQL-Snake attacks.
12
A Better Ground Truth

Ground truth: the available data sets are a good
start, but not strong enough.
There may be undetected scanners among the remainder
entries.
How to determine likely, but undetected, scanners?
Ideal situation: use a method that is wholly
separate from the subsequently developed
detection algorithm. The paper fails to find such
a method.
Instead, the same properties are used both to
1) distinguish likely scanners from non-scanners
among the remainder hosts, and 2) drive the
detection algorithm.
Soundness of the method: show that the likely
scanners do indeed have characteristics in common
with known malicious hosts.

13
Key Observation

inactive_pct: the percentage of the local hosts
that a given remote host has accessed for which
the connection attempt failed (rejected or
unanswered).

15
Separating Possible Scanners

inactive_pct: the percentage of the local hosts
that a given remote host has accessed for which
the connection attempt failed.
inactive_pct < 80%: benign remote host.
inactive_pct > 80%: possible scanner (suspect).
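
The labeling rule can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' tooling: the function names, the dictionary input (one outcome per distinct local host), and the decision to treat exactly 80% as suspect are all choices made for the example.

```python
def inactive_pct(outcomes):
    """Percentage of distinct local hosts for which the connection attempt failed.

    outcomes: dict mapping each distinct local host to True if the
    connection attempt succeeded, False if it was rejected or unanswered.
    """
    if not outcomes:
        raise ValueError("no connection attempts recorded")
    failed = sum(1 for ok in outcomes.values() if not ok)
    return 100.0 * failed / len(outcomes)


def label(outcomes, cutoff=80.0):
    """Label a remote host as 'benign' or 'suspect' from its inactive_pct."""
    # Assumption: a host at exactly the cutoff is treated as suspect;
    # the slide does not say how that boundary case is handled.
    return "suspect" if inactive_pct(outcomes) >= cutoff else "benign"
```

For example, a remote host that reached one live host and three inactive ones has an inactive_pct of 75 and is labeled benign under this rule.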

16
Final Data Sets

Additional supporting evidence: suspect hosts
exhibit distributions quite similar to those for
known-bad hosts.

17
Outline

Existing Works
Data Analysis
Online Detection Algorithm Threshold Random Walk
Performance Evaluation
Concluding Remarks

18
Hypothesis testing formulation

A remote host R attempts to connect to a local host
at time i.
Let Y_i = 0 if the connection attempt is a
success,
Y_i = 1 if the connection attempt fails.
As outcomes Y_1, Y_2, ... are observed, we wish to
determine whether R is a scanner or not.
Two competing hypotheses:
H_0: R is benign
H_1: R is a scanner

The distribution of the Bernoulli random variable
Y_i
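
The formula for this distribution is not present in the transcript. A common way to write it, following the notation of the original paper, is given here as a sketch; the symbols \theta_0 and \theta_1 (the probability that an attempt succeeds under each hypothesis) come from that notation and do not appear in this transcript.

```latex
\begin{align*}
\Pr[Y_i = 0 \mid H_0] &= \theta_0, & \Pr[Y_i = 1 \mid H_0] &= 1 - \theta_0, \\
\Pr[Y_i = 0 \mid H_1] &= \theta_1, & \Pr[Y_i = 1 \mid H_1] &= 1 - \theta_1,
\end{align*}
\qquad \text{with } \theta_0 > \theta_1 .
```

The condition \theta_0 > \theta_1 encodes the key observation: a benign host's connection attempts are more likely to succeed than a scanner's.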
19
An off-line approach

1. Collect a sequence of data Y for one day
(wait for a day).
2. Compute the likelihood ratio accumulated over
a day.
This is related to the proportion of inactive
local hosts that R tries to connect to (resulting in
failed connections).
3. Raise a flag if this statistic exceeds some
threshold.

20
A sequential (on-line) solution

1. Update the accumulated likelihood ratio statistic in
an online fashion.
2. Raise a flag if this exceeds some threshold.

[Figure: accumulated likelihood ratio over a day (hour 0 to 24), with two threshold levels]
21
(No Transcript)
22
Likelihood Ratio

The second equality follows from the i.i.d.
assumption on the random variables Y_i | H_j.
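
The equation this remark refers to is missing from the transcript. A plausible reconstruction, assuming the standard definition of the likelihood ratio for an observation vector Y = (Y_1, ..., Y_n) (the symbol \Lambda is an assumed name), is:

```latex
\Lambda(Y) \;\equiv\; \frac{\Pr[Y \mid H_1]}{\Pr[Y \mid H_0]}
          \;=\; \prod_{i=1}^{n} \frac{\Pr[Y_i \mid H_1]}{\Pr[Y_i \mid H_0]}
```

The first step is the definition; the second factors the joint probability into a product, which is where the i.i.d. assumption is needed.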

23
Threshold Selection
Performance criteria:
Detection probability, P_D: the algorithm selects
H_1 when H_1 is in fact true.
False positive probability, P_F: the algorithm
selects H_1 when H_0 is in fact true.
Threshold selection: (No Transcript)
Errors: differences between actual bounds and
desired bounds.
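
The threshold formulas themselves are not in the transcript. The sketch below uses Wald's standard approximation for sequential tests, which is an assumption about what the slide showed; `alpha` (desired false positive probability), `beta` (desired detection probability), and the function name are illustrative.

```python
def wald_thresholds(alpha, beta):
    """Return (eta0, eta1), the lower and upper thresholds on the likelihood ratio.

    alpha: desired upper bound on the false positive probability P_F.
    beta:  desired lower bound on the detection probability P_D.
    """
    if not 0.0 < alpha < beta < 1.0:
        raise ValueError("need 0 < alpha < beta < 1")
    eta1 = beta / alpha                  # crossing above this selects H_1
    eta0 = (1.0 - beta) / (1.0 - alpha)  # crossing below this selects H_0
    return eta0, eta1
```

Because the test stops on the first crossing and the ratio usually overshoots the threshold slightly, the achieved P_D and P_F differ somewhat from the desired values, which is the error the slide mentions.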
24
Detection Delay Time

The number of observations N until the test
terminates.

Log likelihood ratio
Wald's equation
What is E[N]?
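
One way to answer this question numerically is sketched below. Wald's equation gives E[N] as the expected final log likelihood ratio divided by the expected per-observation increment. The function name and all four parameter names are assumptions: `theta0` and `theta1` denote the probability that a connection attempt succeeds under H_0 and H_1, and `alpha` and `beta` are the desired false positive and detection probabilities, with thresholds taken from Wald's approximation.

```python
import math


def expected_observations(alpha, beta, theta0, theta1):
    """Approximate (E[N | H_0], E[N | H_1]) using Wald's equation."""
    log_eta1 = math.log(beta / alpha)                  # log of upper threshold
    log_eta0 = math.log((1.0 - beta) / (1.0 - alpha))  # log of lower threshold
    # Per-observation log likelihood increments:
    # a failed attempt (Y_i = 1) pushes the walk up, a success (Y_i = 0) pushes it down.
    step_fail = math.log((1.0 - theta1) / (1.0 - theta0))
    step_ok = math.log(theta1 / theta0)
    # Expected increment under each hypothesis.
    drift_h0 = theta0 * step_ok + (1.0 - theta0) * step_fail
    drift_h1 = theta1 * step_ok + (1.0 - theta1) * step_fail
    # Expected final log ratio under each hypothesis, ignoring overshoot.
    n_h0 = (alpha * log_eta1 + (1.0 - alpha) * log_eta0) / drift_h0
    n_h1 = (beta * log_eta1 + (1.0 - beta) * log_eta0) / drift_h1
    return n_h0, n_h1
```

With illustrative values such as `alpha=0.01`, `beta=0.99`, `theta0=0.8`, `theta1=0.2`, both expectations come out at a handful of observations, which matches the small detection delays reported later in the deck.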
25
Outline

Existing Works
Data Analysis
Online Detection Algorithm Threshold Random Walk
Performance Evaluation
Concluding Remarks

26
Evaluation Methodology

Used the data from the two labs.
Knowledge of whether each connection is
established, rejected, or unanswered.
Maintains 3 variables for each remote host s:
D_s, the set of distinct hosts previously
connected to
S_s, the decision state (pending, H_0, or H_1)
L_s, the likelihood ratio

27
Evaluation Methodology (cont.)

For each line in the dataset:
Skip if the remote host's state is not pending.
Determine whether the connection is successful.
Check whether the destination is already in the
connection set D_s; if so, proceed to the next line.
Update D_s and L_s.
If L_s goes beyond either threshold, update the state
accordingly.
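
The loop above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the record format `(source, destination, succeeded)`, the names `HostState` and `trw`, and the default parameter values are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class HostState:
    seen: set = field(default_factory=set)  # D_s: distinct hosts contacted so far
    decision: str = "pending"               # S_s: "pending", "H0", or "H1"
    ratio: float = 1.0                      # L_s: accumulated likelihood ratio


def trw(records, theta0=0.8, theta1=0.2, eta0=0.01, eta1=99.0):
    """Run the sequential test over (source, destination, succeeded) records.

    theta0/theta1: assumed success probabilities under H_0/H_1.
    eta0/eta1: assumed lower/upper thresholds on the likelihood ratio.
    """
    hosts = {}
    for src, dst, ok in records:
        state = hosts.setdefault(src, HostState())
        if state.decision != "pending":
            continue  # a decision was already made for this remote host
        if dst in state.seen:
            continue  # only the first attempt to each destination counts
        state.seen.add(dst)
        if ok:
            state.ratio *= theta1 / theta0                  # success: ratio shrinks
        else:
            state.ratio *= (1.0 - theta1) / (1.0 - theta0)  # failure: ratio grows
        if state.ratio >= eta1:
            state.decision = "H1"  # declare scanner
        elif state.ratio <= eta0:
            state.decision = "H0"  # declare benign
    return hosts
```

With these illustrative defaults, a source whose first four distinct destinations all fail is declared a scanner, and one whose first four all succeed is declared benign; repeated attempts to the same destination do not move the ratio.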

28
Comparison with other existing intrusion
detection systems (Bro and Snort)

Efficiency  Effectiveness  N
0.963       0.040          4.08
1.000       0.008          4.06

Efficiency: 1 - false positives / true
positives
Effectiveness: false negatives / all samples
N: number of samples used (i.e., detection delay time)

29
Comparison with other existing intrusion
detection systems (Bro and Snort) (cont.)

TRW is far more effective than the other two
TRW is almost as efficient as Bro
TRW detects scanners in far less time

30
Outline

Existing Works
Data Analysis
Online Detection Algorithm Threshold Random Walk
Performance Evaluation
Concluding Remarks

31
Strengths of the paper

Good observation:
inactive_pct provides a strong modality to
differentiate benign hosts from suspicious hosts.
Sequential analysis is well-suited:
Provides mathematical bounds on the expected
performance of the algorithm (P_D, P_F, and N).
Minimizes the detection time given fixed false
alarm and misdetection rates.
Balances the tradeoff between these three
quantities (false alarm rate, misdetection rate,
detection time) effectively.

32
Limitations and Possible Improvements

Nearly circular argument between the ground truth
and the developed detection algorithm: both use
the same key observation.
Oscillation problem in the detection algorithm.
Leveraging Additional Information
Managing State
How to Respond
Evasion and Gaming
Distributed Scans

33
References

[LeckieK02] C. Leckie and R. Kotagiri. A
probabilistic approach to detecting network
scans. In Proceedings of the Eighth IEEE Network
Operations and Management Symposium (NOMS 2002),
pages 359-372, Florence, Italy, Apr. 2002.
[StanifordHM00] S. Staniford, J. A. Hoagland, and
J. M. McAlerney. Practical automated detection of
stealthy portscans. In Proceedings of the 7th ACM
Conference on Computer and Communications
Security, Athens, Greece, 2000.
XuanLong Nguyen. Sequential analysis: balancing
the tradeoff between detection accuracy and
detection delay. Presentation, Radlab, UCB,
11/06/06.