Title: Correlation Analysis of Intrusion Alerts
1. Correlation Analysis of Intrusion Alerts
- Peng Ning
- Cyber Defense Laboratory
- Department of Computer Science
- North Carolina State University
- Supported by NSF under grants ITR-0219315 and CCR-0207297, and by ARO under grant DAAD19-02-1-0219.
2. Background
- Defense of computer and network systems
- Preventive approaches
- First layer of defense
- Authentication, access control, ...
- Intrusion detection and response
- Second layer of defense
- Anomaly detection
- Misuse detection
3. Background (Cont'd)
- Traditional intrusion detection systems (IDS)
- Focus on low-level attacks or anomalies
- False negatives and false positives
- Mix actual alerts with false alerts
- Generate an unmanageable number of alerts
- For ID practitioners, encountering 10,000 to 20,000 alerts per day per sensor is common
- It is challenging to understand
- What the intrusions are
- What the intruders have done
4. Methods for Alert Correlation
- Method 1: Alert clustering
- Exploit similarities between alert attributes
- Ex. Valdes and Skinner (2001), Staniford et al. (2000)
- Cannot fully discover the causal relationships between alerts
- Method 2: Matching known attack scenarios
- Exploit known attack scenarios
- Ex. Cuppens and Ortalo (2000), Dain and Cunningham (2001), Debar and Wespi (2001)
- Restricted to known attack scenarios or those generalized from known scenarios
- Method 3: Construct attack scenarios using individual attacks
- Use prerequisites and consequences of attacks
- Ex. Cuppens and Miege (2002), Ning et al. (2002, 2003, 2004), Xu and Ning (2004, 2005, 2006)
- Aggressive alert correlation may generate false correlations
5. Outline
- Correlation analysis of IDS alerts using prerequisites and consequences of attacks
- Construction of attack scenarios from IDS alerts
- Learning attack strategies through correlated alerts
- Hypothesizing and reasoning about attacks missed by IDSs
- Privacy-preserving alert correlation
- Project team
- Dingbang Xu, Yun Cui, Jaideep Mahalati, Pai Peng,
Yiquan Hu, Alfredo Serrano
6. Correlating Alerts Using Prerequisites and Consequences of Attacks
7. A Motivating Example
- Suppose we see the following IDS alerts
- A sequence of ICMP pings
- NmapScan against one live IP
- Find the live ports
- PmapDump against the same IP
- Find the live RPC services
- ToolTalkOverflow against the same IP
- Hack into one RPC service
- What can we infer?
- Can we model this and do it automatically?
8. A Formal Framework for Alert Correlation
- Represent our knowledge about individual types of attacks
- Prerequisite: necessary condition or system state for an intrusion to be successful
- Consequence: possible outcome or system state of an intrusion
- Must be true if the intrusion succeeds
- Correlate alerts (i.e., detected attacks) by reasoning about the consequences of earlier attacks and the prerequisites of later ones
- Ex. If attack A learns the existence of a vulnerable service, and attack B exploits the same vulnerable service, then correlate A and B
9. A Formal Framework (Cont'd)
- Use predicates to represent system state or the attacker's knowledge
- A hyper-alert type T is a triple (fact, prerequisite, consequence)
- fact is a set of attribute names
- prerequisite is a logical combination of predicates whose free variables are in fact
- consequence is a set of predicates s.t. all free variables in consequence are in fact
- Example
- SadmindBufferOverflow = ({VictimIP, VictimPort}, ExistHost(VictimIP) ∧ VulnerableSadmind(VictimIP), {GainAccess(VictimIP)})
10. A Formal Framework (Cont'd)
- Given a hyper-alert type T = (fact, prerequisite, consequence), a hyper-alert (instance) h of type T is a finite set of tuples on fact, where each tuple is associated with an interval-based timestamp [begin_time, end_time]
- Allows aggregation of the same type of hyper-alerts
- Example (see the data-structure sketch below)
- A hyper-alert h of type SadmindBufferOverflow
- {(VictimIP = 152.1.19.5, VictimPort = 1235)}
- Prerequisite: ExistHost(152.1.19.5) ∧ VulnerableSadmind(152.1.19.5) must be True for the corresponding attacks to succeed
- Consequence: GainAccess(152.1.19.5) might be True, depending on the success of the corresponding attacks
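The following is an illustrative Python sketch (not the authors' TIAA implementation; all class and function names here are assumptions) of one way to encode hyper-alert types and hyper-alert instances as described on this slide:

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    Predicate = Tuple[str, ...]        # e.g. ("VulnerableSadmind", "152.1.19.5")

    @dataclass
    class HyperAlertType:
        name: str
        fact: Tuple[str, ...]                              # attribute names
        prerequisite: Callable[[Dict], List[Predicate]]    # instantiated conjuncts
        consequence: Callable[[Dict], List[Predicate]]

    @dataclass
    class FactTuple:
        values: Dict[str, str]          # attribute name -> value
        begin_time: float
        end_time: float

    @dataclass
    class HyperAlert:                   # a set of tuples allows aggregation
        type: HyperAlertType
        tuples: List[FactTuple]

    # The SadmindBufferOverflow type and instance from the slide:
    SadmindBufferOverflow = HyperAlertType(
        name="SadmindBufferOverflow",
        fact=("VictimIP", "VictimPort"),
        prerequisite=lambda v: [("ExistHost", v["VictimIP"]),
                                ("VulnerableSadmind", v["VictimIP"])],
        consequence=lambda v: [("GainAccess", v["VictimIP"])],
    )

    h = HyperAlert(SadmindBufferOverflow,
                   [FactTuple({"VictimIP": "152.1.19.5", "VictimPort": "1235"},
                              begin_time=100.0, end_time=100.0)])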
11. Correlation of Alerts
- Can we require the prerequisite of an alert to be fully satisfied before correlating it with an earlier set of alerts?
- An attacker may not always launch earlier attacks to fully prepare for later ones
- Missed detections
- Computationally expensive to check
- Our solution
- Partial match: correlate two alerts if the earlier attack may contribute to the later one
12. A Formal Framework (Cont'd)
- Given a hyper-alert type T = (fact, prerequisite, consequence),
- The prerequisite set (or consequence set) of T is the set of all predicates that appear in prerequisite (or consequence)
- Denoted as P(T) (or C(T))
- Given a hyper-alert instance h of T,
- The prerequisite set (or consequence set) of h is the set of predicates in P(T) (or C(T)) whose arguments are replaced with the corresponding attribute values of each tuple in h
- Denoted P(h) (or C(h))
- Each predicate in P(h) or C(h) inherits the timestamp of the corresponding tuple
13. A Formal Framework (Cont'd)
- Hyper-alert h1 prepares for hyper-alert h2 if there exist p ∈ P(h2) and C ⊆ C(h1) s.t.
- For all c ∈ C, c.end_time < p.begin_time, and
- The conjunction of the predicates in C implies p
- Intuition: h1 prepares for h2 if some attacks represented by h1 make some attacks represented by h2 easier to succeed
14. An Example
- A hyper-alert h1 of type SadmindPing
- C(h1) = {VulnerableSadmind(152.1.19.5), VulnerableSadmind(152.1.19.9)}
- A hyper-alert h2 of type SadmindBufferOverflow
- P(h2) = {ExistHost(152.1.19.5), VulnerableSadmind(152.1.19.5)}
- Assume all tuples in h1 have timestamps earlier than every tuple in h2
- h1 prepares for h2 (see the sketch of this test below)
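A minimal sketch of the "prepares for" test, built on the structures sketched after slide 10; the implication check is simplified to predicate equality (i.e., a single-predicate set C), so this is weaker than the full definition:

    def prerequisite_set(h):
        return [(p, t.begin_time) for t in h.tuples
                for p in h.type.prerequisite(t.values)]

    def consequence_set(h):
        return [(c, t.end_time) for t in h.tuples
                for c in h.type.consequence(t.values)]

    def prepares_for(h1, h2):
        # h1 prepares for h2 if some consequence predicate of h1, ending before a
        # prerequisite predicate of h2 begins, implies (here: equals) that predicate.
        return any(c == p and c_end < p_begin
                   for p, p_begin in prerequisite_set(h2)
                   for c, c_end in consequence_set(h1))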
15. Hyper-Alert Correlation Graph Discovered from the Inside Traffic of LLDOS 1.0
16. Learning Attack Strategies from Intrusion Alerts
17. Modeling Attack Strategies
- What is essential to an attack strategy?
- Attack steps
- Dependency between attack steps
18. Modeling Attack Strategies (Cont'd)
- Equality constraint
- Represents dependency between attack steps
- An equality constraint for two hyper-alert types (or attack types) T1 and T2 is a conjunction of equalities
- u1 = v1 ∧ … ∧ un = vn,
- where u1, …, un are attributes of T1, and v1, …, vn are attributes of T2,
- such that (informally) if a type T1 hyper-alert h1 and a type T2 hyper-alert h2 satisfy this condition and h1 occurs before h2, then h1 prepares for h2
- h1 prepares for h2 if and only if they satisfy at least one equality constraint and h1 occurs before h2 (a small sketch follows below)
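A minimal sketch of checking an equality constraint between two alerts; the attribute dictionaries, timestamps, and function names are assumed inputs, not the authors' API:

    def satisfies(a1: dict, a2: dict, constraint) -> bool:
        """constraint: list of (u_i, v_i) attribute-name pairs, meaning
        a1[u_i] == a2[v_i] for all i (the conjunction u1=v1 AND ... AND un=vn)."""
        return all(a1[u] == a2[v] for u, v in constraint)

    def prepares(a1: dict, t1: float, a2: dict, t2: float, constraints) -> bool:
        # h1 prepares for h2 iff h1 occurs earlier and at least one equality
        # constraint between their types is satisfied.
        return t1 < t2 and any(satisfies(a1, a2, c) for c in constraints)

    # Example: SadmindPing prepares for SadmindBufferOverflow on the same host.
    constraints = [[("DestIP", "DestIP")]]
    print(prepares({"DestIP": "152.1.19.5"}, 10.0,
                   {"DestIP": "152.1.19.5"}, 20.0, constraints))   # True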
19. Modeling Attack Strategies (Cont'd)
- Attack strategy graph: (N, E, T, C) over a set S of hyper-alert types
- (N, E) is a connected DAG
- For each n ∈ N, T(n) is a hyper-alert type in S
- Each node corresponds to an attack type
- For each (n1, n2) ∈ E, C(n1, n2) is a set of equality constraints for T(n1) and T(n2)
- Every pair of causally related attack steps satisfies an equality constraint in C(n1, n2)
- For any n1, n2 ∈ N, T(n1) = T(n2) implies there exists n3 in N s.t. T(n3) ≠ T(n1) and n3 is on a path between n1 and n2
- If the same attack type appears more than once, the occurrences must correspond to different steps (see the data-structure sketch below)
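A minimal Python sketch (all names are assumptions) of an attack strategy graph (N, E, T, C): a DAG whose nodes carry attack types and whose edges carry sets of equality constraints between the two types' attributes:

    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    # One equality constraint: pairs (attribute of T(n1), attribute of T(n2)).
    EqualityConstraint = List[Tuple[str, str]]

    @dataclass
    class AttackStrategyGraph:
        node_type: Dict[int, str] = field(default_factory=dict)   # T: node -> attack type
        edges: Dict[Tuple[int, int], List[EqualityConstraint]] = field(default_factory=dict)  # E with C

        def add_step(self, n: int, attack_type: str) -> None:
            self.node_type[n] = attack_type

        def add_dependency(self, n1: int, n2: int, constraints: List[EqualityConstraint]) -> None:
            self.edges[(n1, n2)] = constraints

    # Example: SadmindPing (node 1) prepares for SadmindBufferOverflow (node 2)
    # whenever their DestIP attributes are equal.
    asg = AttackStrategyGraph()
    asg.add_step(1, "SadmindPing")
    asg.add_step(2, "SadmindBufferOverflow")
    asg.add_dependency(1, 2, [[("DestIP", "DestIP")]])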
20. An Example Attack Strategy Graph
21. Learning Attack Strategy Graphs from Correlated Intrusion Alerts
- Identify attack steps
- Recognize alerts belonging to the same attack step
- Extract equality constraints
22. Identify Attack Steps
- Iterative partitioning
- Output: irreducible hyper-alert correlation graph
- Hyper-alerts of the same type are aggregated unless they are separated by hyper-alerts of other types
23. Extract Equality Constraints
- Extract equality constraints from matched predicates in the prerequisites and consequences of alerts (sketched below)
SadmindPing
C(h1) = {VulnerableSadmind(152.1.19.5)}
SadmindBufferOverflow
P(h2) = {ExistHost(152.1.19.5), VulnerableSadmind(152.1.19.5)}
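A minimal sketch of extracting an equality constraint from matched predicates: when a consequence predicate of the earlier alert equals a prerequisite predicate of the later alert, record which attributes supplied the equal argument. The input format is an assumption:

    def extract_constraint(cons_preds, prereq_preds):
        """Each element: (predicate_name, argument_value, source_attribute)."""
        constraint = set()
        for name1, val1, attr1 in cons_preds:
            for name2, val2, attr2 in prereq_preds:
                if name1 == name2 and val1 == val2:
                    constraint.add((attr1, attr2))
        return constraint

    # SadmindPing's VulnerableSadmind(152.1.19.5) matches SadmindBufferOverflow's
    # prerequisite predicate, yielding the constraint DestIP = DestIP.
    print(extract_constraint(
        [("VulnerableSadmind", "152.1.19.5", "DestIP")],
        [("ExistHost", "152.1.19.5", "DestIP"),
         ("VulnerableSadmind", "152.1.19.5", "DestIP")]))   # {('DestIP', 'DestIP')}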
24. An Example
25. Dealing with Variations of Attacks
- Generalization
- Hide unnecessary differences between different types of attacks
- Hide differences between different IDS sensors
26. Attack Strategy Extracted from LLDOS 1.0
IDS sensor: RealSecure; data set: LLDOS 1.0 inside traffic.
27. Hypothesizing and Reasoning about Attacks Missed by IDSs
28. Background
- A common problem of most existing alert correlation methods
- Cannot handle missed attacks
- Abductive correlation (Cuppens and Miege 2002)
- Hypotheses of missed attacks are guided by known attack scenarios specified in LAMBDA
29. Our Approach
- A framework to automatically hypothesize and reason about missed attacks based on knowledge about individual attacks
- Hypotheses of missed attacks
- Inference of attack attributes
- Validation of hypothesized attacks
- Consolidation of hypothesized attacks
30. Correlation with Missed Attacks
- An example hyper-alert correlation graph
- Would be split into multiple graphs if critical attacks are missed by the IDSs
31. Naïve Approach
- Integrate complementary correlation methods
- Clustering correlation methods
- Based on the similarity between alert attribute values
- May still cluster related alerts even if critical attacks are missed
- Unable to discover the causal relationships between alerts
- Causal correlation methods
- Based on prerequisites and consequences of attacks
- May discover the causal relationships between alerts
- Don't work if critical attack steps are missed
32. Naïve Approach (Cont'd)
- Put multiple attack scenarios together if the clustering correlation method says they are similar
- But
- What about the possible causal relationships between these alerts?
33. Naïve Approach (Cont'd)
- Given attack types T and T', T may prepare for T' if ...
- Informally, a type T attack may contribute to a type T' attack
- T may indirectly prepare for T' if ...
- Informally, a type T attack may indirectly contribute to a type T' attack through other intermediate attacks (see the sketch of these type-level relations below)
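A minimal sketch of the type-level relations (all names and the representation are assumptions): T may prepare for T' when some consequence predicate of T shares its predicate name with a prerequisite predicate of T', and may-indirectly-prepare-for is computed as a transitive closure over the known attack types:

    from collections import namedtuple

    # Assumed representation: each attack type has a name plus the sets of
    # predicate names in its prerequisite and consequence.
    AttackType = namedtuple("AttackType", ["name", "prereq_names", "cons_names"])

    def may_prepare_for(t1: AttackType, t2: AttackType) -> bool:
        # T may prepare for T' when some consequence predicate name of T also
        # appears among the prerequisite predicate names of T'.
        return bool(t1.cons_names & t2.prereq_names)

    def may_indirectly_prepare_for(types):
        # Naive transitive closure of may-prepare-for over the known attack types.
        reach = {(a.name, b.name) for a in types for b in types
                 if a is not b and may_prepare_for(a, b)}
        changed = True
        while changed:
            changed = False
            for a, b in list(reach):
                for c, d in list(reach):
                    if b == c and (a, d) not in reach:
                        reach.add((a, d))
                        changed = True
        return reach

    # Example: Scan -> Exploit -> DDoSDaemon implies Scan may indirectly
    # prepare for DDoSDaemon.
    scan = AttackType("Scan", set(), {"VulnerableService"})
    exploit = AttackType("Exploit", {"VulnerableService"}, {"GainAccess"})
    ddos = AttackType("DDoSDaemon", {"GainAccess"}, set())
    print(may_indirectly_prepare_for([scan, exploit, ddos]))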
34. Naïve Approach (Cont'd)
- May-indirectly-prepare-for relations can help hypothesize missed attacks
- More complete attack scenarios
35. Type Graph Guided Approach
- May-prepare-for and may-indirectly-prepare-for relations give us more opportunities
- We may use them to hypothesize about what has been missed by the IDSs
36. Type Graph Guided Approach (Cont'd)
A type graph over a set of known attacks
- Note: a type graph is computed automatically over a given set of attack types
37. A Type Graph Guided Approach (Cont'd)
Hypotheses of Missed Attacks
38. Reasoning about the Hypotheses
- How do we know these are good hypotheses?
- Equality constraint
- Represents dependency between adjacent attack steps
- An equality constraint for two hyper-alert types (or attack types) T1 and T2 is a conjunction of equalities u1 = v1 ∧ … ∧ un = vn,
- where u1, …, un are attributes of T1, and v1, …, vn are attributes of T2,
- such that if a type T1 hyper-alert h1 and a type T2 hyper-alert h2 satisfy this condition, then h1 prepares for h2
- h1 prepares for h2 if and only if they satisfy at least one equality constraint
T.DestIP = T'.DestIP ∧ T.DestPort = T'.DestPort
39. Reasoning about the Hypotheses (Cont'd)
- Indirect equality constraint
- Use indirect equality constraints to verify the hypothesized indirect causal relationships
40. Infer Attribute Values of Hypothesized Attacks
- SCAN_NMAP_TCP2
- DestIP = 10.10.10.2, DestPort = 21
- Rsh3
- DestIP = 10.10.10.2
- FTP_Glob_Expansion6
- DestIP = 10.10.10.2, DestPort = 21
- Timestamp in [SCAN_NMAP_TCP2.end_time, Rsh3.begin_time] (see the inference sketch below)
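A minimal sketch (inputs and names are assumptions) of inferring attribute values and a timestamp range for a hypothesized attack from the detected alerts around it, using the equality constraints on the type-graph edges:

    def infer_hypothesis(earlier_alert, later_alert, in_constraint, out_constraint):
        """in_constraint: (known_attr_of_earlier, hyp_attr) pairs;
        out_constraint: (hyp_attr, known_attr_of_later) pairs."""
        attrs = {}
        for known, hyp in in_constraint:
            attrs[hyp] = earlier_alert[known]
        for hyp, known in out_constraint:
            attrs.setdefault(hyp, later_alert[known])
        ts_range = (earlier_alert["end_time"], later_alert["begin_time"])
        return attrs, ts_range

    # FTP_Glob_Expansion hypothesized between SCAN_NMAP_TCP2 and Rsh3:
    attrs, ts = infer_hypothesis(
        {"DestIP": "10.10.10.2", "DestPort": 21, "end_time": 100.0},
        {"DestIP": "10.10.10.2", "begin_time": 160.0},
        [("DestIP", "DestIP"), ("DestPort", "DestPort")],
        [("DestIP", "DestIP")])
    # attrs == {"DestIP": "10.10.10.2", "DestPort": 21}; ts == (100.0, 160.0)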
41. Validating and Pruning via Raw Audit Data
(Figure: SCAN_NMAP_TCP2 → FTP_Glob_Expansion6 → Rsh3, with edge constraints DestIP = DestIP ∧ DestPort = DestPort and DestIP = DestIP)
Again: have we hypothesized the right attacks?
- Filtering conditions for hypothesized attacks (see the validation sketch below)
- Prior knowledge
- protocol = ftp (FTP_Glob_Expansion)
- Inferred attribute values
- protocol = ftp ∧ DestIP = 10.10.10.2
- Possible range of Timestamp
- protocol = ftp ∧ DestIP = 10.10.10.2 ∧ TS in [11:00 AM, 11:10 AM]
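A minimal sketch of validating a hypothesized attack against raw audit records, combining prior knowledge, inferred attributes, and the timestamp range into one filtering condition. The record field names are assumptions:

    def matches_filter(record, prior, inferred, ts_range):
        return (all(record.get(k) == v for k, v in prior.items()) and
                all(record.get(k) == v for k, v in inferred.items()) and
                ts_range[0] <= record["timestamp"] <= ts_range[1])

    def validate_hypothesis(audit_records, prior, inferred, ts_range):
        # Keep the hypothesis only if some raw traffic satisfies the filter;
        # otherwise prune it (e.g., no FTP traffic between the two alerts).
        return any(matches_filter(r, prior, inferred, ts_range)
                   for r in audit_records)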
42. An Example
- There is no FTP traffic between SCAN_NMAP_TCP2 and Rsh3.
43. Consolidate Hypothesized Attacks
- One missed attack may be hypothesized multiple times through different related alerts
- There may have been multiple instances of the missed attack, but
- This introduces complexity into the analysis
44. Consolidate Hypothesized Attacks (Cont'd)
- Consolidate two hypothesized attacks if they possibly refer to the same attack (see the sketch after this list)
- They have the same type
- Their inferred attribute values do not conflict
- The ranges of their timestamps overlap
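A minimal sketch of the consolidation test (the dictionary layout is an assumption): two hypothesized attacks may be merged when they could refer to the same missed attack:

    def can_consolidate(h1, h2) -> bool:
        same_type = h1["type"] == h2["type"]
        # Inferred attribute values do not conflict (agree wherever both are set).
        no_conflict = all(h2["attrs"].get(k, v) == v for k, v in h1["attrs"].items())
        # Timestamp ranges overlap.
        overlap = (h1["ts_range"][0] <= h2["ts_range"][1] and
                   h2["ts_range"][0] <= h1["ts_range"][1])
        return same_type and no_conflict and overlap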
47. Privacy-Preserving Alert Correlation
48. Privacy Concerns in Large-Scale Systems
- Correlation analysis of security data from different data owners
49. Privacy-Preserving Intrusion Analysis
- Sanitization of IDS alerts and correlation analysis of sanitized alerts
- Tradeoff between privacy protection and utility
- Entropy-guided sanitization
- Generalization (Xu and Ning 2005) (see the sketch after this list)
- Perturbation (Xu and Ning 2006)
- Modified correlation analysis
- Clustering analysis
- Attack scenario analysis
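A minimal sketch of one sanitization step, attribute generalization: a concrete destination IP is replaced by its enclosing network before alerts are shared, trading some correlation precision for privacy. Choosing the generalization level by entropy (Xu and Ning 2005) is not shown here, and the function name is illustrative:

    import ipaddress

    def generalize_ip(ip: str, prefix_len: int = 24) -> str:
        # Replace a concrete IP with the /prefix_len network that contains it.
        return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))

    print(generalize_ip("152.1.19.5"))       # "152.1.19.0/24"
    print(generalize_ip("152.1.19.5", 16))   # "152.1.0.0/16"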
50. Additional Information
- Project website
- http://discovery.csc.ncsu.edu/Projects/AlertCorrelation/
- Related publications can be downloaded there
- Prototype system
- TIAA: Toolkit for Intrusion Alert Analysis
- http://discovery.csc.ncsu.edu/software/correlator/
51. Thank You!