Title: Situational Awareness Analysis Tool for Aiding Discovery of Security Events and Patterns
1Situational Awareness Analysis Tool for Aiding
Discovery of Security Events and Patterns
- PI: Vipin Kumar
- Co-PIs: Jaideep Srivastava, Zhi-Li Zhang, Yongdae Kim
- University of Minnesota
2Presentation Outline
- Executive Summary
- Key Accomplishments Since June 2004
- A Novel Approach to Level II Analysis
- Analysis Framework
- Analysis Methodology
- Evaluation of the Approach
- Case Study I: Experience with the SKAION Data
- Case Study II: Experience at the University of Minnesota
- Applicability of Approach to IC Scenarios
- IC Scenario
- Assumptions and Limitations
- Relationship to other ARDA funded projects
- Project Future Plans
- Tasks, timeline, and deliverables
3Executive Summary
- Objective: Help IC network defenders identify and analyze distributed, stealthy, multi-step, novel attacks
- Innovative Claim
- a novel Level-II analysis framework/process and associated techniques for identifying distributed, stealthy, multi-step attacks
- provide attack context and sequencing of events to aid IC defenders in timely attack recognition and situation assessment
- transform a large amount of sensor data into a small set of labeled event sequences analyzable by human security analysts
- significantly reduce false alarms, and uncover correlated attacks
- Novel Ideas
- shallow analysis of voluminous network-wide sensor data to identify anchor points for in-depth follow-on analysis in a focused context
- spatial/temporal chaining analysis and event sequencing for attack context extraction and characterization
- both employ behavior-based host profiling and flow anomaly analysis
4Situational Awareness Analysis Framework
[Diagram: analysis framework]
- Level I: signature-based IDS, anomaly detector, scan detector
- Level II: anchor point identification, attack context extraction, attack characterization, situation assessment
- Behavior Profiling: host/service profiling, flow anomaly analysis, attack profiling
5Key Accomplishments
- Developed a novel Level I and Level II analysis framework and algorithms
- Behavior anomaly detection for identifying hard-to-detect malicious activity in IC networks
- Profiling of network traffic along multiple dimensions to characterize normal/abnormal behavior, enabling improved Level I and Level II analysis
- Intelligent fusion of multiple sensor data for high-confidence attack recognition (e.g. signature based IDS, scan detection, anomaly detection)
- Spatio-temporal chaining analysis in the communication graph to extract the larger context of a suspicious activity
- Event sequencing and labeling for attack characterization
- Demonstrated success in detecting multi-step attack scenarios in Skaion's dataset, generated specifically for ARDA's P2INGS program
- Skaion attack scenarios are detected with a low false alarm rate
- Demonstrated success on real world network data at the University of Minnesota and at the ARL Center for Intrusion Monitoring and Protection, where data is analyzed from multiple DoD sites
6Research Objective and Key Assumptions
- Objective: Help IC network defenders identify and analyze distributed, stealthy, multi-step, novel attacks
- Key assumptions
- Attacks on IC networks
- Unlike common Internet attacks such as worms and (distributed) denial-of-service attacks, which generate a large volume of data and take place in a short time
- Likely to occur in multiple stages spread out in time, involving several outside hosts (and perhaps compromised inside hosts), and generating low-intensity traffic
- Attackers want to break into protected hosts for access to sensitive data
- Attack events exhibit anomalous behaviors deviating from normal host/service profiles
- Attackers/victims are connected by suspicious communication activities
7Level II Analysis Methodology
- Anchor Point Identification
- Identifying the starting point for attack analysis via data fusion and correlation of output from Level I analysis
- Context Extraction
- Identifying relevant events and entities (hosts, flows, ...), starting from an anchor point
- Attack Characterization
- Refinement of context to characterize attacks (presently manual)
- Situation Assessment
- Evaluation of attack characterizations (out of scope)
[Diagram: Anchor Point Identification -> Context Extraction -> Attack Characterization -> Situation Assessment]
Motivated by challenges faced while working on
several cases with Angelo Bencivenga and Tim
Dunn at the ARL CIMP
8Level-II Analysis Process Diagram
[Diagram: Level-II analysis process]
- Control: configuration/selection of analysis strategies; search size, depth, time frame; labeling/scoring rules
- Data flow: IDS sensor data -> Anchor Point Identification -> list of anchor points -> Context Extraction -> event activity graph -> Attack Characterization -> labeled attack sequences -> Situational Assessment
- Algorithms/Techniques: behavior anomaly analysis; profile based chaining analysis; temporal sequencing analysis; domain specific guided search; correlation/fusion of multiple sensor data; knowledge based event labeling; watchlist/blacklist; attack pattern matching
9Behavior Profiling and Anomaly Analysis
[Diagram: historical behavior profiling -> current-time anomaly analysis -> scoring, at two levels]
- Network-Wide
- Historical behavior profiling: Service Profiling (service type: web, dns, ...; protocol: TCP, UDP, ICMP, ...; connection patterns; flow statistics)
- Current-time anomaly analysis: Flow Anomaly Analysis (MINDS flow anomaly ranking; signature-based alerts; TCP flag analysis, ...)
- Scoring: flow anomaly scores
- Host-Specific
- Historical behavior profiling: Host Profiling (host types: servers, clients, etc.; port/service profiles; traffic statistics; communication patterns)
- Current-time anomaly analysis: Flow Anomaly Analysis (deviation from normal host behaviors)
- Scoring: host anomaly scores
- Techniques: clustering, outlier analysis, association rules, signature-based rule matching, statistical profiling, statistical deviation analysis, link analysis
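As a rough illustration of the "statistical profiling" and "statistical deviation analysis" entries, the sketch below builds a per-host profile from historical measurements and scores current behavior by its largest z-score. This is a deliberately simple stand-in, not the MINDS anomaly ranking; the metric names (`flows_per_hour`, `distinct_peers`) and the function names are hypothetical.

```python
import statistics


def build_profile(history):
    """Summarize each historical metric as (mean, population std deviation)."""
    return {
        metric: (statistics.mean(values), statistics.pstdev(values))
        for metric, values in history.items()
    }


def host_anomaly_score(profile, current):
    """Return the largest absolute z-score across the metrics in `current`."""
    score = 0.0
    for metric, value in current.items():
        mean, std = profile[metric]
        if std > 0:
            z = abs(value - mean) / std
        else:
            # No historical variation: any change at all is maximally surprising.
            z = 0.0 if value == mean else float("inf")
        score = max(score, z)
    return score


# Hypothetical history for one host (assumed hourly measurements).
history = {
    "flows_per_hour": [10, 12, 11, 9, 13],
    "distinct_peers": [3, 4, 3, 5, 4],
}
profile = build_profile(history)

usual = host_anomaly_score(profile, {"flows_per_hour": 11, "distinct_peers": 4})
unusual = host_anomaly_score(profile, {"flows_per_hour": 12, "distinct_peers": 40})
print(f"usual={usual:.2f} unusual={unusual:.2f}")
```

Taking the maximum over metrics means a host stands out if it deviates on any single dimension; summing or weighting the z-scores would be an equally plausible choice.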
10Anchor Point Identification Techniques
- Watch list maintained by analyst
- Hosts that engage in suspicious activity as identified by one or more of the following
- Standard IDS signature (Snort alarms)
- Behavior Anomalies
- Hosts that send/receive traffic that is anomalous w.r.t. historical profile
- Behavior Signatures
- A host that communicates with a known compromised machine
- Hosts that perform scans
- Port knocking
- Services (e.g., ftp, ssh) running on non-standard ports
- Any other identifiable behavior of a known compromised machine
[Diagram: example anchor points: anomalous flow; communication to a host on watchlist; watchlist host; SNORT alert on host with anomalous behavior]
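A minimal sketch of how these anchor-point criteria might be fused, assuming each host already has Level I evidence attached. The `HostEvidence` fields, the `top_k` rank cutoff, and the example addresses (taken from reserved documentation ranges) are all hypothetical; only a subset of the listed criteria is covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class HostEvidence:
    """Level I evidence for one host (illustrative fields, not a real schema)."""
    snort_alerts: int = 0                 # count of signature-based IDS alerts
    anomaly_rank: Optional[int] = None    # 1 = most anomalous; None = unranked
    peers: Set[str] = field(default_factory=set)  # hosts it communicated with


def find_anchor_points(evidence: Dict[str, HostEvidence],
                       watchlist: Set[str],
                       top_k: int = 50) -> Dict[str, List[str]]:
    """Return {host: reasons} for hosts that satisfy at least one criterion."""
    anchors: Dict[str, List[str]] = {}
    for host, ev in evidence.items():
        anomalous = ev.anomaly_rank is not None and ev.anomaly_rank <= top_k
        reasons = []
        if host in watchlist:
            reasons.append("watchlist host")
        if ev.peers & watchlist:
            reasons.append("communicates with watchlist host")
        if ev.snort_alerts > 0 and anomalous:
            reasons.append("snort alert on anomalous host")
        if reasons:
            anchors[host] = reasons
    return anchors


evidence = {
    "192.0.2.10": HostEvidence(snort_alerts=3, anomaly_rank=12),
    "192.0.2.11": HostEvidence(snort_alerts=5),           # alerts but not anomalous
    "192.0.2.12": HostEvidence(peers={"203.0.113.7"}),    # talks to watchlist host
    "192.0.2.13": HostEvidence(anomaly_rank=400),         # ranked too low
}
anchors = find_anchor_points(evidence, watchlist={"203.0.113.7"})
for host, reasons in sorted(anchors.items()):
    print(host, reasons)
```

Requiring a Snort alert and an anomaly to coincide is what keeps the anchor list small: either signal alone is not enough in this sketch.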
11Attack Context Extraction
- Starting from an anchor point, recursively examine activity to other hosts that
- deviates from norm
- host's profile
- service/port profile
- is similar to known suspicious traffic
- attack signatures
- replies to scans
- activities from compromised hosts
[Diagram: anchor point linked to a web server and other hosts through a remote login attempt, outbound FTP, and a reply to a scan]
Example pattern ruleset (terminal services):
ruleset terminal_services
  ignore srcport < 1024
  ignore packets < 4
  ignore dstport ! 3389
  ignore protocol ! tcp
  profile client_services server dstport 3389 protocol tcp
  profile servers client dstport 3389 protocol tcp
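The recursive examination described on this slide can be sketched as a depth-limited breadth-first walk over the communication graph that follows only suspicious flows. The flow tuple layout, the `is_suspicious` predicate, and the `max_depth` control are assumptions made for illustration; the slides do not specify the actual search strategy.

```python
from collections import deque


def extract_context(anchor, flows, is_suspicious, max_depth=2):
    """Collect suspicious flows reachable from `anchor` within `max_depth` hops.

    flows: list of (src, dst, description) tuples.
    is_suspicious: predicate deciding whether a flow is worth following.
    """
    # Index flows by both endpoints so the walk can move in either direction.
    by_host = {}
    for flow in flows:
        src, dst, _ = flow
        by_host.setdefault(src, []).append(flow)
        by_host.setdefault(dst, []).append(flow)

    visited = {anchor}
    seen_flows = set()
    context = []
    queue = deque([(anchor, 0)])
    while queue:
        host, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the configured search depth
        for flow in by_host.get(host, []):
            if flow in seen_flows or not is_suspicious(flow):
                continue
            seen_flows.add(flow)
            context.append(flow)
            src, dst, _ = flow
            other = dst if src == host else src
            if other not in visited:
                visited.add(other)
                queue.append((other, depth + 1))
    return context


SUSPICIOUS = {"reply to scan", "remote login attempt", "outbound ftp"}
flows = [
    ("E1", "I1", "reply to scan"),
    ("E2", "I1", "remote login attempt"),
    ("I1", "E3", "outbound ftp"),
    ("I1", "I2", "normal web"),
    ("E3", "I3", "remote login attempt"),
    ("I2", "I4", "outbound ftp"),
]
context = extract_context("I1", flows, lambda f: f[2] in SUSPICIOUS)
for src, dst, what in context:
    print(f"{src} -> {dst}: {what}")
```

Because the walk never crosses the normal I1-I2 flow, the I2-I4 transfer stays out of the context even though it looks suspicious on its own; pruning by profile is what keeps the extracted graph focused.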
12Attack Characterization
- Determine likely relationships (e.g. sequencing) between retained events and hosts
- Evaluate and rank hosts and activities in the attack context to
- Retain those with high degree of suspicion and prune those with low degree of suspicion
[Diagram: a context graph with external hosts E1-E4 and internal hosts I1-I4]
- Sample Rules
- If a host is scanning: label it as attacker with a low score
- If a host is scanned and it replies: label it as victim and give it a medium score
- If an internal host is scanning: label it hacked with a high score
- If a hacked internal host makes a subsequent file transfer to outside: increase the score of the hacked label and label the target host as attacker with a high score
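The four sample rules can be expressed as a small rule pass over time-ordered events. This is only a sketch: the numeric values behind low/medium/high, the event dictionary keys, and the choice to bump the hacked score by one are assumptions, since the slides give the rules only in words.

```python
LOW, MEDIUM, HIGH = 1, 2, 3  # assumed numeric stand-ins for low/medium/high


def label_hosts(events, internal_hosts):
    """Apply the sample labeling rules to time-ordered events.

    Each event is a dict with 'type' ('scan' or 'file_transfer'), 'src', 'dst',
    and, for scans, an optional boolean 'replied'.
    Returns {host: (label, score)}.
    """
    labels = {}
    for ev in events:
        src, dst = ev["src"], ev["dst"]
        if ev["type"] == "scan":
            if src in internal_hosts:
                # Rule 3: an internal host that scans is labeled hacked (high).
                labels[src] = ("hacked", HIGH)
            else:
                # Rule 1: a scanning host is an attacker (low).
                labels.setdefault(src, ("attacker", LOW))
            if ev.get("replied"):
                # Rule 2: a scanned host that replies is a victim (medium).
                labels.setdefault(dst, ("victim", MEDIUM))
        elif ev["type"] == "file_transfer":
            label, score = labels.get(src, (None, 0))
            if label == "hacked" and dst not in internal_hosts:
                # Rule 4: outbound transfer from a hacked host raises its score
                # and marks the external target as an attacker (high).
                labels[src] = ("hacked", score + 1)
                labels[dst] = ("attacker", HIGH)
    return labels


events = [
    {"type": "scan", "src": "E1", "dst": "I1", "replied": True},
    {"type": "scan", "src": "I1", "dst": "I2"},
    {"type": "file_transfer", "src": "I1", "dst": "E2"},
]
labels = label_hosts(events, internal_hosts={"I1", "I2"})
for host, (label, score) in sorted(labels.items()):
    print(host, label, score)
```

Processing events in time order matters here: the file transfer only raises suspicion because the scan from I1 was seen first.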
- Attack Characterization: Event Sequencing and Labeling (in time order)
- E1 -> I1: Scan with replies
- E2 -> I4: Initializing connection on non-standard port - Successful
- I4 -> E4: Initializing ftp connection with external host
13Accomplishments since Nov. 17, 2004 site visit
- Refined Level II Analysis Process
- Investigated and improved anchor point analysis
- Spatio-temporal chaining analysis for context extraction
- Event sequencing and labeling for attack characterization
- Evaluation using Skaion Dataset II and real network data from the University of Minnesota
14Skaion Dataset
- Two sets of synthetic data, generated by the Skaion Corporation to simulate IC network traffic and attacks
- Traffic generated to statistically match data captured at AFRL
- Traffic contains background (normal) traffic, as well as various scans and failed attacks
- Background traffic is combined with multi-step/multi-stage attacks to produce each scenario
- Data Set I: 4 scenarios
- A. Naïve Attacker; B. Five-by-five
- C. Ten-by-ten; D. Simple-ten
- Data Set II: 3 categories
- Single-Stage Attacks: 8 scenarios
- Bankshot Multi-stage Attacks: 5 scenarios
- Misdirection Multi-stage Attacks: 3 scenarios
- Each scenario includes tcpdump data of all network traffic as well as Snort alerts, HTTP access logs, FTP transfer logs, and Windows logs
15Scenario II.C.1 S29 Misdirection Multistage Attack
- Anchor Point Identification
- SNORT alerts involving anomalous IPs
- Statistics
- Trunk: 103,791 total packets; 10,859 total flows; 451 Snort alerts
- Bprd: 73,595 total packets; 6,987 total flows
- Colo: 98,858 total packets; 6,002 total flows
[Diagram: external hosts 18.2.175.153, 40.159.214.124 and 53.82.21.112; REMOTE OSIS USERS; BPRD web-server 100.10.20.4; Scanner. Annotation: Anomaly Rank 47, failed connection on port 22]
16Scenario II.C.1 S29 Misdirection Multistage Attack
- Context Extraction
- Activity that deviates from host's normal profile
- Scans that get replies
[Diagram: external hosts 18.2.175.153, 116.45.223.116, 40.159.214.124, 74.205.114.175, 40.219.61.25 and 53.82.21.112; REMOTE OSIS USERS host 100.1.21.134; web-servers 100.10.20.10 and 100.10.20.4 (BPRD)]
- Diagram annotations
- Attempts remote login
- Scans and gets a reply
- Anomaly Rank 47
- Web-server initiating connection on port 8080 (Anomaly Ranks 1, 2, 4, 11)
- Web-server initiating FTP connections
- Remote login on the web-server; this follows an undetected iis50_nsiislog attack
17Scenario II.C.1 S29 Misdirection Multistage Attack
- Attack Characterization: Event Sequencing and Labeling (in time order)
- E4, E5, E6 -> I2: Bad HTTP traffic
- E2 -> I2: Scanning with a reply
- E3, E6 -> I2: Remote login - failed
- D1 -> I1: Remote login - successful
- I1 -> E1: Anomalous FTP
- I1 -> E2: Anomalous transfer on port 8080
- Summary: Dial-up host D1 hacks into web server I1 via remote login, and initiates anomalous file transfers from I1 to two outside hosts, E1 and E2, where E2 earlier performed scanning
[Diagram: external hosts E1-E6; D1 (REMOTE OSIS USERS); web-server I1; BPRD web-server I2; Scanner; X marks failed attempts]
18Scenario II.B.1 S1 Bankshot Multistage Attack
- Anchor Point Identification: SNORT alerts involving anomalous IPs
- Context Extraction
- Activity that deviates from host's normal profile
- Scans that get replies
- Statistics
- Trunk: 986,494 total packets; 44,994 total flows; 10,896 Snort alerts
- Bprd: 305,598 total packets; 19,111 total flows
- Colo: 960,676 total packets; 27,045 total flows
[Diagram: external hosts 51.91.57.157 and 112.50.254.117; SHIELD ENCLAVE host 100.20.200.15 /100.20.1.3; BPRD web-server 100.10.20.4 and mail server 100.10.20.3; Scanner]
- Diagram annotations
- Anomaly Rank 194: Ftp connection to the web-server
- Successful remote login from the external host
- Failed access to web-server on port 111
- Initializing connection with mail server on port 5617 (Anomaly Rank 1, 2, 5)
19Scenario II.B.1 S1 Bankshot Multistage Attack
- Attack Characterization: Event Sequencing and Labeling (in time order)
- E2 -> I2: Scanning with a reply
- E2 -> I2: Failed ftp attempt to web-server
- E1 -> S1: Scanning with a reply
- E1 -> S1: Remote login - successful
- S1 -> I1, I2: Scanning with replies
- S1 -> I2: Failed connection to web-server on non-standard port
- S1 -> I1: Successful connection to mail server on port 5617
- Summary: External host E1 scans and hacks internal host S1, which scans the BPRD network and hacks mail server I1
[Diagram: external hosts E1 and E2; S1 (SHIELD ENCLAVE); BPRD web-server I2 and mail server I1; Scanner; X marks failed attempts]
20Scenario I.C Ten by Ten
- Statistics
- 292,272 total packets
- 16,663 total flows
- 98.7% TCP
- 54 Snort Alerts
[Diagram: EXTERNAL and INTERNAL networks, BPRD, Scanner]
21Scenario I.C Ten by Ten
- Anchor Point Identification: SNORT alerts involving anomalous IPs
[Diagram: external hosts 192.168.222.2 and 199.227.249.246, with two events numbered 1 and 2 marked B/O; internal (BPRD) hosts listed below]
- 100.10.20.10 (web-server): Anomaly rank 65, anomalous file transfer
- 100.10.20.6: Anomaly rank 42
- 100.10.20.5: Anomaly rank 64
- 100.10.20.4 (web-server): Anomaly rank 12, non-standard port access
22Scenario I.C Ten by Ten: 1st set of anchor points
- Context Extraction
- Activity that deviates from host's normal profile
- Scans that get replies
[Diagram: external hosts 220.237.152.116, 40.219.61.25 and 199.227.249.246; internal (BPRD) hosts 100.10.20.10 and 100.10.20.4]
- Diagram annotations
- Unsuccessful non-standard port access
- Anomalous file transfer
- 100.10.20.10: Anomaly rank 65, anomalous file transfer initiated by web-server
- 100.10.20.4: Anomaly rank 12, non-standard port access
23Scenario I.C Ten by Ten: 1st set of anchor points
- Attack Characterization: Event Sequencing and Labeling (in time order)
- E1 -> I2: Initializing connection on non-standard port - Failed
- E2 -> I2: Initializing connection on non-standard port - Failed
- E2 -> I1: Initializing connection on non-standard port - Successful
- I1 -> E3: Initializing ftp connection with external host
- Summary: External host E2 hacks internal host I1, which subsequently does a file transfer with external host E3. E2 also attempts an unsuccessful attack on I2.
[Diagram: external hosts E1, E2 and E3; BPRD web-servers I1 and I2; X marks failed attempts]
24Scenario I.C Ten by Ten: 2nd set of anchor points
- Context Extraction
- Activity that deviates from host's normal profile
- Scans that get replies
- Failed attempts by external hosts to connect to internal machines on non-standard or closed ports
[Diagram: external hosts 206.131.61.250, 95.116.204.23, 208.241.45.204, 221.23.248.251, 192.168.222.2, 210.20.5.160 and 161.122.144.247; internal hosts 100.10.20.6 (Anomaly rank 42), 100.0.1.2, 100.10.20.5 (Anomaly rank 64) and 100.20.10.2; BPRD]
25Scenario I.C Ten by Ten: 2nd set of anchor points
- Attack Characterization: Event Sequencing and Labeling (in time order)
- E1-E7 -> I1-I4: Initializing connection on non-standard/closed port - Failed
[Diagram: external hosts E1-E7; internal hosts I1-I4; BPRD; Scanner; X marks failed attempts]
26Case Study II Experience with Minnesota Data
- Approach: Starting with a good set of anchor points of known bad computers, analyze their communication patterns and the communication patterns of those they talk to, to identify other compromised computers
- Anchor Points: A blacklist of 370 Master (C&C) machines, constructed by security analysts around the world, was used as the starting point
27University of Minnesota
[Diagram: U of MN network and the list of 370 blacklisted computers]
- One internal computer talking to 3 blacklisted IPs (17 flows)
- Internal IP was found to be talking to 35 external IPs on port 6667
- Internal IP was found to be talking to 2 of the newly found masters and 9 new external IPs on port 6667
- Example host names shown: kissing-sadam.allxtremenet.net, deleted.important.us-govt.info, not.really.a.whiteangel.info, whats.up.buttface.net, dont.i.know.y-ou.com, irc.acidillusion.net
28More Intelligent Approach
- The 1st attempt was good, growing the blacklist by 12, but can we do better?
- Removed the requirement of only looking for communication on port 6667 TCP
- Added simple historic profiling to keep good IPs from being blacklisted
- Identified 54 new command and control machines with no false alarms
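One way the blacklist-growing idea could look in code: find internal hosts that talk to blacklisted machines, collect the other external IPs those hosts contact on any port, and discard IPs that already appear in a historical profile of known-good peers. The `min_suspects` threshold, the flow layout, and the example addresses are assumptions added for illustration; the slides do not state the actual selection criterion.

```python
from collections import defaultdict


def expand_blacklist(flows, blacklist, internal, historical_peers, min_suspects=2):
    """Propose new blacklist candidates from (src, dst) flow pairs."""
    # Step 1: internal hosts that exchanged traffic with a blacklisted IP.
    suspects = set()
    for src, dst in flows:
        if src in internal and dst in blacklist:
            suspects.add(src)
        if dst in internal and src in blacklist:
            suspects.add(dst)

    # Step 2: other external IPs contacted by those suspects, on any port.
    contacted_by = defaultdict(set)
    for src, dst in flows:
        for inside, outside in ((src, dst), (dst, src)):
            if (inside in suspects and outside not in internal
                    and outside not in blacklist):
                contacted_by[outside].add(inside)

    # Step 3: historic profiling drops IPs seen in known-good past traffic.
    return {
        ip for ip, hosts in contacted_by.items()
        if len(hosts) >= min_suspects and ip not in historical_peers
    }


internal = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
blacklist = {"203.0.113.1"}
historical_peers = {"198.51.100.80"}   # e.g. a popular, long-used service
flows = [
    ("10.0.0.1", "203.0.113.1"),
    ("10.0.0.2", "203.0.113.1"),
    ("10.0.0.1", "198.51.100.5"),
    ("10.0.0.2", "198.51.100.5"),
    ("10.0.0.1", "198.51.100.80"),
    ("10.0.0.2", "198.51.100.80"),
    ("10.0.0.1", "198.51.100.77"),
    ("10.0.0.3", "198.51.100.9"),
]
candidates = expand_blacklist(flows, blacklist, internal, historical_peers)
print(sorted(candidates))
```

The historical filter is what removes the shared, well-known peer: both suspects talk to it, but it was already part of normal traffic before the compromise.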
29A little manual digging into the 54 new Command and Control machines
- Upon further inspection, 30 of the 54 C&C machines had 2000-5000 machines throughout the world connected to them at the time of investigation
- Some of the more interesting computer names found
- 66.90.85.148 phear.my.penix.info
- 66.90.124.134 dont.i.know.y-ou.com
- 66.90.124.141 irc.acidillusion.net
- 67.111.204.243 whats.up.buttface.net
- 69.64.51.192 192.electricstorm.co.uk
- 208.51.90.83 not.really.a.whiteangel.info
- 208.179.57.115 deleted.important.us-govt.info
- 208.179.62.246 kissing-sadam.allxtremenet.net
30Summary Lessons from Case Studies
- When a compromise does occur, quick understanding of the scope of the problem is crucial for IC network defenders
- Our analysis methodology is effective at quickly identifying which computers are compromised on synthetic, university and military networks
- shown good promise on the Skaion data
- helped security analysts identify compromised machines in public networks (UMN)
- proved effective on real military networks (ARL CIMP)
- Behavior anomaly detection is an effective way to detect novel, sophisticated attacks
31 Applicability of Approach to IC Scenarios
- Threat Model: multi-step, stealthy attacks generating suspicious/anomalous activities
- Rationale: our analysis methodology is likely to perform better on IC networks than in a general Internet environment
- traffic is relatively cleaner and more regulated
- number of (outside) hosts an IC computer talks to is likely to be far fewer than for a typical host in a university setting
- easier to build reliable behavior profiles and communication patterns
32Limitations/Vulnerabilities and Mitigations
- Limitations/Vulnerabilities
- Must be able to find an anchor point
- either from anomalies, signatures, scan detection, host based IDS, etc.
- Some steps or aspects of malicious activities must deviate from normal behavior
- Mitigations
- include more diverse sensor data
- develop more intelligent rules for anchor point identification
- develop more sophisticated behavior profiling techniques
- develop more efficient context extraction and attack characterization that can explore a larger search space
33Relationship to Other ARDA Projects (Based on June 2004 PI meeting)
[Diagram: MINDS at the center, linked to other projects; the pairing of each statement with a project is not recoverable from the extracted text]
- Projects shown: Veridean, CMU, Lockheed Martin; Secure Decision; Dartmouth; UTAH; Alions Buffalo; Nong Ye, Arizona SU; D-Force IET; GDAIS Dartmouth; Valdes SRI
- Relationships shown
- MINDS output can be input to CAPS
- Hidden Markov Model could help attack characterization
- Game Theory could help anchor point identification
- MINDS Level I and II analysis can be more effective with visualization
- MINDS output can be input to ECCARS correlator
- MINDS level II analysis can simplify attack graph extraction
- Attack profiling can be used to guide MINDS level II analysis
- And/Or analysis might help anchor point identification
- Correlation analysis might help anchor point identification
- MINDS level II analysis can simplify attack scenario extraction (shown for two projects)
- MINDS anomalies (alarms) can be correlated with other alarms
- SRI's correlation analysis can be used for anchor point identification
- Bayesian analysis can be used for MINDS level II analysis
34Future Plans
- Long-Term Goal: Integrated Situational Awareness Framework and Tools to aid IC defenders in effective decision making
- Where we are in November 2004
- Developed a novel SA analysis framework and its key components and algorithms
- Where do we expect to be by March 2005
- Tasks and deliverables
- Where do we want to go beyond March 2005
- Future capabilities
35Near-Term Action Plan (March 2005)
- Tasks
- Implementation and refinement of Level II SA analysis methodology and algorithms
- Implementation and refinement of network behavior profiling
- Deliverables
- prototype system incorporating key components of Level I and II analysis with anchor point and context extraction steps
- documentation of design and implementation
- documentation of testing, evaluation and case studies
36Plan Beyond March 2005
- Improvement and extensions of Level II SA analysis methodology and algorithms
- anchor point identification using more diverse sensor data
- context refinement using link analysis and association rules
- attack characterization using advanced models from other projects
- semi-automated situation assessment techniques
- Continual and real-time profiling and profile databases
- multi-dimensional, information-theoretical structural models for normal/suspicious network (host, flow, service, etc.) behaviors
- attack and attacker profiling (worm/scanning activities, moles/drones/masters, etc.)
- query-able profile databases
- Integration of various pieces of proposed SA analysis framework
- in particular, interoperability with other P2INGS systems
- Multi-tiered, cooperative, global situational awareness analysis framework
37Future Plans
- Eric's suggestions: tasks for the next 12 months.
- Create a query language an analyst can use in the course of doing 2nd level analysis to look for high level patterns. Some example patterns are like the ones for finding terminal services. Although the algorithms used in this may not be very novel or ground breaking, such a tool does not exist at this time, and would make an analyst many times more effective. Right now, all pattern matching is effectively done in a person's head, and chugging through the data takes hours to look for a simple pattern on only a few hours' worth of data. Having a fast and flexible query language will allow an analyst to look for many patterns quickly and efficiently.
- Developing better techniques for profiling of behaviors will be an ongoing task. Having a better profile will allow for fewer false alarms, and less data for a human to look at. This will also allow a "lower" threshold or "looser" rules/patterns to be employed since there is a greater confidence in the profiles and thus in the initial anchor points.
- Developing a scoring mechanism for how important an anchor point is will allow an analyst to look at more interesting subgraphs first.
- Also, always use top .5 anomalies intersected with Snort alerts for anchor points or first 5 intersects, whichever is more. This will ensure the analyst always has some anchor points to explore.
- A weighting mechanism should also have a feedback of sorts. If an alert is presented to an analyst and it is found useful, the tools used to create it should be given a higher weight. If Snort combined with MINDS finds interesting things 25% of the time, while Snort combined with jids only finds things 5% of the time, the latter combination should be given a lower weight.
- Payload information should be considered. One way would be to use a histogram of the payload to profile if it is http, ftp, email, binary data, or encrypted. This can be used both to determine if plain text is going over an ssh port, and if the traffic is different. If suddenly the traffic over ssh looks binary and not encrypted, this is interesting (hypothesis: binary data, although using all 256 byte values, might not be uniform; encrypted data should be uniform).
- Profiles should be made between frequent talkers to determine if their communication varies. Right now if two computers talk a lot, their conversations are assumed to be safe. Although generally fine, if someone compromises one or both of the computers they could hide their communication on the same service. If two mail servers that typically talk are compromised, either the IPs or the IPs on port 25 are assumed to be safe. However, one could have replaced sendmail on one of them to also allow it to provide a login. This type of traffic would look different than typical mail traffic.
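The payload-histogram suggestion can be tried out with a byte-frequency histogram and its Shannon entropy, which is one common way to test the "encrypted data should be uniform" hypothesis. This is an exploratory sketch rather than a validated classifier: the 7.5 bits-per-byte threshold and the function names are assumptions, and short payloads cannot reach high entropy regardless of content.

```python
import math
import os
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)  # histogram over the 256 possible byte values
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def classify_payload(payload: bytes, uniform_threshold: float = 7.5) -> str:
    """Label a payload 'near-uniform' (encrypted-looking) or 'structured'."""
    if byte_entropy(payload) >= uniform_threshold:
        return "near-uniform"
    return "structured"


# Plain-text protocol traffic uses a small part of the byte range.
text = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n" * 100
# Random bytes stand in for encrypted traffic in this illustration.
random_like = os.urandom(64 * 1024)

print("text:", round(byte_entropy(text), 2), classify_payload(text))
print("random:", round(byte_entropy(random_like), 2), classify_payload(random_like))
```

Under this sketch, plain text seen on an ssh port would show up as "structured" where "near-uniform" is expected, which is the mismatch the suggestion wants to surface.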
38Relationship to Other ARDA Funded Projects (Based on June 2004 PI meeting)
[Diagram: projects grouped around MINDS under the headings Correlation and Visualization]
- GDAIS: Corr, Assess, History
- SRI: IDS/FW Corr, entity centric
- Secure Decision: Visual Display, human factor
- IET (D-Force): IDS Corr, BN, ...
- Veridean: Prediction
- Utah: Visual Display, human factor
- Skaion: Traffic Generation
- MINDS
- Endeavor: Automatic response, addr/port map
- Northrop: Network Data Mining
- ASU (Nong Ye): Cyber signal analysis
- Arbor networks: Macro/micro sensors, track/forensics, active honeypot