Title: Process Detection
1Process Detection
- George Cybenko
- Dartmouth
- gvc_at_dartmouth.edu
2Acknowledgements
Current Members: George Bakos, Alex Barsamian, Marion Bates, Vincent Berk, Chad Behre, Wayne Chung, Valentino Crespi (Prof. Cal State LA), George Cybenko, Ian deSouza, Annarita Giani, Doug Madory, Glenn Nofsinger, Robert Savell, Jan-Peter Schutt, Yong Sheng, William Stearns
Alumni: Naomi Fox (UMass, Ph.D. student), Hrithik Govardhan (Rocket), Robert Gray (BAE Systems), Diego Hernando (UIUC, Ph.D. student), Guofei Jiang (NEC Research), Alex Jordan (BAE Systems), Han Li (China Shipping Corp), Josh Peteet (Greylock Partners), Chris Roblee (LLNL)
graduate students
Research Support: DHS, ARDA, AFOSR, NGA, DARPA
Cybenko
3Overview of Lectures
- Process modeling
- Process detection, theory
- Software and applications
4Why be interested in this....
- Sensor networks
- Airborne plume detection
- Cyber security
- Autonomic server pool management
- Dynamics of social networks
- Genomics and biological pathways
- Human situation awareness
- Possible applications.
5Overview
- Lecture 1 Process models
- Notion of "state"
- Differential equations
- State Machines and Automata
- Probabilistic and quantum states
- Constructing state representations
- Some
6Newton's Big Idea(s)
- Calculus
- Laws of Physics
- Concept of "state"
Isaac Newton
7Contrast with Aristotle
Nature consists of objects and rules
Examples
- Ancient law (religious and civil)
- Astronomical observations
- Superstition
Crisis - could not explain the natural world
8A Closer Look at F=ma
9A Closer Look at F=ma
10A Closer Look at F=ma
Previous state
Next state
Input
Dynamics
11A Closer Look at F=ma
Concept of state: the future evolution of the system depends only on the current state and future inputs. IE, the past's influence on the future is totally summarized by the state. The next state is determined by the current state and the current input (or control, etc).
[Diagram labels: states si, sm, sn; inputs ua, ub]
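The "next state from current state and current input" idea can be sketched for F=ma as follows. This is only an illustration: it assumes a point mass moving in one dimension and a simple explicit Euler time step, and the names `step`, `simulate`, `mass` and `dt` are made up for this sketch rather than taken from the slides.

```python
def step(state, force, mass=1.0, dt=0.1):
    """Return the next state from the current state and the current input.

    state is (position, momentum); force is the input u.
    Explicit Euler integration is an assumption made for illustration.
    """
    position, momentum = state
    next_position = position + dt * momentum / mass
    next_momentum = momentum + dt * force
    return (next_position, next_momentum)


def simulate(initial_state, forces, mass=1.0, dt=0.1):
    """Apply step() once per input; the past enters only through the state."""
    state = initial_state
    trajectory = [state]
    for force in forces:
        state = step(state, force, mass, dt)
        trajectory.append(state)
    return trajectory
```

Note that `simulate` never looks at earlier entries of `trajectory`: everything the future needs is carried in `state`.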
12Outputs/Observables
Black Box: States may not be observable by an external agent
- Inputs, u: Forces
- State, x: (Position, Momentum)
- Outputs, y: Position only
13Automaton
Alan Turing
14Graphical Depiction of Automata
[State diagram: states a, b, c, d with a marked Start State; transitions labeled with inputs u, v; outputs 0 or 1]
Q = States = {a, b, c, d}, X = {u, v}, Y = {0, 1}; δ and β shown in graph
15Caution/Nuisance
- Some models of automata have observables generated by state occupancy
- Other models have observables generated by state transitions
- There are simple mechanisms for transforming one to the other....they are equivalent.
16Automata and Languages
- The set of all possible finite length outputs of the previous example is a "language"
- The language can be represented by a regular expression
- (010110111)
- "Classical relationship" between regular languages and nondeterministic finite automata
- ie, given one, construct the other (Kleene's Theorem)
- How about constructing an automaton from the input-output relationship?
17Nerode Equivalence
- Theorem: Every causal, time-invariant system has a state space description.
- "Constructive" proof
- use the input-output description of a system
- two finite length input strings belong to the same equivalence class if all the corresponding outputs (beyond the inputs' lengths) are the same
- ie, if inputs w1w2 and w3w2 have outputs z1z2 and z3z2 for all w2 then w1 is equiv to w3
- the resulting equivalence classes are the states
18Partial Differential Equations
19Quantum Mechanical Systems
20Other process formalisms
- A Petri Net (PN) is given a state by marking its places.
- Marking of a PN consists of assigning a nonnegative integer to each place.
- Graphically, tokens are inserted in places of a PN
- Input place - arrow goes from the place to the transition
- Output place - arrow goes from the transition to the place
Concurrency Examples R. Apcar, E. Chiu, H.
Jerejian
21Definitions
- A transition may have one or more Input and Output places
- A transition is enabled if there is at least one token in each of its input places.
- An Enabled transition may fire
- one token is removed from each input place and one token is inserted in each output place of the transition
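The enabling and firing rules above can be sketched in a few lines. The marking is represented as a dictionary from place name to token count; the place names `p1`, `p2`, `p3` and the transition `t` are hypothetical and only serve the illustration.

```python
def is_enabled(marking, transition):
    """A transition is enabled if every input place holds at least one token."""
    return all(marking.get(place, 0) >= 1 for place in transition["inputs"])


def fire(marking, transition):
    """Fire an enabled transition and return the new marking as a new dict."""
    if not is_enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place in transition["inputs"]:
        new_marking[place] -= 1          # remove one token per input place
    for place in transition["outputs"]:
        new_marking[place] = new_marking.get(place, 0) + 1  # add one per output place
    return new_marking


# Hypothetical net: t consumes one token from p1 and p2 and produces one in p3.
t = {"inputs": ["p1", "p2"], "outputs": ["p3"]}
marking = {"p1": 1, "p2": 2, "p3": 0}
```

Firing `t` once from `marking` empties `p1`, so `t` is no longer enabled afterwards.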
22An example
23Example continued
24A Process has...
- Hidden states (discrete or continuous)
- State transitions (nondeterministic, probabilistic)
- Observables/events
- Relationship between observables and states
- An algorithm to score assignments of observations/events to state sequences
- Examples
- Nondeterministic automata
- Hidden Markov Models
- Petri Nets
- Linear Systems
- Nonlinear Systems
- etc
25Models for Organizational Processes (W. Chung,
J.-P. Schutt, R. Savell, G. Cybenko)
Observables of the Process
[Diagram: participants A and B]
- A asks B to join a project
- B accepts
- A adds B to a list of recipients: A -> B, C, ...
Dynamics of the Process
ENRON, Ebay, etc
Static Analysis
Dynamic Analysis
26Example of a Multistage Process Model in Computer
Security
Potential malicious activity
snort alerts
Potential normal activity
Samba
ftp, covert channel, etc
Tripwire
27Real time Fish Tracking
- Objective
- Track several fish in the fish tank
- Why
- Very strong example of the power of PQS
- Fish swim very quickly and erratically
- Lots of missed observations
- Lots of noise
- Classical Kalman filters don't work (non-linear movement and acceleration)
- Easier than getting permission to track people (we mistakenly thought)
28Fish Tracking Details
- 5 Gallon tank with 2 red Platys named Bubble and Squeak
- Camera generates a stream of centroids
- For each frame a series of (X,Y) pairs is generated.
- Model describes the kinematics of a fish
- The model evaluates if new (X,Y) pairs could belong to the same fish, based on measured position, momentum, and predicted next position. This way, multiple tracks are formed. One for each object.
- Model was built in under 3 days!!!
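The kinematic test described above (could a new (X,Y) centroid belong to an existing track, given its predicted next position?) might look roughly like the sketch below. It is not the PQS model: it assumes a constant-velocity prediction from the last two centroids, a fixed gating radius, and a greedy nearest-track assignment, and the names `predicted_position`, `assign` and `gate_radius` are invented for this example.

```python
import math


def predicted_position(track):
    """Constant-velocity prediction from the last two centroids of a track."""
    if len(track) < 2:
        return track[-1]
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def assign(tracks, centroids, gate_radius=15.0):
    """Extend the nearest compatible track with each centroid, or start a new track."""
    updated = [list(track) for track in tracks]
    used = set()
    for cx, cy in centroids:
        best, best_distance = None, gate_radius
        for index, track in enumerate(tracks):
            if index in used:
                continue  # at most one centroid per track per frame
            px, py = predicted_position(track)
            distance = math.hypot(cx - px, cy - py)
            if distance <= best_distance:
                best, best_distance = index, distance
        if best is None:
            updated.append([(cx, cy)])       # no track explains it: new track
        else:
            updated[best].append((cx, cy))
            used.add(best)
    return updated
```

A centroid that falls outside every gate starts a new track, which is one simple way to cope with objects entering the scene; missed observations would need extra handling that this sketch leaves out.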
29Kinematic Tracking (2)
- Model the motion of a feature moving at "human" speed
- The model evaluates if new (X,Y) pairs could belong to the same hot spot, based on measured position, momentum, and predicted next position. This way, multiple tracks are formed. One for each object.
- Sensors: Infrared video camera provides datastream
- Camera generates a stream of centroids
- For each frame a series of (X,Y) pairs is generated.
30An Example of a Process
[Diagram: A Process Model with states 1 and 2 and observables a, b]
Two states: {1, 2}. Two observables: {a, b}.
Legal transitions between states are depicted by arrows. When occupying a state, the process emits an observable. All states are initial/start states and there are no terminal states.
Some legal sequences of observables: abbab, bababbb, abbb
Some illegal sequences of observables: aa, baab
Further reading: Automata Theory, Regular Languages, etc
31A More Complex Process
[Diagram: Another Process Model with states 1, 2, 3 and observables a, c; b; a, c]
Three states: {1, 2, 3}. Three observables: {a, b, c}.
Some legal sequences of observables: abab, babaccab, ab
Some illegal sequences of observables: bb, baabb
Problem: Given a sequence of possible observations, is it legal? What states?
Solution:
1. Read the first observable, mark states that emit that observable
2. Read an observable, z
3. New marked states = (states reachable from old marked states) intersected with (states that could have emitted z)
4. If no new marked states, illegal sequence, else go to 2
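The marking procedure can be written down directly. In the sketch below the emissions follow the slide (state 1 emits a or c, state 2 emits b, state 3 emits a or c), but the transition arrows in `NEXT` are an assumption inferred from the legal and illegal example sequences, since the diagram itself is not in the transcript.

```python
# Emissions as on the slide; NEXT is an assumed set of arrows that is
# consistent with the legal/illegal examples, not a copy of the diagram.
EMITS = {1: {"a", "c"}, 2: {"b"}, 3: {"a", "c"}}
NEXT = {1: {2}, 2: {1, 3}, 3: {2, 3}}


def marked_states(observations, emits=EMITS, nxt=NEXT):
    """Return the states the process could occupy after the sequence.

    An empty set means the sequence is illegal (or empty).
    """
    marked = set()
    for index, z in enumerate(observations):
        could_emit = {s for s, symbols in emits.items() if z in symbols}
        if index == 0:
            marked = could_emit                      # step 1: every state may start
        else:
            reachable = {t for s in marked for t in nxt[s]}
            marked = reachable & could_emit          # step 3
        if not marked:
            return set()                             # step 4: illegal sequence
    return marked


def is_legal(observations):
    return bool(marked_states(observations))
```

With the assumed arrows, the slide's examples come out as stated: abab, babaccab and ab are legal, while bb and baabb are not.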
32Extensions Hidden Markov Model (HMM)
p(a|1) = 0.8, p(c|1) = 0.2; p(b|2) = 1; p(a|3) = 0.8, p(c|3) = 0.2
[Diagram: states 1, 2, 3 with probabilities added to the transitions: 1, 0.8, 0.5, 0.2, 0.5]
Hidden Markov Models consist of two ingredients
- the dynamics: state transition probabilities in a Markov chain
- the emissions: p(observation|state)
Given a sequence of observations of length t, what are the possible states at time t? Unlike the case for a nondeterministic automaton, all we can say in general for an HMM is what the probability distribution on states is.
33Extensions Hidden Markov Model (HMM)
p(a|1) = 0.8, p(c|1) = 0.2; p(b|2) = 1; p(a|3) = 0.8, p(c|3) = 0.2
[Diagram: states 1, 2, 3 with transition probabilities 0.8, 1, 0.5, 0.5, 0.2]
Probability distribution at time t+1 is obtained by combining
- propagation of the distribution from time t using only the dynamics
- factoring in the observation observed at time t+1
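The two ingredients of the update (propagate with the dynamics, then factor in the new observation) can be sketched as one function. The emission probabilities below are the ones on the slide; which arrow each transition probability belongs to is not recoverable from the transcript, so the `TRANS` table is an assumption made for this example.

```python
# Emission probabilities from the slide. The assignment of the transition
# probabilities (1, 0.5, 0.5, 0.8, 0.2) to specific arrows is assumed.
TRANS = {1: {2: 1.0}, 2: {1: 0.5, 3: 0.5}, 3: {3: 0.8, 2: 0.2}}
EMIT = {1: {"a": 0.8, "c": 0.2}, 2: {"b": 1.0}, 3: {"a": 0.8, "c": 0.2}}


def filter_step(belief, observation, trans=TRANS, emit=EMIT):
    """Update a state distribution from time t to time t+1."""
    # Propagate the distribution using only the dynamics.
    predicted = {state: 0.0 for state in trans}
    for state, probability in belief.items():
        for next_state, a in trans[state].items():
            predicted[next_state] += probability * a
    # Factor in the observation seen at time t+1, then renormalize.
    weighted = {s: predicted[s] * emit[s].get(observation, 0.0) for s in predicted}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: w / total for s, w in weighted.items()}
```

For example, starting from certainty in state 2 and then observing a leaves states 1 and 3 equally likely under these assumed dynamics.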
34Two Simple Processes
[Diagram: Model Instance A (states A1, A2) and Model Instance B (states B1, B2), each with observables a, b]
aabb is a legal observation sequence
A1 B1 A2 A2 , A1 B1 A2 B2 , B1 A1 B2 B2 , ... are all legal state sequences
Split into tracks per process: (A1 A2 A2 ; B1), (A1 A2 ; B1 B2), (A1 ; B1 B2 B2)
[Diagram labels: a track, a hypothesis]
We can reduce this to a single process....
35Multiple Process Representation
[Diagram: Model Instance A (states A1, A2) and Model Instance B (states B1, B2), each with observables a, b and transition matrix M = [0 1; 1 1]; the joint model over state pairs such as A1 B1 has transition matrix M x M]
If the observation sequence is aaaaaa and multiple copies of the model are allowed, then we get a product model of size 2^n.
36A Simple Example of Process Detection
- a,b,c,d are events that can be observed
- states A, B, C, D, E, F are hidden
- observe a sequence of events
- Sequence: Hypotheses
- ab: NW, RF
- abab: (NW, NW), (RF, NW), ...
- ababc: (NW, RF), (NW, NW)
- ababcc: NW NW
- Which process or combination of processes explains the observed events?
NETWORK WORM MODEL (NW) (a,b,c,d = ICMP traffic levels)
[Diagram: states A, B, C, D; arrows labeled a; b; b, c; c, d]
ROUTER FAILURE MODEL (RF)
[Diagram: states E, F; arrows labeled a; b]
E,F = 0
repeat
  read event e
  if e = a then E
  if E and e = b then F
until F
Two models: states have different semantics, sets of observables intersect. What is the diagnosis?
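A rough, runnable reading of the two models is sketched below. It treats each model as a chain of stages in which an event advances the model when it belongs to the set allowed for the next stage and is ignored otherwise. That chain reading, and the names `NETWORK_WORM`, `ROUTER_FAILURE` and `run_model`, are assumptions for illustration; they follow the arrow labels and the RF pseudocode but are not code from the slides.

```python
# Each model is a list of (stage name, events that move the model into it).
NETWORK_WORM = [("A", {"a"}), ("B", {"b"}), ("C", {"b", "c"}), ("D", {"c", "d"})]
ROUTER_FAILURE = [("E", {"a"}), ("F", {"b"})]


def run_model(model, events):
    """Return the name of the last stage reached, or None if none was reached."""
    stage = 0
    for event in events:
        if stage < len(model) and event in model[stage][1]:
            stage += 1
    return model[stage - 1][0] if stage else None
```

Feeding the sequence ab to both models shows the ambiguity the slide points at: the router failure model runs to completion (F) while the worm model is plausibly under way (B), so the events alone do not settle the diagnosis.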
37Key Questions
- How is a process model built?
- from first principles
- from expert insights
- from data (lots)
- Given an event sequence, is it feasible or what is its probability?
- Given an event sequence, estimate the current state
- Given an event sequence, estimate the state sequence
- How good are those estimates (ie variance)?
38Homework Problems
- What are the states, dynamics and observables of the following processes
- intercontinental ballistic missile
- soccer, American football, baseball games
- Avian bird flu epidemic
- terrorist cell
- blogosphere
- US/global economy
- poker
- romance
46Overview
- Lecture 2 Detecting processes
- What does detection of processes mean?
- Automata
- Hidden Markov Models
- Kalman filtering
- Particle filters
47Process Detection Problems
- Given a sequence of observations...
- What is the current state of the process?
- What is the probability distribution on the states?
- What are the most likely state sequences?
- What is the uncertainty/error of the estimates?
48Graphical Depiction of Automata
[State diagram: states a, b, c, d with a marked Start State; transitions labeled with inputs u, v; outputs 0 or 1]
Q = States = {a, b, c, d}, X = {u, v}, Y = {0, 1}; δ and β shown in graph
49Input-Output Description
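[State diagram: states a, b, c, d with a marked Start State; transitions labeled with inputs u, v; outputs 0 or 1]
Input      Output
uuuu       01010
uuvu       01001
vuuuu      001010
vvuuuu     0001010
uvvuuuu    01101010
.....
Equivalence classes of input strings and the corresponding states:
{ε, v, vv, uu, uvv, ...}       a
{u, vu, vuuu, ....}            b
{uv, vuv, vuuuv, ...}          c
{uvu, vuvu, vvuvu, ...}        d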
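The construction of states from an input-output description can be tried out on this example. The sketch below treats the system as a black box `io(word)` returning the last output after an input string, and groups input strings by their responses to all short continuations, which approximates the Nerode classes. The transition table is reconstructed from the input-output pairs above; the behavior of state d (assumed to loop on itself) and the length bounds `max_prefix` and `max_suffix` are assumptions of this sketch.

```python
from itertools import product

# Reconstructed from the listed input/output pairs; d's self-loop is assumed.
DELTA = {
    "a": {"u": "b", "v": "a"},
    "b": {"u": "a", "v": "c"},
    "c": {"u": "d", "v": "a"},
    "d": {"u": "d", "v": "d"},
}
OUT = {"a": "0", "b": "1", "c": "1", "d": "1"}


def last_output(word, start="a"):
    """Black-box input-output map: the output seen after feeding in word."""
    state = start
    for symbol in word:
        state = DELTA[state][symbol]
    return OUT[state]


def words(alphabet, max_len):
    """All strings over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def nerode_classes(io, alphabet="uv", max_prefix=5, max_suffix=3):
    """Group input strings whose future outputs agree on every short suffix."""
    suffixes = list(words(alphabet, max_suffix))
    classes = {}
    for w in words(alphabet, max_prefix):
        signature = tuple(io(w + s) for s in suffixes)
        classes.setdefault(signature, []).append(w)
    return list(classes.values())
```

Only `io` is consulted when forming the classes, so the four groups that come out (one per state a, b, c, d) are recovered purely from input-output behavior.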
50Estimating states in an automaton
[Diagram, repeated for each step: states 1, 2, 3 with observables a; b; a, c and the possible states marked]
- Observe a
- Observe ab: Sequences 12, 32
- Observe ac: Sequences 33
- Observe acb: Sequences 332
51Commentary
- Trivial algorithm....
- Interesting question: What is the worst case growth of state sequences? Tomorrow.
- No probabilities, only possibilities.
- What if we add probabilities?
52Simplest Hidden Markov Model
[Diagram: two-state HMM with states 1 and 2]
p(1) = 0.5, p(2) = 0.5 are initial probabilities
a11 = 0.7, a12 = 0.3, a21 = 0.1, a22 = 0.9
b1(u) = 0.9, b1(v) = 0.1
b2(u) = 0.1, b2(v) = 0.9
53Applications of HMM's
- Speech recognition
- Gene sequencing
- Motion modeling and detection
- Pattern recognition (OCR)
- Darpa Grand Challenge (autonomic systems)
- etc
- etc
- etc
54Estimating States
[Diagram: two-state HMM with states 1 and 2]
p(1) = 0.5, p(2) = 0.5 are initial probabilities
a11 = 0.7, a12 = 0.3, a21 = 0.1, a22 = 0.9
b1(u) = 0.9, b1(v) = 0.1
b2(u) = 0.1, b2(v) = 0.9
55Estimating Another State
[Diagram: two-state HMM with states 1 and 2]
p(1) = 0.5, p(2) = 0.5 are initial probabilities
a11 = 0.7, a12 = 0.3, a21 = 0.1, a22 = 0.9
b1(u) = 0.9, b1(v) = 0.1
b2(u) = 0.1, b2(v) = 0.9
56Sequences of Observations
Time:          1        2        3        4        5
States:        1, 2
Observations:  O1 = u   O2 = v   O3 = u   O4 = v   O5 = v
Problems: Given a sequence of observations O1O2O3 ...
1. What is the most likely state at time t?
2. What is the most likely state sequence over all time?
3. What is the probability of the observation sequence?
57Best state vs best sequence
[Diagram: two-state HMM with states 1 and 2]
p(1) = 0.5, p(2) = 0.5 are initial probabilities
a11 = 0.7, a12 = 0.3, a21 = 0, a22 = 1
b1(u) = 0.9, b1(v) = 0.1
b2(u) = 0, b2(v) = 1
Observe v - most likely state is 2.
Observe u next - must be in state 1, but no transition from 2 to 1 is possible.
The sequence vu could only have been produced by starting and staying in state 1.
58Probability of the Observations
Time:          1        2        3        4        5
States:        1, 2
Observations:  O1 = u   O2 = v   O3 = u   O4 = v   O5 = v
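The probability of an observation sequence can be computed with the standard forward recursion. The sketch below uses the two-state HMM parameters given earlier in the lecture (initial probabilities 0.5 and 0.5, a11 = 0.7, a12 = 0.3, a21 = 0.1, a22 = 0.9, and the u/v emission probabilities); the function name `forward` and the dictionary layout are choices made for this example.

```python
PI = {1: 0.5, 2: 0.5}
A = {1: {1: 0.7, 2: 0.3}, 2: {1: 0.1, 2: 0.9}}
B = {1: {"u": 0.9, "v": 0.1}, 2: {"u": 0.1, "v": 0.9}}


def forward(observations, pi=PI, a=A, b=B):
    """Return (probability of the sequence, state distribution at the last time)."""
    # alpha[s] = probability of the observations so far and being in state s now.
    alpha = {s: pi[s] * b[s][observations[0]] for s in pi}
    for o in observations[1:]:
        alpha = {
            j: sum(alpha[i] * a[i][j] for i in alpha) * b[j][o]
            for j in pi
        }
    total = sum(alpha.values())
    filtered = {s: alpha[s] / total for s in alpha}
    return total, filtered
```

For the sequence u, v, u, v, v shown above, `forward("uvuvv")` gives a probability of about 0.0189, with state 2 by far the more likely final state.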
59Optimal Sequences
Time:          1        2        3        4        5
States:        1, 2
Observations:  O1 = u   O2 = v   O3 = u   O4 = v   O5 = v
60Viterbi's Algorithm
- These computations were discovered by A. Viterbi, a founder of Qualcomm.
- The algorithms are used in all modern cell phones and telecom devices in general.
[Diagram labels: Source sequence, Noisy Channel, Receive, Decode]
11221212222212
uvvuvuvvuvuvvv
11221212122212
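A compact sketch of the Viterbi recursion for a discrete HMM follows. It works with plain probabilities rather than logarithms so that zero-probability transitions, as in the "best state vs best sequence" example, need no special handling; long sequences would normally call for log-probabilities. The function name and the dictionary layout are choices made for this example.

```python
def viterbi(observations, pi, a, b):
    """Return (most likely state sequence, its joint probability)."""
    states = list(pi)
    delta = {s: pi[s] * b[s][observations[0]] for s in states}
    backpointers = []
    for o in observations[1:]:
        new_delta, pointer = {}, {}
        for j in states:
            # Best predecessor of state j at this time step.
            best_i = max(states, key=lambda i: delta[i] * a[i][j])
            pointer[j] = best_i
            new_delta[j] = delta[best_i] * a[best_i][j] * b[j][o]
        delta = new_delta
        backpointers.append(pointer)
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, delta[last]


# Two-state HMM from the lecture.
PI = {1: 0.5, 2: 0.5}
A = {1: {1: 0.7, 2: 0.3}, 2: {1: 0.1, 2: 0.9}}
B = {1: {"u": 0.9, "v": 0.1}, 2: {"u": 0.1, "v": 0.9}}

# Variant with a21 = 0, a22 = 1, b2(u) = 0, b2(v) = 1.
A_ABSORBING = {1: {1: 0.7, 2: 0.3}, 2: {1: 0.0, 2: 1.0}}
B_ABSORBING = {1: {"u": 0.9, "v": 0.1}, 2: {"u": 0.0, "v": 1.0}}
```

With the first model, the sequence uvuvv decodes to states 1, 2, 2, 2, 2 (the third observation is treated as noise). With the second model, v alone decodes to state 2 but vu decodes to 1, 1, which is the "best state vs best sequence" point made earlier.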
61Other issues for HMM
- Learning an HMM - ie. what are the various probabilities?
- Baum/Welch Algorithm
- variational algorithms
- Finite, discrete state spaces
62How about continuous state spaces?
- Major challenge
- in the finite, discrete case (HMM), we can represent and store the whole probability distribution as an n-vector
- what continuous state probability distributions have simple representations?
- Gaussians - mean and variance specify them
- what if the distribution is more general than a Gaussian?
63Madory's Goats
- Goat herder
- Herd state is the number of infant females, adult females, infant males and adult males
- Dynamics are generation to generation: how many infant females and males are born, how many infants of each gender become adults and how many adults survive
- Observables are goat milk revenues and goat baby inoculation costs - these are noisy
- Problem: estimate total number of goats and number of adult females
- (Example and code due to Doug Madory)
65Quantification of the State
66Quantification of the Dynamics
67Quantification of Observations
69Basic Concept in Kalman Filtering
- Use the fact that the sum of variables with Gaussian distributions is also Gaussian
- Gaussian is characterized by mean and variance
- Use dynamics to predict the next state
- Use measurement (observation) to correct that prediction
- Update the error covariance (ie confidence in the estimate)
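The predict/correct cycle can be sketched for the simplest case, a scalar state. This is a textbook one-dimensional Kalman step, not the goat-herd model (which has a vector state); the parameter names `a` (dynamics), `q` (process noise variance), `h` (measurement gain) and `r` (measurement noise variance) are conventional choices for this illustration.

```python
def kalman_step(mean, variance, measurement, a=1.0, q=0.0, h=1.0, r=1.0):
    """One predict/correct cycle of a scalar Kalman filter.

    State model:        x_next = a * x + noise with variance q
    Measurement model:  z = h * x + noise with variance r
    """
    # Use the dynamics to predict the next state and its uncertainty.
    predicted_mean = a * mean
    predicted_variance = a * a * variance + q
    # Use the measurement to correct the prediction.
    gain = predicted_variance * h / (h * h * predicted_variance + r)
    new_mean = predicted_mean + gain * (measurement - h * predicted_mean)
    # Update the error variance (confidence in the estimate).
    new_variance = (1.0 - gain * h) * predicted_variance
    return new_mean, new_variance
```

With equal prior and measurement variances the estimate lands halfway between prediction and measurement, and the variance is halved.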
72Kalman Equations and Geometry
73Extensions
- To nonlinear systems (linearize locally)
- Learn the system dynamics
- Use the estimates to control the state (feedback)
- To non-Gaussian noise problems
- particle filter methods
74Particle Filters
- Represent a probability distribution using a discrete distribution of particles
- Sample the particles, propagate using dynamics and correct using observations
- This creates a new distribution for the next time step
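One step of a basic (bootstrap) particle filter for a scalar state might look like the sketch below. The random-walk dynamics and the Gaussian measurement likelihood are assumptions chosen to keep the example small; `move` and `likelihood` are hypothetical hooks where a real process model would go.

```python
import math
import random


def particle_filter_step(particles, observation, move, likelihood, rng):
    """Propagate particles, weight them by the observation, and resample."""
    moved = [move(p, rng) for p in particles]                 # dynamics
    weights = [likelihood(observation, p) for p in moved]     # correction
    if sum(weights) == 0.0:
        raise ValueError("all particles have zero likelihood")
    # Resampling turns the weighted set into the distribution for the next step.
    return rng.choices(moved, weights=weights, k=len(moved))


def random_walk(particle, rng, sigma=0.1):
    """Assumed dynamics: the state drifts by a small Gaussian step."""
    return particle + rng.gauss(0.0, sigma)


def gaussian_likelihood(observation, particle, sigma=0.5):
    """Assumed sensor model: observation = state + Gaussian noise."""
    return math.exp(-0.5 * ((observation - particle) / sigma) ** 2)
```

Starting from particles spread uniformly over an interval, a single observation pulls the resampled cloud toward the observed value; nothing in the step requires the distribution to be Gaussian.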
75Deep Connections to Information Theory
- This is all part of a much larger problem description - cybernetics à la N. Wiener
[Diagram labels: Environment, Noisy Channel, Receiver, Decode, Estimate of Environment, Learning, Models of Environment, Actions]
76Summary of Lecture 2
Process class            Distribution        Algorithm
Automaton                None                Simple marking
HMM                      Discrete, finite    Viterbi
Linear, continuous       Gaussian            Kalman
Continuous, nonlinear    Arbitrary           Particle filters
What are the observables? What are the states? What are the dynamics?
77Overview of Lecture 3
- Detecting multiple processes
- Instead of one process, we now have some unknown number of them
- Multiple hypothesis tracking (MHT) framework
- The basic algorithms
- Complexity theory
- Process Query Systems
- Applications
78Multiple Hidden Process Models
79Why be interested in this....
- Sensor networks
- Airborne plume detection
- Cyber security
- Autonomic server pool management
- Dynamics of social networks
- Genomics and biological pathways
- Human situation awareness
- Possible applications.
80Basic Concepts of Process Query Systems (PQS)
[Diagram: a cycle between the Real World and PQS, with numbered steps]
1. An Operational Network consists of
2. Multiple Processes (λ1 = router failure, λ2 = worm, λ3 = scan) that produce
3. Events that are seen as
4. Unlabelled Sensor Reports (over time) that PQS resolves into
5. Hypotheses (Hypothesis 1, Hypothesis 2, each made of Track 1, Track 2, Track 3, with Track Scores over time) that detect complex attacks and anticipate the next steps
6. Indicators and Warnings that are used to defend the network
Sample Console: 129.170.46.3 is at high risk; 129.170.46.33 is a stepping stone; ......
81 Discrete Source Separation Problem (viz Blind Source Separation, Cocktail Party Problem)
Process/Model Example: 3 states, transition probabilities, n observable events a,b,c,d,e, Pr( state | observable event ) given/known
Observed event sequence: .abcbbbaaaababbabcccbdddbebdbabcbabe.
Catalog of Processes/Models
A Track
A Hypothesis
Which combination of which process models best accounts for the observations? This is what we want to compute. Events not associated with a known process are anomalies.
82 Multiple Hypothesis Approach to the "Discrete
Source Separation Problem"
Obs1 Obs2 . . .
Observables at time t1
83 Multiple Hypothesis Approach to the "Discrete
Source Separation Problem"
Obs1
Obs2
Hypothesis 1a
Obs2
Obs1
Hypothesis 1b
Candidates at time t1
84Terminology
- Tracks are associations of observations to individual processes.
- Hypotheses are consistent tracks that explain all the observables.
- Hypothesis extension is the conjectural assignment of new observations to existing hypotheses.
- Track initiation is the instantiation of a new process in a hypothesis' extension.
- Handling missed detections means that an intermediate observation may have been dropped.
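Hypothesis extension and track initiation can be sketched with plain tuples. In this illustration a hypothesis is a tuple of tracks and a track is a tuple of observations; `fits` is a hypothetical predicate standing in for whatever process model decides whether an observation can extend a track. Scoring, pruning and missed detections are left out.

```python
def extend_hypothesis(hypothesis, observation, fits):
    """Return every extension of one hypothesis by one new observation."""
    extensions = []
    for index, track in enumerate(hypothesis):
        if fits(track, observation):
            # Conjecturally assign the observation to an existing track.
            extended = track + (observation,)
            extensions.append(hypothesis[:index] + (extended,) + hypothesis[index + 1:])
    # Track initiation: the observation starts a new process instance.
    extensions.append(hypothesis + ((observation,),))
    return extensions


def extend_all(hypotheses, observation, fits):
    """Extend every hypothesis in the current pool."""
    return [
        extension
        for hypothesis in hypotheses
        for extension in extend_hypothesis(hypothesis, observation, fits)
    ]
```

If every observation fits every track, the pool grows as 1, 2, 5, 15, ... hypotheses, which is why pruning the pool matters in practice.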
85A Simple Example of Process Detection
- a,b,c,d are events that can be observed
- states A, B, C, D, E, F are hidden
- observe a sequence of events
- Sequence: Hypotheses
- ab: NW, RF
- abab: (NW, NW), (RF, NW), ...
- ababc: (NW, RF), (NW, NW)
- ababcc: NW NW
- Which process or combination of processes explains the observed events?
NETWORK WORM MODEL (NW) (a,b,c,d = ICMP traffic levels)
[Diagram: states A, B, C, D; arrows labeled a; b; b, c; c, d]
ROUTER FAILURE MODEL (RF)
[Diagram: states E, F; arrows labeled a; b]
E,F = 0
repeat
  read event e
  if e = a then E
  if E and e = b then F
until F
Two models: states have different semantics, sets of observables intersect. What is the diagnosis?
86Add Rules for Missed Detections and Disambiguation
A,B,C,D = 0
repeat
  read event e
  if e = a then A
  if A and e = b then B
  if A and e = c then C,D
  if A and e = d then D
  if B and (e = b or e = c) then C
  if C then (E = 0, F = 0)
  if C and (e = c or e = d) then D
  if D then (E = 0, F = 0)
until D
WORM MODEL (a,b,c,d = ICMP traffic levels)
[Diagram: states A, B, C, D; arrows labeled a; b; b, c; c, d]
Blue statements handle missed detections. Red statements handle consistency.
This clearly does not scale and does not lead to manageable sets/systems of rules.
87Approaches to Detecting Processes
- Aristotelian - Traditional information retrieval is based on specification of a query in terms of Boolean expressions based on record fields. IE, SQL ( name = smith, age > 20, age < 40 ), rule-based logics, decision trees, etc
- Newtonian - Next generation process detection requires retrieval based on specification of a set of discrete, dynamic processes. IE, descriptions of a Hidden Markov Model, Hidden Petri Net, weak models, FSMs, attack trees, etc.
- Main Concept: Move from an Aristotelian to a Newtonian Paradigm.
88Process Query Systems (PQS)
- Process Query Systems solve the Discrete Source Separation Problem in a generic way
- inputs
- a sequence of unlabelled observations (stream, logfiles, etc)
- a collection of process models
- outputs
- estimates of which processes produced those observations
- estimates of which states those processes are in
- Basic theory and technology has been developed by the PQS team at Dartmouth
- Now being applied to a variety of applications
89Algorithms/Operations of PQS
Evaluate Solutions and Process Outputs
5
3
Update Tracks Within Hypotheses (Viterbi / Kalman
/ NDFA,etc) and Create New Hypotheses
Recursive in Time
90The COBOL and pre-PQS Analogy
Pre-SQL Programs (interwoven logic)
- application logic statement 1
- application logic statement 2
- file management statement 1
- record management statement 1
- file management statement 2
- record management statement 2
- application logic statement 3
- record management statement 3
- file management statement 3
- application logic statement 4
Post-SQL Programs
Application logic (user responsibility)
- application logic statement 1
- application logic statement 2
- SQL statement 1
- application logic statement 3
- SQL statement 2
- application logic statement 4
Database management system (system responsibility)
- file management operation 1
- record management operation 1
- file management operation 2
- record management operation 2
- record management operation 3
- file management operation 3
Current Process Detection Programs (interwoven logic)
- model logic statement 1
- model logic statement 2
- sensor access statement 1
- state estimate statement 1
- sensor access statement 2
- state estimate statement 2
- model logic statement 3
- sensor access statement 3
- state estimate statement 3
- model logic statement 4
PQS-based Programs
Model description (user responsibility)
- model description statement 1
- model description statement 2
- model description statement 3
- model description statement 4
Process query system (system responsibility)
- sensor access statement 1
- state estimate statement 1
- sensor access statement 2
- state estimate statement 2
- sensor access statement 3
- state estimate statement 3
91Network Security (V. Berk, I. De Souza, A. Barsamian, A. Giani, M. Bates, D. Madory, G. Bakos, et al)
- Objective
- Detect, disambiguate, and predict the course of concerted network attacks in an enterprise class network.
- Why
- Problem domain demands the power of PQS
- Hundreds of processes occurring at once
- Lots of missed observations and noise
- All commercial technology focuses on collection and presentation of data
- Existing correlation efforts very weak at best
92SENSORS INTEGRATED
SENSOR DESCRIPTION SCOPE
Global
CovChan
Timing Covert Channel Detection
Network
IPtables
Linux Netfilter firewall, log based
Weblog
IIS, Apache, SSL error logs,
Host
US-agent
Userspace host monitoring agent
93Example of a Multistage Process Model
Potential malicious activity
snort alerts
Potential normal activity
Samba
ftp, covert channel, etc
Tripwire
94PQS-Net supply chain
- Tier 1 Models
- Focus on individual host status
- Report on status changes
- Tier 2 Models
- Focus on correlating host activity
- Report chains of events
Tier 1 Output
Mon Feb 21 20:06:17 2005 000000 131.58.63.160 (hostile) recon on 100.10.20.4 SNORT 469 proto 1
Mon Feb 21 20:30:24 2005 000000 138.158.170.45 (hostile) attacked 100.10.20.4 ERRORLOG 400 proto 6 dport 443
Tier 2 Output
Hypothesis 1 Score 0.8 Hypothesis 2 Score 0.2
A scans B A scans B
B scans E
B attacks E
[Diagram labels: sensors, sensor data, Tier 1 Tracker, Attack steps, Tier 2 Tracker, Attack sequences and scores, Analysts front-end]
95Example Scenario
[Network diagram: Internet and hosts A, C, B, E, D]
Tier1 Alerts and Indicators
- A scans B
  Snort: 02/21-20:06:17.904500 [1:469:1] ICMP PING NMAP [Classification: Attempted Information Leak] [Priority: 2] {ICMP} 131.58.63.160 -> 100.10.20.4
- C attacks B (success)
  SSL error log (host 100.10.20.4):
  [Mon Feb 21 20:30:24 2005] [error] mod_ssl: SSL handshake failed (server www.osis.gov:443, client 138.185.170.45) (OpenSSL library error follows)
  [Mon Feb 21 20:30:24 2005] [error] OpenSSL: error:1406908F:lib(20):func(105):reason(143)
96Example Contd
[Network diagram: hosts B, E, D]
Tier1 Alerts and Indicators
- B scans D
  02/21-20:31:17.528602 [1:1807:2] WEB-MISC Chunked-Encoding transfer attempt [Classification: Web Application Attack] [Priority: 1] {TCP} 100.10.20.4:34074 -> 100.10.20.169:80
- B attacks D (fails)
  100.20.1.169 - - [21/Feb/2005:08:31:22 -0500] "GET /default.idq?AAAAAAAAAAA..AAAAAAA HTTP/1.1" 404 1287 "-" "-"
- B scans E
  02/21-20:32:01.622465 [1:1807:2] WEB-MISC Chunked-Encoding transfer attempt [Classification: Web Application Attack] [Priority: 1] {TCP} 100.10.20.4:34076 -> 100.10.20.170:80
- B attacks E (succeeds)
  100.20.1.170 - - [21/Feb/2005:08:32:06 -0500] "GET /default.idq?AAAAAAAAAAA..AAAAAAA HTTP/1.1" 200 1287 "-" "-"
97Results
Dataset                                        3s8       3s26      3s28      3s29
Alerts                                         22930     18391     12522     39270
Lines in trunk_alert                           4830      5959      1159      8168
Lines in snort files generated from tcpdump    11751     7284      7006      19866
Lines in weblogs (apache, IIS)                 6349      5148      4357      11236
Number of tracks produced                      100       75        51        107
Attack Tracks not in ground truth              1         0         0         0
Attackers identified                           3 of 3    4 of 4    0 of 2    3 of 5
Decoys found                                   5 of 5    2 of 2    2 of 2    6 of 6
Victims identified                             2 of 2    2 of 2    1 of 2    10 of 11
Stepping stones identified                     1 of 1    1 of 1    1 of 2    2 of 3
98Autonomic Server Monitoring (C. Roblee, V. Berk) Funded by DHS
99Autonomic Server Monitoring
- Objective
- Detect and predict deteriorating service situations
- Why
- Another strong example of the power of PQS
- Software and hardware are buggy and vulnerable
- Hot market, large profits for The ONE application
- Very ambiguous observations
- Sys-admins also want vacation
100The Environment
- Hundreds of servers and services
- Various non-intrusive sensors check for
- CPU load
- Memory footprint
- Process table (forking behavior)
- Disk I/O
- Network I/O
- Service query response times
- Suspicious network activities (ie, Snort)
- Models describe the kinematics of failures and attacks
- The model evaluates load balancing problems, memory leaks, suspicious forking behavior (like /bin/sh), service hiccups correlated with network attacks
101Server Compromise Model Generic Attack Scenario
t0 t1 t2 t3
t4
Observations
Response
102Experimental Results
No Tracking
Tracking
Successful Requests
System Memory Consumed
210,000 requests serviced
380,000 requests serviced
103Chemical Plume Process Detection (Funded by DHS)
104The Forward Problem
Concentration in a 2D region as a function of
time
Fick's Law (diffusion)
Advection (wind)
Concentration equation composed of diffusion and
advection
- Forward model result
- arbitrary initial sources
- pseudo-random wind
- includes diffusion and wind
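A forward model of this kind can be sketched with a simple finite-difference update of the advection-diffusion equation dc/dt = D * laplacian(c) - wind . grad(c). The slide's own equations are not in the transcript, so the discretization here (explicit time stepping, upwind differences for the wind term, periodic boundaries, constant wind) and all parameter names are assumptions made for illustration.

```python
def advect_diffuse(c, wind, diffusivity, dt=0.1, dx=1.0):
    """One explicit time step on a periodic 2D grid; c[i][j], i along x, j along y."""
    nx, ny = len(c), len(c[0])
    u, v = wind
    new = [[0.0] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            here = c[i][j]
            west, east = c[(i - 1) % nx][j], c[(i + 1) % nx][j]
            south, north = c[i][(j - 1) % ny], c[i][(j + 1) % ny]
            # Diffusion (Fick's law): discrete Laplacian.
            laplacian = (west + east + south + north - 4.0 * here) / dx ** 2
            # Advection (wind): upwind differences for stability.
            dcdx = (here - west) / dx if u >= 0 else (east - here) / dx
            dcdy = (here - south) / dx if v >= 0 else (north - here) / dx
            new[i][j] = here + dt * (diffusivity * laplacian - u * dcdx - v * dcdy)
    return new


def total_mass(c):
    return sum(sum(row) for row in c)
```

The step is only stable for small enough `dt` (roughly `diffusivity * dt / dx**2` well below 0.25 and wind speed times `dt / dx` below 1). With periodic boundaries the total amount of agent is conserved while the plume spreads and drifts downwind.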
105Current technology on DC Mall. Future sensors
will be smaller and greater in number, with a
need for measurement correlation.
106Multiple Source Case With Terrain Connectivity
determined by wind and geography
Source 1
Source 2
Connectivity
Wind
107Multiple Source Case With Terrain Connectivity
determined by wind and geography
Source 1
Source 2
Connectivity
Wind
108Inverse Source Likelihood
Estimating the probability that a sensor observation is generated by a source at a given location. Based on wind direction history and diffusion properties of agent.
[Diagram labels: wind, sensors, S, S, sources]
109Correlation Between Observations at Different
Locations
Picking any two sensors, we evaluate the probability that the observation at one sensor is connected to the observation at the other sensor in the region. This is a function of wind history, distance, and diffusion properties.
wind
110Source Estimation Compared to True Source Location
Estimated Source based on inverse correlation of
plume observations and tracks
Forward Simulation
111Social Network Analysis: Comparison of Static vs Dynamic (W. Chung, R. Savell, J.-P. Schuett)
Analysis of Static Artifacts
- Temporal sequence of transactions (over time)
- Projection removes temporal relationships
- Analyze projected, non-temporal data
Extraction of Dynamic Processes
- Temporal sequence of transactions (over time)
- Analysis of temporal aspects of transactions
112Process Primitives
Decay kernel correlates potentially related emails - eg. links
Functional roles based on conversation segments shown below
A. Initiator
B. Broker
C. Bridge
D. Triad
E. Terminator
113Combining Primitives into Processes
P(t'-t) > f, P(t''-t') > f, P(t'''-t'') < f
X
Probabilities of temporal relationships are used to grow tracks
114Methodology Details
1. Crude Naïve Bayes Text Classification w/
Temporal Correlations to isolate coarse thread.
2. Local structure via Process Primitives on the
Dynamic Social Network.
115Theory
- PQS offer a principled approach that enables
- understanding how distinguishable models (attack and failure) are
- developing a notion of processes that are trackable, given models and sensing infrastructure (ie a sampling theory)
116 Hypothesis Growth
A hypothesis is a consistent assignment of events to processes and/or states (ie, each event assigned to only one process instance). Given a set of hypotheses for an event stream of length k-1, update the hypotheses to length k to explain the new event. NP-Complete in general. Need to prune the pool of hypotheses, keeping the most suitable.
[Diagram over time: an individual path is a track, ie one process instance; consistent tracks form a hypothesis]
117Models and Hypothesis Growth
Weak model: FSM with emission vectors
Emission for state i: 0/1 vector of sensor reports, eg obs(i) = ( 0 , 1 , 1 , 0 , 0 , 1 , 1 )
Observation vector at time t collected by sensors, eg sensors(t) = ( 0 , 1 , 1 , 1 , 1 , 1 , 0 )
Possible states at time t are determined by
P = { i : Hamming_distance( obs(i) , sensors(t) ) < HD }
R = { i : j possible at time t - 1 and i is reachable from j }
P ∩ R is the set of possible states at time t
Number of hypotheses at time t recursively computed as above.
Theorem: For a fixed value of HD, the worst-case number of hypotheses at time t is either polynomial or exponential in t. (Crespi, Cybenko, Jiang 2005)
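The weak-model update can be sketched as follows, using the two example vectors from the slide. The comparison with the threshold is taken here as strict ("< HD"), which is an assumption since the original symbol is not fully recoverable; the two-state model in the usage note is hypothetical.

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length 0/1 vectors differ."""
    return sum(a != b for a, b in zip(x, y))


def possible_states(previous, sensors_t, obs, reachable, hd):
    """States possible at time t, given the states possible at time t - 1.

    obs[i] is the emission vector of state i; reachable[j] is the set of
    states reachable from j in one time step; hd is the noise threshold.
    """
    p = {i for i, vector in obs.items() if hamming_distance(vector, sensors_t) < hd}
    r = {i for j in previous for i in reachable[j]}
    return p & r


# Example vectors from the slide; their Hamming distance is 3.
OBS_I = (0, 1, 1, 0, 0, 1, 1)
SENSORS_T = (0, 1, 1, 1, 1, 1, 0)
```

For instance, with a hypothetical second state whose emission vector is all ones and a threshold of 3, only that second state remains possible for `SENSORS_T`; raising the threshold to 4 admits both states, which is how a looser noise threshold makes hypotheses multiply.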
118Longer tracking time
More noise (worse model)
119Longer tracking time
More noise (worse model)
120Basic Idea Behind the Proof
[Trellis diagram: N states at times t, t+1, t+2, ..., k]
Process dynamics (ie what is reachable from each state in a time step) + observations + noise threshold determine a trellis. If there are two distinct paths from one node to itself over some period of time, the number of distinct paths grows exponentially by repeating the construct.
121Basic Idea Behind the Proof
[Trellis diagram: N states at times t, t+1, t+2, ..., k]
If there are never two distinct paths from any node to itself over any period of observation, there is a simple injective mapping (ie. unique labeling) of the paths into {0, 1, ... , k} x {0, 1, ... , k} x ... x {0, 1, ... , k} (2N times). So the number of paths is <= (k+1)^(2N). The label for each path records, for each state, the time the path first occupies that state and the time it last occupies that state.
122Relationship to Joint Spectral Radius
123New Ideas for Large-Scale Hypothesis Management
- Data structures for maintaining one copy of many hypotheses that are variants of one another
- Viewing the set of hypotheses as the solution (instead of the highest ranked hypothesis eg)
- propagating the set can be done in linear space, constant time
- some properties of the set of hypotheses can be computed in constant time, others in linear time, others seem to require exponentially much time and/or space, etc.
- Development of a nonparametric approach to tracking and Situational Awareness, not unlike nonparametric statistical techniques (order statistics, etc)
- Reduce dependencies on probabilistic parameters and model building
124Distinguishability of models (Yong Sheng)
- Given two models, how distinguishable are they?
- Example Model of router failure vs worm attack?
- Do we need to build more refined models or do we
need to add additional sensors/data sources?
125Different degrees of distinguishability between models given sensing capabilities (eg DDOS vs router failure)
[Plot: Red = Prob of deciding model 2 given model 1; Blue = Prob of deciding model 1 given model 2]
Entropies of the two ergodic models are different. Decision rule is based on ML as determined by the Viterbi algorithm. The Shannon-McMillan-Breiman Ergodic Theorem states that most observation sequences are typical and have probability related to the entropy.
126Different degrees of distinguishability between models given sensing capabilities (eg DDOS vs router failure)
However, nonmonotonic behaviors are possible (in general) and without convergence to zero (if the entropies are the same)
127Different degrees of distinguishability between models given sensing capabilities (eg DDOS vs router failure)
However, nonmonotonic behaviors are possible (in general) and without convergence to zero (if the entropies are the same)
128Where do models come from?
- In practice, we build models of processes by
- First principles: ie, symmetry, physical laws, etc.
- Expert models/rules/experience: ie, chess playing computers, military tactics, etc
- Empirical analysis (from real or simulated data): ie. backgammon, stock market models, etc.
- Process Query Markup Language: developed and almost implemented; allows rapid insertion of new attack models into PQS
129PQS INPUTS PROCESS MODEL SEMANTICS AND SENSOR
DATA REQUIREMENTS
[Diagram: three representations of a process model with states Normal, Marginal, Failed (Learn, Represent); each is compiled into code and executed]
Rules, signatures, etc:
alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP Destination Unreachable (Host Unreachable)"; itype: 3; icode: 1; sid:399; classtype:misc-activity; rev:4;)
Reachability (weak) Models: states A, B, C
Probabilistic Models (HMM, Bayes Nets, Fuzzy models, etc): states A, B, C with probabilities 0.03, 0.05, 0.2, 0.9
Compile
Code:
if (src_ip_new.equals(src_ip_track))
    if (IPv4_in_CIDR_ints (208,253,154,0, 24, src_ip_new) == true)   // local?
        new_likelihood = new Likelihood ((0.90f + likelihood.getProbability())/2.0f);
    else   // Else don't care
        new_likelihood = new Likelihood (0.0);
Execute
130More details....
- gvc_at_dartmouth.edu
- See www.pqsnet.net