Process Detection - PowerPoint PPT Presentation

About This Presentation
Title:

Process Detection

Description:

Process Detection George Cybenko Dartmouth gvc_at_dartmouth.edu Acknowledgements Overview of Lectures Process modeling Process detection, theory Software and ... – PowerPoint PPT presentation

Slides: 131
Provided by: gvc

Transcript and Presenter's Notes

Title: Process Detection


1
Process Detection
  • George Cybenko
  • Dartmouth
  • gvc_at_dartmouth.edu

2
Acknowledgements
Current Members George Bakos Alex
Barsamian Marion Bates Vincent Berk Chad
Behre Wayne Chung Valentino Crespi (Prof. Cal
State LA) George Cybenko Ian deSouza Annarita
Giani Doug Madory Glenn Nofsinger Robert
Savell Jan-Peter Schutt Yong Sheng William
Stearns
Alumni Naomi Fox (UMass, Ph.D. student) Hrithik
Govardhan (Rocket) Robert Gray (BAE
Systems) Diego Hernando (UIUC, Ph.D.
student) Guofei Jiang (NEC Research) Alex Jordan
(BAE Systems) Han Li (China Shipping Corp) Josh
Peteet (Greylock Partners) Chris Roblee (LLNL)
graduate students
Research Support DHS, ARDA, AFOSR, NGA, DARPA
Cybenko
3
Overview of Lectures
  1. Process modeling
  2. Process detection, theory
  3. Software and applications

4
Why be interested in this....
  • Sensor networks
  • Airborne plume detection
  • Cyber security
  • Autonomic server pool management
  • Dynamics of social networks
  • Genomics and biological pathways
  • Human situation awareness
  • Possible applications.

Cybenko
5
Overview
  • Lecture 1 Process models
  • Notion of "state"
  • Differential equations
  • State Machines and Automata
  • Probabilistic and quantum states
  • Constructing state representations
  • Some

6
Newton's Big Idea(s)
  • Calculus
  • Laws of Physics
  • Concept of "state"

Isaac Newton
7
Contrast with Aristotle
Nature consists of objects and rules Examples
Crisis - could not explain the natural
world
Ancient law (religious and civil)
Astronomical observations
Superstition
8
A Closer Look at F = ma
9
A Closer Look at F = ma
10
A Closer Look at F = ma
Previous state
Next state
Input
Dynamics
11
A Closer Look at F = ma
Concept of state: the future evolution of the
system depends only on the current state
and future inputs. That is, the past's influence on
the future is totally summarized by
the state. The next state is determined by the
current state and the current input (or control,
etc.).
sm
ua
si
sn
ub
12
Outputs/Observables
Black Box States may not be observable by
an external agent
Inputs, u
Outputs, y
Forces
x (Position, Momentum)
Position only
13
Automaton
Alan Turing
14
Graphical Depiction of Automata
1
1
1
0
Start State
v
u
u
d
c
b
a
v
u
v
u,v
Q = States = {a, b, c, d}, X = {u, v}, Y = {0, 1}
δ and β shown in graph
15
Caution/Nuisance
  • Some models of automata have observables
    generated by state occupancy
  • Other models have observables generated by state
    transitions
  • There are simple mechanisms for transforming one
    to the other....they are equivalent.

16
Automata and Languages
  • The set of all possible finite-length outputs of
    the previous example is a "language"
  • The language can be represented by a regular
    expression - (010110111)
  • "Classical relationship" between regular
    languages and nondeterministic finite automata -
    ie, given one, construct the other (Kleene's
    Theorem)
  • How about constructing an automaton from the
    input-output relationship?

17
Nerode Equivalence
  • Theorem: Every causal, time-invariant system has
    a state space description.
  • "Constructive" proof:
  • use the input-output description of a system
  • two finite-length input strings belong to the
    same equivalence class if all the corresponding
    outputs (beyond the inputs' lengths) are the same
  • i.e., if inputs w1w2 and w3w2 have outputs z1z2 and
    z3z2 for all w2, then w1 is equivalent to w3
  • the resulting equivalence classes are the states

18
Partial Differential Equations
19
Quantum Mechanical Systems
20
Other process formalisms
  • A Petri Net (PN) is given a state by marking its
    places.
  • Marking of a PN consists of assigning a
    nonnegative integer to each place.
  • Graphically, tokens are inserted in places of a
    PN
  • Input place - arrow goes from the place to the
    transition
  • Output place - arrow goes from the transition to
    the place

Concurrency Examples R. Apcar, E. Chiu, H.
Jerejian
21
Definitions
  • A transition may have one or more Input and
    Output places
  • A transition is enabled if there is at least one
    token in each of its input places.
  • An enabled transition may fire:
  • one token is removed from each input place and
    one token is inserted in each output place of the
    transition
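
The enabling and firing rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the place names `p1`, `p2`, `p3`, the dictionary-based marking, and the helper names `is_enabled` and `fire` are all assumptions made for the example.

```python
# Marking: place name -> nonnegative token count (hypothetical example net).
def is_enabled(marking, inputs):
    """A transition is enabled if every input place holds at least one token."""
    return all(marking.get(p, 0) >= 1 for p in inputs)


def fire(marking, inputs, outputs):
    """Return the marking after firing; raise if the transition is not enabled."""
    if not is_enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for p in inputs:
        new_marking[p] -= 1                          # remove one token per input place
    for p in outputs:
        new_marking[p] = new_marking.get(p, 0) + 1   # add one token per output place
    return new_marking


marking = {"p1": 1, "p2": 1, "p3": 0}
after = fire(marking, inputs=["p1", "p2"], outputs=["p3"])
```

Returning a new marking instead of mutating the old one keeps earlier states available, which is convenient when several alternative firings have to be explored.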

Concurrency Examples R. Apcar, E. Chiu, H.
Jerejian
22
An example
Concurrency Examples R. Apcar, E. Chiu, H.
Jerejian
23
Example continued
Concurrency Examples R. Apcar, E. Chiu, H.
Jerejian
24
A Process has...
  • Hidden states (discrete or continuous)
  • State transitions (nondeterministic,
    probabilistic)
  • Observables/events
  • Relationship between observables and states
  • An algorithm to score assignments of
    observations/events to state sequences
  • Examples
  • Nondeterministic automata
  • Hidden Markov Models
  • Petri Nets
  • Linear Systems
  • Nonlinear Systems
  • etc

25
Models for Organizational Processes (W. Chung,
J.-P. Schutt, R. Savell, G. Cybenko)
Observables of the Process
A
A
B
B
A asks B to join a project
B accepts
A adds B to a list of recipients: A → B, C, ...
Dynamics of the Process
ENRON, Ebay, etc
Static Analysis
Dynamic Analysis
26
Example of a Multistage Process Model in Computer
Security
Potential malicious activity
snort alerts
Potential normal activity
Samba
ftp, covert channel, etc
Tripwire
Cybenko
27
Real time Fish Tracking
  • Objective
  • Track several fish in the fish tank
  • Why
  • Very strong example of the power of PQS
  • Fish swim very quickly and erratically
  • Lots of missed observations
  • Lots of noise
  • Classical Kalman filters don't work (non-linear
    movement and acceleration)
  • Easier than getting permission to track people
    (we mistakenly thought)

Cybenko
28
Fish Tracking Details
  • 5 Gallon tank with 2 red Platys named Bubble and
    Squeak
  • Camera generates a stream of centroids
  • For each frame a series of (X,Y) pairs is
    generated.
  • Model describes the kinematics of a fish
  • The model evaluates if new (X,Y) pairs could
    belong to the same fish, based on measured
    position, momentum, and predicted next position.
    This way, multiple tracks are formed. One for
    each object.
  • Model was built in under 3 days!!!

Cybenko
29
Kinematic Tracking (2)
  • Model the motion of a feature moving at "human"
    speed
  • The model evaluates if new (X,Y) pairs could
    belong to the same hot spot, based on measured
    position, momentum, and predicted next position.
    This way, multiple tracks are formed. One for
    each object.
  • Sensors Infrared video camera provides
    datastream
  • Camera generates a stream of centroids
  • For each frame a series of (X,Y) pairs is
    generated.

30
An Example of a Process
a
b
A Process Model
1
2
Two states = {1, 2}. Two observables = {a, b}.
Legal transitions between states are depicted
by arrows. When occupying a state, the process
emits an observable. All states are
initial/start states and there are no terminal
states.
Some legal sequences of observables: abbab, bababbb, abbb
Some illegal sequences of observables: aa, baab
Further reading: Automata Theory, Regular Languages, etc.
31
A More Complex Process
a , c
b
a , c
Another Process Model
1
2
3
Three states = {1, 2, 3}. Three observables = {a, b, c}.
Some legal sequences of observables: abab, babaccab, ab
Some illegal sequences of observables: bb, baabb
Problem: Given a sequence of possible observations, is it
legal? What states?
Solution:
1. Read the first observable; mark the states that emit that
observable.
2. Read an observable, z.
3. New marked states = (states reachable from old marked
states) intersected with (states that could have emitted z).
4. If there are no new marked states, the sequence is
illegal; else go to 2.
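
The marking procedure above can be sketched as follows. The slide's diagram is not part of the transcript, so the transition and emission tables below are an assumption, inferred from the legal and illegal example sequences and from the later state-estimation slide; the function name `possible_states` is likewise made up for the example.

```python
# Assumed model: inferred from the example sequences, not read off the diagram.
TRANSITIONS = {1: {2}, 2: {1, 3}, 3: {2, 3}}          # state -> reachable next states
EMITS = {1: {"a", "c"}, 2: {"b"}, 3: {"a", "c"}}      # state -> observables it can emit


def possible_states(observations):
    """Return the states the process may occupy after the sequence,
    or an empty set if the sequence is illegal."""
    marked = set()
    for step, z in enumerate(observations):
        could_emit = {s for s, symbols in EMITS.items() if z in symbols}
        if step == 0:
            marked = could_emit                       # every state may be a start state
        else:
            reachable = set().union(*(TRANSITIONS[s] for s in marked))
            marked = reachable & could_emit           # step 3: intersect
        if not marked:
            return set()                              # step 4: illegal sequence
    return marked
```

Under these assumed tables, the legal examples end with a non-empty marked set and the illegal ones (`bb`, `baabb`) end with an empty one.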
32
Extensions Hidden Markov Model (HMM)
p(a|1) = 0.8, p(c|1) = 0.2, p(b|2) = 1
p(a|3) = 0.8, p(c|3) = 0.2
1
0.8
0.5
Add probabilities
3
1
2
0.2
0.5
Hidden Markov Models consist of two
ingredients:
- the dynamics: state transition
probabilities in a Markov chain
- the emissions: p(observation | state)
Given a sequence of observations of length t, what are
the possible states at time t? Unlike the case
for a nondeterministic automaton, all we can say
in general for an HMM is what the probability
distribution on states is.
33
Extensions Hidden Markov Model (HMM)
p(a|1) = 0.8, p(c|1) = 0.2, p(b|2) = 1
p(a|3) = 0.8, p(c|3) = 0.2
0.8
1
0.5
1
2
3
0.5
0.2
The probability distribution at time t+1 is obtained
by combining:
- propagation of the distribution
from time t using only the dynamics
- factoring in the observation observed at time t+1
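
A minimal sketch of this two-part update is shown below. To keep the numbers unambiguous it uses the two-state parameters from the "Simplest Hidden Markov Model" slide later in this transcript, not the three-state diagram here. The function names and the convention that the initial probabilities describe the state at the time of the first observation are assumptions of the sketch.

```python
# Two-state HMM; values taken from the "Simplest Hidden Markov Model" slide.
A = {1: {1: 0.7, 2: 0.3}, 2: {1: 0.1, 2: 0.9}}           # A[i][j] = P(next = j | now = i)
B = {1: {"u": 0.9, "v": 0.1}, 2: {"u": 0.1, "v": 0.9}}   # B[i][o] = P(observe o | state i)
PRIOR = {1: 0.5, 2: 0.5}


def filter_step(dist, obs):
    """One update: propagate with the dynamics, then factor in the observation."""
    predicted = {j: sum(dist[i] * A[i][j] for i in dist) for j in A}
    weighted = {j: predicted[j] * B[j][obs] for j in predicted}
    total = sum(weighted.values())
    return {j: w / total for j, w in weighted.items()}


def filter_sequence(observations):
    """Probability distribution on states after the whole observation sequence."""
    weighted = {i: PRIOR[i] * B[i][observations[0]] for i in PRIOR}
    total = sum(weighted.values())
    dist = {i: w / total for i, w in weighted.items()}
    for obs in observations[1:]:
        dist = filter_step(dist, obs)
    return dist
```

For example, `filter_sequence("u")` puts probability 0.9 on state 1, and a following `v` shifts most of the probability to state 2.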
34
Two Simple Processes
a
b
Model Instance A
A1
A2
a
b
Model Instance B
B1
B2
aabb is a legal observation sequence A1 B1 A2 A2
, A1 B1 A2 B2 , B1 A1 B2 B2 , ... are all
legal state sequences A1 A2 A2 , A1 A2
, A1 B1
B1 B2 B1 B2 B2
We can reduce this to a single process....
a track
a hypothesis
35
Multiple Process Representation
A1 B1
a
b
A1 B1
0 1 1 1
Model Instance A
A1
A2
M
a
b
Model Instance A
A1
A2
0 0 0 0
0 1 1 1
M x M
0 1 1 1
0 1 1 1
a
b
Model Instance B
B1
B2
If the observation sequence is aaaaaa and
multiple copies of the model are allowed, then we
get a product model of size 2^n.
36
A Simple Example of Process Detection
  • a,b,c,d are events that can be observed
  • states A, B, C, D, E, F are hidden
  • observe a sequence of events
  • Sequence Hypotheses
  • ab NW RF
  • abab (NW NW)(RFNW)...
  • ababc (NW RF)(NW NW)
  • ababcc NW NW
  • Which process or combination of
  • processes explains the observed events?

a,b,c,d are events that can be observed
a
b
b , c
c , d
A
B
C
D
NETWORK WORM MODEL (NW) (a,b,c,d ICMP traffic
levels)
E, F = 0
repeat
  read event e
  if e = a then E
  if E and e = b then F
until F
a
b
E
F
ROUTER FAILURE MODEL (RF)
Two models: states have different semantics,
sets of observables intersect. What is the
diagnosis?
Cybenko
37
Key Questions
  • How is a process model built?
  • from first principles
  • from expert insights
  • from data (lots)
  • Given an event sequence, is it feasible or what
    is its probability?
  • Given an event sequence, estimate the current
    state
  • Given an event sequence, estimate the state
    sequence
  • How good are those estimates (ie variance)

38
Homework Problems
  • What are the states, dynamics and observables of
    the following processes
  • intercontinental ballistic missile
  • soccer, American football, baseball games
  • Avian bird flu epidemic
  • terrorist cell
  • blogosphere
  • US/global economy
  • poker
  • romance

46
Overview
  • Lecture 2 Detecting processes
  • What does detection of processes mean?
  • Automata
  • Hidden Markov Models
  • Kalman filtering
  • Particle filters

47
Process Detection Problems
  • Given a sequence of observations...
  • What is the current state of the process?
  • What is the probability distribution on the
    states?
  • What are the most likely state sequences?
  • What is the uncertainty/error of the estimates?

48
Graphical Depiction of Automata
1
1
1
0
Start State
v
u
u
d
c
b
a
v
u
v
u,v
Q = States = {a, b, c, d}, X = {u, v}, Y = {0, 1}
δ and β shown in graph
49
Input-Output Description
1
1
1
0
Start State
v
u
u
d
c
b
a
v
u
v
u,v
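Input      Output
uuuu       01010
uuvu       01001
vuuuu      001010
vvuuuu     0001010
uvvuuuu    01101010
.....
Equivalence classes of input strings (one per state):
a: φ (the empty string), v, vv, uu, uvv, ...
b: u, vu, vuuu, ...
c: uv, vuv, vuuuv, ...
d: uvu, vuvu, vvuvu, ...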
50
Estimating states in an automaton
a
b
a , c
1
2
3
a
b
a , c
Observe a
1
2
3
a
b
a , c
Observe ab
Sequences 12, 32
1
2
3
a
b
a , c
Observe ac
1
2
3
Sequences 33
a
b
a , c
Observe acb
1
2
3
Sequences 332
51
Commentary
  • Trivial algorithm....
  • Interesting question: What is the worst-case
    growth of state sequences? Tomorrow.
  • No probabilities, only possibilities.
  • What if we add probabilities?

52
Simplest Hidden Markov Model
b1(u) = 0.9, b1(v) = 0.1
a11 = 0.7
1
p(1) = 0.5, p(2) = 0.5 are initial probabilities
a21 = 0.1
a12 = 0.3
2
a22 = 0.9
b2(u) = 0.1, b2(v) = 0.9
53
Applications of HMM's
  • Speech recognition
  • Gene sequencing
  • Motion modeling and detection
  • Pattern recognition (OCR)
  • Darpa Grand Challenge (autonomic systems)
  • etc
  • etc
  • etc

54
Estimating States
b1(u) = 0.9, b1(v) = 0.1
a11 = 0.7
1
p(1) = 0.5, p(2) = 0.5 are initial probabilities
a21 = 0.1
a12 = 0.3
2
a22 = 0.9
b2(u) = 0.1, b2(v) = 0.9
55
Estimating Another State
b1(u) = 0.9, b1(v) = 0.1
a11 = 0.7
1
p(1) = 0.5, p(2) = 0.5 are initial probabilities
a21 = 0.1
a12 = 0.3
2
a22 = 0.9
b2(u) = 0.1, b2(v) = 0.9
56
Sequences of Observations
Time:         1       2       3       4       5
States:       1, 2
Observations: O1 = u, O2 = v, O3 = u, O4 = v, O5 = v
Problems: Given a sequence of observations
O1 O2 O3 ...
1. What is the most likely state at time t?
2. What is the most likely state sequence over all time?
3. What is the probability of the observation sequence?
57
Best state vs best sequence
b1(u) = 0.9, b1(v) = 0.1
a11 = 0.7
1
p(1) = 0.5, p(2) = 0.5 are initial probabilities
a21 = 0
a12 = 0.3
2
a22 = 1
b2(u) = 0, b2(v) = 1
Observe v - most likely state is 2.
Observe u next - must be in state 1, but no transition from
2 to 1 is possible.
The sequence vu could only have been produced by starting
and staying in state 1.
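
The gap between the best single state and the best sequence can be checked numerically. The sketch below uses the parameter values from this slide and a standard max-product (Viterbi-style) recursion. The function name `viterbi` and the dictionary layout are assumptions made for the example, not the lecture's code.

```python
# Parameters from this slide: state 2 is absorbing and never emits u.
A = {1: {1: 0.7, 2: 0.3}, 2: {1: 0.0, 2: 1.0}}           # A[i][j] = P(next = j | now = i)
B = {1: {"u": 0.9, "v": 0.1}, 2: {"u": 0.0, "v": 1.0}}   # B[i][o] = P(observe o | state i)
PRIOR = {1: 0.5, 2: 0.5}


def viterbi(observations):
    """Most likely state sequence for the observations (max-product recursion)."""
    score = {s: PRIOR[s] * B[s][observations[0]] for s in PRIOR}
    back = []
    for obs in observations[1:]:
        pointers, new_score = {}, {}
        for j in A:
            best_prev = max(score, key=lambda i: score[i] * A[i][j])
            pointers[j] = best_prev
            new_score[j] = score[best_prev] * A[best_prev][j] * B[j][obs]
        back.append(pointers)
        score = new_score
    state = max(score, key=score.get)       # best final state
    path = [state]
    for pointers in reversed(back):         # follow back-pointers to the start
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With these numbers, a single `v` is best explained by state 2, while the sequence `vu` can only be explained by the state sequence 1, 1, which is the point the slide makes.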
58
Probability of the Observations
Time:         1       2       3       4       5
States:       1, 2
Observations: O1 = u, O2 = v, O3 = u, O4 = v, O5 = v
59
Optimal Sequences
Time:         1       2       3       4       5
States:       1, 2
Observations: O1 = u, O2 = v, O3 = u, O4 = v, O5 = v
60
Viterbi's Algorithm
  • These computations were discovered by A. Viterbi,
    a founder of Qualcomm.
  • The algorithms are used in all modern cell phones
    and telecom devices in general.

Noisy Channel
Decode
Source sequence
Receive
11221212222212
uvvuvuvvuvuvvv
11221212122212
61
Other issues for HMM
  • Learning an HMM, i.e., what are the various
    probabilities?
  • Baum-Welch algorithm
  • variational algorithms
  • Finite, discrete state spaces

62
How about continuous state spaces?
  • Major challenge
  • in the finite, discrete case (HMM), we can
    represent and store the whole probability
    distribution as an n-vector
  • what continuous state probability distributions
    have simple representations?
  • Gaussians - mean and variance specify them
  • what if the distribution is more general than a
    Gaussian?

63
Madory's Goats
  • Goat herder
  • Herd state is the number of infant females, adult
    females, infant males and adult males
  • Dynamics are generation to generation how many
    infant females and males are born, how many
    infants of each gender become adults and how many
    adults survive
  • Observables are goat milk revenues and goat baby
    inoculation costs - these are noisy
  • Problem estimate total number of goats and
    number of adult females
  • (Example and code due to Doug Madory)

64
(No Transcript)
65
Quantification of the State
66
Quantification of the Dynamics
67
Quantification of Observations
68
(No Transcript)
69
Basic Concept in Kalman Filtering
  • Use the fact that the sum of variables with
    Gaussian distributions is also Gaussian
  • Gaussian is characterized by mean and variance
  • Use dynamics to predict the next state
  • Use measurement (observation) to correct that
    prediction
  • Update the error covariance (ie confidence in the
    estimate)

70
(No Transcript)
71
(No Transcript)
72
Kalman Equations and Geometry
73
Extensions
  • To nonlinear systems (linearize locally)
  • Learn the system dynamics
  • Use the estimates to control the state (feedback)
  • To non-Gaussian noise problems
  • particle filter methods

74
Particle Filters
  • Represent a probability distribution using a
    discrete distribution of particles
  • Sample the particles, propagate using dynamics
    and correct using observations
  • This creates a new distribution for the next time
    step

75
Deep Connections to Information Theory
  • This is all part of a much larger problem
    description - cybernetics ala N. Wiener

Noisy Channel
Decode
Environment
Receiver
Estimate of Environment
Learning
Models of Environment
Actions
76
Summary of Lecture 2
Process class            Distribution        Algorithm
Automaton                None                Simple marking
HMM                      Discrete, finite    Viterbi
Linear, continuous       Gaussian            Kalman
Continuous, nonlinear    Arbitrary           Particle filters
What are the observables? What are the states?
What are the dynamics?
77
Overview of Lecture 3
  • Detecting multiple processes
  • Instead of one process, we now have some unknown
    number of them
  • Multiple hypothesis tracking (MHT) framework
  • The basic algorithms
  • Complexity theory
  • Process Query Systems
  • Applications

78
Multiple Hidden Process Models
Cybenko
79
Why be interested in this....
  • Sensor networks
  • Airborne plume detection
  • Cyber security
  • Autonomic server pool management
  • Dynamics of social networks
  • Genomics and biological pathways
  • Human situation awareness
  • Possible applications.

Cybenko
80
Basic Concepts of Process Query Systems (PQS)
An Operational Network
Indicators and Warnings
6
129.170.46.3 is at high risk 129.170.46.33 is a
stepping stone ......
that are used to defend the network
that detect complex attacks and anticipate the
next steps
5
consists of
1
Sample Console
Hypotheses
Multiple Processes
Track 1
Track 1
l1 router failure
Track 2
Track 2
Track 3
l2 worm
Track 3
l3 scan
Hypothesis 1
Hypothesis 2
2
that produce
that are seen as
4
that PQS resolves into
Unlabelled Sensor Reports
Events
.
.
Track Scores
Time
Time
3
PQS
Real World
81
Discrete Source Separation Problem (viz. Blind
Source Separation, Cocktail Party Problem)
Process/Model Example
3 states, transition probabilities, n observable
events a, b, c, d, e, ...; Pr(state | observable event)
given/known
Observed event sequence: ...abcbbbaaaababbabcccbdddbebdbabcbabe...
A Hypothesis
Catalog of Processes/Models
A Track
Which combination of which process models best
accounts for the observations? This is what we
want to compute. Events not associated with a
known process are anomalies.
Cybenko
82
Multiple Hypothesis Approach to the "Discrete
Source Separation Problem"
Obs1 Obs2 . . .
Observables at time t1
83
Multiple Hypothesis Approach to the "Discrete
Source Separation Problem"
Obs1
Obs2
Hypothesis 1a
Obs2
Obs1
Hypothesis 1b
Candidates at time t1
84
Terminology
  • Tracks are associations of observations to
    individual processes.
  • Hypotheses are consistent tracks that explain all
    the observables.
  • Hypothesis extension is the conjectural
    assignment of new observations to existing
    hypotheses.
  • Track initiation is the instantiation of a new
    process in a hypothesis' extension.
  • Handling missed detections means that an
    intermediate observation may have been dropped.
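
Hypothesis extension and track initiation can be sketched as below. This is an illustration under simplifying assumptions: a track is just a list of observations, a hypothesis is a list of tracks, and the model consistency checks, scoring, and missed-detection handling that a real system needs are left out. The function name `extend_hypothesis` is made up for the example.

```python
def extend_hypothesis(hypothesis, observation):
    """Return all one-step extensions of a hypothesis.

    The new observation either joins one existing track or initiates a new
    track. Consistency checks against process models are omitted here.
    """
    extensions = []
    for k in range(len(hypothesis)):
        extended = [list(track) for track in hypothesis]
        extended[k].append(observation)               # assign to existing track k
        extensions.append(extended)
    # Track initiation: the observation starts a new process instance.
    extensions.append([list(track) for track in hypothesis] + [[observation]])
    return extensions


hypotheses = [[]]                      # one empty hypothesis before any observation
for obs in ["Obs1", "Obs2"]:
    hypotheses = [ext for h in hypotheses for ext in extend_hypothesis(h, obs)]
```

After two observations this yields two hypotheses: both observations on one track, or one track each. Without pruning, the number of hypotheses grows quickly with the length of the observation stream.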

Cybenko
85
A Simple Example of Process Detection
  • a,b,c,d are events that can be observed
  • states A, B, C, D, E, F are hidden
  • observe a sequence of events
  • Sequence Hypotheses
  • ab NW RF
  • abab (NW NW)(RFNW)...
  • ababc (NW RF)(NW NW)
  • ababcc NW NW
  • Which process or combination of
  • processes explains the observed events?

a,b,c,d are events that can be observed
a
b
b , c
c , d
A
B
C
D
NETWORK WORM MODEL (NW) (a,b,c,d ICMP traffic
levels)
E, F = 0
repeat
  read event e
  if e = a then E
  if E and e = b then F
until F
a
b
E
F
ROUTER FAILURE MODEL (RF)
Two models: states have different semantics,
sets of observables intersect. What is the
diagnosis?
Cybenko
86
Add Rules for Missed Detections and Disambiguation
A, B, C, D = 0
repeat
  read event e
  if e = a then A
  if A and e = b then B
  if A and e = c then C, D
  if A and e = d then D
  if B and (e = b or e = c) then C
  if C then (E = 0, F = 0)
  if C and (e = c or e = d) then D
  if D then (E = 0, F = 0)
until D
a
b
b , c
c , d
A
B
C
D
WORM MODEL (a,b,c,d ICMP traffic levels)
Blue statements handle missed detections Red
statements handle consistency
This clearly does not scale and does not lead to
manageable sets/systems of rules.
Cybenko
87
Approaches to Detecting Processes
  • Aristotelian - Traditional information retrieval
    is based on specification of a query in terms of
    Boolean expressions based on record fields, e.g.,
    SQL (name = smith AND age > 20 AND age < 40),
    rule-based logics, decision trees, etc.
  • Newtonian - Next-generation process detection
    requires retrieval based on specification of a
    set of discrete, dynamic processes, e.g.,
    descriptions of a Hidden Markov Model, Hidden
    Petri Net, weak models, FSMs, attack trees, etc.
  • Main Concept Move from an Aristotelian to a
    Newtonian Paradigm.

Cybenko
88
Process Query Systems (PQS)
  • Process Query Systems solve the Discrete Source
    Separation Problem in a generic way
  • inputs
  • a sequence of unlabelled observations (stream,
    logfiles, etc)
  • a collection of process models
  • outputs
  • estimates of which processes produced those
    observations
  • estimates of which states those processes are in
  • Basic theory and technology has been developed by
    the PQS team at Dartmouth
  • Now being applied to a variety of applications

Cybenko
89
Algorithms/Operations of PQS
Evaluate Solutions and Process Outputs
5
3
Update Tracks Within Hypotheses (Viterbi / Kalman
/ NDFA,etc) and Create New Hypotheses
Recursive in Time
Cybenko
90
The COBOL and pre-PQS Analogy
application logic statement 1 application
logic statement 2 file management statement
1 record management statement 1 file management
statement 2 record management statement
2 application logic statement 3 record
management statement 3 file management statement
3 application logic statement 4
User responsibility
System responsibility
application logic statement 1 application
logic statement 2 SQL statement 1 application
logic statement 3 SQL statement 2 application
logic statement 4
file management operation 1 record management
operation 1 file management operation 2 record
management operation 2 record management
operation 3 file management operation 3

Application logic
Database management system
Interwoven logic
Post-SQL Programs
Pre-SQL Programs
model logic statement 1 model logic statement
2 sensor access statement 1 state estimate
statement 1 sensor access statement 2 state
estimate statement 2 model logic statement
3 sensor access statement 3 state estimate
statement 3 model logic statement 4
User responsibility
System responsibility
model description statement 1 model
description statement 2 model description
statement 3 model description statement 4
sensor access statement 1 state estimate
statement 1 sensor access statement 2 state
estimate statement 2 sensor access statement
3 state estimate statement 3

Model description
Process query system
Interwoven logic
Current Process Detection Programs
PQS-based Programs
91
Network Security (V. Berk, I. De Souza, A.
Barsamian, A. Giani, M. Bates, D. Madory, G.
Bakos, et al.)
  • Objective
  • Detect, disambiguate, and predict the course of
    concerted network attacks in an enterprise class
    network.
  • Why
  • Problem domain demands the power of PQS
  • Hundreds of processes occurring at once
  • Lots of missed observations and noise
  • All commercial technology focuses on collection
    and presentation of data
  • Existing correlation efforts very weak at best

Cybenko
92
SENSORS INTEGRATED
SENSOR DESCRIPTION SCOPE
Global
CovChan
Timing Covert Channel Detection
Network
IPtables
Linux Netfilter firewall, log based
Weblog
IIS, Apache, SSL error logs,
Host
US-agent
Userspace host monitoring agent
Cybenko
93
Example of a Multistage Process Model
Potential malicious activity
snort alerts
Potential normal activity
Samba
ftp, covert channel, etc
Tripwire
Cybenko
94
PQS-Net supply chain
  • Tier 1 Models
  • Focus on individual host status
  • Report on status changes
  • Tier 2 Models
  • Focus on correlating host activity
  • Report chains of events

Tier 1 Output Mon Feb 21 200617 2005 000000
131.58.63.160 (hostile) recon on 100.10.20.4
SNORT 469 proto 1 Mon Feb 21 203024 2005
000000 138.158.170.45 (hostile) attacked
100.10.20.4 ERRORLOG 400 proto 6 dport 443
Tier 2 Output
Hypothesis 1 Score 0.8 Hypothesis 2 Score 0.2
A scans B A scans B
B scans E
B attacks E
Tier 1 Tracker
Tier 2 Tracker
Attack sequences and scores
Attack steps
sensor data
sensors
Analysts front-end
Cybenko
95
Example Scenario

Internet
A
C
B
E
D
Tier1 Alerts Indicators
A scans B Snort 02/21-200617.904500 14691 ICMP PING NMAP Classification Attempted Information Leak Priority 2 ICMP 131.58.63.160 -> 100.10.20.4
C attacks B (success) SSL error log (host 100.10.20.4) Mon Feb 21 203024 2005 error mod_ssl SSL handshake failed (server www.osis.gov443, client 138.185.170.45) (OpenSSL library error follows) Mon Feb 21 203024 2005 error OpenSSL error1406908Flib(20)func(105)reason(143)
Cybenko
96
Example Contd
B
E
D
Tier1 Alerts Indicators
B scans D 02/21-203117.528602 118072 WEB-MISC Chunked-Encoding transfer attempt Classification Web Application Attack Priority 1 TCP 100.10.20.434074 -> 100.10.20.16980
B attacks D (fails) 100.20.1.169 - - 21/Feb/2005083122 -0500 "GET /default.idq?AAAAAAAAAAA..AAAAAAA HTTP/1.1" 404 1287 "-" "-"
B scans E 02/21-203201.622465 118072 WEB-MISC Chunked-Encoding transfer attempt Classification Web Application Attack Priority 1 TCP 100.10.20.434076 -> 100.10.20.17080
B attacks E (succeeds) 100.20.1.170 - - 21/Feb/2005083206 -0500 "GET /default.idq?AAAAAAAAAAA..AAAAAAA HTTP/1.1" 200 1287 "-" "-"
Cybenko
97
Results
Dataset                                        3s8       3s26      3s28      3s29
Alerts                                         22930     18391     12522     39270
Lines in trunk_alert                           4830      5959      1159      8168
Lines in snort files generated from tcpdump    11751     7284      7006      19866
Lines in weblogs (apache, IIS)                 6349      5148      4357      11236
Number of tracks produced                      100       75        51        107
Attack Tracks not in ground truth              1         0         0         0
Attackers identified                           3 of 3    4 of 4    0 of 2    3 of 5
Decoys found                                   5 of 5    2 of 2    2 of 2    6 of 6
Victims identified                             2 of 2    2 of 2    1 of 2    10 of 11
Stepping stones identified                     1 of 1    1 of 1    1 of 2    2 of 3
98
Autonomic Server Monitoring (C. Roblee, V.
Berk) Funded by DHS
Cybenko
99
Autonomic Server Monitoring
  • Objective
  • Detect and predict deteriorating service
    situations
  • Why
  • Another strong example of the power of PQS
  • Software and hardware are buggy and vulnerable
  • Hot market, large profits for The ONE
    application
  • Very ambiguous observations
  • Sys-admins also want vacation

Cybenko
100
The Environment
  • Hundreds of servers and services
  • Various non-intrusive sensors check for
  • CPU load
  • Memory footprint
  • Process table (forking behavior)
  • Disk I/O
  • Network I/O
  • Service query response times
  • Suspicious network activities (e.g., Snort)
  • Models describe the kinematics of failures and
    attacks
  • The model evaluates load balancing problems,
    memory leaks, suspicious forking behavior (like
    /bin/sh), service hiccups correlated with network
    attacks

Cybenko
101
Server Compromise Model Generic Attack Scenario
t0 t1 t2 t3
t4
Observations
Response
Cybenko
102
Experimental Results
No Tracking
Tracking
Successful Requests
System Memory Consumed
210,000 requests serviced
380,000 requests serviced
Cybenko
103
Chemical Plume Process Detection. Funded by DHS
  • Glenn Nofsinger

104
The Forward Problem
Concentration in a 2D region as a function of
time
Fick's Law (diffusion)
Advection (wind)
Concentration equation composed of diffusion and
advection
  • Forward model result
  • arbitrary initial sources
  • pseudo-random wind
  • includes diffusion and wind

105
Current technology on DC Mall. Future sensors
will be smaller and greater in number, with a
need for measurement correlation.
106
Multiple Source Case With Terrain Connectivity
determined by wind and geography
Source 1
Source 2
Connectivity
Wind
107
Multiple Source Case With Terrain Connectivity
determined by wind and geography
Source 1
Source 2
Connectivity
Wind
108
Inverse Source Likelihood: Estimating the
probability that a sensor observation is
generated by a source at a given location. Based
on wind direction history and diffusion
properties of agent.
wind
sensors
S
S
sources
109
Correlation Between Observations at Different
Locations
Picking any two sensors we evaluate a probability
that the observation at that sensor is connected
to observations at different sensors in the
region. This is a function of wind history,
distance, and diffusion properties.
wind
110
Source Estimation Compared to True Source Location
Estimated Source based on inverse correlation of
plume observations and tracks
Forward Simulation
111
Social Network Analysis: Comparison of Static vs
Dynamic (W. Chung, R. Savell, J.-P. Schuett)
Temporal sequence of transactions
Analyze projected, non-temporal data
Analysis of Static Artifacts
Projection removes temporal relationships
Time
Temporal sequence of transactions
Extraction of Dynamic Processes
Analysis of temporal aspects of transactions
Time
112
Process Primitives
Decay kernel correlates potentially related
emails - eg. links Functional roles based on
conversation segments shown below
A. Initiator B. Broker C. Bridge D.
Triad E. Terminator
113
Combining Primitives into Processes
P(t'-t) > f, P(t''-t') > f, P(t'''-t'') < f
X
Probabilities of temporal relationships are used
to grow tracks
114
Methodology Details
1. Crude Naïve Bayes Text Classification w/
Temporal Correlations to isolate coarse thread.
2. Local structure via Process Primitives on the
Dynamic Social Network.
115
Theory
  • PQS offer a principled approach that enables
  • understanding how distinguishable models (attack
    and failure) are
  • developing a notion of processes that are
    trackable, given models and sensing
    infrastructure (ie a sampling theory)

116
Hypothesis Growth
A hypothesis is a consistent assignment of
events to processes and/or states (i.e., each event
assigned to only one process instance). Given a
set of hypotheses for an event stream of length
k-1, update the hypotheses to length k to explain
the new event. NP-Complete in general. Need to
prune the pool of hypotheses, keeping the most
suitable.
time
Individual path is a track ie one process
instance Consistent tracks form a hypothesis
117
Models and Hypothesis Growth
Weak model: FSM with emission vectors
Emission for state i: 0/1 vector of sensor
reports, e.g. obs(i) = (0, 1, 1, 0, 0, 1, 1)
Observation vector at time t collected by
sensors, e.g. sensors(t) = (0, 1, 1, 1, 1, 1, 0)
Possible states at time t are determined by
P = { i | Hamming_distance(obs(i), sensors(t)) < HD }
R = { i | j possible at time t - 1 and i is reachable from j }
P ∩ R is the set of possible states at time t.
Number of hypotheses at time t recursively computed as above.
Theorem: For a fixed value of HD, the worst-case
number of hypotheses at time t is either
polynomial or exponential in t. (Crespi,
Cybenko, Jiang 2005)
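
The possible-state computation for a weak model can be sketched as follows. The two example vectors come from this slide; the other two emission vectors, the reachability table, the threshold value, and the function names are hypothetical, chosen only to make the example runnable.

```python
def hamming(u, v):
    """Number of positions in which two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(u, v))


def possible_states(previous, sensors_t, obs, reachable, hd):
    """States whose emission vector is within the noise threshold of the sensor
    vector (set P) and that are reachable from a previously possible state (set R)."""
    p = {i for i in obs if hamming(obs[i], sensors_t) < hd}
    r = {i for j in previous for i in reachable[j]}
    return p & r


# obs[0] and sensors_t are the slide's example vectors; the rest is made up.
obs = {
    0: (0, 1, 1, 0, 0, 1, 1),
    1: (1, 1, 1, 1, 1, 1, 0),
    2: (0, 0, 0, 0, 0, 0, 0),
}
sensors_t = (0, 1, 1, 1, 1, 1, 0)
reachable = {0: {0, 1}, 1: {2}, 2: {0}}

now_possible = possible_states({0}, sensors_t, obs, reachable, hd=4)
```

The slide's two example vectors differ in three positions, so state 0 stays possible only when the threshold `hd` is larger than 3. A looser threshold tolerates more sensor noise but admits more states, and therefore more hypotheses.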
118
Longer tracking time
More noise (worse model)
119
Longer tracking time
More noise (worse model)
120
Basic Idea Behind the Proof
N states
time t
time t1
time t2
time k
Process dynamics (i.e., what is reachable from each
state in a time step) + observations + noise
threshold determine a trellis. If there are
two distinct paths from one node to itself over
some period of time, the number of distinct paths
grows exponentially by repeating the construct.
121
Basic Idea Behind the Proof
N states
time t
time t1
time t2
time k
If there are never two distinct paths from any
node to itself over any period of observation,
there is a simple injective mapping (i.e., unique
labeling) of the paths into {0, 1, ..., k} x {0,
1, ..., k} x ... x {0, 1, ..., k}
(2N times). So the number of paths is at most
(k+1)^(2N). The label for each path is the time it
first occupies a state and the time it last
occupies that state.
122
Relationship to Joint Spectral Radius
123
New Ideas for Large-Scale Hypothesis Management
  • Data structures for maintaining one copy of many
    hypotheses that are variants of one another
  • Viewing the set of hypotheses as the solution
    (instead of the highest ranked hypothesis eg)
  • propagating the set can be done in linear space,
    constant time
  • some properties of the set of hypotheses can be
    computed in constant time, others in linear time,
    others seem to require exponentially much time
    and/or space, etc.
  • Development of a nonparametric approach to
    tracking and Situational Awareness, not unlike
    nonparametric statistical techniques (order
    statistics, etc)
  • Reduce dependencies on probabilistic parameters
    and model building

124
Distinguishability of models (Yong Sheng)
  • Given two models, how distinguishable are they?
  • Example Model of router failure vs worm attack?
  • Do we need to build more refined models or do we
    need to add additional sensors/data sources?

125
Different degrees of distinguishability
between models given sensing capabilities (e.g.
DDOS vs router failure)
Red: Prob of deciding model 2 given model 1
Blue: Prob of deciding model 1 given model 2
Entropies of the two ergodic models are different.
Decision rule is based on ML as determined by the
Viterbi algorithm.
The Shannon-McMillan-Breiman
Ergodic Theorem states that most observation
sequences are typical and have probability
related to the entropy.
126
Different degrees of distinguishability
between models given sensing capabilities (e.g.
DDOS vs router failure)
However, nonmonotonic behaviors are possible (in
general) and without convergence to zero (if the
entropies are the same)
127
Different degrees of distinguishability
between models given sensing capabilities (e.g.
DDOS vs router failure)
However, nonmonotonic behaviors are possible (in
general) and without convergence to zero (if the
entropies are the same)
128
Where do models come from?
  • In practice, we build models of processes by
  • First principles ie, symmetry, physical laws,
    etc.
  • Expert models/rules/experience ie, chess
    playing computers, military tactics, etc
  • Empirical analysis (from real or simulated data)
    ie. backgammon, stock market models, etc.
  • Process Query Markup Language developed and
    almost implemented allows rapid insertion of
    new attack models into PQS

129
PQS INPUTS PROCESS MODEL SEMANTICS AND SENSOR
DATA REQUIREMENTS
Failed
Failed
A
A
0.03
0.05
alert icmp $EXTERNAL_NET any -> $HOME_NET any
(msg:"ICMP Destination Unreachable (Host
Unreachable)"; itype:3; icode:1; sid:399;
classtype:misc-activity; rev:4;)
B
B
0.2
Marginal
Learn
Represent
Marginal
C
C
Normal
Normal
0.9
Rules signatures, etc
Reachability (weak) Models
Probabilistic Models (HMM, Bayes Nets, Fuzzy
models, etc)
Compile
Compile
Compile
if (src_ip_new.equals(src_ip_track))
  if (IPv4_in_CIDR_ints(208, 253, 154, 0, 24, src_ip_new) == true) // local?
    new_likelihood = new Likelihood((0.90f + likelihood.getProbability()) / 2.0f);
  else // Else don't care
    new_likelihood = new Likelihood(0.0);
Code
Execute
130
More details....
  • gvc_at_dartmouth.edu
  • See www.pqsnet.net