Title: User Profiling for Intrusion Detection
2Authenticating Users by Profiling Behavior
- Tom Goldring
- Defensive Computing Research Office
- National Security Agency
3Statement of the Problem
- Can we authenticate a user's login based strictly on behavior?
- Do different sessions belonging to a single user look similar?
- Do sessions belonging to different users look different?
- What do "similar" and "different" mean?
4Purpose
- We expect that the behavior of any given user, if defined appropriately, would be very hard to impersonate
- Unfortunately, behavior is turning out to be very hard to define
5What about Biometrics?
- Biometrics such as fingerprints, voice recognition, and iris scanning constitute one line of defense, although they require additional per-host hardware
- However, biometrics do not necessarily cover all scenarios
  - The unattended terminal
  - Insider misuse
- R23 is currently sponsoring research in mouse movement monitoring, and we hope to include keystrokes at some point as well
6Speaking of Insider Misuse
- For example, spying, embezzlement, disgruntled employees, employees with additional privilege
- This is by far the most difficult problem, and at present we don't know if it's even tractable
- Lack of data
  - Obtaining real data is extremely difficult, if not impossible
  - At present we don't even have simulated data; however, efforts are under way to remedy this shortcoming, e.g. ARDA-sponsored research
7The Value of Studying Behavior
- Whether or not the authentication problem can be
solved in other ways, it seems clear that for the
Insider Misuse problem, some way to automatically
process user behavior is absolutely necessary.
8Authentication is a more Tractable Problem
- Getting real data (of the legitimate variety, at least) is easy
- We can simulate illegitimate logins by taking a session and attributing it to an incorrect source
- This is open to criticism, but at least it's a start
- As stated previously, research on simulating misuse is starting to get under way
- Many examples of Insider Misuse seem to look normal
- In its simplest form, we authenticate an entire session
- Therefore, authentication is a reasonable place to start
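The relabeling idea on this slide can be sketched as follows. This is only an illustration under stated assumptions, not the presenters' code: the `Session` record, the `simulate_impostors` helper, and the choice to pick the incorrect login name at random are all hypothetical.

```python
import random
from typing import NamedTuple

class Session(NamedTuple):
    true_user: str       # the person who actually generated the session
    putative_user: str   # the login name attached to the session
    events: tuple        # opaque session contents (hypothetical placeholder)

def simulate_impostors(sessions, users, seed=0):
    """Return copies of legitimate sessions attributed to an incorrect user."""
    rng = random.Random(seed)  # fixed seed so the relabeling is repeatable
    impostors = []
    for s in sessions:
        wrong = rng.choice([u for u in users if u != s.true_user])
        impostors.append(s._replace(putative_user=wrong))
    return impostors

legit = [Session("alice", "alice", ("win1",)), Session("bob", "bob", ("win2",))]
fake = simulate_impostors(legit, ["alice", "bob", "carol"])
```

Each simulated session keeps its original contents and only the login name changes, which is exactly why the approach is open to the criticism noted above: the "intruder" is a legitimate user doing ordinary work.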
9Program Profiling is an Even More Tractable Problem
- Application programs are designed to perform a limited range of specific tasks
- By comparison, human behavior is the worst of both worlds
  - It has an extremely wide range of "normal"
  - It can be extremely unpredictable
  - Doing abnormal things sometimes is perfectly normal
  - People sometimes change roles
- In a point-and-click environment, the path of least resistance causes people to look more like each other
10Possible Data Sources for User Profiling
- Ideally, we would like to recover session data that
  - Contains everything (relevant) that the user does
  - Encapsulates the user's style
  - Can be read and understood
11Command line activity
- Up to now, Unix command line data has been the industry standard
- For most tasks, there are many ways to do them in Unix
- This looks like a plus, but actually it's the opposite
- It's human readable
- But it misses windows and scripts
- In window-based OSs it's becoming (if not already) a thing of the past
- Almost never appears in data from our target OS (Windows NT)
- Therefore we no longer consider it a viable data source
12System calls
- Appropriate for Program Profiling, but less so for User Profiling
- Very fine granularity
- But the ratio (human behavior) / (OS behavior) is very low
- Next to impossible to guess what the user is doing
13Process table
- This is the mechanism that all multitasking OSs use to keep track of things and share resources
- So it isn't about to go away
- Windows and scripts spawn processes
- Built-in tree structure
- Nevertheless, we still need to filter out machine behavior so that we can reconstruct what the user did
14Window titles
- In Windows NT, everything the user does occurs in some window
- Every window has a title bar, PID, and PPID
- Combining these with the process table gives superior data
  - now very easy to filter out system noise by matching process IDs with that of the active window
  - solves the explorer problem
  - anyone can read the data and tell what the user is doing
  - a wealth of new information, e.g. subject lines of emails, names of web pages, files and directories
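A rough sketch of the filtering step, assuming each process-table line carries a PID and PPID. The `ProcessLine` record and `user_driven` helper are hypothetical names, and keeping descendants of the active window's process (rather than exact PID matches only) is an assumption suggested by the process table's tree structure.

```python
from typing import NamedTuple

class ProcessLine(NamedTuple):
    pid: int
    ppid: int
    name: str

def user_driven(process_lines, active_window_pid):
    """Keep lines whose process is the active window's process or a descendant."""
    parent = {p.pid: p.ppid for p in process_lines}

    def descends(pid):
        seen = set()  # guards against cycles in malformed data
        while pid in parent and pid not in seen:
            if pid == active_window_pid:
                return True
            seen.add(pid)
            pid = parent[pid]  # climb one level up the process tree
        return pid == active_window_pid

    return [p for p in process_lines if descends(p.pid)]

table = [
    ProcessLine(100, 1, "winword"),
    ProcessLine(101, 100, "helper"),
    ProcessLine(7, 1, "services"),
]
kept = user_driven(table, active_window_pid=100)
```

Here `kept` retains the two lines tied to the active window and drops the unrelated system process.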
15Data Sources We Would Like to Investigate
- Keystrokes
- Mouse movements
  - Speed, random movements, location (app. window, title bar, taskbar, desktop)
- Degree to which user maximizes windows vs. resizing, or minimizes vs. switching
- These can be combined to study individual style
- Hotkeys vs. mouse clicks (many examples, some of which are application dependent)
- File system usage
16But
- Our data now consists of successive window titles with process information in between
- So we have a mixture of two different types of data, making feature selection somewhat less obvious
- Ideally, feature values should
  - be different for different users, but
  - be similar for different sessions belonging to the same user
17What's in the Data?
- Contents of the title bar whenever
  - A new window is created
  - Focus switches to a previously existing window
  - Title bar changes in current window
- For process table
  - Birth
  - Death
  - Continuation (existing process uses up CPU time)
  - Background
  - Ancestry
- Timing
  - Date and time of login
  - Clock time since login
  - CPU time
19Some candidate features
- Timing
  - time between windows
  - time between new windows
- # windows open at once
  - sampled at some time interval
  - weighted by time open
- words in window title
  - total words
  - (# W words) / (total words)
21The Complete Feature Set
- Log (time inactive) for any period > 1 minute
- Log (elapsed time between new windows)
- Log (elapsed time since login) for new windows
- Log (1 + # windows open) sampled every 10 minutes
- Log (# windows open weighted by time open)
- Log (1 + # windows open) whenever it changes
- Log (total windows opened) whenever it changes
- # characters in W words
- (# characters in W words) / (total characters)
22The Complete Feature Set (cont.)
- # W words
- # non-W words
- (# W words) / (total words)
- Log (1 + # process lines) between successive windows
- total CPU time used per window
- elapsed time per window
- (total CPU time used) / (elapsed time)
- Log ((total CPU time used) / (elapsed time since login))
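A few of the listed features can be sketched as below. The function names are hypothetical, and the sketch assumes timestamps in seconds and natural logarithms; the slides do not state the units or the log base.

```python
import math

def log_inactivity(event_times, threshold=60.0):
    """Log of each gap between consecutive events longer than `threshold` seconds."""
    gaps = (b - a for a, b in zip(event_times, event_times[1:]))
    return [math.log(g) for g in gaps if g > threshold]

def log_open_windows(counts):
    """Log(1 + # windows open) per sample; the +1 keeps a zero count finite."""
    return [math.log1p(c) for c in counts]

def cpu_fraction(cpu_time, elapsed):
    """(total CPU time used) / (elapsed time); None when no time has elapsed."""
    return cpu_time / elapsed if elapsed > 0 else None

inactive = log_inactivity([0.0, 10.0, 130.0, 140.0])  # only the 120 s gap qualifies
windows = log_open_windows([0, 3])
```

Taking logs compresses the long right tail typical of timing and count data, so a single very long idle period does not dominate a feature's distribution.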
23But These Features Occur in a Nonstandard Way
- Define a session as whatever occurs from login to logout
- Most classifiers would want to see a complete feature vector for each session
- But what we actually have is a feature stream for each session; features can be generated
  - Whenever a new window is opened
  - Whenever the window or its title changes
  - Whenever a period of inactivity occurs
  - At sampled time intervals
24Derived Feature Vectors
- We need to reduce this stream to a single feature vector per session, because that's what most classifiers want to see
- One way is to use the Kolmogorov-Smirnov (K-S) two-sample test for each individual feature
- This results in a single vector per session with dimension equal to the number of features
25Derived Feature Vectors (cont.)
- The test session gives one empirical distribution
- The model gives another (usually larger)
- The K-S test measures how well these distributions match
  - If one is empty but not both, we have a total mismatch
  - If both are empty, we have a perfect match
  - If neither is empty, we compute the maximum difference between the two CDFs
- So as claimed, we obtain a single vector per session with dimension equal to the number of features
- It also makes the analysis much more confusing
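The three cases above can be sketched in plain Python. The names `ks_distance` and `derived_vector` are hypothetical, and encoding a total mismatch as 1.0 and a perfect match as 0.0 is an assumption (chosen because the K-S statistic itself lies between 0 and 1); the slides do not give the numeric encoding.

```python
from bisect import bisect_right

def ks_distance(sample, model):
    """Two-sample K-S statistic with the empty-sample conventions described above."""
    if not sample and not model:
        return 0.0  # both empty: perfect match
    if not sample or not model:
        return 1.0  # exactly one empty: total mismatch
    a, b = sorted(sample), sorted(model)
    # The largest gap between two empirical CDFs occurs at an observed value.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )

def derived_vector(session, model, feature_names):
    """One K-S value per feature; a feature absent from a stream counts as empty."""
    return [ks_distance(session.get(f, []), model.get(f, [])) for f in feature_names]

vec = derived_vector({"gap": [1, 2, 3, 4]}, {"gap": [3, 4, 5, 6], "cpu": [0.2]},
                     ["gap", "cpu"])
```

In this example `vec` has one entry per feature name, regardless of how many values each feature produced during the session, which is the point of the reduction.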
26An Experiment
- This is real data: it was collected on actual users doing real work on an internal NT network
- Data: 10 users, 35 sessions for each user
27An Experiment (cont.)
- For any session, let PU denote the putative user (i.e. the login name that comes with the session), and let TU denote the true user (the person who actually generated the session)
- For each user u, we want to see how well the sessions labeled as u match self vs. other
- In the sequel, let M = # of users, T = # training sessions per user
28Building Models from Training Data
- We used SVM, which is a binary classifier, so we need to define positive examples (self) and negative examples (other)
- For self, create derived feature vectors by using K-S to match each session (TU = u) against the composite of u's other sessions
- this gives T positive examples
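The leave-one-out construction of positive examples might look like the sketch below. It assumes a "composite" simply pools the feature streams of several sessions, which the slides do not spell out; `composite`, `positive_examples`, and the `derive(session, model)` callable (a stand-in for the K-S derived feature vector) are hypothetical names.

```python
def composite(sessions):
    """Pool several sessions' feature streams into one model, feature by feature."""
    pooled = {}
    for s in sessions:
        for name, values in s.items():
            pooled.setdefault(name, []).extend(values)
    return pooled

def positive_examples(own_sessions, derive):
    """Match each training session against the composite of the user's other sessions."""
    return [
        derive(s, composite(own_sessions[:i] + own_sessions[i + 1:]))
        for i, s in enumerate(own_sessions)
    ]

# Toy stand-in for the K-S step: report the sizes of the two streams being compared.
sizes = lambda s, m: (len(s["f"]), len(m["f"]))
sessions = [{"f": [1]}, {"f": [2, 3]}, {"f": [4]}]
positives = positive_examples(sessions, sizes)
```

With T training sessions this yields T vectors, one per held-out session, matching the count stated above.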
29One way to define other
- Suppose we intend to evaluate a test session by scoring it against a pool of user models and seeing who wins
- match each of u1's sessions against the composite of u2's, for all other u2
- this gives T (M - 1) negative examples
30Another way to define other
- Score a test session only against PU's model
- Say the null hypothesis is TU = PU, and the alternative hypothesis is TU = someone else
- this suggests deriving negative feature vectors by matching each of u2's sessions (for each u2 other than u1) against the composite of u1's
- it also gives T (M - 1) negative examples
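The two definitions of "other" (this slide and the previous one) differ only in which side supplies the sessions and which supplies the composite. A minimal sketch, with hypothetical function names and with `derive` and `composite` passed in as stand-ins for the K-S derived vector and the pooled model:

```python
def negatives_pool(training, u1, derive, composite):
    """First definition: each of u1's sessions against every other user's composite."""
    return [
        derive(s, composite(training[u2]))
        for u2 in training if u2 != u1
        for s in training[u1]
    ]

def negatives_hypothesis(training, u1, derive, composite):
    """Second definition: every other user's sessions against u1's composite."""
    model = composite(training[u1])
    return [
        derive(s, model)
        for u2 in training if u2 != u1
        for s in training[u2]
    ]

# Toy data: M = 3 users, T = 2 sessions each; strings stand in for sessions.
training = {"u1": ["a1", "a2"], "u2": ["b1", "b2"], "u3": ["c1", "c2"]}
pair = lambda s, m: (s, m)           # stand-in for the derived feature vector
join = lambda ss: "+".join(ss)       # stand-in for the composite
pool = negatives_pool(training, "u1", pair, join)
hyp = negatives_hypothesis(training, "u1", pair, join)
```

Both lists have T (M - 1) = 4 entries here, but the first asks "does u1 look like someone else's model?" while the second asks "does someone else look like u1's model?".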
31Methodology
- For each user, use first 10 sessions for training, test on next 5
- Build SVM and Random Forest models from training data
- Classify each test session against each model to create a matrix of scores M (next slide)
  - each row is a session, user number is on the left
  - 10 columns, one for each model scored against
- Look at plots of the two pdfs for each column
  - red = self (x in M)
  - green = other (every other entry in M)
  - left plot = SVM, right plot = Random Forest
32Matrix of scores
      1  2  3  4  5  6  7  8  9 10
    ------------------------------
   1  x
   1  x
 ...
   2     x
   2     x
 ...
  10                             x
  10                             x
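A sketch of how such a matrix could be assembled and then split into the "self" and "other" score sets for each column. The `score(model, session)` callable is a hypothetical stand-in for whatever the trained SVM or Random Forest returns for a session; the function names and the toy data are likewise assumptions.

```python
def score_matrix(test_sessions, models, score):
    """Rows: (true user, scores); one score per user model, in sorted user order."""
    users = sorted(models)
    rows = [
        (true_user, [score(models[u], session) for u in users])
        for true_user, session in test_sessions
    ]
    return users, rows

def split_self_other(users, rows):
    """Per column, separate 'self' scores (row user == column user) from 'other'."""
    self_scores = {u: [] for u in users}
    other_scores = {u: [] for u in users}
    for true_user, scores in rows:
        for u, value in zip(users, scores):
            (self_scores if u == true_user else other_scores)[u].append(value)
    return self_scores, other_scores

# Toy stand-ins: a "model" and a "session" are plain integers here.
models = {"u1": 1, "u2": 2}
tests = [("u1", 1), ("u2", 2), ("u2", 1)]
users, rows = score_matrix(tests, models, lambda m, s: 1.0 if m == s else 0.0)
self_scores, other_scores = split_self_other(users, rows)
```

The two dictionaries hold, per model, the samples from which the self and other score densities would be estimated and plotted.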
43Future Work
- Reduce / modify the feature set
- Try other classifiers
- Have a non-technical person read the sessions and intuitively identify user characteristics
- Come up with ways to visualize what goes on inside a session