Title: Real-Time Intrusion Detection Systems
1Real-Time Intrusion Detection Systems
- Sandeep Kotagiri
- Graduate Student, CACS
- April 11th 2006
2Papers Presented
- ADMIT: Anomaly-based Data Mining for Intrusions
- K. Sequeira, M. Zaki
- ACM SIGKDD, 2002
- Integrated Access Control and Intrusion Detection for Web Servers
- Tatyana Ryutov, Clifford Neuman, Dongho Kim, and Li Zhou
- IEEE Transactions on Parallel and Distributed Systems, September 2003
- The Specification and Enforcement of Advanced Security Policies
- Tatyana Ryutov and Clifford Neuman
- IEEE Proceedings of the Third International Workshop on Policies for Distributed Systems and Networks, 2002
3ADMIT Anomaly-based Data Mining for Intrusions
- According to the 2000 Computer Security Institute/FBI computer crime study, 85% of the 538 companies surveyed reported an intrusion or exploit of their corporate data, with 64% suffering a loss.
- Features of a good IDS
- ADMIT: a real-time IDS with host-based data collection and processing
- Problem: differentiate between masqueraders and the true users of a computer terminal
- How: augment password authentication with ADMIT
- What does ADMIT do? It is terminal-resident, monitors terminal usage for each user, creates a user profile, and verifies new data against it.
4Overview of ADMIT
- Types of IDS: signature-based and anomaly-based
- Audit data levels: network-level data, system call-level data, user command-level data
- User profile for intrusion detection through clustering
- Observation: the distribution of test points to clusters changes significantly at the time of attacks, which is an indicator of anomalous behavior
- ADMIT is a user-profile dependent, temporal sequence clustering based, real-time intrusion detection system with host-based data collection and processing.
- Advantages of using clustering
- Model scaling
- Reduction of noise through cluster support
- Analyzing only cluster centers, and thus significant data reduction
- Intra-cluster similarity threshold and alarms (Type A and Type B)
5ADMIT ARCHITECTURE
- Two main stages: training and testing
- Capturing user data
- Unix shell command data is captured via the (t)csh history mechanism
- The Recognizer parses the user history data and emits it as tokens
- Session: all data between logging on and logging off (delimited by SOF and EOF)
6Parsing user data into tokens
- An example session
- SOF; ls -l; vi t1.txt; ps -eaf; vi t2.txt; ls -a /usr/bin/; rm -i /home/; vi t3.txt t4.txt; ps -ef; EOF
- Conversion to tokens
- T = {ti : 0 ≤ i < 8}, where t0 = ls -l, t1 = vi <1>, t2 = ps -eaf, t3 = vi <1>, t4 = ls -a <1>, t5 = rm -i <1>, t6 = vi <2>, and t7 = ps -ef.
- <n> gives the number of arguments (n) of a command
- vi t1.txt is tokenized as vi <1>, and vi t3.txt t5.txt t6.txt as vi <3>
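The tokenization rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the ADMIT Recognizer itself: the function names `tokenize` and `tokenize_session`, the use of `;` as a command separator, and the rule "anything starting with `-` is a flag" are assumptions made for the example.

```python
def tokenize(command: str) -> str:
    """Keep the command name and its flags; replace other arguments by their count."""
    parts = command.split()
    name, rest = parts[0], parts[1:]
    flags = [p for p in rest if p.startswith("-")]  # assumed: flags start with '-'
    n_args = len(rest) - len(flags)
    token = " ".join([name] + flags)
    # Commands without non-flag arguments carry no <n> suffix, as in "ps -eaf".
    return f"{token} <{n_args}>" if n_args else token


def tokenize_session(session: str) -> list[str]:
    """Split a ';'-separated session (assumed format) and drop the SOF/EOF markers."""
    commands = [c.strip() for c in session.split(";")]
    return [tokenize(c) for c in commands if c and c not in ("SOF", "EOF")]


session = ("SOF; ls -l; vi t1.txt; ps -eaf; vi t2.txt; ls -a /usr/bin/; "
           "rm -i /home/; vi t3.txt t4.txt; ps -ef; EOF")
tokens = tokenize_session(session)
print(tokens)
```

Under these assumptions the eight tokens t0 to t7 listed on the slide are reproduced, and `tokenize("vi t3.txt t5.txt t6.txt")` gives `vi <3>`.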
7Familiarizing with terms used
- A sequence s, of specified length l, is a list of tokens occurring contiguously in the same session of audit data, i.e., s ∈ T^l, where T is the token alphabet.
- A cluster c is a collection of sequences of user-initiated command data such that all its sequences are very similar to one another under some similarity measure Sim(), but different from those in other clusters.
- If c = {s0, s1, s2, ..., sn-1} is a cluster with n sequences, its cluster center is denoted sc.
- A profile p is the set of clusters of sequences of user-initiated command data whose centers characterize the user behavior. For user u, the profile is denoted pu.
- Here r and r' are the intra-cluster and inter-cluster similarity thresholds, and Sim(s1, s2) is the similarity between two sequences.
8Flow of Control in ADMIT
9Similarity Measure Sim(s1, s2)
- Two sequences
- s1 = (vi <1>, ps -eaf, vi <1>, ls -a <1>)
- s2 = (vi <1>, ls -a <1>, rm -i <1>, vi <2>)
- MCP (Match Count with Polynomial bound) counts the number of slots in the two sequences for which both have identical tokens
- MCP for the above example is 1
- MCE (Match Count with Exponential bound) is a variant of MCP in which the score doubles for each matching slot
- MCAP/MCAE (Match Count with Adjacency reward and Polynomial/Exponential bound) is a variant of MCP/MCE in which adjacent matches are rewarded
- LCS (Longest Common Subsequence) is the length of the longest subsequence of tokens that the two sequences have in common
- It is 2 for the above sequences
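Two of these measures are easy to check by hand or in code. The sketch below implements MCP as a slot-by-slot match count and LCS with the standard dynamic-programming recurrence; the function names `mcp` and `lcs` are chosen for this example, and MCE, MCAP, and MCAE are left out because the slide does not give their exact formulas.

```python
def mcp(a, b):
    """Match count: number of positions where both sequences hold the same token."""
    return sum(x == y for x, y in zip(a, b))


def lcs(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            # Extend a common subsequence on a match, else carry the best so far.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


s1 = ("vi <1>", "ps -eaf", "vi <1>", "ls -a <1>")
s2 = ("vi <1>", "ls -a <1>", "rm -i <1>", "vi <2>")
print(mcp(s1, s2), lcs(s1, s2))
```

For the two sequences on the slide, only the first slot matches (MCP = 1), while the longest common subsequence is (vi <1>, ls -a <1>), so LCS = 2.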
10ADMIT Algorithms
- Data Training
- Data Pre-processing
- Clustering user sequences
- Cluster refinement
- Merge clusters
- Split clusters
- Online Testing
- Real-time data pre-processing
- Similarity search within profile
- Sequence rating
- Sequence classification
11Data Training Data Pre-processing
- SOF; ls -l; vi t1.txt; ps -eaf; vi t2.txt; ls -a /usr/bin/; rm -i /home/; vi t3.txt t4.txt; ps -ef; EOF
- FeatureSelector parses, cleans, and tokenizes the audit data within each session specified by the ProfileManager.
- T = {ti : 0 ≤ i < 8}, where t0 = ls -l, t1 = vi <1>, t2 = ps -eaf, t3 = vi <1>, t4 = ls -a <1>, t5 = rm -i <1>, t6 = vi <2>, and t7 = ps -ef.
- FeatureSelector creates sequences of length l. For example, if l = 4, the set of user sequences is S = {si : 0 ≤ i ≤ |T| - l}, where
- s0 = (ls -l, vi <1>, ps -eaf, vi <1>)
- s1 = (vi <1>, ps -eaf, vi <1>, ls -a <1>)
- s2 = (ps -eaf, vi <1>, ls -a <1>, rm -i <1>)
- s3 = (vi <1>, ls -a <1>, rm -i <1>, vi <2>)
- s4 = (ls -a <1>, rm -i <1>, vi <2>, ps -ef)
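The sequence construction above is a sliding window of length l over the token list. A minimal sketch, with the helper name `sliding_sequences` chosen for this example:

```python
def sliding_sequences(tokens, l):
    """Return every contiguous window of l tokens, i.e. |T| - l + 1 sequences."""
    return [tuple(tokens[i:i + l]) for i in range(len(tokens) - l + 1)]


tokens = ["ls -l", "vi <1>", "ps -eaf", "vi <1>",
          "ls -a <1>", "rm -i <1>", "vi <2>", "ps -ef"]
sequences = sliding_sequences(tokens, 4)
for i, s in enumerate(sequences):
    print(f"s{i} =", s)
```

With 8 tokens and l = 4 this yields the five sequences s0 to s4 from the slide.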
12Data Training Clustering User Sequences
13Data Training Clustering User Sequences
- Example with r = 3. Initially Su = Sua = {s0, s1, s2, s3, s4} and pu = Suc = ∅.
- Say the new center is s0.
- For all remaining sequences in Su - Suc, where Suc = {s0}, we compute the similarity to the new center s0.
- Using LCS as the similarity metric, we get Sim(s1, s0) = 3, since (vi <1>, ps -eaf, vi <1>) is their LCS.
- Similarly, we get Sim(s2, s0) = 2, Sim(s3, s0) = 1, and Sim(s4, s0) = 0.
- Since s1 passes the threshold, we add it to the new cluster to get cnew = {s0, s1}.
- Therefore the new Sua = {s2, s3, s4}. Repeating the while loop, we get the profile
- pu = {c0 = {s0, s1}, c1 = {s2}, c2 = {s3, s4}}.
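The worked example can be replayed with a small greedy clustering loop. This is a sketch under stated assumptions, not the DynamicClustering algorithm from the paper: the slides do not say how each new center is chosen, so the code assumes the next center is the unassigned sequence least similar to the existing centers, and that "passes the threshold" means Sim ≥ r. With those two assumptions the result matches the profile on the slide.

```python
def lcs(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def dynamic_clustering(seqs, r):
    """Greedy clustering; returns a list of (center_index, member_indices)."""
    unassigned = list(range(len(seqs)))
    centers, clusters = [], []
    while unassigned:
        # Assumption: pick the unassigned sequence least similar to all centers.
        center = min(
            unassigned,
            key=lambda i: max((lcs(seqs[i], seqs[c]) for c in centers), default=0),
        )
        members = [i for i in unassigned
                   if i == center or lcs(seqs[i], seqs[center]) >= r]
        centers.append(center)
        clusters.append((center, members))
        unassigned = [i for i in unassigned if i not in members]
    return clusters


tokens = ["ls -l", "vi <1>", "ps -eaf", "vi <1>",
          "ls -a <1>", "rm -i <1>", "vi <2>", "ps -ef"]
seqs = [tuple(tokens[i:i + 4]) for i in range(5)]  # s0 .. s4
profile = dynamic_clustering(seqs, r=3)
print(profile)
```

The clusters come out as {s0, s1}, {s3, s4}, and {s2}; only the order in which they are found differs from the numbering c0, c1, c2 used on the slide.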
14Data Training Cluster Refinement
- Purpose of Cluster Refinement
- Setting the intra-cluster similarity threshold r may require experimentation
- A cluster may have a lot in common with another cluster
- There may be larger sub-clusters within clusters
- Algorithms
15Data Training Cluster Refinement
- Example
- From above, pu = {c0, c1, c2} and r' = 2
- Using LCS, Sim(c0, c1) = Sim(s0, s2) = 2.
- In this case, the two clusters should be merged to get c0 = {s0, s1, s2}
- Now c1 is deleted from the profile. Also, the center for c0 becomes s1.
- For clusters that have high support, SplitClusters calls DynamicClustering to re-cluster them into smaller, higher-density clusters.
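The single merge step in this example can be sketched as follows. The slides do not show the formula for the cluster center, so the code assumes the center is the member with the largest total similarity to the other members; that assumption is consistent with the center of the merged c0 becoming s1. The helper names `center` and `merge_if_similar` are hypothetical, and the sketch covers only one pairwise merge, not the full refinement procedure.

```python
def lcs(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def center(cluster, seqs):
    """Assumed rule: the member with the highest summed similarity to the others."""
    return max(cluster,
               key=lambda i: sum(lcs(seqs[i], seqs[j]) for j in cluster if j != i))


def merge_if_similar(ca, center_a, cb, center_b, seqs, r_prime):
    """Merge two clusters when their centers are at least r' similar, else None."""
    if lcs(seqs[center_a], seqs[center_b]) < r_prime:
        return None
    merged = sorted(set(ca) | set(cb))
    return merged, center(merged, seqs)


tokens = ["ls -l", "vi <1>", "ps -eaf", "vi <1>",
          "ls -a <1>", "rm -i <1>", "vi <2>", "ps -ef"]
seqs = [tuple(tokens[i:i + 4]) for i in range(5)]  # s0 .. s4

# c0 = {s0, s1} with center s0, c1 = {s2} with center s2, r' = 2
result = merge_if_similar([0, 1], 0, [2], 2, seqs, r_prime=2)
print(result)
```

Here Sim(s0, s2) = 2 meets r' = 2, so the clusters merge into {s0, s1, s2}, and s1 has the largest summed similarity (3 + 3 = 6, against 5 for s0 and s2).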
16Online Testing Real Time Data Pre-processing
- Testing must happen in an online manner, as the user sequences are produced
- Example sequence: SOF; vi t4.txt; vi t4.txt; vi t4.txt; ls -a /home/; rm -i /home/turbo/tmp/; ls -a /home/; vi t2.txt t4.txt; ps -ef
- Right padding is done in the absence of complete sequences
- Tokenizing
- T' = {ti : 0 ≤ i < 8}, where t0 = vi <1>, t1 = vi <1>, t2 = vi <1>, t3 = ls -a <1>, t4 = rm -i <1>, t5 = ls -a <1>, t6 = vi <2>, and t7 = ps -ef.
- For l = 4, S' = {s'i : 0 ≤ i ≤ |T'| - l}
- s'0 = (vi <1>, vi <1>, vi <1>, ls -a <1>)
- s'1 = (vi <1>, vi <1>, ls -a <1>, rm -i <1>)
- s'2 = (vi <1>, ls -a <1>, rm -i <1>, ls -a <1>)
- s'3 = (ls -a <1>, rm -i <1>, ls -a <1>, vi <2>)
- s'4 = (rm -i <1>, ls -a <1>, vi <2>, ps -ef)
17Online Testing Profile Search
- For each test sequence s'i, find the most similar cluster in pu
- The similarity between a test sequence s'i and a profile pu is
- Sim(s'i, pu) = max over clusters cj in pu of Sim(s'i, scj)
- Example
- pu = {c0 = {s0, s1*, s2}, c1 = {s3*, s4}}
- (cluster centers are indicated with '*').
- Then Sim(s'0, pu) = max(Sim(s'0, sc0), Sim(s'0, sc1)) = max(Sim(s'0, s1), Sim(s'0, s3)) = max(3, 2) = 3.
- Similarly, Sim(s'1, pu) = 3, Sim(s'2, pu) = 3, Sim(s'3, pu) = 3, and Sim(s'4, pu) = 2.
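The profile search amounts to taking the maximum similarity to any cluster center. The sketch below replays the example with LCS as the similarity measure; the variable names and the helper `sim_to_profile` are chosen for illustration, and the centers s1 and s3 are taken from the max expression on this slide.

```python
def lcs(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def sim_to_profile(seq, centers):
    """Similarity of a sequence to a profile: best match over the cluster centers."""
    return max(lcs(seq, c) for c in centers)


# Cluster centers of the trained profile: s1 (for c0) and s3 (for c1).
centers = [
    ("vi <1>", "ps -eaf", "vi <1>", "ls -a <1>"),
    ("vi <1>", "ls -a <1>", "rm -i <1>", "vi <2>"),
]

# Test tokens from the online session, cut into windows of length 4.
test_tokens = ["vi <1>", "vi <1>", "vi <1>", "ls -a <1>",
               "rm -i <1>", "ls -a <1>", "vi <2>", "ps -ef"]
test_seqs = [tuple(test_tokens[i:i + 4]) for i in range(5)]

sims = [sim_to_profile(s, centers) for s in test_seqs]
print(sims)
```

The computed similarities are 3, 3, 3, 3, and 2, in agreement with the values on the slide.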
18Online Testing Sequence Rating
- Noisy data and high false positive rates
- Using past sequences, the present sequence is tested to see whether it is noise or a true change in profile
- LAST_n
- Arithmetic mean of the similarities of the last n sequences
- For the five new sequences, using this rating metric with n = 3, we get the ratings R0 = R1 = R2 = R3 = 3 and R4 = 8/3 ≈ 2.67
19Online Testing Sequence Rating
- WEIGHTED
- The weighted mean of the last rating and the current sequence's similarity. The rating Rj for the jth test sequence s'j is calculated as
- Rj = α·Sim(s'j, pu) + (1 - α)·Rj-1, where R0 = Sim(s'0, pu).
- For example, if α = 0.33, then R0 = R1 = R2 = R3 = 3 and R4 = 2.66.
- DECAYED_WEIGHTS
- A variant of WEIGHTED in which α is varied according to the sequence number
- The rating Rj for the jth sequence is calculated from two parameters, y and z; e.g., if y = 4100 and z = 7500, then R0 = R1 = R2 = R3 = 3 and R4 = 2.66.
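The LAST_n and WEIGHTED ratings can be sketched directly from their definitions. The function names are chosen for this example, and for the first few sequences LAST_n is assumed to average over however many similarities are available. DECAYED_WEIGHTS is not included because its formula is not given on the slide.

```python
def last_n(sims, n):
    """Rating j = arithmetic mean of the last n similarities up to and including j."""
    ratings = []
    for j in range(len(sims)):
        window = sims[max(0, j - n + 1): j + 1]  # fewer than n items at the start
        ratings.append(sum(window) / len(window))
    return ratings


def weighted(sims, alpha):
    """Rating j = alpha * Sim_j + (1 - alpha) * rating_{j-1}, with rating_0 = Sim_0."""
    ratings = [float(sims[0])]
    for s in sims[1:]:
        ratings.append(alpha * s + (1 - alpha) * ratings[-1])
    return ratings


sims = [3, 3, 3, 3, 2]  # Sim(s'0, pu) .. Sim(s'4, pu) from the profile search
print(last_n(sims, n=3))
print(weighted(sims, alpha=0.33))
```

For LAST_n with n = 3 the last rating is (3 + 3 + 2)/3 = 8/3. For WEIGHTED with α = 0.33 the last rating is 0.33·2 + 0.67·3, which evaluates to about 2.67; the slides quote it as 2.66, and either value is below the 2.7 threshold used later.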
20Online Testing Prediction (Normal Vs Anomaly)
- Normal, i.e., the true user; anomaly, i.e., a possible masquerader
- Based upon the sequence rating Rj for test sequence s'j
- Normal Sequences
- TACCEPT is the lower accept threshold
- If the user sequence rating > TACCEPT, then normal user
- E.g.
- With TACCEPT = 2.7 and the WEIGHTED rating metric (α = 0.33), no alarm will be raised for s'0, since R0 = 3 > 2.7.
- Similarly, s'1, s'2, s'3 are all normal
- Normal sequences are assigned to the nearest profile cluster, e.g., c0 = {s0, s1, s2, s'0, s'1} and c1 = {s3, s4, s'2, s'3}
- Cluster centers are recalculated
21Online Testing Prediction (Normal Vs Anomaly)
- Anomalous Sequences
- Sequences that fail the TACCEPT test
- E.g., for s'4, R4 = 2.66 < 2.7: Type A alarm
- Reasons
- Noise (typing errors)
- Concept drift (change of project)
- Anomalous sequence
- The larger the number of anomalous sequences in near succession, the more suspicious the identity of the user
- Cluster the anomalous sequences to get a better estimate of behavioral change
- Type B alarm if the cluster size crosses a certain threshold Tcluster
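The accept-threshold test from the last two slides can be sketched as a simple comparison of each rating against TACCEPT. The function name `classify` and the label strings are hypothetical, and the sketch covers only the Type A decision; clustering the anomalous sequences for a Type B alarm is not modeled here.

```python
def classify(ratings, t_accept):
    """Label each rating: above the threshold is normal, otherwise a Type A alarm."""
    return ["normal" if r > t_accept else "type_a_alarm" for r in ratings]


# WEIGHTED ratings R0 .. R4 as quoted on the slides, with TACCEPT = 2.7
ratings = [3, 3, 3, 3, 2.66]
labels = classify(ratings, t_accept=2.7)

# Indices of sequences that would be handed on for anomaly clustering
anomalous = [j for j, label in enumerate(labels) if label == "type_a_alarm"]
print(labels, anomalous)
```

Only the fifth sequence falls below the threshold, so it is the one that raises a Type A alarm and becomes a candidate for the anomalous-sequence clustering step.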
22Incremental Clustering Algorithm
23Results
- The system achieves approximately an 80% detection rate and a 15% false positive rate
- The security analyst needs to go through only the anomalous clusters instead of vast amounts of audit data
24Integrated Access Control and Intrusion Detection
for Web Servers
- Problems faced by Web Servers
- Stealing and destroying data
- Denying user access
- Changing website content to embarrass organizations
- Subverting Web servers through vulnerable CGI scripts
- Denial of Service (DoS) attacks
- Traditional access control systems were not designed to detect attacks and adjust their behavior to take corrective action
- Separate components such as firewalls, IDSs, and code integrity checkers do not fully address a Web server's security needs.
- This approach supports access control policies extended with the capability of identifying intrusions and responding to them in real time.
25Generic Application Level Intrusion Detection
Framework
26Generic Authorization and Access Control API
- Supports fine-grained access control and application-level intrusion detection and response
- Evaluates HTTP requests and determines whether the requests are allowed and whether they represent a threat according to a policy.
- Provides a general-purpose execution environment in which EACLs are evaluated
- Policy enforcement: 3 phases
- Before the requested operation starts (is the operation authorized?)
- During execution of the authorized operation (detect malicious behavior during execution)
- After the operation completes (logging and notification of whether the operation succeeded or failed)
- Responds to suspected intrusions in real time, before they cause damage
- Can be easily integrated with different applications
- Apache Web server, SOCKS5, sshd, and FreeS/WAN IPsec for Linux.
27Policy Representation - EACL
- EACL: Extended Access Control List
- Simple policy language designed to describe user-level authorization policy
- An EACL is associated with an object to be protected
- Specifies negative and positive access rights on the object
- Also has an optional set of associated conditions
- Types of conditions
- Pre-conditions: what must be true in order to grant the request
- Request-result conditions: must be activated whether the request is granted or denied
- Mid-conditions: what must be true during the execution of the requested operation
- Post-conditions: what must happen after the completion of the operation
- An EACL entry consists of a positive or negative access right and four condition blocks: a set of pre-conditions, a set of request-result conditions, a set of mid-conditions, and a set of post-conditions
28EACL Syntax
- An EACL is specified according to the following format
- eacl ::= eacl_entry
- eacl_entry ::= pos_access_right conditions | neg_access_right conditions
- pos_access_right ::= "pos_access_right" def_auth value
- neg_access_right ::= "neg_access_right" def_auth value
- conditions ::= pre_conds mid_conds rr_conds post_conds
- pre_conds ::= condition
- mid_conds ::= condition
- rr_conds ::= condition
- post_conds ::= condition
- condition ::= cond_type def_auth value
- cond_type ::= alphanumeric_string
- def_auth ::= alphanumeric_string
- value ::= alphanumeric_string
- cond_type: type of condition
- def_auth: authority responsible for defining the value within cond_type
- value: value of the condition
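One way to picture this format is as a small in-memory structure: each entry holds an access right plus four lists of (cond_type, def_auth, value) triples. The sketch below is not the GAA-API data model; the class names, the `parse_entry` helper, and the convention that the prefix before `_cond` selects the condition block are assumptions made for illustration, as is the underscore spelling of the sample lines.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Condition:
    cond_type: str  # type of condition, e.g. "pre_cond_location"
    def_auth: str   # authority that defines the value
    value: str      # value of the condition


@dataclass
class EACLEntry:
    positive: bool  # True for pos_access_right, False for neg_access_right
    def_auth: str
    value: str
    conditions: dict = field(
        default_factory=lambda: {"pre": [], "mid": [], "rr": [], "post": []})


def parse_entry(lines):
    """Parse one entry: an access-right line followed by condition lines."""
    kind, def_auth, value = lines[0].split(maxsplit=2)
    if kind not in ("pos_access_right", "neg_access_right"):
        raise ValueError(f"unknown access right: {kind}")
    entry = EACLEntry(kind == "pos_access_right", def_auth, value)
    for line in lines[1:]:
        cond_type, auth, val = line.split(maxsplit=2)
        block = cond_type.split("_cond", 1)[0]  # assumed: "pre", "mid", "rr", "post"
        entry.conditions[block].append(Condition(cond_type, auth, val))
    return entry


entry = parse_entry([
    "pos_access_right test host_check_status",
    "pre_cond_location IPsec 10.1.1.0-10.1.200.255",
])
print(entry)
```

Keeping the four blocks separate mirrors the enforcement phases: pre-conditions are checked before the operation, mid-conditions during it, and request-result and post-conditions afterwards.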
29EACL Example Access to host
- EACL entry 1
- neg_access_right test host_login
- pre_cond_access_id KerberosV.5 tom@ORGB.EDU
- EACL entry 2
- pos_access_right test host_login
- pre_cond_location IPsec 10.1.1.0-10.1.200.255
- pre_cond_access_id X509 /C=US/O=Trusted/OU=orgb.edu/CN=partnerB
- pre_cond_threshold local <3 failures/day/failed_log/
- rr_cond_update_log local on:failure/failed_log/info:userID
- mid_cond_duration local <=8hrs
- EACL entry 3
- pos_access_right test host_login
- pre_cond_location IPsec 10.1.1.0-10.1.200.255
- pre_cond_access_id KerberosV.5 partnerb@ORGB.EDU
- pre_cond_threshold local <3 failures/day/failed_log/
- rr_cond_update_log local on:failure/failed_log/info:userID
- mid_cond_duration local <=8hrs
- EACL entry 4
- pos_access_right test host_check_status
- pre_cond_location IPsec 10.1.1.0-10.1.200.255
- EACL entry 5
- pos_access_right test host_shut_down
- pre_cond_access_id KerberosV.5 trusted@ORGA.EDU
- rr_cond_audit local on:success/info:userID
- post_cond_notify local email/to:sysadmin/on:failure
30EACL Policy Composition and Modules in GAA
- Policy Composition
- Process of relating separately specified policies
- System-wide policy and local policy (merged)
- The system-wide policy specifies a composition mode that describes how local policies are to be composed with it
- Expand: disjunction of rights
- Narrow: conjunction of rights
- Stop: local policies are ignored
- GAA Modules
- Access Control
- Detector
- Countermeasure handler
- Security Database
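The three composition modes on this slide can be illustrated with a toy function. This is a deliberately simplified sketch: it assumes each policy has already been reduced to a plain allow/deny boolean, whereas the actual framework evaluates full EACLs and can also return an uncertain result. The names `Mode` and `compose` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    EXPAND = "expand"  # disjunction of rights
    NARROW = "narrow"  # conjunction of rights
    STOP = "stop"      # local policies are ignored


def compose(mode, system_allows, local_allows):
    """Combine a system-wide decision with a local one under the given mode."""
    if mode is Mode.EXPAND:
        return system_allows or local_allows
    if mode is Mode.NARROW:
        return system_allows and local_allows
    return system_allows  # Mode.STOP: only the system-wide policy counts


for mode in Mode:
    print(mode.name, compose(mode, system_allows=True, local_allows=False))
```

In this reading, expand lets a local policy grant rights the system-wide policy does not, narrow lets it only restrict them, and stop removes local control entirely.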
31GAA-API and IDS Interaction
- GAA-API to IDS Interaction
- Ill-formed access requests
- Access request with abnormal parameters
- Denied Access
- Exceeding threshold
- Incidents and Suspicious application behavior
- Legitimate activity (creating and updating user profiles)
- IDS to GAA-API Interaction
- Can be used for updating policies and adjusting policy values such as thresholds, times, and locations.
32GAA-API and APACHE Integration
- Apache Access Control
- .htaccess file
- Order Deny,Allow
- Deny from All
- Allow from 10.0.0.0/255.0.0.0
- AuthType Basic
- AuthUserFile /usr/local/apache2/htpasswd-isi-staff
- Require valid-user
- Satisfy All
- Access request → check access control policies
- Outputs
- HTTP_OK
- HTTP_DECLINED
- HTTP_AUTHREQUIRED
33GAA-API to Enhance the Access Control of Apache
Server
- The Apache server does not support fine-grained policies such as
- Which users or user groups, from which locations, are allowed access
- It does not support other conditions such as time, threat level, or system load.
- GAA-Apache Access Control
- Makes use of system-wide and local policy and configuration files
- Three status values are returned to describe the policy enforcement process
- Authorization status Sa indicates whether the request is authorized (GAA_YES), not authorized (GAA_NO), or uncertain (GAA_MAYBE)
- Mid-condition enforcement status Sm indicates the status of mid-conditions
- Post-condition enforcement status Sp indicates the status of post-conditions
- Policy evaluation happens in four phases, as in the figure
- Sa to Apache format
- GAA_YES → HTTP_OK
- GAA_NO → HTTP_DECLINED
- GAA_MAYBE → HTTP_AUTHREQUIRED
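The translation of the authorization status Sa into Apache's result values is a three-way lookup. The sketch below models it with Python enums purely for illustration; the class and function names are hypothetical, and the members carry only the symbolic names from the slide, not the numeric constants used by Apache or the GAA-API C code.

```python
from enum import Enum


class GaaStatus(Enum):
    GAA_YES = "authorized"
    GAA_NO = "not authorized"
    GAA_MAYBE = "uncertain"


class ApacheResult(Enum):
    HTTP_OK = "HTTP_OK"
    HTTP_DECLINED = "HTTP_DECLINED"
    HTTP_AUTHREQUIRED = "HTTP_AUTHREQUIRED"


# Mapping of Sa to the Apache format, as listed on the slide.
SA_TO_APACHE = {
    GaaStatus.GAA_YES: ApacheResult.HTTP_OK,
    GaaStatus.GAA_NO: ApacheResult.HTTP_DECLINED,
    GaaStatus.GAA_MAYBE: ApacheResult.HTTP_AUTHREQUIRED,
}


def to_apache(sa: GaaStatus) -> ApacheResult:
    """Translate an authorization status into the value handed back to Apache."""
    return SA_TO_APACHE[sa]


for status in GaaStatus:
    print(status.name, "->", to_apache(status).name)
```

The uncertain case is the interesting one: GAA_MAYBE maps to HTTP_AUTHREQUIRED, so the server asks the client to authenticate rather than deciding outright.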
34Examples
- When the system threat level is higher than low, lock down the system and require user authentication for all accesses within the network
- System-wide policy
- eacl_mode 1 (composition mode: narrow)
- EACL entry 1
- neg_access_right
- pre_cond_system_threat_level local high
- Local policy
- EACL entry 1
- pos_access_right apache
- pre_cond_system_threat_level local >low
- pre_cond_accessID_USER apache
- Prevention of penetration and/or surveillance attacks by detecting CGI script abuse
- System-wide policy
- eacl_mode 1 (composition mode: narrow)
- EACL entry 1
- neg_access_right
- pre_cond_accessID_GROUP local BadGuys
- Local policy
- EACL entry 1
- neg_access_right apache
- pre_cond_regex gnu phf test-cgi
- rr_cond_notify local on:failure/email/sysadmin/info:CGIexploit
- rr_cond_update_log local on:failure/BadGuys/info:IP
- EACL entry 2
- pos_access_right apache
35Conclusions
- Traditional access control mechanisms have little ability to support or respond to the detection of attacks.
- A generic authorization framework has been developed that supports security policies which can detect attempted and actual security breaches and can actively respond by modifying security policies dynamically.
- The GAA-API implementation is available at http://gaaapi.sysproject.info.