Title: Immune System Metaphors Applied to Intrusion Detection and Related Problems
by Ian Nunn, SCS, Carleton University
inunn_at_digitaldoor.net
Overview of Presentation
- Review of the immune system properties of most interest
- Algorithm design and the representation of application domains
- Examples of two recognition algorithms
- Overview of application areas
- Focus on intrusion detection systems (IDS)
- Advantages of IS models and future research
- The IS model as a swarm system
Immune System Characteristics of Interest
- The human immune system (IS) is a system of detectors (principally B and T cells) that:
  - After initial negative selection (tolerization), does not recognize elements of the body (self)
  - Is adaptable, in that over time it can recognize any foreign element (non-self), including those never before encountered
  - Remembers previous encounters with foreign elements
  - Dynamically regenerates its elements
  - Regulates the population size and diversity of its elements
  - Is robust to input signal noise (recognition region) and to detector loss
  - Is distributed in nature, with no central or hierarchical control
  - Is error tolerant, in that self recognition does not halt the system
  - Is self-protecting, since it is part of self
Representation of Self/Non-Self
- The IS elements involved are cellular proteins and their peptide sequences
IS Application Algorithm Design
- Requires a deep understanding of the problem domain
- Self/non-self discrimination is the fundamental IS principle
- Steps in designing an IS algorithm:
  - Identify features allowing correct and complete self/non-self discrimination
  - Represent or encode the features, particularly continuous real-valued parameters. Antibody (Ab) and antigen (Ag) feature strings of the same length make algorithm performance easier to analyze
  - Determine a matching or fitness function. This is important for the evolution of Ab populations (affinity maturation)
  - Select the IS principles to apply, e.g. negative selection, costimulation, affinity maturation, etc.
This is hard stuff, and an important step in applying any modeling technique, whether genetic algorithms or swarm simulations (recall, for army ants, the problem of deciding what parameters to assign to the ants and to the environment, and what values to allow).
Approach to Feature Selection and Representation
- Antibodies and antigens are represented by strings of features
- The set of actual values observed, such as sensor readings, voltages or ASCII text, is called the application's phenotype
- The coded representation is called the application's genotype
- A feature is encoded by symbols from a finite alphabet
- Some application feature domains:
  - Binary variables: digital signals in computer systems
  - Discrete real variables: ASCII character text
  - Continuous real variables: real-world sensors
- Continuous domains must be mapped onto discrete domains, since we work with finite alphabets to ensure finite Ab/Ag population spaces
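One way to map a continuous reading onto a finite alphabet is uniform quantization into 2^k equal-width bins, with each bin index written as a k-bit binary string. The sketch below assumes every feature has a known range [lo, hi]; `encode_feature` and `encode_record` are hypothetical helpers, and the deck does not prescribe a particular quantization scheme.

```python
def encode_feature(value: float, lo: float, hi: float, bits: int) -> str:
    """Quantize a real value in [lo, hi] into one of 2**bits equal-width bins
    and return the bin index as a fixed-width binary string."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    levels = 2 ** bits
    # Clamp out-of-range readings to the nearest end of the range.
    clamped = min(max(value, lo), hi)
    # Scale to [0, levels) and truncate; the value hi itself goes in the top bin.
    index = min(int((clamped - lo) / (hi - lo) * levels), levels - 1)
    return format(index, f"0{bits}b")


def encode_record(values, ranges, bits: int = 4) -> str:
    """Concatenate encoded features so every Ab/Ag string has the same length."""
    return "".join(encode_feature(v, lo, hi, bits) for v, (lo, hi) in zip(values, ranges))


# Two hypothetical readings: a voltage in [0, 5] V and a temperature in [0, 40] C.
genotype = encode_record([3.3, 21.5], [(0.0, 5.0), (0.0, 40.0)])
print(genotype)  # prints 10101000
```

Using a fixed number of bits per feature keeps all Ab and Ag strings the same length, which the design steps above note makes analysis easier.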
Phenotype Representation: Change Detection Problem Domains
- OS (UNIX) processes: sequences of top-level system calls
- Program execution: alphabet symbols represent op codes
- File system: reduction to ASCII or binary strings
- User behavior and interface use: keystrokes, mouse clicks
- Time series data: representation of physical (plant) processes, e.g. the x/y position of a milling machine tool
- Memory accesses: memory address calls
- Local network traffic: TCP/IP packet addresses, ports
- Network traffic through routers and gateways: TCP/IP packets, addresses, volume
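For the first domain in the list, one well-known way to build a self database is to slide a fixed-length window over traces of normal system calls and record every window seen; windows in a new trace that never appeared in the database are flagged as anomalous. This is a minimal sketch assuming a window length of 3 and made-up traces, not a method given in the deck.

```python
from typing import Iterable, List, Set, Tuple

Window = Tuple[str, ...]


def windows(trace: List[str], k: int) -> Iterable[Window]:
    """Yield every contiguous length-k window of a system call trace."""
    for i in range(len(trace) - k + 1):
        yield tuple(trace[i:i + k])


def build_self(normal_traces: Iterable[List[str]], k: int = 3) -> Set[Window]:
    """Self database: all windows observed during normal operation."""
    db: Set[Window] = set()
    for trace in normal_traces:
        db.update(windows(trace, k))
    return db


def anomalies(trace: List[str], self_db: Set[Window], k: int = 3) -> List[Window]:
    """Return the windows of a trace that never appeared in the self database."""
    return [w for w in windows(trace, k) if w not in self_db]


# Hypothetical traces: one normal run and one run with an unexpected call.
normal = [["open", "read", "mmap", "read", "close"]]
db = build_self(normal)
suspect = ["open", "read", "exec", "read", "close"]
print(anomalies(suspect, db))
```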
IS Phenotype Encoding and Matching Using a Binary Model [1]
- Genotype and phenotype: a 32-bit string on a binary alphabet
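The slide does not spell out the matching rule. The sketch below assumes a complementary rule of the kind used in binary IS models: an antibody's match score is the number of bit positions where it differs from the antigen, computed with XOR and a bit count. `match_score` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def match_score(antibody: int, antigen: int) -> int:
    """Number of complementary bit positions between two 32-bit strings.

    XOR sets a bit wherever the antibody and antigen differ (are complementary),
    so the score ranges from 0 (identical) to 32 (perfect complement)."""
    return bin((antibody ^ antigen) & MASK32).count("1")


ab = 0b10110101111101110000000011111111
ag = ab ^ MASK32  # the exact complement of ab
print(match_score(ab, ag), match_score(ab, ab))  # prints 32 0
```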
Example Use of a Binary Model with a GA for Clonal Selection [1]
- Start with randomly generated Ag and possibly incomplete Ab populations
- For each Ab in turn, compute its average match (fitness) against a random, fixed-size Ag subpopulation
- Use a standard GA with mutation but no crossover to evolve successively better generations of antibodies
- Niches are observed to develop in coverage space for genetic commonalities (e.g. a bacterial polysaccharide coating) if the initial populations have a bias
- Self recognition is minimized (without negative selection) by selecting for more Ag-specific antibodies instead of more general ones, since specific antibodies are less likely to match self
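The procedure above can be sketched as a mutation-only genetic algorithm. This is an illustration under stated assumptions, not the exact algorithm of [1]: strings are 32-bit integers, the match score is the complementary-bit count, fitness is the average score against a random antigen sample, parents are chosen by binary tournament, and the population sizes, sample size and mutation rate are hypothetical values.

```python
import random

BITS = 32  # antibody and antigen string length, as in the binary model


def match_score(ab: int, ag: int) -> int:
    """Complementary-bit count between two BITS-long strings (an assumed rule)."""
    return bin(ab ^ ag).count("1")


def fitness(ab: int, antigens: list, sample_size: int, rng: random.Random) -> float:
    """Average match of one antibody against a random fixed-size antigen sample."""
    sample = rng.sample(antigens, sample_size)
    return sum(match_score(ab, ag) for ag in sample) / sample_size


def mutate(ab: int, rate: float, rng: random.Random) -> int:
    """Flip each bit independently with probability `rate`; there is no crossover."""
    for bit in range(BITS):
        if rng.random() < rate:
            ab ^= 1 << bit
    return ab


def evolve(antibodies, antigens, generations=50, sample_size=5, rate=0.02, seed=0):
    """Mutation-only GA with binary tournament selection (an assumed scheme)."""
    rng = random.Random(seed)
    pop = list(antibodies)
    for _ in range(generations):
        scores = [fitness(ab, antigens, sample_size, rng) for ab in pop]
        next_pop = []
        for _ in range(len(pop)):
            # The fitter of two randomly chosen antibodies produces a mutated child.
            i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
            parent = pop[i] if scores[i] >= scores[j] else pop[j]
            next_pop.append(mutate(parent, rate, rng))
        pop = next_pop
    return pop


data_rng = random.Random(42)
# A biased antigen population: noisy variants of one shared (hypothetical) pattern,
# standing in for a genetic commonality such as a bacterial coating.
coating = data_rng.getrandbits(BITS)
antigens = [coating ^ (data_rng.getrandbits(BITS) & data_rng.getrandbits(BITS)
                       & data_rng.getrandbits(BITS)) for _ in range(40)]
antibodies = [data_rng.getrandbits(BITS) for _ in range(20)]
evolved = evolve(antibodies, antigens)
```

Because the antigens share a common pattern, selection pushes the antibodies toward its complement; with uniformly random antigens every antibody would score about 16 on average and there would be nothing to select for.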
Establishing Antibody Fitness
[Figure: fitness computed against a random sub-population]
GA Evolution of Antibodies [1]
[Figure: GA evolution of antibody bit strings]
Use of a Negative Selection Algorithm for Clonal Selection [5, 2]
- Want explicit self-filtering (tolerization)
- Algorithm:
  - Generate the set S of self (sub)strings
  - Generate a set R0 of random strings
  - Match each string from R0 against S:
    - Match (non-complementary) on at least r contiguous locations: reject
    - No match: add the string to the detector set R
- How to generate detectors efficiently is an issue
- Match the detector set against target strings to detect intruders
- Strings can be on any alphabet
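The algorithm above can be sketched with the r-contiguous matching rule on binary strings. This is a minimal sketch: the self strings, string length, r and detector count are hypothetical, and the cap on candidate draws reflects the point that naive generate-and-test can be expensive.

```python
import random


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if equal-length strings a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n, length, r, alphabet="01", seed=0, max_tries=100_000):
    """Negative selection: keep random candidates (R0) that match no self string (S)."""
    rng = random.Random(seed)
    detectors = []
    tries = 0
    while len(detectors) < n and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)  # no match with self: add to R
    return detectors


def is_nonself(target: str, detectors, r: int) -> bool:
    """A target string is flagged as non-self if any detector matches it."""
    return any(r_contiguous_match(target, d, r) for d in detectors)


self_set = ["00110101", "00111100", "01010011"]
detectors = generate_detectors(self_set, n=10, length=8, r=3)
print(is_nonself("11001010", detectors, r=3))
```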
Negative Selection Algorithm [2]
[Figure: example matches, Ab1 on 10xx and Ab4 on xx00]
The Problem of Holes [6]
- For a particular choice of matching rule and Ab repertoire, some non-self strings may not be detected, leaving a hole in the coverage space
- Here s1 and s2 are self strings and h1 and h2 are non-self strings. A detector that matches any r contiguous bits in h1 also matches s1 or s2 on those same positions, so negative selection discards it. The same holds for h2. So h1 and h2 are undetectable.
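The argument can be checked by brute force on a tiny example. The sketch assumes binary strings of length l = 4, r = 2 and the r-contiguous rule; the self strings are made up. For s1 = 0110 and s2 = 1111, the string 0111 is a hole: its 2-bit windows at positions 0 and 1 (01 and 11) occur in s1 at those positions, and its window at position 2 (11) occurs in s2, so every detector matching it also matches self. Exhaustive enumeration like this is only feasible for very short strings.

```python
from itertools import product


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if equal-length strings a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def find_holes(self_set, length, r):
    """List non-self strings that no self-tolerant detector can match."""
    universe = ["".join(bits) for bits in product("01", repeat=length)]
    # Every possible detector that would survive negative selection.
    valid = [d for d in universe if not any(r_contiguous_match(d, s, r) for s in self_set)]
    return [x for x in universe
            if x not in self_set and not any(r_contiguous_match(x, d, r) for d in valid)]


print(find_holes(["0110", "1111"], length=4, r=2))
```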
Major Application Categories of Immune System Theory
- Machine learning and pattern recognition: limited but promising work done to date
- Associative memories: limited work done to date
- Elimination of identified elements:
  - IS model: use the B cell and Tk cell to kill or disable viruses
  - Use a phagocyte analogue for cleanup and garbage collection
  - The IBM virus lab and Forrest's group at UNM have looked at this
- Recovery, repair and augmentation of identified elements:
  - IS model: use a B cell and Tk cell analogue to deliver a positive payload to an agent
  - Very little work done to date
Application Areas (cont.)
- Detection problems, where most of the work has occurred:
  - Fault: failure of a self element (industrial plant systems)
  - Change: any change in self (tumors)
  - Anomaly: unusual presentation of a self element
  - Virus: presence of a non-self element
  - Intrusion: attempt to gain access by a non-self element
- Many of the classical issues of computer and network security involve some element of detection or self/non-self discrimination
The Intrusion Detection Problem
- Two classical types of intrusion detection systems (IDS):
  - Host-based: the domain is a single machine, possibly on a network
  - Network-based: the domain is a network of hosts
- Two classes of problem:
  - Anomaly detection: deviations from normal local resource use and network traffic
  - Misuse detection: usage identified with known system vulnerabilities and security policies
Essential Requirements of a Network-based IDS
- Robustness: to host failures and noisy signals (anomalous behavior)
- Easy (self-)configurability of hosts
- Easy extendibility to new hosts
- Scalability: extendible to large networks without degradation of performance
- Adaptability: dynamically able to recognize new anomalies
- Efficiency: simple and low-overhead operation
- Global analysis: able to correlate local events to form global patterns
Network Representation
- The problem is commonly represented as the connection events (not message content) between computers
- Kim and Bentley [4]:
  - Phenotype: 35 real-valued fields in four categories: connection identifier, port vulnerabilities, TCP handshaking, traffic intensity
  - Genotype: 35 genes. A detector gene has three nucleotides (cluster number in (0, 9), min offset, max offset). An antibody or antigen gene has a single real value.
  - Cluster and offset tables are established for each host at start-up
  - A matching function maps an Ag or Ab value to a cluster and takes the distance to the nearest cluster bound as the measure of similarity
  - Positive detection events are used to evolve the offsets for clusters
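The matching function described above can be sketched for a single field. Everything concrete here is an assumption for illustration: the cluster table holds ten hypothetical half-open intervals, out-of-range values go to the nearest end cluster, and a detector gene's min and max offsets are taken to shift its cluster's lower and upper bounds. The deck does not specify how [4] combines offsets with cluster bounds.

```python
from typing import List, Tuple

# Hypothetical cluster table for one field on one host: cluster number -> [lo, hi).
CLUSTERS: List[Tuple[float, float]] = [(10.0 * i, 10.0 * (i + 1)) for i in range(10)]


def cluster_of(value: float) -> int:
    """Map a field value to its cluster; out-of-range values go to the nearest end."""
    for number, (lo, hi) in enumerate(CLUSTERS):
        if lo <= value < hi:
            return number
    return 0 if value < CLUSTERS[0][0] else len(CLUSTERS) - 1


def similarity(value: float) -> float:
    """Distance from the value to the nearest bound of its cluster."""
    lo, hi = CLUSTERS[cluster_of(value)]
    return min(abs(value - lo), abs(value - hi))


def gene_matches(gene: Tuple[int, float, float], value: float) -> bool:
    """Assumed rule: a detector gene (cluster, min_offset, max_offset) matches a
    value lying inside its cluster's interval shifted by the two offsets."""
    number, min_offset, max_offset = gene
    lo, hi = CLUSTERS[number]
    return lo + min_offset <= value <= hi + max_offset


print(cluster_of(25.0), similarity(25.0), gene_matches((2, -2.0, 3.0), 25.0))  # prints 2 5.0 True
```

Evolving the offsets on positive detection events would then widen or narrow each detector's interval, which is what the new lower and upper bounds in the model figure refer to.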
Kim and Bentley Model [4]
[Figure: new lower and upper interval bounds for cluster 2]
Network Representation (cont.)
- Hofmeyr and Forrest [3]:
  - Phenotype: 3 integer fields (source IP address, destination IP address, service or port number)
  - Genotype: for a detector, a 49-bit binary string plus detector state
Algorithmic Refinements [3]
- Detectors may have a lifetime, at the end of which they are replaced if they have not matched; this maintains diversity
- An activation threshold, with time decay of the activation level, limits autoimmune reactions to rare self strings
- Local activation causes a message to be sent to other hosts, decreasing their activation levels (cytokine costimulation)
- The matching rule may result in holes in coverage. A randomly assigned permutation mask controlling packet presentation helps avoid this (analogous to MHC molecule diversity across hosts) and contributes to population diversity
- Each host has a unique detector set, contributing to diversity and self-protection across a population of hosts
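The lifetime and activation-threshold refinements can be sketched as per-detector state updated on every antigen presentation and every time step. This is a minimal illustration, not the implementation of [3]: the threshold, the multiplicative decay, the lifetime, the reset after an alarm and the random replacement of expired detectors are all assumptions, and a real system would tolerize replacements by negative selection before use.

```python
import random
from dataclasses import dataclass

# Hypothetical parameter values, chosen only for illustration.
ACTIVATION_THRESHOLD = 3.0  # accumulated matches needed to raise an alarm
DECAY = 0.5                 # fraction of the match count kept at each time step
LIFETIME = 5                # time steps a detector may live without matching


@dataclass
class Detector:
    bits: str
    match_count: float = 0.0
    age: int = 0
    ever_matched: bool = False

    def observe(self, matched: bool) -> bool:
        """Record one antigen presentation; return True if the detector activates."""
        if matched:
            self.match_count += 1.0
            self.ever_matched = True
        if self.match_count >= ACTIVATION_THRESHOLD:
            self.match_count = 0.0  # reset after an alarm (an assumption)
            return True
        return False

    def tick(self) -> None:
        """Advance one time step: age the detector and decay its activation level."""
        self.age += 1
        self.match_count *= DECAY

    def expired(self) -> bool:
        """A detector that reaches its lifetime without ever matching is retired."""
        return self.age >= LIFETIME and not self.ever_matched


def step(detectors: list, rng: random.Random, length: int = 16) -> None:
    """Age all detectors and replace expired ones with fresh random strings."""
    for i, d in enumerate(detectors):
        d.tick()
        if d.expired():
            detectors[i] = Detector("".join(rng.choice("01") for _ in range(length)))


# Three matches in quick succession activate a detector...
burst = Detector("0" * 16)
burst_alarms = [burst.observe(True) for _ in range(3)]

# ...but the same three matches spread over time decay away without an alarm.
rare = Detector("0" * 16)
rare_alarms = []
for _ in range(3):
    rare_alarms.append(rare.observe(True))
    rare.tick()

# A detector that never matches is replaced once its lifetime is up.
pool = [Detector("1" * 16)]
rng = random.Random(0)
for _ in range(LIFETIME):
    step(pool, rng)
```

The decay is what keeps occasional matches against rare self strings from ever reaching the threshold, while a sustained burst of matches still does.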
The Hofmeyr and Forrest Model [3]
[Figure: antigen phenotype, host-based refinement fields, detector state]
Problem Posed by Computer Applications
- The repertoire of human self proteins is fixed over a lifetime
- In networks, valid hosts are added and deleted without notice, so what counts as self is constantly changing
- Among a fixed set of hosts, valid usage patterns may change without notice
- One solution: costimulation by a trusted (human) authority, both at start-up and during subsequent operation
- Much work remains to be done
Advantages of IS Models
- Adaptability through the ability to recognize foreign patterns never before encountered
- Distributed detection contributes to:
  - Diversity (shape space coverage)
  - Robustness (failure of individual hosts)
  - Scalability and extendibility
- Quick response to new variants of old attacks
- Ability to reproduce detectors of increasing fitness while self-regulating the overall population
IS as a Swarm System
- The IS model has a number of characteristics in common with swarm systems:
  - Large populations of independent agents of characterizable classes
  - Each agent has at most a very few simple characteristic behaviors:
    - Bind with another appropriate agent and activate (B and T cells)
    - Kill something (killer T cell)
    - Clone myself (B cell)
    - Secrete a signaling chemical or an antibody (T and B cells)
    - Live for a long time (memory B and T cells)
  - Simple interactions with the environment:
    - Special things that happen in lymphoid organs
    - Secreting signal chemicals that alter environmental properties (cytokines and inflammation)
  - Self-organizing as an emergent property
  - No centralized control over the system
Areas for Additional Research
- Matching rules with good computational properties, perhaps application-specific ones
- Self/non-self representation and encoding
- Algorithms for generating detector sets
- Other selection algorithms
- Incorporation of additional IS characteristics
- Evolution, dynamics and emergent properties of detector set populations at the species level
References
- [1] Forrest, Smith, Javornik and Perelson. Using Genetic Algorithms to Explore Pattern Recognition in the Immune System. Evolutionary Computation, 1(3):191-211, 1993.
- [2] Forrest, Allen, Perelson and Cherukuri. Self-Nonself Discrimination in a Computer. Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, Los Alamitos, CA, 1994.
- [3] Hofmeyr and Forrest. Immunity by Design: An Artificial Immune System. In Proceedings of the 1999 GECCO Conference, 1999.
- [4] Kim and Bentley. Negative Selection and Niching by an Artificial Immune System for Network Intrusion Detection. In Late Breaking Papers at the 1999 Genetic and Evolutionary Computation Conference, Orlando, Florida, 1999.
- [5] Forrest, Allen, Perelson and Cherukuri. A Change-Detection Algorithm Inspired by the Immune System. Submitted to IEEE Transactions on Software Engineering, 1995.
- [6] D'haeseleer, Forrest and Helman. An Immunological Approach to Change Detection: Algorithms, Analysis, and Implications. Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy, 1996.