Immune System Metaphors Applied to Intrusion Detection and Related Problems

Author: Ian Nunn. Last modified by: Tony White. Created: 10/26/2002 7:20:42 PM.


1
Immune System Metaphors Applied to Intrusion
Detection and Related Problems
  • by Ian Nunn, SCS, Carleton University
  • inunn_at_digitaldoor.net

2
Overview of Presentation
  • Review of immune system properties of most
    interest
  • Algorithm design and the representation of
    application domains
  • Examples of two recognition algorithms
  • Overview of application areas
  • Focus on intrusion detection systems (IDS)
  • Advantages of IS models and future research
  • The IS model as a swarm system

3
Immune System Characteristics of Interest
  • The human immune system (IS) is a system of
    detectors (principally B and T cells) that:
      • After initial negative selection (tolerization),
        does not recognize elements of the body (self)
      • Is adaptable in that it can recognize, over
        time, any foreign element (non-self), including
        those never before encountered
      • Remembers previous encounters with foreign
        elements
      • Dynamically regenerates its elements
      • Regulates the population size and diversity of
        its elements
      • Is robust to input signal noise (recognition
        region) and detector loss
      • Is distributed in nature, with no central or
        hierarchical control
      • Is error tolerant in that self-recognition does
        not halt the system
      • Is self-protecting since it is part of self

4
Representation of Self/Non-Self
  • IS elements involved are cellular proteins and
    their peptide sequences

5
IS Application Algorithm Design
  • Requires a deep understanding of the problem
    domain
  • Self/non-self discrimination is the fundamental
    IS principle
  • Steps in designing an IS algorithm:
      • Identification of features allowing correct and
        complete self/non-self discrimination
      • Representation or encoding of features,
        particularly of continuous real-valued
        parameters. Antibody (Ab) and antigen (Ag)
        feature strings of the same length facilitate
        algorithm performance analysis
      • Determination of a matching or fitness function,
        which is important for the evolution of Ab
        populations (affinity maturation)
      • Selection of IS principles to apply, e.g.
        negative selection, costimulation, affinity
        maturation

This is hard stuff and an important step in
applying any modeling technique whether genetic
algorithms or swarm simulations (recall for army
ants the problem of deciding what parameters to
assign to the ants and to the environment and
what values to allow).
6
Approach to Feature Selection and Representation
  • Antibodies and antigens are represented by
    strings of features
  • The set of actual values observed, such as sensor
    readings, voltages or ASCII text, is called the
    application's phenotype
  • The coded representation is called the
    application's genotype
  • A feature is encoded by symbols from a finite
    alphabet
  • Some application feature domains:
      • Binary variables: digital signals in computer
        systems
      • Discrete real variables: ASCII character text
      • Continuous real variables: real-world sensors
  • Continuous domains must be mapped onto discrete
    domains, since we work with finite alphabets to
    ensure finite Ab/Ag population spaces
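
As a rough illustration of mapping a continuous domain onto a finite alphabet, the sketch below bins a real-valued reading into equal-width intervals and emits one symbol per interval. The 8-symbol alphabet, the equal-width binning and the function names `quantize` and `encode` are assumptions made for this example; the slides do not prescribe a particular encoding.

```python
import string

# Hypothetical 8-symbol alphabet "ABCDEFGH"; any finite alphabet would do.
ALPHABET = string.ascii_uppercase[:8]


def quantize(value, lo, hi, alphabet=ALPHABET):
    """Map a continuous reading in [lo, hi] onto one symbol of a finite alphabet."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Clamp out-of-range readings so every input maps to some symbol.
    clamped = min(max(value, lo), hi)
    width = (hi - lo) / len(alphabet)
    # A reading equal to hi would index one past the end, so cap the index.
    index = min(int((clamped - lo) / width), len(alphabet) - 1)
    return alphabet[index]


def encode(readings, lo, hi, alphabet=ALPHABET):
    """Encode a sequence of readings (phenotype) as a symbol string (genotype)."""
    return "".join(quantize(v, lo, hi, alphabet) for v in readings)


print(encode([0.0, 5.0, 10.0, 12.0], 0.0, 10.0))
```

Equal-width bins are the simplest choice. A real application might prefer bins chosen from the observed distribution of self data.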

7
Phenotype Representation: Change Detection
Problem Domains
  • OS (UNIX) processes: sequences of top-level
    system calls
  • Program execution: alphabet symbols represent op
    codes
  • File system: reduction to ASCII or binary strings
  • User behavior and interface use: keystrokes,
    mouse clicks
  • Time series data: representation of a physical
    (plant) process, e.g. x/y position of a milling
    machine tool
  • Memory accesses: memory address calls
  • Local network traffic: TCP/IP packets, addresses,
    ports
  • Network traffic through routers and gateways:
    TCP/IP packets, addresses, volume

8
IS Phenotype Encoding and Matching Using a Binary
Model [1]
  • Genotype and phenotype: 32-bit string on a binary
    alphabet
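
The figure for this slide did not survive in the transcript, so the matching rule is not shown. One plausible reading, sketched here as an assumption rather than as the model's definition, is that the match score between a 32-bit antibody and a 32-bit antigen counts the bit positions where the two are complementary. The name `complementary_match` is hypothetical.

```python
BITS = 32  # string length given on the slide


def complementary_match(ab, ag, bits=BITS):
    """Count positions where antibody and antigen bits differ (are complementary).

    Assumption: a higher count means a stronger match.
    """
    mask = (1 << bits) - 1
    # XOR sets a 1 wherever the two strings hold opposite bits.
    return bin((ab ^ ag) & mask).count("1")


print(complementary_match(0b1010, 0b0101, bits=4))
```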

9
Example Use of a Binary Model with a GA for
Clonal Selection [1]
  • Start with randomly generated Ag and possibly
    incomplete Ab populations
  • For each Ab in turn, compute its average match
    (fitness) with a random fixed-size Ag
    subpopulation
  • Use a standard GA with mutation but no crossover
    to evolve successively better generations of
    antibodies
  • Niches are observed to develop in coverage space
    for genetic commonalities (e.g. a bacterial
    polysaccharide coating) if the initial
    populations have a bias
  • Self-recognition is minimized (without negative
    selection) by selecting for more Ag-specific
    antibodies, which are less likely to match self
    than more general ones
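
The loop described above can be sketched as follows. This is a minimal illustration, not the procedure of reference 1: the complementary-bit match score, the truncation selection that keeps the better half of the population, and all parameter values (`generations`, `sample_size`, `rate`) are assumptions chosen for the example.

```python
import random

BITS = 32


def match(ab, ag):
    """Assumed match score: number of complementary bit positions."""
    return bin(ab ^ ag).count("1")


def fitness(ab, antigens, sample_size, rng):
    """Average match of one antibody against a random fixed-size Ag sample."""
    sample = rng.sample(antigens, sample_size)
    return sum(match(ab, ag) for ag in sample) / sample_size


def mutate(ab, rate, rng):
    """Flip each bit independently with probability `rate` (no crossover)."""
    for i in range(BITS):
        if rng.random() < rate:
            ab ^= 1 << i
    return ab


def evolve(antibodies, antigens, generations=50, sample_size=5, rate=0.02, seed=0):
    """Mutation-only GA: keep the fitter half, refill with mutated copies."""
    rng = random.Random(seed)
    population = list(antibodies)
    for _ in range(generations):
        scored = sorted(
            population,
            key=lambda ab: fitness(ab, antigens, sample_size, rng),
            reverse=True,
        )
        half = (len(scored) + 1) // 2
        parents = scored[:half]
        children = [mutate(p, rate, rng) for p in parents[: len(scored) - half]]
        population = parents + children
    return population
```

Because fitness is re-sampled every generation, an antibody that does well against many different antigen samples is favored over one that matches only a few antigens.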

10
Establishing Antibody Fitness
[Figure: antibody fitness evaluated against a random Ag sub-population]
11
GA Evolution of Antibodies [1]
[Figure: evolution of antibody bit strings; example string 1011010111110111]
12
Use of a Negative Selection Algorithm for Clonal
Selection [5, 2]
  • Want explicit self-filtering (tolerization)
  • Algorithm:
      • Generate the set S of self (sub)strings
      • Generate a set R0 of random strings
      • Match each string from R0 against S
          • Match (non-complementary) on at least r
            contiguous locations: reject
          • No match: add string to detector set R
  • How to generate detectors efficiently is an issue
  • Match detector set against target strings to
    detect an intruder
  • Strings can be on any alphabet
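
The steps above can be sketched directly. This is the naive generate-and-test version, which is exactly the approach whose efficiency the slide flags as an issue. The function names, the `max_tries` cap and the example self set are assumptions for illustration.

```python
import random


def r_contiguous_match(a, b, r):
    """True if equal-length strings a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n_detectors, length, r,
                       alphabet="01", seed=0, max_tries=100_000):
    """Keep random candidate strings that match no self string (negative selection)."""
    rng = random.Random(seed)
    detectors = set()
    for _ in range(max_tries):
        if len(detectors) >= n_detectors:
            break
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        # Reject any candidate that matches self; keep the rest as detectors.
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.add(candidate)
    return detectors


def is_nonself(target, detectors, r):
    """Flag a target string as an intruder if any detector matches it."""
    return any(r_contiguous_match(d, target, r) for d in detectors)


self_set = {"00000000", "11110000"}
detectors = generate_detectors(self_set, n_detectors=5, length=8, r=4)
print(sorted(detectors))
```

By construction no self string is ever flagged, since the match rule is symmetric. Whether a given non-self string is flagged depends on the detector set that happened to be generated.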

13
Negative Selection Algorithm [2]
[Figure: example detector matches; Ab1 matches on 10xx, Ab4 matches on xx00]
14
The Problem of Holes [6]
  • For a particular choice of matching rule and Ab
    repertoire, some non-self strings may go
    undetected, leaving a hole in the coverage space
  • A detector that matches any r contiguous bits in
    the non-self string h1 will also match one of the
    self strings s1 or s2 at the same positions, so
    it is rejected during negative selection. The
    same holds for h2. So h1 and h2 are undetectable.
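
A small enumeration makes the idea concrete. The sketch below lists non-self strings for which every length-r window coincides with the window of some self string at the same position; no detector that survived negative selection can match such a string under the r-contiguous rule. The self strings used here are made up for the example and are not the s1, s2 of the lost figure.

```python
from itertools import product


def find_holes(self_set, length, r, alphabet="01"):
    """Non-self strings whose every r-window equals a self window at that position.

    This is a sufficient condition for being undetectable, so the result
    may not list every undetectable string.
    """
    positions = range(length - r + 1)
    # For each window position, the set of windows seen in self at that position.
    self_windows = [{s[i:i + r] for s in self_set} for i in positions]
    holes = []
    for chars in product(alphabet, repeat=length):
        h = "".join(chars)
        if h in self_set:
            continue
        if all(h[i:i + r] in self_windows[i] for i in positions):
            holes.append(h)
    return holes


# Hypothetical self set with length 4 and r = 3.
print(find_holes({"0001", "1000"}, length=4, r=3))
```

With this self set the holes are 0000 and 1001: each is stitched together from windows that already occur in self.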

15
Major Application Categories of Immune System
Theory
  • Machine learning and pattern recognition: limited
    but promising work done to date
  • Associative memories: limited work done to date
  • Elimination of identified elements
      • IS model: use the B cell and killer T (Tk) cell
        to kill or disable viruses
      • Use a phagocyte analogue for cleanup and garbage
        collection
      • The IBM virus lab and Forrest's group at UNM
        have looked at this
  • Recovery, repair and augmentation of identified
    elements
      • IS model: use the B cell and Tk cell analogue to
        deliver a positive payload to an agent
      • Very little work done to date

16
Application Areas (cont.)
  • Detection problems: where most of the work has
    occurred
      • Fault: failure of a self element (industrial
        plant systems)
      • Change: any change in self (tumors)
      • Anomaly: unusual presentation of a self element
      • Virus: presence of a non-self element
      • Intrusion: attempt to gain access by a non-self
        element
  • Many of the classical issues of computer and
    network security involve some element of
    detection or self/non-self discrimination

17
The Intrusion Detection Problem
  • Two classical types of intrusion detection
    systems (IDS):
      • Host-based: domain is a single machine, possibly
        on a network
      • Network-based: domain is a network of hosts
  • Two classes of problem:
      • Anomaly detection: deviations from normal local
        resource use and network traffic
      • Misuse detection: usage identified with known
        system vulnerabilities and security policies

18
Essential Requirements of a Network-based IDS
  • Robustness to host failures and noisy signals
    (anomalous behavior)
  • Easy (self-)configurability of hosts
  • Easy extendibility to new hosts
  • Scalability: extendible to large networks without
    degradation of performance
  • Adaptability: dynamically able to recognize new
    anomalies
  • Efficiency: simple and low-overhead operation
  • Global analysis: able to correlate local events
    to form global patterns

19
Network Representation
  • Commonly represent the problem as the connection
    events (not message content) between computers
  • Kim and Bentley [4]
      • Phenotype: 35 real-valued fields in four
        categories (connection identifier, port
        vulnerabilities, TCP handshaking, traffic
        intensity)
      • Genotype: 35 genes. A detector gene has three
        nucleotides (cluster number in (0, 9), min
        offset, max offset). An antibody or antigen has a
        single real value.
      • Cluster and offset tables are established for
        each host at start
      • A matching function maps an Ag or Ab value to a
        cluster and takes the distance to the nearest
        cluster bound as the measure of similarity
      • Positive detection events are used to evolve the
        offsets for clusters
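
The matching function is described only in one line on the slide, so the following is a loose reading of it rather than Kim and Bentley's actual method. It assumes a per-field cluster table given as sorted interval boundaries, maps a real value to the cluster whose interval contains it, and returns the distance to the nearer bound of that cluster. The table `BOUNDS` and both function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical cluster table for one field: consecutive boundaries delimit
# clusters 0 .. len(BOUNDS) - 2.
BOUNDS = [0.0, 10.0, 25.0, 50.0, 100.0]


def cluster_of(value, bounds=BOUNDS):
    """Index of the cluster whose interval contains value (clamped at the ends)."""
    idx = bisect_right(bounds, value) - 1
    return min(max(idx, 0), len(bounds) - 2)


def distance_to_nearest_bound(value, bounds=BOUNDS):
    """Distance from value to the nearer bound of its cluster."""
    c = cluster_of(value, bounds)
    lo, hi = bounds[c], bounds[c + 1]
    return min(abs(value - lo), abs(hi - value))


print(cluster_of(12.0), distance_to_nearest_bound(12.0))
```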

20
Kim and Bentley Model [4]
[Figure: cluster intervals, with new lower and upper interval bounds shown for cluster 2]
21
Network Representation (cont.)
  • Hofmeyr and Forrest [3]
      • Phenotype: 3 integer fields (source IP address,
        destination IP address, service or port number)
      • Genotype: for a detector, a 49-bit binary string
        plus state

22
Algorithmic Refinements [3]
  • Detectors may have a lifetime, at the end of which
    they are replaced if they have not matched, to
    maintain diversity
  • An activation threshold and a time decay on the
    activation level deter autoimmune reaction to
    rare self strings
  • Local activation causes a message to be sent to
    other hosts, decreasing their activation levels
    (cytokine costimulation)
  • The matching rule may result in holes in coverage.
    A randomly assigned permutation mask that controls
    packet presentation helps avoid this (analogous
    to MHC molecule diversity across hosts) and
    contributes to population diversity.
  • Each host has a unique detector set, contributing
    to diversity and self-protection across a
    population of hosts
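
The first two refinements (finite lifetime, and an activation threshold with time decay) can be sketched as a small state machine per detector. The class below is an illustration only: the field names, default values and the rule that each match adds one unit of activation are assumptions, not details taken from reference 3.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    pattern: str
    lifetime: int = 100       # assumed: steps before an unmatched detector is replaced
    threshold: float = 3.0    # assumed: activation needed to raise an alarm
    decay: float = 0.5        # assumed: activation lost per time step
    age: int = 0
    activation: float = 0.0
    ever_matched: bool = False

    def step(self, matched):
        """Advance one time step; return True if the detector raises an alarm."""
        self.age += 1
        # Activation leaks away over time, so isolated matches fade out.
        self.activation = max(0.0, self.activation - self.decay)
        if matched:
            self.activation += 1.0
            self.ever_matched = True
        return self.activation >= self.threshold

    def expired(self):
        """True once the lifetime has passed without any match."""
        return self.age >= self.lifetime and not self.ever_matched


d = Detector("1011")
print([d.step(True) for _ in range(5)])
```

With these values a burst of matches on consecutive steps reaches the threshold, while a match every third step decays away before it can accumulate. That is the intended effect for rare self strings.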

23
The Hofmeyr and Forrest Model [3]
[Figure: antigen phenotype, host-based refinement fields, and detector state]
24
Problem Posed by Computer Applications
  • The repertoire of human self proteins is fixed
    over a lifetime
  • In networks, valid hosts are added and deleted
    without notice, so what constitutes self is
    constantly changing
  • Among a fixed set of hosts, valid usage patterns
    may change without notice
  • One solution: costimulation by a trusted (human)
    authority, both at start and during subsequent
    operation
  • Much work needs to be done

25
Advantages of IS Models
  • Adaptability through the ability to recognize
    foreign patterns never before encountered
  • Distributed detection contributes to:
      • Diversity (shape space coverage)
      • Robustness (failure of individual hosts)
      • Scalability and extendibility
  • Quick response to new variants of old attacks
  • Ability to reproduce detectors of increasing
    fitness while self-regulating the overall
    population

26
IS as a Swarm System
  • The IS model has a number of characteristics in
    common with swarm systems:
      • Large populations of independent agents of
        characterizable classes
      • Each agent has at most a very few characteristic
        simple behaviors
          • Bind with another appropriate agent and
            activate (B and T cells)
          • Kill something (killer T cell)
          • Clone myself (B cell)
          • Secrete a signaling chemical or an antibody
            (T and B cells)
          • Live for a long time (memory B and T cells)
      • Simple interactions with the environment
          • Special things that happen in lymphoid organs
          • Secreting signal chemicals which alter
            environmental properties (cytokines and
            inflammation)
      • Self-organizing as an emergent property
      • No centralized control over the system

27
Areas for Additional Research
  • Matching rules with good computational
    properties, perhaps application specific ones
  • Self/non-self representation and encoding
  • Algorithms for generating detector sets
  • Other selection algorithms
  • Incorporation of additional IS characteristics
  • Detector set populations: evolution, dynamics and
    emergent properties at the species level

28
References
  1. Forrest, Smith, Javornik and Perelson. Using
    Genetic Algorithms to Explore Pattern Recognition
    in the Immune System. Evolutionary Computation,
    1(3):191-211, 1993.
  2. Forrest, Allen, Perelson and Cherukuri.
    Self-Nonself Discrimination in a Computer.
    Proceedings of the 1994 IEEE Symposium on Research
    in Security and Privacy, Los Alamitos CA, 1994.
  3. Hofmeyr and Forrest. Immunity by Design: An
    Artificial Immune System. In Proceedings of the
    1999 GECCO Conference, 1999.
  4. Kim and Bentley. Negative selection and niching
    by an artificial immune system for network
    intrusion detection. In Late Breaking Papers at
    the 1999 Genetic and Evolutionary Computation
    Conference, Orlando, Florida, 1999.

29
References (cont.)
  5. Forrest, Allen, Perelson and Cherukuri. A
    Change-Detection Algorithm Inspired by the Immune
    System. Submitted to IEEE Transactions on
    Software Engineering, 1995.
  6. D'haeseleer, Forrest and Helman. An Immunological
    Approach to Change Detection: Algorithms,
    Analysis, and Implications. Proceedings of the
    1996 IEEE Symposium on Research in Security and
    Privacy, 1996.