RABA's Red Team Assessments - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: RABA's Red Team Assessments


1
RABA's Red Team Assessments
QuickSilver
  • 14 December 2005

2
Agenda
  • Tasking for this talk
  • Projects Evaluated
  • Approach / Methodology
  • Lessons Learned and Validations Achieved
  • The Assessments
  • General Strengths / Weaknesses
  • AWDRAT (MIT)
  • Success Criteria
  • Assessment Strategy
  • Strengths / Weaknesses
  • LRTSS (MIT)
  • QuickSilver / Ricochet (Cornell)
  • Steward (JHU)

3
The Tasking
  • "Lee would like a presentation from the Red Team
    perspective on the experiments you've been
    involved with. He's interested in a talk that's
    heavy on lessons learned and benefits gained.
    Also of interest would be red team thoughts on
    strengths and weaknesses of the technologies
    involved. Keeping in mind that no rebuttal would
    be able to take place beforehand, controversial
    observations should be either generalized (i.e.,
    false positives as a problem across several
    projects) or left to the final report."
  • -- John Frank e-mail (November 28, 2005)

4
Specific Teams We Evaluated
  • Architectural-Differencing, Wrappers, Diagnosis,
    Recover, Adaptive Software and Trust Management
    (AWDRAT)
  • October 18-19, 2005
  • MIT
  • Learning and Repair Techniques for Self-Healing
    Systems (LRTSS)
  • October 25, 2005
  • MIT
  • QuickSilver / Ricochet
  • November 8, 2005
  • Cornell University
  • Steward
  • December 9, 2005
  • JHU

5
Basic Methodology
  • Planning
  • Present High Level Plan at July PI Meeting
  • Interact with White Team to schedule
  • Prepare Project Overview
  • Prepare Assessment Plan
  • Coordinate with Blue Team and White Team
  • Learning
  • Study documentation provided by team
  • Conference Calls
  • Visit with Blue Team day prior to assessment
  • Use system, examine output, gather data
  • Test
  • Formal De-Brief at end of Test Day

6
Lessons Learned (and VALIDATIONS achieved)
7
Validation / Lessons Learned
  • Consistent Discontinuity of Expectations
  • Scope of the Assessment Success Criteria
  • Boiling it down to Red Team Wins or Blue Team
    Wins on each test required significant clarity
  • Unique to these assessments because the metrics
    were unique
  • Lee/John instituted an assessment scope
    conference call halfway through the series
  • We think that helped a lot
  • Scope of Protection for the systems
  • Performers' Assumptions vs. Red Team's
    Expectations
  • In all cases, we wanted to see a more holistic
    approach to the security model
  • We assert each program needs to define its
    security policy
  • And especially document what it assumes will be
    protected / provided by other components or
    systems

8
Lessons Learned: Scope of Protection
9
Validation / Lessons Learned
  • More time would have helped A LOT
  • Longer Test Period (2-3 day test vice 1 day test)
  • Having an evening to digest then return to test
    would have allowed more effective additional
    testing and insight
  • We planned an extra 1.5 days for most, and that
    was very helpful
  • We weren't rushing to get on an airplane
  • We could reduce the data and come back for
    clarifications if needed
  • We could defer non-controversial tests to the
    next day to allow focus with Government present
  • More Communication with Performers
  • Pre-Test Site/Team Visit (2-3 weeks prior to
    test)
  • Significant help in preparing testing approach
  • The half-day that we implemented before the test
    was crucial for us
  • More conference calls would have helped, too
  • Hard to balance against the performers' main
    focus, though

10
Validation / Lessons Learned
  • A Series of Tests Might Be Better
  • Perhaps one day of tests similar to what we did
  • Then a follow-up test a month or two later as
    prototypes matured
  • With the same test team to leverage understanding
    of system gained
  • We Underestimated the Effort in Our Bid
  • Systems were more specialized and complex than
    we anticipated
  • 20-25 more hours would have helped us a lot in
    data reduction
  • Multi-talented team proved vital to success
  • We had programming (multi-lingual), traditional
    red team, computer security, systems engineering,
    OS, system admin, network engineering, etc.
    talent present for each test
  • Highly tailored approach proved appropriate and
    necessary
  • Using more traditional network-oriented Red Team
    Assessment approach would have failed

11
The Assessments
12
Overall Strengths / Weaknesses of Projects
  • Strengths
  • Teams worked hard to support our assessments
  • The technologies are exciting and powerful
  • Weaknesses
  • Most Suffered a Lack of System Documentation
  • We understand there is a balance to strike --
    these are, after all, essentially research
    prototypes
  • This really limited our ability to prepare for
    the assessments
  • All are Prototypes -- stability needed for
    deterministic test results
  • All provide incomplete security / protection
    almost by definition
  • Most Suffered a Lack of Configuration Management
    / Control
  • Test Harnesses were far from optimal for Red
    Team use
  • Of course, they are oriented around supporting
    the development
  • But we were fairly limited in using other tools
    due to the uniqueness of the technologies

13
AWDRAT Assessment
  • October 18-19, 2005

14
Success Criteria
AWDRAT (MIT)
  • The target application can successfully and/or
    correctly perform its mission
  • The AWDRAT system can
  • detect an attacked client's misbehavior
  • interrupt a misbehaving client
  • reconstitute a misbehaving client in such a way
    that the reconstituted client is not vulnerable
    to the attack in question
  • The AWDRAT system must (see the scoring sketch
    below)
  • Detect / Diagnose at least 10 of the
    attacks/root causes
  • Take effective corrective action on at least 5
    of the successfully identified compromises/attacks
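
A minimal sketch of how these thresholds could be
scored from per-test results. The record format and
example entries are hypothetical; the official scoring
was produced by the White Team.

    # Hypothetical per-test records: did AWDRAT detect/diagnose the attack,
    # and did it then take effective corrective action?
    tests = [
        {"name": "buffer-overflow-1", "detected": True,  "corrected": True},
        {"name": "corrupt-inject-1",  "detected": True,  "corrected": False},
        {"name": "false-neg-probe-1", "detected": False, "corrected": False},
        # ... one record per Red Team test
    ]

    detected = [t for t in tests if t["detected"]]
    corrected = [t for t in detected if t["corrected"]]

    # Thresholds from this slide: detect/diagnose at least 10 of the
    # attacks/root causes, and correct at least 5 of those detected.
    print(f"detected={len(detected)} corrected={len(corrected)} "
          f"meets_criteria={len(detected) >= 10 and len(corrected) >= 5}")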

15
Assessment Strategy
AWDRAT (MIT)
  • Denial of Service
  • aimed at disabling or significantly modifying the
    operation of the application to an extent that
    mission objectives cannot be accomplished
  • attacks using buffer-overflow and corrupted data
    injection to gain system access
  • False Negative Attacks
  • a situation in which a system fails to report an
    occurrence of anomalous or malicious behavior
  • Red Team hoped to perform actions that would fall
    "under the radar". We targeted the modules of
    AWDRAT that support diagnosis and detection.
  • False Positive Attacks
  • system reports an occurrence of malicious
    behavior when the activity detected was
    non-malicious
  • Red Team sought to perform actions that would
    excite AWDRAT's monitors. Specifically, we
    targeted the modules supporting diagnosis and
    detection.
  • State Disruption Attacks
  • interrupt or disrupt AWDRAT's ability to maintain
    its internal state machines
  • Recovery Attacks
  • disrupt attempts to recover or regenerate a
    misbehaving client
  • target the Adaptive Software and Recovery and
    Regeneration modules in an attempt to allow a
    misbehaving client to continue operating

16
Strengths / Weaknesses
AWDRAT (MIT)
  • Strengths
  • With a reconsideration of the system's scope of
    responsibility, we anticipate the system would
    have performed far better in the tests
  • We see great power in the concept of wrapping all
    the functions
  • Weaknesses
  • Scope of Responsibility / Protection far too
    Limited
  • Need to Develop Full Security Policy
  • Single points of failure
  • Application-Specific Limitations
  • Application Model Issues
  • Incomplete by design?
  • Manually Created
  • Limited Scope
  • Doesn't really enforce multi-layered defense

17
LRTSS Assessment
  • October 25, 2005

18
Success Criteria
LRTSS (MIT)
  • The instrumented Freeciv server does not core
    dump under a condition in which the
    uninstrumented Freeciv server does core dump
  • The LRTSS system can
  • Detect a corruption in a data structure that
    causes an uninstrumented Freeciv server to exit
  • Repair the data corruption in such a way that the
    instrumented Freeciv server can continue running
    (see the sketch after this list)
  • The LRTSS system must
  • Detect / Diagnose at least 10 of the
    attacks/root causes
  • Take effective corrective action on at least 5
    of the successfully identified compromises/attacks
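
A minimal sketch of the detect-and-repair idea these
criteria describe: check a consistency property over a
piece of game state and, when it is violated, repair
the data so the server can keep running instead of
crashing. The structure and the consistency property
here are hypothetical, not LRTSS's actual model.

    # Hypothetical slice of Freeciv-like game state: a unit list plus a
    # cached count that is supposed to stay consistent with it.
    class GameState:
        def __init__(self, units):
            self.units = list(units)
            self.unit_count = len(self.units)   # field a corruption may clobber

    def check_and_repair(state):
        """Detect violated consistency properties and repair the state so
        later code does not crash on it."""
        repaired = False
        if any(u is None for u in state.units):      # dangling entries
            state.units = [u for u in state.units if u is not None]
            repaired = True
        if state.unit_count != len(state.units):     # stale cached count
            state.unit_count = len(state.units)
            repaired = True
        return repaired

    state = GameState(["warrior", "settler"])
    state.unit_count = 99                            # simulated data corruption
    assert check_and_repair(state) and state.unit_count == 2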

19
Assessment Strategy
LRTSS (MIT)
  • Denial of Service
  • Aimed at disabling or significantly modifying the
    operation of the Freeciv server to an extent that
    mission objectives cannot be accomplished
  • In this case, not achieving mission objectives is
    defined as the Freeciv server exits or dumps core
  • Attacks using buffer-overflow, corrupted data
    injection, and resource utilization
  • Various data corruptions aimed at causing the
    server to exit
  • Formulated the attacks by targeting the
    uninstrumented server first, then running the
    same attack against the instrumented server
  • State Disruption Attacks
  • interrupt or disrupt LRTSS's ability to maintain
    its internal state machines

20
Strengths / Weaknesses
LRTSS (MIT)
  • Strengths
  • Performs very well under simple data corruptions
  • (that would cause the system to crash under
    normal operation)
  • Performs well under a large number of these
    simple data corruptions
  • (200 to 500 corruptions are repaired
    successfully)
  • Learning and Repair algorithms well thought out
  • Weaknesses
  • Scope of Responsibility / protection too limited
  • Complex Data Structure Corruptions not handled
    well
  • Secondary Relationships are not protected against
  • Pointer Data Corruptions not entirely tested
  • Timing of Check and Repair Cycles not optimal
  • Description of Mission Failure as core dump may
    be excessive

21
QuickSilver Assessment
  • November 8, 2005

22
Success Criteria
QuickSilver (Cornell)
  • Ricochet can successfully and/or correctly
    perform its mission
  • Ricochet must consistently achieve a
    "fifteen-fold reduction in latency (with benign
    failures) for achieving consistent values of data
    shared among one hundred to ten thousand
    participants, where all participants can send and
    receive events."
  • Per client direction, we elected to use average
    latency as the comparative metric (a sketch of
    this comparison appears after this list)
  • Ricochet's Average Recovery demonstrates a
    15-fold improvement over SRM
  • Additional constraint levied requiring 98% update
    saturation (imposing the use of the NACK failover
    for Ricochet)
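
A minimal sketch of the comparative check described
above, assuming average recovery latencies and
update-saturation fractions have already been reduced
from the harness output; the numbers shown are
placeholders, not measured results.

    # Placeholder measurements (seconds, and fraction of updates delivered).
    srm_avg_recovery_latency = 1.20
    ricochet_runs = [
        {"avg_recovery_latency": 0.070, "update_saturation": 0.991},
        {"avg_recovery_latency": 0.065, "update_saturation": 0.988},
    ]

    for run in ricochet_runs:
        improvement = srm_avg_recovery_latency / run["avg_recovery_latency"]
        meets = improvement >= 15.0 and run["update_saturation"] >= 0.98
        print(f"improvement={improvement:.1f}x "
              f"saturation={run['update_saturation']:.1%} pass={meets}")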

23
Assessment Strategy
QuickSilver (Cornell)
  • Scalability Experiments -- test scalability in
    terms of number of groups per node and number of
    nodes per group. Here, no node failures will be
    simulated, and no packet losses will be induced
    (aside from those that occur as a by-product of
    normal network traffic).
  • Baseline Latency
  • Group Scalability
  • Large Repair Packet Configuration
  • Large Data Packet Storage Configuration
  • Simulated Node Failures -- simulate benign node
    failures.
  • Group Membership Overhead / Intermittent Network
    Failure
  • Simulated Packet Losses -- introduce packet loss
    into the network (see the relay sketch after this
    list).
  • High Packet Loss Rates
  • Node-driven Packet Loss
  • Network-driven Packet Loss
  • Ricochet-driven Packet Loss
  • High Ricochet Traffic Volume
  • Low Bandwidth Network
  • Simulated Network Anomalies -- simulate benign
    routing and network errors that might exist on a
    deployed network. The tests will establish
    whether or not the protocol is robust in its
    handling of typical network anomalies, as well as
    those atypical network anomalies that may be
    induced by an attacker.
  • Out of Order Packet Delivery
  • Packet Fragmentation
  • Duplicate Packets
  • Variable Packet Sizes
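
A minimal sketch of the kind of impairment layer the
packet-loss and network-anomaly tests call for: a UDP
relay that randomly drops, duplicates, and re-orders
datagrams on their way to a node. The addresses,
ports, and rates are placeholders; the actual tests
were run through the project's own harness and
network setup.

    import random
    import socket

    LISTEN = ("0.0.0.0", 9000)       # placeholder: where traffic arrives
    FORWARD = ("10.0.0.2", 9000)     # placeholder: the real destination node
    LOSS, DUP, REORDER = 0.10, 0.05, 0.05   # illustrative impairment rates

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(LISTEN)
    held_back = None                 # one-packet buffer used to swap ordering

    while True:
        data, _ = sock.recvfrom(65535)
        if random.random() < LOSS:          # simulated packet loss
            continue
        if held_back is not None:           # release the delayed packet *after*
            sock.sendto(data, FORWARD)      # the newer one, i.e. out of order
            sock.sendto(held_back, FORWARD)
            held_back = None
            continue
        if random.random() < REORDER:       # hold this packet for the next one
            held_back = data
            continue
        sock.sendto(data, FORWARD)
        if random.random() < DUP:           # simulated duplicate delivery
            sock.sendto(data, FORWARD)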

24
Strengths / Weaknesses
QuickSilver (Cornell)
  • Strengths
  • Appears to be very resilient when operating
    within its assumptions
  • Very stable software
  • Significant performance gains over SRM
  • Weaknesses
  • FEC orientation -- the focus of the collected
    statistics obscures valuable data regarding
    complete packet delivery
  • Batch-oriented Test Harness
  • Impossible to perform interactive attacks
  • Very limited insight into blow-by-blow
    performance
  • Metrics collected were very difficult to
    understand fully

25
STEWARD Assessment
  • December 9, 2005

26
Success Criteria
Steward (JHU)
  • The STEWARD system must
  • Make progress in the system when under attack.
  • Progress is defined as the eventual global
    ordering, execution, and reply to any request
    which is assigned a sequence number within the
    system
  • Maintain consistency of the data replicated on
    each of the servers in the system (a
    progress/consistency check sketch follows this
    list)
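
A minimal sketch of how these two criteria can be
checked from data gathered during a test run: every
request that was assigned a sequence number must
eventually be executed and answered, and all replicas
must converge to the same replicated state. The log
format and values are hypothetical.

    # Hypothetical data reduced from server logs after a test run.
    sequenced = {1: "update-A", 2: "update-B", 3: "update-C"}   # seq -> request
    executed_and_replied = {1, 2, 3}        # sequence numbers answered to clients
    replica_states = {
        "replica-1": {"A": 1, "B": 2, "C": 3},
        "replica-2": {"A": 1, "B": 2, "C": 3},
        "replica-3": {"A": 1, "B": 2, "C": 3},
    }

    # Progress: every sequenced request was eventually executed and answered.
    progress = set(sequenced) <= executed_and_replied

    # Consistency: every replica holds the same replicated data.
    states = list(replica_states.values())
    consistent = all(s == states[0] for s in states[1:])

    print(f"progress={progress} consistent={consistent}")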

27
Assessment Strategy
Steward (JHU)
  • Validation Activities - tests we will perform to
    verify that STEWARD can endure up to five
    Byzantine faults while maintaining a three-fold
    reduction in latency with respect to BFT (a
    sketch of this check appears at the end of this
    slide)
  • Byzantine Node Threshold
  • Benchmark Latency
  • Progress Attacks - attacks we will launch to
    prevent STEWARD from progressing to a successful
    resolution of an ordered client request
  • Packet Loss
  • Packet Delay
  • Packet Duplication
  • Packet Re-ordering
  • Packet Fragmentation
  • View Change Message Flood
  • Site Leader Stops Assigning Sequence Numbers
  • Site Leader Assigns Non-Contiguous Sequence
    Numbers
  • Suppressed New-View Messages
  • Consecutive Pre-Prepare Messages in Different
    Views
  • Out of Order Messages
  • Byzantine Induced Failover
  • Data Integrity Attacks - attempts to create an
    inconsistency in the data replicated on the
    various servers in the network
  • Arbitrarily Execute Updates
  • Multiple Pre-Prepare Messages using Same Sequence
    Numbers and Different Request Data
  • Spurious Prepare, Null Messages
  • Suppressed Checkpoint Messages
  • Prematurely Perform Garbage Collection
  • Invalid Threshold Signature
  • Protocol State Attacks - attacks focused on
    interrupting or disrupting STEWARD's ability to
    maintain its internal state machines
  • Certificate Threshold Validation Attack
  • Replay Attack
  • Manual Exploit of Client or Server

Note: We did not try to validate or break the
encryption algorithms.
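
A minimal sketch of the validation check named under
Validation Activities: with up to five Byzantine
replicas, STEWARD must still make progress while
keeping its latency at no more than one third of
BFT's. The figures below are placeholders for data
reduced from the benchmark runs, not measured results.

    # Placeholder benchmark results, keyed by the number of Byzantine replicas.
    bft_latency_ms = 600.0                      # baseline BFT update latency
    steward_runs = {
        0: {"latency_ms": 150.0, "made_progress": True},
        3: {"latency_ms": 170.0, "made_progress": True},
        5: {"latency_ms": 190.0, "made_progress": True},
    }

    for faults, run in steward_runs.items():
        threefold = bft_latency_ms / run["latency_ms"] >= 3.0
        print(f"byzantine={faults} progress={run['made_progress']} "
              f"3x_vs_BFT={threefold}")
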
28
Strengths / Weaknesses
Steward (JHU)
  • Strengths
  • First system that assumes and actually tolerates
    corrupted components (Byzantine attack)
  • Blue Team spent extensive time up front in
    analysis, design, and proof of the protocol, and
    it showed clearly in the system's performance
  • System was incredibly stable and resilient
  • We did not compromise the system
  • Weaknesses
  • Limited Scope of Protection
  • Relies on an external entity to secure and
    manage the keys, which are fundamental to the
    integrity of the system
  • STEWARD implicitly and completely trusts the
    client
  • Client-side attacks were out of scope of the
    assessment

29
Going Forward
  • White Team will generate definitive report on
    this Red Team Test activity
  • It will have the official scoring and results
  • RABA (Red Team) will generate a test report from
    our perspective
  • We will publish to
  • PI for the Project
  • White Team (Mr. Do)
  • DARPA (Mr. Badger)

30
Questions or Comments
  • Any Questions, Comments, or Concerns?