RABA's Red Team Assessments - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: RABA's Red Team Assessments


1
RABA's Red Team Assessments
QuickSilver
  • 14 December 2005

2
Agenda
  • Tasking for this talk
  • Projects Evaluated
  • Approach / Methodology
  • Lessons Learned and Validations Achieved
  • The Assessments
  • General Strengths / Weaknesses
  • AWDRAT (MIT)
  • Success Criteria
  • Assessment Strategy
  • Strengths / Weaknesses
  • LRTSS (MIT)
  • QuickSilver / Ricochet (Cornell)
  • Steward (JHU)

3
The Tasking
  • "Lee would like a presentation from the Red Team
    perspective on the experiments you've been
    involved with. He's interested in a talk that's
    heavy on lessons learned and benefits gained.
    Also of interest would be red team thoughts on
    strengths and weaknesses of the technologies
    involved. Keeping in mind that no rebuttal would
    be able to take place beforehand, controversial
    observations should be either generalized (i.e.,
    false positives as a problem across several
    projects) or left to the final report."
  • -- John Frank e-mail (November 28, 2005)

4
Specific Teams We Evaluated
  • Architectural-Differencing, Wrappers, Diagnosis,
    Recover, Adaptive Software and Trust Management
    (AWDRAT)
  • October 18-19, 2005
  • MIT
  • Learning and Repair Techniques for Self-Healing
    Systems (LRTSS)
  • October 25, 2005
  • MIT
  • QuickSilver / Ricochet
  • November 8, 2005
  • Cornell University
  • Steward
  • December 9, 2005
  • JHU

5
Basic Methodology
  • Planning
  • Present High Level Plan at July PI Meeting
  • Interact with White Team to schedule
  • Prepare Project Overview
  • Prepare Assessment Plan
  • Coordinate with Blue Team and White Team
  • Learning
  • Study documentation provided by team
  • Conference Calls
  • Visit with Blue Team day prior to assessment
  • Use system, examine output, gather data
  • Test
  • Formal De-Brief at end of Test Day

6
Lessons Learned (and VALIDATIONS achieved)
7
Validation / Lessons Learned
  • Consistent Discontinuity of Expectations
  • Scope of the Assessment Success Criteria
  • Boiling it down to Red Team Wins or Blue Team
    Wins on each test required significant clarity
  • Unique to these assessments because the metrics
    were unique
  • Lee/John instituted an assessment scope
    conference call halfway through the series
  • We think that helped a lot
  • Scope of Protection for the systems
  • Performers' Assumptions vs. Red Team's
    Expectations
  • In all cases, we wanted to see a more holistic
    approach to the security model
  • We assert each program needs to define its
    security policy
  • And especially document what it assumes will be
    protected / provided by other components or
    systems

8
Lessons Learned: Scope of Protection
9
Validation / Lessons Learned
  • More time would have helped A LOT
  • Longer Test Period (2-3 day test vice 1 day test)
  • Having an evening to digest then return to test
    would have allowed more effective additional
    testing and insight
  • We planned an extra 1.5 days for most, and that
    was very helpful
  • We weren't rushing to get on an airplane
  • We could reduce the data and come back for
    clarifications if needed
  • We could defer non-controversial tests to the
    next day to allow focus with Government present
  • More Communication with Performers
  • Pre-Test Site/Team Visit (2-3 weeks prior to
    test)
  • Significant help in preparing testing approach
  • The half-day that we implemented before the test
    was crucial for us
  • More conference calls would have helped, too
  • Hard to balance against the performers' main
    focus, though

10
Validation / Lessons Learned
  • A Series of Tests Might Be Better
  • Perhaps one day of tests similar to what we did
  • Then a follow-up test a month or two later as
    prototypes matured
  • With the same test team to leverage understanding
    of system gained
  • We Underestimated the Effort in Our Bid
  • Systems were more specialized and complex than
    we anticipated
  • 20-25 more hours would have helped us a lot in
    data reduction
  • Multi-talented team proved vital to success
  • We had programming (multi-lingual), traditional
    red team, computer security, systems engineering,
    OS, system admin, network engineering, etc.
    talent present for each test
  • Highly tailored approach proved appropriate and
    necessary
  • Using more traditional network-oriented Red Team
    Assessment approach would have failed

11
The Assessments
12
Overall Strengths / Weaknesses of Projects
  • Strengths
  • Teams worked hard to support our assessments
  • The technologies are exciting and powerful
  • Weaknesses
  • Most Suffered a Lack of System Documentation
  • We understand there is a balance to strike --
    these are, after all, essentially research
    prototypes
  • This really limited our ability to prepare for
    the assessments
  • All are Prototypes -- stability needed for
    deterministic test results
  • All provide incomplete security / protection
    almost by definition
  • Most Suffered a Lack of Configuration Management
    / Control
  • Test Harnesses were far from optimal for Red
    Team use
  • Of course, they are oriented around supporting
    the development
  • But we were fairly limited in using other tools
    due to the uniqueness of the technologies

13
AWDRAT Assessment
  • October 18-19, 2005

14
Success Criteria
AWDRAT (MIT)
  • The target application can successfully and/or
    correctly perform its mission
  • The AWDRAT system can
  • detect an attacked client's misbehavior
  • interrupt a misbehaving client
  • reconstitute a misbehaving client in such a way
    that the reconstituted client is not vulnerable
    to the attack in question
  • The AWDRAT system must (see the scoring sketch
    below)
  • Detect / Diagnose at least 10 of the
    attacks/root causes
  • Take effective corrective action on at least 5
    of the successfully identified compromises/attacks
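
A minimal sketch of how these thresholds could be
scored from per-test results. The record format and
example entries are hypothetical; the official scoring
was produced by the White Team.

    # Hypothetical per-test records: did AWDRAT detect/diagnose the attack,
    # and did it then take effective corrective action?
    tests = [
        {"name": "buffer-overflow-1", "detected": True,  "corrected": True},
        {"name": "corrupt-inject-1",  "detected": True,  "corrected": False},
        {"name": "false-neg-probe-1", "detected": False, "corrected": False},
        # ... one record per Red Team test
    ]

    detected = [t for t in tests if t["detected"]]
    corrected = [t for t in detected if t["corrected"]]

    # Thresholds from this slide: detect/diagnose at least 10 of the
    # attacks/root causes, and correct at least 5 of those detected.
    print(f"detected={len(detected)} corrected={len(corrected)} "
          f"meets_criteria={len(detected) >= 10 and len(corrected) >= 5}")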

15
Assessment Strategy
AWDRAT (MIT)
  • Denial of Service
  • aimed at disabling or significantly modifying the
    operation of the application to an extent that
    mission objectives cannot be accomplished
  • attacks using buffer-overflow and corrupted data
    injection to gain system access
  • False Negative Attacks
  • a situation in which a system fails to report an
    occurrence of anomalous or malicious behavior
  • Red Team hoped to perform actions that would fall
    "under the radar". We targeted the modules of
    AWDRAT that support diagnosis and detection.
  • False Positive Attacks
  • system reports an occurrence of malicious
    behavior when the activity detected was
    non-malicious
  • Red Team sought to perform actions that would
    excite AWDRAT's monitors. Specifically, we
    targeted the modules supporting diagnosis and
    detection.
  • State Disruption Attacks
  • interrupt or disrupt AWDRAT's ability to maintain
    its internal state machines
  • Recovery Attacks
  • disrupt attempts to recover or regenerate a
    misbehaving client
  • target the Adaptive Software and Recovery and
    Regeneration modules in an attempt to allow a
    misbehaving client to continue operating

16
Strengths / Weaknesses
AWDRAT (MIT)
  • Strengths
  • With a reconsideration of the system's scope of
    responsibility, we anticipate the system would
    have performed far better in the tests
  • We see great power in the concept of wrapping all
    the functions
  • Weaknesses
  • Scope of Responsibility / Protection far too
    Limited
  • Need to Develop Full Security Policy
  • Single points of failure
  • Application-Specific Limitations
  • Application Model Issues
  • Incomplete by design?
  • Manually Created
  • Limited Scope
  • Doesn't really enforce multi-layered defense

17
LRTSS Assessment
  • October 25, 2005

18
Success Criteria
LRTSS (MIT)
  • The instrumented Freeciv server does not core
    dump under a condition in which the
    uninstrumented Freeciv server does core dump
  • The LRTSS system can
  • Detect a corruption in a data structure that
    causes an uninstrumented Freeciv server to exit
  • Repair the data corruption in such a way that the
    instrumented Freeciv server can continue running
    (see the sketch after this list)
  • The LRTSS system must
  • Detect / Diagnose at least 10 of the
    attacks/root causes
  • Take effective corrective action on at least 5
    of the successfully identified compromises/attacks
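
A minimal sketch of the detect-and-repair idea these
criteria describe: check a consistency property over a
piece of game state and, when it is violated, repair
the data so the server can keep running instead of
crashing. The structure and the consistency property
here are hypothetical, not LRTSS's actual model.

    # Hypothetical slice of Freeciv-like game state: a unit list plus a
    # cached count that is supposed to stay consistent with it.
    class GameState:
        def __init__(self, units):
            self.units = list(units)
            self.unit_count = len(self.units)   # field a corruption may clobber

    def check_and_repair(state):
        """Detect violated consistency properties and repair the state so
        later code does not crash on it."""
        repaired = False
        if any(u is None for u in state.units):      # dangling entries
            state.units = [u for u in state.units if u is not None]
            repaired = True
        if state.unit_count != len(state.units):     # stale cached count
            state.unit_count = len(state.units)
            repaired = True
        return repaired

    state = GameState(["warrior", "settler"])
    state.unit_count = 99                            # simulated data corruption
    assert check_and_repair(state) and state.unit_count == 2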

19
Assessment Strategy
LRTSS (MIT)
  • Denial of Service
  • Aimed at disabling or significantly modifying the
    operation of the Freeciv server to an extent that
    mission objectives cannot be accomplished
  • In this case, not achieving mission objectives is
    defined as the Freeciv server exits or dumps core
  • Attacks using buffer-overflow, corrupted data
    injection, and resource utilization
  • Various data corruptions aimed at causing the
    server to exit
  • Formulated the attacks by targeting the
    uninstrumented server first, then running the
    same attack against the instrumented server
  • State Disruption Attacks
  • interrupt or disrupt LRTSS's ability to maintain
    its internal state machines

20
Strengths / Weaknesses
LRTSS (MIT)
  • Strengths
  • Performs very well under simple data corruptions
  • (that would cause the system to crash under
    normal operation)
  • Performs well under a large number of these
    simple data corruptions
  • (200 to 500 corruptions are repaired
    successfully)
  • Learning and Repair algorithms well thought out
  • Weaknesses
  • Scope of Responsibility / protection too limited
  • Complex Data Structure Corruptions not handled
    well
  • Secondary Relationships are not protected against
  • Pointer Data Corruptions not entirely tested
  • Timing of Check and Repair Cycles not optimal
  • Description of Mission Failure as core dump may
    be excessive

21
QuickSilver Assessment
  • November 8, 2005

22
Success Criteria
QuickSilver (Cornell)
  • Ricochet can successfully and/or correctly
    perform its mission
  • Ricochet must consistently achieve a
    "fifteen-fold reduction in latency (with benign
    failures) for achieving consistent values of data
    shared among one hundred to ten thousand
    participants, where all participants can send and
    receive events."
  • Per client direction, we elected to use average
    latency as the comparative metric (a sketch of
    this comparison appears after this list)
  • Ricochet's Average Recovery demonstrates a
    15-fold improvement over SRM
  • Additional constraint levied requiring 98% update
    saturation (imposing the use of the NACK failover
    for Ricochet)
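
A minimal sketch of the comparative check described
above, assuming average recovery latencies and
update-saturation fractions have already been reduced
from the harness output; the numbers shown are
placeholders, not measured results.

    # Placeholder measurements (seconds, and fraction of updates delivered).
    srm_avg_recovery_latency = 1.20
    ricochet_runs = [
        {"avg_recovery_latency": 0.070, "update_saturation": 0.991},
        {"avg_recovery_latency": 0.065, "update_saturation": 0.988},
    ]

    for run in ricochet_runs:
        improvement = srm_avg_recovery_latency / run["avg_recovery_latency"]
        meets = improvement >= 15.0 and run["update_saturation"] >= 0.98
        print(f"improvement={improvement:.1f}x "
              f"saturation={run['update_saturation']:.1%} pass={meets}")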

23
Assessment Strategy
QuickSilver (Cornell)
  • Scalability Experiments -- test scalability in
    terms of number of groups per node and number of
    nodes per group. Here, no node failures will be
    simulated, and no packet losses will be induced
    (aside from those that occur as a by-product of
    normal network traffic).
  • Baseline Latency
  • Group Scalability
  • Large Repair Packet Configuration
  • Large Data Packet Storage Configuration
  • Simulated Node Failures -- simulate benign node
    failures.
  • Group Membership Overhead / Intermittent Network
    Failure
  • Simulated Packet Losses -- introduce packet loss
    into the network (see the relay sketch after this
    list).
  • High Packet Loss Rates
  • Node-driven Packet Loss
  • Network-driven Packet Loss
  • Ricochet-driven Packet Loss
  • High Ricochet Traffic Volume
  • Low Bandwidth Network
  • Simulated Network Anomalies -- simulate benign
    routing and network errors that might exist on a
    deployed network. The tests will establish
    whether or not the protocol is robust in its
    handling of typical network anomalies, as well as
    those atypical network anomalies that may be
    induced by an attacker.
  • Out of Order Packet Delivery
  • Packet Fragmentation
  • Duplicate Packets
  • Variable Packet Sizes
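
A minimal sketch of the kind of impairment layer the
packet-loss and network-anomaly tests call for: a UDP
relay that randomly drops, duplicates, and re-orders
datagrams on their way to a node. The addresses,
ports, and rates are placeholders; the actual tests
were run through the project's own harness and
network setup.

    import random
    import socket

    LISTEN = ("0.0.0.0", 9000)       # placeholder: where traffic arrives
    FORWARD = ("10.0.0.2", 9000)     # placeholder: the real destination node
    LOSS, DUP, REORDER = 0.10, 0.05, 0.05   # illustrative impairment rates

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(LISTEN)
    held_back = None                 # one-packet buffer used to swap ordering

    while True:
        data, _ = sock.recvfrom(65535)
        if random.random() < LOSS:          # simulated packet loss
            continue
        if held_back is not None:           # release the delayed packet *after*
            sock.sendto(data, FORWARD)      # the newer one, i.e. out of order
            sock.sendto(held_back, FORWARD)
            held_back = None
            continue
        if random.random() < REORDER:       # hold this packet for the next one
            held_back = data
            continue
        sock.sendto(data, FORWARD)
        if random.random() < DUP:           # simulated duplicate delivery
            sock.sendto(data, FORWARD)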

24
Strengths / Weaknesses
QuickSilver (Cornell)
  • Strengths
  • Appears to be very resilient when operating
    within its assumptions
  • Very stable software
  • Significant performance gains over SRM
  • Weaknesses
  • FEC orientation -- the focus of the collected
    statistics obscures valuable data regarding
    complete packet delivery
  • Batch-oriented Test Harness
  • Impossible to perform interactive attacks
  • Very limited insight into blow-by-blow
    performance
  • Metrics collected were very difficult to
    understand fully

25
STEWARD Assessment
  • December 9, 2005

26
Success Criteria
Steward (JHU)
  • The STEWARD system must
  • Make progress in the system when under attack.
  • Progress is defined as the eventual global
    ordering, execution, and reply to any request
    which is assigned a sequence number within the
    system
  • Maintain consistency of the data replicated on
    each of the servers in the system (a
    progress/consistency check sketch follows this
    list)
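
A minimal sketch of how these two criteria can be
checked from data gathered during a test run: every
request that was assigned a sequence number must
eventually be executed and answered, and all replicas
must converge to the same replicated state. The log
format and values are hypothetical.

    # Hypothetical data reduced from server logs after a test run.
    sequenced = {1: "update-A", 2: "update-B", 3: "update-C"}   # seq -> request
    executed_and_replied = {1, 2, 3}        # sequence numbers answered to clients
    replica_states = {
        "replica-1": {"A": 1, "B": 2, "C": 3},
        "replica-2": {"A": 1, "B": 2, "C": 3},
        "replica-3": {"A": 1, "B": 2, "C": 3},
    }

    # Progress: every sequenced request was eventually executed and answered.
    progress = set(sequenced) <= executed_and_replied

    # Consistency: every replica holds the same replicated data.
    states = list(replica_states.values())
    consistent = all(s == states[0] for s in states[1:])

    print(f"progress={progress} consistent={consistent}")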

27
Assessment Strategy
Steward (JHU)
  • Validation Activities - tests we will perform to
    verify that STEWARD can endure up to five
    Byzantine faults while maintaining a three-fold
    reduction in latency with respect to BFT (a
    sketch of this check appears at the end of this
    slide)
  • Byzantine Node Threshold
  • Benchmark Latency
  • Progress Attacks - attacks we will launch to
    prevent STEWARD from progressing to a successful
    resolution of an ordered client request
  • Packet Loss
  • Packet Delay
  • Packet Duplication
  • Packet Re-ordering
  • Packet Fragmentation
  • View Change Message Flood
  • Site Leader Stops Assigning Sequence Numbers
  • Site Leader Assigns Non-Contiguous Sequence
    Numbers
  • Suppressed New-View Messages
  • Consecutive Pre-Prepare Messages in Different
    Views
  • Out of Order Messages
  • Byzantine Induced Failover
  • Data Integrity Attacks - attempts to create an
    inconsistency in the data replicated on the
    various servers in the network
  • Arbitrarily Execute Updates
  • Multiple Pre-Prepare Messages using Same Sequence
    Numbers and Different Request Data
  • Spurious Prepare, Null Messages
  • Suppressed Checkpoint Messages
  • Prematurely Perform Garbage Collection
  • Invalid Threshold Signature
  • Protocol State Attacks - attacks focused on
    interrupting or disrupting STEWARD's ability to
    maintain its internal state machines
  • Certificate Threshold Validation Attack
  • Replay Attack
  • Manual Exploit of Client or Server

Note: We did not try to validate or break the
encryption algorithms.
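
A minimal sketch of the validation check named under
Validation Activities: with up to five Byzantine
replicas, STEWARD must still make progress while
keeping its latency at no more than one third of
BFT's. The figures below are placeholders for data
reduced from the benchmark runs, not measured results.

    # Placeholder benchmark results, keyed by the number of Byzantine replicas.
    bft_latency_ms = 600.0                      # baseline BFT update latency
    steward_runs = {
        0: {"latency_ms": 150.0, "made_progress": True},
        3: {"latency_ms": 170.0, "made_progress": True},
        5: {"latency_ms": 190.0, "made_progress": True},
    }

    for faults, run in steward_runs.items():
        threefold = bft_latency_ms / run["latency_ms"] >= 3.0
        print(f"byzantine={faults} progress={run['made_progress']} "
              f"3x_vs_BFT={threefold}")
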
28
Strengths / Weaknesses
Steward (JHU)
  • Strengths
  • First system that assumes and actually tolerates
    corrupted components (Byzantine attack)
  • Blue Team spent extensive time up front in
    analysis, design, and proof of the protocol, and
    it showed clearly in the system's performance
  • System was incredibly stable and resilient
  • We did not compromise the system
  • Weaknesses
  • Limited Scope of Protection
  • Relies on an external entity to secure and
    manage the keys, which are fundamental to the
    integrity of the system
  • STEWARD implicitly and completely trusts the
    client
  • Client-side attacks were out of scope of the
    assessment

29
Going Forward
  • White Team will generate definitive report on
    this Red Team Test activity
  • It will have the official scoring and results
  • RABA (Red Team) will generate a test report from
    our perspective
  • We will publish to
  • PI for the Project
  • White Team (Mr. Do)
  • DARPA (Mr. Badger)

30
Questions or Comments
  • Any Questions, Comments, or Concerns?