A Research Program in Reliable Adaptive Distributed Systems (RADS)

1
A Research Program in Reliable Adaptive Distributed Systems (RADS)
  • Armando Fox, Michael Jordan, Randy Katz, George
    Necula, David Patterson, Ion Stoica, Doug Tygar
  • University of California, Berkeley / Stanford
    University

2
What Are We Trying to Do? New Approach for RADS
  • Dramatically improve the trustworthiness of
    networked systems
  • Observe: design observation points throughout
    the system
  • Analyze: infer via statistical learning
  • Respond: detect anomalous behavior vs. a
    baseline
  • Learn: use observations to modify responses to
    future observations
  • Act:
  • Reactive: use control points in the system for
    rapid recovery when something wrong is detected
  • Proactive/protective: act prophylactically on the
    system to prevent a predicted impending failure
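The Observe/Analyze/Learn/Act loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the RADS design: it assumes each observation point yields one numeric reading, uses Welford's online mean/variance as the "baseline" and a z-score threshold as the anomaly test (both our choices), and shows only the reactive branch. The names `OnlineBaseline` and `observe_analyze_act` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class OnlineBaseline:
    """Running mean and sample variance via Welford's algorithm (O(1) memory)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def observe_analyze_act(readings, threshold=3.0, warmup=5):
    """Return one action per reading: "ok" or "recover".

    Observe: each reading comes from one (hypothetical) observation point.
    Analyze: a reading is anomalous if its z-score against the baseline
             exceeds `threshold`, once `warmup` normal readings exist.
    Learn:   only readings judged normal are folded into the baseline.
    Act:     an anomalous reading triggers a reactive recovery action.
    """
    baseline = OnlineBaseline()
    actions = []
    for x in readings:
        sd = baseline.std()
        anomalous = (
            baseline.n >= warmup
            and sd > 0
            and abs(x - baseline.mean) / sd > threshold
        )
        if anomalous:
            actions.append("recover")
        else:
            actions.append("ok")
            baseline.update(x)
    return actions
```

For example, `observe_analyze_act([10, 11, 9, 10, 11, 10, 50, 10])` yields six `"ok"` actions, then `"recover"` for the spike, then `"ok"`. Folding only normal readings into the baseline is one simple reading of "Learn": it keeps an anomaly from shifting the baseline it is judged against. A proactive variant would additionally need a model that predicts failures before the threshold is crossed.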

3
Today's Systems Are Too Brittle
  • Fragile, easily broken, yielding poor
    dependability and security
  • E.g., Amazon yearly revenue $3.1B, downtime
    costs $600,000/hr
  • Why?
  • Existing systems focus on performance, not fast
    adaptive detection and response to failure and
    attack
  • Fundamentally incorrect assumptions:
  • Humans are perfect
  • Software can be made bug free
  • Maintenance is free
  • People/HW/SW failures are facts, not problems
  • If a problem has no solution, it may not be a
    problem, but a fact--not to be solved, but to be
    coped with over time
  • Shimon Peres

4
Failures and Attacks Are Inevitable, So Design for
Rapid Adaptation
  • Rapid application and server recovery, agile
    network rerouting, proactive protective actions
    ...
  • No distinction between normal operation and
    recovery
  • Elements of our solution:
  • Programming paradigms for robust recovery
  • Crash-only software design for rapid server
    recovery
  • Network protocols designed for observation to
    allow rapid detection of behavioral violations
  • Instrumentation and online statistical analysis
    for anomaly detection and failure
    diagnosis/localization
  • Adaptation benchmarks to measure progress
  • What you can't measure, you can't improve
  • Collect real failure data to drive benchmarks
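Crash-only design means a component has exactly one way to stop (crash) and one way to start (recover), so the recovery path runs on every start instead of only in emergencies. A minimal sketch, assuming a single-process counter service whose only durable state is an append-only JSON-lines log; `CrashOnlyCounterStore` and its log format are hypothetical, not part of the RADS prototype.

```python
import json
import os


class CrashOnlyCounterStore:
    """A tiny key/counter service written in crash-only style.

    All durable state lives in an append-only log file. There is no
    clean-shutdown method: stopping means crashing (dropping the object),
    and starting always means recovering by replaying the log.
    """

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.counts = {}
        self._recover()  # the only start-up path

    def _recover(self) -> None:
        if not os.path.exists(self.log_path):
            return
        good_end = 0  # byte offset just past the last complete record
        with open(self.log_path, "rb") as f:
            for line in f:
                if not line.endswith(b"\n"):
                    break  # torn final write left by a crash
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    break
                self.counts[rec["key"]] = self.counts.get(rec["key"], 0) + rec["delta"]
                good_end += len(line)
        # Drop any torn tail so later appends start on a clean record boundary.
        with open(self.log_path, "r+b") as f:
            f.truncate(good_end)

    def increment(self, key: str, delta: int = 1) -> int:
        # Persist the update before acknowledging it, so a crash at any
        # point loses at most the unacknowledged request.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "delta": delta}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.counts[key] = self.counts.get(key, 0) + delta
        return self.counts[key]
```

Because start-up and crash recovery are the same code, a "reboot" is just constructing a new instance over the same log, which is what makes the rapid server recovery on this slide cheap to exercise and test.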

5
Example: Anomaly Detection Meets Crash-Only Design
  • Use simple time series analysis on key operating
    statistics (committed writes, offered load, etc.)
  • Count relative frequencies of all substrings of
    length k or shorter; look for discrepancies in
    relative frequencies across replicas
  • Works even when the period is irregular or not
    known a priori
  • If you see anything unusual, coerce it to a crash
    and recover from that: reboot is nearly free, so
    occasional false positives are OK
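The substring-frequency comparison across replicas, followed by a coerced crash, might look like the sketch below. Several details are assumptions rather than the slide's method: each replica's statistic is discretized into up/down/steady symbols, profiles are compared with an L1 distance, and a replica's score is the lower median of its distances to its peers (so at least three replicas are needed to tell which one is odd). All function names, `k`, and `threshold` are hypothetical and illustrative.

```python
from collections import Counter


def discretize(series, eps=0.0):
    """Map a numeric series to symbols: 'u' (up), 'd' (down), 's' (steady)."""
    out = []
    for prev, cur in zip(series, series[1:]):
        diff = cur - prev
        out.append("u" if diff > eps else "d" if diff < -eps else "s")
    return "".join(out)


def substring_freqs(symbols, k):
    """Relative frequency of every substring of length 1..k (normalized per length)."""
    freqs = {}
    for length in range(1, k + 1):
        total = len(symbols) - length + 1
        if total <= 0:
            break
        counts = Counter(symbols[i:i + length] for i in range(total))
        for sub, c in counts.items():
            freqs[sub] = c / total
    return freqs


def distance(f, g):
    """L1 distance between two frequency profiles (missing substrings count as 0)."""
    return sum(abs(f.get(s, 0.0) - g.get(s, 0.0)) for s in set(f) | set(g))


def suspicious_replicas(series_by_replica, k=3, threshold=0.5):
    """Return replicas whose substring profile disagrees with most of their peers."""
    if len(series_by_replica) < 2:
        return []
    profiles = {r: substring_freqs(discretize(s), k) for r, s in series_by_replica.items()}
    flagged = []
    for r, p in profiles.items():
        dists = sorted(distance(p, q) for other, q in profiles.items() if other != r)
        score = dists[(len(dists) - 1) // 2]  # lower median: one bad peer cannot flag us
        if score > threshold:
            flagged.append(r)
    return flagged


def respond(series_by_replica, reboot):
    """Coerce each suspicious replica to a crash via the caller-supplied reboot hook."""
    for replica in suspicious_replicas(series_by_replica):
        reboot(replica)  # cheap under crash-only design, so false positives are tolerable
```

No period is estimated anywhere: two healthy replicas running the same workload out of phase (e.g. `[1, 2, 3, 2] * 10` and `[2, 3, 2, 1] * 10`) produce nearly identical profiles, while a replica whose statistic simply climbs (`list(range(40))`) stands out and is the only one passed to `reboot`.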

6
Security Challenges for RADS
  • Need new techniques to detect and respond to
    rapidly-evolving attacks
  • But these techniques can themselves be used to
    mount attacks
  • So we must secure the learning process
  • Rapid secure protocol synthesis tools can be
    applied to this problem

7
Approach for Success: Interdisciplinary Expertise
  • Interdisciplinary Team
  • Armando Fox/Dave Patterson: Dependable System
    Design
  • Randy Katz/Ion Stoica: Network Services/Protocols
  • Michael Jordan: Statistical Learning Theory
  • Ion Stoica/Doug Tygar: Verification of networks
    and security
  • George Necula: Language/Application-level
    mechanisms
  • Spans algorithm design and system implementations
  • Comprehensive distributed architecture embedding
    statistical learning theory (SLT) as a primitive
    building block
  • Embedding observation and inference mechanisms at
    strategic points throughout the distributed
    system
  • New kinds of statistical inference and
    verification techniques able to execute on-line
    and in real-time

8
RADS Conceptual Architecture
  • Prototype application: messaging, e-mail for
    operational systems
  • Users and operators
  • Programming abstractions for roll-back (Necula)
  • Benchmarks, tools for human operators (Patterson)
  • Crash-only middleware servers, system OC
    infrastructure (Fox)
  • SLT services: online statistical learning
    algorithms (Jordan)
  • Application-specific overlay network over
    programmable network elements (PNEs) and edge
    networks
  • Protocols enabling fast detection, route
    recovery, network OC infrastructure (Katz,
    Stoica)
  • Routers on commodity Internet IP networks
  • Cross-cutting: reduction to practice of on-line
    SLT and observe/analyze/act infrastructure;
    reusable embeddable components; pervasive security
    considerations (Tygar)

9
Vulnerable Messaging Application that Requires
Trustworthiness
  • DHS/Federal network, allies' networks, and local
    police, fire, and state police
  • Coalition Internet with trust relations
  • Data: incident reports, responder locations, GIS
    data, etc.
  • Threat: compromised network with embedded
    adversaries
  • Exploit DETER testbed for prototyping

10
Scientific Foundation for Self-* Systems
  • New design principles and tools for systems that
    continuously adjust their behavior in response to
    analysis of online observations
  • New metrics and benchmarks for evaluating
    self-adapting networked systems
  • Advances in Statistical Learning Theory to move
    from offline to online analysis of large-scale
    distributed systems

11
Measuring Success
  • Build messaging prototype using RADS design
    principles and tools
  • Put a realistic performance workload on the
    prototype and embed it in the DHS DETER testbed
  • Subject prototype to increasingly aggressive
    failure and attack workloads
  • E.g., hardware failures, software failures,
    operator failures, worm attacks, DDoS attacks, ...
  • Measure false positive rates, accuracy rates,
    time to analyze failures, time to act,
    performance impact of actions, availability of
    the prototype, performability of the prototype, ...
  • Compare results with conventional systems under
    similar performance, failure, and attack workloads
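Several of these measurements reduce to simple arithmetic over an experiment log. A sketch, assuming a hypothetical log in which each trial records whether a fault was injected, whether an alarm fired, and how long analysis and action took; the `Trial` record and `summarize` function are illustrative, and performance impact and performability are not computed here.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One interval of a hypothetical fault-injection run."""
    fault_injected: bool
    alarm_raised: bool
    analyze_secs: float = 0.0  # fault onset -> diagnosis (used only if detected)
    act_secs: float = 0.0      # diagnosis -> recovery action completed


def summarize(trials, observed_secs, downtime_secs):
    """Compute false positive rate, accuracy, mean times, and availability."""
    tp = sum(t.fault_injected and t.alarm_raised for t in trials)
    fp = sum((not t.fault_injected) and t.alarm_raised for t in trials)
    tn = sum(not (t.fault_injected or t.alarm_raised) for t in trials)
    detected = [t for t in trials if t.fault_injected and t.alarm_raised]

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    return {
        # Fraction of fault-free intervals that still raised an alarm.
        "false_positive_rate": fp / (fp + tn) if fp + tn else None,
        # Fraction of intervals classified correctly (alarm iff fault).
        "accuracy": (tp + tn) / len(trials) if trials else None,
        "mean_time_to_analyze": mean([t.analyze_secs for t in detected]),
        "mean_time_to_act": mean([t.act_secs for t in detected]),
        # Fraction of wall-clock time the service was up.
        "availability": 1.0 - downtime_secs / observed_secs,
    }
```

Running the same `summarize` over the RADS prototype and a conventional system under identical workloads gives directly comparable numbers, which is the comparison the last bullet calls for.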

12
New Funding Opportunity: NSF CyberTrust Program
  • From the RFP:
  • People rely on systems based on networked
    computers
  • These are too vulnerable to cyber attacks that
    inhibit function, corrupt data, or expose private
    information
  • Promote a vision where networked systems are:
  • More predictable, more accountable, and less
    vulnerable to attack and abuse
  • Developed, configured, operated and evaluated by
    a well-trained and diverse workforce
  • Used by a public educated in their secure and
    ethical operation
  • Example research area: improve the trustworthiness
    of networks; explore the evolving nature of
    security protocols and policies in communications
    networks
  • Individual and team projects, and 1-2 centers

13
CATS: Center for Adaptive Trustworthy Systems
  • Dramatically improve the trustworthiness of
    networked systems
  • New understanding of how to construct such
    systems
  • Observe-Analyze-Act
  • From responding to known problems to learning new
    problems
  • From reacting to problems to proactively
    responding before problems become significant
  • Experimental method of benchmarking, prototyping,
    and deployment to provide context
  • Technical Thrusts
  • Statistical Learning Theory
  • Crash-Only Software
  • Behaviorally-Consistent and Secure Protocols
  • Programmable Network Elements
  • Integration Vehicle:
  • Application: Disaster Response Messaging
  • Supported by prototype distributed system
    architecture
  • Deployment and Evaluation Plan

14
We need your help and support! Discussion?