Customizable Fault Tolerance for Wide-Area Replication

Provided by: sbri
Survivable Information Access
Johns Hopkins: Yair Amir, Claudiu Danilov, Jonathan Kirsch, John Lane
Purdue: Chi-Bun Chan, Cristina Nita-Rotaru, Josh Olsen, David Zage
Telcordia Technologies: Brian Coan
  • Customizable Fault Tolerance for Wide-Area
    Replication
  • As network environments become increasingly
    hostile, even well-protected distributed
    information systems, constructed with security in
    mind, are likely to be compromised. Our previous
    architecture, Steward, was the first to scale
    Byzantine fault-tolerant replication to wide-area
    networks, where servers are located in many sites
    distributed across the Internet. Steward's
    architecture was designed to minimize
    communication costs, enabling it to achieve
    performance an order of magnitude above the
    current state of the art. It successfully met
    safety (data consistency) and liveness (eventual
    progress) guarantees even during a white-box
    red-team experiment where a knowledgeable
    attacker was given control of some of the
    servers. However, the performance
    came at a cost: inflexibility and complexity. It
    was impossible to substitute the protocols used
    within the local-area sites and the protocol used
    on the wide area because the protocols were
    tightly coupled. We present a new composable
    Byzantine wide-area replication architecture that
    retains most of the performance benefits of
    Steward, while addressing its shortcomings.
  • Composable Byzantine Architecture
  • Free Substitution of the fault-tolerance
    guarantees (Byzantine or benign) in each position
    of the hierarchy, enabling configurations that
    can survive a complete site compromise.
  • Clean separation between the protocols running
    within each site and among the sites greatly
    simplifies implementation and correctness proofs.
  • Extensible wide-area replication framework
    supports integration of current algorithms and
    future innovations.
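One way to picture the free substitution of fault-tolerance guarantees is as a small configuration model. The sketch below is illustrative only: the names `FaultModel`, `Configuration`, and `tolerated_faults` are hypothetical and not taken from the architecture. It assumes the standard bounds of 3f + 1 participants to tolerate f Byzantine faults and 2f + 1 to tolerate f benign (crash) faults, and it assumes that a whole-site compromise can be survived only when a Byzantine protocol runs among the sites.

```python
from dataclasses import dataclass
from enum import Enum


class FaultModel(Enum):
    BENIGN = "benign"        # crash faults; assumes 2f + 1 participants
    BYZANTINE = "byzantine"  # arbitrary faults; assumes 3f + 1 participants


def tolerated_faults(participants: int, model: FaultModel) -> int:
    """Largest f the given number of participants can tolerate under the model."""
    divisor = 3 if model is FaultModel.BYZANTINE else 2
    return max(0, (participants - 1) // divisor)


@dataclass(frozen=True)
class Configuration:
    """Hypothetical description of one two-level deployment."""
    sites: int
    servers_per_site: int
    local_model: FaultModel      # protocol class run inside each site
    wide_area_model: FaultModel  # protocol class run among the sites

    def compromised_servers_tolerated_per_site(self) -> int:
        # A benign local protocol offers no protection against compromised servers.
        if self.local_model is not FaultModel.BYZANTINE:
            return 0
        return tolerated_faults(self.servers_per_site, FaultModel.BYZANTINE)

    def compromised_sites_tolerated(self) -> int:
        # Assumption: only a Byzantine wide-area protocol can mask a whole bad site.
        if self.wide_area_model is not FaultModel.BYZANTINE:
            return 0
        return tolerated_faults(self.sites, FaultModel.BYZANTINE)


# Byzantine inside each site, benign on the wide area (the Steward-like choice).
steward_like = Configuration(4, 4, FaultModel.BYZANTINE, FaultModel.BENIGN)
# Byzantine at both levels of the hierarchy.
fully_byzantine = Configuration(4, 4, FaultModel.BYZANTINE, FaultModel.BYZANTINE)
```

Under these assumptions, `steward_like` tolerates one compromised server per site but no compromised site, while `fully_byzantine` also tolerates one fully compromised site out of four.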

[Figure: clients and server replicas 1, 2, 3, ..., 3f+1]
Comparison to State of the Art
  • Our previous approach - Steward
  • Globally-optimized, hierarchical architecture
    with entwined local and wide-area protocols.
  • Survives compromise of up to (but not including)
    one third of servers in each site, but cannot
    survive a site compromise.
  • Confines malicious behavior to the local site,
    allowing for the use of a lightweight, two-round
    wide-area protocol, reducing latency compared to
    a flat Byzantine solution.
  • Reduces wide-area message complexity to O(S^2),
    where S is the number of sites.
  • Supports local query handling, resulting in
    increased performance and availability.
  • Our new composable architecture
  • Hierarchical architecture that cleanly separates
    local and wide-area protocols via logical machine
    abstraction.
  • Allows configurations that can survive a
    complete site compromise.
  • Supports the use of a variety of local and
    wide-area replication protocols, enabling
    customization based on perceived risks.
  • Achieves O(S^2) wide-area message complexity even
    when a Byzantine fault-tolerant protocol is run
    on the wide area.
  • Provides the same benefits for queries.
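The O(S^2) wide-area message complexity can be made concrete with a rough count. The sketch below is a simplified model, not the actual protocols: it assumes a single all-to-all round, that a flat Byzantine protocol has every server send to every server in every other site, and that in the hierarchical case each site acts as one logical participant sending one message to each peer site. The function names are hypothetical.

```python
def wide_area_messages_flat(sites: int, servers_per_site: int) -> int:
    """Wide-area messages in one all-to-all round among all individual servers."""
    total_servers = sites * servers_per_site
    remote_servers = total_servers - servers_per_site  # servers outside the sender's site
    return total_servers * remote_servers


def wide_area_messages_hierarchical(sites: int) -> int:
    """Wide-area messages in one all-to-all round among sites acting as single participants."""
    return sites * (sites - 1)


# 5 sites with 4 servers each:
flat = wide_area_messages_flat(5, 4)               # 20 servers * 16 remote servers = 320
hierarchical = wide_area_messages_hierarchical(5)  # 5 sites * 4 peer sites = 20
```

In this model the hierarchical count depends only on the number of sites, so adding servers within a site does not increase wide-area traffic.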

New Challenge: Performance Under Attack
  • Need new fault models and metrics
  • Model should allow one to reason about and
    specify stronger performance guarantees than
    standard liveness.
  • Metrics should have practical utility.
  • Average throughput: average rate at which
    the system executes updates.
  • Reaction time: maximum time required to execute
    a single update.
  • Fault model should allow one to estimate system
    survivability by capturing correlated
    vulnerabilities due to similarities in location
    and machine hardware and software.
  • Safety and liveness are not enough
  • Byzantine replication protocols rely on
    performance-critical servers to perform certain
    tasks in a timely manner.
  • These servers can exhibit performance faults by
    sending correct messages slowly enough to avoid
    triggering defense mechanisms.
  • Such servers remain correct in the traditional
    sense, despite being able to degrade performance
    by an order of magnitude or more.
  • Providing liveness may not yield a practical
    system in hostile environments.
  • During the red-team experiment, one
    (unsuccessful) attack reduced performance by 80%
    by adding latency to messages.
  • Our own sophisticated attack throttled wide-area
    messages to dramatically degrade performance
    without triggering protocol timeouts.
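The two metrics and the effect of a performance fault can be illustrated with a toy model. The sketch below is one possible reading of the definitions above, not a measurement of any real system: it assumes updates are executed one at a time, that a performance-critical server adds a fixed delay to each update, and that the defense is a simple per-update timeout. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    submitted: float  # time the update entered the system, in seconds
    executed: float   # time the update was executed, in seconds


def average_throughput(updates: list) -> float:
    """Average rate of executed updates (updates per second) over the observed span."""
    span = max(u.executed for u in updates) - min(u.submitted for u in updates)
    return len(updates) / span


def reaction_time(updates: list) -> float:
    """Maximum time taken to execute any single update."""
    return max(u.executed - u.submitted for u in updates)


def run(num_updates: int, base_latency: float, added_delay: float, timeout: float) -> list:
    """Execute updates sequentially; each one takes base_latency + added_delay."""
    if base_latency + added_delay >= timeout:
        # In this toy model, a delay this large would be detected and handled.
        raise ValueError("delay would trigger the timeout-based defense")
    now = 0.0
    updates = []
    for _ in range(num_updates):
        done = now + base_latency + added_delay
        updates.append(Update(submitted=now, executed=done))
        now = done
    return updates


# A correct server versus one that adds delay while staying under a 1-second timeout.
normal = run(100, base_latency=0.05, added_delay=0.0, timeout=1.0)
slowed = run(100, base_latency=0.05, added_delay=0.45, timeout=1.0)
```

With these made-up numbers, the slowed run never triggers the timeout, yet its average throughput is about one tenth of the normal run and its reaction time is about ten times larger, which is the kind of degradation that liveness alone does not rule out.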