Title: FUSE: Lightweight Guaranteed Distributed Failure Notification


1
FUSE: Lightweight Guaranteed Distributed Failure
Notification
  • Mukil Kesavan
  • Original Author: Sangeetha Seshadri, Spring
    2007. This presentation is an augmented version
    of the original one.

2
Agenda
  • Problem Considered
  • Background
  • What is (/is not) FUSE?
  • How FUSE works
  • FUSE Semantics
  • Implementation
  • Evaluation
  • Discussion

3
Problem considered
  • Failure management in distributed systems
  • How do you notify all relevant members of a
    group of systems of a failure, with the
    following goals:
  • Guaranteed delivery of failure notification
  • Delivery of failure notification within bounded
    time
  • Be lightweight, scalable and flexible

4
Background
  • Managing failures in distributed systems is
    important and complex
  • Many corner cases, and a lot of state must be
    maintained.
  • Complexity of applications increases.
  • Known techniques (before this paper):
  • Weakly/strongly consistent membership services:
    maintain a list of each component and whether
    it is up or down
  • (+) Can be used to implement consensus
  • (-) Bound to a process or machine. Does not allow
    application components to have failed with
    respect to one operation but not another.
  • Unreliable failure detectors
  • Solve consensus and atomic broadcast problems in
    partially synchronous distributed systems.
  • Can make mistakes and provide weak guarantees.

5
What is (/is not) FUSE?
  • A programming model for failure management that
    helps distributed system nodes agree on whether
    a failure has occurred, and handles many corner
    cases.
  • Provides
  • Distributed one-way agreement
  • Guaranteed failure notification to all group
    members
  • Tracks individual app communication paths
  • Scalable when apps use overlay network
  • Handles arbitrary and intransitive network
    failures (whole group fails)
  • Enables fate-sharing of related dist. data
    items
  • It is NOT a failure detection service - the
    responsibility of detecting failures is shared
    between FUSE and the application.
  • Does NOT promise efficiency for large groups of
    systems
  • Applications: wide-area Internet applications
    such as content delivery networks, peer-to-peer
    applications, web services and grid computing.

6
How FUSE works
  • Every node in the system runs a FUSE layer.
  • Can create multiple FUSE groups between same set
    of nodes.
  • Application invokes the
  • FuseId CreateGroup(NodeId set)
  • API to create a FUSE group. A unique group
    FuseId is returned to the creator.
  • The FUSE layer on every node is contacted
    (possibly concurrently) and initialized.
  • The application passes the FuseId on to every
    node in the set.
  • Each node registers a failure callback
    associated with the group FuseId using the API
    below (see the sketch after this slide)
  • void RegisterFailureHandler(Callback handler,
    FuseId id)
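
A minimal usage sketch of the two calls above, written in Java with
hypothetical stand-in types (a FuseLayer interface, String-valued
FuseId/NodeId, Runnable callbacks); it illustrates the call pattern
only, not the actual FUSE implementation.

    import java.util.List;

    // Hypothetical stand-ins for the two calls named on this slide;
    // FuseId and NodeId are modeled here as Strings.
    interface FuseLayer {
        String createGroup(List<String> nodeIds);                      // FuseId CreateGroup(NodeId set)
        void registerFailureHandler(Runnable handler, String fuseId);  // void RegisterFailureHandler(Callback, FuseId)
    }

    class FuseUsageSketch {
        // Creator side: ask the local FUSE layer to set up the group, then the
        // application itself distributes the returned id to every member.
        static String createAndAnnounce(FuseLayer fuse, List<String> members) {
            String fuseId = fuse.createGroup(members);  // contacts each member's FUSE layer
            // ... application-level messages carry fuseId to all members ...
            return fuseId;
        }

        // Member side: once the application has learned the id, register the
        // cleanup logic that should run when a failure notification arrives.
        static void installCallback(FuseLayer fuse, String fuseId) {
            fuse.registerFailureHandler(
                () -> System.out.println("FUSE group " + fuseId + " failed: discard its state"),
                fuseId);
        }
    }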

7
How FUSE works (contd)
  • Nodes periodically ping each other.
  • If a node initiates a ping that is missed, the
    node itself stops responding to future pings.
    This ensures that an individual observation of
    a failure is converted into a group notification
    (see the sketch after this slide).
  • Nodes are notified of the failure through the
    registered callback.
  • Failure notification can be triggered
  • explicitly, by the application,
  • or implicitly, when FUSE detects a communication
    failure among group members.
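
A sketch, with hypothetical names, of the rule above: a node that
misses a ping it initiated marks the group failed and also stops
answering pings, so its neighbours miss their pings in turn and the
observation reaches every live member.

    // Per-group liveness state on one node (illustrative only).
    class GroupLiveness {
        private volatile boolean failed = false;
        private final Runnable failureCallback;

        GroupLiveness(Runnable failureCallback) {
            this.failureCallback = failureCallback;
        }

        // Called when a ping we sent for this group times out, or when the
        // application explicitly signals a failure.
        void onMissedPingOrExplicitSignal() {
            if (!failed) {
                failed = true;
                failureCallback.run();   // deliver the local failure notification
            }
        }

        // Called when a neighbour pings us for this group; once we consider
        // the group failed we refuse to answer, propagating the failure.
        boolean answerPing() {
            return !failed;
        }
    }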

8
FUSE Semantics
  • Group Creation
  • If any group node is unreachable, then failure
    is returned to the creator and to the other
    nodes whose state was already established.
  • A notification for an unknown FUSE group is
    ignored.
  • An attempt to associate a callback with a
    non-existent FUSE group results in the callback
    being executed immediately (see the sketch after
    this slide).
  • Alternative design?
  • Guarantees for Notification Delivery
  • When a notification is triggered, every live
    member of the group hears it within a small
    multiple of the failure timeout period. In
    practice, the authors claim a single failure
    timeout is enough (is it per node?).
  • False positives are possible during transient
    failures.
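
A sketch of these registration and delivery semantics with
hypothetical names (String ids, Runnable callbacks); it is not the
paper's code.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Notifications for unknown groups are ignored, while registering a
    // callback against an unknown (already failed or never installed)
    // group runs the callback immediately.
    class FuseRegistry {
        private final Map<String, List<Runnable>> liveGroups = new HashMap<>();

        synchronized void installGroup(String fuseId) {
            liveGroups.putIfAbsent(fuseId, new ArrayList<>());
        }

        synchronized void registerFailureHandler(Runnable handler, String fuseId) {
            List<Runnable> handlers = liveGroups.get(fuseId);
            if (handlers == null) {
                handler.run();        // non-existent group: execute immediately
            } else {
                handlers.add(handler);
            }
        }

        synchronized void deliverNotification(String fuseId) {
            List<Runnable> handlers = liveGroups.remove(fuseId);
            if (handlers == null) {
                return;               // unknown FUSE group: notification ignored
            }
            handlers.forEach(Runnable::run);
        }
    }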

9
FUSE Semantics (contd)
  • Fail-on-Send
  • Explicit application failure signaling is
    required, since FUSE only guarantees delivery
    of failure notifications.
  • A failed communication path or an intransitive
    failure causes a Fail-on-Send (see the sketch
    after this slide).
  • Crash recovery
  • A recovering node does not know if a failure
    notification was triggered.
  • FUSE handles this by having nodes actively
    compare their live FUSE groups during liveness
    checking.
  • FUSE does not use stable storage, though stable
    storage could be used to mask transient
    failures.
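
A sketch of the Fail-on-Send pattern under assumed names: the
Transport interface and the signalFailure trigger are illustrative,
not the documented FUSE API.

    import java.io.IOException;

    class FailOnSendSender {
        interface Fuse { void signalFailure(String fuseId); }              // assumed explicit trigger
        interface Transport { void send(String dest, byte[] msg) throws IOException; }

        private final Fuse fuse;
        private final Transport transport;

        FailOnSendSender(Fuse fuse, Transport transport) {
            this.fuse = fuse;
            this.transport = transport;
        }

        // A failed application-level send is converted into a group failure,
        // since FUSE itself only guarantees delivery of the notification.
        boolean sendOrFail(String fuseId, String dest, byte[] msg) {
            try {
                transport.send(dest, msg);
                return true;
            } catch (IOException e) {
                fuse.signalFailure(fuseId);   // tear down the whole FUSE group
                return false;
            }
        }
    }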

10
Liveness Checking
  • FUSE piggybacks on existing overlay maintenance
    pings.
  • Per-group spanning trees on an overlay network
    are used.
  • Terminology
  • Members: the nodes that belong to the FUSE group
  • Delegates: overlay nodes that are not part of
    the FUSE group but still aid in liveness
    checking.
  • Required to notify neighbors in case of a
    connectivity failure.
  • Overlapping liveness checking trees between
    multiple FUSE groups result in all of them being
    monitored during each ping (see the sketch after
    this slide).
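
A sketch, with hypothetical names, of why one ping can monitor many
groups: each neighbour link records the FUSE groups whose checking
trees cross it, so a single missed ping lets a member or delegate
flag all of them at once.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class LinkMonitor {
        private final Map<String, Set<String>> groupsByNeighbor = new HashMap<>();

        // Called while a group's liveness-checking tree is installed over this link.
        void installChecking(String neighbor, String fuseId) {
            groupsByNeighbor.computeIfAbsent(neighbor, k -> new HashSet<>()).add(fuseId);
        }

        // Called when overlay maintenance declares the neighbour unreachable:
        // every group routed over that link needs a (soft) notification.
        Set<String> groupsToNotify(String unreachableNeighbor) {
            return groupsByNeighbor.getOrDefault(unreachableNeighbor, Collections.emptySet());
        }
    }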

11
Security-Scalability trade-offs
  • Two kinds of security attacks:
  • Violation of FUSE semantics: dropped
    notifications
  • handled using multiple dissemination trees
    (redundancy)
  • Can use all-to-all pinging but high overhead.
  • Attacks by delegates:
  • use per-group spanning trees that do not rely
    on overlay (delegate) nodes
  • Increases the amount of liveness checking
    traffic.
  • DoS attacks: a malicious node causing frequent
    unnecessary failure notifications.
  • Delegates cannot cause DoS because they can only
    trigger soft notifications (explained later).

12
Implementation
  • Implemented on top of SkipNet
  • SkipNet features
  • Messages routed through the overlay result in a
    client upcall on every intermediate overlay hop.
  • Overlay routing table is visible to the client.
  • FUSE routes directly between members during
    creation and failure notification, which
    reduces false positives.

13
Implementation: Group Creation
  • Group creation
  • Creation request/response directly between root
    and member nodes
  • Members simultaneously route InstallChecking
    messages through the overlay towards the root.
    This prepares overlay nodes (delegates) for
    future liveness forwarding (see the sketch after
    this slide).
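
A sketch of the two halves of group creation described above, using
an assumed overlay interface and made-up message strings.

    import java.util.List;

    class GroupCreationSketch {
        interface Overlay {
            void sendDirect(String node, String msg);     // point-to-point message
            void routeTowards(String node, String msg);   // overlay routing, upcall at every hop
        }

        // Creator (root) side: contact each member directly (request/response).
        static void createGroup(Overlay overlay, String fuseId, List<String> members) {
            for (String m : members) {
                overlay.sendDirect(m, "CreateGroup:" + fuseId);
            }
        }

        // Member side: acknowledge directly, then route an InstallChecking
        // message towards the root so that each intermediate hop (delegate)
        // records the group for future liveness forwarding.
        static void onCreateGroup(Overlay overlay, String rootId, String fuseId) {
            overlay.sendDirect(rootId, "CreateGroupAck:" + fuseId);
            overlay.routeTowards(rootId, "InstallChecking:" + fuseId);
        }
    }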

14
Implementation: Steady State and Notifications
  • Steady-State
  • A hash covering all FUSE groups that use a
    particular overlay link is piggybacked on the
    SkipNet ping messages (see the sketch after this
    slide).
  • This reuses overlay routing table maintenance
    traffic for liveness checking.
  • Notifications
  • Hard notifications: used to dismantle the group.
  • Direct communication reduces latency.
  • Soft notifications: used to clear state on the
    liveness checking tree.
  • A member receiving a soft notification initiates
    repair directly with the root (group creator).
  • Provides resilience to delegate failures.
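
A sketch of the per-link digest idea above, with an assumed hash
standing in for the small (roughly 20-byte) hash mentioned later in
the deck: both ends of an overlay link hash the FUSE ids they believe
share that link and compare digests on each maintenance ping; a
mismatch triggers a soft notification and repair.

    import java.util.SortedSet;

    class LinkDigest {
        // Digest over the (sorted) set of FUSE ids using this overlay link.
        static long digest(SortedSet<String> fuseIdsOnLink) {
            long h = 17;
            for (String id : fuseIdsOnLink) {
                h = 31 * h + id.hashCode();
            }
            return h;
        }

        // Receiver-side check when a ping carrying the sender's digest arrives.
        static boolean consistent(SortedSet<String> localIdsForLink, long remoteDigest) {
            return digest(localIdsForLink) == remoteDigest;
        }
    }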

15
Implementation: Group Repair
  • Repair
  • NeedRepair msg: sent by members to the root (in
    order to reduce latency).
  • SoftNotification: sent by delegates to the root.
  • Otherwise repair mostly similar to group
    creation.

16
Experiments
  • Latency of group creation: as group size
    increases, latency increases; although nodes are
    contacted in parallel, the probability of
    encountering a slow link grows.
  • Note: groups are created via direct messages and
    are hence unaffected by the size of the network.

17
Experiments
  • Latency of Failure notification
  • Explicit notification: lower latency than group
    creation, due to
  • cached TCP connections
  • one-way messages
  • non-blocking delivery.
  • Crash failures: with a ping interval of 1 min
    and a timeout of 30 secs, the TCP connection
    timeout dominates.

18
Experiments (contd)
  • At steady state, no additional traffic is
    introduced. (However, message size increases by
    20 bytes due to the hash.)
  • With churn: with an average network size of 300
    and an additional 100 churning nodes, FUSE soft
    notifications result in a 33 percent increase in
    messages (good? bad?)
  • This is the price paid for reusing overlay
    liveness checking.

19
Experiments
  • False positives
  • Unreliable communication links:
  • Under high loss rates, more groups failed
    (obvious).
  • The larger the group size, the greater the
    probability of encountering an unreliable link.
  • Delegate failures: never generated false
    positives (due to soft notifications and
    repair).

20
Summary
  • Can scale with the number of groups
  • Multiple FUSE groups can share liveness checking
    messages
  • Designed to support a large number of small and
    medium-sized groups.
  • If the application already uses a scalable
    overlay, FUSE can reuse the existing liveness
    checking. Otherwise it can implement its own
    overlay or an alternative liveness checking
    topology.
  • Allows applications to declare failures even
    when application-level constraints are violated.
  • FAILURE could mean system failure, violation of
    application constraints, invalidation of shared
    data, etc.

21
Discussion
  • Is the scalable claim true?
  • Scalable IF implemented on an overlay. Otherwise
    FUSE does introduce liveness checking traffic
    overhead.
  • FUSE just tells you whether or not there is a
    failure. There is very little information for
    the app to use during repair; the repair here is
    more of a re-establishment.
  • Reports failure even if only one process in the
    group fails; the authors recommend smaller
    groups.
  • App semantics might not be conducive to that.
  • How to model other failure paradigms, like, say,
    group alive as long as a quorum exists?
  • Cannot handle Byzantine failures (arbitrary
    output, collusion and non-determinism are bad
    for FUSE).

22
Thank you!!