Title: FUSE: Lightweight Guaranteed Distributed Failure Notification


1
FUSE: Lightweight Guaranteed Distributed Failure
Notification
  • Mukil Kesavan
  • Original Author: Sangeetha Seshadri, Spring
    2007. This presentation is an augmented version
    of the original one.

2
Agenda
  • Problem Considered
  • Background
  • What is (/is not) FUSE?
  • How FUSE works
  • FUSE Semantics
  • Implementation
  • Evaluation
  • Discussion

3
Problem considered
  • Failure management in distributed systems
  • How do you notify all relevant members of a
    group of systems of a failure, with the
    following goals:
  • Guaranteed delivery of failure notification
  • Delivery of failure notification within bounded
    time
  • Be lightweight, scalable and flexible

4
Background
  • Managing failures in distributed systems is
    important and complex
  • Many corner cases, and a lot of state must be
    maintained.
  • Complexity of applications increases.
  • Known techniques (before this paper):
  • Weakly/strongly consistent membership services:
    maintain a list of each component and whether
    it is up or down
  • (+) Can be used to implement consensus
  • (-) Bound to a process or machine. Does not allow
    application components to have failed with
    respect to one operation but not another.
  • Unreliable failure detectors
  • Solve consensus and atomic broadcast problems in
    partially synchronous distributed systems.
  • Can make mistakes and provide weak guarantees.

5
What is (/is not) FUSE?
  • A programming model for failure management that
    helps distributed system nodes agree on whether
    a failure has occurred, and handles many corner
    cases.
  • Provides
  • Distributed one-way agreement
  • Guaranteed failure notification to all group
    members
  • Tracks individual app communication paths
  • Scalable when apps use overlay network
  • Handles arbitrary and intransitive network
    failures (whole group fails)
  • Enables fate-sharing of related dist. data
    items
  • It is NOT a failure detection service - the
    responsibility of detecting failures is shared
    between FUSE and the application.
  • Does NOT promise efficiency for large groups of
    systems
  • Applications: wide-area Internet applications
    such as content delivery networks, peer-to-peer
    applications, web services and grid computing.

6
How FUSE works
  • Every node in the system runs a FUSE layer.
  • Can create multiple FUSE groups between same set
    of nodes.
  • Application invokes the
  • FuseId CreateGroup(NodeId set)
  • API to create a FUSE group. A unique group
    FuseId is returned to the creator.
  • The FUSE layer on every node is contacted
    (possibly concurrently) and initialized.
  • The application passes the FuseId on to every
    node in the set.
  • Each node registers a failure callback
    associated with the group FuseId using the API
    below (see the sketch after this slide)
  • void RegisterFailureHandler(Callback handler,
    FuseId id)
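
A minimal usage sketch of the two calls above, written in Java with
hypothetical stand-in types (a FuseLayer interface, String-valued
FuseId/NodeId, Runnable callbacks); it illustrates the call pattern
only, not the actual FUSE implementation.

    import java.util.List;

    // Hypothetical stand-ins for the two calls named on this slide;
    // FuseId and NodeId are modeled here as Strings.
    interface FuseLayer {
        String createGroup(List<String> nodeIds);                      // FuseId CreateGroup(NodeId set)
        void registerFailureHandler(Runnable handler, String fuseId);  // void RegisterFailureHandler(Callback, FuseId)
    }

    class FuseUsageSketch {
        // Creator side: ask the local FUSE layer to set up the group, then the
        // application itself distributes the returned id to every member.
        static String createAndAnnounce(FuseLayer fuse, List<String> members) {
            String fuseId = fuse.createGroup(members);  // contacts each member's FUSE layer
            // ... application-level messages carry fuseId to all members ...
            return fuseId;
        }

        // Member side: once the application has learned the id, register the
        // cleanup logic that should run when a failure notification arrives.
        static void installCallback(FuseLayer fuse, String fuseId) {
            fuse.registerFailureHandler(
                () -> System.out.println("FUSE group " + fuseId + " failed: discard its state"),
                fuseId);
        }
    }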

7
How FUSE works (contd)
  • Nodes periodically ping each other.
  • If a node initiates a ping that is missed, the
    node itself stops responding to future pings.
    This ensures that an individual observation of
    a failure is converted into a group notification
    (see the sketch after this slide).
  • Nodes are notified of the failure through the
    registered callback.
  • Failure notification can be triggered
  • explicitly, by the application,
  • or implicitly, when FUSE detects a communication
    failure among group members.
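
A sketch, with hypothetical names, of the rule above: a node that
misses a ping it initiated marks the group failed and also stops
answering pings, so its neighbours miss their pings in turn and the
observation reaches every live member.

    // Per-group liveness state on one node (illustrative only).
    class GroupLiveness {
        private volatile boolean failed = false;
        private final Runnable failureCallback;

        GroupLiveness(Runnable failureCallback) {
            this.failureCallback = failureCallback;
        }

        // Called when a ping we sent for this group times out, or when the
        // application explicitly signals a failure.
        void onMissedPingOrExplicitSignal() {
            if (!failed) {
                failed = true;
                failureCallback.run();   // deliver the local failure notification
            }
        }

        // Called when a neighbour pings us for this group; once we consider
        // the group failed we refuse to answer, propagating the failure.
        boolean answerPing() {
            return !failed;
        }
    }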

8
FUSE Semantics
  • Group Creation
  • If any group node is unreachable, then failure
    is returned to the creator and to the other
    nodes whose state was already established.
  • A notification for an unknown FUSE group is
    ignored.
  • An attempt to associate a callback with a
    non-existent FUSE group results in the callback
    being executed immediately (see the sketch after
    this slide).
  • Alternative design?
  • Guarantees for Notification Delivery
  • When a notification is triggered, every live
    member of the group hears it within a small
    multiple of the failure timeout period. In
    practice, the authors claim a single failure
    timeout is enough (is it per node?).
  • False positives are possible during transient
    failures.
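
A sketch of these registration and delivery semantics with
hypothetical names (String ids, Runnable callbacks); it is not the
paper's code.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Notifications for unknown groups are ignored, while registering a
    // callback against an unknown (already failed or never installed)
    // group runs the callback immediately.
    class FuseRegistry {
        private final Map<String, List<Runnable>> liveGroups = new HashMap<>();

        synchronized void installGroup(String fuseId) {
            liveGroups.putIfAbsent(fuseId, new ArrayList<>());
        }

        synchronized void registerFailureHandler(Runnable handler, String fuseId) {
            List<Runnable> handlers = liveGroups.get(fuseId);
            if (handlers == null) {
                handler.run();        // non-existent group: execute immediately
            } else {
                handlers.add(handler);
            }
        }

        synchronized void deliverNotification(String fuseId) {
            List<Runnable> handlers = liveGroups.remove(fuseId);
            if (handlers == null) {
                return;               // unknown FUSE group: notification ignored
            }
            handlers.forEach(Runnable::run);
        }
    }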

9
FUSE Semantics (contd)
  • Fail-on-Send
  • Explicit application failure signaling is
    required, since FUSE only guarantees delivery
    of failure notifications.
  • A failed communication path or an intransitive
    failure causes a Fail-on-Send (see the sketch
    after this slide).
  • Crash recovery
  • A recovering node does not know if a failure
    notification was triggered.
  • FUSE handles this by having nodes actively
    compare their live FUSE groups during liveness
    checking.
  • FUSE does not use stable storage, though stable
    storage could be used to mask transient
    failures.
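
A sketch of the Fail-on-Send pattern under assumed names: the
Transport interface and the signalFailure trigger are illustrative,
not the documented FUSE API.

    import java.io.IOException;

    class FailOnSendSender {
        interface Fuse { void signalFailure(String fuseId); }              // assumed explicit trigger
        interface Transport { void send(String dest, byte[] msg) throws IOException; }

        private final Fuse fuse;
        private final Transport transport;

        FailOnSendSender(Fuse fuse, Transport transport) {
            this.fuse = fuse;
            this.transport = transport;
        }

        // A failed application-level send is converted into a group failure,
        // since FUSE itself only guarantees delivery of the notification.
        boolean sendOrFail(String fuseId, String dest, byte[] msg) {
            try {
                transport.send(dest, msg);
                return true;
            } catch (IOException e) {
                fuse.signalFailure(fuseId);   // tear down the whole FUSE group
                return false;
            }
        }
    }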

10
Liveness Checking
  • FUSE piggybacks on existing overlay maintenance
    pings.
  • Per-group spanning trees on an overlay network
    are used.
  • Terminology
  • Members: the nodes that belong to the FUSE group
  • Delegates: overlay nodes that are not part of
    the FUSE group but still aid in liveness
    checking.
  • Required to notify neighbors in case of a
    connectivity failure.
  • Overlapping liveness checking trees between
    multiple FUSE groups result in all of them being
    monitored during each ping (see the sketch after
    this slide).
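
A sketch, with hypothetical names, of why one ping can monitor many
groups: each neighbour link records the FUSE groups whose checking
trees cross it, so a single missed ping lets a member or delegate
flag all of them at once.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class LinkMonitor {
        private final Map<String, Set<String>> groupsByNeighbor = new HashMap<>();

        // Called while a group's liveness-checking tree is installed over this link.
        void installChecking(String neighbor, String fuseId) {
            groupsByNeighbor.computeIfAbsent(neighbor, k -> new HashSet<>()).add(fuseId);
        }

        // Called when overlay maintenance declares the neighbour unreachable:
        // every group routed over that link needs a (soft) notification.
        Set<String> groupsToNotify(String unreachableNeighbor) {
            return groupsByNeighbor.getOrDefault(unreachableNeighbor, Collections.emptySet());
        }
    }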

11
Security-Scalability trade-offs
  • Two kinds of security attacks:
  • Violation of FUSE semantics: dropped
    notifications
  • handled using multiple dissemination trees
    (redundancy)
  • Can use all-to-all pinging but high overhead.
  • Attacks by delegates:
  • use per-group spanning trees that do not rely
    on overlay (delegate) nodes
  • Increases the amount of liveness checking
    traffic.
  • DoS attacks: a malicious node causing frequent
    unnecessary failure notifications.
  • Delegates cannot cause DoS because they can only
    trigger soft notifications (explained later).

12
Implementation
  • Implemented on top of SkipNet
  • SkipNet features
  • Messages routed through the overlay result in a
    client upcall on every intermediate overlay hop.
  • Overlay routing table is visible to the client.
  • FUSE routes directly between members during
    creation and failure notification, which
    reduces false positives.

13
Implementation: Group Creation
  • Group creation
  • Creation request/response directly between root
    and member nodes
  • Members simultaneously route InstallChecking
    messages through the overlay towards the root.
    This prepares overlay nodes (delegates) for
    future liveness forwarding (see the sketch after
    this slide).
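
A sketch of the two halves of group creation described above, using
an assumed overlay interface and made-up message strings.

    import java.util.List;

    class GroupCreationSketch {
        interface Overlay {
            void sendDirect(String node, String msg);     // point-to-point message
            void routeTowards(String node, String msg);   // overlay routing, upcall at every hop
        }

        // Creator (root) side: contact each member directly (request/response).
        static void createGroup(Overlay overlay, String fuseId, List<String> members) {
            for (String m : members) {
                overlay.sendDirect(m, "CreateGroup:" + fuseId);
            }
        }

        // Member side: acknowledge directly, then route an InstallChecking
        // message towards the root so that each intermediate hop (delegate)
        // records the group for future liveness forwarding.
        static void onCreateGroup(Overlay overlay, String rootId, String fuseId) {
            overlay.sendDirect(rootId, "CreateGroupAck:" + fuseId);
            overlay.routeTowards(rootId, "InstallChecking:" + fuseId);
        }
    }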

14
Implementation: Steady State and Notifications
  • Steady-State
  • A hash covering all FUSE groups that use a
    particular overlay link is piggybacked on the
    SkipNet ping messages (see the sketch after this
    slide).
  • This reuses overlay routing table maintenance
    traffic for liveness checking.
  • Notifications
  • Hard notifications: used to dismantle the group.
  • Direct communication reduces latency.
  • Soft notifications: used to clear state on the
    liveness checking tree.
  • A member receiving a soft notification initiates
    repair directly with the root (group creator).
  • Provides resilience to delegate failures.
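
A sketch of the per-link digest idea above, with an assumed hash
standing in for the small (roughly 20-byte) hash mentioned later in
the deck: both ends of an overlay link hash the FUSE ids they believe
share that link and compare digests on each maintenance ping; a
mismatch triggers a soft notification and repair.

    import java.util.SortedSet;

    class LinkDigest {
        // Digest over the (sorted) set of FUSE ids using this overlay link.
        static long digest(SortedSet<String> fuseIdsOnLink) {
            long h = 17;
            for (String id : fuseIdsOnLink) {
                h = 31 * h + id.hashCode();
            }
            return h;
        }

        // Receiver-side check when a ping carrying the sender's digest arrives.
        static boolean consistent(SortedSet<String> localIdsForLink, long remoteDigest) {
            return digest(localIdsForLink) == remoteDigest;
        }
    }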

15
Implementation: Group Repair
  • Repair
  • NeedRepair msg: sent by members to the root (in
    order to reduce latency).
  • SoftNotification: sent by delegates to the root.
  • Otherwise repair mostly similar to group
    creation.

16
Experiments
  • Latency of group creation: as group size
    increases, latency increases; although nodes are
    contacted in parallel, the probability of
    encountering a slow link grows.
  • Note: groups are created via direct messages and
    are hence unaffected by the size of the network.

17
Experiments
  • Latency of Failure notification
  • Explicit notification: lower latency than group
    creation, due to
  • cached TCP connections
  • one-way messages
  • non-blocking delivery.
  • Crash failures: with a ping interval of 1 min
    and a timeout of 30 secs, the TCP connection
    timeout dominates.

18
Experiments (contd)
  • At steady state, no additional traffic is
    introduced. (However, message size increases by
    20 bytes due to the hash.)
  • With churn: with an average network size of 300
    and an additional 100 churning nodes, FUSE soft
    notifications result in a 33 percent increase in
    messages (good? bad?)
  • This is the price paid for reusing overlay
    liveness checking.

19
Experiments
  • False positives
  • Unreliable communication links:
  • Under high loss rates, more groups failed
    (obvious).
  • The larger the group size, the greater the
    probability of encountering an unreliable link.
  • Delegate failures: never generated false
    positives (due to soft notifications and
    repair).

20
Summary
  • Can scale with the number of groups
  • Multiple FUSE groups can share liveness checking
    messages
  • Designed to support a large number of small and
    medium-sized groups.
  • If the application already uses a scalable
    overlay, FUSE can reuse the existing liveness
    checking. Otherwise it can implement its own
    overlay or an alternative liveness checking
    topology.
  • Allows applications to declare failures even
    when application-level constraints are violated.
  • FAILURE could mean system failure, violation of
    application constraints, invalidation of shared
    data, etc.

21
Discussion
  • Is the scalable claim true?
  • Scalable IF implemented on an overlay. Otherwise
    FUSE does introduce liveness checking traffic
    overhead.
  • FUSE just tells you whether or not there is a
    failure. There is very little information for
    the app to use during repair; the repair here is
    more of a re-establishment.
  • Reports failure even if only one process in the
    group fails; the authors recommend smaller
    groups.
  • App semantics might not be conducive to that.
  • How to model other failure paradigms, like, say,
    group alive as long as a quorum exists?
  • Cannot handle Byzantine failures (arbitrary
    output, collusion and non-determinism are bad
    for FUSE).

22
Thank you!!