Fail-Stop Processors - PowerPoint PPT Presentation

Slides: 22

Provided by: andreaarpa

Learn more at: https://pages.cs.wisc.edu

Transcript and Presenter's Notes

Title: Fail-Stop Processors

1
Fail-Stop Processors
UNIVERSITY of WISCONSIN-MADISON
Computer Sciences Department
CS 739 Distributed Systems
Andrea C. Arpaci-Dusseau

"Byzantine Generals in Action: Implementing
Fail-Stop Processors", Fred Schneider, TOCS, May
1984
Example usage of byzantine agreement
Why fail-stop processors can simplify
replicated services
Why fail-stop processors are expensive
(impractical?) to build
Remaining time: Byzantine Werewolves (improved?)

2
Motivation

Goal: Build systems that continue to work in the
presence of component failure
Difficulty/cost of building those systems depends
upon how components can fail
Fail-stop components make building reliable
systems easier than components with byzantine
failures

3
Fail-Stop Processors

What is a failure?
Output (or behavior) that is inconsistent with
specification
What is a Byzantine failure?
Arbitrary, even malicious, behavior
Components may collude with each other
Cannot necessarily detect output is faulty
What is a fail-stop processor?
Halts instead of performing erroneous
transformations
Others can detect halted state
Others can access uncorrupted stable storage even
after failure

4
Questions to Answer

1) What are the advantages of fail-stop processors?
2) Real processors are not fail-stop
Can we build one?
How can we build an approximation of one?
3) Approximations of fail-stop processors are
expensive to build
Under what circumstances is a replicated service
with fail-stop processors better?

5
1) Distributed State Machine

Common approach for building a reliable system
Idea: Replicate faulty servers, coordinate client
interactions with replicas

[Figure labels: Client, input sequence, byzantine agreement, state machine replicas (R, R, R), combine outputs, output]

t-fault tolerant: Satisfies specification as long
as no more than t components fail
Failure model of components determines how many
replicas, R, are needed and their interactions
6
How to build t-fault tolerant state machine?

Inputs
Key: All replicas receive and process same
sequence of inputs
1) Agreement: Every nonfaulty replica receives
same request (interactive consistency or
byzantine agreement)
2) Ordering: Every nonfaulty replica processes
requests in same order (logical clocks)
Outputs

                      Byzantine   Fail-Stop
Combine output?       majority    any
Number of replicas?   2t+1        t+1
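
The output rules in the table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `replicas_needed` and `combine_outputs` are hypothetical, and a halted fail-stop replica is modeled as the value `None`.

```python
from collections import Counter


def replicas_needed(t: int, failure_model: str) -> int:
    """Replicas required to tolerate t faulty components."""
    if failure_model == "byzantine":
        return 2 * t + 1
    if failure_model == "fail-stop":
        return t + 1
    raise ValueError(f"unknown failure model: {failure_model}")


def combine_outputs(outputs, failure_model: str):
    """Combine replica outputs into one result.

    Assumption: a halted fail-stop replica is represented by None.
    """
    if failure_model == "fail-stop":
        # A fail-stop replica never emits a wrong value, so any
        # output from a replica that has not halted can be used.
        for out in outputs:
            if out is not None:
                return out
        raise RuntimeError("all replicas halted")
    if failure_model == "byzantine":
        if not outputs:
            raise RuntimeError("no outputs to combine")
        # Faulty replicas may emit anything, so require a strict majority.
        value, count = Counter(outputs).most_common(1)[0]
        if 2 * count > len(outputs):
            return value
        raise RuntimeError("no majority among replica outputs")
    raise ValueError(f"unknown failure model: {failure_model}")
```

With t = 1, three byzantine replicas returning `[7, 7, 3]` combine to 7, while two fail-stop replicas returning `[None, 7]` also yield 7 with one fewer replica.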
7
2) Building a Fail-Stop Processor

Must provide stable storage
Volatile: Lost on failure
Stable:
Not affected (lost or corrupted) by failure
Can be read by any processor
Benefit: Recover work of failed process
Drawback: Minimize interactions since slow
Can only build approximation of fail-stop
processor
Finite hardware -> Finite failures could disable
all error detection hardware
k-fail-stop processor: behaves fail-stop unless
k+1 or more failures

8
Implementation of k-FSP Overview

Two components
k+1 p-processes (program)
2k+1 s-processes (storage)
Each process runs on own processor, all connected
with network
P-Processes (k+1)
Each runs program for state machine
Interacts with s-processes to read and write data
If any fail (if any disagreement), then all STOP
Cannot necessarily detect k+1 failures
S-Processes (2k+1)
Each replicates contents of stable storage for
this FSP
Provides reliable data with k failures (cannot
just stop)
Detects disagreements/failures across p-processes
How???

9
Interactive Consistency Requirements

IC1. If p-process j is nonfaulty, then every
nonfaulty s-process receives its request within δ
seconds (as measured on s-process clock)
IC2. Nonfaulty s-processes in same k-FSP agree
on every request from p-process j
S-processes must agree even when p-process is
faulty
To provide IC1 and IC2:
Assuming can authenticate sender of messages, use
signed message (SM) protocol for byzantine
agreement
Need just k+1 processes for agreement
IC3. For each k-FSP, clocks of all p-processes
are synchronized
All nonfaulty p-processes must send requests at
same time to s-processes

10
FSP Algorithm Details Writes

Each p-process, on a write:
Broadcast write to all s-processes
Byzantine agreement across all s-processes (all
s-processes must agree on same input value from
particular p-process)
Each s-process, on a write (Fig 1):
Ensure each p-process writes same value and
request is received within time bound
Initial code: Handle messages after at least time
δ has transpired since receipt (every s-process
should receive by then)
If receive write request from all k+1 p-processes
(|M| = k+1), then update value in stable storage
If not, then halt all p-processes:
Set failed variable to true
Do not allow future writes
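
The s-process write rule above can be sketched as follows. This is a minimal illustration, not the code from Fig 1 of the paper: the class name `SProcess` and method `handle_write` are hypothetical, and the sketch assumes byzantine agreement and the wait for the time bound have already happened, so `requests` holds one agreed `(variable, value)` pair per p-process whose request arrived in time.

```python
from dataclasses import dataclass, field


@dataclass
class SProcess:
    """Hypothetical model of one s-process of a k-fail-stop processor."""
    k: int
    storage: dict = field(default_factory=dict)
    failed: bool = False

    def handle_write(self, requests) -> bool:
        """Apply one write round; return True if the write was performed.

        Assumption: `requests` is the list of (variable, value) pairs
        already agreed on for this round, one per p-process that sent
        its request within the time bound.
        """
        if self.failed:
            # Once halted, no future writes are allowed.
            return False
        all_arrived = len(requests) == self.k + 1
        all_same = len(set(requests)) == 1
        if all_arrived and all_same:
            variable, value = requests[0]
            self.storage[variable] = value
            return True
        # Missing or disagreeing request: halt the p-processes by
        # setting the failed variable they read from stable storage.
        self.failed = True
        return False
```

For k = 2, three identical requests `("b", 7)` update the storage, while a round with only two requests sets `failed` and blocks every later write.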

11
FSP Algorithm Details Reads

Each p-process, on a read
Broadcast request to all s-processes
Use result from majority (k+1 out of 2k+1)
Can read from other FSPs as well
Useful if FSP failed and re-balancing work
Each p-process, determine if halted/failed
Read failed variable from s-process (use
majority)
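
The majority read can be sketched as below. The function name `majority_read` is hypothetical, and the sketch assumes the p-process has already collected the responses from the s-processes; the same helper is reused to read the failed variable.

```python
from collections import Counter


def majority_read(responses, k: int):
    """Return the value reported by at least k+1 of the 2k+1 s-processes.

    Assumption: `responses` is the list of values already collected
    from the s-processes for a single variable.
    """
    if not responses:
        raise RuntimeError("no responses received")
    value, count = Counter(responses).most_common(1)[0]
    if count >= k + 1:
        return value
    raise RuntimeError("no value reported by k+1 s-processes")


# With k = 2 there are 5 s-processes; two wrong answers are outvoted.
a = majority_read([6, 6, 6, 5, 5], k=2)

# The halted/failed check reads the failed variable the same way.
halted = majority_read([False, False, True, False, False], k=2)
```

Requiring k+1 matching answers out of 2k+1 means up to k faulty s-processes cannot change the result on their own.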

12
FSP Example

k = 2, SM code: b = a + 1. How many p- and
s-processes?

[Figure: p-processes and s-processes; storage contents: a = 6, b, failed = 0]

How do p-processes read a?
1) Broadcast request to each s-process
2) Each s-process responds to read request
3) Each p-process uses majority of responses
from s-processes

13
FSP Example

k = 2, SM code: b = a + 1

[Figure: p-processes and s-processes; storage variables: a, b, failed]

How do p-processes read a?
What if 2 s-processes fail?
E.g., they think a = 5?
What if 3 s-processes fail?

14
FSP Example

k = 2, SM code: b = a + 1

[Figure: p-processes and s-processes; storage variables: a, b, failed]

How do p-processes write b?
Each p-process j performs byzantine agreement
using signed message protocol SM(2) across
s-processes
Each s-process must agree on what p-process j is
doing, even if j is faulty
Each s-process looks at requests after time delta
(δ) has elapsed
If it sees same write from all k+1 p-processes,
perform write
Otherwise, halt all p-processes and forbid future
writes

15
FSP Example

k = 2, SM code: b = a + 1

[Figure: p-processes and s-processes; storage variables: a, b, failed]

How do p-processes write b?
What if 1 p-process (or network) is very slow?
What if 1 p-process gives incorrect request to
all s-processes?
What if 1 p-process gives incorrect request to
some?
Byzantine agreement catches this: all s-processes
agree that p-process is faulty (giving different
requests) and agree to treat it the same way
When they see it doesn't agree with other
p-processes, they will halt
What if 3 p-processes give bad result?

16
3) Higher-Level Example

Goal: Service handling k faults, with N nodes for
performance
Solution: Use N+k k-fail-stop processors
Example: N = 2, k = 3

What happens if
3 p-processes in FSP0 fail? 4 p-processes in FSP0
fail?
1 p-process in FSP0, FSP1, and FSP2 fail? also in
FSP3?
2 p-processes in FSP0, FSP1, and FSP2 fail?
1 s-process in SS0 fails? also in SS1, SS2, and
SS3?
4 s-processes in SS0 fail?

17
Should we use Fail Stop Processors?

Metric: Hardware cost for state machines
Fail-stop components
Worst-case (assuming 1 process per processor)
(N+k) [(2k+1) + (k+1)] = (N+k)(3k+2) processors
Best-case (assuming s-processes from different
FSPs share same processor)
(N+k)(k+1) + (2k+1) processors
Byzantine components
N(2k+1) processors
Fail-stop can be better if s-processes share and
N > k
Metric: Frequency of byzantine agreement protocol
Fail-Stop: On every access to stable storage
Byzantine: On every input read
Probably fewer input reads
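
The hardware-cost comparison can be checked numerically. The sketch below reads the slide's formulas as (N+k)(3k+2) for the fail-stop worst case, (N+k)(k+1) + (2k+1) for the fail-stop best case, and N(2k+1) for byzantine components; the function names are hypothetical.

```python
def fsp_worst_case(n: int, k: int) -> int:
    """Processors when every p- and s-process has its own processor."""
    return (n + k) * ((2 * k + 1) + (k + 1))


def fsp_best_case(n: int, k: int) -> int:
    """Processors when s-processes of different FSPs share processors."""
    return (n + k) * (k + 1) + (2 * k + 1)


def byzantine_cost(n: int, k: int) -> int:
    """Processors when each of the N nodes is replicated 2k+1 times."""
    return n * (2 * k + 1)


# Compare the three costs for a few (N, k) pairs.
for n, k in [(2, 3), (10, 1)]:
    print(n, k, fsp_worst_case(n, k), fsp_best_case(n, k),
          byzantine_cost(n, k))
```

For N = 2, k = 3 the costs are 55, 27, and 14 processors, so byzantine components are cheaper. For N = 10, k = 1 they are 55, 25, and 30, so fail-stop wins only when the s-processes share processors and N is well above k.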

18
Summary

Why build fail-stop components?
Easier for higher layers to model and deal with
Matches assumptions of many distributed protocols
Why not?
Usually more hardware
Usually more agreements needed
Higher-levels may be able to cope with slightly
faulty components
Violates end-to-end argument
Conclusion: Probably shouldn't assume fail-stop
components

19
Byzantine Werewolves

Previous: Too easy for villagers to identify
werewolves
Villager A had reliable information that Z was
werewolf
Villager B could validate that A was villager
Hard for Z to lie that C was werewolf, because D
could have checked C too
Signed Protocol: Many could hear what one said
Difficult for werewolves to tell different lies
to others
Have to tell everyone same thing
New: Changes to give more advantage to werewolves
Unknown number of werewolves (1 < w < 1/2 N)
Night: Werewolves convert multiple villagers to
wolves (1 < v < w)
Key: Info told by moderator will then be stale
and wrong!
Day: Villagers can vote to lynch multiple victims

20
Byzantine-Werewolf Game Rules

Everyone secretly assigned as werewolf or
villager
W werewolves, rest are seeing villagers
I am moderator
Night round (changed order)
"Close your eyes"; make noises with one hand to
hide activity
For all: "NAME, open your eyes. Pick someone to
ask about"
Useless for werewolves, but hides their identity
Point to another player
Moderator signs thumbs up for werewolf, down for
villager
"NAME, close your eyes"
"Werewolves, open your eyes"; werewolves can see
who is who
"Werewolves, pick villagers to convert"
Moderator picks secret number between 1 and W
Silently agree on villagers by pointing
Moderator taps converts on shoulder; they should
open eyes to see other werewolves
"Werewolves, close your eyes"

21
Rules Day Time

Day Time: "Everyone open your eyes, it's daytime"
Agreement time: Everyone talks and votes on who
should be decommissioned
Villagers try to decommission werewolves
Werewolves try to trick villagers with bad info
Someone must propose who should be killed
Vote until kill villager or no more proposals or
no majority
Werewolves really spread at night, so large
incentive to kill as many as possible now
Moderator: Uses majority voting to determine who
is decommissioned: "Okay, NAME is dead"
Person is out of game (can't talk anymore) and
shows card
Repeat cycle until: All werewolves dead OR
werewolves > villagers