Distributed Algorithms for Failure Detection in Crash Environments

1
Distributed Algorithms for Failure Detection in Crash Environments

R. Cortiñas, A. Lafuente, M. Larrea
Distributed Systems Group
University of the Basque Country UPV/EHU

2
Guest Stars: ◇P, ◇S and Omega

◇P: strong completeness, eventual strong accuracy
Strong completeness: eventually every process that crashes is permanently suspected by every correct process
Eventual strong accuracy: there is a time after which correct processes are not suspected by any correct process
◇S: strong completeness, eventual weak accuracy
Eventual weak accuracy: there is a time after which some correct process is never suspected by any correct process
Omega: eventual leader election
There is a time after which all the correct processes always trust the same correct process
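
The completeness and accuracy properties above can be checked on a recorded run. The following is a minimal sketch, not part of the presentation: it assumes a finite trace given as a list of snapshots, each mapping a correct process to the set of processes it currently suspects, and it approximates "there is a time after which" by "from some round until the end of the trace". The function names and the trace layout are hypothetical.

```python
def stable_from(history, predicate):
    """Earliest round from which predicate holds in every later snapshot, else None."""
    start = None
    for r, snap in enumerate(history):
        if predicate(snap):
            if start is None:
                start = r
        else:
            start = None  # property broken, any earlier start is invalid
    return start

def strong_completeness(history, correct, crashed):
    # Every correct process suspects every crashed process.
    return stable_from(history, lambda s: all(crashed <= s[q] for q in correct))

def eventual_strong_accuracy(history, correct):
    # No correct process is suspected by any correct process.
    return stable_from(history, lambda s: all(not (s[q] & correct) for q in correct))

def eventual_weak_accuracy(history, correct):
    # Some fixed correct process p is not suspected by any correct process.
    rounds = [
        stable_from(history, lambda s, p=p: all(p not in s[q] for q in correct))
        for p in sorted(correct)
    ]
    rounds = [r for r in rounds if r is not None]
    return min(rounds) if rounds else None

# Three processes; process 3 has crashed, processes 1 and 2 are correct.
correct, crashed = {1, 2}, {3}
history = [
    {1: {2}, 2: set()},      # 1 wrongly suspects 2, nobody suspects 3 yet
    {1: {2, 3}, 2: {3}},     # both now suspect 3, 1 still suspects 2
    {1: {3}, 2: {3}},        # the mistake is corrected
    {1: {3}, 2: {3}},
]
print(strong_completeness(history, correct, crashed))   # 1
print(eventual_strong_accuracy(history, correct))       # 2
print(eventual_weak_accuracy(history, correct))         # 0
```

Each function returns the round from which the property holds on the trace, or None if it does not hold by the end. A finite trace can only suggest, never prove, an "eventually" property.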

3
The First ◇P Algorithm [CT96]
4
Communication Optimality
A ring arrangement of processes
5
Communication Optimality
Communication-efficient algorithms: n links are used forever
6
Communication Optimality
Communication-optimal algorithms: C links are used forever
7
Communication-optimal ◇P
8
Communication-optimal Omega

We also propose an optimal implementation of ◇S, the weakest failure detector for solving Consensus:
processes ordered p1, ..., pn
heartbeat strategy
communication pattern: one-to-successors
based on a trusted process (instead of a list of suspected processes)

9
Communication-optimal Omega
i) Initially, p1 starts sending messages periodically to the rest of the processes, and all processes trust p1
[Figure: processes p1, p2, p3, p4, p5]
trusted1 = p1
trusted2 = p1
trusted3 = p1
trusted4 = p1
trusted5 = p1
10
Communication-optimal Omega
ii) If a process does not receive a message from its trusted process pi within some timeout period, then it suspects pi and takes the next process pi+1 as its new trusted process
trusted1 = p1
trusted2 = p1
trusted3 = p1
trusted4 = p2 (timeout on p1)
trusted5 = p1
11
Communication-optimal Omega
iii) If a process trusts itself, then it starts sending messages periodically to its successors
trusted1 = p1
trusted2 = p2 (timeout on p1)
trusted3 = p1
trusted4 = p2
trusted5 = p1
12
Communication-optimal Omega
iv) If a process receives a message from a process pi preceding its trusted process, then it will trust pi again, increasing its timeout period with respect to pi
trusted1 = p1
trusted2 = p1, timeout_period2,1 increased (message from p1)
trusted3 = p2
trusted4 = p1, timeout_period4,1 increased (message from p1)
trusted5 = p1
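
Steps i) to iv) can be tried out in a small round-based simulation. This is only a sketch under simplifying assumptions that are not in the slides: rounds are synchronous, a message sent in a round is received in the same round, timeouts are counted in rounds, and a false suspicion is provoked by a hypothetical `muted` map that drops all messages of a process in the given rounds. When several earlier processes are heard at once, the sketch picks the smallest one, which is also an assumption.

```python
def simulate(n, crashed, rounds, muted=None, initial_timeout=2):
    """Round-based sketch of steps i)-iv); processes are numbered 1..n.

    crashed: processes that never send anything.
    muted:   process -> set of rounds in which its messages are lost
             (hypothetical device to provoke a false suspicion).
    Returns (trusted, timeout) for the non-crashed processes.
    """
    muted = muted or {}
    alive = [p for p in range(1, n + 1) if p not in crashed]
    trusted = {p: 1 for p in alive}                      # step i: all trust p1
    timeout = {p: {q: initial_timeout for q in range(1, n + 1)} for p in alive}
    silence = {p: 0 for p in alive}  # rounds without a message from trusted[p]

    for r in range(rounds):
        # Steps i and iii: a process that trusts itself sends to its successors.
        senders = [p for p in alive
                   if trusted[p] == p and r not in muted.get(p, set())]
        for p in alive:
            heard = [q for q in senders if q < p]
            earlier = [q for q in heard if q < trusted[p]]
            if earlier:
                # Step iv: trust the preceding process again, with a larger timeout.
                q = min(earlier)
                trusted[p] = q
                timeout[p][q] += 1
                silence[p] = 0
            elif trusted[p] == p or trusted[p] in heard:
                silence[p] = 0
            else:
                silence[p] += 1
                if silence[p] >= timeout[p][trusted[p]]:
                    # Step ii: suspect the trusted process, move to the next one.
                    trusted[p] += 1
                    silence[p] = 0
    return trusted, timeout

# p1 has crashed: every other process ends up trusting p2.
trusted, _ = simulate(n=5, crashed={1}, rounds=20)
print(trusted)

# p1 is correct but its first three messages are lost: the others move to p2,
# then return to p1 with an increased timeout.
trusted, timeout = simulate(n=5, crashed=set(), rounds=20, muted={1: {0, 1, 2}})
print(trusted, timeout[2][1])
```

A process never advances past itself in this sketch, because a process that trusts itself never times out; this matches the ordering p1, ..., pn used by the slides.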
13
Communication-optimal Omega

Lemma. With the previous algorithm, eventually all the correct processes will permanently trust the first correct process in p1, ..., pn
This property trivially allows us to provide the properties of ◇S:
Eventual weak accuracy: by not suspecting the trusted process
Strong completeness: by suspecting all the processes except the trusted process
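
The construction in the two items above can be written down directly. The snippet below is a minimal sketch with hypothetical names: given the trusted process reported by the leader-based detector, it builds the list of suspected processes as everyone except that process.

```python
def suspected_from_trusted(n, trusted):
    """Suspect every process in 1..n except the currently trusted one."""
    return {p for p in range(1, n + 1) if p != trusted}

# Example: five processes, p1 has crashed, the stable trusted process is p2.
n, crashed, trusted = 5, {1}, 2
suspected = suspected_from_trusted(n, trusted)
print(sorted(suspected))            # [1, 3, 4, 5]
print(crashed <= suspected)         # True: every crashed process is suspected
print(trusted not in suspected)     # True: the trusted process is never suspected
```

Correct processes other than the trusted one are suspected as well, which is allowed by eventual weak accuracy, since it only requires one correct process that is eventually never suspected.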

14
Communication-optimal Omega
