Title: Distributed Algorithms for Failure Detection in Crash Environments
1 Distributed Algorithms for Failure Detection in Crash Environments
- R. Cortiñas, A. Lafuente, M. Larrea
- Distributed Systems Group
- University of the Basque Country UPV/EHU
2 Guest Stars: ◇P, ◇S and Omega
- ◇P: strong completeness, eventual strong accuracy
  - Eventually every process that crashes is permanently suspected by every correct process
  - There is a time after which correct processes are not suspected by any correct process
- ◇S: strong completeness, eventual weak accuracy
  - There is a time after which some correct process is never suspected by any correct process
- Omega: eventual leader election
  - There is a time after which all the correct processes always trust the same correct process
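The properties above can be made concrete with a small checker. This is only an illustrative sketch, not part of the original slides: it inspects a single snapshot assumed to be taken after the system has stabilized, since an "eventually" property cannot be verified on a finite trace. The function names and the `suspects` / `trusted` dictionaries are hypothetical.

```python
def strong_completeness(suspects, correct, crashed):
    """Every crashed process is suspected by every correct process."""
    return all(crashed <= suspects[q] for q in correct)


def eventual_strong_accuracy(suspects, correct):
    """No correct process is suspected by any correct process."""
    return all(not (suspects[q] & correct) for q in correct)


def eventual_weak_accuracy(suspects, correct):
    """Some correct process is suspected by no correct process."""
    return any(all(p not in suspects[q] for q in correct) for p in correct)


def omega(trusted, correct):
    """All correct processes trust the same correct process."""
    leaders = {trusted[q] for q in correct}
    return len(leaders) == 1 and leaders <= correct


# Assumed stable snapshot: p1 and p4 have crashed, p2, p3 and p5 are correct.
correct, crashed = {2, 3, 5}, {1, 4}
suspects = {2: {1, 4}, 3: {1, 4, 5}, 5: {1, 3, 4}}
trusted = {2: 2, 3: 2, 5: 2}

is_diamond_p = (strong_completeness(suspects, correct, crashed)
                and eventual_strong_accuracy(suspects, correct))
is_diamond_s = (strong_completeness(suspects, correct, crashed)
                and eventual_weak_accuracy(suspects, correct))
```

In this snapshot p3 and p5 still suspect each other, so the ◇P accuracy condition fails, while p2 is suspected by nobody, which is enough for ◇S.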
3 The First ◇P Algorithm (CT96)
4 Communication Optimality
A ring arrangement of processes
5 Communication Optimality
Communication-efficient algorithms: n links are used forever
6 Communication Optimality
Communication-optimal algorithms: C links are used forever
7 Communication-optimal ◇P
8 Communication-optimal Omega
- We also propose an optimal implementation of ◇S, the weakest failure detector for solving Consensus
  - processes ordered p1, ..., pn
  - heartbeat strategy
  - communication pattern: one-to-successors
  - based on a trusted process (instead of a list of suspected processes)
9 Communication-optimal Omega
i) Initially, p1 starts sending messages periodically to the rest of the processes, and all processes trust p1
Figure: processes p1, ..., p5; trusted_1 = trusted_2 = trusted_3 = trusted_4 = trusted_5 = p1
10 Communication-optimal Omega
ii) If a process does not receive a message within some timeout period from its trusted process pi, then it suspects pi and takes the next process pi+1 as its new trusted process
Figure: p4 times out on p1 and sets trusted_4 = p2; trusted_1 = trusted_2 = trusted_3 = trusted_5 = p1
11 Communication-optimal Omega
iii) If a process trusts itself, then it starts sending messages periodically to its successors
Figure: p2 times out on p1 and sets trusted_2 = p2; trusted_4 = p2; trusted_1 = trusted_3 = trusted_5 = p1
12 Communication-optimal Omega
iv) If a process receives a message from a process pi preceding its trusted process, then it will trust pi again, increasing its timeout period with respect to pi
Figure: p2 and p4 receive a message from p1, set trusted_2 = trusted_4 = p1 and increase timeout_period_{2,1} and timeout_period_{4,1}; trusted_3 = p2; trusted_1 = trusted_5 = p1
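Steps i) to iv) can be sketched as a small round-based simulation. This is a simplified illustration under stated assumptions, not the authors' implementation: time advances in discrete ticks, a heartbeat sent in a tick is delivered in the same tick, the initial timeout period is one tick, and a timeout grows by one tick on each false suspicion. The names `Process`, `simulate`, `leaders` and the `silent_until` parameter (a process whose messages are lost before a given tick, used to provoke a false suspicion) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int                                     # position in the order p1, ..., pn
    trusted: int = 1                             # step i: everybody initially trusts p1
    alive: bool = True
    timeout: dict = field(default_factory=dict)  # timeout period per sender (ticks)
    waited: int = 0                              # ticks without news from trusted

    def on_tick(self, heartbeats):
        """Process the set of sender ids whose heartbeat arrives in this tick."""
        if not self.alive:
            return
        # Step iv: a message from a process preceding the trusted one.
        earlier = [s for s in heartbeats if s < self.trusted]
        if earlier:
            s = min(earlier)
            self.trusted = s
            self.timeout[s] = self.timeout.get(s, 1) + 1
            self.waited = 0
            return
        if self.trusted == self.pid:
            return                               # trusts itself: nothing to wait for
        if self.trusted in heartbeats:
            self.waited = 0
            return
        # Step ii: timeout on the trusted process, move on to the next one.
        self.waited += 1
        if self.waited > self.timeout.get(self.trusted, 1):
            self.trusted += 1
            self.waited = 0


def simulate(n, crashed=(), silent_until=None, ticks=30):
    silent_until = silent_until or {}
    procs = {i: Process(pid=i) for i in range(1, n + 1)}
    for i in crashed:
        procs[i].alive = False
    for t in range(ticks):
        # Steps i and iii: a process that trusts itself sends heartbeats.
        senders = [p.pid for p in procs.values()
                   if p.alive and p.trusted == p.pid
                   and t >= silent_until.get(p.pid, 0)]
        for p in procs.values():
            # One-to-successors: p only hears from processes that precede it.
            p.on_tick({s for s in senders if s < p.pid})
    return procs


def leaders(procs):
    """Trusted process of every process that is still alive."""
    return {i: p.trusted for i, p in procs.items() if p.alive}


print(leaders(simulate(5, crashed={1})))          # {2: 2, 3: 2, 4: 2, 5: 2}
print(leaders(simulate(5, silent_until={1: 5})))  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
```

In the second run p1 is falsely suspected while it is silent, p2 briefly acts as the trusted process, and once p1's messages arrive every process returns to p1 with a longer timeout period for it.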
13 Communication-optimal Omega
- Lemma. With the previous algorithm, eventually all the correct processes will permanently trust the first correct process in p1, ..., pn
- This property trivially allows us to provide the properties of ◇S
  - Eventual weak accuracy: by not suspecting the trusted process
  - Strong completeness: by suspecting all the processes except the trusted process
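The transformation from a trusted process to a ◇S output can be sketched in a few lines. This is an illustrative sketch only: the function name `suspected_from_trusted` and the example configuration are assumptions, and the snapshot is taken after the lemma already holds, so every correct process trusts the first correct process.

```python
def suspected_from_trusted(trusted, n):
    """Suspect every process in p1, ..., pn except the trusted one."""
    return {i for i in range(1, n + 1) if i != trusted}


n = 5
correct = {2, 3, 5}          # assumed: p1 and p4 have crashed
crashed = set(range(1, n + 1)) - correct
leader = min(correct)        # first correct process, trusted by all (lemma)

outputs = {q: suspected_from_trusted(leader, n) for q in correct}

# Eventual weak accuracy: the leader is suspected by no correct process.
weak_accuracy = all(leader not in outputs[q] for q in correct)
# Strong completeness: every crashed process is suspected by every correct one.
completeness = all(crashed <= outputs[q] for q in correct)
```

Note that the correct processes p3 and p5 are also suspected here. ◇S allows this, which is why the construction yields ◇S rather than ◇P.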
14 Communication-optimal Omega