A SqrtN Algorithm for Mutual Exclusion in Decentralized Systems Mamoru Maekawa University of Tokyo

1
A Sqrt(N) Algorithm for Mutual Exclusion in Decentralized Systems
Mamoru Maekawa, University of Tokyo
2
Distributed Mutual Exclusion

The mutual exclusion problem involves the allocation of a single, indivisible, non-shareable resource among N nodes.
In a distributed system, mutual exclusion is based solely on message passing.
Requirements
Safety: At most one node can access the critical section at a time.
Liveness: Requests to enter and leave the critical section eventually succeed (no starvation and no deadlock).

3
Previous Work

Ricart and Agrawala
Each node requesting mutual exclusion seeks permission from all other nodes.
Complexity: O(N); 2(N-1) messages are required.
Thomas (quorum-based)
Each node requesting mutual exclusion seeks permission from only a majority of nodes.
Complexity: O(N), same as above; in the best case N messages are required.
Gifford and Skeen (weighted approach)
Nodes can cast more than one vote. A majority of the votes is the criterion for mutual exclusion.
Centralized approach (not a distributed algorithm)

4
Maekawa Algorithm

Uses only c*sqrt(N) messages to create mutual exclusion, where c is a constant between 3 and 5.
Optimal distributed algorithm
Assumptions
Error-free communication
FIFO channels: messages between two nodes are delivered in the order sent

5
Optimal Algorithm

Goal: To reduce the number of request messages.
Conditions
Distributed.
Request resolution: Any pair of requests from different nodes must reach a common node.

6
Formulation of conditions

Request resolution rule
Si is the set of nodes from which node i must obtain permission to enter the critical section.
For any pair of nodes i and j, Si and Sj have at least one node in common (Si ∩ Sj is non-empty).
This non-null intersection property is a necessary condition on the Si sets so that mutual exclusion requests can be resolved.
Reduction rule
Node i is always a member of Si. This rule reduces the number of messages to be sent and received by a node.

7
Formulation of conditions (contd.)

Distributed Rule
Each node needs to send and receive the same
number of messages to obtain mutual exclusion
(Equal work).
Each node serves as an arbitrator for the same
number of nodes. This ensures that each node is
equally responsible for mutual exclusion (Equal
responsibility).
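
The rules above can be checked mechanically for a candidate family of sets. The following is a minimal sketch, not part of the original slides: the function name `check_voting_sets` and the 7-node example (K = 3) are illustrative assumptions, and the self-membership check assumes the reduction rule is read as "node i belongs to Si".

```python
from itertools import combinations

def check_voting_sets(sets):
    """sets maps each node i to its set Si (hypothetical representation)."""
    sizes = {len(s) for s in sets.values()}
    # load[i] = number of sets Sj that node i belongs to (nodes it arbitrates for)
    load = {i: sum(i in s for s in sets.values()) for i in sets}
    return {
        # request resolution: every pair of sets shares at least one node
        "pairwise_intersection": all(
            sets[i] & sets[j] for i, j in combinations(sets, 2)
        ),
        # reduction rule (assumed form): i is a member of Si
        "contains_self": all(i in sets[i] for i in sets),
        # equal work: every Si has the same size K
        "equal_work": len(sizes) == 1,
        # equal responsibility: every node appears in the same number of sets
        "equal_responsibility": len(set(load.values())) == 1,
    }

# Illustrative 7-node arrangement with K = 3
example = {
    1: {1, 2, 3}, 2: {2, 4, 6}, 3: {3, 5, 6}, 4: {1, 4, 5},
    5: {2, 5, 7}, 6: {1, 6, 7}, 7: {3, 4, 7},
}
result = check_voting_sets(example)
```

For this example all four checks hold; removing any node from one of the sets makes at least one of them fail.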

8
Optimal K

The general idea is to express the maximum number of Si sets in terms of D and K, guided by the established set of rules. This evaluates to
(D - 1)K + 1
This should be equal to the number of nodes, N, so that K is minimized for a given N.
D is the degree of duplication of nodes (the number of sets Si each node belongs to) and KN is the total number of members over all sets, so that N = KN/D. Thus D = K, and N = K(K - 1) + 1, which gives K approximately equal to sqrt(N).
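
As a small worked example of the relation between N and K, the sketch below assumes the relation N = (D - 1)K + 1 with D = K, that is N = K(K - 1) + 1, and solves it in both directions. The function names are illustrative, not taken from the slides.

```python
import math

def nodes_for_k(k: int) -> int:
    """Number of nodes N for which sets of size K satisfy N = K(K - 1) + 1."""
    return k * (k - 1) + 1

def k_for_nodes(n: int):
    """Return K if N = K(K - 1) + 1 has an integer solution, else None.

    Solving K^2 - K + (1 - N) = 0 gives K = (1 + sqrt(4N - 3)) / 2.
    """
    disc = 4 * n - 3
    root = math.isqrt(disc)
    if root * root != disc:
        return None
    return (1 + root) // 2
```

For instance, K = 3 corresponds to N = 7 and K = 4 to N = 13, while N = 10 has no exact solution and would need a different construction.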

9
Finding the Si Sets

10
Algorithm Outline

If node i can lock all members of Si, then no other node can capture all of its own members, since the intersection of its voting set with that of i contains at least one node.
If a node fails to capture all its members, it waits until all of them are freed and then locks them.
To prevent deadlocks, nodes get a priority based on the timestamp of their request.

11
Examples (1) to (5)
[Slides 11 to 15 contain figures only; the figures are not reproduced in this transcript.]
16
Messages

REQUEST
The message sent by a node to request mutual exclusion.
REQUEST messages are time-stamped, and earlier ones get higher priority.
INQUIRE
The INQUIRE message is sent by a member node j to a node i whose request currently locks j, if j receives another request that predates that of i.
The purpose of the INQUIRE message is to ask node i whether it can indeed lock all its members. It is sent only once.
RELINQUISH
A reply to INQUIRE if the originating node cannot get all its members.
RELEASE
The message sent by a node after it has completed its critical section.
LOCKED
The message sent from a member node to a requesting node if it is not currently locked by another request.
FAILED
The message sent from a member node to a requestor when it is currently locked by a higher-priority request.
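
The message rules above can be sketched from the point of view of a single member node acting as arbitrator. This is a minimal single-threaded sketch under stated assumptions, not the paper's full protocol: the class name `Arbiter`, its method names, and the use of `(timestamp, node_id)` tuples for priority (smaller tuple means earlier request) are all illustrative, and the requester side and the network are left out.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Request = Tuple[int, int]  # (timestamp, node_id); smaller means higher priority

@dataclass
class Arbiter:
    locked_by: Optional[Request] = None          # request currently holding the lock
    waiting: List[Request] = field(default_factory=list)  # min-heap of queued requests
    inquired: bool = False                        # INQUIRE is sent at most once per lock

    def on_request(self, ts: int, node: int):
        """Handle REQUEST; return a list of (message, destination) to send."""
        req = (ts, node)
        if self.locked_by is None:
            self.locked_by = req
            return [("LOCKED", node)]
        heapq.heappush(self.waiting, req)
        # New request predates the lock holder and every queued request
        if req < self.locked_by and self.waiting[0] == req:
            if not self.inquired:
                self.inquired = True
                return [("INQUIRE", self.locked_by[1])]
            return []
        return [("FAILED", node)]

    def on_relinquish(self):
        """Lock holder gave up: requeue it and grant the earliest request."""
        heapq.heappush(self.waiting, self.locked_by)
        return self._grant_next()

    def on_release(self):
        """Lock holder finished its critical section."""
        self.locked_by = None
        return self._grant_next()

    def _grant_next(self):
        self.inquired = False
        if not self.waiting:
            self.locked_by = None
            return []
        self.locked_by = heapq.heappop(self.waiting)
        return [("LOCKED", self.locked_by[1])]
```

A short usage trace: node 2 requests at time 5 and is granted the lock, node 3 requests at time 7 and gets FAILED, node 1 requests at time 1 and triggers an INQUIRE to node 2; once node 2 relinquishes, node 1 receives LOCKED.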

17
Correctness

Mutual Exclusion
Proof by contradiction: if two nodes i and j were in the critical section at the same time, a node common to Si and Sj would have to be locked by both requests at once, which a member node never allows.
Starvation
For a node i in Maekawa's algorithm, starvation would occur if i's REQUEST messages were continuously blocked by preceding REQUEST messages at various members of Si.
This is, however, impossible because there can be at most (K-1) preceding outstanding requests for any request by a node; therefore, in finite time, i's request will be accommodated.

18
Deadlocks

Deadlocks are eliminated in Maekawa's algorithm by attacking the circular wait condition.
For any cycle, there must be one node in the cycle whose REQUEST timestamp is preceded by those of both of its adjacent nodes in the circular wait. Removing such a node from the cycle breaks the circular wait that a deadlock requires.

19
Message Traffic

Light Demand
For an instance of mutual exclusion:
(K-1) REQUEST messages
(K-1) LOCKED messages
(K-1) RELEASE messages
for a total of 3(K-1) messages.
Heavy Demand
At most (K-1) messages for each of REQUEST, INQUIRE, FAILED, RELEASE, RELINQUISH
Thus, a maximum of 5(K-1) messages.
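
To make these counts concrete, the sketch below evaluates them for a given K and compares them with the 2(N-1) messages quoted earlier for Ricart and Agrawala. The function names are illustrative, and the example pairing of N = 13 with K = 4 assumes the relation N = K(K - 1) + 1.

```python
def maekawa_messages(k: int) -> dict:
    """Messages per critical-section entry for voting sets of size K."""
    return {"light": 3 * (k - 1), "heavy_max": 5 * (k - 1)}

def ricart_agrawala_messages(n: int) -> int:
    """Messages per critical-section entry when all N-1 other nodes are asked."""
    return 2 * (n - 1)

# Example: N = 13 nodes, K = 4
counts = maekawa_messages(4)
baseline = ricart_agrawala_messages(13)
```

With these values the light-demand cost is 9 messages and the heavy-demand bound is 15, against 24 messages when every other node must be contacted.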

20
Node Failure

Algorithm assumes that failures can be detected
by other nodes and failed nodes are removed from
the system.
A simple approach to deal with failure is to
allow another node to take over the
responsibilities of the failed node.

21
Comparison
22
Discussion

Permission-based vs. token-based
Is the distributed property required?
Is the algorithm churn resistant?
sqrt(N) is a big number for large networks. Can we use this algorithm for them?

23
Reliable Communication in the Presence of Failures

Kenneth Birman and Thomas Joseph

24
Motivation and Background

Design of a communication facility for
distributed systems.
Consistent event orderings.
Optimize concurrency.
This system is in use at the NYSE, Swiss Exchange
and the French air traffic control system.

25
Example
[Figure: timelines for processes A, B, and C. B fails while an Update is being delivered. A receives the Update and then detects the failure of B; C detects the failure of B and then receives the Update.]
26
Virtual Synchrony

Membership changes within a process group are
observed in the same order by all group members
that remain connected.
Total ordering with respect to regular messages.
Every process that observes the same two consecutive membership changes receives the same set of regular multicast messages between the two changes.

27
Definitions

Fault Tolerant Process Groups
Collection of processes that are cooperating to
perform a distributed computation, and use the
communication primitives described in this paper.
Broadcast
Refers to transmission of a message from a
process to the members of a process group, and
not to all processes in the system.

28
System Characteristics

No memory sharing or synchronized clocks.
Halting failures.
Communication failures
Hierarchical structure
Logical approach to failure

29
Broadcast Primitives

ABCAST
CBCAST
GBCAST
All broadcast primitives are atomic

30
ABCAST

ABCAST(mesg, label, dests)
The order in which messages are delivered at one destination is the same as the order at the other destinations, provided the messages have the same label.

31
ABCAST
[Figure: process P with its ABCAST queues and a delivery queue; an arriving ABCAST is placed in the ABCAST queue for its label.]
1. The sender transmits mesg to its destinations.
2. Each recipient adds mesg to the priority queue associated with label, tags it as undeliverable, assigns it a priority, and informs the sender.
3. The sender computes the maximum priority and sends it back to the recipients.
4. Recipients change the priority, tag the message as deliverable, and transfer messages to the delivery queue in increasing order of priority.
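
The two-phase priority exchange described for ABCAST can be sketched as follows. This is a simplified illustration under stated assumptions, not the ISIS implementation: it handles a single label, assumes no failures, and breaks ties between equal priorities by process id. The names `Recipient`, `propose`, `commit`, and `abcast` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Recipient:
    pid: int
    clock: int = 0                                  # largest priority counter seen so far
    queue: dict = field(default_factory=dict)       # msg -> [priority, deliverable]
    delivered: list = field(default_factory=list)   # delivery queue, in delivery order

    def propose(self, msg):
        """Phase 1: queue msg as undeliverable and propose a priority."""
        self.clock += 1
        prio = (self.clock, self.pid)               # pid breaks ties (assumption)
        self.queue[msg] = [prio, False]
        return prio

    def commit(self, msg, final_prio):
        """Phase 2: adopt the sender's maximum priority and deliver what is ready."""
        self.clock = max(self.clock, final_prio[0])
        self.queue[msg] = [final_prio, True]
        # Deliver from the head while the lowest-priority entry is deliverable
        while self.queue:
            head = min(self.queue, key=lambda m: self.queue[m][0])
            if not self.queue[head][1]:
                break
            del self.queue[head]
            self.delivered.append(head)

def abcast(msg, dests):
    """Sender side: collect proposals, then distribute the maximum."""
    final = max(d.propose(msg) for d in dests)
    for d in dests:
        d.commit(msg, final)
```

Even when two concurrent broadcasts reach the recipients in different orders, each recipient holds a message back until every lower-priority entry in its queue is deliverable, so the delivery order ends up the same everywhere.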
32
CBCAST

CBCAST(mesg, label, dests)
If B precedes B' then B is delivered before B' at any overlapping destination.
B precedes B' (written B → B') if
the same process p sends B before it sends B', or
B is delivered at sender(B') before B' is sent.

33
CBCAST
[Figure: process P with its ABCAST queues, a delivery queue, and the buffer BUFP; an arriving CBCAST is placed in BUFP.]
Transmission of B from BUFP to BUFQ:
- A transfer packet (B1, B2, ...) is sent to q. It includes all B' in BUFP such that B' → B and B' has not yet been delivered to all of its destinations. The packet is ordered so that for i < j, Bi → Bj.
- At q, the Bi are placed in BUFQ. If Bi was destined for q, it is also placed in the delivery queue.
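
The transfer-packet idea for CBCAST can be illustrated with a small sketch. It is a simplification under stated assumptions: every message in the sender's buffer is piggybacked (the pruning of messages already delivered everywhere is omitted), labels are ignored, and the names `Process`, `send`, and `receive` are hypothetical.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.buf = []         # BUF: messages known here, kept in causal order
        self.seen = set()     # ids of messages already stored
        self.delivered = []   # delivery queue, in delivery order

    def send(self, msg_id, dests):
        """Create message msg_id and return the transfer packet for it."""
        self._store((msg_id, frozenset(dests)))
        # Everything already in BUF precedes the new message, which is last
        return list(self.buf)

    def receive(self, packet):
        """Place each Bi of the packet in BUF; deliver it if destined here."""
        for msg in packet:
            self._store(msg)

    def _store(self, msg):
        msg_id, dests = msg
        if msg_id in self.seen:
            return            # duplicate copy, already handled
        self.seen.add(msg_id)
        self.buf.append(msg)
        if self.name in dests:
            self.delivered.append(msg_id)

p, q, r = Process("p"), Process("q"), Process("r")
packet1 = p.send("m1", {"q", "r"})
q.receive(packet1)            # the copy addressed to r is delayed
packet2 = q.send("m2", {"r"})
r.receive(packet2)            # carries m1 ahead of m2
r.receive(packet1)            # late copy of m1 is ignored
```

Because m1 was delivered at q before q sent m2, m1 precedes m2, and r delivers m1 first even though the direct copy of m1 arrives late.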
34
GBCAST

GBCAST(action, G)
The order in which GBCASTs are delivered relative to other broadcasts is the same at all overlapping destinations.
A failure GBCAST is delivered after all messages sent by the failed process.

35
GBCAST
[Figure: process P with its ABCAST queues, a wait queue, a delivery queue, IDLISTP, and BUFP; arriving ABCAST, GBCAST, and CBCAST messages are shown entering these structures.]
Consider a failure GBCAST. The protocol ensures that all processes receiving this message schedule for transmission any messages sent by the failed process. Then the message is ordered relative to other GBCASTs, ABCASTs, and CBCASTs.
36
View Management using GBCAST

Site View
set of sites deemed operational.
changes when sites fail/recover.
site view sequence reflects these changes.
View Manager
oldest site is the view manager
ensures that all operational sites see the same
view sequence.

37
View Management using GBCAST
[Figure: view manager P1 and sites P2 to P5. P5 fails, and P1 announces "P5 down, view P1, P2, P3, P4". Labels at the receiving sites: "Record site view to stable storage", "Cease to accept messages from P1", and "ACK (positive ACK / negative ACK)".]
After receiving the positive ACKs, the view manager sends a commit message to all sites.
38
Summary

Virtual synchrony extends the notion of causal and total ordering to synchronous systems over asynchronous communication.
Achieves high levels of concurrency.
Simplifies higher-level code.

39
Discussion

How well does the protocol perform with frequent
failures/recoveries within a cluster?
Can we use Gossip techniques/spanning trees to
avoid all the transmission burden on the
broadcasting node?
Does virtual synchrony scale well?
Can we use the principle of virtual synchrony to
implement consensus?

40
Distributed Snapshots: Determining Global States of Distributed Systems

K. MANI CHANDY
University of Texas
and
Leslie Lamport
Microsoft Research, RD

41
Global State

The global state of a distributed computation is
the set of local states of all individual
processes involved in the computation plus the
state of the communication channels.

42
Need for Global States?

Stable Properties
Let y be a predicate defined on the global states of a distributed system D.
The predicate y is said to be a stable property of D if y(S) implies y(S') for all global states S' of D reachable from global state S of D.
Examples of y
Computation has terminated
The system is deadlocked
All tokens in the ring have disappeared

43
Global Snapshot Algorithm

Chandy and Lamport snapshot algorithm records a
logical (or causal) snapshot of the system.
System Model
No failures, all messages arrive intact, exactly
once, eventually
Communication channels are unidirectional and
FIFO-ordered
There is a communication path between every
process pair

44
Chandy and Lamport Snapshot Algorithm

1. Marker (token message) sending rule for initiator process P0
  After P0 has recorded its state:
    for each outgoing channel C, send a marker on C
2. Marker receiving rule for a process Pk
  On receipt of a marker over channel C:
    if this is the first marker received at Pk
      record Pk's state
      record the state of C as empty
      turn on recording of messages over all other incoming channels
      for each outgoing channel C', send a marker on C'
    else
      turn off recording of messages on channel C only, and set the state of C to all the messages recorded over C
Protocol terminates when every process has received a marker on each of its incoming channels (that is, from every other process).
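
The marker rules can be exercised in a small simulation. The sketch below is an illustration under stated assumptions, not the authors' code: process state is a single integer balance, application messages are integer transfers, every ordered pair of processes has its own FIFO channel, and the names `System`, `send`, `deliver`, `start_snapshot`, and `snapshot_total` are hypothetical.

```python
from collections import deque

MARKER = object()  # sentinel distinguishing markers from application messages

class Process:
    def __init__(self, balance):
        self.balance = balance
        self.recorded_state = None   # local state captured by the snapshot
        self.channel_state = {}      # source -> messages recorded on that channel
        self.recording = set()       # incoming channels still being recorded

class System:
    def __init__(self, balances):
        self.procs = {p: Process(b) for p, b in balances.items()}
        # one unidirectional FIFO channel per ordered pair of processes
        self.chan = {(s, d): deque() for s in balances for d in balances if s != d}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def start_snapshot(self, pid):
        self._record(pid, first_from=None)

    def _record(self, pid, first_from):
        p = self.procs[pid]
        others = [q for q in self.procs if q != pid]
        p.recorded_state = p.balance
        p.channel_state = {q: [] for q in others}     # first_from stays empty
        p.recording = {q for q in others if q != first_from}
        for q in others:                              # marker on every outgoing channel
            self.chan[(pid, q)].append(MARKER)

    def deliver(self, src, dst):
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg is MARKER:
            if p.recorded_state is None:
                self._record(dst, first_from=src)     # first marker seen
            else:
                p.recording.discard(src)              # stop recording this channel
        else:
            p.balance += msg
            if src in p.recording:
                p.channel_state[src].append(msg)

    def snapshot_total(self):
        return sum(p.recorded_state + sum(sum(m) for m in p.channel_state.values())
                   for p in self.procs.values())

system = System({"A": 100, "B": 50})
system.send("A", "B", 10)        # 10 units are in transit from A to B
system.start_snapshot("B")       # B records 50 and sends a marker to A
system.deliver("A", "B")         # transfer arrives while B records channel A->B
system.deliver("B", "A")         # A sees its first marker and records 90
system.deliver("A", "B")         # A's marker closes channel A->B at B
```

In this run the recorded states are 90 for A and 50 for B, and the in-transit transfer of 10 is captured as the state of channel A to B, so the snapshot accounts for the full 150 units.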

45
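Snapshot Example
Consistent Cut
[Figure: event timelines for processes P1, P2, and P3 (events e10, e13, e20, e23, e30) with messages a and b, and a cut drawn across the timelines.]
Consistent cut: a time-cut across processors and channels such that no event after the cut happens-before an event before the cut.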
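
The definition of a consistent cut can be turned into a simple check. The sketch below assumes a cut is given as the number of events included per process and that each message is described by its send and receive events; this representation and the function name `is_consistent` are illustrative assumptions.

```python
def is_consistent(cut, messages):
    """Check that no message is received inside the cut but sent outside it.

    cut: {process: number of events of that process included in the cut}
    messages: list of ((sender, send_index), (receiver, recv_index)),
              with 1-based event indices on each process timeline.
    """
    for (sender, send_index), (receiver, recv_index) in messages:
        received_in_cut = recv_index <= cut[receiver]
        sent_in_cut = send_index <= cut[sender]
        if received_in_cut and not sent_in_cut:
            return False
    return True

# One message: sent as the 2nd event of P1, received as the 3rd event of P2
messages = [(("P1", 2), ("P2", 3))]
```

A cut that includes the receive but not the send is rejected, while a cut that includes the send but not the receive is accepted, with the message counted as part of the channel state.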
46
Termination of the Algorithm

Each process must ensure that:
L1: No marker remains forever in an incident input channel.
L2: It records its state within finite time of initiation of the algorithm.
If the graph is strongly connected and at least one process spontaneously records its state, then all processes will record their states and the states of their incoming channels in finite time.

47
Properties of the recorded global state

The observed global state need not be identical to any of the global states that occurred in the computation.
If Sinit and Sfin are the global states when the Chandy-Lamport algorithm started and finished, respectively, and S is the state recorded by the algorithm, then:
S is reachable from Sinit
Sfin is reachable from S

48
Stability Detection

The reachability property of the snapshot
algorithm is useful for detecting stable
properties.
If a stable predicate is true in the state Ssnap, then we may conclude that the predicate is true in the state Sfin.
Similarly, if the predicate evaluates to false for Ssnap, then it must also be false for Sinit.
Take repeated snapshots.
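
Repeated snapshots can be combined with a stable predicate as in the sketch below. It is only an outline under stated assumptions: snapshots are supplied as an iterable of already recorded global states, the snapshot layout (a set of active processes and a count of in-transit messages) is hypothetical, and termination is used as the example predicate.

```python
def first_snapshot_satisfying(snapshots, predicate):
    """Return the 1-based index of the first snapshot where predicate holds.

    Because the predicate is stable, once it holds in a recorded snapshot it
    also holds in every later global state, so the search can stop there.
    """
    for index, state in enumerate(snapshots, start=1):
        if predicate(state):
            return index
    return None

def terminated(state):
    # Hypothetical layout: {"active": set of process ids, "in_transit": int}
    return not state["active"] and state["in_transit"] == 0

snapshots = [
    {"active": {"P1"}, "in_transit": 2},
    {"active": set(), "in_transit": 1},
    {"active": set(), "in_transit": 0},
]
detected_at = first_snapshot_satisfying(snapshots, terminated)
```

Here termination is detected at the third snapshot; the second snapshot does not qualify because a message is still in a channel and could reactivate a process.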
