A Sqrt(N) Algorithm for Mutual Exclusion in Decentralized Systems (Mamoru Maekawa, University of Tokyo) - PowerPoint PPT Presentation

1
A Sqrt(N) Algorithm for Mutual Exclusion in
Decentralized Systems
Mamoru Maekawa, University of Tokyo
2
Distributed Mutual Exclusion
  • The mutual exclusion problem involves the
    allocation of a single, indivisible,
    non-shareable resource among N nodes.
  • In a distributed system, mutual exclusion is
    based solely on message passing.
  • Requirements
  • Safety: At most one node can access the critical
    section at a time.
  • Liveness: Requests to enter and leave the
    critical section eventually succeed (no
    starvation and no deadlock).

3
Previous Work
  • Ricart and Agrawala
  • Each node requesting mutual exclusion seeks
    permission from all other nodes.
  • Complexity: O(N); 2(N-1) messages are required.
  • Thomas (quorum-based)
  • Each node requesting mutual exclusion seeks
    permission from only a majority of nodes.
  • Complexity: O(N), as above; in the best case N
    messages are required.
  • Gifford and Skeen (weighted approach)
  • Nodes can cast more than one vote. A majority of
    the votes is the criterion for mutual exclusion.
  • Centralized approach (not a distributed
    algorithm)

4
Maekawa Algorithm
  • Uses only c*sqrt(N) messages to create mutual
    exclusion, where c is a constant between 3 and 5.
  • Optimal distributed algorithm
  • Assumptions
  • Error-free communication
  • FIFO channels: messages between two nodes are
    delivered in the order sent

5
Optimal Algorithm
  • Goal: To reduce the number of request messages.
  • Conditions
  • Distributed.
  • Request resolution: Any pair of requests from
    different nodes must reach a common node.

6
Formulation of conditions
  • Request resolution rule
  • Si is the set of nodes from which node i must
    obtain permission to enter the critical section.
  • For any pair of nodes i and j, Si and Sj have at
    least one node in common.
  • This non-null intersection property is a
    necessary condition on the sets Si so that mutual
    exclusion requests can be resolved.
  • Reduction rule
  • Node i is itself a member of Si. This rule
    reduces the number of messages to be sent and
    received by a node.

7
Formulation of conditions (contd.)
  • Distributed Rule
  • Each node needs to send and receive the same
    number of messages to obtain mutual exclusion
    (Equal work).
  • Each node serves as an arbitrator for the same
    number of nodes. This ensures that each node is
    equally responsible for mutual exclusion (Equal
    responsibility).

8
Optimal K
  • The general idea is to express the maximum
    number of sets Si in terms of D and K, guided by
    the established set of rules. This evaluates to
    (D-1)K + 1.
  • This should be equal to the number of nodes, N,
    so that K is minimized for a given N.
  • D is the number of sets in which each node
    appears (the degree of duplication), and KN is
    the total number of memberships, so N = KN/D.
    Thus D = K, and N = K(K-1) + 1.
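The relationship between N and K can be checked numerically. The following is a minimal sketch, assuming D = K so that N = K(K-1) + 1; the function name `smallest_k` is hypothetical and does not come from the slides.

```python
import math

def smallest_k(n: int) -> int:
    """Smallest K with K*(K-1) + 1 >= n (assumes D = K)."""
    if n < 1:
        raise ValueError("n must be positive")
    k = 1
    while k * (k - 1) + 1 < n:
        k += 1
    return k

# Compare the resulting K with sqrt(N) for a few system sizes.
for n in (7, 13, 100, 10_000):
    print(n, smallest_k(n), round(math.sqrt(n), 2))
```

For N = 7 and N = 13 the bound is met exactly (K = 3 and K = 4); for other values of N the sketch rounds K up.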

9
Finding the sets Si
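The construction shown on this slide did not survive in the transcript. As an illustration only, the sketch below uses the simpler grid construction rather than a construction that reaches the optimal K: nodes are laid out in a square grid and Si is the union of node i's row and column. This gives sets of size 2*sqrt(N) - 1 instead of about sqrt(N), and it assumes N is a perfect square. The function name `grid_quorums` is hypothetical.

```python
import math
from itertools import combinations

def grid_quorums(n: int) -> dict[int, set[int]]:
    """Build Si as row(i) | column(i) on a sqrt(n) x sqrt(n) grid of node ids."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("this sketch assumes n is a perfect square")
    quorums = {}
    for i in range(n):
        row, col = divmod(i, side)
        row_members = {row * side + c for c in range(side)}
        col_members = {r * side + col for r in range(side)}
        quorums[i] = row_members | col_members
    return quorums

quorums = grid_quorums(9)
# Every pair of sets shares at least one node (non-null intersection).
assert all(quorums[a] & quorums[b] for a, b in combinations(quorums, 2))
print(sorted(quorums[4]))
```

Any row intersects any column, so the request resolution rule holds, and each node belongs to its own set.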



10
Algorithm Outline
  • If node i can lock all members of Si, then no
    other node can lock all members of its own
    voting set, since that set shares at least one
    node with Si.
  • If a node fails to lock all its members, it
    waits until all of them are freed and then locks
    them.
  • To prevent deadlocks, requests get a priority
    based on their timestamp.

11
Example (1)
12
Example (2)
13
Example (3)
14
Example (4)
15
Example (5)
16
Messages
  • REQUEST
  • The message sent by a node to request mutual
    exclusion.
  • REQUEST messages are time-stamped, and earlier
    ones get higher priority.
  • INQUIRE
  • The INQUIRE message is sent by a member node j to
    the node i whose request currently locks j, if j
    receives another request that predates that of i.
  • The purpose of the INQUIRE message is to ask
    node i whether it can indeed lock all its
    members. It is sent only once.
  • RELINQUISH
  • A reply to INQUIRE if the originating node cannot
    lock all its members.
  • RELEASE
  • The message sent by a node after it has completed
    its critical section.
  • LOCKED
  • The message sent from a member node to a
    requesting node if it is not currently locked by
    another request.
  • FAILED
  • The message sent from a member node to a
    requestor when it is currently locked by a
    higher-priority request.
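The message types above can be tied together from the point of view of a single member node. This is a simplified sketch, not a full implementation: the class and method names are hypothetical, requests are ordered as (timestamp, node id) tuples with the node id breaking ties (an assumption), and each handler returns the messages to send as a list of (message type, destination) pairs instead of using a real network.

```python
import heapq

class Arbitrator:
    """Member-node side: grants its single lock to one request at a time."""

    def __init__(self):
        self.holder = None      # (timestamp, node) of the request holding the lock
        self.waiting = []       # min-heap of pending (timestamp, node) requests
        self.inquired = False   # True once INQUIRE was sent to the current holder

    def _grant_next(self):
        # Hand the lock to the earliest waiting request, if there is one.
        self.inquired = False
        if not self.waiting:
            self.holder = None
            return []
        self.holder = heapq.heappop(self.waiting)
        return [("LOCKED", self.holder[1])]

    def on_request(self, timestamp, node):
        req = (timestamp, node)
        if self.holder is None:
            self.holder = req
            return [("LOCKED", node)]
        # Does the lock holder or any waiting request predate the new one?
        earlier_exists = self.holder < req or any(w < req for w in self.waiting)
        heapq.heappush(self.waiting, req)
        if earlier_exists:
            return [("FAILED", node)]
        if self.inquired:
            return []           # INQUIRE is sent only once per lock holder
        self.inquired = True
        return [("INQUIRE", self.holder[1])]

    def on_relinquish(self):
        # The holder gave the lock back; it rejoins the waiting queue.
        heapq.heappush(self.waiting, self.holder)
        return self._grant_next()

    def on_release(self):
        # The holder left its critical section.
        return self._grant_next()
```

For example, if node 2 holds the lock with timestamp 5 and node 1 then requests with timestamp 3, the member sends INQUIRE to node 2; a RELINQUISH from node 2 moves the lock to node 1.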

17
Correctness
  • Mutual Exclusion
  • Proof by Contradiction
  • Starvation
  • For a node i in Maekawa's algorithm, starvation
    would occur if i's REQUEST messages were
    continuously blocked by preceding REQUEST
    messages at various members of Si.
  • This is impossible, however, because there can be
    at most (K-1) preceding outstanding requests for
    any request by a node, and therefore i's request
    will be accommodated in finite time.

18
Deadlocks
  • Deadlocks are eliminated in Maekawa's algorithm
    by attacking the circular wait condition.
  • For any cycle, there must be one node in the
    cycle whose REQUEST timestamp is preceded by
    those of both of its adjacent nodes in the
    circular wait. Removing such a node breaks the
    circular wait that a deadlock requires.

19
Message Traffic
  • Light demand
  • For an instance of mutual exclusion:
  • (K-1) REQUEST messages
  • (K-1) LOCKED messages
  • (K-1) RELEASE messages
  • Thus, 3(K-1) messages in total.
  • Heavy demand
  • At most (K-1) messages for each of REQUEST,
    INQUIRE, FAILED, RELEASE, RELINQUISH
  • Thus, a maximum of 5(K-1) messages.
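As a worked example of these counts, the sketch below evaluates the per-entry message totals for a given N and K and sets them beside the 2(N-1) messages quoted earlier for Ricart and Agrawala. The function name and dictionary keys are hypothetical.

```python
def message_counts(n: int, k: int) -> dict[str, int]:
    """Messages per critical-section entry, using the formulas from the slides."""
    return {
        "maekawa_light": 3 * (k - 1),       # REQUEST + LOCKED + RELEASE
        "maekawa_heavy_max": 5 * (k - 1),   # five message types, (K-1) each
        "ricart_agrawala": 2 * (n - 1),
    }

# N = 13 nodes with K = 4, which satisfies N = K(K-1) + 1.
print(message_counts(n=13, k=4))
```

With N = 13 and K = 4 this gives 9 messages under light demand and at most 15 under heavy demand, against 24 for Ricart and Agrawala.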

20
Node Failure
  • Algorithm assumes that failures can be detected
    by other nodes and failed nodes are removed from
    the system.
  • A simple approach to deal with failure is to
    allow another node to take over the
    responsibilities of the failed node.

21
Comparison
22
Discussion
  • Permission-based vs. token-based
  • Is the distributed property required?
  • Is the algorithm churn resistant?
  • Sqrt(N) is a big number for large networks. Can
    we use this algorithm for them?

23
Reliable Communication in the Presence of Failures
  • Kenneth Birman and Thomas Joseph

24
Motivation and Background
  • Design of a communication facility for
    distributed systems.
  • Consistent event orderings.
  • Optimize concurrency.
  • This system is in use at the NYSE, Swiss Exchange
    and the French air traffic control system.

25
Example
[Figure: timeline for processes A, B, and C. B sends an Update and then fails. A receives the Update and then detects the failure of B; C detects the failure of B and then receives the Update.]
26
Virtual Synchrony
  • Membership changes within a process group are
    observed in the same order by all group members
    that remain connected.
  • Total ordering with respect to regular messages.
  • All processes that observe the same two
    consecutive membership changes receive the same
    set of regular multicast messages between the two
    changes.

27
Definitions
  • Fault Tolerant Process Groups
  • Collection of processes that are cooperating to
    perform a distributed computation, and use the
    communication primitives described in this paper.
  • Broadcast
  • Refers to transmission of a message from a
    process to the members of a process group, and
    not to all processes in the system.

28
System Characteristics
  • No memory sharing or synchronized clocks.
  • Halting failures.
  • Communication failures
  • Hierarchical structure
  • Logical approach to failure

29
Broadcast Primitives
  • ABCAST
  • CBCAST
  • GBCAST
  • All broadcast primitives are atomic

30
ABCAST
  • ABCAST (mesg, label, dests)
  • Order in which data arrives at a destination is
    the same as the order at other destinations, if
    they have the same label.

31
ABCAST
[Figure: process P with its ABCAST queues and delivery queue; an arriving ABCAST enters the ABCAST queue for its label.]
The sender transmits mesg to its destinations. Each
recipient adds mesg to the priority queue associated
with label, tags it as undeliverable, assigns it a
priority, and informs the sender. The sender computes
the maximum priority and sends it back to the
recipients. Recipients change the priority, tag the
message as deliverable, and transfer messages to the
delivery queue in increasing order of priority.
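The two-phase priority exchange described above can be sketched for a single label as follows. This is an illustration under stated assumptions, not the paper's implementation: the class `AbcastQueue` and its methods are hypothetical, priorities are (counter, site id) pairs so that they are unique, and the sender's role (taking the maximum of the proposals) is played by the caller.

```python
class AbcastQueue:
    """Recipient-side queue for one label."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.clock = 0
        self.pending = {}   # msg_id -> [priority, deliverable flag]

    def propose(self, msg_id):
        """Phase 1: queue the message as undeliverable and propose a priority."""
        self.clock += 1
        priority = (self.clock, self.site_id)
        self.pending[msg_id] = [priority, False]
        return priority

    def commit(self, msg_id, final_priority):
        """Phase 2: adopt the sender's maximum and deliver what is now ready."""
        self.clock = max(self.clock, final_priority[0])
        self.pending[msg_id] = [final_priority, True]
        delivered = []
        while self.pending:
            head = min(self.pending, key=lambda m: self.pending[m][0])
            if not self.pending[head][1]:
                break       # lowest-priority message is not yet deliverable
            delivered.append(head)
            del self.pending[head]
        return delivered

# Two recipients see m1 and m2 arrive in opposite orders.
p, q = AbcastQueue(1), AbcastQueue(2)
final_m1 = max(p.propose("m1"), q.propose("m2") and q.propose("m1"))
final_m2 = max(p.propose("m2"), q.pending["m2"][0])
order_p = p.commit("m1", final_m1) + p.commit("m2", final_m2)
order_q = q.commit("m2", final_m2) + q.commit("m1", final_m1)
print(order_p, order_q)
```

Because a final priority is never lower than any proposed one, an undeliverable message at the head of the queue holds back later messages until its final priority is known, which is what makes the delivery order agree across recipients.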
32
CBCAST
  • CBCAST(mesg, label, dests)
  • If B precedes B' then B is delivered before B' at
    any overlapping destination.
  • B precedes B' (written B → B') if
  • the same process p sends B before it sends B', or
  • B is delivered at sender(B') before B' is sent.
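One way to test the "precedes" relation at a destination is with vector timestamps. Note that this is a different technique from the buffer-forwarding protocol presented on these slides; it is shown only to make the ordering condition concrete. The function name `can_deliver` and the dictionary representation of timestamps are assumptions.

```python
def can_deliver(msg_vc: dict[str, int], sender: str, local_vc: dict[str, int]) -> bool:
    """True if every broadcast that precedes this one was already delivered here.

    msg_vc[p] is the number of broadcasts from p that the sender had sent or
    delivered when it sent this message, counting the message itself.
    """
    for proc, count in msg_vc.items():
        seen = local_vc.get(proc, 0)
        if proc == sender:
            if count != seen + 1:   # must be the next broadcast from the sender
                return False
        elif count > seen:          # a preceding broadcast is still missing
            return False
    return True

# B is sent by p; B' is sent by q after q delivered B, so B precedes B'.
b = {"p": 1, "q": 0}
b_prime = {"p": 1, "q": 1}
at_r = {"p": 0, "q": 0}
print(can_deliver(b_prime, "q", at_r))   # B' must wait for B
at_r["p"] = 1                            # r delivers B
print(can_deliver(b_prime, "q", at_r))
```

A destination that receives B' first simply holds it until B has been delivered, which gives the same guarantee the slide states for overlapping destinations.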

33
CBCAST
[Figure: process P with its ABCAST queues, delivery queue, and buffer BUF_P; an arriving CBCAST is placed in BUF_P.]
Transmission of B from BUF_P to BUF_Q:
  • A transfer packet (B1, B2, ...) is sent to q. It
    includes every B' in BUF_P such that B' → B and
    B' has not yet been delivered to all its
    destinations. The packet is ordered so that
    Bi → Bj implies i < j.
  • At q, the Bi are placed in BUF_Q. If Bi was
    destined for q, it is also placed in the delivery
    queue.
34
GBCAST
  • GBCAST(action, G)
  • The order in which GBCAST are delivered relative
    to other broadcasts is the same at overlapping
    destinations.
  • A failure GBCAST is delivered after all messages
    sent by the failed process.

35
GBCAST
[Figure: process P with its ABCAST queues, wait queue, delivery queue, IDLIST_P, and BUF_P, with arriving ABCAST, CBCAST, and GBCAST messages.]
Consider a failure GBCAST. The protocol ensures
that all processes receiving this message
schedule for transmission any messages sent by
the failed process. Then the message is ordered
relative to other GBCASTs, ABCASTs, and CBCASTs.
36
View Management using GBCAST
  • Site View
  • set of sites deemed operational.
  • changes when sites fail/recover.
  • site view sequence reflects these changes.
  • View Manager
  • oldest site is the view manager
  • ensures that all operational sites see the same
    view sequence.

37
View Management using GBCAST
[Figure: view change after P5 fails. The view manager P1 announces "P5 down, view = P1, P2, P3, P4" to P2, P3, and P4. Each site records the site view to stable storage, ceases to accept messages from the failed site, and replies with a positive or negative ACK.]
After receiving the positive ACKs, the view manager
sends a commit message to all sites.
38
Summary
  • Virtual synchrony extends the notions of causal
    and total ordering, giving the appearance of a
    synchronous system over asynchronous
    communication.
  • Achieves high levels of concurrency.
  • Simplifies higher level code.

39
Discussion
  • How well does the protocol perform with frequent
    failures/recoveries within a cluster?
  • Can we use Gossip techniques/spanning trees to
    avoid all the transmission burden on the
    broadcasting node?
  • Does virtual synchrony scale well?
  • Can we use the principle of virtual synchrony to
    implement consensus?

40
Distributed Snapshots: Determining Global States
of Distributed Systems
  • K. Mani Chandy
  • University of Texas
  • and
  • Leslie Lamport
  • Microsoft Research, RD

41
Global State
  • The global state of a distributed computation is
    the set of local states of all individual
    processes involved in the computation plus the
    state of the communication channels.

42
Need for Global States?
  • Stable Properties
  • Let y be a predicate defined on the global
    states of a distributed system D.
  • The predicate y is said to be a stable property
    of D if y(S) implies y(S') for all global states
    S' of D reachable from global state S of D.
  • Examples of y
  • Computation has terminated
  • The system is deadlocked
  • All tokens in the ring have disappeared

43
Global Snapshot Algorithm
  • Chandy and Lamport snapshot algorithm records a
    logical (or causal) snapshot of the system.
  • System Model
  • No failures, all messages arrive intact, exactly
    once, eventually
  • Communication channels are unidirectional and
    FIFO-ordered
  • There is a communication path between every
    process pair

44
Chandy and Lamport Snapshot Algorithm
  • 1. Marker (token message) sending rule for
    initiator process P0
  • After P0 has recorded its state:
  • for each outgoing channel C, send a marker on C
  • 2. Marker receiving rule for a process Pk
  • On receipt of a marker over channel C:
  • if this is the first marker received at Pk
  • record Pk's state
  • record the state of C as empty
  • turn on recording of messages over all other
    incoming channels
  • for each outgoing channel C', send a marker on C'
  • else
  • turn off recording of messages on channel C only,
    and record the state of C as all the messages
    recorded over C
  • The protocol terminates when every process has
    received a marker on each of its incoming
    channels (here, from every other process)
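The marker rules can be exercised in a small single-threaded simulation. This is a sketch under the slide's system model (no failures, FIFO channels, every pair of processes connected). The class `Process`, the integer "balance" used as application state, and the `network` dictionary of FIFO queues are all assumptions made for illustration.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, name, balance, peers, network):
        self.name = name
        self.balance = balance      # application state (an integer, for illustration)
        self.peers = peers          # names of all other processes
        self.network = network      # (src, dst) -> deque acting as a FIFO channel
        self.recorded = False
        self.recorded_balance = None
        self.channel_state = {}     # src -> messages recorded as in flight
        self.recording = set()      # incoming channels still being recorded

    def send(self, dst, amount):
        self.balance -= amount
        self.network[(self.name, dst)].append(amount)

    def record_state(self, marker_from=None):
        """Record local state, then send a marker on every outgoing channel."""
        self.recorded = True
        self.recorded_balance = self.balance
        self.channel_state = {src: [] for src in self.peers}
        self.recording = set(self.peers) - {marker_from}
        for dst in self.peers:
            self.network[(self.name, dst)].append(MARKER)

    def receive(self, src):
        msg = self.network[(src, self.name)].popleft()
        if msg == MARKER:
            if not self.recorded:
                self.record_state(marker_from=src)  # channel from src is empty
            else:
                self.recording.discard(src)         # stop recording this channel
        else:
            if src in self.recording:
                self.channel_state[src].append(msg)
            self.balance += msg

network = {("P", "Q"): deque(), ("Q", "P"): deque()}
p = Process("P", 100, ["Q"], network)
q = Process("Q", 50, ["P"], network)
p.send("Q", 10)
q.send("P", 5)
p.record_state()                   # P initiates the snapshot
q.receive("P"); q.receive("P")     # Q gets the transfer, then the marker
p.receive("Q"); p.receive("Q")     # P gets the in-flight transfer, then the marker
total = (p.recorded_balance + q.recorded_balance
         + sum(p.channel_state["Q"]) + sum(q.channel_state["P"]))
print(total)
```

In this run the recorded process states plus the recorded channel contents add up to the 150 units the system started with, even though no process ever held the recorded balances at the same instant.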

45
Snapshot Example
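[Figure: space-time diagram of processes P1, P2, and P3 with events e10, e13, e20, e23, e30 and messages a and b, showing a consistent cut.]
Consistent cut: a cut in time across processors and
channels such that no event after the cut
happens-before an event before the cut.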
46
Termination of the Algorithm
  • Each process must ensure that
  • L1: No marker remains forever in an incident
    input channel
  • L2: It records its state within finite time of
    initiation of the algorithm
  • If the graph is strongly connected and at least
    one process spontaneously records its state, then
    all processes will record their states and the
    states of incoming channels in finite time

47
Properties of the recorded global state
  • The recorded global state may not be identical
    to any of the global states that actually
    occurred in the computation.
  • If Sinit and Sfin are the global states when the
    Chandy-Lamport algorithm started and finished,
    respectively, and S is the state recorded by the
    algorithm, then:
  • S is reachable from Sinit
  • Sfin is reachable from S

48
Stability Detection
  • The reachability property of the snapshot
    algorithm is useful for detecting stable
    properties.
  • If a stable predicate is true in the state Ssnap
    then we may conclude that the predicate is true
    in the state Sfin
  • Similarly if the predicate evaluates to False for
    Ssnap, then it must also be False for Sinit.
  • Take repeated snapshots.
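Repeated snapshots can be wrapped in a simple detection loop. The sketch below assumes the snapshots have already been collected into a sequence; the function name `detect_stable` and the dictionary layout of each snapshot are hypothetical.

```python
from typing import Callable, Iterable, Optional, TypeVar

S = TypeVar("S")

def detect_stable(snapshots: Iterable[S], predicate: Callable[[S], bool]) -> Optional[int]:
    """Return the index of the first snapshot in which the predicate holds, else None."""
    for index, snap in enumerate(snapshots):
        if predicate(snap):
            # For a stable property, this also holds in every later state.
            return index
    return None

# Hypothetical snapshots: number of active processes in each recorded state.
history = [{"active": 3}, {"active": 1}, {"active": 0}]
print(detect_stable(history, lambda s: s["active"] == 0))
```

The loop is only sound for stable predicates such as termination or deadlock; for other predicates a true result in one snapshot says nothing about later states.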

49
Discussion
  • Can we infer anything about Dynamic properties of
    the system from Ssnap?
  • Can we extend this algorithm to networks that
    are not fully connected (e.g., P2P systems)?