Distributed Synchronization

Provided by: csBg

1
Distributed Synchronization
  • No shared memory
  • processes reside on different machines ⇒
    semaphores are ruled out
  • Processes (processors a.k.a. Agents)
    communicate by message passing
  • Models of distributed computation
  • Distributed mutual exclusion
  • Leader election

2
Model of distributed computation
  • Events
  • Sending of messages
  • Receiving of messages
  • internal interrupt or time-out
  • Processes (processors) can wait for events
  • process p waits for events by executing
    • wait for A1, A2, ...
    • A1 (source; parameters)
      • code to handle A1
  • process q executes send(p, A1; parameters)
  • p will eventually perform the code for A1, with
    the unpacked parameters

3
Causality
  • No global system state
  • cannot be determined by a single observer
  • Communication delays
  • impossible to synchronize two observers
    (machines) exactly
  • Distributed systems are causal (no traveling
    back in time)
  • for each processor separately, events are
    totally ordered

4
Simplest causality: happens_before
  • send always happens_before receive
  • two events of the same agent are ordered
  • e1 < e3,  e4 < e7,  e7 ?? e5

5
Ordering events
  • define the happens_before relation as the
    transitive closure of the two relations
    • e1 <_m e2: send → receive
    • e3 <_p e4: e3 precedes e4 on processor p
  • require
    • e1 <_h e2 and e2 <_h e3 ⇒ e1 <_h e3
  • in the former example
    • e1 <_h e8 (msg., processor, msg.)
    • e2 <_h e7 (msg., processor)

6
Ordering by time stamps
  • a global partial order can be achieved by a
    topological order of the <_h relation
  • for ordering events during execution, one needs
    to compute the order on the fly
  • This can be done by assigning time-stamps to
    events
  • (Lamport, 1978)

7
Lamport's time-stamp algorithm
  • Initially,
    • my_TS = 0
  • On event e,
    • if e is the receipt of a message m,
      • my_TS = max(m.TS, my_TS)
    • my_TS++
    • e.TS = my_TS
    • if e is the sending of message m
      • m.TS = my_TS
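
The time-stamp rules above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the class name LamportProcess and its method names are assumptions made for the example.

```python
class LamportProcess:
    """One process with a Lamport logical clock (illustrative names)."""

    def __init__(self, name):
        self.name = name
        self.my_ts = 0  # "Initially, my_TS = 0"

    def _tick(self, received_ts=None):
        # On receipt, first catch up with the time-stamp carried by the message.
        if received_ts is not None:
            self.my_ts = max(received_ts, self.my_ts)
        self.my_ts += 1          # every event advances the clock
        return self.my_ts        # this value is e.TS

    def internal(self):
        return self._tick()

    def send(self):
        # The message carries the time-stamp of its send event (m.TS).
        return self._tick()

    def receive(self, msg_ts):
        return self._tick(msg_ts)


p, q = LamportProcess("p"), LamportProcess("q")
e1 = p.internal()       # local event on p
m_ts = p.send()         # p sends a message stamped m_ts
e3 = q.receive(m_ts)    # q receives it
e4 = q.internal()       # later local event on q
```

The send event always gets a smaller time-stamp than the matching receive, and events on one process get increasing time-stamps, which is the causality property stated on the next slide.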

8
Lamport's time-stamps
  • time-stamps assigned by Lamport's algorithm are
    causal
  • e1 <_m e3 ⇒ e1.TS < e3.TS
  • e1 <_p e4 ⇒ e1.TS < e4.TS

9
Causality violations
  • the causality relation between two events with
    equal time-stamps is not clear
  • Lamport suggests breaking such ties by processor
    ID
  • but what about the meaning?...

10
Vector time-stamps
  • try to use a multiple time-stamp
  • record time-stamps of all processors
  • Vector time-stamps, containing information (TS)
    on all processors
  • e1.VT ≤_VT e2.VT ⇔ e1.VT[i] ≤ e2.VT[i] for
    all i
  • e1.VT <_VT e2.VT ⇔ e1.VT ≤_VT e2.VT and
    e1.VT ≠ e2.VT

11
Vector time-stamps
12
A simple algorithm for VT
  • Initially,
    • my_VT = [0, ..., 0]
  • On event e,
    • if e is the receipt of message m,
      • for i = 1 to M
        • my_VT[i] = max(m.VT[i], my_VT[i])
    • my_VT[self]++
    • e.VT = my_VT
    • if e is the sending of message m,
      • m.VT = my_VT
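
A minimal Python sketch of the vector time-stamp update, together with the component-wise comparison from the earlier "Vector time-stamps" slide. The names VectorProcess, vt_leq, vt_less and concurrent are assumptions for this example, and processors are indexed from 0 rather than 1.

```python
def vt_leq(a, b):
    """a <= b iff every component of a is <= the same component of b."""
    return all(x <= y for x, y in zip(a, b))


def vt_less(a, b):
    """Strict order: a <= b component-wise and a != b."""
    return vt_leq(a, b) and a != b


def concurrent(a, b):
    """Neither time-stamp precedes the other, so causality is undetermined."""
    return not vt_leq(a, b) and not vt_leq(b, a)


class VectorProcess:
    def __init__(self, index, m):
        self.index = index          # this processor's own slot ("self")
        self.my_vt = [0] * m        # one counter per processor

    def event(self, msg_vt=None):
        # On receipt, merge the sender's knowledge component by component.
        if msg_vt is not None:
            self.my_vt = [max(a, b) for a, b in zip(msg_vt, self.my_vt)]
        self.my_vt[self.index] += 1
        return tuple(self.my_vt)    # e.VT (and m.VT if e is a send)


p0, p1, p2 = (VectorProcess(i, 3) for i in range(3))
a = p0.event()      # p0 sends a message stamped a
b = p1.event(a)     # p1 receives it
c = p2.event()      # unrelated event on p2
```

Here a precedes b, while b and c are concurrent: unlike scalar Lamport time-stamps, the vectors make the absence of a causal relation visible.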

13
Comparing vector time-stamps
14
Detecting causality violation
15
In simpler words
  • sending and receiving messages are events
  • if the sending events of two different messages
    are ordered
    • e1 = send(m1) <_h e2 = send(m2)
  • then the violation in the example is
    • e4 = rec(m2) <_p e3 = rec(m1)

16
Causal Communication
  • One could enforce our form of causality
  • Block incoming messages and deliver them when
    they fit causally
  • Each source of messages enumerates them
    sequentially
  • The receiver only delivers a message that fits
    the sequence of messages received from the same
    source
  • Assuming no messages are lost
  • One method of numbering can be Lamport's
    time-stamps
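
The per-source numbering idea can be sketched as a hold-back buffer. This is a simplified illustration under stated assumptions: plain sequence numbers starting at 1 are used instead of Lamport time-stamps, no messages are lost, and the class name FifoReceiver is hypothetical. It only enforces the order of messages from the same source, which is the part described in the bullets above.

```python
class FifoReceiver:
    """Blocks out-of-sequence messages and delivers them once they fit."""

    def __init__(self):
        self.expected = {}    # source -> next sequence number to deliver
        self.held = {}        # (source, seq) -> payload, blocked messages
        self.delivered = []   # payloads in delivery order

    def receive(self, source, seq, payload):
        nxt = self.expected.get(source, 1)
        if seq < nxt:
            return            # duplicate of an already delivered message
        self.held[(source, seq)] = payload
        # Deliver as long as the next message in this source's sequence is here.
        while (source, nxt) in self.held:
            self.delivered.append(self.held.pop((source, nxt)))
            nxt += 1
        self.expected[source] = nxt


r = FifoReceiver()
r.receive("p", 2, "p-second")   # arrives early, stays blocked
r.receive("q", 1, "q-first")    # fits q's sequence, delivered at once
r.receive("p", 1, "p-first")    # unblocks p's second message as well
```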

17
Causal communication - algorithm
  • not_earlier(proc_i_vts, msg_vts, i)
    • if msg_vts[i] < proc_i_vts[i]
      • return TRUE
    • else
      • return FALSE
  • Initially
    • each earliest[k] is set to the 1_k time-stamp
      (1 in component k, 0 elsewhere)
    • each blocked[k] is set to { } (empty)
  • On the receipt of message m from processor p
    • delivery_list = { }
    • if (blocked[p] is empty)
      • earliest[p] = m.timestamp
    • add m to the tail of blocked[p]
    • while (there is a k s.t. blocked[k] is not empty
      and, for every i = 1..M except for k and Self,
      not_earlier(earliest[i], earliest[k], i))
      • remove the message at the head of blocked[k]
        and append it to delivery_list
      • if (blocked[k] is not empty)
        • set earliest[k] to m'.timestamp, where m' is
          at the head of blocked[k]
      • else
        • increment earliest[k] by 1_k
    • Deliver the messages in delivery_list, in causal
      order

18
Delivering messages causally
19
Consistent States
  • In order to detect certain failures, the
    system's state has to be examined
  • Examining data of different processors at
    different times can generate artifacts
  • It is difficult to define simultaneity in a
    distributed system
  • The state of the processors is not enough, since
    there might be undelivered messages

20
Consistent States (II)
  • A global state can be meaningful (i.e.
    consistent) only if it is reachable
  • A consistent state is one that can actually
    happen through a series of legal operations of
    the distributed system
  • In other words, if in the state the processor pk
    has received a message from processor pm, then
    the state of processor pm must be such that the
    message has already been sent

21
Consistent States (III)
  • Surprisingly, a simple condition can guarantee a
    consistent global state
  • For every pair of observations o_i, o_j that
    are part of the state, it is not the case that
    • o_i <_h o_j
  • An immediate implementation of the <_h relation
    is vector time-stamps
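
The pairwise condition can be checked mechanically once each observation carries a vector time-stamp. A minimal sketch, assuming one vector time-stamp (a tuple) per observation; the function names are illustrative, not from the slides.

```python
from itertools import permutations


def happened_before(a, b):
    """a <_h b for vector time-stamps: a <= b component-wise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def is_consistent(observations):
    """True iff no observation happened before another one in the set."""
    return not any(
        happened_before(a, b) for a, b in permutations(observations, 2)
    )


# Three mutually concurrent observations: accepted.
ok = is_consistent([(2, 0, 0), (0, 1, 0), (0, 0, 1)])

# The second observation already reflects the first one: rejected.
bad = is_consistent([(1, 0, 0), (1, 1, 0), (0, 0, 1)])
```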

22
Phantom deadlock
23
Distributed Mutual Exclusion
  • The simplest way to ensure mutual exclusion is
    to use a global clock
  • Allow the processor that sent the
    earliest-request to enter the critical section
  • Processors can use Lamport's method to share a
    global clock
  • Exactly one of n requests is deterministically
    the earliest
  • Requests' time-stamps are compared to local
    time-stamps

24
Global clock based DME (Ricart & Agrawala, 1981)
  • Request_CS
    • my_timestamp = current_TS
    • requesting = TRUE
    • pending_reply = M-1
    • for every other processor j,
      • send(j, Remote_Request; my_timestamp)
    • wait until pending_reply is 0
  • Release_CS
    • requesting = FALSE
    • for j = 1 through M
      • if deferred_reply[j] = TRUE
        • send(j, Reply)
        • deferred_reply[j] = FALSE

25
DME (Ricart & Agrawala, 1981)
  • Main
    • wait until a message is received
    • Remote_Request(sender; request_time)
      • if (not requesting or my_timestamp >
        request_time)
        • send(sender, Reply)
      • else
        • deferred_reply[sender] = TRUE
    • Reply(sender)
      • pending_reply--
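
The request, reply and release steps can be exercised in a small single-threaded simulation. This is a sketch under assumptions that are not in the slides: messages travel through one in-memory FIFO queue (net), time-stamps are (clock, processor ID) pairs so that ties break by ID, and a node is considered to enter the CS at the moment its last reply arrives (recorded in log).

```python
from collections import deque


class Node:
    def __init__(self, pid, n, net, log):
        self.pid, self.n, self.net, self.log = pid, n, net, log
        self.clock = 0
        self.requesting = False
        self.my_timestamp = None
        self.pending_reply = 0
        self.deferred_reply = set()

    def request_cs(self):
        self.clock += 1
        self.my_timestamp = (self.clock, self.pid)   # ID breaks ties
        self.requesting = True
        self.pending_reply = self.n - 1
        for j in range(self.n):
            if j != self.pid:
                self.net.append((j, "REQUEST", self.pid, self.my_timestamp))

    def release_cs(self):
        self.requesting = False
        for j in sorted(self.deferred_reply):        # answer deferred requests
            self.net.append((j, "REPLY", self.pid, None))
        self.deferred_reply.clear()

    def on_message(self, kind, sender, ts):
        if kind == "REQUEST":
            self.clock = max(self.clock, ts[0]) + 1  # Lamport clock update
            if not self.requesting or self.my_timestamp > ts:
                self.net.append((sender, "REPLY", self.pid, None))
            else:
                self.deferred_reply.add(sender)      # our request is earlier
        else:  # REPLY
            self.pending_reply -= 1
            if self.pending_reply == 0:
                self.log.append(self.pid)            # enters the CS here


def run(net, nodes):
    while net:
        dest, kind, sender, ts = net.popleft()
        nodes[dest].on_message(kind, sender, ts)


net, log = deque(), []
nodes = [Node(i, 3, net, log) for i in range(3)]
nodes[2].request_cs()          # two concurrent requests with equal clocks
nodes[0].request_cs()
run(net, nodes)
first = list(log)              # who got in before any release
nodes[0].release_cs()
run(net, nodes)
```

Both requests carry clock value 1, so the lower processor ID wins; the other node enters only after the release sends its deferred reply.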

26
Global clock based DME
  • Deterministic decision on who is later in the
    queue for the CS
  • For M processors, only 2(M-1) messages are
    needed to enter the CS
  • Protocol uses symmetric information: every
    processor receives the same information and can
    compute the decision of all other processors

27
Token based ME
  • Unique token circulating among all processors
  • A processor possessing the token can enter the
    CS
  • Fixed logical structure among processors; the
    token travels along this structure
  • Processors pass the token along a ring: good
    performance if processors have frequent requests
  • The order is fixed and is not related to the
    order of requests or to the number of requesting
    processors

28
Token based ME on a Tree
  • Hierarchical logical structure needs fewer
    messages delivered during a request for the token
  • a unique path from any (requesting) processor to
    any other (holding token)
  • only one neighbor lies on the path to the token
    holder
  • Each processor stores a pointer to its neighbor
    on the path to the token - current_dir
  • requests are moved to next neighbors on the path
    and the requests (i.e. return path) are stored in
    a FIFO queue
  • released tokens are sent to top of queue
  • each processor on the path can deliver token to
    its top of queue

29
Tree-based token ME (Raymond, 1989)
  • Nq(neighbor): Add neighbor to requestQ
  • Dq(): Return the name at head of requestQ
  • ismt(): True iff requestQ is empty
  • Request_CS()
    • if not Token_hldr
      • if ismt()
        • send(current_dir, REQUEST)
      • Nq(self)
      • wait until Token_hldr is true
    • Incs = true
  • Release_CS()
    • Incs = false
    • if not ismt()
      • current_dir = Dq()
      • send(current_dir, TOKEN)
      • Token_hldr = false
      • if not ismt()
        • send(current_dir, REQUEST)

30
Raymond's algorithm - Main
  • while(true)
    • REQUEST
      • if Token_hldr
        • if Incs
          • Nq(sender)
        • else
          • current_dir = sender
          • send(current_dir, TOKEN)
          • Token_hldr = false
      • else
        • if ismt()
          • send(current_dir, REQUEST)
        • Nq(sender)
    • TOKEN
      • current_dir = Dq()
      • if current_dir = self
        • Token_hldr = true
      • else
        • send(current_dir, TOKEN)
        • if not ismt()
          • send(current_dir, REQUEST)
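
The REQUEST and TOKEN handlers can be tried out on a three-node chain A - B - C. This is an event-driven sketch with assumptions of its own: "wait until Token_hldr is true" is replaced by entering the CS inside the TOKEN handler, messages go through one in-memory FIFO queue, and the holder's current_dir points at itself.

```python
from collections import deque


class Node:
    def __init__(self, name, current_dir, net, log):
        self.name, self.net, self.log = name, net, log
        self.current_dir = current_dir           # neighbor toward the token
        self.token_hldr = current_dir == name    # holder points at itself
        self.incs = False
        self.request_q = deque()

    def send(self, dest, kind):
        self.net.append((dest, kind, self.name))

    def enter(self):
        self.incs = True
        self.log.append(self.name)

    def request_cs(self):
        if self.token_hldr:
            self.enter()
            return
        if not self.request_q:                   # ismt(): ask only once
            self.send(self.current_dir, "REQUEST")
        self.request_q.append(self.name)         # Nq(self)

    def release_cs(self):
        self.incs = False
        if self.request_q:
            self.current_dir = self.request_q.popleft()
            self.send(self.current_dir, "TOKEN")
            self.token_hldr = False
            if self.request_q:                   # more waiters: ask it back
                self.send(self.current_dir, "REQUEST")

    def on_message(self, kind, sender):
        if kind == "REQUEST":
            if self.token_hldr and not self.incs:
                self.current_dir = sender
                self.send(sender, "TOKEN")
                self.token_hldr = False
            elif self.token_hldr:
                self.request_q.append(sender)
            else:
                if not self.request_q:
                    self.send(self.current_dir, "REQUEST")
                self.request_q.append(sender)
        else:  # TOKEN
            self.current_dir = self.request_q.popleft()
            if self.current_dir == self.name:
                self.token_hldr = True
                self.enter()
            else:
                self.send(self.current_dir, "TOKEN")
                if self.request_q:
                    self.send(self.current_dir, "REQUEST")


def run(net, nodes):
    while net:
        dest, kind, sender = net.popleft()
        nodes[dest].on_message(kind, sender)


net, log = deque(), []
nodes = {
    "A": Node("A", "A", net, log),   # A starts with the token
    "B": Node("B", "A", net, log),
    "C": Node("C", "B", net, log),
}
nodes["A"].request_cs()
nodes["C"].request_cs()
nodes["B"].request_cs()
run(net, nodes)
nodes["A"].release_cs()
run(net, nodes)
nodes["B"].release_cs()
run(net, nodes)
```

B's own request is queued before C's request reaches it, so the token visits A, then B, then C.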

31
Raymond's algorithm example
32
Raymond's algorithm - features
  • low storage required: O(d)
  • low message passing overhead: O(log n) per
    request
  • when demand increases, work to pass token
    decreases
  • But the token may travel a long distance before
    reaching its destination
  • Therefore good only for a CS which is seldom
    entered

33
Path compression token-based ME
  • IsRequesting: True iff processor is requesting
    token
  • Current_dir: The current guess of end of waiting
    line
  • Next: next processor for token (NIL - end of
    waiting line)
  • Request_CS()
    • IsRequesting = true
    • if not Token_hldr
      • send(current_dir, REQUEST; self)
      • current_dir = self
      • next = NIL
      • wait until Token_hldr is true
    • Incs = true
  • Release_CS()
    • Incs = false
    • IsRequesting = false
    • if next ≠ NIL
      • Token_hldr = false
      • send(next, TOKEN)
      • next = NIL

34
Path compression (II)
  • while(true)
    • REQUEST(requester)
      • if IsRequesting = true
        • if next = NIL
          • next = requester
        • else
          • send(current_dir, REQUEST; requester)
      • elseif Token_hldr = true
        • Token_hldr = false
        • send(requester, TOKEN)
      • else
        • send(current_dir, REQUEST; requester)
      • current_dir = requester
    • TOKEN()
      • Token_hldr = true
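
A small event-driven sketch of the path-compression handlers. The assumptions are the same kind as before and are not part of the slides: messages pass through one in-memory FIFO queue, the CS is entered inside the TOKEN handler instead of a blocking wait, and the initial holder's current_dir points at itself.

```python
from collections import deque


class Node:
    def __init__(self, name, current_dir, net, log):
        self.name, self.net, self.log = name, net, log
        self.current_dir = current_dir    # guess of the end of the waiting line
        self.token_hldr = current_dir == name
        self.is_requesting = False
        self.incs = False
        self.next = None                  # who gets the token after us

    def send(self, dest, kind, requester=None):
        self.net.append((dest, kind, requester))

    def enter(self):
        self.incs = True
        self.log.append(self.name)

    def request_cs(self):
        self.is_requesting = True
        if self.token_hldr:
            self.enter()
            return
        self.send(self.current_dir, "REQUEST", self.name)
        self.current_dir = self.name      # we are now the end of the line
        self.next = None

    def release_cs(self):
        self.incs = False
        self.is_requesting = False
        if self.next is not None:
            self.token_hldr = False
            self.send(self.next, "TOKEN")
            self.next = None

    def on_message(self, kind, requester):
        if kind == "REQUEST":
            if self.is_requesting:
                if self.next is None:
                    self.next = requester             # queue behind us
                else:
                    self.send(self.current_dir, "REQUEST", requester)
            elif self.token_hldr:
                self.token_hldr = False
                self.send(requester, "TOKEN")         # idle holder hands over
            else:
                self.send(self.current_dir, "REQUEST", requester)
            self.current_dir = requester              # path compression
        else:  # TOKEN arrives only in answer to our own request
            self.token_hldr = True
            self.enter()


def run(net, nodes):
    while net:
        dest, kind, requester = net.popleft()
        nodes[dest].on_message(kind, requester)


net, log = deque(), []
nodes = {
    "A": Node("A", "A", net, log),   # A starts with the token
    "B": Node("B", "A", net, log),
    "C": Node("C", "B", net, log),
}
nodes["A"].request_cs()
nodes["C"].request_cs()
nodes["B"].request_cs()
run(net, nodes)
nodes["A"].release_cs()
run(net, nodes)
nodes["B"].release_cs()
run(net, nodes)
```

After the run every current_dir points toward C, the last processor in line, even though A never exchanged a message with C.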

35
Path compression (example)
36
Path compression
  • Processors forward information about the
    requester
  • Processors receiving requests point back to the
    last in line
  • which is potentially the holder of the token
  • The last in line for the token adds the new
    request to the tail
  • The last in line itself would add new requesters
    to its tail
  • The state of all processors is not necessarily
    consistent (improved efficiency)
  • requests are sent to processors with better
    knowledge of the end of the line

37
Electing a Leader
  • Replicated data schemes use a primary copy (the
    up-to-date by definition)
  • Distributed computation might need a
    coordinator, to assign tasks to participating
    processors
  • If a leader fails, a new leader has to be
    elected in order to determine the system's state
    and restart computation
  • All participants must know who the leader is. In
    synchronization, a non-token-holder only needs to
    know that it does not hold the token
  • In synchronization, failure treatment
    (stability) can be an additional requirement
  • In leader election, it is an integral part of
    the algorithm's behavior: if the coordinator
    fails, a new coordinator has to be elected

38
The Bully algorithm
  • General assumptions
  • Processors can store their state during failure
    and increase version numbers upon recovery
  • Failures halt all processing (no erratic
    behavior)
  • Additional assumption
  • Messages are delivered within Tm seconds
  • Nodes respond to messages within Tp seconds
  • This allows a reliable failure detector
  • If a processor does not respond to a message
    within T = 2Tm + Tp, it must have failed
  • Systems with such bounds are called synchronous
    systems
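
The bound T = 2Tm + Tp covers the request's travel time, the processing time and the reply's travel time. A minimal sketch of the resulting failure detector; the function names and the example numbers are assumptions for illustration.

```python
def failure_timeout(tm, tp):
    """T = 2*Tm + Tp: message out (Tm), processing (Tp), reply back (Tm)."""
    return 2 * tm + tp


def has_failed(sent_at, now, replied, tm, tp):
    """In a synchronous system, silence for longer than T means failure."""
    return (not replied) and (now - sent_at) > failure_timeout(tm, tp)


t = failure_timeout(tm=0.5, tp=0.25)
still_waiting = has_failed(sent_at=0.0, now=1.0, replied=False, tm=0.5, tp=0.25)
declared_dead = has_failed(sent_at=0.0, now=2.0, replied=False, tm=0.5, tp=0.25)
```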

39
Algorithm requirements
  • Nodes have one of four status values
  • Down, Election, Reorganization, Normal
  • Correctness assertion 1
  • For G, a consistent state, and for any pair of
    nodes p_i, p_j
    • 1. If Status_i ∈ {Normal, Reorganization} and
      Status_j ∈ {Normal, Reorganization} then
      Coordinator_i = Coordinator_j
    • 2. If Status_i = Status_j = Normal then
      Definition_i = Definition_j
  • Recovering from failure, a node sets its status
    to Down. Starting an election process, it changes
    to Election. Finishing an election, nodes go into
    Reorganization. Receiving the new common state →
    Normal
  • If two nodes think that they are in working
    order, they agree on who is the coordinator and
    on the state of the system

40
Algorithm requirements
  • Guarantee that the election algorithm makes
    progress, i.e. does not stay in the Election
    status (for example)
  • Correctness assertion 2
  • For a consistent state G, eventually (with no
    failures)
    • 1. There is a node i s.t. State_i = Normal and
      Coordinator_i = i
    • 2. For any other non-failed node j, State_j =
      Normal and Coordinator_j = i
  • The simplest strategy is to assign priorities to
    all processors, so that they know the priorities
    of all others
  • Each one just finds out whether higher priority
    nodes are not failed

41
The Bully algorithm - initialization
  • Up: set of processors known to be in the group
  • halted: identity of processor that notified of
    the current election
  • Coordinator_Timeout() /* check if coordinator is
    alive */
    • if State = Normal or State = Reorganization
      • send(Coordinator, AreYouUp), timeout = T
      • wait until Coordinator sends AYU_answer,
        timeout = T
      • on timeout
        • Election()
  • Recovery()
    • State = Down
    • Election()
  • Check() /* Coordinator checks all others */
    • if State = Normal and Coordinator = Self
      • for every other node j
        • send(j, AreYouNormal)
        • wait until j sends (AYN_answer; status),
          timeout = T
        • if (j ∈ Up and status = False) or j ∉ Up
          • Election()
          • return()

42
The Bully algorithm - election()
  • Election()
    • highest = True
    • For every higher-priority processor p
      • send(p, AreYouUp)
    • wait up to T seconds for (AYU_answer) messages
      • AYU_answer(sender)
        • highest = False
    • if highest = False
      • return()
    • State = Election
    • halted = Self
    • Up = ∅
    • For every lower-priority processor p
      • send(p, Enter_Election)
    • wait up to T seconds for (EE_answer) messages
      • EE_answer(sender)
        • Up = Up ∪ {sender}

43
The Bully algorithm - election() II
    • num_answers = 0
    • Coordinator = Self
    • State = Reorganization
    • for each p in Up
      • send(p, Set_Coordinator; Self)
    • wait up to T seconds for (SC_answer) messages
      • SC_answer(sender)
        • num_answers++
    • if num_answers < |Up|
      • Election()
      • return()
    • num_answers = 0
    • for each p in Up
      • send(p, New_State; Definition)
    • wait up to T seconds for (NS_answer) messages
      • NS_answer(sender)
        • num_answers++
    • if num_answers < |Up|
      • Election()
      • return()

44
The Bully algorithm
  • The election procedure run by each agent first
    determines whether a better leader exists
  • If so, wait for the leader to initiate election
  • Otherwise, attempt to establish itself as leader
  • Whenever an Enter_Election message is received,
    an immediate response is needed, even if all
    higher-priority nodes were checked, because a
    leader may have recovered
  • The same is true for receiving a Set_Coordinator
    message. Update coordinator and move to state of
    Reorganization
  • Similarly, update the Definition (state of
    system)
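
The first decision of the election procedure can be sketched without any networking. This is a heavily simplified illustration, not the full protocol: node IDs double as priorities, and membership in the set alive stands in for "answered AreYouUp within T seconds", which is exactly what the reliable failure detector of a synchronous system provides. All names are assumptions for the example.

```python
def start_election(initiator, alive):
    """One node's attempt; returns None if a better leader exists."""
    higher = [p for p in alive if p > initiator]
    if higher:
        return None          # a higher-priority node answered: wait for it
    # No higher-priority node is up: collect the lower-priority ones (Up).
    up = {p for p in alive if p < initiator}
    return {"coordinator": initiator, "up": up}


def elect(alive):
    """Let every live node try in turn; only the highest one succeeds."""
    for node in sorted(alive):
        outcome = start_election(node, alive)
        if outcome is not None:
            return outcome
    return None


deferred = start_election(2, {1, 2, 4})   # node 4 is up, so node 2 backs off
result = elect({1, 2, 4})
after_crash = elect({1, 2})               # node 4 has failed
```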

45
The Bully algorithm - Control
  • Main()
    • while(True)
      • wait for a message
      • case AreYouUp(sender)
        • send(sender, AYU_answer)
      • case AreYouNormal(sender)
        • if State = Normal, send(sender, AYN_answer;
          True)
        • else send(sender, AYN_answer; False)
      • case Enter_Election(sender)
        • State = Election
        • stop_processing()
        • stop the election procedure, if it is
          running
        • halted = sender
        • send(sender, EE_answer)

46
The Bully algorithm - Control (II)
  • case Set_Coordinator(sender; newleader)
    • if State = Election and halted = newleader
      • Coordinator = newleader
      • State = Reorganization
      • send(sender, SC_answer)
  • case New_State(sender; newdef)
    • if Coordinator = sender and State =
      Reorganization
      • Definition = newdef
      • State = Normal

47
The Bully algorithm Example
48
The Bully algorithm
  • simple algorithm that makes a strong assumption
  • timeouts can accurately detect failed processors
  • lost messages or overfull buffers can make the
    bully algorithm elect two leaders
  • very long timeouts can make failure detection
    almost certain
  • but they make the algorithm run too long
  • maybe the leader should be tied to the group
    that it coordinates, in case timeouts get so long
    that failure is no longer certain

49
Electing a group leader
  • Electing a global leader is too difficult, when
    timeouts are not clear or too long
  • Tie the coordinator to the group it leads
  • All members of a group agree on the group
    number
  • Each group number is unique
  • The group number is part of the state definition
    of each node
  • Only members of the same group agree on the
    identity of the coordinator

50
Electing a group leader - correctness
  • Correctness assertion 3
  • For G, a consistent state and any pair of nodes
    pi and pj the following two conditions hold
    • 1. If Status_i ∈ {Normal, Reorganization},
      Status_j ∈ {Normal, Reorganization}, and
      Group_i = Group_j, then Coordinator_i =
      Coordinator_j
    • 2. If Status_i = Normal, Status_j = Normal, and
      Group_i = Group_j, then Definition_i =
      Definition_j

51
Electing a group leader - liveness
  • Let R be a maximal set of nodes that can
    communicate in a consistent state G0. The
    following conditions are eventually true in any
    run starting at G0 s.t. R remains the maximal set
    of communicating nodes
  • Correctness assertion 4
    • 1. There is a node p_i ∈ R, such that State_i =
      Normal and Coordinator_i = p_i
    • 2. For any other non-failed node p_j ∈ R,
      State_j = Normal and Coordinator_j = p_i

52
Electing a group leader - observations
  • Correctness assertion 3 is easy to satisfy. Any
    processor p that wishes to establish itself as a
    leader forms a unique group number and suggests
    to some group of processors to join the group
    with itself as Coordinator
  • Group identifier is unique ⇒ assertion 3.1 is
    fulfilled
  • If participants accept the Definition that p
    circulates together with the new group
    identifier, assertion 3.2 is satisfied
  • The hard assertion to satisfy is 4
  • Run an election algorithm that ensures that each
    node in the group has the same Coordinator in
    finite time
  • For more than one Coordinator, the Bully
    algorithm may make Coordinators compete for
    participants and not enable progress

53
The Invitation Algorithm
  • Groups need to coalesce into larger groups
  • Coordinators search periodically for other
    groups
  • The coordinators found are kept in a set
    Others
  • Each Coordinator detecting another group, tries
    to merge it with its own
  • To avoid deadlock, it delays for a time between
    detecting and acting
  • unlike the Bully algorithm, timeout does not
    mean much
  • Not hearing from your coordinator ⇒ form your
    own group and proceed
  • Group IDs are unique: each is composed of a
    node's ID and a running number
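
Group IDs built from a node ID and a running number are unique without any coordination, because no two nodes share an ID and no node reuses a counter value. A minimal sketch; the class name GroupIdSource is an assumption, and the sketch ignores how the counter would survive a crash.

```python
class GroupIdSource:
    """Per-node generator of group IDs of the form (node ID, counter)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def new_group(self):
        self.counter += 1                 # running number, never reused
        return (self.node_id, self.counter)


a, b = GroupIdSource("a"), GroupIdSource("b")
ids = [a.new_group(), b.new_group(), a.new_group(), b.new_group()]
```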

54
The Invitation Algorithm - Check for groups
  • Check()
    • if State = Normal and Coordinator = Self
      • Others = ∅
      • for every other node p,
        • send(p, AreYouCoordinator)
      • wait up to T seconds for (AYC_answer)
        messages
        • AYC_answer(sender; is_coordinator)
          • if is_coordinator = True
            • Others = Others ∪ {sender}
      • if Others = ∅
        • return()
      • wait for a time inversely proportional to your
        priority
      • Merge(Others)

55
The Invitation Algorithm - suspected failure
  • Timeout()
    • if Coordinator = Self
      • return()
    • send(Coordinator, AreYouThere; Group)
    • wait for AYT_answer, timeout is T
      • on timeout,
        • is_coordinator = False
      • AYT_answer(sender; is_coordinator)
    • if is_coordinator = False
      • Recovery()
  • Recovery()
    • State = Election
    • stop_processing()
    • Counter++; Group = (Self | Counter)
    • Coordinator = Self
    • Up = ∅
    • State = Reorganization

56
The Invitation Algorithm - Merge groups
  • Merge(Coordinator_set)
    • if Coordinator = Self and State = Normal
      • State = Election
      • stop_processing()
      • Counter++; Group = (Self | Counter)
      • Coordinator = Self
      • UpSet = Up
      • Up = ∅
      • for each p in Coordinator_set,
        • send(p, Invitation; Self, Group)
      • for each p in UpSet,
        • send(p, Invitation; Self, Group)
      • wait for T seconds
      • State = Reorganization
      • num_answer = 0
      • for each p in Up
        • send(p, Ready; Group, Definition)
      • wait up to T seconds for Ready_answer messages
        • Ready_answer(sender; ingroup, new_group)

57
The Invitation Algorithm - Invite Merging
  • Invitation()
    • while True
      • wait for Invitation(new_coordinator;
        new_group)
      • if State = Normal
        • stop_processing()
        • old_coordinator = Coordinator
        • UpSet = Up
        • State = Election
        • Coordinator = new_coordinator
        • Group = new_group
        • if old_coordinator = Self
          • for each p in UpSet
            • send(p, Invitation; Coordinator, Group)
        • send(Coordinator, Accept; Group)
        • wait up to T seconds for an
          Accept_answer(sender; accepted) message
          • on Timeout,
            • accepted = False
        • if accepted is False invoke Recovery()
        • State = Reorganization

58
The Invitation Algorithm - Main
  • Main()
    • while True
      • wait for a message
      • Ready(sender; new_group, new_description)
        • if Group = new_group and State =
          Reorganization
          • Description = new_description
          • State = Normal
          • send(Coordinator, Ready_answer; True, Group)
        • else
          • send(sender, Ready_answer; False)
      • AreYouCoordinator(sender)
        • if State = Normal and Coordinator = Self
          • send(sender, AYC_answer; True)
        • else
          • send(sender, AYC_answer; False)

59
The Invitation Algorithm - Main (II)
  • Main()
    • while True
      • wait for a message
      • ..
      • AreYouThere(sender; old_group)
        • if Group = old_group and Coordinator = Self
          and sender in Up
          • send(sender, AYT_answer; True)
        • else
          • send(sender, AYT_answer; False)
      • Accept(sender; new_group)
        • if State = Election and Coordinator = Self
          and Group = new_group
          • Up = Up ∪ {sender}
          • send(sender, Accept_answer; True)
        • else
          • send(sender, Accept_answer; False)

60
The Invitation Algorithm Example
61
Electing a leader
  • Imposing a strong logical structure on the
    system (synchronous) enables an efficient
    algorithm: the Bully algorithm
  • Lacking such a structure (asynchronous) needs a
    slower algorithm (merging groups): the
    Invitation algorithm
  • Bounded response time is at the basis of the
    (synchronous) Bully algorithm
  • The Invitation algorithm works correctly in the
    presence of timing failures (i.e. practical)
  • Stating the correctness of asynchronous
    algorithms is much more complex
  • Every processor agrees on a value in a
    synchronous system
  • Only a group of processors needs to agree on
    group membership and on a value, in an
    asynchronous system
  • Consistency is relative to the group,
    uncommunicating processors can be ignored
  • The Invitation algorithm uses sequence numbers,
    instead of global knowledge