Title: Distributed Systems
1 Distributed Systems
Distributed Systems: Clocks, Concurrency, Deadlocks
2 Distributed Systems
Intended Schedule for this Lecture
- Time in DS
  - Physical Time
  - Synchronization of Clocks
  - Logical Time
- Concurrency
  - Centralized Algorithm
  - Token-based Algorithm
  - Voting Algorithm (Ricart-Agrawala)
  - Election Algorithms (Bully, Ring)
- Deadlock
  - Centralized Detection
  - Path Pushing
  - Distributed Detection of Cycles (Chandy-Misra-Haas)
3 Motivation
Problems with Time
1. Nobody really has "the" time
2. Every ordinary clock is imprecise
3. Due to Albert (Einstein), time is really relative
4. Nevertheless, we need timestamps in a DS (to order events etc.)
4 Motivation
Why Timestamps in Systems?
In order to
1. do precise performance measurements
2. guarantee up-to-date data or judge the actuality of data
3. establish a total ordering of objects or operations (like transactions)
4. what else?
5 Motivation
Lack of a Uniform Global Time in DS
Due to the nature of imprecise clocks there is no global, unique time in a DS. Suppose there is a central time server able to deliver exact times via time-messages to all nodes of a widely spread DS (e.g. a MAN)
=> due to the non-deterministic transfer times of these time-messages there is no uniform time on all nodes of the DS.
The transfer time of time-messages from the central time server to a specific node may vary over time.
6Distributed Systems
Side Trip into Philosophy and Special Relativity
The Myth of Simultaneity: event 1 and event 2 happen "at the same time".
(Figure: event 1 and event 2, observed from different positions.)
7Distributed Systems
Event Timelines (Example of the previous Slide)
(Figure: timelines of nodes 3, 4, and 5.)
Note: The arrows start at an event and end at an observation. The slope of an arrow depends on the relative speed of propagation.
8Distributed Systems
Causality
Event 1 causes event 2.
Requirement: We have to establish causality, i.e. each observer must see event 1 before event 2.
9Distributed Systems
Event Timelines (Example of the previous Slide)
(Figure: timelines of nodes 3, 4, and 5.)
Note: In the timeline view, if event 2 is caused by event 1, it must be caused by some passage of information from event 1.
10Distributed Systems
Example (distributed Unix make)
(Figure: editor on computer 1 and compiler on computer 2, each with its own local time, compared against absolute time, i.e. "God's clock".)
11Distributed Systems
Physical Time
Some systems really need quite accurate absolute times.
How to achieve high accuracy? Which physical entity may deliver precise timing?
1. The sun
2. An atom => TAI (International Atomic Time)
12Distributed Systems
Problem with Physical Time
A TAI-day is about 3 msec shorter than a solar day
=> the BIH inserts 1 sec whenever the difference between a solar day and a TAI-day exceeds 800 msec
=> definition of UTC (Coordinated Universal Time), the base of all international time measures.
13Distributed Systems
Physical Time
- UTC signals come from radio broadcasting stations
- or from satellites (GEOS, GPS), with an accuracy of
  - 1.0 msec (broadcasting station)
  - 0.1 msec (GEOS)
  - 0.1 µsec (GPS)
Remark: Using more than one UTC source you may improve the accuracy.
14Distributed Systems
Clock Synchronization
- Adjusting physical clocks
  - local clock behind reference clock
  - local clock ahead of reference clock
- Observation: Clocks in a DS tend to drift apart and need to be resynchronized periodically
- A. If the local clock is behind the reference clock, it
  - could be adjusted in one jump, or
  - could be adjusted in a series of small jumps
- B. What to do if the local clock is ahead of the reference clock?
  You can adjust by slowing down your local clock, i.e. ignoring some of its HW clock ticks.
15Distributed Systems
Absolute Clock Synchronization
(Figure: a computer to be synchronized queries a UTC time server equipped with a UTC receiver. The client sends a request at t0, the server needs Ts to handle the request and reads tUTC, the answer arrives at t1. Both t0 and t1 are measured with the same, local clock.)
16Distributed Systems
Absolute Clock Synchronization
- Initialize local clock: t := tUTC
- Problem: transfer time of the time-message
  - Estimate the message transfer time as (t1 - t0)/2 => t := tUTC + (t1 - t0)/2
- Problem: handling time tr of the request at the server
  - Suppose tr is known => t := tUTC + (t1 - t0 - tr)/2
- Problem: message transfer times are load dependent
  - Take multiple measurements of (t1 - t0)
  - Throw away measurements above a threshold value
  - Take all others to get an average
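The measurement procedure above can be sketched in a few lines of Python; the function name, the sample format and the threshold value are illustrative choices, not part of the original algorithm description.

```python
import statistics

def cristian_adjust(samples, threshold=50.0):
    """Estimate the clock correction from (t0, t1, tUTC) samples:
    t := tUTC + (t1 - t0)/2, averaged over plausible round trips."""
    # Discard measurements whose round-trip time (t1 - t0) exceeds the threshold.
    good = [(t0, t1, t_utc) for (t0, t1, t_utc) in samples
            if (t1 - t0) <= threshold]
    # The estimated server time at the moment t1 is tUTC + (t1 - t0)/2;
    # the correction to apply to the local clock is that estimate minus t1.
    offsets = [t_utc + (t1 - t0) / 2 - t1 for (t0, t1, t_utc) in good]
    return statistics.mean(offsets)
```

With a true offset of 100 and symmetric delays, the two fast samples agree on the correction while the slow outlier is filtered out.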
17Distributed Systems
Relative Clock Synchronization (Berkeley's Algorithm)
If you need a uniform time (without a UTC receiver per computer), but you can establish a central time server:
- The time server periodically asks all nodes for their clock readings
- The time server estimates the local times of all nodes, taking the involved message transfer times into account
- The time server uses these estimated local times to build the arithmetic mean
- The corresponding deviations from this arithmetic mean are sent back to the nodes
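One round of the averaging step can be sketched as follows; message exchange and transfer-time correction are abstracted away, and all names are illustrative.

```python
def berkeley_round(server_time, reported_times):
    """One round of Berkeley's algorithm: the time server averages the
    (already transfer-time-corrected) clock readings, including its own,
    and returns the deviation each clock must apply."""
    clocks = [server_time] + list(reported_times)
    mean = sum(clocks) / len(clocks)
    # Each participant gets the correction relative to the arithmetic mean;
    # key 0 is the server itself, keys 1..n the reporting nodes.
    return {i: mean - t for i, t in enumerate(clocks)}
```

Note that the server adjusts itself too: the group converges on the mean, not on the server's clock.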
18Distributed Systems
Network Time Protocol (NTP)
- Goals
  - absolute (UTC) time service in large nets (e.g. the Internet)
  - high availability (via fault tolerance)
  - protection against fabrication (via authentication)
- Architecture
  - time servers build up a hierarchical synchronization subnet
  - all primary servers (root, level 1) have a UTC receiver
  - secondary servers are synchronized by their corresponding parent primary server
  - all other stations are leaves on level 3, synchronized by level 2 time servers
  - the accuracy of individual clocks decreases with increasing level number
  - the net is able to reconfigure itself
19Distributed Systems
Three NTP Modes
- Multicast mode (for quick LANs, low accuracy)
  - the server periodically sends its current time to its leaves in the LAN via multicast
- Procedure-call mode (medium accuracy)
  - the server responds to requests with its current timestamp
- Symmetric mode (high accuracy, used to synchronize between the time servers)
  - exchange of timestamp pairs
Remark: In all cases the UDP transport protocol is used, i.e. messages can get lost!
20Distributed Systems
Some NTP Details
Except in multicast mode, all messages are transferred in pairs, i.e. you record the send time as well as the receive time.
(Figure: server B sends message m at t_{i-3}, server A receives it at t_{i-2}; A sends m' at t_{i-1}, B receives it at t_i.)
Let o = t_A - t_B be the true time difference of B's clock relative to A, o_i the estimate of o, and t and t' the corresponding message transfer times for m and m'. Then d_i = t + t' is the total message transfer time. You can measure d_i = (t_i - t_{i-3}) - (t_{i-1} - t_{i-2}), since t_{i-2} = t_{i-3} + t + o and t_i = t_{i-1} + t' - o. From the same two equations, o_i = ((t_{i-2} - t_{i-3}) + (t_{i-1} - t_i)) / 2 estimates o with an error of at most d_i/2.
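The two quantities derived above follow directly from the four timestamps of one message pair; a small sketch (function and parameter names are ours):

```python
def ntp_estimates(t_im3, t_im2, t_im1, t_i):
    """From one NTP timestamp pair (B sends m at t_{i-3}, A receives at
    t_{i-2}; A sends m' at t_{i-1}, B receives at t_i) compute the total
    transfer time d_i and the offset estimate o_i."""
    d_i = (t_i - t_im3) - (t_im1 - t_im2)        # = t + t'
    o_i = ((t_im2 - t_im3) + (t_im1 - t_i)) / 2  # estimates o = tA - tB
    return d_i, o_i
```

For example, with a true offset o = 2 and one time unit of delay in each direction, the timestamps 10, 13, 17, 16 yield d_i = 2 and o_i = 2.0.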
21Distributed Systems
More NTP Details
- Successive pairs <o_i, d_i> may have to be filtered once more to get better estimates
- Time servers synchronize with various other time servers,
  - typically with one on the same level
  - and with two others on a lower level
- Servers may choose their synchronization partners
- Measurements in the Internet show that 99% of all nodes have a synchronization error of less than 30 msec.
22Distributed Systems
Logical Time
In many cases it is sufficient just to order the relevant events, i.e. we want to be able to position these events relatively, but not absolutely. The interesting thing is the relative position of an event on the time axis. In particular, we do not need any scaling of this logical time axis!
- A very simple solution is the ring clock (André, Herman and Verjus, 1985)
  - A clock message circulates
  - It is incremented at each event
23Distributed Systems
Logical Time
- Characteristics of a logical time
  - causal dependencies have to be mapped correctly (e.g. a message is sent before it is received)
  - unrelated events (from independent activities) do not have to be ordered (i.e. they may appear in any order on the logical time axis)
- Assumptions
  - DS = n single-processor nodes
  - activity of each node = a sequence of totally ordered events E_N
  - 3 types of events: local events, sends, receives
  - the total activity of the system is E = ∪_N E_N
24Distributed Systems
Logical Time
Happen-before relationship of events: Let →_p denote the local relation "happen-before" within node p: a →_p b iff a and b are both events on p and a happens before b. We define the global happen-before relation →: a → b holds iff
- ∃ node p: a →_p b, or
- ∃ message m: a = send(m) and b = receive(m), or
- ∃ event c: a → c and c → b.
Note: The relation happen-before models potential causality, not necessarily real causality.
25Distributed Systems
Logical Time
Concurrency of events: Two events a and b are concurrent, a || b, iff neither a → b nor b → a holds.
26Distributed Systems
Example (communication implies an inherent order)
(Figure: events e11, e12 on node 1, e21, e22 on node 2, and e31, e32 on node 3, connected by messages.)
It holds e11 → e12 → e21 → e22 → e32, and furthermore e31 → e32, whereas e31 is related happen-before neither to e11, nor to e12, nor to e21, nor to e22. e31 is concurrent to e11, e12, e21, and e22.
Remark: The relation happen-before (→) is also called the causality relation.
27Distributed Systems
Lamport Time
With the ordering implied by the happen-before relation we can establish the Lamport time L via simple counters, where E = the set of events.
The mapping L: E → N defines the Lamport time L, i.e. each e ∈ E gets a time stamp L(e), as follows:
1. e is a pure local event or a sending event: if e has no local predecessor, then L(e) = 1; otherwise there is a local predecessor e', thus L(e) = L(e') + 1.
2. e is a receiving event with a corresponding sending event s: if e has no local predecessor, then L(e) = L(s) + 1; otherwise there is a local predecessor e', thus L(e) = max{L(s), L(e')} + 1.
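The two rules above fit into a tiny per-node counter class; a minimal sketch (class and method names are ours):

```python
class LamportClock:
    """One counter per node, implementing the two Lamport rules."""
    def __init__(self):
        self.time = 0                       # no local event yet

    def local_or_send(self):
        # Rule 1: local or send event -> increment (the first event gets 1).
        self.time += 1
        return self.time                    # timestamp to attach to a message

    def receive(self, msg_time):
        # Rule 2: receive event -> max of message stamp and local counter, +1.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

A send on one node followed by the matching receive on another always yields a strictly larger timestamp, as the happen-before relation requires.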
28Distributed Systems
Example
(Figure: Lamport timestamps of events on nodes 1, 2, and 3.)
Note: Each node has only a local counter, incremented with each local event. With each communication we have to adjust the involved counters of the two communicating nodes to be consistent with the happen-before relation.
Remark: The same mechanism can be used to adjust clocks on different nodes. The Lamport time is consistent with the happen-before relation, i.e. if x → y, then L(x) < L(y), but not vice versa.
29Distributed Systems
Example Adjusting local clocks with varying rates
30Distributed Systems
Relationships between the Notions
The Lamport time is consistent with causality, but it does not characterize causality. If x causes y, then x has a smaller Lamport timestamp than y: x → y ⇒ L(x) < L(y). However, L(x) < L(y) does not necessarily imply that x causes y!
31Distributed Systems
Vector Time
- There is a DS with n nodes.
- The n-dimensional vector V_p is the vector time of node p, if it is built according to the following rules:
  - (1) Initially, V_p = (0, ..., 0)
  - (2) For a local event on node p: V_p[p] += 1
  - (3) For a send event on p, do the same and append the new V_p to the message
  - (4) When receiving a message m with an appended V(m) on node p, increment V_p as in (2), and then do V_p := max(V(m), V_p), building the maximum componentwise
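Rules (1)-(4) translate directly into code; a minimal sketch with illustrative names:

```python
class VectorClock:
    """Vector time for node p in an n-node DS, following rules (1)-(4)."""
    def __init__(self, n, p):
        self.v = [0] * n                    # (1) initially (0, ..., 0)
        self.p = p

    def local_event(self):
        self.v[self.p] += 1                 # (2) increment own component

    def send(self):
        self.local_event()                  # (3) like (2), then attach V_p
        return list(self.v)                 # copy travels with the message

    def receive(self, vm):
        self.local_event()                  # (4) step as in (2), then
        self.v = [max(a, b) for a, b in zip(self.v, vm)]  # componentwise max
```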
32Distributed Systems
Example: Vector Time
(Figure: vector timestamps of events on processes P1, P2, and P3.)
33Distributed Systems
Characteristics of the Vector Time
You can define the following relations for the vector time. Suppose u, v are two vector times of dimension n:
1. u ≤ v ⇔ u[p] ≤ v[p] ∀ p ∈ {1, ..., n}
2. u < v ⇔ u ≤ v and u ≠ v
3. u || v ⇔ ¬(u ≤ v) and ¬(v ≤ u)
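The three relations can be checked mechanically; a small sketch (function names are ours):

```python
def leq(u, v):
    # u <= v  iff  u[p] <= v[p] for all components p
    return all(a <= b for a, b in zip(u, v))

def lt(u, v):
    # u < v  iff  u <= v and u != v
    return leq(u, v) and u != v

def concurrent(u, v):
    # u || v  iff  neither u <= v nor v <= u
    return not leq(u, v) and not leq(v, u)
```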
34Distributed Systems
Characteristics of the Vector Time
The following interrelationships between causality and vector time hold:
A) e → e' ⇔ V(e) < V(e')
B) e || e' ⇔ V(e) || V(e')
The vector time is the best known estimation for global sequencing that is based only on local information.
35Distributed Systems
Total Ordering of Events
The Lamport time gives us at least a partial ordering of distributed events, which is sufficient for many problems.
However, if we add the unambiguous node number, we can establish a total ordering: an event e on node a gets the global time stamp LT(e) = (L(e), a), with
(L(e), a) < (L(e'), b) ⇔ L(e) < L(e'), or L(e) = L(e') and a < b.
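With numeric node ids, the lexicographic rule above is exactly Python's built-in tuple comparison; a one-line sketch:

```python
def total_stamp(lamport_time, node_id):
    """Global stamp LT(e) = (L(e), node id); comparing these tuples gives
    precisely the total order defined on the slide (node id breaks ties)."""
    return (lamport_time, node_id)
```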
36Distributed Systems
Causal Ordering of Messages
A message system that guarantees the causal order of messages is an agreeable characteristic that may ease protocols or algorithms.
Definition: Let m1 and m2 be two messages received at the same node i. A set of messages is causally ordered if for all pairs <m1, m2> the following holds:
send(m1) → send(m2) ⇒ receive(m1) → receive(m2)
(Figure: example of non-causally ordered messages between P1, P2, and P3.)
37Distributed Systems
Protocol forcing Causal Ordering of Messages
- Each node i maintains an n×n matrix M_i, initialized to 0 (i.e. no message was sent up to now).
- When sending a message from node i to node j, increment M_i[i,j], i.e. (i, j, M_i[i,j]) unambiguously identifies the message.
38Distributed Systems
Protocol forcing Causal Ordering of Messages
- The incremented matrix M_i and the node number i are appended to the message, i.e. <i, M_i, message> is sent to node j.
- Upon receiving a message (with matrix M) at node j:
  - first, node j updates its matrix M_j as follows:
    ∀ k, l ∈ {1, ..., n}, l ≠ j: M_j[k,l] = max(M_j[k,l], M[k,l]), and M_j[i,j] = M_j[i,j] + 1
  - delay this message until M ≤ M_j holds (where A ≤ B iff ∀ k, l: A[k,l] ≤ B[k,l]),
    i.e. wait for earlier messages to node j that have not yet arrived (possibly even a message from the same node i).
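A minimal sketch of this matrix protocol, assuming an unordered transport. It folds the matrix update into the moment of delivery (equivalent to the slide's update-then-delay formulation for the entries that matter); all class and method names are illustrative.

```python
class CausalNode:
    """Node j of n; delivers incoming messages in causal order."""
    def __init__(self, n, j):
        self.n, self.j = n, j
        self.M = [[0] * n for _ in range(n)]  # no message sent up to now
        self.pending = []                     # delayed (sender, matrix) pairs

    def send(self, dest):
        self.M[self.j][dest] += 1             # (j, dest, M[j][dest]) names the msg
        return (self.j, [row[:] for row in self.M])

    def _deliverable(self, i, M):
        j = self.j
        # next expected message from i to j, and no earlier message to j missing
        return (M[i][j] == self.M[i][j] + 1 and
                all(M[k][j] <= self.M[k][j] for k in range(self.n) if k != i))

    def receive(self, i, M):
        """Returns the sender ids delivered (in order) as a consequence."""
        self.pending.append((i, M))
        delivered, progress = [], True
        while progress:                       # a delivery may unblock others
            progress = False
            for i2, M2 in list(self.pending):
                if self._deliverable(i2, M2):
                    self.pending.remove((i2, M2))
                    for k in range(self.n):   # componentwise maximum update
                        for l in range(self.n):
                            self.M[k][l] = max(self.M[k][l], M2[k][l])
                    delivered.append(i2)
                    progress = True
        return delivered
```

Two messages from the same sender that arrive out of order are delayed and then delivered in their causal order.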
39Distributed Systems
Example
(Figure: a run with P1, P2, and P3; the message shown carries e.g. the matrix
0 1 0
0 0 1
0 1 0 )
40Distributed Systems
Concurrency Control
- About coping with conflicts on shared data
  - Locking
  - Transactions
  - Timestamp orderings
41Distributed Systems
Mutual Exclusion
The problem: For accessing shared data or for using resources we often have to provide exclusiveness! The corresponding pieces of code are named critical sections; concurrent accesses are not allowed.
(Figure: several processes access shared data; logically we still have a common memory.)
42Distributed Systems
Mutual Exclusion
Requirements for a correct solution:
1. Safety: Only a single task/thread is allowed to be in the critical section!
2. Liveness: Each competitor must enter its critical section after a finite waiting time.
3. Sequence order: Waiting in front of a critical section is handled according to FCFS.
4. Fault tolerance: 1. and 2. have to be fulfilled even in case of failures.
(1. and 2. imply: no deadlocks, no starvation.)
43Distributed Systems
Criteria for Mutual Exclusion
- Number of needed messages n_m per critical section (CS); minimize n_m
- Protocol delay d (to evaluate who is next) per CS; minimize d
- Response time RT_CS, the time interval between requesting to enter a CS and leaving the CS; minimize RT_CS
- Throughput TP_CS, passages of a CS per time unit; maximize TP_CS = 1/(d + E_CS), where E_CS is the execution time inside the CS
44Distributed Systems
Solutions for Mutual Exclusion in DS
- Three major approaches
  - Centralized lock manager
  - Token-passing lock manager
    - Standard token algorithm
    - Enhanced token algorithm
  - Distributed lock manager
    - Lamport algorithm
    - Ricart-Agrawala algorithm
45Distributed Systems
Centralized Lock Manager
One task is designated to be the coordinator for all competing tasks concerning a specific critical region CR (the set of CSs belonging to the same mutual exclusion problem). The centralized lock manager (CLM) controls accesses to CR using a token, which represents the permission to enter a CS. To enter its CS, a client sends a request message to the CLM and then waits for a positive answer from the CLM. If no client holds the token, the CLM responds immediately with the token; otherwise the request is queued.
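The CLM's bookkeeping can be sketched in a few lines; message passing is abstracted into method calls, and the return values ("granted"/"queued") are illustrative stand-ins for the reply messages.

```python
from collections import deque

class CentralLockManager:
    """Sketch of a CLM: one token guards one critical region."""
    def __init__(self):
        self.holder = None            # client currently holding the token
        self.queue = deque()          # waiting requests, FCFS

    def request(self, client):
        if self.holder is None:
            self.holder = client
            return "granted"          # token handed over immediately
        self.queue.append(client)
        return "queued"               # the optional queued-message

    def release(self, client):
        assert client == self.holder
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder            # next grantee, if any
```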
46Distributed Systems
Centralized Lock Manager
(Figure: a centralized lock manager with a request queue; one client is the token holder, the other clients wait.)
Question: What problems might arise? The server might crash!
1. A client may hold the token
2. A client may have returned it
3. What about queued requests?
47Distributed Systems
Centralized Lock Manager
The queued message is optional. Benefits?
(Figure: message sequence between application A1, application A2, and the lock manager: requests and grants via send_message/receive_message, plus the queue of pending requests.)
Note: A major drawback of a centralized lock manager is the single point of failure. Another drawback is the danger of becoming a bottleneck. The protocol delay is determined by at least two messages (request, grant).
48Distributed Systems
Token-Passing Mutual Exclusion
There is a single token for all participants competing for a critical section. To enter a critical section an application must possess this token.
We have to invent a logical ring amongst those participants and hand over the token within this logical ring, in order to guarantee that each participant will get the chance to enter the critical section.
- The token-passing algorithm
  - before entering the critical section, an application must await the token
  - after the critical section, each application sends the token to its next neighboring participant
  - if no participant wants to enter the critical section, the token keeps circulating
49Distributed Systems
Standard Token Algorithm
(Figure: a logical ring of nodes; the current lock holder possesses the token.)
50Distributed Systems
Analysis of the Token-Based Exclusion
Check the list of requirements:
1. Safety: yes; due to the unique token, only the token holder may enter its CS
2. Liveness: yes, as long as the logical ring has only a finite number of nodes
3. Sequence order: no; the token passes in ring order, which may differ from the request (FCFS) order
4. Fault tolerance: no; if the logical ring splits, you may be lost.
51Distributed Systems
Problems with the Token-Passing Mutual Exclusion
1. How do you determine whether the token is lost or just being used for a very long time?
2. What happens if the location that has the token crashes for an extended period of time?
3. How to maintain the logical ring if a participant drops out of the system (voluntarily or by failure)?
4. How to identify and add new participants joining the logical ring, respectively remove old ones?
5. The token is perpetually passed around the logical ring even though none of the participants wants to enter its CS => unnecessary overhead
54Distributed Systems
Implementation Problems
Question: What may happen if you always try to give the token to the next neighboring node? If that participant does not wait for it => poor performance!
55Distributed Systems
Implementation Problems
Problem 1
Question: How can we solve this problem as system architects if we do not want to change the philosophy of the standard token algorithm?
56Distributed Systems
Implementation Solution
Invest another TokenHandler thread per application and critical section.
(Figure, reconstructed: the participant on node i+1 issues Send_Request(Token for CrS_1), waits in Receive(Token for CrS_1), runs Critical Section_1, and finishes with Send_Release(Token for CrS_1). The TokenHandler on node i+1 does Receive(Token from Node i), then checks Receive(Local_Request) in a non-blocking way; if there is a local request, it hands the token to the participant and waits in Receive(Local_Release); finally it does Send(Token to Node i+2).)
57Distributed Systems
Example: Perpetual Passing of the Token
(Figure: the token circulates over nodes i, j, and k although nodes j and k have no need for the token.)
Exercise 1: Invent a better token-based solution avoiding the overhead of perpetual token passing! Hint: You have to know who really wants to get the token!
58Distributed Systems
Distributed Lock Manager
- Though similar to the centralized solution, there are additional problems to solve:
  - Who sends messages, when, and to whom?
  - Who receives messages, when, and from whom?
  - Which messages are necessary to enter a critical section?
59Distributed Systems
Distributed Lock Manager
- Three message types (2 are required, 1 is optional)
  - Request message
  - Queued message (optional)
  - Granted message
60Distributed Systems
Request Message
The application wishing to enter its critical section sends this message to all applications (threads) competing for this critical section. How?
- Either n times individually, or via a multicast (see later slides).
- Each request message contains a timestamp from the source.
61Distributed Systems
Queued Message
This message is optional and is sent by a recipient of the request message whenever the request cannot be granted immediately, i.e.
- the recipient is currently in the critical section, or
- the recipient had initiated an earlier request.
Remark: This message type makes it easier to find out whether there are dead participants.
62Distributed Systems
Granted Message
Sent to a requesting process by every participant, in two circumstances:
- the recipient is not in its critical section and has no earlier request, or
- the recipient has queued the request; then it will send the grant upon leaving the critical section.
63Distributed Systems
Release Message
After having released the resource, it is sent to all participants with a queued request message.
Remark: Have a closer look at both algorithms in Stallings, p. 603-606:
1. Lamport: "Time, Clocks, and the Ordering of Events in a Distributed System", Comm. ACM, July 1978
2. Ricart: "An Optimal Algorithm for Mutual Exclusion in Computer Networks", Comm. ACM, January and September 1981
64Distributed Systems
Ricart/Agrawala Algorithm
(State diagram: computation outside of the critical section -> requesting mutual exclusion -> waiting for entrance into the critical section -> critical section -> activating others.)
65Distributed Systems
Closer Look at the Ricart/Agrawala Algorithm (1981)
- No tokens anymore
- Cooperative voting to determine the intended sequence of CSs
- Does not rely on an interconnection medium offering ordered messages
- Serialization based on logical timestamps (total ordering)
- If a participant wants to enter its CS, it asks all others for permission and does not proceed until it has the permission of all other participants
- If a participant gets a permission request and is not interested in its CS, it returns the permission immediately to the requester.
66Distributed Systems
Correctness Conditions (1)
- All nodes behave identically, thus we just regard node x
- After voting, three groups of requests may be distinguished:
  1. known at node x with a timestamp less than C_x
  2. known at node x with a timestamp greater than C_x
  3. those still unknown at node x
67Distributed Systems
Correctness Conditions (2)
During this voting process, marks may change according to the following conditions:
Condition 1: Requests of group 1 have to be served, or they have to take a timestamp greater than C_x.
Condition 2: Requests of group 2 may not get a timestamp smaller than C_x.
Condition 3: Requests of group 3 must have timestamps greater than C_x.
68Distributed Systems
Two Phases of the Voting Algorithm
1. Participants at node i willing to enter their critical section send request messages e_i to all other participants, where e_i contains the current Lamport time L_i of node i. (After each send, node i increments its counter C_i.)
2. All other participants return permission messages a_i. Node x replies to a request message e_i as soon as all older requests (received at earlier Lamport times) are completed; the answer may thus be delayed a bit, and node x updates C_x = max(C_x, C_i) + 1.
Result: If all permission messages have arrived at node i, the corresponding requester may enter its critical section.
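The voting rule can be condensed into a per-node sketch using (Lamport time, node id) stamps for the total order; the class is a simplified illustration (no transport, names are ours), not the full algorithm.

```python
class RANode:
    """Simplified Ricart/Agrawala participant."""
    def __init__(self, nid):
        self.nid, self.clock = nid, 0
        self.requesting, self.stamp = False, None
        self.deferred = []                 # stamps of requests answered later

    def want_cs(self):
        self.clock += 1
        self.requesting, self.stamp = True, (self.clock, self.nid)
        return self.stamp                  # broadcast as request e_i

    def on_request(self, stamp):
        self.clock = max(self.clock, stamp[0]) + 1   # Cx = max(Cx, Ci) + 1
        if self.requesting and self.stamp < stamp:
            self.deferred.append(stamp)    # our own request is older: defer
            return False                   # no permission yet
        return True                        # immediate permission a_i

    def leave_cs(self):
        self.requesting = False
        d, self.deferred = self.deferred, []
        return d                           # now send the deferred permissions
```

With two simultaneous requests, the node with the smaller stamp (here node 1) wins; node 2 gets its permission only when node 1 leaves its CS.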
69Distributed Systems
Example of the Voting Algorithm
(Figure: nodes i, j, and k exchange request and permission messages.)
Suppose M_i < M_k, i.e. the request message M_i has a smaller timestamp than M_k; then we have to delay the answer to the request message e_k at node i!
70Distributed Systems
Comparison between Mutual Exclusion Algorithms
(Comparison table; legend: T = message transfer time, E = execution time of the CS.)
71Distributed Systems
Election Algorithms
Suppose your centralized lock manager crashes for a longer period of time. Then you need a new one, i.e. you have to elect a new one. How to do that in a DS?
- The 2 major election algorithms are based upon:
  - each node has a unique node number (i.e. there is a total ordering of all nodes)
  - the node with the highest number of all active nodes is the coordinator
  - after a crash, a restarting node is put back into the set of active nodes
72Distributed Systems
Bully Algorithm (Garcia-Molina, 1982)
Goal: Find the active node with the highest number, tell it to be the coordinator, and tell this to all other nodes, too.
Start: The algorithm may start at any node, e.g. at a node recognizing that the previous coordinator is no longer active.
- Message types
  - election messages e, initiating the election
  - answer messages a, confirming the reception of an e-message
  - coordinator messages c, telling that the sender is the new coordinator
73Distributed Systems
Steps of the Bully Algorithm
1. Some node N_i sends e-messages to all other nodes N_j, j > i.
2. If there is no answer within a time limit T, N_i elects itself as coordinator, sending this information via a c-message to all other nodes N_j, j < i.
3. If N_i got an a-message within T (i.e. there is an active node with a higher number), it awaits another time limit T'. It restarts the whole algorithm if there is no c-message within T'.
4. If N_j receives an e-message from N_i, it answers with an a-message to N_i and starts the algorithm for itself (step 1).
5. If a node N, after having crashed and being restarted, is active again, it starts with step 1.
6. The node with the highest number establishes itself as coordinator.
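Abstracting away the messages and timeouts, the outcome of the steps above can be condensed into a few lines; this is a deliberately message-free sketch of the result, not of the protocol itself.

```python
def bully_election(active, initiator):
    """Outcome of a Bully election: the initiator challenges all
    higher-numbered nodes; if none is active it wins, otherwise the
    challenge cascades up and the highest active node wins."""
    higher = [n for n in active if n > initiator]
    if not higher:
        return initiator          # no a-message within T: self-election
    return max(active)            # the cascade bottoms out at the top node
```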
74Distributed Systems
Example: Bully Algorithm
(Figure: nodes 3 and 4 have to start the algorithm due to their higher numbers.)
75Distributed Systems
Ring Algorithm (Le Lann, 1977)
- Each node is part of one logical ring
- Each node knows that logical ring, i.e. its immediate successor as well as all further successors
- 2 types of messages are used:
  - election message e, to elect the new coordinator
  - coordinator message c, to introduce the coordinator to the nodes
- The algorithm is initiated by some node N_i detecting that the coordinator no longer works
- This initiating node N_i sends an e-message with its node number i to its immediate successor N_{i+1}
- If this immediate successor N_{i+1} does not answer, it is assumed that N_{i+1} has crashed, and the e-message is sent to N_{i+2}
76Distributed Systems
Ring Algorithm (Le Lann, 1977)
- An e- or c-message received by node N_i contains a list of node numbers
- If an e-message does not contain its node number i, N_i adds its node number and sends this e-message on to N_{i+1}
- If an e-message contains its node number i, this e-message has circled once around the ring of all active nodes
- If it is a c-message, N_i keeps in mind the node with the highest number in that list as the new coordinator
- If a c-message has circled once around the logical ring, it is deleted
- After having restarted a crashed node, you can use an inquiry message circling once around the logical ring
79Distributed Systems
Ring Algorithm (Le Lann, 1977)
This coordinator message circles once around the logical ring.
80Distributed Systems
Comparison of both Election Algorithms
81Distributed Systems
Deadlocks in Distributed Systems
- Prevention (sometimes)
- Avoidance (far too complicated and time-consuming)
- Ignoring (often done in practice)
- Detection (sometimes really needed)
82Distributed Systems
Deadlocks in Distributed Systems
- In DS a distinction is made between
  - Resource deadlock: processes are stuck waiting for resources held by each other
  - Communication deadlock: processes are stuck waiting for messages from each other while no messages are in transit
83Distributed Systems
Distributed Deadlocks
- Using locks within transactions may lead to
deadlocks
A deadlock has occurred if the global waiting
graph contains a cycle.
84Distributed Systems
Deadlock Prevention in Distributed Systems
1. Only allow single-resource holding (=> no cycles)
2. Preallocation of resources (=> low resource efficiency)
3. Forced release before a new request
4. Acquire resources in a fixed order (quite a cumbersome task to number all resources in a DS)
5. Seniority rules: each application gets a timestamp; if a senior application requests a resource held by a junior one, the senior wins.
85Distributed Systems
Deadlock Avoidance in Distributed Systems
Deadlock avoidance in DS is impractical because:
1. Every node must keep track of the global state of the DS => substantial storage and communication overhead
2. Checking whether a global state is safe must be mutually exclusive
3. Checking for safe states requires substantial processing and communication overhead if there are many processes and resources
86Distributed Systems
Deadlock Detection in Distributed Systems
The problem gets harder: in a deadlock, resources from different nodes are in general involved. Several approaches:
1. Centralized control
2. Hierarchical control
3. Distributed control
In any case: Deadlocks must be detected within a finite amount of time.
87Distributed Systems
Deadlock Detection in Distributed Systems
- Correctness of detection on the waiting graph depends on
  - progress (every deadlock is eventually detected)
  - safety (only real deadlocks are reported)
88Distributed Systems
Deadlock Detection in Distributed Systems
- General remarks
  - Deadlocks must be detected within a finite amount of time
  - Message delays and out-of-date data may cause false cycles to be detected (phantom deadlocks)
  - After a possible deadlock has been detected, one may need to double-check that it is a real one!
89Distributed Systems
Deadlock Detection in DS: Centralized Control
- local and global deadlock detectors (LDD and GDD)
  - if an LDD detects a local deadlock, it resolves it locally!
- The GDD gets status information from the LDDs
  - on waiting-graph updates,
  - periodically, or
  - on each request
- If the GDD detects a deadlock involving resources at two or more nodes, it resolves this deadlock globally!
90Distributed Systems
Deadlock Detection in DS: Centralized Control
- Major drawbacks
  - The node hosting the GDD is a single point of failure
  - Phantom deadlocks may arise because the global waiting graph is not up to date
91Distributed Systems
Deadlock Detection in DS: Hierarchical Control
- hierarchy of deadlock detectors (controllers)
- each controller maintains a waiting graph (the union of the waiting graphs of its children)
- deadlocks are resolved at the lowest level possible
92Distributed Systems
Deadlock Detection in DS: Hierarchical Control
Each node in the tree (except a leaf node) keeps track of the resource-allocation information of itself and of all its successors =>
a deadlock that involves a set of resources will be detected by the node that is the common ancestor of all nodes whose resources are among the objects in conflict.
93Distributed Systems
Distributed Deadlock Detection in DS (Obermark, 1982)
- no global waiting graph
- deadlock detection cycle:
  - wait for information from other nodes
  - combine it with the local waiting information
  - break cycles, if detected
  - share information on potential global cycles
Remark: The non-local portion of the global waiting graph is condensed into an abstract node "ex".
94Distributed Systems
Distributed Deadlock Detection in DS (Obermark, 1982)
(Figure: situation on node x; the local waiting graph contains the abstract node "ex". Already a deadlock? There is no local deadlock.)
95Distributed Systems
Distributed Deadlock Detection in DS (Chandy/Misra/Haas, 1983)
- a probe message <i, j, k> is sent whenever a process blocks
- the probe message is forwarded along the edges of the waiting graph whenever the recipient is itself waiting for a resource
- if a probe message ever reaches the initiating process, then there is a deadlock
96Distributed Systems
Distributed Deadlock Detection in DS (Chandy/Misra/Haas)
- If a process P has to wait for a resource R, it sends a message to the owner O of that resource.
- This message contains:
  - the PID of the waiting process P
  - the PID of the sending process S
  - the PID of the receiving process E
- The receiving process E checks whether E is also waiting. If so, it modifies the message:
  - the first component of the message still holds
  - the 2nd component is changed to PID(E)
  - the 3rd component is changed to the PID of the process that E is waiting for
- If the message ever reaches the waiting process P, then there is a deadlock.
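The probe forwarding can be sketched over an explicit wait-for map; this models one blocked process per edge and abstracts the message transport, so it illustrates the probe rule rather than the full protocol.

```python
def chandy_misra_haas(wait_for, initiator):
    """Probe computation: wait_for maps each blocked process to the
    process it waits on. Probes (initiator, sender, receiver) travel
    along the edges; deadlock iff a probe returns to the initiator."""
    probes = [(initiator, initiator, wait_for.get(initiator))]
    seen = set()
    while probes:
        init, s, e = probes.pop()
        if e is None or (s, e) in seen:
            continue                       # not blocked, or edge already probed
        seen.add((s, e))
        if e == init:
            return True                    # probe reached the waiting initiator
        if e in wait_for:                  # receiver is itself waiting: forward
            probes.append((init, e, wait_for[e]))
    return False
```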
97Distributed Systems
Example of Distributed Deadlock Detection in DS
(Figure: a waiting chain P0 → P1 → P2 → P3 spanning several nodes, with further processes P4 through P8; the probes (0,1,2), (0,4,6), (0,5,7), and (0,8,0) travel along the edges of the waiting graph.)
98Distributed Systems
Deadlock Detection in DS: Distributed Control
Recommended Reading:
- Knapp, E.: "Deadlock Detection in Distributed Databases", ACM Computing Surveys, 1987
- Sinha, P.: "Distributed Operating Systems: Concepts and Design", IEEE Computer Society, 1996
- Galli, D.: "Distributed Operating Systems: Concepts and Practice", Prentice Hall, 2000
99Distributed Systems
Deadlocks in Message Communication
1. Deadlocks may occur if each member of a specific group is waiting for a message from another member of the same group.
2. Deadlocks may occur due to the unavailability of message buffers etc.
Study for yourself: Read Stallings, Chapter 14.4, p. 615 ff.
100Distributed Systems
Multicast Paradigm
(Figure: processes P exchange multicast messages a, b, c, d within a group.)
- Ordering (unordered, FIFO, causal, agreed)
- Delivery guarantees (unreliable, reliable, safe/stable)
- Open groups versus closed groups
- Failure model (omission, fail-stop, crash-recovery, network partitions)
101Distributed Systems
Traditional Protocols for Multicast
- Example: TCP/IP, a point-to-point interconnection
  - Automatic flow control
  - Reliable delivery
  - Connection service
  - Complexity O(n²)
  - Linear degradation in performance
Remark: More on Linux multicast, see www.cs.washington.edu/esler/multicast/
102Distributed Systems
Traditional Protocols for Multicast
- Example: Unreliable broadcast/multicast (UDP, IP multicast)
  - Employs hardware support for broadcast and multicast
  - Message losses: 0.01% at normal load, more than 30% at high load
  - Buffer overflows (in the network and in the OS)
  - Interrupt misses
  - No connection service
103Distributed Systems
IP-Multicast
- Multicast extension to IP
- Best-effort multicast service
- No accurate membership
- Class D addresses are reserved for multicast
  - 224.0.0.0 to 239.255.255.255 are used as group addresses
- The standard defines how hardware Ethernet multicast addresses can be used where available
104Distributed Systems
IP-Multicast Logical Design
105Distributed Systems
IP Multicast
- Extensions to IP inside a host
  - A host may send an IP multicast by using a multicast address as the destination address
  - A host manages a table of groups and of the local application processes that belong to each group
  - When a multicast message arrives at the host, it delivers copies of it to all of the local processes that belong to that group
  - A host acts as a member of a group only if it has at least one active process that joined that group
106Distributed Systems
IP Multicast Group Management
- Extensions to IP within one subnet (IGMP)
  - A multicast router periodically sends queries to all hosts participating in IP multicast on the special 224.0.0.1 all-hosts group
  - Each relevant host sets a random timer for each group it is a member of; when the timer expires, it sends a report message on that group's multicast address
  - Each host that sees a report message for a group cancels its local timer for that group
  - When a host joins a group, it announces that on the group's multicast address
Remark: We have to skip further interesting topics like backbones, multicast routing, and reliable multicast services (see other specialized lectures).