Title: Distributed Systems
1 Distributed Systems
Distributed Systems: Clocks, Concurrency, Deadlocks
2 Distributed Systems
Intended Schedule for this Lecture
- Time in DS
  - Physical Time
  - Synchronization of Clocks
  - Logical Time
- Concurrency
  - Centralized Algorithm
  - Token-based Algorithm
  - Voting Algorithm (Ricart-Agrawala)
  - Election Algorithms (Bully, Ring)
- Deadlock
  - Centralized Detection
  - Path Pushing
  - Distributed Detection of Cycles (Chandy-Misra-Haas)
3 Motivation
Problems with Time
1. Nobody really has "the" time
2. Every ordinary clock is imprecise
3. Due to Albert (Einstein), time is really relative
4. Nevertheless, we need timestamps in a DS (to order events etc.)
4 Motivation
Why Timestamps in Systems?
In order to
1. do precise performance measurements
2. guarantee up-to-date data or judge the actuality of data
3. establish a total ordering of objects or operations (like transactions)
4. what else?
5 Motivation
Lack of a Uniform Global Time in DS
Due to the nature of imprecise clocks there is no global, unique time in a DS. Suppose there is a central time server able to deliver exact times via time-messages to all nodes of a widely spread DS (e.g. a MAN)
=> due to the non-deterministic transfer times of these time-messages there is no uniform time on all nodes of the DS.
The transfer time of time-messages from the central time server to a specific node may vary over time.
6Distributed Systems
Side Trip into Philosophy and Special Relativity
The Myth of Simultaneity: event 1 and event 2 happen "at the same time".
(Figure: event 1 and event 2, observed from different positions.)
7Distributed Systems
Event Timelines (Example of the previous Slide)
(Figure: timelines of nodes 3, 4, and 5.)
Note: The arrows start at an event and end at an observation. The slope of an arrow depends on the relative speed of propagation.
8Distributed Systems
Causality
Event 1 causes event 2.
Requirement: We have to establish causality, i.e. each observer must see event 1 before event 2.
9Distributed Systems
Event Timelines (Example of the previous Slide)
(Figure: timelines of nodes 3, 4, and 5.)
Note: In the timeline view, if event 2 is caused by event 1, it must be caused by some passage of information from event 1.
10Distributed Systems
Example (distributed Unix make)
(Figure: editor on computer 1 and compiler on computer 2, each with its own local time, compared against absolute time, i.e. "God's clock".)
11Distributed Systems
Physical Time
Some systems really need quite accurate absolute times.
How to achieve high accuracy? Which physical entity may deliver precise timing?
1. The sun
2. An atom => TAI (International Atomic Time)
12Distributed Systems
Problem with Physical Time
A TAI-day is about 3 msec shorter than a solar day
=> the BIH inserts 1 sec whenever the difference between a solar day and a TAI-day exceeds 800 msec
=> definition of UTC (Coordinated Universal Time), the base of all international time measures.
13Distributed Systems
Physical Time
- UTC signals come from radio broadcasting stations
- or from satellites (GEOS, GPS), with an accuracy of
  - 1.0 msec (broadcasting station)
  - 0.1 msec (GEOS)
  - 0.1 µsec (GPS)
Remark: Using more than one UTC source you may improve the accuracy.
14Distributed Systems
Clock Synchronization
- Adjusting physical clocks
  - local clock behind reference clock
  - local clock ahead of reference clock
- Observation: Clocks in a DS tend to drift apart and need to be resynchronized periodically
- A. If the local clock is behind the reference clock, it
  - could be adjusted in one jump, or
  - could be adjusted in a series of small jumps
- B. What to do if the local clock is ahead of the reference clock?
  You can adjust by slowing down your local clock, i.e. ignoring some of its HW clock ticks.
15Distributed Systems
Absolute Clock Synchronization
(Figure: a computer to be synchronized queries a UTC time server equipped with a UTC receiver. The client sends a request at t0, the server needs Ts to handle the request and reads tUTC, the answer arrives at t1. Both t0 and t1 are measured with the same, local clock.)
16Distributed Systems
Absolute Clock Synchronization
- Initialize local clock: t := tUTC
- Problem: transfer time of the time-message
  - Estimate the message transfer time as (t1 - t0)/2 => t := tUTC + (t1 - t0)/2
- Problem: handling time tr of the request at the server
  - Suppose tr is known => t := tUTC + (t1 - t0 - tr)/2
- Problem: message transfer times are load dependent
  - Take multiple measurements of (t1 - t0)
  - Throw away measurements above a threshold value
  - Take all others to get an average
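The measurement procedure above can be sketched in a few lines of Python; the function name, the sample format and the threshold value are illustrative choices, not part of the original algorithm description.

```python
import statistics

def cristian_adjust(samples, threshold=50.0):
    """Estimate the clock correction from (t0, t1, tUTC) samples:
    t := tUTC + (t1 - t0)/2, averaged over plausible round trips."""
    # Discard measurements whose round-trip time (t1 - t0) exceeds the threshold.
    good = [(t0, t1, t_utc) for (t0, t1, t_utc) in samples
            if (t1 - t0) <= threshold]
    # The estimated server time at the moment t1 is tUTC + (t1 - t0)/2;
    # the correction to apply to the local clock is that estimate minus t1.
    offsets = [t_utc + (t1 - t0) / 2 - t1 for (t0, t1, t_utc) in good]
    return statistics.mean(offsets)
```

With a true offset of 100 and symmetric delays, the two fast samples agree on the correction while the slow outlier is filtered out.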
17Distributed Systems
Relative Clock Synchronization (Berkeley's Algorithm)
If you need a uniform time (without a UTC receiver per computer), but you can establish a central time server:
- The time server periodically asks all nodes for their clock readings
- The time server estimates the local times of all nodes, taking the involved message transfer times into account
- The time server uses these estimated local times to build the arithmetic mean
- The corresponding deviations from this arithmetic mean are sent back to the nodes
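One round of the averaging step can be sketched as follows; message exchange and transfer-time correction are abstracted away, and all names are illustrative.

```python
def berkeley_round(server_time, reported_times):
    """One round of Berkeley's algorithm: the time server averages the
    (already transfer-time-corrected) clock readings, including its own,
    and returns the deviation each clock must apply."""
    clocks = [server_time] + list(reported_times)
    mean = sum(clocks) / len(clocks)
    # Each participant gets the correction relative to the arithmetic mean;
    # key 0 is the server itself, keys 1..n the reporting nodes.
    return {i: mean - t for i, t in enumerate(clocks)}
```

Note that the server adjusts itself too: the group converges on the mean, not on the server's clock.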
18Distributed Systems
Network Time Protocol (NTP)
- Goals
  - absolute (UTC) time service in large nets (e.g. the Internet)
  - high availability (via fault tolerance)
  - protection against fabrication (via authentication)
- Architecture
  - time servers build up a hierarchical synchronization subnet
  - all primary servers (root, level 1) have a UTC receiver
  - secondary servers are synchronized by their corresponding parent primary server
  - all other stations are leaves on level 3, synchronized by level 2 time servers
  - the accuracy of individual clocks decreases with increasing level number
  - the net is able to reconfigure itself
19Distributed Systems
Three NTP Modes
- Multicast mode (for quick LANs, low accuracy)
  - the server periodically sends its current time to its leaves in the LAN via multicast
- Procedure-call mode (medium accuracy)
  - the server responds to requests with its current timestamp
- Symmetric mode (high accuracy, used to synchronize between the time servers)
  - exchange of timestamp pairs
Remark: In all cases the UDP transport protocol is used, i.e. messages can get lost!
20Distributed Systems
Some NTP Details
Except in multicast mode, all messages are transferred in pairs, i.e. you record the send time as well as the receive time.
(Figure: server B sends message m at t_{i-3}, server A receives it at t_{i-2}; A sends m' at t_{i-1}, B receives it at t_i.)
Let o = t_A - t_B be the true time difference of B's clock relative to A, o_i the estimate of o, and t and t' the corresponding message transfer times for m and m'. Then d_i = t + t' is the total message transfer time. You can measure d_i = (t_i - t_{i-3}) - (t_{i-1} - t_{i-2}), since t_{i-2} = t_{i-3} + t + o and t_i = t_{i-1} + t' - o. From the same two equations, o_i = ((t_{i-2} - t_{i-3}) + (t_{i-1} - t_i)) / 2 estimates o with an error of at most d_i/2.
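The two quantities derived above follow directly from the four timestamps of one message pair; a small sketch (function and parameter names are ours):

```python
def ntp_estimates(t_im3, t_im2, t_im1, t_i):
    """From one NTP timestamp pair (B sends m at t_{i-3}, A receives at
    t_{i-2}; A sends m' at t_{i-1}, B receives at t_i) compute the total
    transfer time d_i and the offset estimate o_i."""
    d_i = (t_i - t_im3) - (t_im1 - t_im2)        # = t + t'
    o_i = ((t_im2 - t_im3) + (t_im1 - t_i)) / 2  # estimates o = tA - tB
    return d_i, o_i
```

For example, with a true offset o = 2 and one time unit of delay in each direction, the timestamps 10, 13, 17, 16 yield d_i = 2 and o_i = 2.0.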
21Distributed Systems
More NTP Details
- Successive pairs <o_i, d_i> may have to be filtered once more to get better estimates
- Time servers synchronize with various other time servers,
  - typically with one on the same level
  - and with two others on a lower level
- Servers may choose their synchronization partners
- Measurements in the Internet show that 99% of all nodes have a synchronization error of less than 30 msec.
22Distributed Systems
Logical Time
In many cases it is sufficient just to order the relevant events, i.e. we want to be able to position these events relatively, but not absolutely. The interesting thing is the relative position of an event on the time axis. In particular, we do not need any scaling of this logical time axis!
- A very simple solution is the ring clock (André, Herman and Verjus, 1985)
  - A clock message circulates
  - It is incremented at each event
23Distributed Systems
Logical Time
- Characteristics of a logical time
  - causal dependencies have to be mapped correctly (e.g. a message is sent before it is received)
  - unrelated events (from independent activities) do not have to be ordered (i.e. they may appear in any order on the logical time axis)
- Assumptions
  - DS = n single-processor nodes
  - activity of each node = a sequence of totally ordered events E_N
  - 3 types of events: local events, sends, receives
  - the total activity of the system is E = ∪_N E_N
24Distributed Systems
Logical Time
Happen-before relationship of events: Let →_p denote the local relation "happen-before" within node p: a →_p b iff a and b are both events on p and a happens before b. We define the global happen-before relation →: a → b holds iff
- ∃ node p: a →_p b, or
- ∃ message m: a = send(m) and b = receive(m), or
- ∃ event c: a → c and c → b.
Note: The relation happen-before models potential causality, not necessarily real causality.
25Distributed Systems
Logical Time
Concurrency of events: Two events a and b are concurrent, a || b, iff neither a → b nor b → a holds.
26Distributed Systems
Example (communication implies an inherent order)
(Figure: events e11, e12 on node 1, e21, e22 on node 2, and e31, e32 on node 3, connected by messages.)
It holds e11 → e12 → e21 → e22 → e32, and furthermore e31 → e32, whereas e31 is related happen-before neither to e11, nor to e12, nor to e21, nor to e22. e31 is concurrent to e11, e12, e21, and e22.
Remark: The relation happen-before (→) is also called the causality relation.
27Distributed Systems
Lamport Time
With the ordering implied by the happen-before relation we can establish the Lamport time L via simple counters, where E = the set of events.
The mapping L: E → N defines the Lamport time L, i.e. each e ∈ E gets a time stamp L(e), as follows:
1. e is a pure local event or a sending event: if e has no local predecessor, then L(e) = 1; otherwise there is a local predecessor e', thus L(e) = L(e') + 1.
2. e is a receiving event with a corresponding sending event s: if e has no local predecessor, then L(e) = L(s) + 1; otherwise there is a local predecessor e', thus L(e) = max{L(s), L(e')} + 1.
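The two rules above fit into a tiny per-node counter class; a minimal sketch (class and method names are ours):

```python
class LamportClock:
    """One counter per node, implementing the two Lamport rules."""
    def __init__(self):
        self.time = 0                       # no local event yet

    def local_or_send(self):
        # Rule 1: local or send event -> increment (the first event gets 1).
        self.time += 1
        return self.time                    # timestamp to attach to a message

    def receive(self, msg_time):
        # Rule 2: receive event -> max of message stamp and local counter, +1.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

A send on one node followed by the matching receive on another always yields a strictly larger timestamp, as the happen-before relation requires.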
28Distributed Systems
Example
(Figure: Lamport timestamps of events on nodes 1, 2, and 3.)
Note: Each node has only a local counter, incremented with each local event. With each communication we have to adjust the involved counters of the two communicating nodes to be consistent with the happen-before relation.
Remark: The same mechanism can be used to adjust clocks on different nodes. The Lamport time is consistent with the happen-before relation, i.e. if x → y, then L(x) < L(y), but not vice versa.
29Distributed Systems
Example Adjusting local clocks with varying rates
30Distributed Systems
Relationships between the Notions
The Lamport time is consistent with causality, but it does not characterize causality. If x causes y, then x has a smaller Lamport timestamp than y: x → y ⇒ L(x) < L(y). However, L(x) < L(y) does not necessarily imply that x causes y!
31Distributed Systems
Vector Time
- There is a DS with n nodes.
- The n-dimensional vector V_p is the vector time of node p, if it is built according to the following rules:
  - (1) Initially, V_p = (0, ..., 0)
  - (2) For a local event on node p: V_p[p] += 1
  - (3) For a send event on p, do the same and append the new V_p to the message
  - (4) When receiving a message m with an appended V(m) on node p, increment V_p as in (2), and then do V_p := max(V(m), V_p), building the maximum componentwise
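Rules (1)-(4) translate directly into code; a minimal sketch with illustrative names:

```python
class VectorClock:
    """Vector time for node p in an n-node DS, following rules (1)-(4)."""
    def __init__(self, n, p):
        self.v = [0] * n                    # (1) initially (0, ..., 0)
        self.p = p

    def local_event(self):
        self.v[self.p] += 1                 # (2) increment own component

    def send(self):
        self.local_event()                  # (3) like (2), then attach V_p
        return list(self.v)                 # copy travels with the message

    def receive(self, vm):
        self.local_event()                  # (4) step as in (2), then
        self.v = [max(a, b) for a, b in zip(self.v, vm)]  # componentwise max
```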
32Distributed Systems
Example: Vector Time
(Figure: vector timestamps of events on processes P1, P2, and P3.)
33Distributed Systems
Characteristics of the Vector Time
You can define the following relations for the vector time. Suppose u, v are two vector times of dimension n:
1. u ≤ v ⇔ u[p] ≤ v[p] ∀ p ∈ {1, ..., n}
2. u < v ⇔ u ≤ v and u ≠ v
3. u || v ⇔ ¬(u ≤ v) and ¬(v ≤ u)
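The three relations can be checked mechanically; a small sketch (function names are ours):

```python
def leq(u, v):
    # u <= v  iff  u[p] <= v[p] for all components p
    return all(a <= b for a, b in zip(u, v))

def lt(u, v):
    # u < v  iff  u <= v and u != v
    return leq(u, v) and u != v

def concurrent(u, v):
    # u || v  iff  neither u <= v nor v <= u
    return not leq(u, v) and not leq(v, u)
```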
34Distributed Systems
Characteristics of the Vector Time
The following interrelationships between causality and vector time hold:
A) e → e' ⇔ V(e) < V(e')
B) e || e' ⇔ V(e) || V(e')
The vector time is the best known estimation for global sequencing that is based only on local information.
35Distributed Systems
Total Ordering of Events
The Lamport time gives us at least a partial ordering of distributed events, which is sufficient for many problems.
However, if we add the unambiguous node number, we can establish a total ordering: an event e on node a gets the global time stamp LT(e) = (L(e), a), with
(L(e), a) < (L(e'), b) ⇔ L(e) < L(e'), or L(e) = L(e') and a < b.
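With numeric node ids, the lexicographic rule above is exactly Python's built-in tuple comparison; a one-line sketch:

```python
def total_stamp(lamport_time, node_id):
    """Global stamp LT(e) = (L(e), node id); comparing these tuples gives
    precisely the total order defined on the slide (node id breaks ties)."""
    return (lamport_time, node_id)
```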
36Distributed Systems
Causal Ordering of Messages
A message system that guarantees the causal order of messages is an agreeable characteristic that may ease protocols or algorithms.
Definition: Let m1 and m2 be two messages received at the same node i. A set of messages is causally ordered if for all pairs <m1, m2> the following holds:
send(m1) → send(m2) ⇒ receive(m1) → receive(m2)
(Figure: example of non-causally ordered messages between P1, P2, and P3.)
37Distributed Systems
Protocol forcing Causal Ordering of Messages
- Each node i maintains an n×n matrix M_i, initialized to 0 (i.e. no message was sent up to now).
- When sending a message from node i to node j, increment M_i[i,j], i.e. (i, j, M_i[i,j]) unambiguously identifies the message.
38Distributed Systems
Protocol forcing Causal Ordering of Messages
- The incremented matrix M_i and the node number i are appended to the message, i.e. <i, M_i, message> is sent to node j.
- Upon receiving a message (with matrix M) at node j:
  - first, node j updates its matrix M_j as follows:
    ∀ k, l ∈ {1, ..., n}, l ≠ j: M_j[k,l] = max(M_j[k,l], M[k,l]), and M_j[i,j] = M_j[i,j] + 1
  - delay this message until M ≤ M_j holds (where A ≤ B iff ∀ k, l: A[k,l] ≤ B[k,l]),
    i.e. wait for earlier messages to node j that have not yet arrived (possibly even a message from the same node i).
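A minimal sketch of this matrix protocol, assuming an unordered transport. It folds the matrix update into the moment of delivery (equivalent to the slide's update-then-delay formulation for the entries that matter); all class and method names are illustrative.

```python
class CausalNode:
    """Node j of n; delivers incoming messages in causal order."""
    def __init__(self, n, j):
        self.n, self.j = n, j
        self.M = [[0] * n for _ in range(n)]  # no message sent up to now
        self.pending = []                     # delayed (sender, matrix) pairs

    def send(self, dest):
        self.M[self.j][dest] += 1             # (j, dest, M[j][dest]) names the msg
        return (self.j, [row[:] for row in self.M])

    def _deliverable(self, i, M):
        j = self.j
        # next expected message from i to j, and no earlier message to j missing
        return (M[i][j] == self.M[i][j] + 1 and
                all(M[k][j] <= self.M[k][j] for k in range(self.n) if k != i))

    def receive(self, i, M):
        """Returns the sender ids delivered (in order) as a consequence."""
        self.pending.append((i, M))
        delivered, progress = [], True
        while progress:                       # a delivery may unblock others
            progress = False
            for i2, M2 in list(self.pending):
                if self._deliverable(i2, M2):
                    self.pending.remove((i2, M2))
                    for k in range(self.n):   # componentwise maximum update
                        for l in range(self.n):
                            self.M[k][l] = max(self.M[k][l], M2[k][l])
                    delivered.append(i2)
                    progress = True
        return delivered
```

Two messages from the same sender that arrive out of order are delayed and then delivered in their causal order.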
39Distributed Systems
Example
(Figure: a run with P1, P2, and P3; the message shown carries e.g. the matrix
0 1 0
0 0 1
0 1 0 )
40Distributed Systems
Concurrency Control
- About coping with conflicts on shared data
  - Locking
  - Transactions
  - Timestamp orderings
41Distributed Systems
Mutual Exclusion
The problem: For accessing shared data or for using resources we often have to provide exclusiveness! The corresponding pieces of code are named critical sections; concurrent accesses are not allowed.
(Figure: several processes access shared data; logically we still have a common memory.)
42Distributed Systems
Mutual Exclusion
Requirements for a correct solution:
1. Safety: Only a single task/thread is allowed to be in the critical section!
2. Liveness: Each competitor must enter its critical section after a finite waiting time.
3. Sequence order: Waiting in front of a critical section is handled according to FCFS.
4. Fault tolerance: 1. and 2. have to be fulfilled even in case of failures.
(1. and 2. imply: no deadlocks, no starvation.)
43Distributed Systems
Criteria for Mutual Exclusion
- Number of needed messages n_m per critical section (CS); minimize n_m
- Protocol delay d (to evaluate who is next) per CS; minimize d
- Response time RT_CS, the time interval between requesting to enter a CS and leaving the CS; minimize RT_CS
- Throughput TP_CS, passages of a CS per time unit; maximize TP_CS = 1/(d + E_CS), where E_CS is the execution time inside the CS
44Distributed Systems
Solutions for Mutual Exclusion in DS
- Three major approaches
  - Centralized lock manager
  - Token-passing lock manager
    - Standard token algorithm
    - Enhanced token algorithm
  - Distributed lock manager
    - Lamport algorithm
    - Ricart-Agrawala algorithm
45Distributed Systems
Centralized Lock Manager
One task is designated to be the coordinator for all competing tasks concerning a specific critical region CR (the set of CSs belonging to the same mutual exclusion problem). The centralized lock manager (CLM) controls accesses to CR using a token, which represents the permission to enter a CS. To enter its CS, a client sends a request message to the CLM and then waits for a positive answer from the CLM. If no client holds the token, the CLM responds immediately with the token; otherwise the request is queued.
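The CLM's bookkeeping can be sketched in a few lines; message passing is abstracted into method calls, and the return values ("granted"/"queued") are illustrative stand-ins for the reply messages.

```python
from collections import deque

class CentralLockManager:
    """Sketch of a CLM: one token guards one critical region."""
    def __init__(self):
        self.holder = None            # client currently holding the token
        self.queue = deque()          # waiting requests, FCFS

    def request(self, client):
        if self.holder is None:
            self.holder = client
            return "granted"          # token handed over immediately
        self.queue.append(client)
        return "queued"               # the optional queued-message

    def release(self, client):
        assert client == self.holder
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder            # next grantee, if any
```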
46Distributed Systems
Centralized Lock Manager
(Figure: a centralized lock manager with a request queue; one client is the token holder, the other clients wait.)
Question: What problems might arise? The server might crash!
1. A client may hold the token
2. A client may have returned it
3. What about queued requests?
47Distributed Systems
Centralized Lock Manager
The queued message is optional. Benefits?
(Figure: message sequence between application A1, application A2, and the lock manager: requests and grants via send_message/receive_message, plus the queue of pending requests.)
Note: A major drawback of a centralized lock manager is the single point of failure. Another drawback is the danger of becoming a bottleneck. The protocol delay is determined by at least two messages (request, grant).
48Distributed Systems
Token-Passing Mutual Exclusion
There is a single token for all participants competing for a critical section. To enter a critical section an application must possess this token.
We have to invent a logical ring amongst those participants and hand over the token within this logical ring, in order to guarantee that each participant will get the chance to enter the critical section.
- The token-passing algorithm
  - before entering the critical section, an application must await the token
  - after the critical section, each application sends the token to its next neighboring participant
  - if no participant wants to enter the critical section, the token keeps circulating
49Distributed Systems
Standard Token Algorithm
(Figure: a logical ring of nodes; the current lock holder possesses the token.)
50Distributed Systems
Analysis of the Token-Based Exclusion
Check the list of requirements:
1. Safety: yes; due to the unique token, only the token holder may enter its CS
2. Liveness: yes, as long as the logical ring has only a finite number of nodes
3. Sequence order: no; the token passes in ring order, which may differ from the request (FCFS) order
4. Fault tolerance: no; if the logical ring splits, you may be lost.
51Distributed Systems
Problems with the Token-Passing Mutual Exclusion
1. How do you determine whether the token is lost or just being used for a very long time?
2. What happens if the location that has the token crashes for an extended period of time?
3. How to maintain the logical ring if a participant drops out of the system (voluntarily or by failure)?
4. How to identify and add new participants joining the logical ring, respectively remove old ones?
5. The token is perpetually passed around the logical ring even though none of the participants wants to enter its CS => unnecessary overhead
54Distributed Systems
Implementation Problems
Question: What may happen if you always try to give the token to the next neighboring node? If that participant does not wait for it => poor performance!
55Distributed Systems
Implementation Problems
Problem 1
Question: How can we solve this problem as system architects if we do not want to change the philosophy of the standard token algorithm?
56Distributed Systems
Implementation Solution
Invest another TokenHandler thread per application and critical section.
(Figure, reconstructed: the participant on node i+1 issues Send_Request(Token for CrS_1), waits in Receive(Token for CrS_1), runs Critical Section_1, and finishes with Send_Release(Token for CrS_1). The TokenHandler on node i+1 does Receive(Token from Node i), then checks Receive(Local_Request) in a non-blocking way; if there is a local request, it hands the token to the participant and waits in Receive(Local_Release); finally it does Send(Token to Node i+2).)
57Distributed Systems
Example: Perpetual Passing of the Token
(Figure: the token circulates over nodes i, j, and k although nodes j and k have no need for the token.)
Exercise 1: Invent a better token-based solution avoiding the overhead of perpetual token passing! Hint: You have to know who really wants to get the token!
58Distributed Systems
Distributed Lock Manager
- Though similar to the centralized solution, there are additional problems to solve:
  - Who sends messages, when, and to whom?
  - Who receives messages, when, and from whom?
  - Which messages are necessary to enter a critical section?
59Distributed Systems
Distributed Lock Manager
- Three message types (2 are required, 1 is optional)
  - Request message
  - Queued message (optional)
  - Granted message
60Distributed Systems
Request Message
The application wishing to enter its critical section sends this message to all applications (threads) competing for this critical section. How?
- Either n times individually, or via a multicast (see later slides).
- Each request message contains a timestamp from the source.
61Distributed Systems
Queued Message
This message is optional and is sent by a recipient of the request message whenever the request cannot be granted immediately, i.e.
- the recipient is currently in the critical section, or
- the recipient had initiated an earlier request.
Remark: This message type makes it easier to find out whether there are dead participants.
62Distributed Systems
Granted Message
Sent to a requesting process by every participant, in two circumstances:
- the recipient is not in its critical section and has no earlier request, or
- the recipient has queued the request; then it will send the grant upon leaving the critical section.
63Distributed Systems
Release Message
After having released the resource, it is sent to all participants with a queued request message.
Remark: Have a closer look at both algorithms in Stallings, p. 603-606:
1. Lamport: "Time, Clocks, and the Ordering of Events in a Distributed System", Comm. ACM, July 1978
2. Ricart: "An Optimal Algorithm for Mutual Exclusion in Computer Networks", Comm. ACM, January and September 1981
64Distributed Systems
Ricart/Agrawala Algorithm
(State diagram: computation outside of the critical section -> requesting mutual exclusion -> waiting for entrance into the critical section -> critical section -> activating others.)
65Distributed Systems
Closer Look at the Ricart/Agrawala Algorithm (1981)
- No tokens anymore
- Cooperative voting to determine the intended sequence of CSs
- Does not rely on an interconnection medium offering ordered messages
- Serialization based on logical timestamps (total ordering)
- If a participant wants to enter its CS, it asks all others for permission and does not proceed until it has the permission of all other participants
- If a participant gets a permission request and is not interested in its CS, it returns the permission immediately to the requester.
66Distributed Systems
Correctness Conditions (1)
- All nodes behave identically, thus we just regard node x
- After voting, three groups of requests may be distinguished:
  1. known at node x with a timestamp less than C_x
  2. known at node x with a timestamp greater than C_x
  3. those still unknown at node x
67Distributed Systems
Correctness Conditions (2)
During this voting process, marks may change according to the following conditions:
Condition 1: Requests of group 1 have to be served, or they have to take a timestamp greater than C_x.
Condition 2: Requests of group 2 may not get a timestamp smaller than C_x.
Condition 3: Requests of group 3 must have timestamps greater than C_x.
68Distributed Systems
Two Phases of the Voting Algorithm
1. Participants at node i willing to enter their critical section send request messages e_i to all other participants, where e_i contains the current Lamport time L_i of node i. (After each send, node i increments its counter C_i.)
2. All other participants return permission messages a_i. Node x replies to a request message e_i as soon as all older requests (received at earlier Lamport times) are completed; the answer may thus be delayed a bit, and node x updates C_x = max(C_x, C_i) + 1.
Result: If all permission messages have arrived at node i, the corresponding requester may enter its critical section.
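The voting rule can be condensed into a per-node sketch using (Lamport time, node id) stamps for the total order; the class is a simplified illustration (no transport, names are ours), not the full algorithm.

```python
class RANode:
    """Simplified Ricart/Agrawala participant."""
    def __init__(self, nid):
        self.nid, self.clock = nid, 0
        self.requesting, self.stamp = False, None
        self.deferred = []                 # stamps of requests answered later

    def want_cs(self):
        self.clock += 1
        self.requesting, self.stamp = True, (self.clock, self.nid)
        return self.stamp                  # broadcast as request e_i

    def on_request(self, stamp):
        self.clock = max(self.clock, stamp[0]) + 1   # Cx = max(Cx, Ci) + 1
        if self.requesting and self.stamp < stamp:
            self.deferred.append(stamp)    # our own request is older: defer
            return False                   # no permission yet
        return True                        # immediate permission a_i

    def leave_cs(self):
        self.requesting = False
        d, self.deferred = self.deferred, []
        return d                           # now send the deferred permissions
```

With two simultaneous requests, the node with the smaller stamp (here node 1) wins; node 2 gets its permission only when node 1 leaves its CS.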
69Distributed Systems
Example of the Voting Algorithm
(Figure: nodes i, j, and k exchange request and permission messages.)
Suppose M_i < M_k, i.e. the request message M_i has a smaller timestamp than M_k; then we have to delay the answer to the request message e_k at node i!
70Distributed Systems
Comparison between Mutual Exclusion Algorithms
(Comparison table; legend: T = message transfer time, E = execution time of the CS.)
71Distributed Systems
Election Algorithms
Suppose your centralized lock manager crashes for a longer period of time. Then you need a new one, i.e. you have to elect a new one. How to do that in a DS?
- The 2 major election algorithms are based upon:
  - each node has a unique node number (i.e. there is a total ordering of all nodes)
  - the node with the highest number of all active nodes is the coordinator
  - after a crash, a restarting node is put back into the set of active nodes
72Distributed Systems
Bully Algorithm (Garcia-Molina, 1982)
Goal: Find the active node with the highest number, tell it to be the coordinator, and tell this to all other nodes, too.
Start: The algorithm may start at any node, e.g. at a node recognizing that the previous coordinator is no longer active.
- Message types
  - election messages e, initiating the election
  - answer messages a, confirming the reception of an e-message
  - coordinator messages c, telling that the sender is the new coordinator
73Distributed Systems
Steps of the Bully Algorithm
1. Some node N_i sends e-messages to all other nodes N_j, j > i.
2. If there is no answer within a time limit T, N_i elects itself as coordinator, sending this information via a c-message to all other nodes N_j, j < i.
3. If N_i got an a-message within T (i.e. there is an active node with a higher number), it awaits another time limit T'. It restarts the whole algorithm if there is no c-message within T'.
4. If N_j receives an e-message from N_i, it answers with an a-message to N_i and starts the algorithm for itself (step 1).
5. If a node N, after having crashed and being restarted, is active again, it starts with step 1.
6. The node with the highest number establishes itself as coordinator.
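Abstracting away the messages and timeouts, the outcome of the steps above can be condensed into a few lines; this is a deliberately message-free sketch of the result, not of the protocol itself.

```python
def bully_election(active, initiator):
    """Outcome of a Bully election: the initiator challenges all
    higher-numbered nodes; if none is active it wins, otherwise the
    challenge cascades up and the highest active node wins."""
    higher = [n for n in active if n > initiator]
    if not higher:
        return initiator          # no a-message within T: self-election
    return max(active)            # the cascade bottoms out at the top node
```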
74Distributed Systems
Example: Bully Algorithm
(Figure: nodes 3 and 4 have to start the algorithm due to their higher numbers.)
75Distributed Systems
Ring Algorithm (Le Lann, 1977)
- Each node is part of one logical ring
- Each node knows that logical ring, i.e. its immediate successor as well as all further successors
- 2 types of messages are used:
  - election message e, to elect the new coordinator
  - coordinator message c, to introduce the coordinator to the nodes
- The algorithm is initiated by some node N_i detecting that the coordinator no longer works
- This initiating node N_i sends an e-message with its node number i to its immediate successor N_{i+1}
- If this immediate successor N_{i+1} does not answer, it is assumed that N_{i+1} has crashed, and the e-message is sent to N_{i+2}
76Distributed Systems
Ring Algorithm (Le Lann, 1977)
- An e- or c-message received by node N_i contains a list of node numbers
- If an e-message does not contain its node number i, N_i adds its node number and sends this e-message on to N_{i+1}
- If an e-message contains its node number i, this e-message has circled once around the ring of all active nodes
- If it is a c-message, N_i keeps in mind the node with the highest number in that list as the new coordinator
- If a c-message has circled once around the logical ring, it is deleted
- After having restarted a crashed node, you can use an inquiry message circling once around the logical ring
79Distributed Systems
Ring Algorithm (Le Lann, 1977)
This coordinator message circles once around the logical ring.
80Distributed Systems
Comparison of both Election Algorithms
81Distributed Systems
Deadlocks in Distributed Systems
- Prevention (sometimes)
- Avoidance (far too complicated and time-consuming)
- Ignoring (often done in practice)
- Detection (sometimes really needed)
82Distributed Systems
Deadlocks in Distributed Systems
- In DS a distinction is made between
  - Resource deadlock: processes are stuck waiting for resources held by each other
  - Communication deadlock: processes are stuck waiting for messages from each other while no messages are in transit
83Distributed Systems
Distributed Deadlocks
- Using locks within transactions may lead to
deadlocks
A deadlock has occurred if the global waiting
graph contains a cycle.
84Distributed Systems
Deadlock Prevention in Distributed Systems
1. Only allow single-resource holding (=> no cycles)
2. Preallocation of resources (=> low resource efficiency)
3. Forced release before a new request
4. Acquire resources in a fixed order (quite a cumbersome task to number all resources in a DS)
5. Seniority rules: each application gets a timestamp; if a senior application requests a resource held by a junior one, the senior wins.
85Distributed Systems
Deadlock Avoidance in Distributed Systems
Deadlock avoidance in DS is impractical because:
1. Every node must keep track of the global state of the DS => substantial storage and communication overhead
2. Checking whether a global state is safe must be mutually exclusive
3. Checking for safe states requires substantial processing and communication overhead if there are many processes and resources
86Distributed Systems
Deadlock Detection in Distributed Systems
The problem gets harder: in a deadlock, resources from different nodes are in general involved. Several approaches:
1. Centralized control
2. Hierarchical control
3. Distributed control
In any case: Deadlocks must be detected within a finite amount of time.
87Distributed Systems
Deadlock Detection in Distributed Systems
- Correctness of detection on the waiting graph depends on
  - progress (every deadlock is eventually detected)
  - safety (only real deadlocks are reported)
88Distributed Systems
Deadlock Detection in Distributed Systems
- General remarks
  - Deadlocks must be detected within a finite amount of time
  - Message delays and out-of-date data may cause false cycles to be detected (phantom deadlocks)
  - After a possible deadlock has been detected, one may need to double-check that it is a real one!
89Distributed Systems
Deadlock Detection in DS: Centralized Control
- local and global deadlock detectors (LDD and GDD)
  - if an LDD detects a local deadlock, it resolves it locally!
- The GDD gets status information from the LDDs
  - on waiting-graph updates,
  - periodically, or
  - on each request
- If the GDD detects a deadlock involving resources at two or more nodes, it resolves this deadlock globally!
90Distributed Systems
Deadlock Detection in DS: Centralized Control
- Major drawbacks
  - The node hosting the GDD is a single point of failure
  - Phantom deadlocks may arise because the global waiting graph is not up to date
91Distributed Systems
Deadlock Detection in DS: Hierarchical Control
- hierarchy of deadlock detectors (controllers)
- each controller maintains a waiting graph (the union of the waiting graphs of its children)
- deadlocks are resolved at the lowest level possible
92Distributed Systems
Deadlock Detection in DS: Hierarchical Control
Each node in the tree (except a leaf node) keeps track of the resource-allocation information of itself and of all its successors =>
a deadlock that involves a set of resources will be detected by the node that is the common ancestor of all nodes whose resources are among the objects in conflict.
93Distributed Systems
Distributed Deadlock Detection in DS (Obermark, 1982)
- no global waiting graph
- deadlock detection cycle:
  - wait for information from other nodes
  - combine it with the local waiting information
  - break cycles, if detected
  - share information on potential global cycles
Remark: The non-local portion of the global waiting graph is condensed into an abstract node "ex".
94Distributed Systems
Distributed Deadlock Detection in DS (Obermark, 1982)
(Figure: situation on node x; the local waiting graph contains the abstract node "ex". Already a deadlock? There is no local deadlock.)
95Distributed Systems
Distributed Deadlock Detection in DS (Chandy/Misra/Haas, 1983)
- a probe message <i, j, k> is sent whenever a process blocks
- the probe message is forwarded along the edges of the waiting graph whenever the recipient is itself waiting for a resource
- if a probe message ever reaches the initiating process, then there is a deadlock
96Distributed Systems
Distributed Deadlock Detection in DS (Chandy/Misra/Haas)
- If a process P has to wait for a resource R, it sends a message to the owner O of that resource.
- This message contains:
  - the PID of the waiting process P
  - the PID of the sending process S
  - the PID of the receiving process E
- The receiving process E checks whether E is also waiting. If so, it modifies the message:
  - the first component of the message still holds
  - the 2nd component is changed to PID(E)
  - the 3rd component is changed to the PID of the process that E is waiting for
- If the message ever reaches the waiting process P, then there is a deadlock.
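The probe forwarding can be sketched over an explicit wait-for map; this models one blocked process per edge and abstracts the message transport, so it illustrates the probe rule rather than the full protocol.

```python
def chandy_misra_haas(wait_for, initiator):
    """Probe computation: wait_for maps each blocked process to the
    process it waits on. Probes (initiator, sender, receiver) travel
    along the edges; deadlock iff a probe returns to the initiator."""
    probes = [(initiator, initiator, wait_for.get(initiator))]
    seen = set()
    while probes:
        init, s, e = probes.pop()
        if e is None or (s, e) in seen:
            continue                       # not blocked, or edge already probed
        seen.add((s, e))
        if e == init:
            return True                    # probe reached the waiting initiator
        if e in wait_for:                  # receiver is itself waiting: forward
            probes.append((init, e, wait_for[e]))
    return False
```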
97Distributed Systems
Example of Distributed Deadlock Detection in DS
(Figure: a waiting chain P0 → P1 → P2 → P3 spanning several nodes, with further processes P4 through P8; the probes (0,1,2), (0,4,6), (0,5,7), and (0,8,0) travel along the edges of the waiting graph.)
98Distributed Systems
Deadlock Detection in DS: Distributed Control
Recommended Reading:
- Knapp, E.: "Deadlock Detection in Distributed Databases", ACM Computing Surveys, 1987
- Sinha, P.: "Distributed Operating Systems: Concepts and Design", IEEE Computer Society, 1996
- Galli, D.: "Distributed Operating Systems: Concepts and Practice", Prentice Hall, 2000
99Distributed Systems
Deadlocks in Message Communication
1. Deadlocks may occur if each member of a specific group is waiting for a message from another member of the same group.
2. Deadlocks may occur due to the unavailability of message buffers etc.
Study for yourself: Read Stallings, Chapter 14.4, p. 615 ff.
100Distributed Systems
Multicast Paradigm
(Figure: processes P exchange multicast messages a, b, c, d within a group.)
- Ordering (unordered, FIFO, causal, agreed)
- Delivery guarantees (unreliable, reliable, safe/stable)
- Open groups versus closed groups
- Failure model (omission, fail-stop, crash-recovery, network partitions)
101Distributed Systems
Traditional Protocols for Multicast
- Example: TCP/IP, a point-to-point interconnection
  - Automatic flow control
  - Reliable delivery
  - Connection service
  - Complexity O(n²)
  - Linear degradation in performance
Remark: More on Linux multicast, see www.cs.washington.edu/esler/multicast/
102Distributed Systems
Traditional Protocols for Multicast
- Example: Unreliable broadcast/multicast (UDP, IP multicast)
  - Employs hardware support for broadcast and multicast
  - Message losses: 0.01% at normal load, more than 30% at high load
  - Buffer overflows (in the network and in the OS)
  - Interrupt misses
  - No connection service
103Distributed Systems
IP-Multicast
- Multicast extension to IP
- Best-effort multicast service
- No accurate membership
- Class D addresses are reserved for multicast
  - 224.0.0.0 to 239.255.255.255 are used as group addresses
- The standard defines how hardware Ethernet multicast addresses can be used where available
104Distributed Systems
IP-Multicast Logical Design
105Distributed Systems
IP Multicast
- Extensions to IP inside a host
  - A host may send an IP multicast by using a multicast address as the destination address
  - A host manages a table of groups and of the local application processes that belong to each group
  - When a multicast message arrives at the host, it delivers copies of it to all of the local processes that belong to that group
  - A host acts as a member of a group only if it has at least one active process that joined that group
106Distributed Systems
IP Multicast Group Management
- Extensions to IP within one subnet (IGMP)
  - A multicast router periodically sends queries to all hosts participating in IP multicast on the special 224.0.0.1 all-hosts group
  - Each relevant host sets a random timer for each group it is a member of; when the timer expires, it sends a report message on that group's multicast address
  - Each host that sees a report message for a group cancels its local timer for that group
  - When a host joins a group, it announces that on the group's multicast address
Remark: We have to skip further interesting topics like backbones, multicast routing, and reliable multicast services (see other specialized lectures).