Title: Distributed Systems
Distributed Systems: Clocks, Concurrency, Deadlocks
Intended Schedule for this Lecture
- Times in DS
  - How to adjust your clock?
- Concurrency
  - Centralized Algorithm
  - Token-based Algorithm
  - Voting Algorithm (Ricart-Agrawala)
- Election Algorithms (Bully, Ring)
  - Determining a specific node
- Transactions
  - Principles (see course System Architecture)
  - Distributed Transactions
  - See Jean Bacon, Concurrent Systems
- Deadlock
  - Centralized Detection
  - Path Pushing
  - Distributed Detection of Cycles (Chandy-Misra-Haas)
Election Algorithms

Suppose your centralized lock manager crashes for a longer period of time. Then you need a new one, i.e. you have to elect one. How to do that in a DS?

- The two major election algorithms are based upon:
  - each node has a unique node number (i.e. there is a total ordering of all nodes)
  - the node with the highest number of all active nodes is the coordinator
  - after a crash, a restarting node is put back into the set of active nodes (it may become the new coordinator)
Bully Algorithm (Garcia-Molina, 1982)

Goal: Find the active node with the highest number, tell it to be the coordinator, and tell all other nodes, too.

Start: The algorithm may start at any node at any time. A node recognizes that the previous coordinator is no longer active.

- Message types
  - Election messages e, initiating the election
  - Answer messages a, confirming the reception of an e-message
  - Coordinator messages c, broadcasting the new coordinator
Steps of Bully Algorithm

1. Some node Ni sends e-messages to all other nodes Nj, j > i.
2. If there is no answer within a time limit T, Ni elects itself as coordinator, sending this information via c-messages to all others Nj, j < i.
3. If Ni got an a-message within T (i.e. there is an active node with a higher number), it awaits another time limit T. It then restarts the whole algorithm if there is no c-message within that time limit T.
4. If Nj receives an e-message from Ni, it answers with an a-message to Ni and starts the algorithm for itself (step 1).
5. If a node N, after having crashed and been restarted, is active again, it starts with step 1.
6. The node with the highest number establishes itself as coordinator.
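The steps above can be condensed into a small, hedged sketch: an in-memory simulation in which the `active` set and a synchronous recursion stand in for the real timeouts and e/a/c-messages (node ids and the example set are illustrative only).

```python
# Hedged sketch of the Bully algorithm as a synchronous simulation.
# In a real system, "no higher node answered" is a timeout T; here it
# is simply an empty list.

def bully_elect(initiator, active):
    """Return the id of the elected coordinator.

    initiator: id of the node starting the election (step 1)
    active:    set of ids of currently active nodes
    """
    higher = [n for n in active if n > initiator]
    if not higher:
        # No a-message arrived: the initiator elects itself and would
        # now broadcast c-messages to all lower-numbered nodes (step 2).
        return initiator
    # Some higher-numbered node answered with an a-message; each such
    # node restarts the election for itself (step 4). All restarts end
    # at the highest active node (step 6).
    return bully_elect(min(higher), active)

active = {0, 1, 2, 4, 5}          # nodes 3, 6, 7 have crashed
print(bully_elect(1, active))     # -> 5
```

The recursion mirrors step 4: every node that answers an e-message immediately becomes an initiator itself, so the election always terminates at the highest-numbered active node.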
Example: Bully Algorithm

Nodes 3 and 4 have to start the algorithm due to their higher numbers.
Ring Algorithm (Le Lann, 1977)

- Each node is part of one logical ring.
- Each node knows that logical ring, i.e. its immediate successor as well as all other successors.
- Two types of messages are used:
  - election-message e to elect the new coordinator
  - coordinator-message c to introduce the coordinator to the nodes
- The algorithm is initiated by some node Ni detecting that the coordinator no longer works.
- This initiating node Ni sends an e-message with its node number i to its immediate successor N(i+1).
- If this immediate successor N(i+1) does not answer, it is assumed that N(i+1) has crashed, and the e-message is sent to N(i+2).
Ring Algorithm (Le Lann, 1977)

- If node Ni receives an e- or c-message, it contains a list of node numbers.
- If an e-message does not contain its node number i, Ni adds its node number and sends this e-message to N(i+1).
- If an e-message contains its node number i, this e-message has circled once around the ring of all active nodes.
- If it is a c-message, Ni keeps in mind the node with the highest number in that list as being the new coordinator.
- If a c-message has circled once around the logical ring, it is deleted.
- After a crashed node has been restarted, it can use an inquiry-message circling once around the logical ring.
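The circulation of the e-message can be sketched as follows (a hedged simulation: the ring order, the active set, and the skipping of crashed successors are modeled directly; real timeouts are omitted):

```python
# Hedged sketch of Le Lann's ring election. `ring` lists node ids in
# ring order; crashed successors are skipped, as on the slide.

def ring_elect(ring, active, initiator):
    """Circulate an e-message; return (coordinator, collected ids)."""
    collected = []
    pos = ring.index(initiator)
    node = initiator
    while True:
        if node in active:
            if node in collected:        # e-message contains own number:
                break                    # it has circled once around
            collected.append(node)       # node appends its own number
        pos = (pos + 1) % len(ring)      # forward to the next successor
        node = ring[pos]
    # A c-message naming the highest collected number would now circle
    # the ring once and then be deleted.
    return max(collected), collected

ring = [2, 5, 0, 4, 1, 3]
coord, seen = ring_elect(ring, active={0, 2, 3, 5}, initiator=2)
print(coord)   # -> 5
```

Note that, unlike the Bully algorithm, every active node contributes its number exactly once, and the maximum is taken only after the full circulation.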
Ring Algorithm (Le Lann, 1977): Example

[Figure slides: an example run of the ring election algorithm.]

This coordinator-message circles once around the logical ring.
Comparison of both Election Algorithms
Transactions

Introduction, Example, Elements of a Transaction, Architecture of a Transaction System, Serializability, Two-Phase-Locking Protocol, Recovery
Transactions

Notion: A transaction is a sequence of operations performing a single logically composite function on a shared database.

Remark: The notion transaction derives from traditional business deals:
- You can negotiate changes until you sign on the bottom line.
- And your peer is also stuck.
Some Examples
- Reserve a seat in an airplane to NY
- Transfer money from your account to mine
- Withdraw money from an automatic teller machine
- Buy a book from amazon.com
- Apply a change to a name server
Distributed Transactions
Assumptions

- Homogeneous system; each node has a local transaction manager TM.
- Each node manages its own data (no replicas).
- Each transaction sends its operations to its local transaction manager TM.
- If the data is not local, the local TM sends the request to the corresponding remote TM.
- On a commit and on an abort, the TM has to notify all nodes being affected by the transaction.
Potential Failures

- Node Failures
  - If a node crashes, we assume that the node stops immediately, i.e. it does not perform any operations anymore.
  - The content of volatile memory is lost, and the node has to restart.
  - A node is either active (i.e. working correctly) or inactive (i.e. does not respond anymore).
Potential Failures

- Network Failures
  - Broken connection
  - Faulty communication software
  - Crashed intermediate node (bridge, gateway, etc.)
  - Loss of a message
  - Altered message
  - Partitioning of the network
Managing Failures

- Many failures are handled on lower layers of the communication software.
- However, a few of them have to be handled on layer 7, within the transaction manager.
- The origin of failures on other nodes cannot be detected.
- We have to rely on timeouts, i.e. we can only conclude that there might be a failure.
Coordination of Distributed Transactions

- Central scheduler, i.e. one node is the only scheduler
- Decentralized coordination
Centralized Scheduler

[Figure: transactions T1,1 ... T1,t1 and T2,1 ... T2,t2 run at transaction managers TM1 and TM2, which communicate over the network with the single scheduler S and with resource managers RM1 and RM2.]
Centralized Scheduler

- Analysis
  - We can use the two-phase locking protocol; S has a global view on all locks within the DS.
  - Single point of failure.
  - The scheduler may become a bottleneck (bad for scalability).
  - Nodes are no longer really autonomous.
  - Even purely local transactions have to be sent to the central scheduler. This is the most inconvenient of all the drawbacks.
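The two-phase locking rule mentioned above can be sketched briefly (a hedged, minimal sketch: the class, its names, and the example items are illustrative, and the lock-conflict handling a real scheduler S would also do is omitted):

```python
# Hedged sketch of the two-phase rule: a transaction may acquire
# locks only while it has released none (growing phase); after its
# first release it may only release (shrinking phase).

class TwoPhaseLockError(Exception):
    pass

class Transaction:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False        # True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockError(f"{self.name}: lock after unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True         # growing phase is over
        self.locks.discard(item)

t = Transaction("T1")
t.lock("x"); t.lock("y")              # growing phase
t.unlock("x")                         # shrinking phase begins
try:
    t.lock("z")                       # violates the two-phase rule
except TwoPhaseLockError as e:
    print(e)                          # -> T1: lock after unlock
```

With a global view on all locks, the central scheduler S can enforce exactly this rule for every transaction in the DS.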
Deadlocks in Distributed Systems

- Prevention (sometimes)
- Avoidance (far too complicated and time-consuming)
- Ignoring (often done in practice)
- Detecting (sometimes really needed)
Deadlocks in Distributed Systems

- In DS a distinction is made between:
  - Resource deadlock: processes are stuck waiting for resources held by each other.
  - Communication deadlock: processes are stuck waiting for messages from each other while no messages are in transit.
Distributed Deadlocks

- Using locks within transactions may lead to deadlocks.

A deadlock has occurred if the global waiting graph contains a cycle.
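Given the global waiting graph, the cycle test itself is a standard depth-first search. A hedged sketch (the graph encoding and transaction names are illustrative):

```python
# Hedged sketch: deadlock = cycle in the waiting graph. `waits_for`
# maps each transaction to the transactions it waits for; a DFS with
# a recursion stack detects a back edge, i.e. a cycle.

def has_cycle(waits_for):
    visited, on_stack = set(), set()

    def dfs(t):
        visited.add(t)
        on_stack.add(t)
        for u in waits_for.get(t, ()):
            if u in on_stack:                 # back edge -> cycle
                return True
            if u not in visited and dfs(u):
                return True
        on_stack.discard(t)
        return False

    return any(dfs(t) for t in list(waits_for) if t not in visited)

g = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}   # T1 -> T2 -> T3 -> T1
print(has_cycle(g))                               # -> True
print(has_cycle({"T1": ["T2"], "T2": []}))        # -> False
```

The hard part in a DS is not this test but assembling a sufficiently up-to-date global graph, which is exactly what the detection schemes on the following slides address.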
Deadlock Prevention in Distributed Systems

1. Only allow single resource holding (=> no cycles)
2. Preallocation of resources (=> low resource efficiency)
3. Forced release to request
4. Acquire in order (quite a cumbersome task to number all resources in a DS)
5. Seniority rules: each application gets a timestamp. If a senior application requests a resource being held by a junior, the senior wins.
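Rule 5 can be sketched in a few lines (a hedged sketch in the spirit of timestamp-based schemes such as wound-wait; the decision strings and timestamps are illustrative, and the rollback of the loser is not shown):

```python
# Hedged sketch of the seniority rule: a smaller timestamp means an
# older, i.e. more senior, application.

def on_request(requester_ts, holder_ts):
    """Decide what happens when a requester asks for a resource
    currently held by another application."""
    if requester_ts < holder_ts:
        return "preempt holder"   # senior wins: the junior holder
                                  # is rolled back and releases
    return "requester waits"      # a junior simply waits its turn

print(on_request(3, 7))   # senior requests from junior -> 'preempt holder'
print(on_request(9, 2))   # junior requests from senior -> 'requester waits'
```

Because every wait is always from a younger toward an older application, the waiting graph can never contain a cycle, which is why this counts as prevention rather than detection.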
Deadlock Avoidance in Distributed Systems

Deadlock avoidance in DS is impractical because:
1. Every node must keep track of the global state of the DS => substantial storage and communication overhead.
2. Checking whether the global state is safe must be mutually exclusive.
3. Checking for safe states requires substantial processing and communication overhead if there are many processes and resources.
Deadlock Detection in Distributed Systems

Increased problem: if there is a deadlock, in general resources from different nodes are involved.

Several approaches:
1. Centralized control
2. Hierarchical control
3. Distributed control

In any case: deadlocks must be detected within a finite amount of time.
Deadlock Detection in Distributed Systems

- Correctness in a waiting-graph approach depends on:
  - progress
  - safety
Deadlock Detection in Distributed Systems

- General remarks
  - Deadlocks must be detected within a finite amount of time.
  - Message delay and out-of-date data may cause false cycles to be detected (phantom deadlocks).
  - After a possible deadlock has been detected, one may need to double-check that it is a real one!
Deadlock Detection in DS: Centralized Control

- Local and global deadlock detectors (LDD and GDD)
  - If an LDD detects a local deadlock, it resolves it locally!
- The GDD gets status information from the LDDs
  - on waiting-graph updates
  - periodically
  - on each request
- If the GDD detects a deadlock involving resources at two or more nodes, it resolves this deadlock globally!
Deadlock Detection in DS: Centralized Control

- Major drawbacks
  - The node hosting the GDD is a single point of failure.
  - Phantom deadlocks may arise because the global waiting graph is not up to date.
Deadlock Detection in DS: Hierarchical Control

- Hierarchy of deadlock detectors (controllers)
- Waiting graphs (union of the waiting graphs of its children)
- Deadlocks are resolved at the lowest level possible
Deadlock Detection in DS: Hierarchical Control

Each node in the tree (except a leaf node) keeps track of the resource allocation information of itself and of all its successors =>
A deadlock that involves a set of resources will be detected by the node that is the common ancestor of all nodes whose resources are among the objects in conflict.
Distributed Deadlock Detection in DS (Obermark, 1982)

- No global waiting-graph
- Deadlock detection cycle:
  - wait for information from other nodes
  - combine it with local waiting-information
  - break cycles, if detected
  - share information on potential global cycles

Remark: The non-local portion of the global waiting-graph is represented by an abstract node "ex".
Distributed Deadlock Detection in DS (Obermark, 1982)

Situation on node x: already a deadlock??? No local deadlock. [Figure: waiting graph on node x including the abstract node "ex".]
Distributed Deadlock Detection in DS (Chandy/Misra/Haas, 1983)

- A probe message <i, j, k> is sent whenever a process blocks.
- This probe message is sent along the edges of the waiting-graph if the recipient is waiting for a resource.
- If this probe message is sent back to the initiating process, then there is a deadlock.
Distributed Deadlock Detection in DS (Chandy/Misra/Haas)

- If a process P has to wait for a resource R, it sends a message to the owner O of that resource.
- This message contains:
  - the PID of the waiting process P
  - the PID of the sending process S
  - the PID of the receiving process E
- The receiving process E checks if E is also waiting. If so, it modifies the message:
  - The first component of the message still holds.
  - The second component is changed to PID(E).
  - The third component is changed to the PID of the process that E is waiting for.
- If the message ever reaches the waiting process P, then there is a deadlock.
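This edge-chasing can be sketched compactly (a hedged sketch: for simplicity each process waits for at most one other process, so `waits_for` maps a blocked process directly to the holder it waits for; the probe triples live only in the comments):

```python
# Hedged sketch of Chandy/Misra/Haas edge-chasing. The probe
# <initiator, sender, receiver> from the slide is forwarded along the
# waiting edges; it travels between nodes in the real algorithm.

def cmh_deadlocked(waits_for, initiator):
    """Launch a probe <initiator, initiator, holder>; return True iff
    the probe ever reaches `initiator` again (i.e. deadlock)."""
    seen = set()
    receiver = waits_for.get(initiator)     # third probe component
    while receiver is not None:
        if receiver == initiator:           # probe returned: deadlock
            return True
        if receiver in seen:                # probe already handled here
            return False
        seen.add(receiver)
        # receiver is itself blocked: it forwards the modified probe
        # <initiator, receiver, waits_for[receiver]>
        receiver = waits_for.get(receiver)
    return False                            # reached a running process

w = {"P0": "P1", "P1": "P2", "P2": "P0"}    # cycle across nodes
print(cmh_deadlocked(w, "P0"))              # -> True
print(cmh_deadlocked({"P0": "P1"}, "P0"))   # -> False
```

Only local waiting information and the small probe messages are needed; no node ever builds the global waiting graph.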
Example of Distributed Deadlock Detection in DS

[Figure: probe messages (0, 8, 0), (0, 4, 6), (0, 1, 2), (0, 5, 7) between processes P0 ... P8, distributed over several nodes.]
Deadlock Detection in DS: Distributed Control

Recommended Reading:
- Knapp, E.: Deadlock Detection in Distributed Databases, ACM Computing Surveys, 1987
- Sinha, P.: Distributed Operating Systems: Concepts and Design, IEEE Computer Society, 1996
- Galli, D.: Distributed Operating Systems: Concepts and Practice, Prentice Hall, 2000
Deadlocks in Message Communication

1. Deadlocks may occur if each member of a specific group is waiting for a message from another member of the same group.
2. Deadlocks may occur due to unavailability of message buffers, etc.

Study for yourself: read Stallings, Chapter 14.4, p. 615 ff.
Multicast Paradigm

[Figure: processes P exchanging multicast messages a, b, c, d.]

- Ordering (unordered, FIFO, causal, agreed)
- Delivery guarantees (unreliable, reliable, safe/stable)
- Open groups versus closed groups
- Failure model (omission, fail-stop, crash-recovery, network partitions)
Traditional Protocols for Multicast

- Example: TCP/IP, a point-to-point interconnection
  - Automatic flow control
  - Reliable delivery
  - Connection service
  - Complexity O(n²)
  - Linear degradation in performance

Remark: More on Linux multicast, see www.cs.washington.edu/esler/multicast/
Traditional Protocols for Multicast

- Example: Unreliable broadcast/multicast (UDP, IP multicast)
  - Employs hardware support for broadcast and multicast
  - Message losses of about 0.01% at normal load, more than 30% at high load
  - Buffer overflows (in the network and in the OS)
  - Interrupt misses
  - No connection service
IP Multicast

- Multicast extension to IP
- Best-effort multicast service
- No accurate membership
- Class D addresses are reserved for multicast
  - 224.0.0.0 to 239.255.255.255 are used as group addresses
- The standard defines how hardware Ethernet multicast addresses can be used where available.
IP Multicast: Logical Design
IP Multicast

- Extensions to IP inside a host
  - A host may send an IP multicast by using a multicast address as the destination address.
  - A host manages a table of groups and of the local application processes that belong to each group.
  - When a multicast message arrives at the host, it delivers copies of it to all of the local processes that belong to that group.
  - A host acts as a member of a group only if it has at least one active process that joined that group.
IP Multicast: Group Management

- Extensions to IP within one sub-net (IGMP)
  - A multicast router periodically sends queries to all hosts participating in IP multicast on the special 224.0.0.1 all-hosts group.
  - Each relevant host sets a random timer for each group it is a member of. When the timer expires, it sends a report message on that group's multicast address.
  - Each host that gets a report message for a group cancels its local timer for that group.
  - When a host joins a group, it announces that on the group multicast address.
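The random-timer rule above implements report suppression: per query, the router needs only one report per group. A hedged simulation sketch (host names, the timer range, and the seeding are illustrative only):

```python
# Hedged sketch of IGMP report suppression: on a query, every member
# host picks a random delay; the first timer to expire triggers a
# report on the group address, and all other hosts, seeing it, cancel
# their own timers.

import random

def answer_query(hosts, seed=None):
    """hosts: names of the hosts that are members of one group.
    Returns the single host that reports for this group."""
    rng = random.Random(seed)
    timers = {h: rng.uniform(0, 10) for h in hosts}   # random delays
    reporter = min(timers, key=timers.get)            # fires first
    # All other hosts receive the multicast report and cancel their
    # timers, so the router gets exactly one report per group.
    return reporter

print(answer_query(["hostA", "hostB", "hostC"], seed=1))
```

The point of the randomization is scalability: the reporting load is spread over the members, and the number of reports per query is independent of the group size.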
Remark: We have to skip further interesting topics like backbones, multicast routing, and reliable multicast services (see other specialized lectures).