Title: Peer-to-Peer and GRID Computing, 2G1526 Lecture 03
1 Peer-to-Peer and GRID Computing, 2G1526, Lecture 03 04
- Seif Haridi
- Seif_at_imit.kth.se
2 Formal Models for Message Passing Systems
- Asynchronous Systems
- Synchronous Systems
3 Formal Models for Message Passing Systems
- Synchronous and asynchronous message passing systems
- No failures
- Basic complexity measures
- Pseudocode conventions for describing message passing algorithms
- Processors
- Communication channels
- Bidirectional between two processors
- Topology
- Pattern of connection
- Undirected graph where each node is a processor and each edge is a communication channel
- Algorithm for a message passing system
- Local program on each processor
- Each processor performs local computation, and sends and receives messages to/from each of its neighbors
5 System (formal): algorithm
- An algorithm/system
- n processors p_0, ..., p_{n-1}; i is the index of processor p_i
- Each p_i is modeled as a (possibly infinite) state machine with state set Q_i
- A subset I_i of Q_i contains all initial states
- The edges incident on p_i are labeled with integers 1, ..., r, where r is the degree of node p_i
- Each state of p_i contains 2r special components, outbuf_i[l] and inbuf_i[l], for every l, 1 ≤ l ≤ r
6 System (formal): algorithm
- An algorithm/system (continued)
- outbuf_i[l] holds messages that p_i has sent to its neighbor over its l-th channel but that have not yet been delivered to the neighbor
- inbuf_i[l] holds messages that have been delivered to p_i on its l-th channel but have not yet been processed by an internal computation step
- In an initial state every inbuf_i[l] is empty; outbuf_i[l] need not be
7 System (formal): algorithm
- Accessible state of p_i
- Internal variables/registers
- The inbuf_i[.] components (but not the outbuf_i[.] components)
- p_i's transition function (p_i's computation step)
- Takes as input the accessible state of p_i
- Produces a value for the accessible state of p_i in which all inbuf_i[.] components are empty
- Produces at most one message for each outbuf_i[.]
8 System (formal): algorithm
- Messages previously sent by p_i cannot influence p_i's current computation step
- Each step processes all messages that have been delivered to p_i but not yet processed (those in its inbuf components)
- Results in a state change and at most one message to be sent to each neighbor
- A configuration
- Describes a state of the whole system
- Is a vector C = (q_0, ..., q_{n-1})
- q_i is a state of p_i
- The states (values) of the outbuf variables represent messages in transit on the communication channels
- An initial configuration is C = (q_0, ..., q_{n-1}) where each q_i ∈ I_i is an initial state of p_i
- Events are actions that take place in a distributed system/algorithm
- Computation event
- comp(i): a computation step of p_i, in which p_i's transition function is applied to its current accessible state
- Delivery event
- del(i, j, m): the delivery of message m from p_i to p_j
- The behavior of a system is modeled as an execution
- Execution
- A sequence of alternating configurations and events
- C_0, φ_1, C_1, φ_2, C_2, φ_3, ... (possibly infinite)
- C_k is a configuration
- φ_k is an event
- The sequence must satisfy a variety of conditions (called safety and liveness conditions)
12 Executions (continued)
- An execution
- A sequence that satisfies all required safety conditions for the particular system type under study
- An admissible execution
- A sequence that, in addition, satisfies all required liveness conditions
- System types
- Asynchronous message passing
- Synchronous message passing
13 Asynchronous Systems
- An asynchronous system
- No fixed bound on how long it takes for a message to be delivered
- No fixed bound on how much time elapses between two consecutive steps of a processor
- Example (Internet)
- An email message can take days to arrive, but normally it takes a few seconds
- In real systems there are upper bounds on message delays and processor step times, but they are sometimes very large and change over time
14 Asynchronous Systems: Execution Segments
- Execution segment α
- C_0, φ_1, C_1, φ_2, C_2, φ_3, ... (possibly infinite)
- C_k is a configuration
- φ_k is an event
- If α is finite it must end in a configuration
- An execution is an execution segment where C_0 is an initial configuration
15 Asynchronous Systems: Execution Segments
- Execution segment α
- C_0, φ_1, C_1, φ_2, C_2, φ_3, ... (possibly infinite)
- If φ_k = del(i, j, m)
- m must be in outbuf_i[l] in C_{k-1}
- l is p_i's label for the channel {p_i, p_j}
- Changes from C_{k-1} to C_k
- m is removed from outbuf_i[l]
- m is added to inbuf_j[h]
- h is p_j's label for the channel {p_i, p_j}
16 Asynchronous Systems: del(i, j, m)
- Example: del(3, 0, m)
- A message m from p_3 to p_0
- m is in outbuf_3[1]
- m is removed from outbuf_3[1] and placed in inbuf_0[h], where h is p_0's label for the channel {p_3, p_0}
17 Asynchronous Systems: Execution Segments
- Execution segment α
- C_0, φ_1, C_1, φ_2, C_2, φ_3, ... (possibly infinite)
- If φ_k = comp(i)
- Changes from C_{k-1} to C_k
- p_i changes state according to its transition function applied to its accessible state in C_{k-1}
- The inbuf_i[.] variables are emptied
- The set of output messages (specified by the transition function) is added to the outbuf_i[.] variables
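The effect of the two event types on a configuration can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the lecture: the `Processor` class, the buffers keyed by neighbor index (instead of by channel label), and the signature of the `transition` callback are all hypothetical choices made for the example.

```python
from collections import defaultdict

class Processor:
    """State of one processor p_i: local variables plus per-channel buffers."""
    def __init__(self, index, neighbors, transition, local=None):
        self.index = index
        self.neighbors = list(neighbors)
        # Hypothetical signature: transition(local, received) -> (local, {j: msg})
        self.transition = transition
        self.local = local                   # internal variables/registers
        self.inbuf = defaultdict(list)       # inbuf[j]: delivered, not yet processed
        self.outbuf = defaultdict(list)      # outbuf[j]: sent, not yet delivered

def deliver(procs, i, j, m):
    """del(i, j, m): move m from p_i's outbuf for p_j to p_j's inbuf for p_i."""
    procs[i].outbuf[j].remove(m)             # raises ValueError if m is not in transit
    procs[j].inbuf[i].append(m)

def compute(procs, i):
    """comp(i): apply the transition function to the accessible state of p_i."""
    p = procs[i]
    received = {j: msgs for j, msgs in p.inbuf.items() if msgs}
    p.inbuf = defaultdict(list)              # all inbuf components are emptied
    p.local, to_send = p.transition(p.local, received)
    for j, msg in to_send.items():           # at most one message per neighbor
        p.outbuf[j].append(msg)

# Usage: p_0 sends "hello" to p_1 in its first step; p_1 records what it receives.
def sender(local, received):
    return ("done", {} if local == "done" else {1: "hello"})

def receiver(local, received):
    return (local + [m for msgs in received.values() for m in msgs], {})

procs = {0: Processor(0, [1], sender), 1: Processor(1, [0], receiver, local=[])}
compute(procs, 0)                # comp(0)
deliver(procs, 0, 1, "hello")    # del(0, 1, "hello")
compute(procs, 1)                # comp(1)
```

The three calls at the end form the schedule comp(0), del(0, 1, m), comp(1); a different order of events would give a different execution from the same initial configuration.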
18 Asynchronous Systems (AS): Executions
- Execution segment α
- C_0, φ_1, C_1, φ_2, C_2, φ_3, ... (possibly infinite)
- In AS there are multiple executions, depending on
- The choice of φ_k at C_{k-1}
- A unique execution is determined by the choice of the sequence
- φ_1, φ_2, φ_3, ...
- This sequence is called a schedule
19 Asynchronous Systems (AS): Schedules
- Execution segment α
- C_0, φ_1, C_1, φ_2, C_2, φ_3, ... (possibly infinite)
- An execution is uniquely determined by the initial configuration C_0 and a schedule σ
- Denoted by exec(C_0, σ)
20 Asynchronous Systems (AS): Admissible Executions
- Execution segment α
- C_0, φ_1, C_1, φ_2, C_2, φ_3, ... (possibly infinite)
- In AS an execution is admissible if
- Each processor has an infinite number of computation events
- Every message sent is eventually delivered
- A schedule is admissible if it is the schedule of an admissible execution
21 Asynchronous Systems (AS): Admissible Executions
- Remark: the requirement
- Each processor has an infinite number of computation events
- Models that processors do not fail
- Processor termination is modeled by
- Having the transition function not change the processor's state after a certain point in the execution
- Performing dummy steps from then on
22 Asynchronous Systems (AS): Complexity Measures
- The number of messages
- The amount of time
- We are looking at worst-case performance
- We need a notion of termination of a system/algorithm
- The system has terminated if
- All processors are in terminated states
- No messages are in transit
23 Asynchronous Systems (AS): Message Complexity
- The message complexity of an algorithm A in AS is the maximum, over all admissible executions of A, of the total number of messages sent
24 Asynchronous Systems (AS): Time Complexity
- The time an AS algorithm takes is less obvious
- We make idealized assumptions with the following intuition
- The message delay in any execution is one unit of time
- Independent computation events at different processors happen simultaneously
- Calculate the time until termination
25 Asynchronous Systems (AS): Timed Execution
- Each event has an associated nonnegative integer
- Models the time at which the event occurs
- A comp(i) event occurs at p_i
- A del(i, j, m) event occurs at p_i and p_j
- The times start at 0 and are nondecreasing overall, but strictly increasing for each individual processor
- Several events can happen at the same time if they occur at different processors
26 Asynchronous Systems (AS): Timed Execution
- The message delay of m is the time m waits in the sender's outbuf plus the time m waits in the recipient's inbuf
- Time complexity in AS is the maximum time until termination, over all admissible timed executions in which the message delay is one
27 Asynchronous Systems (AS): Algorithm Descriptions
- Algorithms will be described in an event-driven fashion
- The effect of each message is described individually
- upon receiving ⟨M⟩: ⟨some action⟩
- Processors can be triggered by other events
- upon event ⟨a⟩: ⟨some action⟩
28 Synchronous Systems
- In a synchronous system
- Processors execute in lockstep
- Execution is partitioned into rounds
- In each round
- Each processor can send a message to each neighbor
- Messages are delivered
- Each processor computes based on the received messages
- This means that message delivery delays are predictable and have an upper bound
- This model is simpler for constructing distributed algorithms
29 Synchronous Systems: Execution Segments
- Execution segment α
- C_0, φ_1, C_1, φ_2, C_2, φ_3, ... (possibly infinite)
- C_k is a configuration
- φ_k is an event
- The execution sequence is constrained
- Partitioned into disjoint rounds
- A round consists of a delivery event for every message in an outbuf variable
- Followed by one computation step for every processor
30 Synchronous Systems (SS): Admissible Executions
- Execution segment α
- C_0, φ_1, C_1, φ_2, C_2, φ_3, ... (possibly infinite)
- In SS an execution is admissible if it is infinite
- Implies that every message sent is eventually delivered
- There is only a single execution for any initial configuration
- This is in contrast to asynchronous systems (multiple executions for a given initial configuration)
31 Spanning Tree Algorithms
32 Broadcast and Convergecast on a Spanning Tree
- What is a spanning tree?
- Broadcast
- Convergecast
33 Background: Graphs and Spanning Trees
- An undirected graph G is a pair (V, E)
- V is the node set of G
- E is a collection of unordered pairs from V
- An element of E is {v, u} with u, v ∈ V
- The edge {v, u} is incident on u (and v)
- The degree of a node is the number of its neighbors
- A path of length k between v_0 and v_k is a sequence ⟨v_0, ..., v_k⟩ such that for each i < k, {v_i, v_{i+1}} is an edge in E
34 Background: Graphs and Spanning Trees
- The distance d(u, v) between u, v ∈ V is the length of the shortest path between u and v
- The diameter of a graph is the largest distance between any two nodes
- An undirected graph is connected if there is a path between every pair of nodes
35 Background: Graphs and Spanning Trees
- A cycle is a path ⟨v_0, ..., v_k⟩ in which v_0 = v_k
- A cycle is simple if the nodes v_1 through v_k are all different
- An undirected graph is acyclic if it contains no simple cycle of length three or more
36 Background: Graphs and Spanning Trees
[Figure: a graph with the simple cycle ⟨a, b, a⟩; an acyclic graph]
37 Background: Graphs and Spanning Trees
[Figure: a graph that is undirected and cyclic]
38 Background: Graphs and Spanning Trees
- G' = (V', E') is a subgraph of G = (V, E) if V' ⊆ V and E' ⊆ E
- G' is a spanning subgraph if V' = V
40 Background: Spanning Trees
- A tree is a graph that contains a minimal number of edges connecting its nodes
- Computations on trees have a low message complexity
- A tree is
- an undirected
- connected
- acyclic graph
- A spanning tree T of a graph G is a spanning subgraph that is a tree
[Figure: a spanning tree]
- The following are equivalent for an undirected graph G with N nodes
- G is a tree
- Between any two nodes there is a unique simple path
- G is connected and |E| = N-1
- G is acyclic and |E| = N-1
- G is acyclic but becomes cyclic if any edge is added
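One of the equivalent characterizations (connected with exactly N-1 edges) gives a simple test for whether a graph is a tree. The sketch below is illustrative only; the function names and the edge-list representation with nodes numbered 0..n-1 are assumptions made for the example.

```python
from collections import deque

def is_connected(n, edges):
    """Breadth-first search from node 0 over an undirected edge list (n >= 1)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n

def is_tree(n, edges):
    """G is a tree iff it is connected and has exactly n-1 edges."""
    return len(edges) == n - 1 and is_connected(n, edges)

path = [(0, 1), (1, 2), (2, 3)]               # a tree on 4 nodes
cycle = path + [(3, 0)]                       # connected, but 4 edges
triangle_plus_isolated = [(0, 1), (1, 2), (2, 0)]   # 3 edges on 4 nodes, not connected
```

The third graph shows why the edge count alone is not enough: it has N-1 edges but is neither connected nor acyclic.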
42 Background: Rooted Trees
- A tree T is rooted if there is a unique distinguished node r called the root
- If u is a node on the path between v and r, u is an ancestor of v, and v is a descendant of u
- If, in addition, u and v are neighbors, then u is the parent of v, and v is a child of u
- The depth of a tree is the length of the longest simple path from r to any node
43 Broadcast and Convergecast on a Spanning Tree
- What is a spanning tree?
- Broadcast
- Convergecast
44 Broadcast on a Spanning Tree
- A spanning tree of the network is given
- A distinguished processor p_r wants to disseminate a message ⟨M⟩ to all processors
- The tree is rooted at p_r
- Each processor has a channel to its parent and a set of channels to its children
45 Broadcast on a Spanning Tree
- p_r sends ⟨M⟩ on all channels leading to its children and terminates
- When a processor receives ⟨M⟩ on the channel from its parent, it sends ⟨M⟩ on all channels leading to its children
46 Spanning Tree Broadcast Algorithm (Pseudo Code)
- Code for p_r
  - Upon receiving no message
    - send ⟨M⟩ to all children
    - terminate
- Code for p_i, 0 ≤ i ≤ n-1, i ≠ r
  - Upon receiving ⟨M⟩ from parent
    - send ⟨M⟩ to all children
    - terminate
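The broadcast pseudocode can be tried out with a small simulation. The following is a sketch under stated assumptions rather than the lecture's own code: the function name `broadcast`, the dict-of-children tree representation, and the random scheduler that stands in for an arbitrary asynchronous schedule are all choices made for the example.

```python
import random

def broadcast(children, root, seed=0):
    """Simulate the broadcast under a randomly chosen asynchronous schedule.

    children maps each processor index to the list of its children in the
    given spanning tree. Returns (processors that hold M, messages sent).
    """
    rng = random.Random(seed)
    # Root's first computation event: send M to all children, then terminate.
    in_transit = [(root, c) for c in children[root]]
    sent = len(in_transit)
    received = {root}
    while in_transit:
        # The scheduler picks any message in transit: del(i, j, M), then comp(j).
        i, j = in_transit.pop(rng.randrange(len(in_transit)))
        received.add(j)
        for c in children[j]:                 # forward M to all children
            in_transit.append((j, c))
            sent += 1
    return received, sent

tree = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}
received, sent = broadcast(tree, root=0)
print(received == set(tree), sent)   # True 5
```

Whatever order the scheduler picks, every processor receives the message and exactly one message crosses each of the 5 tree edges.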
47 Spanning Tree Broadcast Algorithm (State Transition Level)
- The state of each p_i contains the variables
- parent_i: contains either a processor index or nil
- children_i: contains a set of processor indices
- terminated_i: a Boolean, initially false
- Initially the values of the parent and children variables form a spanning tree rooted at p_r, and all outbuf and inbuf variables are empty
48 Spanning Tree Broadcast Algorithm (State Transition Level)
- The result of comp(r) in the initial configuration is that
- ⟨M⟩ is placed in outbuf_r[j] for each j in children_r
- terminated_r is set to true
- The only thing that can happen after that is at least one del(r, j, ⟨M⟩), where p_j is a child of p_r
- comp(i) for i ≠ r is similar to comp(r)
49 Broadcast on a Spanning Tree: Message Complexity
- ⟨M⟩ is sent exactly once on each channel that is an edge of the spanning tree rooted at p_r
- The number of messages is equal to the number of edges in the spanning tree
- Which is n-1
50 Broadcast on a Spanning Tree: Time Complexity
- Think of the timed execution model where the message delay is 1 for every del(i, j, m), and comp(i) takes 0 time for every p_i
- That is, we ignore comp(i) times
51 Broadcast on a Spanning Tree: Time Complexity
- At time 0, ⟨M⟩ is in the outbufs of p_r
52 Broadcast on a Spanning Tree: Time Complexity
- At time 1, ⟨M⟩ is delivered to all children of p_r
- The children perform a computation step, and ⟨M⟩ is now in the outbufs of the children
53 Broadcast on a Spanning Tree: Time Complexity
- At time 2, ⟨M⟩ is delivered to all nodes at distance 2 from p_r
- These nodes perform a computation step, and ⟨M⟩ is now in their outbufs
54 Broadcast on a Spanning Tree: Time Complexity
- In every admissible execution of the broadcast algorithm in AS, every processor at distance t from p_r in the spanning tree receives ⟨M⟩ by time t
55 Broadcast on a Spanning Tree: Time Complexity
- Proof by induction on t
- Base case: at time 1, the processors at distance 1 from p_r receive and process ⟨M⟩
- Induction hypothesis: by time t-1, the processors at distance t-1 have received and processed ⟨M⟩
- Induction step: since the message delay is one, the processors at distance t receive ⟨M⟩ by time t
56 Broadcast on a Spanning Tree: Time Complexity
- The time complexity is d, where d is the depth of the spanning tree rooted at p_r
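The timing argument can be checked on a concrete tree. The sketch below assumes the idealized timed execution used on these slides (every message delay is exactly 1, computation steps take 0 time); the function name and the example tree are hypothetical.

```python
def receive_times(children, root):
    """Time at which each processor receives M when every message delay
    is exactly 1 and computation steps take 0 time."""
    time = {root: 0}
    frontier = [root]
    while frontier:
        next_frontier = []
        for p in frontier:
            for c in children[p]:
                time[c] = time[p] + 1     # one unit of delay per tree edge
                next_frontier.append(c)
        frontier = next_frontier
    return time

tree = {0: [1, 2], 1: [3], 2: [], 3: [4], 4: []}
times = receive_times(tree, 0)
depth = max(times.values())
print(times, depth)   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3} 3
```

Each processor's receive time equals its distance from the root, so the last delivery happens at time d, the depth of the tree.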
57 Convergecast on a Spanning Tree
- Collecting information from the nodes of the tree at the root
- We consider an instance where the maximum of n variables is forwarded to the root
- x_i is stored at p_i
- The algorithm is initiated by the leaves
58 Convergecast on a Spanning Tree: Algorithm
- If a node p_i is a leaf, it sends its value x_i to its parent
- A non-leaf node p_j with k children waits to receive messages containing v_{j1}, ..., v_{jk} from its children p_{j1}, ..., p_{jk}
- p_j computes v_j = max(x_j, v_{j1}, ..., v_{jk}) and sends v_j to its parent
[Figure: convergecast example; node p_4 is labeled with x_2, x_4]
59 Convergecast on a Spanning Tree: Algorithm
- There is an asynchronous convergecast algorithm with message complexity n-1 and time complexity d, when a rooted spanning tree with depth d is known
- Broadcast and convergecast can be combined: the broadcast carries a request to perform a convergecast, and when a leaf receives the request it starts the convergecast
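The convergecast of a maximum can be sketched as follows. This is an illustrative simulation, not the lecture's code: the function name `convergecast_max`, the `pending` counters, and the example values are assumptions made for the example.

```python
def convergecast_max(children, root, x):
    """Return (maximum known at the root, messages sent) for a
    leaves-to-root convergecast on the given rooted tree."""
    parent = {c: p for p, cs in children.items() for c in cs}
    pending = {p: len(cs) for p, cs in children.items()}   # reports still awaited
    value = dict(x)                                         # v_j starts as x_j
    ready = [p for p, k in pending.items() if k == 0]       # the leaves start
    messages = 0
    while ready:
        p = ready.pop()
        if p == root:
            continue                       # the root has the final result
        q = parent[p]
        messages += 1                      # p sends v_p to its parent q
        value[q] = max(value[q], value[p])
        pending[q] -= 1
        if pending[q] == 0:                # q has heard from all its children
            ready.append(q)
    return value[root], messages

tree = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
x = {0: 7, 1: 3, 2: 9, 3: 12, 4: 5}
result, messages = convergecast_max(tree, 0, x)
print(result, messages)   # 12 4
```

Every non-root processor sends exactly one message to its parent, which gives the n-1 message count stated above.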
60 Next Lecture
- The synchronous model
- Spanning tree constructions and flooding
- Revisiting election algorithms
61 Flooding and Building a Spanning Tree
- In cliques, or complete graphs, each pair of nodes is directly connected by an edge
- The following are equivalent for an undirected graph G with n nodes
- G is a clique
- E = {{u, v} : u, v ∈ V and u ≠ v}
- |E| = n(n-1)/2
- Each node has degree n-1
- The problem
- Broadcast without a preexisting spanning tree, starting from a distinguished processor p_r
- In the asynchronous system
- In the synchronous system
64 Flooding (Asynchronous)
- The algorithm (outline)
- p_r sends the message ⟨M⟩ to all its neighbors
- When a processor p_i receives ⟨M⟩ for the first time, from some neighbor p_j, it sends ⟨M⟩ to all its neighbors except p_j
65 Execution of the flooding algorithm (two steps)
66 Execution of the flooding algorithm (steps 3 and 4)
67 Flooding (Asynchronous)
- The algorithm induces a spanning tree rooted at p_r
- The parent of p_i is the processor from which p_i received its first message
- If p_i receives messages from several neighbors before a comp(i), the parent is chosen arbitrarily among the senders
- The spanning tree is implicit
- Each processor knows its parent but does not know its children
68 Spanning Tree Construction (informal algorithm)
- p_r sends ⟨M⟩ to all its neighbors
- When p_i receives ⟨M⟩ for the first time, from say p_j
- p_i denotes p_j as its parent and sends a ⟨parent⟩ message to p_j
- p_i sends ⟨M⟩ to all neighbors except p_j
- When p_i later receives ⟨M⟩ from any processor p_j
- p_i sends ⟨already⟩ to p_j (indicating that it is in the tree)
69 Spanning Tree Construction (informal algorithm)
- After sending ⟨M⟩ to all its other neighbors, p_i waits for either ⟨parent⟩ or ⟨already⟩ from each of them
- ⟨parent⟩ from p_j: p_j is recorded as a child of p_i
- ⟨already⟩ from p_j: p_j is recorded as other
- When all recipients of p_i's ⟨M⟩ have responded (with ⟨parent⟩ or ⟨already⟩), p_i terminates
71 Flooding to Construct Spanning Tree (Pseudo Code) for Processor p_i, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, others = ∅
- Upon receiving no message
  - if p_i = p_r and parent = nil then // root has not yet sent ⟨M⟩
    - send ⟨M⟩ to all neighbors
    - parent := p_i
72 Flooding to Construct Spanning Tree (Pseudo Code) for Processor p_i, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, others = ∅
- Upon receiving ⟨M⟩ from neighbor p_j
  - if parent = nil then
    - parent := p_j
    - send ⟨parent⟩ to p_j
    - send ⟨M⟩ to all neighbors except p_j
  - else send ⟨already⟩ to p_j
73 Flooding to Construct Spanning Tree (Pseudo Code) for Processor p_i, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, others = ∅
- Upon receiving ⟨parent⟩ from neighbor p_j
  - add p_j to children
  - if children ∪ others contains all neighbors except parent then
    - terminate
- Upon receiving ⟨already⟩ from neighbor p_j
  - add p_j to others
  - if children ∪ others contains all neighbors except parent then
    - terminate
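The three handlers can be exercised in a small simulation. The sketch below is an assumption-laden illustration, not the lecture's code: the function name `build_spanning_tree`, the adjacency-set representation, and the random scheduler (which delivers messages in arbitrary order) are choices made for the example, and the explicit terminate step is left out because the simulation simply stops when no message is in transit.

```python
import random

def build_spanning_tree(adj, root, seed=0):
    """Simulate the construction under a random asynchronous schedule.

    adj maps each processor to the set of its neighbors.
    Returns (parent, children, number of messages sent).
    """
    rng = random.Random(seed)
    parent = {p: None for p in adj}           # None plays the role of nil
    children = {p: set() for p in adj}
    others = {p: set() for p in adj}
    in_transit = []                           # (sender, receiver, kind)
    messages = 0

    def send(i, j, kind):
        nonlocal messages
        in_transit.append((i, j, kind))
        messages += 1

    parent[root] = root                       # root's first computation event
    for q in sorted(adj[root]):
        send(root, q, "M")

    while in_transit:
        # Scheduler: deliver any message in transit, then let the receiver compute.
        j, i, kind = in_transit.pop(rng.randrange(len(in_transit)))
        if kind == "M":
            if parent[i] is None:             # first <M>: adopt sender as parent
                parent[i] = j
                send(i, j, "parent")
                for q in sorted(adj[i] - {j}):
                    send(i, q, "M")
            else:
                send(i, j, "already")
        elif kind == "parent":
            children[i].add(j)
        else:                                 # "already"
            others[i].add(j)
    return parent, children, messages

adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
parent, children, messages = build_spanning_tree(adj, root=0)
```

Different seeds can produce different trees, since the parent of a processor depends on which copy of ⟨M⟩ the scheduler delivers first.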
74 Two Steps in the Construction of the Spanning Tree
75 Spanning Tree Construction (AS)
- In every admissible execution in the asynchronous model, the algorithm constructs a spanning tree of the network rooted at p_r
- Once a parent variable is set, it never changes
- The set of children of a processor never decreases
- If p_j is a child of p_i, then p_i is p_j's parent
- The resulting graph G is a directed spanning tree rooted at p_r
76 Spanning Tree Construction (AS)
- There is an asynchronous algorithm to find a spanning tree of a network (graph) with m edges and diameter D, given a distinguished node, with message complexity O(m) and time complexity O(D)
77 Background: Types of Spanning Trees
- BFS (Breadth First Search) tree
- In a BFS spanning tree with root r, for any node v reachable from r, the path from r to v in the tree is a shortest path from r to v in the graph G
78 Background: Types of Spanning Trees
- DFS (Depth First Search) tree
- A spanning tree is a DFS tree if each frond edge (an edge of G that is not in the tree) connects a node and one of its descendants
[Figure: a DFS tree with its frond edges]
79 Spanning Tree Construction in the Synchronous Case
- The same algorithm
- But the spanning tree that is constructed is guaranteed to be a BFS tree
- In SS a round is
- Delivery of all messages
- Followed by one computation step of all processors
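Running the flooding algorithm in rounds shows why the resulting tree is a BFS tree. The sketch below is a simplified illustration: it tracks only the ⟨M⟩ messages (the ⟨parent⟩ and ⟨already⟩ replies do not affect the shape of the tree), and the function names and example graph are assumptions made for the example.

```python
from collections import deque

def sync_flood(adj, root):
    """Run flooding in lockstep rounds.
    Returns (parent, round in which each processor joined the tree)."""
    parent = {root: root}
    joined = {root: 0}
    outgoing = [(root, q) for q in sorted(adj[root])]   # <M> messages in transit
    rnd = 0
    while outgoing:
        rnd += 1
        inbox = {}
        for j, i in outgoing:                 # deliver every message in transit
            inbox.setdefault(i, []).append(j)
        outgoing = []
        for i, senders in inbox.items():      # one computation step per processor
            if i not in parent:
                parent[i] = senders[0]        # arbitrary choice among this round's senders
                joined[i] = rnd
                outgoing += [(i, q) for q in sorted(adj[i]) if q != parent[i]]
    return parent, joined

def bfs_distance(adj, root):
    """Reference distances from the root, computed by ordinary BFS."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4}, 4: {3}}
parent, joined = sync_flood(adj, 0)
dist = bfs_distance(adj, 0)
```

In this run the round in which a processor joins the tree equals its distance from the root, and each parent is one step closer to the root than its child.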
80 Spanning Tree Construction in the Synchronous Case 1/2
[Figure: rounds 1 and 2]
81 Spanning Tree Construction in the Synchronous Case 2/2
[Figure: rounds 3 and 4]
82 Constructing a Depth First Spanning Tree for a Specified Root
- DFS (depth-first search) tree
- Adding one node at a time
83 Flooding to Construct DFS Spanning Tree (Pseudo Code) for Processor p_i, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, unexplored = all neighbors of p_i
- Upon receiving no message // root wakes up
  - if p_i = p_r and parent = nil then
    - parent := p_i
    - explore()
84 Flooding to Construct DFS Spanning Tree (Pseudo Code) for Processor p_i, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, unexplored = all neighbors of p_i
- procedure explore()
  - if unexplored ≠ ∅ then
    - let p_k be a processor in unexplored
    - remove p_k from unexplored
    - send ⟨M⟩ to p_k
  - else
    - if parent ≠ p_i then send ⟨parent⟩ to parent
    - terminate
85 Flooding to Construct DFS Spanning Tree (Pseudo Code) for Processor p_i, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, unexplored = all neighbors of p_i
- Upon receiving ⟨M⟩ from neighbor p_j
  - if parent = nil then
    - parent := p_j
    - remove p_j from unexplored
    - explore()
  - else
    - send ⟨already⟩ to p_j
    - remove p_j from unexplored
86 Flooding to Construct DFS Spanning Tree (Pseudo Code) for Processor p_i, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, unexplored = all neighbors of p_i
- Upon receiving ⟨parent⟩ from neighbor p_j
  - add p_j to children
  - explore()
- Upon receiving ⟨already⟩ from neighbor p_j
  - explore()
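The DFS construction for a specified root can be simulated directly from the pseudocode. The sketch below is illustrative only: the function name `dfs_spanning_tree`, the adjacency-set representation, and the rule of always exploring the smallest-numbered unexplored neighbor (any choice is allowed by the algorithm) are assumptions made for the example.

```python
def dfs_spanning_tree(adj, root):
    """Simulate the DFS construction; returns (parent, children, messages sent)."""
    parent = {p: None for p in adj}           # None plays the role of nil
    children = {p: set() for p in adj}
    unexplored = {p: set(adj[p]) for p in adj}
    in_transit = []                           # holds at most one message at a time
    messages = 0

    def send(i, j, kind):
        nonlocal messages
        in_transit.append((i, j, kind))
        messages += 1

    def explore(i):
        if unexplored[i]:
            k = min(unexplored[i])            # any choice works; min keeps runs reproducible
            unexplored[i].remove(k)
            send(i, k, "M")
        elif parent[i] != i:
            send(i, parent[i], "parent")
        # otherwise p_i is the root and the construction is complete

    parent[root] = root                       # root wakes up
    explore(root)
    while in_transit:
        j, i, kind = in_transit.pop()         # del(j, i, .) followed by comp(i)
        if kind == "M":
            if parent[i] is None:
                parent[i] = j
                unexplored[i].discard(j)
                explore(i)
            else:
                send(i, j, "already")
                unexplored[i].discard(j)
        else:
            if kind == "parent":
                children[i].add(j)
            explore(i)                        # both replies continue the search
    return parent, children, messages

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
parent, children, messages = dfs_spanning_tree(adj, root=0)
print(parent, messages)   # {0: 0, 1: 0, 2: 1, 3: 2} 8
```

The resulting tree is the path 0, 1, 2, 3; the only non-tree edge {0, 2} joins node 0 to its descendant 2, as required of a DFS tree. Because every step sends exactly one message, the search is sequential, which is why its running time grows with the number of edges rather than with the diameter.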
87 Constructing a Depth First Spanning Tree for a Specified Root
- Message complexity
- The number of edges is m
- Each processor sends ⟨M⟩ at most once on each adjacent edge
- This gives at most 2m messages
- Each processor sends at most one of ⟨already⟩ or ⟨parent⟩ on each adjacent edge
- This gives at most 2m further messages
- Thus the total is at most 4m messages
- Time complexity is O(m)
88 Constructing a DFS Spanning Tree without a Specified Root
- We assume that nodes have unique identifiers (natural numbers)
- Each processor that wakes up attempts to build a DFS tree with itself as root
- If two DFS trees try to connect to the same node, the node joins the DFS tree whose root has the higher identifier
89 Constructing a DFS Spanning Tree without a Specified Root
- Each node keeps the maximal identifier it has seen so far in a variable leader
- When a node wakes up, it sets leader to its own identifier
- When a node receives a DFS message with identifier y
- If y > leader, the node changes leader to y and sets parent to the node from which the message was received
- If y = leader, the node already belongs to this spanning tree
- If y < leader, no messages are sent
90 Flooding to Construct DFS Spanning Tree (Pseudo Code) for Processor p_i, 0 ≤ i ≤ n-1
- Initially parent = nil, leader = -1, children = ∅, unexplored = all neighbors of p_i
- Upon receiving no message // wakes up spontaneously
  - if parent = nil then
    - leader := id
    - parent := p_i
    - explore()
91 Flooding to Construct DFS Spanning Tree (Pseudo Code) for Processor p_i, 0 ≤ i ≤ n-1
- Initially parent = nil, leader = -1, children = ∅, unexplored = all neighbors of p_i
- procedure explore()
  - if unexplored ≠ ∅ then
    - let p_k be a processor in unexplored
    - remove p_k from unexplored
    - send ⟨leader, leader⟩ to p_k
  - else
    - if parent ≠ p_i then send ⟨parent, leader⟩ to parent
    - else terminate as root of spanning tree
92 Flooding to Construct DFS Spanning Tree (Pseudo Code) for Processor p_i, 0 ≤ i ≤ n-1
- Initially parent = nil, leader = -1, children = ∅, unexplored = all neighbors of p_i
- Upon receiving ⟨leader, newId⟩ from neighbor p_j
  - if leader < newId then
    - leader := newId
    - parent := p_j; children := ∅
    - unexplored := all neighbors of p_i except p_j
    - explore()
  - elseif leader = newId then
    - send ⟨already, leader⟩ to p_j
    - remove p_j from unexplored
93 Flooding to Construct DFS Spanning Tree (Pseudo Code) for Processor p_i, 0 ≤ i ≤ n-1
- Initially parent = nil, leader = -1, children = ∅, unexplored = all neighbors of p_i
- Upon receiving ⟨parent, newId⟩ from neighbor p_j
  - if newId = leader then
    - add p_j to children
    - explore()
- Upon receiving ⟨already, newId⟩ from neighbor p_j
  - if newId = leader then explore()
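The version without a specified root can be simulated in the same style. This is a sketch under stated assumptions, not the lecture's code: the function name `elect_and_build`, the `wakers` list of processors that wake up spontaneously, and the random scheduler that interleaves wake-ups and deliveries are all hypothetical choices for the example.

```python
import random

def elect_and_build(adj, ids, wakers, seed=0):
    """adj: neighbor sets; ids: unique non-negative identifier per processor;
    wakers: processors that wake up spontaneously.
    Returns (parent, leader, children)."""
    rng = random.Random(seed)
    parent = {p: None for p in adj}
    leader = {p: -1 for p in adj}
    children = {p: set() for p in adj}
    unexplored = {p: set(adj[p]) for p in adj}
    in_transit = []                           # (sender, receiver, kind, id)
    pending_wakeups = list(wakers)

    def explore(i):
        if unexplored[i]:
            k = min(unexplored[i])
            unexplored[i].remove(k)
            in_transit.append((i, k, "leader", leader[i]))
        elif parent[i] != i:
            in_transit.append((i, parent[i], "parent", leader[i]))
        # else: p_i terminates as root of the spanning tree

    while in_transit or pending_wakeups:
        # Scheduler: pick a wake-up or a message delivery at random.
        c = rng.randrange(len(pending_wakeups) + len(in_transit))
        if c < len(pending_wakeups):
            i = pending_wakeups.pop(c)
            if parent[i] is None:             # not yet reached by any DFS
                leader[i] = ids[i]
                parent[i] = i
                explore(i)
            continue
        j, i, kind, new_id = in_transit.pop(c - len(pending_wakeups))
        if kind == "leader":
            if leader[i] < new_id:            # switch to the tree with the larger id
                leader[i] = new_id
                parent[i] = j
                children[i] = set()
                unexplored[i] = set(adj[i]) - {j}
                explore(i)
            elif leader[i] == new_id:
                in_transit.append((i, j, "already", leader[i]))
                unexplored[i].discard(j)
            # leader[i] > new_id: no reply is sent, so that DFS stalls
        elif new_id == leader[i]:             # reply belongs to the current tree
            if kind == "parent":
                children[i].add(j)
            explore(i)
    return parent, leader, children

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
ids = {0: 10, 1: 40, 2: 20, 3: 30}
parent, leader, children = elect_and_build(adj, ids, wakers=[0, 1, 2, 3])
```

At the end every processor holds the same value in `leader`, namely the largest identifier among the processors that actually started a DFS, and the parent pointers form a tree rooted at that processor. Replies tagged with a smaller identifier are ignored, which is how the losing searches stall.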
94 Next Lecture