Title: Peer-to-Peer and GRID Computing, 2G1526 Lecture 03
1Peer-to-Peer and GRID Computing, 2G1526Lecture
03 04
- Seif Haridi
- LECS, KTH
- Seif_at_imit.kth.se
2Formal Models for Message Passing Systems
Asynchronous Systems Synchronous Systems
3Formal Models for Message Passing Systems
- Synchronous and asynchronous message passing systems
- No failures
- Basic complexity measures
- Pseudocode conventions for describing message passing algorithms
4Systems
- Processors
- Communication channels
- Bidirectional between two processors
- Topology
- Pattern of connection
- Undirected graph where each node is a processor and each edge is a communication channel
- Algorithm for a message passing system
- Local program on each processor
- A processor performs local computation, and sends and receives messages to/from each of its neighbors
5System (formal)algorithm
- An algorithm/system
- n processors p0, ..., pn-1; i is the index of processor pi
- Each pi is modeled as a (possibly infinite) state machine with state set Qi
- A subset Ii of Qi contains all initial states
- The edges incident on pi are labeled with integers 1, ..., r, where r is the degree of node pi
- Each state of pi contains 2r special components, outbufi[l] and inbufi[l], for every l, 1 ≤ l ≤ r
6System (formal)algorithm
- An algorithm/system (continued)
- outbufi[l] holds messages that pi has sent to its neighbor over its l-th channel but that have not yet been delivered to the neighbor
- inbufi[l] holds messages that have been delivered to pi on its l-th channel but have not yet been processed by an internal computation step
- In an initial state every inbufi[l] is empty; outbufi[l] need not be
7System (formal)algorithm
- Accessible state of pi
- Internal variables/registers
- The inbufi[l] components (not the outbufi[l] components)
- pi's transition function (pi's computation step)
- Takes as input the accessible state of pi
- Produces a value for the accessible state of pi, in which all inbufi[l] are empty
- Produces at most one message for each outbufi[l]
8System (formal)algorithm
- Messages previously sent by pi cannot influence pi's current computation step
- Each step processes all the messages that have been delivered to pi and are waiting in its inbufs
- Each step results in a state change and at most one message to be sent to each neighbor
9Configurations
- A configuration
- Describes a state of the whole system
- Is a vector C = (q0, ..., qn-1)
- qi is a state of pi
- The states (values) of the outbuf variables represent messages in transit on the communication channels
- An initial configuration is C = (q0, ..., qn-1) where each qi ∈ Ii is an initial state of pi
10Events
- Events are actions that take place in a distributed system/algorithm
- Computation event
- comp(i): a computation step of pi, in which pi's transition function is applied to its current accessible state
- Delivery event
- del(i, j, m): the delivery of message m from pi to pj
11Executions
- The behavior of a system is modeled as an execution
- Execution
- A sequence of alternating configurations and events
- C0, φ1, C1, φ2, C2, φ3, ... (possibly infinite)
- Ck is a configuration
- φk is an event
- The sequence must satisfy a variety of conditions (called safety and liveness conditions)
12Executions (Continued)
- An execution
- A sequence that satisfies all required safety conditions for the particular system type under study
- An admissible execution
- An execution that in addition satisfies all required liveness conditions
- System types
- Asynchronous message passing
- Synchronous message passing
13Asynchronous Systems
- An asynchronous system
- No fixed upper bound on how long it takes for a message to be delivered
- No fixed upper bound on how much time elapses between two consecutive steps of a processor
- Example (Internet)
- An email message can take days to arrive, but normally it takes a few seconds
- In real systems there are upper bounds on message delays and processor step times, but they are sometimes very large and change over time
14Asynchronous SystemsExecution Segments
- Execution segment α
- C0, φ1, C1, φ2, C2, φ3, ... (possibly infinite)
- Ck is a configuration
- φk is an event
- If α is finite it must end in a configuration
- An execution is an execution segment where C0 is an initial configuration
15Asynchronous SystemsExecution Segments
- Execution segment α
- C0, φ1, C1, φ2, C2, φ3, ... (possibly infinite)
- If φk = del(i, j, m)
- m must be in outbufi[l] in Ck-1
- l is pi's label for the channel {pi, pj}
- Changes from Ck-1 to Ck
- m is removed from outbufi[l]
- m is added to inbufj[h]
- h is pj's label for the channel {pi, pj}
16Asynchronous Systemsdel(i,j,m)
- Example: del(3, 0, m)
- A message m from p3 to p0
- m is in outbuf3[1]
- m is removed from outbuf3[1] and placed in inbuf0[3]
17Asynchronous SystemsExecution Segments
- Execution segment α
- C0, φ1, C1, φ2, C2, φ3, ... (possibly infinite)
- If φk = comp(i)
- Changes from Ck-1 to Ck
- pi changes state according to its transition function applied to its accessible state in Ck-1
- The inbufi[l] variables are emptied
- The set of output messages (specified by the transition function) is added to the outbufi[l] variables
18Asynchronous Systems (AS)Executions
- Execution segment α
- C0, φ1, C1, φ2, C2, φ3, ... (possibly infinite)
- In AS there are multiple executions, depending on
- The choice of φk at Ck-1
- A unique execution is determined by the choice of the sequence
- φ1, φ2, φ3, ...
- This sequence is called a schedule
19Asynchronous Systems (AS)Schedules
- Execution segment α
- C0, φ1, C1, φ2, C2, φ3, ... (possibly infinite)
- An execution is uniquely determined by the initial configuration C0 and a schedule σ
- Denoted by exec(C0, σ)
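The del and comp events, and the way a schedule fixes an execution, can be sketched in Python. This is only a minimal illustration under stated assumptions: the names Proc, deliver, compute, execute and the transition function record are hypothetical, and buffers are indexed by neighbor index rather than by channel label to keep the sketch short.

```python
from dataclasses import dataclass, field

@dataclass
class Proc:
    """One processor: local variables plus one inbuf and one outbuf per neighbor."""
    state: dict = field(default_factory=dict)
    inbuf: dict = field(default_factory=dict)   # neighbor index -> pending messages
    outbuf: dict = field(default_factory=dict)  # neighbor index -> undelivered messages

def deliver(config, i, j, m):
    """del(i, j, m): move m from p_i's outbuf toward p_j into p_j's inbuf from p_i."""
    if m not in config[i].outbuf[j]:
        raise ValueError("m must be in the sender's outbuf before delivery")
    config[i].outbuf[j].remove(m)
    config[j].inbuf[i].append(m)

def compute(config, i, transition):
    """comp(i): apply the transition function to p_i's accessible state."""
    p = config[i]
    received = {j: list(msgs) for j, msgs in p.inbuf.items() if msgs}
    # The transition sees local state and inbufs only, never the outbufs.
    p.state, to_send = transition(i, dict(p.state), received)
    for msgs in p.inbuf.values():
        msgs.clear()                      # every inbuf is emptied
    for j, m in to_send.items():          # at most one new message per neighbor
        p.outbuf[j].append(m)

def execute(config, schedule, transition):
    """exec(C0, schedule): apply the events of the schedule in order."""
    for event in schedule:
        if event[0] == "del":
            deliver(config, *event[1:])
        else:
            compute(config, event[1], transition)
    return config

def record(i, state, received):
    """Hypothetical transition function: remember every message received."""
    seen = state.get("seen", []) + [m for msgs in received.values() for m in msgs]
    return {"seen": seen}, {}

# Initial configuration: inbufs are empty, p_0 already has "M" in an outbuf.
c0 = {
    0: Proc(inbuf={1: []}, outbuf={1: ["M"]}),
    1: Proc(inbuf={0: []}, outbuf={0: []}),
}
final = execute(c0, [("del", 0, 1, "M"), ("comp", 1)], record)
print(final[1].state)   # {'seen': ['M']}
```

A different schedule over the same initial configuration would give a different execution, which is the source of nondeterminism in the asynchronous model.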
20Asynchronous Systems (AS)Admissible Executions
- Execution segment α
- C0, φ1, C1, φ2, C2, φ3, ... (possibly infinite)
- In AS an execution is admissible if
- Each processor has an infinite number of computation events
- Every message sent is eventually delivered
- A schedule is admissible if it is the schedule of an admissible execution
21Asynchronous Systems (AS)Admissible Executions
- Remark: the requirement
- "Each processor has an infinite number of computation events"
- Models that processors do not fail
- Processor termination is modeled by
- Having the transition function not change the processor's state after a certain point in an execution is reached
- Performing dummy steps
22Asynchronous Systems (AS)Complexity Measures
- The number of messages
- The amount of time
- We are looking at worst-case performance
- We need a notion of termination of a system/algorithm
- The system has terminated if
- All processors are in terminated states
- No messages are in transit
23Asynchronous Systems (AS)Message Complexity
- Message complexity of an algorithm A in AS is the
maximum, over all admissible executions of A, of
the total number of messages sent
24Asynchronous Systems (AS)Time Complexity
- The time an AS algorithm takes is less obvious
- We make idealized assumptions with the following intuition
- The message delay in any execution is one unit of time
- Independent computation events at different processors happen simultaneously
- Calculate the time until termination
25Asynchronous Systems (AS)Timed Execution
- Each event has an associated nonnegative integer
- Models the time at which the event occurs
- A comp(i) event occurs at pi
- A del(i, j, m) event occurs at pi and pj
- Times start at 0 and are nondecreasing overall, but strictly increasing for each individual processor
- Several events can happen at the same time if they occur at different processors
26Asynchronous Systems (AS)Timed Execution
- The message delay for m is the time m waits in the sender's outbuf plus the time m waits in the recipient's inbuf
- Time complexity in AS is the maximum time until termination, among all admissible timed executions in which the message delay is one
27Asynchronous Systems (AS)Algorithm Descriptions
- Algorithms will be described in an event-driven fashion
- The effect of each message is described individually
- upon receiving ⟨M⟩: ⟨some action⟩
- Processors can be triggered by other events
- upon event ⟨a⟩: ⟨some action⟩
28Synchronous Systems
- In a synchronous system
- Processors execute in lockstep
- Execution is partitioned into rounds
- In each round
- Each processor can send a message to each neighbor
- Messages are delivered
- Each processor computes based on the received messages
- This means that message delivery delays are predictable and have an upper bound
- This model is simpler for constructing distributed algorithms
29Synchronous SystemsExecution Segments
- Execution segment α
- C0, φ1, C1, φ2, C2, φ3, ... (possibly infinite)
- Ck is a configuration
- φk is an event
- The execution sequence is constrained
- Partitioned into disjoint rounds
- A round consists of a delivery event for every message in an outbuf variable
- Followed by one computation step for every processor
30Synchronous Systems (SS)Admissible Executions
- Execution segment α
- C0, φ1, C1, φ2, C2, φ3, ... (possibly infinite)
- In SS an execution is admissible if it is infinite
- This implies that every message sent is eventually delivered
- There is only a single execution for any initial configuration
- This is in contrast to asynchronous systems (multiple executions for a given initial configuration)
31Spanning Tree Algorithms
32Broadcast and Convergecast on a Spanning Tree
- What is a spanning tree?
- Broadcast
- Convergecast
33BackgroundGraphs and Spanning Trees
- An undirected graph G is a pair (V, E)
- V is the node set of G
- E is a collection of unordered pairs from V
- An element of E is {v, u} with u, v ∈ V
- The edge {v, u} is incident on u (and v)
- The degree of a node is the number of its neighbors
- A path of length k between v0 and vk is a sequence ⟨v0, ..., vk⟩ such that for each i < k, {vi, vi+1} ∈ E
34BackgroundGraphs and Spanning Trees
- The distance d(u, v) between u, v ∈ V is the length of a shortest path between u and v
- The diameter of a graph is the largest distance between any two nodes
- An undirected graph is connected if there is a path between every pair of nodes
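The definitions of distance, diameter and connectivity can be made concrete with a breadth-first search. The following is a minimal Python sketch; the function names and the example graph (a path a - b - c - d) are assumptions made for illustration, and the graph is given as an adjacency dictionary.

```python
from collections import deque

def distances_from(adj, source):
    """Return d(source, v) for every node v reachable from source (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_connected(adj):
    """A graph is connected if one BFS reaches every node."""
    start = next(iter(adj))
    return len(distances_from(adj, start)) == len(adj)

def diameter(adj):
    """Largest distance between any two nodes of a connected graph."""
    if not is_connected(adj):
        raise ValueError("diameter is only defined here for connected graphs")
    return max(max(distances_from(adj, u).values()) for u in adj)

# Hypothetical example: the path a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(distances_from(adj, "a")["d"], diameter(adj))   # 3 3
```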
35BackgroundGraphs and Spanning Trees
- A cycle is a path ⟨v0, ..., vk⟩ in which v0 = vk
- A cycle is simple if the nodes v1 through vk are all different
- An undirected graph is acyclic if it contains no simple cycle of length three or more
36BackgroundGraphs and Spanning Trees
[Figure: a graph with the simple cycle ⟨a, b, a⟩, and an acyclic graph]
37BackgroundGraphs and Spanning Trees
[Figure: an undirected, cyclic graph on nodes a, b, c, d, e]
38BackgroundGraphs and Spanning Trees
- G' = (V', E') is a subgraph of G = (V, E) if V' ⊆ V and E' ⊆ E
- G' is a spanning subgraph if V' = V
39BackgroundGraphs and Spanning Trees
- G' = (V', E') is a subgraph of G = (V, E) if V' ⊆ V and E' ⊆ E
- G' is a spanning subgraph if V' = V
[Figure: example graph on nodes a, b, c, d, e with a subgraph G']
40BackgroundSpanning Trees
- A tree is a graph that contains a minimal number of edges connecting its nodes
- Computations on trees have a low message complexity
- A tree is
- an undirected
- connected
- acyclic graph
- A spanning tree T of a graph G is a spanning subgraph that is a tree
[Figure: a spanning tree of a graph on nodes a, b, c, d, e]
41BackgroundTrees
- The following are equivalent for an undirected graph G with N nodes
- G is a tree
- Between any two nodes there is a unique simple path
- G is connected and |E| = N-1
- G is acyclic and |E| = N-1
- G is acyclic but becomes cyclic if any edge is added
[Figure: a spanning tree of a graph on nodes a, b, c, d, e]
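Two of the equivalent characterizations of a tree can be checked mechanically. The Python sketch below is an illustration only: the function names and the example graphs are assumptions, and the input is taken to be a simple graph (no self-loops, no duplicate edges). The second example has |E| = N-1 but is neither connected nor acyclic, so both tests reject it.

```python
def is_connected(nodes, edges):
    """Depth-first traversal from one node must reach all nodes."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)

def is_acyclic(nodes, edges):
    """Union-find: an edge inside one component closes a cycle."""
    root = {v: v for v in nodes}
    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]   # path halving
            v = root[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        root[ru] = rv
    return True

def is_tree_connected(nodes, edges):
    """G is connected and |E| = N-1."""
    return is_connected(nodes, edges) and len(edges) == len(nodes) - 1

def is_tree_acyclic(nodes, edges):
    """G is acyclic and |E| = N-1."""
    return is_acyclic(nodes, edges) and len(edges) == len(nodes) - 1

nodes = ["a", "b", "c", "d"]
tree_edges = [("a", "b"), ("b", "c"), ("b", "d")]
bad_edges = [("a", "b"), ("b", "c"), ("c", "a")]   # triangle, d isolated
print(is_tree_connected(nodes, tree_edges), is_tree_acyclic(nodes, tree_edges))  # True True
print(is_tree_connected(nodes, bad_edges), is_tree_acyclic(nodes, bad_edges))    # False False
```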
42BackgroundRooted Trees
- A tree T is rooted if there is a unique node r called the root
- If u is a node on the path between v and r, u is an ancestor of v, and v is a descendant of u
- If in addition u and v are neighbors, then u is the parent (father) of v, and v is a child of u
- The depth of a tree is the length of the longest simple path from r to any node
[Figure: a spanning tree of a graph on nodes a, b, c, d, e]
43Broadcast and Convergecast on a Spanning Tree
- What is a spanning tree?
- Broadcast
- Convergecast
44Broadcast on a Spanning Tree
- A spanning tree of a network is given
- A distinguished processor pr wants to disseminate a message ⟨M⟩ to all processors
- The tree is rooted at pr
- Each processor has a channel to its parent and a set of channels to its children
[Figure: spanning tree rooted at pr, with ⟨M⟩ at the root]
45Broadcast on a Spanning Tree
- pr sends ⟨M⟩ on all channels leading to its children and terminates
- When a processor receives ⟨M⟩ on the channel from its parent, it sends ⟨M⟩ on all channels to its children
[Figure: ⟨M⟩ travelling down the spanning tree rooted at pr]
46Spanning Tree Broadcast Algorithm (Pseudo Code)
- Code for pr
  - Upon receiving no message
    - send ⟨M⟩ to all children
    - terminate
- Code for pi, 0 ≤ i ≤ n-1, i ≠ r
  - Upon receiving ⟨M⟩ from parent
    - send ⟨M⟩ to all children
    - terminate
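The broadcast pseudocode can be simulated in a few lines of Python. This is a sketch under stated assumptions rather than the lecture's own code: the function name broadcast, the example tree, and the use of a seeded random choice to stand in for the arbitrary delivery order of the asynchronous model are all illustrative.

```python
import random

def broadcast(children, root, seed=0):
    """Simulate broadcast on a rooted spanning tree.

    children maps each processor to the list of its children.
    Returns (set of processors that hold M, number of messages sent).
    """
    rng = random.Random(seed)
    in_transit = []          # (sender, receiver) pairs, one per undelivered M
    received = {root}        # the root already holds M
    sent = 0
    # Code for p_r: upon receiving no message, send M to all children.
    for c in children[root]:
        in_transit.append((root, c))
        sent += 1
    # Asynchrony: pending messages are delivered in an arbitrary order.
    while in_transit:
        _, p = in_transit.pop(rng.randrange(len(in_transit)))
        received.add(p)
        # Code for p_i: upon receiving M from parent, send M to all children.
        for c in children[p]:
            in_transit.append((p, c))
            sent += 1
    return received, sent

# Hypothetical tree with n = 6 processors, rooted at 0.
children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: [5], 5: []}
received, sent = broadcast(children, 0)
print(len(received), sent)   # 6 5
```

Whatever delivery order is chosen, exactly one message crosses each tree edge, so the count is n-1.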
47Spanning Tree Broadcast Algorithm (State
Transition Level)
- The state of each pi contains the variables
- parenti contains either a processor index or nil
- childreni contains a set of processor indices
- terminatedi is a Boolean, initially false
- Initially the values of the parent and children variables form a spanning tree rooted at pr; the outbuf and inbuf variables are empty
48Spanning Tree Broadcast Algorithm (State
Transition Level)
- The result of comp(r) in the initial configuration is that
- ⟨M⟩ is placed in outbufr[j] for each j in childrenr
- terminatedr is set to true
- The only thing that can happen after that is one or more del(r, j, ⟨M⟩) events, where pj is a child of pr
- comp(i) with i ≠ r is similar to comp(r)
49Broadcast on a Spanning TreeMessage complexity
- ⟨M⟩ is sent exactly once on each channel that is an edge in the spanning tree rooted at pr
- The number of messages is equal to the number of edges in the spanning tree
- Which is n-1
[Figure: spanning tree rooted at pr with one ⟨M⟩ on every tree edge]
50Broadcast on a Spanning TreeTime Complexity
- Think of the timed execution model where the message delay is 1 for every del(i, j, m), and comp(i) takes 0 time for every pi
- That is, we ignore comp(i) times
[Figure: spanning tree rooted at pr with ⟨M⟩ on its edges]
51Broadcast on a Spanning TreeTime Complexity
(time0)
- At time 0, M is in the outbufs of pr
[Figure: spanning tree rooted at pr at time 0]
52Broadcast on a Spanning TreeTime Complexity
(time1)
- At time 1, M is delivered to all children of pr
- The children perform a computation step, and M is now in the outbufs of the children
[Figure: spanning tree rooted at pr at time 1]
53Broadcast on a Spanning TreeTime Complexity
(time2)
- At time 2, M is delivered to the processors at distance 2 from pr (the children of pr's children)
- These processors perform a computation step, and M is now in their outbufs
[Figure: spanning tree rooted at pr at time 2]
54Broadcast on a Spanning TreeTime Complexity
- In every admissible execution of the broadcast algorithm in AS, every processor at distance t from pr in the spanning tree receives ⟨M⟩ by time t
[Figure: spanning tree rooted at pr with ⟨M⟩ on its edges]
55Broadcast on a Spanning TreeTime Complexity
- At time 1, processors at distance 1 from pr receive and process ⟨M⟩
- Assume that by time t-1, processors at distance t-1 have received and processed ⟨M⟩
- Since the message delay is one, by time t processors at distance t receive ⟨M⟩
[Figure: spanning tree rooted at pr with ⟨M⟩ on its edges]
56Broadcast on a Spanning TreeTime Complexity
- The time complexity is d, where d is the depth of the spanning tree rooted at pr
[Figure: spanning tree rooted at pr with ⟨M⟩ on its edges]
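The induction argument can be checked on a small example. The Python sketch below assumes the idealized timing from the slides (message delay 1, computation steps take 0 time); the function name receive_times and the example tree are hypothetical.

```python
def receive_times(children, root):
    """Time at which each processor receives M, with delay 1 per tree edge."""
    time = {root: 0}          # M is in the root's outbufs at time 0
    frontier = [root]
    while frontier:
        next_frontier = []
        for p in frontier:
            for c in children[p]:
                time[c] = time[p] + 1   # one unit of delay on the edge p -> c
                next_frontier.append(c)
        frontier = next_frontier
    return time

# Hypothetical tree of depth 3, rooted at 0 (longest path 0 -> 1 -> 4 -> 5).
children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: [5], 5: []}
times = receive_times(children, 0)
print(max(times.values()))   # 3
```

Each processor's receive time equals its distance from the root in the tree, so the last receive time is the depth d.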
57Convergecast on a Spanning Tree
- Collecting information from the nodes of the tree to the root
- We consider an instance where the maximum of n variables is forwarded to the root
- xi is stored at pi
- The algorithm is initiated by the leaves
[Figure: tree rooted at pr; nodes p1 and p2 hold x1 and x2]
58Convergecast on a Spanning TreeAlgorithm
- If a node pi is a leaf, it sends its value xi to its parent
- A non-leaf node pj with k children waits to receive messages containing vj1, ..., vjk from its children pj1, ..., pjk
- pj computes vj = max(xj, vj1, ..., vjk) and sends vj to its parent
[Figure: convergecast example on nodes p1, p2, p3, p4 and root pr, with partial results such as max(x3, x1)]
59Convergecast on a Spanning TreeAlgorithm
- There is an asynchronous convergecast algorithm with message complexity n-1 and time complexity d, when a rooted spanning tree with depth d is known
- Broadcast and convergecast can be combined: the broadcast carries a request to perform a convergecast, and when a leaf receives the request it starts the convergecast
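A convergecast of the maximum can be sketched in Python as follows. The function name convergecast_max, the example tree and the values x are assumptions for illustration; the order in which ready processors are handled stands in for one possible asynchronous schedule.

```python
def convergecast_max(children, root, x):
    """Collect max(x) at the root of a rooted spanning tree.

    Returns (value computed at the root, number of messages sent).
    """
    parent = {c: p for p, cs in children.items() for c in cs}
    pending = {p: len(cs) for p, cs in children.items()}  # reports still awaited
    best = dict(x)                                        # v_j starts as x_j
    ready = [p for p, k in pending.items() if k == 0]     # the leaves start
    messages = 0
    while ready:
        p = ready.pop()
        if p == root:
            continue                  # the root keeps the final value
        q = parent[p]
        messages += 1                 # p sends v_p to its parent q
        best[q] = max(best[q], best[p])
        pending[q] -= 1
        if pending[q] == 0:           # q has heard from all its children
            ready.append(q)
    return best[root], messages

children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: [5], 5: []}
x = {0: 3, 1: 7, 2: 1, 3: 9, 4: 4, 5: 8}
print(convergecast_max(children, 0, x))   # (9, 5)
```

Every non-root processor sends exactly one message, which gives the n-1 message complexity stated above.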
60Next Lecture
- The synchronous model
- Spanning tree constructions and flooding
- Revisiting election algorithms
61Flooding and Building a Spanning Tree
62BackgroundCliques
- In cliques, or complete graphs, each pair of nodes is directly connected by an edge
- The following are equivalent for an undirected graph G
- G is a clique
- E = { {u, v} : u, v ∈ V and u ≠ v }
- |E| = n(n-1)/2
- Each node has degree n-1
[Figure: clique on nodes a, b, c, d, e]
63Flooding
- The problem
- Broadcast without a preexisting spanning tree, starting from a distinguished processor pr
- In the asynchronous system
- In the synchronous system
64Flooding (Asynchronous)
- The algorithm (outline)
- pr sends the message ⟨M⟩ to all its neighbors
- When a processor pi receives ⟨M⟩ for the first time from some neighbor pj, it sends ⟨M⟩ to all neighbors except pj
65Execution of the flooding algorithms (two steps)
[Figure: steps 1 and 2 of flooding ⟨M⟩ from pr]
66Execution of the flooding algorithms (steps 3 4)
[Figure: steps 3 and 4 of flooding ⟨M⟩ from pr]
67Flooding (Asynchronous)
- The algorithm induces a spanning tree rooted at pr
- The parent of pi is the processor from which pi received its first message
- If pi receives multiple messages before a comp(i), the parent is chosen arbitrarily among the senders
- The spanning tree is implicit
- Each processor knows its parent, but does not know its children
68Spanning Tree Construction (informal algorithm
1/2)
- pr sends ⟨M⟩ to all its neighbors
- When pi receives ⟨M⟩ for the first time from, say, pj
- pi marks pj as its parent and sends a ⟨parent⟩ message to pj
- pi sends ⟨M⟩ to all neighbors except pj
- When pi receives ⟨M⟩ later on from any processor pj
- pi sends ⟨already⟩ to pj (indicating that it is already in the tree)
69Spanning Tree Construction (informal algorithm
2/2)
- After sending ⟨M⟩ to all other neighbors, pi waits for either ⟨parent⟩ or ⟨already⟩ from each of them
- ⟨parent⟩ from pj: pj is recorded as a child of pi
- ⟨already⟩ from pj: pj is recorded as other
- When all recipients of pi's ⟨M⟩ have responded (with ⟨parent⟩ or ⟨already⟩), pi terminates
71Flooding to Construct Spanning Tree (Pseudo Code)
for Processor pi, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, others = ∅
- Upon receiving no message
  - if pi = pr and parent = nil then // root has not yet sent ⟨M⟩
    - send ⟨M⟩ to all neighbors
    - parent := pi
72Flooding to Construct Spanning Tree (Pseudo Code)
for Processor pi, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, others = ∅
- Upon receiving ⟨M⟩ from neighbor pj
  - if parent = nil then
    - parent := pj
    - send ⟨parent⟩ to pj
    - send ⟨M⟩ to all neighbors except pj
  - else send ⟨already⟩ to pj
73Flooding to Construct Spanning Tree (Pseudo Code)
for Processor pi, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, others = ∅
- Upon receiving ⟨parent⟩ from neighbor pj
  - add pj to children
  - if children ∪ others contains all neighbors except parent then
    - terminate
- Upon receiving ⟨already⟩ from neighbor pj
  - add pj to others
  - if children ∪ others contains all neighbors except parent then
    - terminate
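The flooding pseudocode can be simulated as a whole. The Python sketch below makes several assumptions that are not in the slides: the function name flood_spanning_tree, the example graph, a seeded random choice as the asynchronous delivery order, and the simplification that each computation step handles a single delivered message.

```python
import random

def flood_spanning_tree(adj, root, seed=0):
    """Simulate flooding with parent/already replies on a connected graph.

    Returns (parent, children, number of messages sent).
    """
    rng = random.Random(seed)
    parent = {p: None for p in adj}
    children = {p: set() for p in adj}
    others = {p: set() for p in adj}
    in_transit = []                     # (sender, receiver, kind)
    sent = 0

    def send(src, dst, kind):
        nonlocal sent
        in_transit.append((src, dst, kind))
        sent += 1

    # Upon receiving no message: the root sends M to all neighbors.
    parent[root] = root
    for q in adj[root]:
        send(root, q, "M")

    while in_transit:
        src, p, kind = in_transit.pop(rng.randrange(len(in_transit)))
        if kind == "M":
            if parent[p] is None:       # first M: adopt the sender as parent
                parent[p] = src
                send(p, src, "parent")
                for q in adj[p]:
                    if q != src:
                        send(p, q, "M")
            else:                       # later M: already in the tree
                send(p, src, "already")
        elif kind == "parent":
            children[p].add(src)
        else:                           # "already"
            others[p].add(src)
    return parent, children, sent

# Hypothetical network with n = 4 nodes and m = 5 edges.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
parent, children, sent = flood_spanning_tree(adj, 0)
print(sent)   # 14
```

The shape of the tree depends on the delivery order, but the message count does not: the root sends ⟨M⟩ on every incident edge, every other node on all but one, and each ⟨M⟩ is answered once.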
74Two Steps in the Construction of the Spanning Tree
[Figure: two steps in the construction of the spanning tree from pr, showing ⟨M⟩ and ⟨parent⟩ messages]
75Spanning Tree Construction (AS)
- In every admissible execution in the asynchronous model, the algorithm constructs a spanning tree of the network rooted at pr
- Once a parent variable is set, it never changes
- The set of children of a processor never decreases
- If pj is a child of pi, then pi is pj's parent
- The resulting graph G is a directed spanning tree rooted at pr
76Spanning Tree Construction (AS)
- There is an asynchronous algorithm to find a
spanning tree of a network (graph) of m edges and
a diameter D, given a distinguished node, with
message complexity O(m) and time complexity O(D)
77BackgroundTypes of Spanning Trees
- BFS (Breadth First Search) tree
- In a BFS spanning tree with root r, for any node v reachable from r, the tree path from r to v is a shortest path from r to v in the graph G
[Figure: BFS tree with root r]
78BackgroundTypes of Spanning Trees
- DFS (Depth First Search) tree
- A spanning tree is a DFS tree if each frond edge (a graph edge that is not in the tree) connects a node and one of its descendants
[Figure: DFS tree with root r and its frond edges]
79Spanning Tree Construction on the Synchronous Case
- The same algorithm
- But the spanning tree that is constructed is guaranteed to be a BFS tree
- In SS a round is
- Delivery of all messages
- Followed by one computation step of all processors
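The round structure makes it easy to see why the synchronous tree is a BFS tree: a node first hears ⟨M⟩ in the round equal to its distance from the root. The Python sketch below is an illustration under assumptions: the name sync_flood and the example graph are hypothetical, the ⟨parent⟩ and ⟨already⟩ replies are omitted, and ties between senders in the same round are broken by taking the smallest index.

```python
def sync_flood(adj, root):
    """Round-by-round flooding; returns (parent, round in which each node joined)."""
    parent = {p: None for p in adj}
    parent[root] = root
    depth = {root: 0}
    outbox = [(root, q) for q in adj[root]]   # M messages sent before round 1
    rnd = 0
    while outbox:
        rnd += 1
        # Delivery: every message currently in an outbuf is delivered.
        inbox = {}
        for src, dst in outbox:
            inbox.setdefault(dst, []).append(src)
        # Computation: every processor takes one step on what it received.
        outbox = []
        for p, senders in inbox.items():
            if parent[p] is None:
                parent[p] = min(senders)      # arbitrary choice among senders
                depth[p] = rnd
                outbox += [(p, q) for q in adj[p] if q != parent[p]]
    return parent, depth

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
parent, depth = sync_flood(adj, 0)
print(depth)   # {0: 0, 1: 1, 2: 1, 3: 2}
```

In this example every depth equals the graph distance from node 0, which is the BFS property.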
80Spanning Tree Construction on the Synchronous
Case 1/2
[Figure: rounds 1 and 2 of the synchronous construction from pr; edges carry messages labeled M and P]
81Spanning Tree Construction on the Synchronous
Case 2/2
[Figure: rounds 3 and 4 of the synchronous construction from pr; edges carry messages labeled M and P]
82Constructing a Depth First spanning Tree for a
Specified Root
- DFS (depth-first search) tree
- Adding one node at a time
[Figure: DFS tree with root r]
83Flooding to Construct DFS Spanning Tree (Pseudo
Code) for Processor pi, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, unexplored = all neighbors of pi
- Upon receiving no message // root wakes up
  - if pi = pr and parent = nil then
    - parent := pi
    - explore()
84Flooding to Construct DFS Spanning Tree (Pseudo
Code) for Processor pi, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, unexplored = all neighbors of pi
- procedure explore()
  - if unexplored ≠ ∅ then
    - let pk be a processor in unexplored
    - remove pk from unexplored
    - send ⟨M⟩ to pk
  - else
    - if parent ≠ pi then send ⟨parent⟩ to parent
    - terminate
85Flooding to Construct DFS Spanning Tree (Pseudo
Code) for Processor pi, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, unexplored = all neighbors of pi
- Upon receiving ⟨M⟩ from neighbor pj
  - if parent = nil then
    - parent := pj
    - remove pj from unexplored
    - explore()
  - else
    - send ⟨already⟩ to pj
    - remove pj from unexplored
86Flooding to Construct DFS Spanning Tree (Pseudo
Code) for Processor pi, 0 ≤ i ≤ n-1
- Initially parent = nil, children = ∅, unexplored = all neighbors of pi
- Upon receiving ⟨parent⟩ from neighbor pj
  - add pj to children
  - explore()
- Upon receiving ⟨already⟩ from neighbor pj
  - explore()
87Constructing a Depth First Spanning Tree for a
Specified Root
- Message complexity
- Number of edges is m
- Each processor sends ⟨M⟩ at most once on each adjacent edge
- We get at most 2m messages
- Each processor sends at most one of ⟨parent⟩ or ⟨already⟩ on each adjacent edge
- Here too we get at most 2m messages
- Thus the total is at most 4m messages
- Time complexity is O(m)
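The DFS construction for a specified root can be simulated to check the 4m bound on a concrete graph. The Python sketch below is illustrative only: the function name dfs_spanning_tree and the example graph are assumptions, and the next unexplored neighbor is always the first one in the adjacency list, which is just one of the allowed choices.

```python
def dfs_spanning_tree(adj, root):
    """Simulate the DFS construction; returns (parent, children, messages sent)."""
    parent = {p: None for p in adj}
    children = {p: [] for p in adj}
    unexplored = {p: list(adj[p]) for p in adj}
    in_transit = []                      # (sender, receiver, kind)
    sent = 0

    def send(src, dst, kind):
        nonlocal sent
        in_transit.append((src, dst, kind))
        sent += 1

    def explore(p):
        if unexplored[p]:
            send(p, unexplored[p].pop(0), "M")
        elif parent[p] != p:
            send(p, parent[p], "parent")
        # else: the root has explored every neighbor and terminates

    parent[root] = root                  # root wakes up
    explore(root)
    while in_transit:                    # only one message is ever in transit
        src, p, kind = in_transit.pop()
        if kind == "M":
            if parent[p] is None:
                parent[p] = src
                unexplored[p].remove(src)
                explore(p)
            else:
                send(p, src, "already")
                if src in unexplored[p]:
                    unexplored[p].remove(src)
        elif kind == "parent":
            children[p].append(src)
            explore(p)
        else:                            # "already"
            explore(p)
    return parent, children, sent

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}   # m = 5 edges
parent, children, sent = dfs_spanning_tree(adj, 0)
print(parent, sent)   # {0: 0, 1: 0, 2: 1, 3: 2} 10
```

Because the search is sequential, the messages cannot overlap in time, which is why the time complexity grows with m rather than with the diameter.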
88Constructing DFS Spanning Tree without a
Specified Root
- We assume that nodes have unique identifiers (natural numbers)
- Each processor that wakes up attempts to build a DFS tree with itself as root
- If two DFS trees try to connect to the same node, the node joins the DFS tree whose root has the higher identifier
89Constructing DFS Spanning Tree without a
Specified Root
- Each node keeps the maximal identifier it has seen so far in a variable leader
- When a node wakes up, it sets leader to its own identifier
- When a node receives a DFS message with identifier y
- If y > leader, the node changes leader to y and sets parent to the node from which the message was received
- If y = leader, the node already belongs to this spanning tree
- If y < leader, no messages are sent
90Flooding to Construct DFS Spanning Tree (Pseudo
Code) for Processor pi, 0 ≤ i ≤ n-1
- Initially parent = nil, leader = -1, children = ∅, unexplored = all neighbors of pi
- Upon receiving no message // wakes up spontaneously
  - if parent = nil then
    - leader := id
    - parent := pi
    - explore()
91Flooding to Construct DFS Spanning Tree (Pseudo
Code) for Processor pi, 0 ≤ i ≤ n-1
- Initially parent = nil, leader = -1, children = ∅, unexplored = all neighbors of pi
- procedure explore()
  - if unexplored ≠ ∅ then
    - let pk be a processor in unexplored
    - remove pk from unexplored
    - send ⟨leader, leader⟩ to pk
  - else
    - if parent ≠ pi then send ⟨parent, leader⟩ to parent
    - else terminate as root of spanning tree
92Flooding to Construct DFS Spanning Tree (Pseudo
Code) for Processor pi, 0 ≤ i ≤ n-1
- Initially parent = nil, leader = -1, children = ∅, unexplored = all neighbors of pi
- Upon receiving ⟨leader, newId⟩ from neighbor pj
  - if leader < newId then
    - leader := newId
    - parent := pj; children := ∅
    - unexplored := all neighbors of pi except pj
    - explore()
  - else if leader = newId then
    - send ⟨already, leader⟩ to pj
    - remove pj from unexplored
93Flooding to Construct DFS Spanning Tree (Pseudo
Code) for Processor pi, 0 ≤ i ≤ n-1
- Initially parent = nil, leader = -1, children = ∅, unexplored = all neighbors of pi
- Upon receiving ⟨parent, newId⟩ from neighbor pj
  - if newId = leader then
    - add pj to children
    - explore()
- Upon receiving ⟨already, newId⟩ from neighbor pj
  - if newId = leader then explore()
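The four handlers of the rootless DFS construction can be put together in one simulation. The Python sketch below rests on several assumptions that are not in the slides: the name dfs_elect, identifiers equal to processor indices, all spontaneous wake-ups happening before any delivery, and a seeded random delivery order as the asynchronous schedule.

```python
import random

def dfs_elect(adj, wakers, seed=0):
    """Simulate DFS tree construction without a specified root.

    Returns (parent, children, leader, roots), where roots lists the
    processors that terminated as root of a spanning tree.
    """
    rng = random.Random(seed)
    parent = {p: None for p in adj}
    leader = {p: -1 for p in adj}
    children = {p: set() for p in adj}
    unexplored = {p: list(adj[p]) for p in adj}
    in_transit = []              # (sender, receiver, kind, identifier)
    roots = []

    def explore(p):
        if unexplored[p]:
            q = unexplored[p].pop(0)
            in_transit.append((p, q, "leader", leader[p]))
        elif parent[p] != p:
            in_transit.append((p, parent[p], "parent", leader[p]))
        else:
            roots.append(p)      # terminate as root of the spanning tree

    for p in wakers:             # spontaneous wake-ups
        if parent[p] is None:
            leader[p], parent[p] = p, p
            explore(p)

    while in_transit:            # deliver in an arbitrary order
        src, p, kind, new_id = in_transit.pop(rng.randrange(len(in_transit)))
        if kind == "leader":
            if leader[p] < new_id:        # join the tree with the larger root
                leader[p], parent[p] = new_id, src
                children[p] = set()
                unexplored[p] = [q for q in adj[p] if q != src]
                explore(p)
            elif leader[p] == new_id:     # already in this tree
                in_transit.append((p, src, "already", leader[p]))
                if src in unexplored[p]:
                    unexplored[p].remove(src)
            # else: new_id < leader, so the smaller DFS stalls here
        elif new_id == leader[p]:         # reply for the current tree only
            if kind == "parent":
                children[p].add(src)
            explore(p)
    return parent, children, leader, roots

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
parent, children, leader, roots = dfs_elect(adj, wakers=[0, 1, 3])
print(roots, leader)   # [3] {0: 3, 1: 3, 2: 3, 3: 3}
```

Only the search started by the highest identifier among the processors that woke up is never stalled, so it alone runs to completion and its initiator becomes the root.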
94Next Lecture