Title: Broadcast and scatter algorithms on mesh-based topologies
1Broadcast and scatter algorithms on mesh-based topologies
2Lecture outline
- Model definition and lower bounds
- One-to-All Broadcast algorithm on hypercube
- All-to-All Broadcast algorithm on hypercube
- One-to-All Scatter algorithm on hypercube
- All-to-All Scatter algorithm on hypercube
- Broadcast algorithms on torus
3Model definition
- Number of ports (1-port, k-port, all-port)
- Directedness of channels (simplex, half-duplex, full-duplex)
- Switching technique (store-and-forward (SF) or wormhole (WH))
- Packet manipulation capability (combining model, noncombining model, intermediate reception capability)
4Lower bounds for basic communication algorithms on hypercube
All-port, full-duplex, SF model
5Lower bounds for basic communication algorithms on hypercube
- O-A-B cannot take fewer rounds than the diameter of the hypercube, which is $d$. Since the broadcast packet has to be delivered to $2^d - 1$ distinct nodes, the number of transmissions is at least $2^d - 1$.
- During A-A-B, each node has to receive $2^d - 1$ distinct packets. A node can receive at most $d$ packets in one round (on all its $d$ links). Thus, at least $\lceil (2^d - 1)/d \rceil$ rounds are needed, and the number of transmissions is at least $2^d$ times the number of transmissions of one O-A-B: $2^d (2^d - 1)$.
- During O-A-S, the source has to send a total of $2^d - 1$ distinct packets and it can send at most $d$ packets in one round, so at least $\lceil (2^d - 1)/d \rceil$ rounds are needed. There are $\binom{d}{k}$ nodes at distance $k$ from the source, so the number of transmissions cannot be less than $\sum_{k=1}^{d} k \binom{d}{k} = d\,2^{d-1}$.
- A-A-S consists of $2^d$ O-A-Ss running concurrently, so the number of transmissions is $2^d \cdot d\,2^{d-1} = d\,2^{2d-1}$. In every round at most $d\,2^d$ packets can be delivered on the hypercube (one per directed link). Thus it will take at least $d\,2^{2d-1} / (d\,2^d) = 2^{d-1}$ rounds.
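The counting arguments above can be turned into a small calculator. The following is only a sketch: it assumes the closed forms $2^d - 1$, $\lceil (2^d - 1)/d \rceil$, $d\,2^{d-1}$ and $2^{d-1}$ that follow from those arguments, and the function name `hypercube_lower_bounds` is made up for illustration.

```python
from math import comb


def hypercube_lower_bounds(d: int) -> dict:
    """Lower bounds (rounds, transmissions) on a d-cube.

    Assumes the all-port, full-duplex, store-and-forward model.
    """
    n = 2 ** d                                   # number of nodes
    ceil_div = (n - 1 + d - 1) // d              # ceil((2^d - 1) / d)
    # One scatter: a packet for a node at distance k needs k transmissions.
    scatter_tx = sum(k * comb(d, k) for k in range(1, d + 1))   # d * 2^(d-1)
    return {
        "O-A-B": (d, n - 1),
        "A-A-B": (ceil_div, n * (n - 1)),
        "O-A-S": (ceil_div, scatter_tx),
        # d * 2^d directed links bound the packets delivered per round.
        "A-A-S": ((n * scatter_tx) // (d * n), n * scatter_tx),
    }
```

For example, `hypercube_lower_bounds(4)["A-A-S"]` gives 8 rounds and 512 transmissions for the 4-cube.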
6Basic communication algorithms on hypercube
- The idea: build a minimal-weight spanning tree rooted at the broadcasting/scattering node.
- In general, building a minimal-weight spanning tree on a hypercube is an easy task.
- In scatter algorithms the tree must also be balanced.
7One-to-All Broadcast algorithm on hypercube
8One-to-All Broadcast algorithm on hypercube
- The idea: at every round k, communicate only on the links of dimension k.
- The algorithm (for any node i)
  - toSend ← true, Msg contains the value: for the broadcasting node
  - toSend ← false, Msg is null: for all other nodes
  - At round k do
    - If toSend = true then send Msg on link k
    - Else if received message M then
      - Msg ← M
      - toSend ← true
  - End of round
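The round structure above can be simulated centrally. This is a minimal sketch, not the lecture's code: nodes are assumed to be the integers 0 .. 2^d - 1, "link k" is assumed to flip bit k-1 of the node label, and the function name is illustrative.

```python
def one_to_all_broadcast(d: int, source: int = 0):
    """Simulate the dimension-by-dimension broadcast on a d-cube.

    Returns (set of informed nodes, number of transmissions).
    """
    has_msg = {source}                 # nodes whose toSend flag is true
    transmissions = 0
    for k in range(1, d + 1):          # round k uses only dimension-k links
        senders = list(has_msg)        # snapshot: receivers start sending next round
        for node in senders:
            neighbour = node ^ (1 << (k - 1))   # other end of the k-th link
            has_msg.add(neighbour)
            transmissions += 1
    return has_msg, transmissions
```

Calling `one_to_all_broadcast(4)` informs all 16 nodes with 15 transmissions in 4 rounds, because the number of informed nodes doubles in every round.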
9One-to-All Broadcast algorithm on hypercube
[Figure sequence, slides 9-14: the 16 nodes (0000) to (1111) of the 4-cube ($d = 4$), shown at k = 0, 1, 2, 3, 4 and at the end of the last round.]
15One-to-All Broadcast algorithm on hypercube
- A tree built in this way is called a d-level spanning binomial tree.
- Complexity analysis
  - # of rounds: $d$
  - # of transmissions: $2^d - 1$
- The algorithm is optimal!
- Note: the algorithm does not assume the all-port, full-duplex model. So the algorithm, even on the 1-port, half-duplex model, meets the lower bound of the all-port, full-duplex model.
16All-to-All Broadcast algorithm on hypercube
17All-to-All Broadcast algorithm on hypercube
- The start: every node builds its spanning tree. Denote by $T_i$ the spanning tree for node i (the root of $T_i$ is i). The tree consists of a sequence of disjoint sets of directed links $A_1^i, \dots, A_q^i$, where
  - q is the number of rounds needed to broadcast a packet on $T_i$
  - $A_k^i$ is the set of links on which a packet is transmitted at round k.
- In effect, the sequence $A_1^i, \dots, A_q^i$ defines an algorithm for broadcasting a packet from node i.
18All-to-All Broadcast algorithm on hypercube
[Figure sequence, slides 18-22: all-to-all broadcast on the 3-cube ($d = 3$), nodes (000) to (111), shown at k = 0, 1, 2, 3 and at the end of the last round.]
23All-to-All Broadcast algorithm on hypercube
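- Denote by T the spanning tree with root (0,..,0), and by $A_1, \dots, A_q$ its sets of links accordingly.
- Lemma: for every node t, the sequence of sets $A_k^t = \{(x \oplus t,\, y \oplus t) : (x, y) \in A_k\}$, $k = 1, \dots, q$, defines an algorithm for broadcasting a packet from node t. (The proof is based on the fact that $x \oplus t$ and $y \oplus t$ differ in a particular bit if and only if x and y differ in the same bit, so $(x \oplus t, y \oplus t)$ is a link if and only if (x,y) is a link.)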
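The translation argument of the lemma can be checked mechanically. The sketch below is an illustration only: it uses the binomial broadcast tree rooted at node 0 as a stand-in for T (any spanning tree given as per-round link sets would do), and all function names are hypothetical.

```python
def is_link(x: int, y: int) -> bool:
    """Two nodes are adjacent in the hypercube iff they differ in exactly one bit."""
    diff = x ^ y
    return diff != 0 and diff & (diff - 1) == 0


def link_type(x: int, y: int) -> int:
    """1-based position (from the right) of the bit in which x and y differ."""
    return (x ^ y).bit_length()


def binomial_rounds(d: int):
    """Per-round link sets of the binomial broadcast tree rooted at node 0."""
    rounds, informed = [], [0]
    for k in range(d):
        links = [(x, x ^ (1 << k)) for x in informed]
        informed += [y for _, y in links]
        rounds.append(links)
    return rounds


def translate(rounds, t: int):
    """XOR both endpoints of every link with t, keeping the round structure."""
    return [[(x ^ t, y ^ t) for x, y in links] for links in rounds]
```

Translating by any t keeps every pair a hypercube link of the same type, so the translated rounds form a broadcast schedule rooted at t.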
24All-to-All Broadcast algorithm on hypercube
[Figure: a spanning tree on the 3-cube, nodes (000) to (111), and its translation obtained by XOR-ing every node label with (010).]
25All-to-All Broadcast algorithm on hypercube
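- In order to broadcast simultaneously, the following property should hold for every two trees $T_i$, $T_j$ with $i \ne j$:
- their round-k link sets are disjoint, $A_k^i \cap A_k^j = \emptyset$ for every k   (*)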
26All-to-All Broadcast algorithm on hypercube
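- Now we show how to construct sets $A_1, \dots, A_q$ for which property (*) holds.
- Definition: a link (x,y) is of type j if x and y differ in the j-th bit.
- Observation: if for each k the links in $A_k$ are of different types, then for each k the translated sets $A_k^t$, where t ranges over all nodes, are disjoint. (If for $t \ne t'$ the links $(x \oplus t, y \oplus t)$ and $(x' \oplus t', y' \oplus t')$, with $(x,y), (x',y') \in A_k$, were the same link, then $(x,y) \ne (x',y')$ (since $t \ne t'$), and they would be of the same type because $(x,y)$ and $(x',y')$ are of the same type as $(x \oplus t, y \oplus t)$ and $(x' \oplus t', y' \oplus t')$ respectively. This contradicts the fact that $(x,y)$ and $(x',y')$ both belong to $A_k$.)
- Conclusion: we need to construct $A_1, \dots, A_q$ such that for every i, the links in $A_i$ are of different types.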
27All-to-All Broadcast algorithm on hypercube
- Step I: Dividing
- Define $N_k$ as the set of d-bit nodes having k unity bits and d-k zero bits.
- For each set $N_k$ define disjoint subsets $R_{k1}, \dots, R_{kn_k}$, which are the equivalence classes under single-bit rotation to the left.
- For each k, $R_{k1}$ is the equivalence class of the element whose k rightmost bits are unity.
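The division step can be sketched in a few lines. The code below is an assumption-laden illustration: node labels are taken to be integers read as d-bit strings, and the first class returned for each k is assumed to be the one containing the node whose k rightmost bits are 1; the names `rotate_left` and `rotation_classes` are made up here.

```python
def rotate_left(t: int, d: int) -> int:
    """Rotate the d-bit string of t cyclically one position to the left."""
    return ((t << 1) | (t >> (d - 1))) & ((1 << d) - 1)


def rotation_classes(d: int, k: int):
    """Partition the d-bit nodes with exactly k unity bits into rotation classes.

    Each class is listed in rotation order, starting from its first member found.
    """
    nodes = [t for t in range(2 ** d) if bin(t).count("1") == k]
    first = (1 << k) - 1                     # k rightmost bits set (assumed class 1)
    nodes.sort(key=lambda t: t != first)     # stable sort: 'first' moves to the front
    seen, classes = set(), []
    for t in nodes:
        if t in seen:
            continue
        cls, x = [], t
        while x not in seen:                 # follow left rotations until the cycle closes
            seen.add(x)
            cls.append(x)
            x = rotate_left(x, d)
        classes.append(cls)
    return classes
```

For d = 4 and k = 2 this yields the classes {0011, 0110, 1100, 1001} and {0101, 1010}; a class can have fewer than d members when the bit pattern is periodic.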
28All-to-All Broadcast algorithm on hypercube
Division of space 0,..,31
29All-to-All Broadcast algorithm on hypercube
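- Step II: Numbering
- Associate with each node t a distinct number n(t), in the following order:
  - $(0,..,0),\ R_{11},\ R_{21}, \dots, R_{2n_2},\ \dots,\ R_{k1}, \dots, R_{kn_k},\ \dots,\ (1,..,1)$
- Define the sequence of numbers $m(t) = 1 + [(n(t) - 1) \bmod d]$.
- Thus, the values m(t) corresponding to n(t) = 0, 1, 2, ... are d, 1, 2, .., d, 1, 2, ..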
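A one-line sketch of the mapping from n(t) to m(t). It assumes that the numbering starts at n = 0 for node (0,..,0) and that m = 1 + ((n - 1) mod d), which is the rule that reproduces the pattern d, 1, 2, .., d, 1, 2 quoted above; the function name is illustrative.

```python
def m_from_n(n: int, d: int) -> int:
    """Link type assigned to the node numbered n (assumed rule, see lead-in)."""
    # Python's % always returns a non-negative result, so n = 0 maps to d.
    return 1 + (n - 1) % d
```

With d = 4, the numbers n = 0, 1, 2, 3, 4, 5, 6 map to m = 4, 1, 2, 3, 4, 1, 2.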
30All-to-All Broadcast algorithm on hypercube
Numbering of space 0,..,31
31All-to-All Broadcast algorithm on hypercube
- Step III Ordering
- The first element t in every $R_{kn}$ must fulfill: the bit in position m(t) from the right is 1.
- Every element t in $R_{k1}$ must fulfill: the bit in position m(t)-1 (if m(t) > 1) or d (if m(t) = 1) from the right is 0.
32All-to-All Broadcast algorithm on hypercube
- Step IV Constructing
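- For $k = 1, \dots, q$, define the node sets
  - $\tilde{N}_k = \{t : (k-1)d + 1 \le n(t) \le kd\}$, with $\tilde{N}_0 = \{(0,..,0)\}$
- For each k, the set $A_k$ consists of the links that connect the nodes $t \in \tilde{N}_k$ with the corresponding nodes obtained from t by reversing the bit in position m(t). (In particular, the nodes in $R_{k1}$ are connected to the corresponding nodes in $R_{(k-1)1}$ because of the ordering.)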
33All-to-All Broadcast algorithm on hypercube
- Step V Building routing table (for node k)
- In the routing table, rows are rounds and columns are links.
- Every entry contains only the source node.
- Every row i is constructed from the set $A_i$ by creating an entry x, in the column of the link's type, for every link (x,y).
- In the first row the source is k itself and the destination nodes are the neighbors of k.
34All-to-All Broadcast algorithm on hypercube
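Values of m(t) for $d = 4$ (tree rooted at (0000)):
- m(0001) = 1, m(0010) = 2, m(0100) = 3, m(1000) = 4
- m(0011) = 1, m(0110) = 2, m(1100) = 3, m(1001) = 4
- m(0101) = 1, m(1010) = 2
- m(1101) = 3, m(1011) = 4, m(0111) = 1, m(1110) = 2
- m(1111) = 3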
35All-to-All Broadcast algorithm on hypercube
Routing table
36All-to-All Broadcast algorithm on hypercube
[Figure sequence, slides 36-41: all-to-all broadcast on the 4-cube ($d = 4$), nodes (0000) to (1111), shown at k = 0, 1, 2, 3, 4 and at the end of the last round.]
42All-to-All Broadcast algorithm on hypercube
Algorithm (for any node i)
- At the initialization
  - Build the routing table as shown before
  - Put d packets into the queue (the packet header includes the source and destination nodes).
- At round k do
  - Send all messages in the queue that should be sent according to the routing table.
  - Receive messages and put them into the queue
- End of round
43All-to-All Broadcast algorithm on hypercube
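- Complexity analysis
  - Each of the node sets of rounds 1, .., q-1 has exactly d elements, and the last one has at most d. Therefore
  - # of rounds: $q = \lceil (2^d - 1)/d \rceil$
  - # of transmissions: $2^d (2^d - 1)$
- The algorithm is optimal!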
- Note We didn't prove correctness of
building - contains links only of different types
(for every k) - every link holds
(for every k)
actually, we have to prove only that
44One-to-All Scatter algorithm on hypercube
45One-to-All Scatter algorithm on hypercube
- Observation I: the A-A-B and O-A-S problems are similar, but here we need to build a balanced spanning tree for the scattering node: d sub-trees on the first level, each of size at most $\lceil (2^d - 1)/d \rceil$.
- Observation II: for any tree T with root t and size n, the scatter problem via a single link to the root is resolved in n rounds (by giving priority to the furthest nodes).
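Observation II can be checked with a small store-and-forward simulation. This is a sketch under stated assumptions: the tree is given as a hypothetical `parent` dictionary (the root maps to `None`), the source sits outside the tree and reaches the root over one link, and every link carries at most one packet per round.

```python
def scatter_rounds(parent: dict) -> int:
    """Rounds needed to scatter one packet to every tree node, furthest first."""
    def path_from_root(v):
        path = []
        while v is not None:
            path.append(v)
            v = parent[v]
        return path[::-1]                       # root, ..., v

    # Packets still at the source, ordered by decreasing distance from the root.
    waiting = sorted((path_from_root(v) for v in parent), key=len, reverse=True)
    in_flight = []                              # (route, index of the current holder)
    rounds = 0
    while waiting or in_flight:
        rounds += 1
        used_links, moved = set(), []
        for route, pos in in_flight:
            link = (route[pos], route[pos + 1])
            if link in used_links:
                moved.append((route, pos))      # link busy: wait one round
            else:
                used_links.add(link)
                moved.append((route, pos + 1))
        if waiting:
            moved.append((waiting.pop(0), 0))   # one packet crosses the link into the root
        # Packets that reached their destination leave the network.
        in_flight = [(r, p) for r, p in moved if p < len(r) - 1]
    return rounds
```

For the 5-node tree `{0: None, 1: 0, 2: 0, 3: 1, 4: 3}` the function returns 5: the packets pipeline down the tree without ever competing for a link, and the root's own packet arrives last.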
46One-to-All Scatter algorithm on hypercube
- Definition: class $R_{(k-1)n'}$ is compatible with class $R_{kn}$ if $R_{(k-1)n'}$ has d nodes and there exist two nodes $t' \in R_{(k-1)n'}$, $t \in R_{kn}$ such that $t'$ is obtained from t by reversing some unity bit of t to a zero bit.
- Observation III: for every class $R_{kn}$ there exists a compatible class $R_{(k-1)n'}$.
- (Take any node $t \in R_{kn}$ whose rightmost bit is a 1 and whose leftmost bit is a 0. Let s be the longest string of consecutive zeros in t, and let $t'$ be the node obtained from t by reversing the unity bit immediately to the right of s. Then the equivalence class of $t'$ is compatible with $R_{kn}$.)
47One-to-All Scatter algorithm on hypercube
- Algorithm for scattering node (0,..,0)
- Steps I-II as in the A-A-B algorithm.
- Step IV: Constructing the spanning tree
  - Start with an empty tree T (only nodes).
  - Add to T the links connecting (0,..,0) with all the nodes in $R_{11}$. The first node in $R_{11}$ is chosen arbitrarily.
  - For each class $R_{kn}$ (in ascending order) find its compatible class $R_{(k-1)n'}$ and add the corresponding links to the tree T.
  - Add to T links, connecting (1,..,1) with all the
nodes in .
48One-to-All Scatter algorithm on hypercube
- Observation IV: for any $t \in R_{kn}$ we have $m(t) = m(t')$, where $t' \in R_{(k-1)n'}$ and $t'$ can be obtained from t by reversing some unity bit of t to 0.
- Conclusion: each node x (except (0,..,0)) is in the sub-tree $T_{m(x)}$. Since there are at most $\lceil (2^d - 1)/d \rceil$ nodes x having the same value m(x), each sub-tree contains at most $\lceil (2^d - 1)/d \rceil$ nodes. Therefore the tree T is balanced.
49One-to-All Scatter algorithm on hypercube
Algorithm (for scattering node i)
- Node i
  - At the initialization
    - Build the spanning tree.
    - Distribute the tree structure (optional).
  - At round k do
    - On every link l, send the message for a node at distance q-k on the sub-tree $T_l$. (q )
  - End of round
50One-to-All Scatter algorithm on hypercube
Algorithm (for scattering node i)
- Any other node j
  - At the initialization
    - Receive and forward the tree structure (optional).
  - At any round do
    - On receiving a message, store it (if the message target is j) or forward it according to the spanning tree structure.
  - End of round
51One-to-All Scatter algorithm on hypercube
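Sub-trees of the balanced spanning tree rooted at (0000), $d = 4$, grouped by m(t):
- m(t) = 1: (0001), (0011), (0101), (1011)
- m(t) = 2: (0010), (0110), (0111), (1010)
- m(t) = 3: (0100), (1100), (1110), (1111)
- m(t) = 4: (1000), (1001), (1101)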
52One-to-All Scatter algorithm on hypercube
[Figure: the 16 nodes (0000) to (1111) of the 4-cube.]
53One-to-All Scatter algorithm on hypercube
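- Complexity analysis
  - There are $\binom{d}{k}$ nodes at distance k from the root.
  - # of rounds: $\lceil (2^d - 1)/d \rceil$
  - # of transmissions: $\sum_{k=1}^{d} k \binom{d}{k} = d\,2^{d-1}$
  - Time and transmission overhead for distributing the spanning tree structure.
- The algorithm is optimal!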
54All-to-All Scatter algorithm on hypercube
55All-to-All Scatter algorithm on hypercube
- The idea: Divide-and-Conquer
- Recursive construction in 3 phases.
- Decompose the d-cube into two (d-1)-cubes $C_1$ and $C_2$.
- Assume (without loss of generality) that nodes $0, \dots, 2^{d-1} - 1$ belong to $C_1$ and nodes $2^{d-1}, \dots, 2^d - 1$ belong to $C_2$.
- I. A-A-S recursive algorithm on $C_1$ and $C_2$ simultaneously.
- II. Each node in $C_1$ ($C_2$) transmits to its counterpart node in $C_2$ ($C_1$) all of the packets that are destined to the nodes in $C_2$ ($C_1$), giving priority to the furthest nodes.
- III. A-A-S recursive algorithm on $C_1$ and $C_2$ simultaneously.
56All-to-All Scatter algorithm on hypercube
d-cube
57All-to-All Scatter algorithm on hypercube
Phase I
58All-to-All Scatter algorithm on hypercube
Phase II
59All-to-All Scatter algorithm on hypercube
Phase III
60All-to-All Scatter algorithm on hypercube
- # of rounds
  - $T_d = 2T_{d-1} + 2^{d-1}$, $T_1 = 1$, hence $T_d = d\,2^{d-1}$
- # of transmissions: $d\,2^{2d-1}$
- The algorithm is not optimal!
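The round count of the three sequential phases can be checked numerically. The sketch assumes that Phases I and III each cost the (d-1)-cube time and that Phase II costs $2^{d-1}$ rounds, since every node pushes $2^{d-1}$ packets over a single link; the function name is illustrative.

```python
def rounds_sequential(d: int) -> int:
    """Rounds of the recursive total exchange when the phases run one after another.

    Assumed recurrence: T_d = 2 * T_{d-1} + 2**(d-1), with T_1 = 1.
    """
    if d == 1:
        return 1                      # the two nodes of a 1-cube swap one packet each
    return 2 * rounds_sequential(d - 1) + 2 ** (d - 1)
```

The recurrence solves to $d\,2^{d-1}$, which is a factor d above the $2^{d-1}$ rounds that the link-capacity argument allows.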
61All-to-All Scatter algorithm on hypercube
- Solution: perform Phase II simultaneously with Phases I and III.
- During Phases I and III, the links that the algorithm uses in Phase II are not used.
- Phases I and III take time $T_{d-1}$ each.
- Conclusion: the first half of Phase II can be performed simultaneously with Phase I, and the second half of Phase II simultaneously with Phase III.
- $T_d = 2T_{d-1}$, $T_1 = 1$, hence $T_d = 2^{d-1}$
62All-to-All Scatter algorithm on hypercube
- Rule to transmit packet at Phase II
- Denote by the time unit during
which node i transmits its own packet that is
destined to node j, in this algorithm on d-cube. - Each node transmits its packets to node
in next order the
packet destined for
is transmitted before the packet
destined to if
. - It is also an order, in which the latter node
forwards them in Phases III . - Each node transmits its packet
destined for node
last.
63All-to-All Scatter algorithm on hypercube
- Lemma Under the above rules Phases III proceed
uninterrupted. - Proof Denote by the number of its
own packets that node i has transmitted up to
time n (including n). In particular, it is also
number of packets of node that must be
transmitted by node during
the first n time units of Phases III . - We observe that is the
number of received by node
packets, from node after first n time
units of Phases III. - Now, by induction on d, we prove that for every
d, there exists total exchange algorithm for the
d-cube satisfies
and
64All-to-All Scatter algorithm on hypercube
- Result
  - # of rounds: $2^{d-1}$
  - # of transmissions: $d\,2^{2d-1}$ (unchanged)
- The algorithm is now optimal!
65All-to-All Scatter algorithm on hypercube
- Problem: the Divide-and-Conquer solution is now not applicable.
- Solution: implementation of an iterative algorithm (for any node i)
66All-to-All Scatter algorithm on hypercube
- Definitions
- Write node as binary number .
- Denote by , node .
- Denote by reverse of bit .
- Link between i and called k-th link of
i.
67All-to-All Scatter algorithm on hypercube
- I. Sending its own packets
- During rounds , node i transmits all
its packets for nodes
(where
- any bits) through its k-th link. - The order of sending packets on k-th link is
according to distance from node i. - Packet for node is sent last.
- The exact order derived from a table, that can be
built iteratively (by combining previous
columns).
68All-to-All Scatter algorithm on hypercube
$d = 4$
For node i = (0000)
69All-to-All Scatter algorithm on hypercube
[Figure sequence, slides 69-74: all-to-all scatter on the 3-cube ($d = 3$), nodes (000) to (111), shown at k = 0, 1, 2, 3, 4 and at the end of the last round, with labels such as m(001), m(011), m(101) marking messages.]
75All-to-All Scatter algorithm on hypercube
- II. Forwarding arriving packets
- A packet, destined for node
is placed in queue which contains
packets to be transmitted by i through the
-th link, where - Packets from different source nodes
that placed in same queue, are ordered according
to the lexicographic order between and
. - At k-th link forwarding packets start at round
76All-to-All Scatter algorithm on hypercube
- Notes
- Any node i can build such a table by XOR-ing all entries with i. Thus any packet route can be calculated locally.
- The presented rules for transmitting packets fulfill the properties demanded earlier in the recursive construction.
77Broadcast algorithms on torus
78Broadcast algorithms on torus
- Model: all-port, full-duplex, WH
- Lower bound: $\lceil \log_5 N \rceil$ rounds for a torus of N nodes
- Intermediate research results
  - Recursive tiling approach
  - Dilated-diagonal-based approach
  - Both are applicable only to two-dimensional tori.
79Broadcast algorithms on torus
- Assume a $5^k \times 5^k$ torus
- Look at the torus as a mosaic of constant patterns
80Recursive tiling approach
81Recursive tiling approach
82Recursive tiling approach
83Recursive tiling approach
84Recursive tiling approach
5 B-Patterns, k = 1
85Recursive tiling approach
A-Pattern, k = 1
86Recursive tiling approach
5 A-Patterns
87Recursive tiling approach
B-Pattern
88Recursive tiling approach
5 B-Patterns, k = 2
89Recursive tiling approach
A-Pattern, k = 2
90Recursive tiling approach
- Observation: a $5^k \times 5^k$ torus, for any k, can be represented as a single A-Pattern (every A-Pattern consists of 5 B-Patterns).
- Define A-sons for nodes that transmit at any odd round i, and B-sons for nodes that transmit at any even round i. (The broadcaster has A-sons (B-sons) for every odd (even) i. Its A-sons have B-sons for every even i, and A-sons for i = 3, 5, .., k (or k-1). And so on.)
91Recursive tiling approach
- The Algorithm (for any node t )
- toSend ← true, Msg contains the value: for the broadcasting node
- toSend ← false, Msg is null: for all other nodes
- At round i ($1 \le i \le 2k$) do
  - If toSend = true and i is odd then
    - send Msg to all A-sons (5)
  - Else if toSend = true and i is even then
    - send Msg to all B-sons (5)
  - Else if received message M then
    - Msg ← M
    - toSend ← true
- End of round
92Recursive tiling approach
- Complexity analysis
- # of rounds: 2k (reminder: the torus has $N = 5^{2k}$ nodes, so $\log_5 N = 2k$)
- # of transmissions: every node receives the message only once, therefore it is $5^{2k} - 1$
- The algorithm is optimal!
- Note: the algorithm is applicable only to $5^k \times 5^k$ tori
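The round count can be compared with a simple counting bound. The sketch assumes that, in the all-port model on a two-dimensional torus, each informed node can inform at most 4 new nodes per round, so the informed set grows at most fivefold per round; the function name and the `ports` parameter are illustrative.

```python
def min_broadcast_rounds(num_nodes: int, ports: int = 4) -> int:
    """Smallest r with (ports + 1)**r >= num_nodes (counting lower bound)."""
    rounds, informed = 0, 1            # only the broadcaster is informed at the start
    while informed < num_nodes:
        informed *= ports + 1          # every informed node informs 'ports' new ones
        rounds += 1
    return rounds
```

For a torus with $5^k \times 5^k = 25^k$ nodes the bound evaluates to 2k rounds, for example `min_broadcast_rounds(25 ** 2)` is 4.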
93Dilated-diagonal-based approach
- Assume a $5^k \times 5^k$ torus.
- The algorithm consists of 3 phases.
- Phase I: broadcast the packet into all rows so that each row contains exactly one packet (by recursively splitting the torus into 5 horizontal strips and sending the packet using XY routing).
- Phase I takes k rounds.
94Dilated-diagonal-based approach
95Dilated-diagonal-based approach
96Dilated-diagonal-based approach
97Dilated-diagonal-based approach
- Phase II: align the packets to the main diagonal in all rows in parallel.
- Phase II takes a single round.
98Dilated-diagonal-based approach
99Dilated-diagonal-based approach
- Phase III: decompose the torus into 5 diagonal bands (of width $5^{k-1}$).
- The main diagonal is the middle diagonal of the first band.
- Every node on the middle diagonal sends 4 packets to its counterpart nodes on the middle diagonals of the other 4 bands.
- All packets are sent on link-disjoint paths.
- Recursively divide each band into 5 bands and continue with the algorithm.
- Phase III takes k rounds.
100Dilated-diagonal-based approach
101Dilated-diagonal-based approach
102Dilated-diagonal-based approach
103Dilated-diagonal-based approach
- Complexity analysis
  - # of rounds: 2k+1
  - # of transmissions: every node receives the message only once, therefore it is $5^{2k} - 1$
- The algorithm is almost optimal!
- Note: the algorithm can be generalized and applied to an arbitrary 2-D torus, in which case it requires at most 5 rounds more than the lower bound.
104Acknowledgements
- D. P. Bertsekas, C. Ozveren, G. D. Stamoulis, P. Tseng, and J. N. Tsitsiklis, Optimal Communication Algorithms for Hypercubes
- Pavel Tvrdik, Topics in Parallel Computing, CS 838, University of Wisconsin-Madison
105The End