Title: Interconnection Network
1. Interconnection Network
- PRAM Model is too simple
- Physically, PEs communicate through the network
- (either buses or switching networks)
- Cost depends on network topology
- Question
- Should the user exploit the interconnection network topology?
- Does the user have the freedom to exploit the topology?
2. Mesh with Wraparound
3. Example: multiplying two n by n matrices on a mesh
- Initially,
- $PE_{i,j}$ has $x_{i,j} = a_{i,j}$ and $y_{i,j} = b_{i,j}$
- Row $i$ shifts its x data left $(i-1)$ times
- Column $j$ shifts its y data up $(j-1)$ times
- At each step,
- $PE_{i,j}$ does
- $c_{i,j} = c_{i,j} + x \cdot y$
- Send x to the left (wraparound), send y up (wraparound)
- How about transitive closure?
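The shift-and-multiply steps above can be simulated sequentially. The following is a minimal sketch, not the slides' own code: the function name `mesh_matmul` is hypothetical, indices are 0-based (so row i is shifted i times rather than i-1 times), and each list comprehension stands in for one parallel step across all PEs.

```python
def mesh_matmul(a, b):
    """Simulate the wraparound-mesh multiplication of two n x n matrices."""
    n = len(a)
    # Initial alignment: row i of x is shifted left i times,
    # column j of y is shifted up j times (0-based indices).
    x = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    y = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every PE(i,j) accumulates the product of its current x and y.
        for i in range(n):
            for j in range(n):
                c[i][j] += x[i][j] * y[i][j]
        # Send x to the left and y up, both with wraparound.
        x = [[x[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        y = [[y[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c


result = mesh_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)  # [[19, 22], [43, 50]]
```

After the alignment, PE(i,j) sees the pair a(i,k), b(k,j) for a different k at every step, so n steps are enough to accumulate the full inner product.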
4. Matrix Multiplication
Step 1
[Figure: 4 x 4 mesh holding the matrix elements a11 ... a44 and b11 ... b44]
5. Step 2: Rearrange Data
6. Step 3: Multiply, Add, and Move Data
Data move at cell $(i,k)$: $S = S + a_{ij} b_{jk}$
$c_{21} = a_{22}b_{21} + a_{21}b_{11} + a_{24}b_{41} + a_{23}b_{31}$
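7. Systolic Array Algorithm
[Figure: systolic array with staggered input streams of B and A]
B streams: b34 b24 b14 | b43 b33 b23 b13 | b42 b32 b22 b12 | b41 b31 b21 b11
A streams: a14 a13 a12 a11 | a24 a23 a22 a21 | a34 a33 a32 a31 | a43 a42 a41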
8. How can a wraparound mesh be simulated on a regular mesh without losing more than a constant factor in speed?
9. Tree Architecture
Applications: census functions, database, queue, stack
10. Tree Computation
- Census function: $a_1 + a_2 + \dots + a_n$
- Applications
- Can you compute
- $s_i = a_1 + a_2 + \dots + a_i$, for $i = 1, \dots, n$?
- Parallel prefix computation
- Bottleneck: every data item goes to the root
- How to solve: make the channels thicker toward the top of the tree => fat tree
11. Example: Parallel Prefix Computation
Step 1: Upward phase
For each node, when it receives data from its left and right children:
  sum = left + right
  if the node is not the root, send sum to its parent
When the root receives data from its left and right children:
  send 0 to its left child
  send left to its right child
12. Step 2: Downward phase
When a nonleaf node receives sum from its parent:
  send sum to its left child
  send left + sum to its right child
When a leaf node receives sum from its parent:
  prefix = sum + data
[Figure: prefix sums at the leaves: 1 3 6 10 15 21 28 36]
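The two phases can be traced with a small sequential simulation. This is a sketch under stated assumptions: the function name `tree_prefix_sums` and the heap-style array layout (node k has children 2k and 2k+1, leaves at positions n to 2n-1) are illustrative choices, and the input length is assumed to be a power of two.

```python
def tree_prefix_sums(data):
    """Prefix sums via an upward and a downward pass over a complete binary tree."""
    n = len(data)  # assumed to be a power of two
    # Upward phase: each internal node stores left + right.
    up = [0] * (2 * n)
    up[n:] = data
    for k in range(n - 1, 0, -1):
        up[k] = up[2 * k] + up[2 * k + 1]
    # Downward phase: down[k] is the value node k receives from its parent.
    # The root (k = 1) behaves as if it had received 0.
    down = [0] * (2 * n)
    for k in range(1, n):
        down[2 * k] = down[k]                   # send sum to the left child
        down[2 * k + 1] = up[2 * k] + down[k]   # send left + sum to the right child
    # Each leaf adds its own data to the value it received.
    return [down[n + i] + data[i] for i in range(n)]


print(tree_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 3, 6, 10, 15, 21, 28, 36]
```

Each loop iterates level by level in the array order, so on a real tree machine all nodes of one level would run in parallel and both phases take time proportional to the tree height.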
13. Disadvantages of Trees
- Small bisection width
- Root can be the bottleneck
14. Properties of Interconnection Networks
- Small diameter: $\mathrm{diam} = \max_{u,v \in V} d(u,v)$, where $d(u,v)$ is the distance between u and v
- Large bisection width
- Smallest number of edges whose removal divides G into two equal-size parts
- Fixed node degree
- Uniformity (symmetry)
- Graph looks the same regardless of the vertex you look from
- Incremental extendability: allow any size
- Scalable (graph): a larger one can be constructed easily
- i.e., a smaller one can be obtained from the larger one by removing some nodes and edges
- hypercube, mesh
- shuffle-exchange network, de Bruijn graph
- Routing and collective communication
- one to all, all to all
- Embeddability
- Simple layout complexity (=> small bisection width; conflicts with the large-bisection-width goal)
- Fault tolerance
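To make the diameter property concrete, the sketch below computes it by breadth-first search from every node and compares two 16-node topologies. The helper names `diameter`, `mesh`, and `hypercube` are hypothetical, the mesh has no wraparound links, and the graph is assumed to be connected.

```python
from collections import deque


def diameter(adj):
    """Largest shortest-path distance in a connected graph (adjacency dict)."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def mesh(rows, cols):
    """2D mesh without wraparound links."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            adj[(r, c)] = [(i, j) for i, j in candidates
                           if 0 <= i < rows and 0 <= j < cols]
    return adj


def hypercube(n):
    """n-dimensional hypercube: neighbors differ in exactly one address bit."""
    return {u: [u ^ (1 << k) for k in range(n)] for u in range(1 << n)}


print(diameter(mesh(4, 4)))    # 6
print(diameter(hypercube(4)))  # 4
```

Both graphs have 16 nodes and a maximum node degree of 4, yet the hypercube's diameter is smaller, and the gap widens as the node count grows.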
15. Fat Tree
CM-5 data link
16. Hypercube
- One way of solving the bottleneck of the tree and the large diameter of the mesh
- Recursively defined as follows
- $H_n$ = two copies of $H_{n-1}$ with corresponding nodes connected
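The recursive definition translates directly into code. This is an illustrative sketch: the function name `build_hypercube` and the choice to number the second copy's nodes by adding 2^(n-1) are assumptions, not something the slides specify.

```python
def build_hypercube(n):
    """Return the edge set of H_n on nodes 0 .. 2**n - 1."""
    if n == 0:
        return set()  # H_0 is a single node with no edges
    sub = build_hypercube(n - 1)
    half = 1 << (n - 1)
    edges = set(sub)                                       # first copy of H_(n-1)
    edges |= {(u + half, v + half) for (u, v) in sub}      # second copy, relabeled
    edges |= {(u, u + half) for u in range(half)}          # link corresponding nodes
    return edges


h3 = build_hypercube(3)
print(len(h3))  # 12
```

With this numbering, two nodes are adjacent exactly when their binary addresses differ in one bit, and H_n has n * 2^(n-1) edges.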
17. Hypercube Interconnection
- Large bisection width
- Small radius
- High fault tolerance
- But node degree is too high
18. Mapping a Mesh onto a Hypercube
- $A_{i,j}$ on mesh -> A(gray(i) gray(j)) on hypercube
- $A_{i,j+1}$ on mesh -> A(gray(i) gray(j+1)) on hypercube
- connected to A(gray(i) gray(j))
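The Gray-code mapping can be checked with a few lines of code. This sketch assumes the standard binary-reflected Gray code for `gray` and reads "gray(i) gray(j)" as bit concatenation; the function name `mesh_to_hypercube` and the parameter `col_bits` (the number of address bits reserved for the column index) are hypothetical.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def mesh_to_hypercube(i, j, col_bits):
    """Hypercube address of mesh cell (i, j): gray(i) followed by gray(j)."""
    return (gray(i) << col_bits) | gray(j)


def hamming(u, v):
    """Number of bit positions in which u and v differ."""
    return bin(u ^ v).count("1")


# In a 4 x 4 mesh, horizontally adjacent cells map to adjacent hypercube nodes.
print(hamming(mesh_to_hypercube(2, 1, 2), mesh_to_hypercube(2, 2, 2)))  # 1
```

Because consecutive Gray codes differ in exactly one bit, every mesh edge lands on a hypercube edge, so mesh algorithms run on the hypercube without extra communication hops.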
19. Mapping a Binary Tree onto a Hypercube
20. Hypercube Data Move Example
- Reversing a list
- Before: $PE_i$ has $A_i$
- After: $PE_i$ has $A_{n-i-1}$, $0 \le i \le n-1$
- Reverse($H_k$)
- Swap (A) across the highest bit
- Reverse the two $H_{k-1}$ in parallel
- Matrix transpose
- $A_{i,j}$ -> $A_{j,i}$
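The list reversal above can be simulated one dimension at a time. This is a minimal sketch, assuming n is a power of two; the function name `hypercube_reverse` is hypothetical, and the recursion on the two subcubes is unrolled into a loop because all subcubes at the same level act in parallel.

```python
def hypercube_reverse(a):
    """Reverse a list by swapping across each hypercube dimension, highest bit first."""
    n = len(a)              # assumed to be a power of two
    k = n.bit_length() - 1  # hypercube dimension
    a = list(a)
    for bit in range(k - 1, -1, -1):
        mask = 1 << bit
        # Every PE i exchanges its item with the neighbor whose address
        # differs in this bit; all exchanges happen in the same step.
        a = [a[i ^ mask] for i in range(n)]
    return a


print(hypercube_reverse([0, 1, 2, 3, 4, 5, 6, 7]))
# [7, 6, 5, 4, 3, 2, 1, 0]
```

After all k swaps, PE i holds the item that started at the address with every bit flipped, which is n-i-1, so the reversal takes k = log n communication steps.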
21. Shuffle-Exchange Network
[Figure: shuffle-exchange network on the 8 nodes 000 001 010 011 100 101 110 111]
22. Mesh of Trees
2D with 16 nodes
23. Cube Connected Cycles
[Figure: a hypercube node of dimension 4 next to the CCC node that replaces it, with edges labeled e1 ... e5]
- Hypercube: $2^n$ nodes
- CCC: $r \cdot 2^n$ nodes
- $r = \log n$
24. Cube Connected Cycles
[Figure: cube connected cycles, with nodes labeled 0000 and 1111]
25. CCC
- Large bisection width
- Scalable
- Small diameter
- Can simulate Hypercube
26. Simulation of Hypercube using CCC
- Divide-and-conquer algorithm communication pattern
- Ascend: d = 1, 2, 4, 8, ..., n/2
- Descend: d = n/2, ..., 4, 2, 1
- Examples
- Merging
- Sorting
- FFT
- For this type of data movement, CCC can simulate hypercube data movement without any penalty
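As an illustration of the ascend pattern, the sketch below runs an all-to-all sum in which every PE talks to the partner at distance d = 1, 2, 4, ..., n/2 in turn. The summation is only a stand-in for the merging, sorting, and FFT examples named above; the function name `ascend_allreduce` is hypothetical, n is assumed to be a power of two, and the code models the hypercube exchange only, not the CCC simulation of it.

```python
def ascend_allreduce(values):
    """Sum all values with an ascend pattern; returns (per-PE results, distances used)."""
    n = len(values)  # assumed to be a power of two
    vals = list(values)
    distances = []
    d = 1
    while d < n:
        # Each PE i combines its value with that of partner i XOR d.
        vals = [vals[i] + vals[i ^ d] for i in range(n)]
        distances.append(d)
        d *= 2
    return vals, distances


totals, order = ascend_allreduce([1, 2, 3, 4, 5, 6, 7, 8])
print(totals)  # [36, 36, 36, 36, 36, 36, 36, 36]
print(order)   # [1, 2, 4]
```

Only one hypercube dimension is used per step, and the dimensions are visited in order, which is the property that lets a network with a fixed node degree keep up with the hypercube on this kind of algorithm.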
27. De Bruijn Graph
- $(x_{n-1}, x_{n-2}, \dots, x_0)$ -> $(x_{n-2}, \dots, x_0, 0)$ and -> $(x_{n-2}, \dots, x_0, 1)$
- Highly recursive
- Linear shift register
- Lock combination
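The edge rule amounts to a left shift that drops the top bit and appends a 0 or a 1. The following sketch encodes a node as an n-bit integer; the function name `de_bruijn_successors` is hypothetical.

```python
def de_bruijn_successors(x, n):
    """Two successors of the n-bit node x: shift left, drop the top bit, append 0 or 1."""
    mask = (1 << n) - 1
    shifted = (x << 1) & mask
    return shifted, shifted | 1


s0, s1 = de_bruijn_successors(0b101, 3)
print(format(s0, "03b"), format(s1, "03b"))  # 010 011
```

Every node has exactly two outgoing and two incoming edges regardless of n, so the node degree stays fixed while the graph grows.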
28. Multistage Interconnection Network
- Blocking Networks
- Unidirectional MIN
- Bidirectional MIN
- Non Blocking Networks
- Any input port can be connected to any free output port without affecting the existing connections.
- 2D mesh crossbar
- Time Division bus
- Clos network.