Title: Interconnection Networks Contd.
1Interconnection Networks Contd.
- L.N. Bhuyan
- Partly from Berkeley Notes
2More Static Networks: Linear Arrays and Rings
- Linear Array
- Diameter?
- Average Distance?
- Bisection bandwidth?
- Route A -> B given by relative address R = B - A
- Torus?
- Examples: FDDI, SCI, Fibre Channel Arbitrated Loop, KSR1
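The open questions on this slide (diameter, average distance) can be checked by brute force. The following is a minimal sketch, assuming nodes are numbered 0 to N-1 and that the ring is bidirectional; the function names are illustrative and not from the slides.

```python
def linear_distance(a, b):
    """Hops between nodes a and b on a linear array: |R| with R = b - a."""
    return abs(b - a)


def ring_distance(a, b, n):
    """Hops on a bidirectional ring of n nodes: take the shorter way around."""
    r = (b - a) % n
    return min(r, n - r)


def diameter_and_average(n, dist):
    """Brute-force diameter and average distance over all ordered pairs a != b."""
    hops = [dist(a, b) for a in range(n) for b in range(n) if a != b]
    return max(hops), sum(hops) / len(hops)


n = 8
print("linear array:", diameter_and_average(n, linear_distance))
print("ring:        ", diameter_and_average(n, lambda a, b: ring_distance(a, b, n)))
```

For N = 8 the linear array has diameter 7 while the ring has diameter 4, which shows how the wraparound link roughly halves the worst-case distance.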
3Multidimensional Meshes and Tori
(Figure: 2D grid and 3D cube)
- d-dimensional array
  - $n = k_{d-1} \times \dots \times k_0$ nodes
  - described by d-vector of coordinates $(i_{d-1}, \dots, i_0)$
- d-dimensional k-ary mesh: $N = k^d$
  - $k = \sqrt[d]{N}$
  - described by d-vector of radix-k coordinates
- d-dimensional k-ary torus (or k-ary d-cube)?
- Ex: Intel Paragon (2D), SGI Origin (Hypercube), Cray T3E (3D Mesh)
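To make the radix-k coordinate vector concrete, here is a small sketch that converts a node number into its d coordinates and measures hop distance in a mesh and in a torus. It assumes the same radix k in every dimension and row-major node numbering; the helper names are hypothetical.

```python
def to_coords(node, k, d):
    """Return (i_{d-1}, ..., i_0): the d radix-k digits of the node number."""
    digits = []
    for _ in range(d):
        digits.append(node % k)
        node //= k
    return tuple(reversed(digits))


def mesh_distance(a, b):
    """Hops in a mesh: sum of per-dimension coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def torus_distance(a, b, k):
    """Hops in a torus: each dimension may also use its wraparound link."""
    return sum(min((x - y) % k, (y - x) % k) for x, y in zip(a, b))


k, d = 4, 2                      # a 4-ary 2-dimensional network, N = k**d = 16
a, b = to_coords(0, k, d), to_coords(15, k, d)
print(a, b, mesh_distance(a, b), torus_distance(a, b, k))
```

Opposite corners of the 4 x 4 mesh are 6 hops apart, but only 2 hops apart once the torus wraparound links are added.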
4Hypercubes
- Also called binary n-cubes. No. of nodes: $N = 2^n$
- O(log N) hops
- Good bisection BW
- Complexity
  - Out degree is $n = \log_2 N$
  - correct dimensions in order
  - with random comm. 2 ports per processor
(Figure: hypercubes of dimension 0-D through 5-D)
5Routing in Hypercube
- $N = 2^n$ nodes
- $S = (s_{n-1} s_{n-2} \dots s_i \dots s_2 s_1 s_0)$
- $D = (d_{n-1} d_{n-2} \dots d_i \dots d_2 d_1 d_0)$
- E-cube routing: For i = 0 to n-1, compare $s_i$ and $d_i$. Route along dimension i if they differ.
- Distance = Hamming distance between S and D = the no. of dimensions in which S and D differ.
- Diameter = maximum distance = $n = \log_2 N$ = dimension of the hypercube
- No. of alternate paths = n
- Fault tolerance = (n-1) = $O(\log_2 N)$
- Example paths from 000 to 111:
  - 000 -> 001 -> 011 -> 111
  - 000 -> 010 -> 110 -> 111
  - 000 -> 100 -> 101 -> 111
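E-cube routing can be sketched in a few lines. The version below assumes node addresses are plain integers whose bit i is the coordinate in dimension i, and it corrects dimensions in increasing order; the function names are illustrative.

```python
def ecube_route(src, dst, n):
    """Return the list of nodes visited from src to dst in a binary n-cube."""
    path = [src]
    cur = src
    for i in range(n):                  # dimension 0 first, then 1, ...
        if (cur ^ dst) & (1 << i):      # address bits differ in dimension i
            cur ^= 1 << i               # cross the link in dimension i
            path.append(cur)
    return path


def hamming_distance(src, dst):
    """Number of dimensions in which the two addresses differ."""
    return bin(src ^ dst).count("1")


path = ecube_route(0b000, 0b111, 3)
print(" -> ".join(format(p, "03b") for p in path))
print("hops:", len(path) - 1, "hamming:", hamming_distance(0b000, 0b111))
```

The hop count always equals the Hamming distance, because each loop iteration fixes at most one differing bit and never disturbs a bit that already matches.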
6Origin Network
- Each router has six pairs of 1.56 GB/s unidirectional links
  - Two to nodes, four to other routers
  - latency: 41 ns pin to pin across a router
- Flexible cables up to 3 ft long
- Four virtual channels: request, reply, other two for priority or I/O
7Case Study: Cray T3D
- Build up info in shell
- Remote memory operations encoded in address
8Trees
- Diameter and ave distance logarithmic
  - k-ary tree, height $d = \log_k N$
  - address specified by d-vector of radix-k coordinates describing path down from root
- Fixed degree
- Route up to common ancestor and down
  - R = B xor A
  - let i be position of most significant 1 in R, route up i+1 levels
  - down in direction given by low i+1 bits of B
- H-tree space is $O(N)$ with $O(\sqrt{N})$ long wires
- Bisection BW?
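The up-then-down rule can be written out directly. This is a minimal sketch for the binary case (k = 2), assuming processors sit at the leaves and are numbered left to right so that the leaf address spells the path from the root; `tree_route` is a hypothetical name.

```python
def tree_route(a, b):
    """Return (levels_up, down_turns) for a message from leaf a to leaf b.

    down_turns lists the child taken at each level on the way down
    (0 = left, 1 = right), most significant bit first.
    """
    r = a ^ b                          # R = B xor A
    if r == 0:
        return 0, []                   # same leaf, nothing to do
    i = r.bit_length() - 1             # position of most significant 1 in R
    up = i + 1                         # climb to the common ancestor
    down = [(b >> j) & 1 for j in range(i, -1, -1)]   # low i+1 bits of B
    return up, down


print(tree_route(0b010, 0b011))   # siblings: up 1, then one turn right
print(tree_route(0b000, 0b111))   # opposite halves: up 3, then right x3
```

Nearby leaves meet low in the tree, while any pair in opposite halves must pass through the root, which is why the bisection question on this slide matters.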
9Real Machines
Machine | Topology | Cycle Time (ns) | Channel Width (bits) | Routing Delay (cycles) | Flit (data bits)
nCUBE/2 | Hypercube | 25 | 1 | 40 | 32
TMC CM-5 | Fat-Tree | 25 | 4 | 10 | 4
IBM SP-2 | Banyan | 25 | 8 | 5 | 16
Intel Paragon | 2D Mesh | 11.5 | 16 | 2 | 16
Meiko CS-2 | Fat-Tree | 20 | 8 | 7 | 8
CRAY T3D | 3D Torus | 6.67 | 16 | 2 | 16
DASH | Torus | 30 | 16 | 2 | 16
J-Machine | 3D Mesh | 31 | 8 | 2 | 8
Monsoon | Butterfly | 20 | 16 | 2 | 16
SGI Origin | Hypercube | 2.5 | 20 | 16 | 160
Myricom | Arbitrary | 6.25 | 16 | 50 | 16
- Wide links, smaller routing delay
- Tremendous variation
10What is a Dynamic Network?
- A dynamic network is a network that can connect any input to any output by enabling or disabling some switches in the network
- Examples
  - Shared Bus: The bus arbiter connects a processor to a memory
  - Crossbar: Consists of many switching elements, which can be enabled to connect many inputs to many outputs simultaneously
  - Multistage Network: Consists of several stages of switches that are enabled to get connections
  - The nodes in static networks (like Mesh) also consist of dynamic crossbars
11Dynamic Network Consists of Switches: Switch Components
- Output ports
- transmitter (typically drives clock and data)
- Input ports
- synchronizer aligns data signal with local clock domain
- essentially FIFO buffer
- Crossbar
- connects each input to any output
- degree limited by area or pinout
- Buffering
- Control logic
- complexity depends on routing logic and scheduling algorithm
- determine output port for each incoming packet
- arbitrate among inputs directed at same output
12Crossbar Switch Design
- Complexity $O(N^2)$ for an N x N crossbar. Why? See next page
13How do you build a crossbar?
(Figure: crossbar built from multiplexors, select lines labeled "From Control")
- $N^2$ switches => Cost $O(N^2)$
- Time taken by the arbiter: $O(N^2)$
- Multiplexors are controlled from controller
14Crossbar Contd.
- An N x N crossbar allows all N inputs to be connected simultaneously to all N outputs
- It allows all one-to-one mappings, called permutations. No. of permutations = N!
- When two or more inputs request the same output, only one of them is connected and others are either dropped or buffered
- When processors access memories through a crossbar, this situation is called a memory access conflict
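The conflict case can be illustrated with a toy arbiter. The sketch below assumes a fixed-priority policy (lowest-numbered input wins), which is only one possible choice and is not stated on the slides; `arbitrate` is a hypothetical name.

```python
def arbitrate(requests):
    """Resolve one cycle of crossbar requests.

    requests[i] is the output wanted by input i, or None if input i is idle.
    Returns (grants, blocked): grants maps output -> winning input, and
    blocked lists the inputs that lost and must be dropped or buffered.
    Assumption: the lowest-numbered input wins each conflict.
    """
    grants, blocked = {}, []
    for inp, out in enumerate(requests):
        if out is None:
            continue
        if out in grants:
            blocked.append(inp)        # output already taken this cycle
        else:
            grants[out] = inp
    return grants, blocked


print(arbitrate([2, 0, 3, 1]))   # a permutation: every input is connected
print(arbitrate([1, 1, 3, 1]))   # inputs 0, 1 and 3 all want output 1
```

A real arbiter would usually rotate priority for fairness; the fixed order here is kept only to make the outcome easy to follow.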
15Multistage Interconnection Network
- A network consisting of multiple stages of crossbar switches has the following properties.
  - N x N network for $N = 2^n$
  - Consists of $\log_2 N$ stages of 2x2 switches
  - Has N/2 2x2 switches per stage
  - Cost $O(N \log N)$ instead of $O(N^2)$ for crossbar
- For $N = a^n$, a MIN can be similarly designed with a x a switches
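A quick count makes the cost gap visible. The sketch below tallies stages, switches per stage, and total switches for a MIN built from a x a switches, and compares crosspoint counts with a full crossbar; treating each a x a switch as a*a crosspoints is an assumption made for the comparison.

```python
def ilog(n, a):
    """Integer logarithm: return s such that a**s == n, else raise ValueError."""
    s, p = 0, 1
    while p < n:
        p *= a
        s += 1
    if p != n:
        raise ValueError("n must be a power of a")
    return s


def min_cost(n, a=2):
    """Return (stages, switches_per_stage, total_switches) for an n x n MIN."""
    stages = ilog(n, a)
    per_stage = n // a
    return stages, per_stage, stages * per_stage


def min_crosspoints(n, a=2):
    """Crosspoints in the MIN, counting a*a per switch (an assumption)."""
    return min_cost(n, a)[2] * a * a


for n in (8, 64, 1024):
    print(n, min_cost(n), min_crosspoints(n), "vs crossbar", n * n)
```

For N = 1024 the MIN needs 5120 two-by-two switches (20480 crosspoints) against 1048576 crosspoints for the crossbar.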
16Multistage interconnection networks
(Figure: 8 x 8 Omega network with terminals 0-7 labeled 000-111)
Omega Network and Self Routing
- Note: Complexity $O(N \log_2 N)$. Conflicts, less BW than crossbar, but cost effective
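Self routing in an Omega network can be sketched as follows: each stage applies a perfect shuffle (rotate the address left by one bit), and the 2x2 switch then selects its upper or lower output according to one destination bit, most significant bit first. The code tracks the line number a message occupies after each stage; it is an illustrative model with hypothetical names, not a description of any particular machine.

```python
def omega_route(src, dst, n):
    """Return the line occupied after each stage of an N = 2**n Omega network."""
    mask = (1 << n) - 1
    pos = src
    path = [pos]
    for stage in range(n):
        pos = ((pos << 1) | (pos >> (n - 1))) & mask   # perfect shuffle
        bit = (dst >> (n - 1 - stage)) & 1             # destination bit, MSB first
        pos = (pos & ~1) | bit                         # 0 = upper, 1 = lower output
        path.append(pos)
    return path


def conflicts(p1, p2):
    """True if two messages need the same switch output line at some stage."""
    return any(x == y for x, y in zip(p1[1:], p2[1:]))


print(omega_route(0b101, 0b010, 3))
print(conflicts(omega_route(0, 0, 3), omega_route(4, 1, 3)))
print(conflicts(omega_route(0, 0, 3), omega_route(1, 1, 3)))
```

The message always ends on line `dst`, since after n stages every address bit has been replaced by a destination bit. The second call shows two messages with different destinations colliding inside the network, which a crossbar would have carried simultaneously.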
17Example: SP
- 8-port switch, 40 MB/s per link, 8-bit phit, 16-bit flit, single 40 MHz clock
- packet sw, cut-through, no virtual channel, source-based routing
- variable packet < 255 bytes, 31 byte fifo per input, 7 bytes per output, 16 phit links
- 128 8-byte chunks in central queue, LRU per output
- run in shadow mode
18Switching Techniques
- Circuit Switching: A control message is sent from source to destination and a path is reserved. Communication starts. The path is released when communication is complete.
- Store-and-forward policy (Packet Switching): each switch waits for the full packet to arrive before sending it to the next switch (good for WAN)
- Cut-through routing or wormhole routing: the switch examines the header, decides where to send the message, and then starts forwarding it immediately
  - In wormhole routing, when the head of the message is blocked, the message stays strung out over the network, potentially blocking other messages (needs to buffer only the piece of the packet that is sent between switches). CM-5 uses it, with each switch buffer being 4 bits per port.
  - Cut-through routing lets the tail continue when the head is blocked, storing the whole message in an intermediate switch. (Requires a buffer large enough to hold the largest packet.)
20Store and Forward vs. Cut-Through
- Advantage
  - Latency reduces from a function of (number of intermediate switches x size of the packet) to (time for the 1st part of the packet to negotiate the switches + packet size / interconnect BW)
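21Store&Forward vs Cut-Through Routing
- $h(n/b + \Delta)$ vs $n/b + h\Delta$, where h is the number of hops, n the message size, b the link bandwidth, and $\Delta$ the routing delay per hop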
- what if message is fragmented?
- wormhole vs virtual cut-through
22Example
- Q. Compare the efficiency of store-and-forward (packet switching) vs. wormhole routing for transmission of a 20-byte packet between a source and destination that are d nodes apart. Each node takes 0.25 microsecond and the link transfer rate is 20 MB/sec.
- Answer: Time to transfer 20 bytes over a link = 20 bytes / 20 MB/sec = 1 microsecond.
- Packet switching: no. of nodes x (node delay + transfer time) = d x (0.25 + 1) = 1.25 d microseconds
- Wormhole: (no. of nodes x node delay) + transfer time = 0.25 d + 1 microseconds
- Book: For d = 7, packet switching takes 8.75 microseconds vs. 2.75 microseconds for wormhole routing
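The worked example can be reproduced with two small functions. This sketch assumes the usual unloaded-network model, with h hops, a packet of n bytes, link bandwidth b, and a per-node delay delta; the names and the choice of bytes per microsecond as the bandwidth unit are illustrative.

```python
def store_and_forward(h, n, b, delta):
    """Latency when every hop waits for the whole packet: h * (n/b + delta)."""
    return h * (n / b + delta)


def cut_through(h, n, b, delta):
    """Latency when only the header pays the per-hop delay: n/b + h * delta."""
    return n / b + h * delta


packet_bytes = 20
bandwidth = 20.0        # 20 MB/s expressed as bytes per microsecond
node_delay = 0.25       # microseconds per node
hops = 7

print("store-and-forward:", store_and_forward(hops, packet_bytes, bandwidth, node_delay), "us")
print("wormhole:         ", cut_through(hops, packet_bytes, bandwidth, node_delay), "us")
```

With d = 7 this gives 8.75 microseconds against 2.75 microseconds, matching the figures quoted above; the gap widens as either the hop count or the packet size grows.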