Interconnection Networks Contd. - PowerPoint PPT Presentation

Slides: 23

Provided by: david3085

Learn more at: http://www.cs.ucr.edu

Transcript and Presenter's Notes

Title: Interconnection Networks Contd.

1
Interconnection Networks Contd.

L.N. Bhuyan
Partly from Berkeley Notes

2
More Static Networks: Linear Arrays and Rings

Linear Array
Diameter?
Average Distance?
Bisection bandwidth?
Route A -> B given by relative address R = B - A
Torus?
Examples: FDDI, SCI, Fibre Channel Arbitrated Loop, KSR1
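
The questions on this slide can be explored numerically. The following is a minimal sketch, assuming nodes are numbered 0..N-1, every link costs one hop, and the ring is bidirectional; the function names are illustrative and do not come from the slides.

```python
def linear_distance(a, b, n):
    """Hops between nodes a and b on a linear array of n nodes."""
    return abs(b - a)


def ring_route(a, b, n):
    """Signed relative address R = B - A on a bidirectional ring of n nodes.

    A positive result means travel in the increasing direction,
    a negative result means travel the other way around.
    """
    r = (b - a) % n
    return r if r <= n // 2 else r - n


def ring_distance(a, b, n):
    """Hops between a and b on the ring, taking the shorter direction."""
    return abs(ring_route(a, b, n))


def diameter_and_average(distance, n):
    """Brute-force diameter and average distance over all ordered pairs a != b."""
    dists = [distance(a, b, n) for a in range(n) for b in range(n) if a != b]
    return max(dists), sum(dists) / len(dists)


if __name__ == "__main__":
    print("linear array, N=8:", diameter_and_average(linear_distance, 8))
    print("ring, N=8:", diameter_and_average(ring_distance, 8))
```

Brute force is used on purpose so that no closed-form answer has to be assumed; closing the array into a ring roughly halves the diameter.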

3
Multidimensional Meshes and Tori
3D Cube
2D Grid

d-dimensional array
N = k_{d-1} x ... x k_0 nodes
described by d-vector of coordinates (i_{d-1}, ..., i_0)
d-dimensional k-ary mesh: N = k^d
k = N^(1/d) (the d-th root of N)
described by d-vector of radix-k coordinates
d-dimensional k-ary torus (or k-ary d-cube)?
Ex: Intel Paragon (2D), SGI Origin (Hypercube), Cray T3E (3D Mesh)
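
The radix-k coordinate description can be sketched in a few lines. This assumes a k-ary d-dimensional network with nodes numbered 0..k^d - 1 and coordinates stored least significant digit first; the helper names are hypothetical.

```python
def to_coords(node, k, d):
    """Radix-k coordinates (i_0, ..., i_{d-1}) of a node index, lowest digit first."""
    coords = []
    for _ in range(d):
        coords.append(node % k)
        node //= k
    return coords


def mesh_distance(a, b, k, d):
    """Hops between a and b in a k-ary d-dimensional mesh (no wraparound links)."""
    return sum(abs(x - y) for x, y in zip(to_coords(a, k, d), to_coords(b, k, d)))


def torus_distance(a, b, k, d):
    """Hops between a and b in a k-ary d-cube (wraparound in every dimension)."""
    total = 0
    for x, y in zip(to_coords(a, k, d), to_coords(b, k, d)):
        delta = abs(x - y)
        total += min(delta, k - delta)  # go around the ring the short way
    return total


if __name__ == "__main__":
    # Opposite corners of a 4 x 4 network.
    print(mesh_distance(0, 15, 4, 2), torus_distance(0, 15, 4, 2))
```

The only difference between the mesh and the torus here is the `min(delta, k - delta)` term, which models the wraparound links.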

4
Hypercubes

Also called binary n-cubes. Number of nodes: N = 2^n.
O(log N) hops
Good bisection BW
Complexity
Out degree is n = log N
correct dimensions in order
with random comm., 2 ports per processor

(Figure: hypercubes of dimension 0-D through 5-D)
5
Routing in Hypercube

N = 2^n nodes
S = (s_{n-1} s_{n-2} ... s_i ... s_2 s_1 s_0)
D = (d_{n-1} d_{n-2} ... d_i ... d_2 d_1 d_0)
E-cube routing: for i = 0 to n-1, compare s_i and d_i; route along dimension i if they differ.
Distance = Hamming distance between S and D = the number of dimensions in which S and D differ.
Diameter = maximum distance = n = log2 N = dimension of the hypercube
No. of alternate paths = n
Fault tolerance = (n-1) = O(log2 N)
Example routes from 000 to 111:
000 -> 001 -> 011 -> 111
000 -> 010 -> 110 -> 111
000 -> 100 -> 101 -> 111
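
E-cube routing as described on this slide can be sketched as follows. This is an illustrative version, assuming node addresses are plain integers whose bit i is the coordinate in dimension i; the function names are not from the slides.

```python
def ecube_route(src, dst, n):
    """Nodes visited when correcting differing dimensions in order 0 .. n-1."""
    path = [src]
    cur = src
    for i in range(n):
        if ((cur ^ dst) >> i) & 1:   # s_i and d_i differ
            cur ^= 1 << i            # traverse the link in dimension i
            path.append(cur)
    return path


def hamming_distance(src, dst):
    """Number of dimensions in which the two addresses differ."""
    return bin(src ^ dst).count("1")


if __name__ == "__main__":
    route = ecube_route(0b000, 0b111, 3)
    print(" -> ".join(format(node, "03b") for node in route))
```

Correcting dimension 0 first reproduces the first of the three example routes; the other two routes correspond to visiting the dimensions in a different order.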
6
Origin Network

Each router has six pairs of 1.56 GB/s unidirectional links
Two to nodes, four to other routers
Latency: 41 ns pin to pin across a router
Flexible cables up to 3 ft long
Four virtual channels: request, reply, and two others for priority or I/O

7
Case Study: Cray T3D

Build up info in shell
Remote memory operations encoded in address

8
Trees

Diameter and average distance are logarithmic
k-ary tree, height d = log_k N
address specified by d-vector of radix-k coordinates describing the path down from the root
Fixed degree
Route up to common ancestor and down
R = B xor A
let i be the position of the most significant 1 in R; route up i+1 levels
down in the direction given by the low i+1 bits of B
H-tree: space is O(N) with O(sqrt(N)) long wires
Bisection BW?
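
The up-then-down routing rule can be sketched for the binary case. This assumes k = 2 and that a leaf address lists the left/right (0/1) choices on the path from the root, most significant bit first; `tree_route` is a hypothetical helper name.

```python
def tree_route(a, b):
    """Return (levels_up, down_turns) for routing from leaf a to leaf b.

    levels_up is i + 1, where i is the position of the most significant 1
    in R = B xor A (0 levels if the leaves are identical).
    down_turns lists the low i + 1 bits of B, highest bit first.
    """
    r = a ^ b
    levels_up = r.bit_length()  # equals i + 1 when r != 0, else 0
    down_turns = [(b >> level) & 1 for level in range(levels_up - 1, -1, -1)]
    return levels_up, down_turns


if __name__ == "__main__":
    print(tree_route(0b010, 0b011))  # siblings: one level up, one turn down
    print(tree_route(0b000, 0b111))  # opposite halves: all the way to the root
```

Leaves that share a long address prefix meet at a low common ancestor, so nearby traffic never reaches the root.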

9
Real Machines
Machine        Topology   Cycle Time (ns)  Channel Width (bits)  Routing Delay (cycles)  Flit (data bits)
nCUBE/2        Hypercube  25               1                     40                      32
TMC CM-5       Fat-Tree   25               4                     10                      4
IBM SP-2       Banyan     25               8                     5                       16
Intel Paragon  2D Mesh    11.5             16                    2                       16
Meiko CS-2     Fat-Tree   20               8                     7                       8
CRAY T3D       3D Torus   6.67             16                    2                       16
DASH           Torus      30               16                    2                       16
J-Machine      3D Mesh    31               8                     2                       8
Monsoon        Butterfly  20               16                    2                       16
SGI Origin     Hypercube  2.5              20                    16                      160
Myricom        Arbitrary  6.25             16                    50                      16

Wide links, smaller routing delay
Tremendous variation
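
To make the variation concrete, a raw per-link bandwidth can be estimated from the cycle time and channel width columns. This is a rough sketch that assumes one channel-width transfer per cycle and 1 MB = 10^6 bytes; both are assumptions made here, not statements from the slides.

```python
# (cycle time in ns, channel width in bits), copied from the table above
MACHINES = {
    "nCUBE/2": (25.0, 1),
    "TMC CM-5": (25.0, 4),
    "IBM SP-2": (25.0, 8),
    "Intel Paragon": (11.5, 16),
    "Meiko CS-2": (20.0, 8),
    "CRAY T3D": (6.67, 16),
    "DASH": (30.0, 16),
    "J-Machine": (31.0, 8),
    "Monsoon": (20.0, 16),
    "SGI Origin": (2.5, 20),
    "Myricom": (6.25, 16),
}


def link_bandwidth_mb_per_s(cycle_ns, width_bits):
    """Raw link bandwidth in MB/s, assuming one width_bits transfer per cycle."""
    bits_per_second = width_bits / (cycle_ns * 1e-9)
    return bits_per_second / 8 / 1e6


if __name__ == "__main__":
    for name, (cycle_ns, width_bits) in MACHINES.items():
        print(f"{name:14s} {link_bandwidth_mb_per_s(cycle_ns, width_bits):8.1f} MB/s")
```

Under these assumptions the estimates span more than two orders of magnitude across the listed machines.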

10
What is a Dynamic Network?

A dynamic network is a network that can connect any input to any output by enabling or disabling some switches in the network.
Examples:
- Shared Bus: the bus arbiter connects a processor to a memory
- Crossbar: consists of many switching elements, which can be enabled to connect many inputs to many outputs simultaneously
- Multistage Network: consists of several stages of switches that are enabled to set up connections
- The nodes in static networks (like Mesh) also consist of dynamic crossbars

11
Dynamic Network Consists of Switches: Switch Components

Output ports
- transmitter (typically drives clock and data)
Input ports
- synchronizer aligns data signal with local clock domain
- essentially FIFO buffer
Crossbar
- connects each input to any output
- degree limited by area or pinout
Buffering
Control logic
- complexity depends on routing logic and scheduling algorithm
- determine output port for each incoming packet
- arbitrate among inputs directed at same output

12
Crossbar Switch Design

Complexity: O(N^2) for an N x N crossbar. Why?
See next page.

13
How do you build a crossbar?
From Control
N^2 switches => Cost O(N^2); time taken by the arbiter O(N^2)
Multiplexors are controlled from the controller
14
Crossbar Contd.

An N x N crossbar allows all N inputs to be connected simultaneously to all N outputs.
It allows all one-to-one mappings, called permutations. No. of permutations = N!
When two or more inputs request the same output, only one of them is connected and the others are either dropped or buffered.
When processors access memories through a crossbar, this situation is called a memory access conflict.
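
The conflict behaviour can be illustrated with a toy arbiter. This is a sketch under an assumed fixed-priority policy (the lowest-numbered input wins); the slides do not say which policy a real arbiter uses, and the function names are hypothetical.

```python
from math import factorial


def arbitrate(requests):
    """Resolve one round of crossbar requests.

    requests[i] is the output port wanted by input i, or None if idle.
    Returns (grants, blocked): grants maps output -> winning input,
    blocked lists the inputs that lost and must be dropped or buffered.
    """
    grants = {}
    blocked = []
    for inp, out in enumerate(requests):
        if out is None:
            continue
        if out in grants:
            blocked.append(inp)   # output already taken: access conflict
        else:
            grants[out] = inp
    return grants, blocked


def num_permutations(n):
    """Number of conflict-free one-to-one mappings of an n x n crossbar."""
    return factorial(n)


if __name__ == "__main__":
    print(arbitrate([2, 2, 0, 1]))   # inputs 0 and 1 both want output 2
    print(num_permutations(4))
```

When the request vector is a permutation, `blocked` is empty and all N connections are made at once.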

15
Multistage Interconnection Network

A network consisting of multiple stages of crossbar switches has the following properties:
N x N network for N = 2^n
Consists of log2 N stages of 2x2 switches
Has N/2 2x2 switches per stage
Cost O(N log N) instead of O(N^2) for a crossbar
For N = a^n, a MIN can be similarly designed with a x a switches
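
The cost comparison can be checked by counting components. This is a small sketch for the 2x2-switch case only, assuming N is a power of two; the function names are illustrative.

```python
def min_switch_count(n_ports):
    """Number of 2x2 switches in an N x N MIN: log2(N) stages of N/2 switches."""
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("n_ports must be a power of two and at least 2")
    stages = n_ports.bit_length() - 1   # log2(N) for a power of two
    return stages * (n_ports // 2)


def crossbar_crosspoints(n_ports):
    """Number of crosspoint switches in an N x N crossbar."""
    return n_ports * n_ports


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, min_switch_count(n), crossbar_crosspoints(n))
```

The gap widens quickly with N, which is the motivation for accepting the blocking behaviour of a MIN.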

16
Multistage interconnection networks
Omega Network and Self Routing

(Figure: 8x8 Omega network; ports 0-7 labeled in binary, 000 through 111)
Note: complexity O(N log2 N). Conflicts and less BW than a crossbar, but cost effective.
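
Self routing in an Omega network is commonly done with destination-tag routing, which can be sketched as below. The sketch assumes each stage is a perfect shuffle followed by a column of 2x2 switches, and that stage j uses bit j of the destination (most significant first) to pick the upper (0) or lower (1) switch output; names are hypothetical.

```python
def omega_route(src, dst, n):
    """Line positions occupied by a message after each of the n stages."""
    mask = (1 << n) - 1
    pos = src
    path = [pos]
    for stage in range(n):
        pos = ((pos << 1) | (pos >> (n - 1))) & mask   # perfect shuffle (rotate left)
        bit = (dst >> (n - 1 - stage)) & 1             # destination tag bit
        pos = (pos & ~1) | bit                         # take upper or lower output
        path.append(pos)
    return path


def has_conflict(pairs, n):
    """True if two (src, dst) routes need the same switch output at some stage."""
    routes = [omega_route(s, d, n) for s, d in pairs]
    for stage in range(1, n + 1):
        outputs = [r[stage] for r in routes]
        if len(set(outputs)) < len(outputs):
            return True
    return False


if __name__ == "__main__":
    print(omega_route(3, 5, 3))
    print(has_conflict([(0, 0), (4, 1)], 3))
```

The second call shows two messages with different destinations colliding inside the network, something a crossbar would have handled without blocking.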
17
Example: SP

8-port switch, 40 MB/s per link, 8-bit phit, 16-bit flit, single 40 MHz clock
packet switched, cut-through, no virtual channels, source-based routing
variable packet <= 255 bytes, 31-byte FIFO per input, 7 bytes per output, 16-phit links
128 8-byte chunks in central queue, LRU per output
run in shadow mode

18
Switching Techniques

Circuit Switching: a control message is sent from source to destination and a path is reserved. Communication starts. The path is released when communication is complete.
Store-and-forward policy (Packet Switching): each switch waits for the full packet to arrive before sending it to the next switch (good for WAN).
Cut-through routing or wormhole routing: the switch examines the header, decides where to send the message, and then starts forwarding it immediately.
In wormhole routing, when the head of the message is blocked, the message stays strung out over the network, potentially blocking other messages (each switch needs to buffer only the piece of the packet that is sent between switches). CM-5 uses it, with each switch buffer being 4 bits per port.
Cut-through routing lets the tail continue when the head is blocked, storing the whole message in an intermediate switch (requires a buffer large enough to hold the largest packet).

20
Store and Forward vs. Cut-Through

Advantage
Latency reduces from a function of:
number of intermediate switches x the size of the packet
to:
time for 1st part of the packet to negotiate the switches + the packet size / interconnect BW

21
Store-and-Forward vs Cut-Through Routing

h(n/b + D) vs n/b + h D
what if message is fragmented?
wormhole vs virtual cut-through
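
The slide does not define its symbols. Reading them as h = number of hops, n = message length, b = link bandwidth, and D = routing delay per hop (an assumed reading), the two latencies and the saving from cut-through can be written out as:

```latex
T_{\text{sf}} = h\left(\frac{n}{b} + D\right), \qquad
T_{\text{ct}} = \frac{n}{b} + hD, \qquad
T_{\text{sf}} - T_{\text{ct}} = (h - 1)\,\frac{n}{b}
```

Under this reading the routing delay hD is paid in both schemes; cut-through saves the repeated serialization of the message at every hop after the first.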

22
Example

Q. Compare the efficiency of store-and-forward (packet switching) vs. wormhole routing for transmission of a 20-byte packet between a source and destination that are d nodes apart. Each node takes 0.25 microsecond and the link transfer rate is 20 MB/sec.
Answer: Time to transfer 20 bytes over a link = 20 bytes / 20 MB/sec = 1 microsecond.
Packet switching: # nodes x (node delay + transfer time) = d x (0.25 + 1) = 1.25 d microseconds
Wormhole: (# nodes x node delay) + transfer time = 0.25 d + 1 microseconds
Book: For d = 7, packet switching takes 8.75 microseconds vs. 2.75 microseconds for wormhole routing
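
The arithmetic in this example can be reproduced with a short script. This is a sketch that assumes 1 MB = 10^6 bytes, so that bytes divided by MB/s gives microseconds directly; the function and parameter names are illustrative.

```python
def store_and_forward_us(hops, packet_bytes, link_mb_per_s, node_delay_us):
    """Latency in microseconds when every node receives the full packet first."""
    transfer_us = packet_bytes / link_mb_per_s   # bytes / (MB/s) = microseconds
    return hops * (node_delay_us + transfer_us)


def wormhole_us(hops, packet_bytes, link_mb_per_s, node_delay_us):
    """Latency in microseconds when the packet is pipelined through the nodes."""
    transfer_us = packet_bytes / link_mb_per_s
    return hops * node_delay_us + transfer_us


if __name__ == "__main__":
    for d in (1, 7, 20):
        sf = store_and_forward_us(d, 20, 20, 0.25)
        wh = wormhole_us(d, 20, 20, 0.25)
        print(f"d={d:2d}  store-and-forward={sf:.2f} us  wormhole={wh:.2f} us")
```

For d = 1 the two schemes take the same time; the gap grows by one transfer time for every additional node.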