CS575 Parallel Processing - PowerPoint PPT Presentation

Provided by: csColo
1
CS575 Parallel Processing
  • Lecture three: Interconnection Networks
  • Wim Bohm, CSU

Except as otherwise noted, the content of this
presentation is licensed under the Creative
Commons Attribution 2.5 license.
2
Interconnection networks
  • Connect
  • Processors, memories, I/O devices
  • Dynamic interconnection networks
  • Connect any to any using switches or busses
  • Two types of switches
  • On / off: 1 input, 1 output
  • Pass through / cross over: 2 inputs, 2 outputs
  • Static interconnection networks
  • Connect point to point using wires

3
Dynamic Interconnection Network: Crossbar
  • Connects e.g. p processors to b memories
  • p × b matrix
  • p horizontal lines, b vertical lines
  • Cross points: on/off switches
  • Only one switch on per (row,column) pair
  • Non-blocking: Pi to Mj does not block Pl to Mk
  • Very costly, does not scale well
  • p × b switches, complex timing and checking

4
Dynamic Interconnection Network: Bus
  • Connects processors, memories, I/O devices
  • Master: can issue a request to get the bus
  • Slave: can respond to a request once the bus is
    granted
  • If there are multiple masters, we need an arbiter
  • Sequential
  • Only one communication at a time
  • Bottleneck
  • But simple and cheap

5
Crossbar vs bus
  • Crossbar
  • Scalable in performance
  • Not scalable in hardware complexity
  • Bus
  • Not scalable in performance
  • Scalable in hardware complexity
  • Compromise: multistage network

6
Multi-stage network
  • Connects n components to each other
  • Usually built from O(n.log(n)) 2x2 switches
  • Cheaper than crossbar
  • Faster than bus
  • Many topologies
  • e.g. Omega (book fig 2.12), Butterfly, ...
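As a rough illustration of the cost argument and of how a message finds its way through 2x2 switches, here is a minimal sketch for an Omega network. It assumes n is a power of two, a perfect shuffle in front of every stage, and that a switch sends a message to its upper output on destination bit 0 and to its lower output on bit 1 (destination-tag routing). The function names are illustrative and do not come from the lecture or the book.

```python
def crossbar_switches(n: int) -> int:
    """On/off switches in an n x n crossbar."""
    return n * n


def omega_switches(n: int) -> int:
    """2x2 switches in an n-input Omega network: log2(n) stages of n/2 switches."""
    stages = n.bit_length() - 1  # log2(n) for a power of two
    return (n // 2) * stages


def omega_route(src: int, dst: int, n: int) -> list[int]:
    """Line position of the message after each stage (destination-tag routing)."""
    bits = n.bit_length() - 1
    mask = n - 1
    pos = src
    trace = []
    for stage in range(bits):
        # Perfect shuffle: rotate the address left by one bit.
        pos = ((pos << 1) | (pos >> (bits - 1))) & mask
        # The switch overwrites the low bit with the next destination bit,
        # most significant bit first.
        dst_bit = (dst >> (bits - 1 - stage)) & 1
        pos = (pos & ~1) | dst_bit
        trace.append(pos)
    return trace


if __name__ == "__main__":
    print(crossbar_switches(8), omega_switches(8))
    print(omega_route(5, 2, 8))
```

After log2(n) stages every address bit has been replaced by a destination bit, so the last position in the trace is always the destination.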

7
Static Interconnection Networks
  • Fixed wires (channels) between devices
  • Many topologies
  • Completely connected
  • (n(n-1))/2 channels
  • Static counterpart of crossbar
  • Star
  • One central PE for message passing
  • Static counterpart of bus
  • Multistage network with PE at each switch

8
More topologies
  • Necklace or ring
  • Mesh / Torus
  • 2D, 3D
  • Trees
  • Fat tree
  • Hypercube
  • 2^n nodes in an n-D hypercube
  • n links per node in an n-D hypercube
  • Addressing: 1 bit per dimension

9
Hypercube
  • Two connected nodes differ in one bit
  • An n-D hypercube can be divided into
  • 2 (n-1)-D cubes, in n ways
  • 4 (n-2)-D cubes
  • 8 (n-3)-D cubes
  • To get from node s to node t
  • Follow the path determined by the differing bits
  • E.g. 01100 → 11000: 01100 → 11100 → 11000
  • Question: how many (simple) paths from one node
    to another?
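The addressing rules on this slide can be sketched in a few lines. This is a minimal illustration, assuming nodes are stored as integers whose bits are the per-dimension address bits; the helper names are made up for the example. The path function flips the differing bits from the most significant one down, which happens to reproduce the example above; any other order of the same bits also gives a shortest path.

```python
def neighbors(node: int, n: int) -> list[int]:
    """The n nodes whose address differs from `node` in exactly one bit."""
    return [node ^ (1 << d) for d in range(n)]


def hamming(s: int, t: int) -> int:
    """Number of differing address bits, i.e. the hop distance in the cube."""
    return bin(s ^ t).count("1")


def path(s: int, t: int, n: int) -> list[int]:
    """Walk from s to t by flipping the differing bits, highest bit first."""
    route = [s]
    cur = s
    for d in reversed(range(n)):
        if (cur ^ t) & (1 << d):
            cur ^= 1 << d
            route.append(cur)
    return route


if __name__ == "__main__":
    for node in path(0b01100, 0b11000, 5):
        print(format(node, "05b"))
```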

10
Measures of static networks
  • Diameter
  • Maximal shortest path between two nodes
  • Ring: ⌊p/2⌋, hypercube: log(p)
  • 2D wraparound mesh: 2⌊sqrt(p)/2⌋
  • Connectivity
  • Measure of multiplicity of paths between nodes
  • Arc connectivity
  • Minimum arcs to be removed to create two
    disconnected networks
  • Ring: 2, hypercube: log(p), mesh: 2,
    wraparound mesh: 4
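The diameter figures can be checked on small instances with a breadth-first search from every node. The sketch below is only a sanity check under simple assumptions: graphs are adjacency dictionaries, the wraparound mesh is square (k × k), and all builder names are invented for the example.

```python
from collections import deque


def ring(p):
    return {i: [(i - 1) % p, (i + 1) % p] for i in range(p)}


def hypercube(n):
    return {i: [i ^ (1 << d) for d in range(n)] for i in range(2 ** n)}


def torus(k):
    """k x k wraparound mesh."""
    return {
        (r, c): [((r + 1) % k, c), ((r - 1) % k, c),
                 (r, (c + 1) % k), (r, (c - 1) % k)]
        for r in range(k) for c in range(k)
    }


def eccentricity(graph, start):
    """Largest shortest-path distance from `start`, found by BFS."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(graph):
    """Maximal shortest path over all pairs of nodes."""
    return max(eccentricity(graph, s) for s in graph)


if __name__ == "__main__":
    print(diameter(ring(9)), 9 // 2)
    print(diameter(hypercube(4)), 4)
    print(diameter(torus(5)), 2 * (5 // 2))
```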

11
More measures
  • Bisection width
  • Minimal arcs to be removed to partition the
    network in two (off by one node) equal halves
  • Ring: 2, complete binary tree: 1, 2D mesh:
    sqrt(p)
  • Question: bisection width of a hypercube?
  • Channel width
  • bits communicated simultaneously over channel
  • Channel rate / bandwidth
  • Peak communication rate (bits/second)
  • Bisection bandwidth
  • Bisection width × channel bandwidth
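For very small networks the bisection width can be found by brute force: try every split into (off by one node) equal halves and count the arcs that cross. The sketch below does exactly that and is only practical for a handful of nodes. It assumes undirected edges given as pairs, and all helper names are hypothetical. Running it on a small hypercube is one way to check an answer to the question above.

```python
from itertools import combinations


def bisection_width(edges):
    """Minimum number of arcs crossing any split into near-equal halves."""
    nodes = list({x for edge in edges for x in edge})
    half = len(nodes) // 2  # the halves differ by one node when p is odd
    best = len(edges)
    for part in combinations(nodes, half):
        side = set(part)
        cut = sum((u in side) != (v in side) for u, v in edges)
        best = min(best, cut)
    return best


def ring_edges(p):
    return [(i, (i + 1) % p) for i in range(p)]


def mesh_edges(k):
    """k x k mesh without wraparound."""
    edges = []
    for r in range(k):
        for c in range(k):
            if c + 1 < k:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < k:
                edges.append(((r, c), (r + 1, c)))
    return edges


def tree_edges(levels):
    """Complete binary tree with 2^levels - 1 nodes, heap numbering."""
    n = 2 ** levels - 1
    return [(i, c) for i in range(n) for c in (2 * i + 1, 2 * i + 2) if c < n]


def hypercube_edges(n):
    return [(i, i ^ (1 << d)) for i in range(2 ** n) for d in range(n)
            if i < (i ^ (1 << d))]


def bisection_bandwidth(width, channel_bandwidth):
    """Bisection width times the bandwidth of one channel."""
    return width * channel_bandwidth


if __name__ == "__main__":
    print(bisection_width(ring_edges(8)))
    print(bisection_width(tree_edges(3)))
    print(bisection_width(mesh_edges(4)))
```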

12
Summary of measures (p nodes)
The textbook mentions bisection width of a star
as 1, but the only way to split a star into
(almost) equal halves is by cutting half of its
links.
13
Meshes and Hypercubes
  • Mesh
  • Buildable, scalable, cheaper than hypercubes
  • Many (e.g. grid) applications map naturally
  • Cut-through routing works well in meshes
  • Commercial systems are based on it
  • Hypercube
  • Recursive structure: nice for algorithm design
  • Often same O complexity as PRAMs
  • Often hypercube algorithm also good for other
    topologies, so good starting point

14
Embedding
  • Relationship between two networks
  • Studied by mapping one into the other
  • Why?
  • G(V,E) → G'(V',E')
  • Graphs G, G'; vertices V, V'; edges E, E'
  • Map E → E', V → V'
  • Congestion k: k (> 1) edges e mapped onto one e'
  • Dilation k: one edge e mapped onto k edges e'
  • Expansion: |V'| / |V|
  • Often we want congestion = dilation = expansion = 1
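To make the three measures concrete, the sketch below computes them for a given embedding. It assumes the embedding is supplied as a dictionary that maps every edge of the guest graph G to the path (list of host nodes) it follows in the host graph G'. The function name and the example, a 4-node ring placed on a 4-node linear array, are illustrative only.

```python
from collections import Counter


def embedding_metrics(num_guest_nodes, num_host_nodes, edge_paths):
    """Return (congestion, dilation, expansion) of an embedding.

    edge_paths maps each guest edge to the host-node path it is routed along.
    """
    # Dilation: the longest host path that a single guest edge is stretched to.
    dilation = max(len(p) - 1 for p in edge_paths.values())
    # Congestion: the largest number of guest edges sharing one host edge.
    load = Counter()
    for p in edge_paths.values():
        for u, v in zip(p, p[1:]):
            load[frozenset((u, v))] += 1
    congestion = max(load.values())
    # Expansion: host nodes per guest node.
    expansion = num_host_nodes / num_guest_nodes
    return congestion, dilation, expansion


if __name__ == "__main__":
    # Ring 0-1-2-3-0 on the linear array 0-1-2-3 with the identity node map:
    # the wraparound edge (3, 0) has to travel back along the whole array.
    ring_on_line = {
        (0, 1): [0, 1],
        (1, 2): [1, 2],
        (2, 3): [2, 3],
        (3, 0): [3, 2, 1, 0],
    }
    print(embedding_metrics(4, 4, ring_on_line))
```

In this example only the expansion is 1; the wraparound edge causes both the dilation and the congestion.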

15
Ring into hypercube
  • Number the nodes of the ring s.t.
  • Hamming distance between two adjacent nodes 1
  • Gray code provides such a numbering
  • Can be built recursively binary reflected Gray
    code
  • 2 nodes: 0, 1 OK
  • 2^k nodes:
  • Take Gray code for 2^(k-1) nodes
  • Concatenate it with reflected Gray code for
    2^(k-1) nodes
  • Put 0 in front of first batch, 1 in front of
    second
  • Mesh can be embedded into a hypercube
  • (Toroidal) mesh: rings of rings
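The recursive construction can be written almost word for word. This is a minimal sketch that keeps the codes as bit strings so the "put 0 / 1 in front" step stays visible; the function names are made up for the example. The second function checks the property the embedding needs: ring neighbours, including the wraparound pair, differ in exactly one bit.

```python
def gray_codes(k: int) -> list[str]:
    """Binary reflected Gray code for 2^k ring nodes, as bit strings."""
    if k == 1:
        return ["0", "1"]
    prev = gray_codes(k - 1)
    # 0 in front of the first batch, 1 in front of the reflected batch.
    return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]


def is_ring_embedding(codes: list[str]) -> bool:
    """True if every pair of ring neighbours has Hamming distance 1."""
    p = len(codes)
    for i in range(p):
        a, b = codes[i], codes[(i + 1) % p]
        if sum(x != y for x, y in zip(a, b)) != 1:
            return False
    return True


if __name__ == "__main__":
    print(gray_codes(3))
    print(is_ring_embedding(gray_codes(4)))
```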

16
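Ring into hypercube (cont.)
  • i → G(i,dim): the dim-bit Gray code of ring node i
  • G(0,1) = 0
  • G(1,1) = 1
  • G(i,x+1) = 0 || G(i,x) if i < 2^x
  • G(i,x+1) = 1 || G(2^(x+1)-i-1, x) if i >= 2^x
  • (|| is concatenation)
  • 1 bit: 0 1
  • 2 bits: 00 01 11 10
  • 3 bits: 000 001 011 010 110 111 101 100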

17
2D Mesh into hypercube
  • Note: 2D mesh
  • Rows: rings
  • Cols: rings
  • 2^r × 2^s wraparound mesh into a cube with
    2^(r+s) nodes
  • Map node(i,j) onto node G(i,r) || G(j,s)
  • Row coincides with a subcube
  • Column coincides with a subcube
  • S.t. if adjacent in mesh then adjacent in cube
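The mapping can be checked mechanically. The sketch below assumes G is the binary reflected Gray code, written here as a recursive function on integers, and that concatenation means placing the r row bits in front of the s column bits. The function names are chosen for the example; the check visits every node of the wraparound mesh and confirms that each mesh neighbour is also a cube neighbour.

```python
def G(i: int, x: int) -> int:
    """x-bit binary reflected Gray code of i, as an integer."""
    if x == 1:
        return i  # G(0,1) = 0, G(1,1) = 1
    half = 1 << (x - 1)
    if i < half:
        return G(i, x - 1)                    # 0 in front of G(i, x-1)
    return half | G(2 * half - i - 1, x - 1)  # 1 in front of the reflected code


def mesh_to_cube(i: int, j: int, r: int, s: int) -> int:
    """Cube address of mesh node (i, j): row code followed by column code."""
    return (G(i, r) << s) | G(j, s)


def adjacent_everywhere(r: int, s: int) -> bool:
    """True if all wraparound-mesh neighbours map to cube neighbours."""
    rows, cols = 1 << r, 1 << s
    for i in range(rows):
        for j in range(cols):
            here = mesh_to_cube(i, j, r, s)
            for ni, nj in (((i + 1) % rows, j), (i, (j + 1) % cols)):
                diff = here ^ mesh_to_cube(ni, nj, r, s)
                if bin(diff).count("1") != 1:
                    return False
    return True


if __name__ == "__main__":
    print(adjacent_everywhere(2, 3))
```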

18
Complete binary tree into hypercube
  • Map tree root to any cube node
  • Left child: to same node as parent
  • Right child at level j: invert bit j of parent
    node
  • Level 0: 000
  • Level 1: 000 001
  • Level 2: 000 010 001 011
  • Level 3: 000 100 010 110 001 101 011 111
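The level-by-level mapping can be generated with a short loop. This sketch assumes bit j is counted from 1 at the least significant end, which is what the listing on this slide suggests (the right child at level 1 becomes 001). The function name and the `root` parameter are illustrative.

```python
def tree_levels(depth: int, root: int = 0) -> list[list[int]]:
    """Cube node assigned to each tree node, listed level by level."""
    levels = [[root]]
    for j in range(1, depth + 1):
        nxt = []
        for parent in levels[-1]:
            nxt.append(parent)                   # left child: same cube node
            nxt.append(parent ^ (1 << (j - 1)))  # right child: invert bit j
        levels.append(nxt)
    return levels


if __name__ == "__main__":
    for level in tree_levels(3):
        print(" ".join(format(node, "03b") for node in level))
```

Every tree node shares a cube node with its left child, while the 2^depth leaves all land on distinct cube nodes.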

19
Routing Mechanisms
  • Determine all source → destination paths
  • Minimal: a shortest path
  • Deterministic: one path per (src,dst) pair
  • Mesh: dimension-ordered (XY routing)
  • Cube: E-cube routing
  • Send along least significant 1 bit in src XOR dst
  • Adaptive: many paths per (src,dst) pair
  • Minimal: only shortest
  • Why adaptive? Discuss.
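Both deterministic schemes fit in a few lines. The sketch below assumes cube nodes are integers and mesh nodes are (x, y) tuples on a mesh without wraparound; the function names are chosen for the example. Because E-cube routing starts at the least significant differing bit, it may pick a different (equally short) path than one that starts at the most significant bit.

```python
def ecube_route(src: int, dst: int) -> list[int]:
    """Hypercube: repeatedly flip the least significant 1 bit of cur XOR dst."""
    route = [src]
    cur = src
    while cur != dst:
        diff = cur ^ dst
        lowest = diff & -diff  # isolates the least significant 1 bit
        cur ^= lowest
        route.append(cur)
    return route


def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Mesh: correct the X coordinate first, then the Y coordinate."""
    x, y = src
    dx, dy = dst
    route = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        route.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        route.append((x, y))
    return route


if __name__ == "__main__":
    print([format(n, "05b") for n in ecube_route(0b01100, 0b11000)])
    print(xy_route((0, 0), (2, 1)))
```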

20
Routing (communication) Costs
  • Three factors
  • Start up at source (ts)
  • OS, buffers, error correction info, routing
    algorithm
  • Hop time (th)
  • The time it takes to get from one PE to the next
  • Also called node latency
  • Word transfer time (tw)
  • Inverse of channel bandwidth

21
Two rout(switch)ing techniques
  • Store and Forward: O(m.l)
  • Strict: whole message travels from PE to PE
  • m words, l links
  • tcomm = ts + (m.tw + th).l
  • Often, th is much less than m.tw, so
    tcomm ≈ ts + m.l.tw
  • Cut-through: O(m + l)
  • Non-strict: message broken in flits (packets)
  • Flits are pipelined through the network
  • tcomm = ts + l.th + m.tw
  • Circular path + finite flit buffers can give
    rise to deadlock
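The two cost formulas are easy to compare numerically. The sketch below simply evaluates them; the parameter values in the example are made up for illustration and are not measurements from any machine.

```python
def store_and_forward(ts, th, tw, m, l):
    """tcomm = ts + (m.tw + th).l : the whole message is received at every hop."""
    return ts + (m * tw + th) * l


def cut_through(ts, th, tw, m, l):
    """tcomm = ts + l.th + m.tw : flits are pipelined across the l links."""
    return ts + l * th + m * tw


if __name__ == "__main__":
    # Hypothetical values: startup 10, hop time 1, per-word time 2,
    # message of 100 words sent over 4 links (arbitrary time units).
    ts, th, tw, m, l = 10, 1, 2, 100, 4
    print(store_and_forward(ts, th, tw, m, l))
    print(cut_through(ts, th, tw, m, l))
```

With a single link (l = 1) both formulas give the same time; the gap grows with the number of links because store and forward pays m.tw on every hop.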