Networks: Switch Design - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Networks: Switch Design


1
Networks: Switch Design
2
Switch Design
3
How do you build a crossbar?
4
Input-buffered switch
  • Independent routing logic per input
  • FSM
  • Scheduler logic arbitrates each output
  • priority, FIFO, random
  • Head-of-line blocking problem
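The head-of-line blocking problem can be illustrated with a toy single-cycle model (a hypothetical sketch, not the slide's switch design): with one FIFO per input, each input may only forward its head packet, so a packet bound for an idle output can be stuck behind one bound for a busy output.

```python
from collections import deque

def simulate_cycle(input_queues, busy_outputs):
    """One switch cycle with FIFO input queues: each input may only
    forward the packet at its head. Returns the list of packets sent
    (each packet is represented just by its destination output)."""
    sent = []
    granted = set(busy_outputs)          # outputs already claimed this cycle
    for q in input_queues:
        if q and q[0] not in granted:    # head packet's output is free
            granted.add(q[0])
            sent.append(q.popleft())
    return sent

# Input 0 holds packets for outputs [2, 3]; output 2 is busy, so the
# packet for output 3 is blocked behind the head even though 3 is idle.
queues = [deque([2, 3])]
print(simulate_cycle(queues, busy_outputs={2}))   # -> []
```

Reordering the same queue so the packet for output 3 is at the head lets it through, which is exactly the throughput a shared pool or per-output (virtual output) queues would recover.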

5
Switches to avoid head-of-line blocking
  • Additional cost
  • Switch cycle time, routing delay
  • How would you build a shared pool?

6
Example: IBM SP Vulcan switch
  • Many gigabit Ethernet switches use similar design
    without the cut-through

7
Output scheduling
  • n independent arbitration problems?
  • static priority, random, round-robin
  • simplifications due to routing algorithm?
  • Dimension order routing
  • Adaptive routing
  • general case is max bipartite matching
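One of the policies named above, rotating (round-robin) priority for a single output, can be sketched as follows; the class name and interface are illustrative, not taken from the slides:

```python
class RoundRobinArbiter:
    """Per-output arbiter: grants one of the requesting inputs,
    rotating priority so the last winner has lowest priority next time."""
    def __init__(self, n_inputs):
        self.n = n_inputs
        self.last = self.n - 1            # start so input 0 has top priority

    def grant(self, requests):
        """requests: set of input indices requesting this output.
        Returns the granted input, or None if there are no requests."""
        for i in range(1, self.n + 1):
            candidate = (self.last + i) % self.n
            if candidate in requests:
                self.last = candidate
                return candidate
        return None

arb = RoundRobinArbiter(4)
print(arb.grant({0, 2}))   # -> 0
print(arb.grant({0, 2}))   # -> 2  (input 0 just won, so 2 now has priority)
print(arb.grant({0, 2}))   # -> 0
```

Running one such arbiter per output solves the n independent problems greedily; the general case of matching many inputs to many outputs at once is, as the slide notes, maximum bipartite matching.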

8
Stacked Dimension Switches
  • Dimension order on 3D cube?
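Dimension-order routing on a bidirectional torus reduces to simple arithmetic at each hop; a minimal sketch (the helper name and tuple-coordinate convention are assumptions for illustration):

```python
def dor_next_dim(cur, dst, dims):
    """Dimension-order routing: route in the lowest dimension where the
    current and destination coordinates differ; deliver locally when equal.
    cur, dst: coordinate tuples; dims: torus size per dimension.
    Returns (dimension, direction) or None on arrival."""
    for d in range(len(dims)):
        if cur[d] != dst[d]:
            # On a bidirectional torus, go whichever way around is shorter.
            fwd = (dst[d] - cur[d]) % dims[d]
            step = 1 if fwd <= dims[d] // 2 else -1
            return d, step
    return None  # arrived at destination

# Route (0,0,0) -> (3,1,0) on a 4x4x4 torus: X first, and it is
# shorter to wrap around in the -X direction than to take 3 +X hops.
print(dor_next_dim((0, 0, 0), (3, 1, 0), (4, 4, 4)))   # -> (0, -1)
```

Because each packet exhausts one dimension before starting the next, a stacked-dimension switch only ever needs to pass a packet from dimension d to dimension d+1, never backwards.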

9
Flow Control
  • What do you do when push comes to shove?
  • Ethernet: collision detection and retry after
    delay
  • FDDI, token ring: arbitration token
  • TCP/WAN: buffer, drop, adjust rate
  • any solution must adjust to output rate
  • Link-level flow control

10
Examples
  • Short Links
  • long links
  • several flits on the wire

11
Smoothing the flow
[Figure: a flit buffer between incoming and outgoing phits, with levels
Empty, Low (Go) mark, High (Stop) mark, Full; filling past the high mark
sends a Stop flow-control symbol upstream, and draining below the low
mark sends Go]
  • How much slack do you need to maximize bandwidth?
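The high/low watermark scheme in the figure can be sketched as a toy receiver-side buffer model (class name and thresholds are illustrative, not from the slides):

```python
class StopGoBuffer:
    """Receiver-side flit buffer with high/low watermarks. Filling past
    the high mark emits a Stop symbol upstream; draining below the low
    mark emits Go. The gap between the marks, plus the flits already in
    flight, is the slack the buffer must absorb."""
    def __init__(self, capacity, high, low):
        self.capacity, self.high, self.low = capacity, high, low
        self.occupancy = 0
        self.stopped = False

    def enqueue_phit(self):
        assert self.occupancy < self.capacity, "overflow: not enough slack"
        self.occupancy += 1
        if not self.stopped and self.occupancy >= self.high:
            self.stopped = True
            return "Stop"
        return None

    def dequeue_phit(self):
        self.occupancy -= 1
        if self.stopped and self.occupancy <= self.low:
            self.stopped = False
            return "Go"
        return None

buf = StopGoBuffer(capacity=8, high=6, low=2)
symbols = [buf.enqueue_phit() for _ in range(6)]
print(symbols[-1])          # -> Stop (high mark reached on the 6th phit)
```

The high mark must sit at least one link round-trip's worth of phits below capacity, since everything already on the wire still arrives after Stop is sent; that round-trip is the answer to the slack question above.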

12
Link vs End-to-End flow control
  • Hot Spots
  • back pressure
  • all the buffers in the tree from the hot spot to
    the sources are full
  • Global communication operations
  • Simple back pressure
  • with completely balanced communication patterns,
    simple end-to-end protocols in the global
    communication have been shown to mitigate this
    problem
  • a node may wait after sending a certain amount of
    data until it has also received this amount, or
    it may wait for chunks of its data to be
    acknowledged
  • Admission Control
  • NI-to-NI credit-based flow control
  • keep the packet within the source NI rather than
    blocking traffic within the network
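The NI-to-NI credit idea above can be sketched as a toy model (class and method names are invented for illustration): a packet leaves the source NI only when a credit, i.e. a free buffer at the destination NI, is available, so overflow traffic waits at the source instead of clogging the network.

```python
from collections import deque

class CreditedNI:
    """Source network interface with NI-to-NI credit-based flow control.
    Each credit represents one free packet buffer at the destination NI."""
    def __init__(self, credits):
        self.credits = credits
        self.pending = deque()        # packets held at the source NI

    def send(self, packet):
        self.pending.append(packet)
        return self._drain()

    def credit_return(self):
        self.credits += 1             # destination NI freed a buffer
        return self._drain()

    def _drain(self):
        """Inject waiting packets into the network while credits last."""
        injected = []
        while self.pending and self.credits > 0:
            self.credits -= 1
            injected.append(self.pending.popleft())
        return injected

ni = CreditedNI(credits=1)
print(ni.send("p0"))          # -> ['p0']  (credit available, injected)
print(ni.send("p1"))          # -> []      (no credit: held at source NI)
print(ni.credit_return())     # -> ['p1']  (credit came back, now injected)
```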

13
Example: T3D
  • 3D bidirectional torus, dimension order (NIC
    selected), virtual cut-through, packet sw.
  • 16 bit x 150 MHz, short, wide, synch.
  • rotating priority per output
  • logically separate request/response (two VCs
    each)
  • 3 independent, stacked switches
  • 8 16-bit flit buffers on each of 4 VCs in each
    direction

14
Example: SP
  • 8-port switch, 40 MB/s per link, 8-bit phit,
    16-bit flit, single 40 MHz clock
  • packet sw, cut-through, no virtual channel,
    source-based routing
  • variable packet < 255 bytes, 31-byte FIFO per
    input, 7 bytes per output
  • 128 8-byte chunks in central queue, LRU per
    output
  • run in shadow mode

15
Summary
  • Routing Algorithms restrict the set of routes
    within the topology
  • simple mechanism selects turn at each hop
  • arithmetic, selection, lookup
  • Deadlock-free if channel dependence graph is
    acyclic
  • limit turns to eliminate dependences
  • add separate channel resources to break
    dependences
  • combination of topology, algorithm, and switch
    design
  • Deterministic vs adaptive routing
  • Switch design issues
  • input/output/pooled buffering, routing logic,
    selection logic
  • Flow control
  • Real networks are a package of design choices

16
Cache Coherence in Scalable Machines
17
Context for Scalable Cache Coherence
  • Scalable networks: many simultaneous transactions
  • Realizing programming models through network
    transaction protocols: efficient node-to-network
    interface that interprets transactions
  • Scalable distributed memory
  • Caches naturally replicate data: coherence through
    bus snooping protocols, consistency
  • Need cache coherence protocols that scale!
  • no broadcast or single point of order
18
Generic Solution: Directories
  • Maintain state vector explicitly
  • associate with memory block
  • records state of block in each cache
  • On miss, communicate with directory
  • determine location of cached copies
  • determine action to take
  • conduct protocol to maintain coherence

19
A Cache Coherent System Must
  • Provide set of states, state transition diagram,
    and actions
  • Manage coherence protocol
  • (0) Determine when to invoke coherence protocol
  • (a) Find info about state of block in other
    caches to determine action
  • whether need to communicate with other cached
    copies
  • (b) Locate the other copies
  • (c) Communicate with those copies
    (inval/update)
  • (0) is done the same way on all systems
  • state of the line is maintained in the cache
  • protocol is invoked if an access fault occurs
    on the line
  • Different approaches distinguished by (a) to (c)

20
Bus-based Coherence
  • All of (a), (b), (c) done through broadcast on
    bus
  • faulting processor sends out a search
  • others respond to the search probe and take
    necessary action
  • Could do it in scalable network too
  • broadcast to all processors, and let them respond
  • Conceptually simple, but broadcast doesn't scale
    with p
  • on bus, bus bandwidth doesn't scale
  • on scalable network, every fault leads to at
    least p network transactions
  • Scalable coherence
  • can have same cache states and state transition
    diagram
  • different mechanisms to manage protocol

21
One Approach: Hierarchical Snooping
  • Extend snooping approach hierarchy of broadcast
    media
  • tree of buses or rings (KSR-1)
  • processors are in the bus- or ring-based
    multiprocessors at the leaves
  • parents and children connected by two-way snoopy
    interfaces
  • snoop both buses and propagate relevant
    transactions
  • main memory may be centralized at root or
    distributed among leaves
  • Issues (a) - (c) handled similarly to bus, but
    not full broadcast
  • faulting processor sends out search bus
    transaction on its bus
  • propagates up and down hierarchy based on snoop
    results
  • Problems
  • high latency multiple levels, and snoop/lookup
    at every level
  • bandwidth bottleneck at root
  • Not popular today

22
Scalable Approach: Directories
  • Every memory block has associated directory
    information
  • keeps track of copies of cached blocks and their
    states
  • on a miss, find directory entry, look it up, and
    communicate only with the nodes that have copies
    if necessary
  • in scalable networks, communication with
    directory and copies is through network
    transactions
  • Many alternatives for organizing directory
    information

23
Basic Operation of Directory
k processors. With each cache block in memory:
k presence bits, 1 dirty bit. With each cache block
in cache: 1 valid bit and 1 dirty (owner) bit.
  • Read from main memory by processor i
  • If dirty bit OFF, then: read from main memory;
    turn p_i ON
  • If dirty bit ON, then: recall line from dirty
    processor (cache state to shared); update memory;
    turn dirty bit OFF; turn p_i ON; supply recalled
    data to i
  • Write to main memory by processor i
  • If dirty bit OFF, then: supply data to i; send
    invalidations to all caches that have the block;
    turn dirty bit ON; turn p_i ON; ...
  • ...
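The read- and write-miss handling above can be sketched as a toy directory entry (a hypothetical model; method names and the returned action strings are illustrative, and network transactions are elided):

```python
class DirectoryEntry:
    """Per-block directory state: k presence bits plus one dirty bit,
    following the read/write miss handling on the slide."""
    def __init__(self, k):
        self.presence = [False] * k   # p_i: does processor i have a copy?
        self.dirty = False

    def read_miss(self, i):
        """Action the home node takes for a read miss by processor i."""
        if self.dirty:
            owner = self.presence.index(True)
            # Recall line from owner, downgrade it to shared, update memory.
            self.dirty = False
            action = f"recall from {owner}, supply to {i}"
        else:
            action = f"read memory, supply to {i}"
        self.presence[i] = True       # turn p_i ON
        return action

    def write_miss(self, i):
        """Invalidate all other copies, then make i the dirty owner."""
        sharers = [p for p, bit in enumerate(self.presence) if bit and p != i]
        self.presence = [False] * len(self.presence)
        self.presence[i] = True
        self.dirty = True
        return f"invalidate {sharers}, supply to {i}"

d = DirectoryEntry(k=4)
print(d.read_miss(1))    # -> read memory, supply to 1
print(d.write_miss(2))   # -> invalidate [1], supply to 2
print(d.read_miss(0))    # -> recall from 2, supply to 0
```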

24
Basic Directory Transactions
25
A Popular Middle Ground
  • Two-level hierarchy
  • Individual nodes are multiprocessors, connected
    non-hierarchically
  • e.g. mesh of SMPs
  • Coherence across nodes is directory-based
  • directory keeps track of nodes, not individual
    processors
  • Coherence within nodes is snooping or directory
  • orthogonal, but needs a good interface of
    functionality
  • Examples
  • Convex Exemplar: directory-directory
  • Sequent, Data General, HAL: directory-snoopy
  • SMP on a chip?

26
Example: Two-level Hierarchies
27
Advantages of Multiprocessor Nodes
  • Potential for cost and performance advantages
  • can use commodity SMPs
  • fewer nodes for directory to keep track of
  • much communication may be contained within node
    (cheaper)
  • nodes prefetch data for each other (fewer
    remote misses)
  • combining of requests (like hierarchical, only
    two-level)
  • can even share caches (overlapping of working
    sets)
  • benefits depend on sharing pattern (and mapping)
  • good for widely read-shared e.g. tree data in
    Barnes-Hut
  • good for nearest-neighbor, if properly mapped
  • not so good for all-to-all communication

28
Disadvantages of Coherent MP Nodes
  • Bandwidth shared among nodes
  • all-to-all example
  • applies to coherent or not
  • Bus increases latency to local memory
  • With coherence, typically wait for local snoop
    results before sending remote requests
  • Snoopy bus at remote node increases delays there
    too, increasing latency and reducing bandwidth
  • May hurt performance if sharing patterns don't
    comply