1
Interconnection Networks
2
Overview
  • Physical Layer and Message Switching
  • Network Topologies
  • Metrics
  • Deadlock and Livelock
  • Routing Layer
  • The Messaging Layer

3
Interconnection Networks
  • Fabric for scalable multiprocessor architectures
  • Distinct from traditional networking
    architectures such as Internet Protocol (IP)
    based systems
  • We are interested in applications to large
    clusters as well as embedded systems

4
CLUX: A Beowulf Cluster
Interconnection Network Cables
Myrinet Switch
Images from the CLUX cluster at
http://www.fyslab.hut.fi/clux/
5
The Practical Problem
From Ambuj Goyal, "Computer Science Grand
Challenge: Simplicity of Design," Computing
Research Association Conference on "Grand
Research Challenges" in Computer Science and
Engineering, June 2002
6
Example Embedded Devices
picoChip: http://www.picochip.com/
  • Issues
  • Execution performance
  • Power dissipation
  • Number of chip types
  • Size and form factor

PACT XPP Technologies: http://www.pactcorp.com/
7
Physical Layer and Message Switching
8
Messaging Hierarchy
Routing Layer: Where? Destination decisions, i.e., which output port
Switching Layer: When? When is data forwarded
Physical Layer: How? Synchronization of data transfer
  • This organization is distinct from traditional
    networking implementations
  • Emphasis is on low latency communication
  • Only recently have standards been evolving
  • InfiniBand: http://www.infinibandta.org/home

9
The Physical Layer
[Figure: a message is divided into packets, each carrying a header, data, and a checksum]
Flit: flow control digit
Phit: physical flow control digit
  • Data is transmitted based on a hierarchical data
    structuring mechanism
  • Messages → packets → flits → phits (see the
    sketch below)
  • While flits and phits are fixed size, packets and
    data may be variable sized
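
As a rough illustration of this hierarchy, the sketch below splits a message into packets, flits, and phits; the helper names and the 256/64/8-byte sizes are arbitrary assumptions, not values from the slides.

```python
# Illustrative sketch of the message -> packet -> flit -> phit hierarchy.
# The 256/64/8-byte sizes are arbitrary assumptions, not values from the slides.

FLIT_BYTES = 64   # flit: flow control digit, the unit of buffer allocation
PHIT_BYTES = 8    # phit: physical digit, the unit moved across the link per cycle

def packetize(message: bytes, packet_payload: int = 256) -> list[bytes]:
    """Split a variable-sized message into fixed-payload packets."""
    return [message[i:i + packet_payload]
            for i in range(0, len(message), packet_payload)]

def to_flits(packet: bytes) -> list[bytes]:
    """Split a packet into fixed-size flits (the last flit is zero-padded)."""
    flits = [packet[i:i + FLIT_BYTES] for i in range(0, len(packet), FLIT_BYTES)]
    flits[-1] = flits[-1].ljust(FLIT_BYTES, b"\x00")
    return flits

def to_phits(flit: bytes) -> list[bytes]:
    """Split a flit into the phits transferred over the physical channel."""
    return [flit[i:i + PHIT_BYTES] for i in range(0, len(flit), PHIT_BYTES)]

packets = packetize(bytes(1000))
flits = to_flits(packets[0])
print(len(packets), "packets,", len(flits), "flits/packet,",
      len(to_phits(flits[0])), "phits/flit")   # 4 packets, 4 flits/packet, 8 phits/flit
```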

10
Flow Control
  • Flow control: the synchronized transfer of a
    unit of information (a flow control digit)
  • Based on buffer management
  • Asynchronous vs. synchronous flow control
  • Flow control occurs at multiple levels
  • message flow control
  • physical flow control
  • Mechanisms
  • Credit-based flow control (sketched below)
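
A minimal sketch of credit-based flow control over a single link, assuming a hypothetical flit buffer of fixed depth: the sender holds one credit per free downstream slot, spends a credit per flit sent, and stalls at zero until the receiver drains a flit and returns the credit. Class and method names are illustrative.

```python
# Illustrative sketch of credit-based flow control over one link.
# Buffer depth and class/method names are assumptions, not from the slides.

class CreditSender:
    def __init__(self, credits: int):
        self.credits = credits      # one credit per free downstream buffer slot

    def can_send(self) -> bool:
        return self.credits > 0

    def send_flit(self) -> None:
        assert self.can_send(), "must stall: no credits left"
        self.credits -= 1           # each transmitted flit consumes a credit

    def return_credit(self) -> None:
        self.credits += 1           # credit comes back when a slot is freed


class CreditReceiver:
    def __init__(self, depth: int):
        self.depth, self.occupied = depth, 0

    def accept_flit(self) -> None:
        assert self.occupied < self.depth   # credits guarantee no overflow
        self.occupied += 1

    def drain_flit(self, sender: CreditSender) -> None:
        self.occupied -= 1
        sender.return_credit()      # freeing a slot re-enables the sender


# A 4-flit buffer: the sender stalls after 4 flits until the receiver drains one.
sender, receiver = CreditSender(4), CreditReceiver(4)
for _ in range(4):
    sender.send_flit()
    receiver.accept_flit()
assert not sender.can_send()        # back-pressure: no credits, sender stalls
receiver.drain_flit(sender)
assert sender.can_send()            # credit returned, transmission resumes
```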

11
Switching Layer
  • Composed of three sets of techniques
  • switching techniques
  • flow control
  • buffer management
  • Organization and operation of routers are largely
    determined by the switching layer
  • Connection Oriented vs. Connectionless
    communication

12
Generic Router Architecture
[Figure: generic router architecture, annotating wire delay, switching delay, and routing delay]
13
Virtual Channels
  • Each virtual channel is a pair of unidirectional
    channels
  • Independently managed buffers multiplexed over
    the physical channel (see the sketch below)
  • De-couples buffers from physical channels
  • Originally introduced to break cyclic
    dependencies
  • Improves performance through reduction of
    blocking delay
  • Virtual lanes vs. virtual channels
  • As the number of virtual channels increases, the
    increased channel multiplexing has two effects
  • decrease in header delay
  • increase in average data flit delay
  • Impact on router performance
  • switch complexity
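
The sketch below illustrates how independently buffered virtual channels are multiplexed flit by flit over one physical channel; the round-robin arbiter and class names are illustrative assumptions, not details from the slides.

```python
from collections import deque

# Illustrative sketch: independently buffered virtual channels multiplexed
# flit by flit over one physical channel (round-robin arbitration assumed).

class PhysicalChannel:
    def __init__(self, num_vcs: int):
        self.vcs = [deque() for _ in range(num_vcs)]   # one buffer per virtual channel
        self.next_vc = 0

    def enqueue(self, vc: int, flit: str) -> None:
        self.vcs[vc].append(flit)

    def transmit_one(self):
        """Send one flit from the next non-empty VC in round-robin order."""
        for i in range(len(self.vcs)):
            vc = (self.next_vc + i) % len(self.vcs)
            if self.vcs[vc]:
                self.next_vc = (vc + 1) % len(self.vcs)
                return vc, self.vcs[vc].popleft()
        return None                                    # link idle this cycle

# Two messages share the link; a stalled message on VC 0 cannot block VC 1.
link = PhysicalChannel(num_vcs=2)
for flit in ["A0", "A1", "A2"]:
    link.enqueue(0, flit)
for flit in ["B0", "B1"]:
    link.enqueue(1, flit)
print([link.transmit_one() for _ in range(5)])
# [(0, 'A0'), (1, 'B0'), (0, 'A1'), (1, 'B1'), (0, 'A2')]
```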

14
Circuit Switching
[Timing diagram: a header probe sets up the path link by link, an acknowledgment returns end to end, then data follows; per-link routing delay t_r and switching delay t_s make up t_setup, followed by t_data, with the link time busy throughout]
  • Hardware path setup by a routing header or probe
  • End-to-end acknowledgment initiates transfer at
    full hardware bandwidth
  • Source routing vs. distributed routing
  • System is limited by the signaling rate along the
    circuits (see the latency sketch below)
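
As a rough companion to the timing diagram, here is a no-load latency sketch following the model in Duato, Yalamanchili, and Ni (the text cited later in these slides); D is the number of links traversed, t_r, t_s, t_w the routing, switching, and wire delays, L the message length, W the channel width, and B the channel bandwidth. The exact form is an assumption borrowed from that model, not something stated on this slide.

```latex
% No-load latency sketch for circuit switching (assumed form): the probe pays
% routing, switching, and wire delay on each of D links for setup plus the
% acknowledgment, then data streams at the full channel bandwidth B.
t_{\mathrm{circuit}} = t_{\mathrm{setup}} + t_{\mathrm{data}}, \qquad
t_{\mathrm{setup}} = D\,\bigl[t_r + 2(t_s + t_w)\bigr], \qquad
t_{\mathrm{data}} = \frac{1}{B}\left\lceil \frac{L}{W} \right\rceil
```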

15
Packet Switching
  • Blocking delays in circuit switching are avoided
    in packet-switched networks → full link
    utilization in the presence of data
  • Increased storage requirements at the nodes
  • Packetization and in-order delivery requirements
  • Buffering
  • use of local processor memory
  • central queues

16
Virtual Cut-Through
[Timing diagram: the packet header cuts through each router while the message packet follows; per-link wire delay t_w, routing delay t_r, switching delay t_s, and blocking time t_blocking, with the link time busy]
  • Messages cut-through to the next router when
    feasible
  • In the absence of blocking, messages are
    pipelined
  • pipeline cycle time is the larger of intra-router
    and inter-router flow control delays
  • When the header is blocked, the complete message
    is buffered
  • High load behavior approaches that of packet
    switching

17
Wormhole Switching
[Timing diagram: a header flit followed by single flits pipelined across successive links; routing delay t_r, switching delay t_s, and overall latency t_wormhole, with the link time busy]
  • Messages are pipelined, but buffer space is on
    the order of a few flits
  • Small buffers + message pipelining → small,
    compact buffers
  • Supports variable sized messages
  • Messages cannot be interleaved over a channel:
    routing information is only associated with the
    header
  • Base latency is equivalent to that of virtual
    cut-through

18
Comparison of Switching Techniques
  • Packet switching and virtual cut-through
  • consume network bandwidth proportional to network
    load
  • predictable demands
  • VCT behaves like wormhole at low loads and like
    packet switching at high loads (see the latency
    comparison below)
  • link level error control for packet switching
  • Wormhole switching
  • provides low latency
  • lower saturation point
  • higher variance of message latency than packet or
    VCT switching
  • Virtual channels
  • blocking delay vs. data delay
  • router flow control latency
  • Optimistic vs. conservative flow control
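
For comparison, hedged no-load latency sketches for store-and-forward packet switching and for VCT/wormhole, in the same notation as the circuit-switching sketch above; header-size terms are omitted, so these are approximations rather than the slide's own formulas.

```latex
% Approximate no-load latencies (header terms omitted). Store-and-forward
% packet switching serializes the packet on every one of the D links;
% VCT and wormhole pipeline it, paying the serialization term only once.
t_{\mathrm{packet}} \approx D\left[t_r + (t_s + t_w)\left\lceil \frac{L}{W} \right\rceil\right], \qquad
t_{\mathrm{vct}} = t_{\mathrm{wormhole}} \approx D\,(t_r + t_s + t_w) + \max(t_s, t_w)\left\lceil \frac{L}{W} \right\rceil
```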

19
Saturation
20
Network Topologies
21
Motivation
  • Crossbars provide full connectivity among ports,
    but cost and complexity grow quadratically in the
    number of ports (see the scaling sketch below)
  • Buses provide minimal connectivity and do not
    provide scalable performance
  • Network topologies span a spectrum of solutions
    that trade off cost, performance (latency and
    bandwidth), reliability, and implementation
    complexity
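
A one-line way to quantify the two extremes named above; these are standard asymptotic arguments, not figures from the slides.

```latex
% Crossbar: full connectivity but quadratic cost in crosspoints.
% Bus: linear cost, but the shared medium leaves each of the N nodes
% roughly 1/N of the aggregate bandwidth.
\mathrm{cost}_{\mathrm{crossbar}} = \Theta(N^2), \qquad
\mathrm{cost}_{\mathrm{bus}} = \Theta(N), \qquad
\mathrm{BW}_{\mathrm{per\ node,\ bus}} \approx \frac{\mathrm{BW}_{\mathrm{bus}}}{N}
```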

22
Direct Networks
  • Fixed degree
  • Modular
  • Topologies
  • Meshes
  • Multidimensional tori
  • Special case of tori: the binary hypercube

23
Indirect Networks
  • Indirect networks
  • uniform base latency
  • centralized or distributed control
  • Engineering approximations to direct networks

[Figures: a multistage network with forward and backward stages, and a fat-tree network in which bandwidth increases as you go up the tree]
24
Specific MINs
[Figure: 8x8 multistage interconnection networks with input and output ports labeled 000 through 111]
  • Switch sizes and interstage interconnect
    establish distinct MINs
  • The majority of interesting MINs have been shown
    to be topologically equivalent

25
Metrics
26
Evaluation Metrics
  • Latency
  • Message transit time
  • Determined by switching technique and traffic
    patterns
  • Node degree (channel width)
  • Number of input/output channels
  • This metric is determined by packaging
    constraints
  • pin/wiring constraints
  • Diameter
  • Path diversity
  • A measure of reliability

27
Evaluation Metrics
[Figure: a network cut into two halves by a bisection]
  • Bisection bandwidth
  • This is the minimum bandwidth across any bisection
    of the network
  • Bisection bandwidth is a limiting attribute of
    performance (see the expressions below)
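
For concreteness, the standard bisection-width expressions for k-ary n-dimensional meshes and tori (assuming even k); the worked numbers match the 32-ary 2-cube and 10-ary 3-cube compared a few slides later. These formulas restate the usual definitions and are not taken from the slide itself.

```latex
% Bisection width (number of channels cut) for k-ary n-dimensional networks
% (even k assumed); bisection bandwidth = channels cut x channel bandwidth w.
B_{\mathrm{mesh}} = k^{\,n-1}, \qquad
B_{\mathrm{torus}} = 2\,k^{\,n-1}, \qquad
BW_{\mathrm{bisection}} = B \cdot w
% e.g. a 32-ary 2-cube cuts 2 \cdot 32 = 64 channels,
% while a 10-ary 3-cube cuts 2 \cdot 10^2 = 200 channels.
```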

28
Constant Resource Analysis: Bisection Width
29
Constant Resource Analysis: Pin-out
30
Latency Under Contention
32-ary 2-cube vs. 10-ary 3-cube
31
Deadlock and Livelock
32
Deadlock and Livelock
[Figure: routers connected by virtual channels]
  • Deadlock freedom can be ensured by enforcing
    constraints
  • For example, following dimension-order routing in
    2D meshes (sketched below)
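
A minimal sketch of the dimension-order (XY) routing constraint mentioned above, assuming a 2D mesh with hypothetical port names E/W/N/S: the packet fully resolves the X dimension before moving in Y, which forbids the turns that create cyclic channel dependencies.

```python
# Illustrative sketch of dimension-order (XY) routing in a 2D mesh:
# the packet fully resolves X before moving in Y, so Y-to-X turns never
# occur and the channel dependence graph stays acyclic (deadlock-free).
# Port names E/W/N/S and the function name are assumptions.

def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[str]:
    """Return the sequence of output ports taken from src to dst."""
    (x, y), (dx, dy) = src, dst
    hops = []
    while x != dx:                 # travel along X first
        step = 1 if dx > x else -1
        x += step
        hops.append("E" if step == 1 else "W")
    while y != dy:                 # then travel along Y
        step = 1 if dy > y else -1
        y += step
        hops.append("N" if step == 1 else "S")
    return hops

print(xy_route((0, 0), (2, 3)))    # ['E', 'E', 'N', 'N', 'N']
```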

33
Occurrence of Deadlock
[Figure: four messages holding and requesting buffers in a cycle]
  • Deadlock is caused by dependencies between buffers

34
Deadlock in a Ring Network
35
Deadlock Avoidance Principle
  • Deadlock is caused by dependencies between buffers

36
Routing Constraints on Virtual Channels
  • Add multiple virtual channels to each physical
    channel
  • Place routing restrictions between virtual
    channels

37
Break Cycles
38
Channel Dependence Graph
39
Routing Layer
40
Routing Protocols
Taxonomy of routing algorithms
  • Number of destinations: unicast routing, multicast
    routing
  • Routing decisions: centralized, source,
    distributed, or multiphase routing
  • Implementation: table lookup or finite state
    machine
  • Adaptivity: deterministic or adaptive routing
  • Progressiveness: progressive or backtracking
  • Minimality: profitable or misrouting
  • Number of paths: complete or partial
Source: J. Duato, S. Yalamanchili, and L. Ni,
Interconnection Networks, Morgan Kaufmann, 2003.
41
Key Routing Categories
  • Deterministic
  • The path is fixed by the source-destination pair
  • Source Routing
  • Path is looked up prior to message injection
  • May differ each time the network and NIs are
    initialized
  • Adaptive routing
  • Path is determined by run-time network conditions
  • Unicast
  • Single source to single destination
  • Multicast
  • Single source to multiple destinations

42
Generic Router Architecture
43
Software Layer
44
The Message Layer
  • Message layer background
  • Cluster computers
  • Myrinet SAN
  • Design properties
  • End-to-End communication path
  • Injection
  • Network transmission
  • Ejection
  • Overall performance

45
Cluster Computers
  • Cost-effective alternative to supercomputers
  • Number of commodity workstations
  • Specialized network hardware and software
  • Result: large pool of host processors

Courtesy of C. Ulmer
46
Myrinet
  • Descendant of Caltech Mosaic project
  • Wormhole network
  • Source routing
  • High-speed, ultra-reliable network
  • Configurable topology: switches, NICs, and cables

Courtesy of C. Ulmer
47
Myrinet Switches and Links
  • 16-port crossbar chip
  • 2.0 + 2.0 Gbps per port
  • 300 ns latency
  • Line card
  • 8 Network ports
  • 8 Backplane ports
  • Backplane cabinet
  • 17 line card slots
  • 128 Hosts

Courtesy of C. Ulmer
48
Myrinet NI Architecture
  • Custom RISC CPU
  • 33-200MHz
  • Big endian
  • gcc is available
  • SRAM
  • 1-9MB
  • No CPU cache
  • DMA Engines
  • PCI / SRAM
  • SRAM / Tx
  • Rx / SRAM

[Figure: NIC block diagram with the LANai processor (RISC CPU, SRAM, host DMA, SAN DMA, Tx/Rx) attached to the PCI bus]
Courtesy of C. Ulmer
49
Message Layers
Courtesy of C. Ulmer
50
Message Layer Communication Software
  • Message layers are enabling technology for
    clusters
  • Enable the cluster to function as a single-image
    multiprocessor system
  • Responsible for transferring messages between
    resources
  • Hide hardware details from end users

Courtesy of C. Ulmer
51
Message Layer Design Issues
  • Performance is critical
  • Competing with SMPs, where overhead is < 1 µs
  • Use every trick to get performance
  • Single cluster user -- remove device sharing
    overhead
  • Little protection -- co-operative environment
  • Reliable hardware -- optimize for common case
    of few errors
  • Smart hardware -- offload host communication
  • Arch hacks -- x86 is a turkey, use MMX, SSE, WC..

Courtesy of C. Ulmer
52
Message Layer Organization
[Figure: message layer organization, from the user-space application and user-space message layer library down through the kernel NI device driver to the NI firmware]
Courtesy of C. Ulmer
53
End User's Perspective
[Figure: a message sent from Processor A to Processor B]
Courtesy of C. Ulmer
54
End-to-End Communication Path
  • Three phases of data transfer
  • Injection
  • Network
  • Ejection

[Figure: data is (1) injected from source CPU/memory into the NI, (2) transmitted across the SAN, and (3) ejected into destination memory]
Courtesy of C. Ulmer
55
TPIL Performance: LANai 9 NI with Pentium III-550 MHz Host
[Plot: bandwidth (MBytes/s) vs. injection size (bytes)]
Courtesy of C. Ulmer
56
The Message Path
[Figure: message path from source memory through the CPU/OS and PCI bus to the NI, across the network, and up through the destination NI, PCI bus, OS, and memory]
  • Wire bandwidth is not the bottleneck!
  • Operating system and/or user level software
    limits performance

57
Universal Performance Metrics
[Timing diagram: sender overhead (processor busy), transmission time (size / bandwidth), time of flight, receiver overhead (processor busy); transport latency and total latency spans]
Total Latency = Sender Overhead + Time of Flight +
Message Size / BW + Receiver Overhead
Includes header/trailer in BW calculation?
58
Simplified Latency Model
  • Total Latency = Overhead + Message Size / BW
  • Overhead = Sender Overhead + Time of Flight +
    Receiver Overhead
  • Can relate overhead to network bandwidth
    utilization (see the numeric sketch below)
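
A small numeric illustration of the simplified model; the function name and all bandwidth/overhead figures below are made-up assumptions, chosen only to show how fixed overhead dominates short messages while size/BW dominates long ones.

```python
# Numeric illustration of the simplified latency model above. The bandwidth
# and overhead figures are made-up assumptions, chosen only to show the trend.

def total_latency_us(msg_bytes: int,
                     bw_mbytes_per_s: float = 200.0,  # assumed link bandwidth
                     sender_us: float = 5.0,          # assumed sender overhead
                     flight_us: float = 0.5,          # assumed time of flight
                     receiver_us: float = 5.0) -> float:
    overhead = sender_us + flight_us + receiver_us
    transmit = msg_bytes / bw_mbytes_per_s            # bytes / (MB/s) gives microseconds
    return overhead + transmit

for size in (8, 256, 4096, 65536):
    print(f"{size:6d} B -> {total_latency_us(size):8.1f} us")
# Small messages are dominated by overhead; large ones by Message Size / BW.
```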

59
Commercial Example
60
Scalable Switching Fabrics for Internet Routers
Router
  • Internet bandwidth growth → routers with
  • large numbers of ports
  • high bisection bandwidth
  • Historically these solutions have used
  • Backplanes
  • Crossbar switches
  • White paper: Scalable Switching Fabrics for
    Internet Routers, by W. J. Dally,
    http://www.avici.com/technology/whitepapers/

61
Requirements
  • Scalable
  • Incremental
  • Economical → cost linear in the number of nodes
  • Robust
  • Fault tolerant → path diversity + reconfiguration
  • Non-blocking features
  • Performance
  • High bisection bandwidth
  • Quality of Service (QoS)
  • Bounded delay

62
Switching Fabric
  • Three components
  • Topology → 3D torus
  • Routing → source routing with randomization
  • Flow control → virtual channels and virtual
    networks
  • Maximum configuration: 14 x 8 x 5 = 560 nodes
  • Channel speed is 10 Gbps

63
Packaging
  • Uniformly short wires between adjacent nodes
  • Can be built in passive backplanes
  • Run at high speed
  • Bandwidth inversely proportional to square of
    wire length
  • Cabling costs
  • Power costs

Figures are from Scalable Switching Fabrics for
Internet Routers, by W. J. Dally (can be found at
www.avici.com)
64
Properties
  • Path diversity
  • Avoids tree saturation
  • Edge disjoint paths for fault tolerance
  • Heartbeat checks (100 microseconds); deflection
    routing while tables are updated

Figures are from Scalable Switching Fabrics for
Internet Routers, by W. J. Dally (can be found at
www.avici.com)
65
Properties
Figures are from Scalable Switching Fabrics for
Internet Routers, by W. J. Dally (can be found at
www.avici.com)
66
Use of Virtual Channels
  • Virtual channels aggregated into virtual networks
  • Two networks for each output port
  • Distinct networks prevent undesirable coupling
  • Only bandwidth on a link is shared
  • Fair arbitration mechanisms
  • Distinct networks enable QoS constraints to be
    met
  • Separate best effort and constant bit rate traffic

67
Summary
  • Distinguish between traditional networking and
    high performance multiprocessor communication
  • Hierarchy of implementations
  • Physical, switching and routing
  • Protocol families and protocol layers (the
    protocol stack)
  • Datapath and architecture of the switches
  • Metrics
  • Bisection bandwidth
  • Reliability
  • Traditional latency and bandwidth