Current major high performance networking technologies

Provided by: rache82

1
Current major high performance networking
technologies
  • InfiniBand
  • Myrinet
  • Quadrics
  • 10G-Ethernet

2
InfiniBand
  • Was originally designed as a system area
    network connecting CPUs and I/O devices.
  • A larger role: replacing all I/O standards for
    data centers (PCI, Fibre Channel, and Ethernet),
    so that everything connects through InfiniBand.
  • A lesser role: a low-latency, high-bandwidth,
    low-overhead interconnect between servers and
    storage in commercial datacenters.
  • Can form local area or even wide area networks.
  • Has become the de facto interconnect for high
    performance clusters (100 systems in the Top 500
    supercomputer list).

3
  • InfiniBand architecture
  • Specification (InfiniBand Architecture
    Specification, Release 1.2.1, January 2008/Oct.
    2006) available from the InfiniBand Trade
    Association (http://www.infinibandta.org)

4
  • InfiniBand architecture overview

5
  • InfiniBand architecture overview
  • Components
    • Links
    • Channel adapters
    • Switches
    • Routers
  • The specification allows InfiniBand wide area
    networks, but it is mostly adopted as a
    system/storage area network.
  • Topology
    • Irregular
    • Regular: fat tree
  • Link speed
    • Single data rate (SDR): 2.5 Gbps (1X), 10 Gbps
      (4X), and 30 Gbps (12X)
    • Double data rate (DDR): 5 Gbps (1X), 20 Gbps
      (4X)
6
  • Layers: somewhat similar to TCP/IP
  • Physical layer
  • Link layer
    • Error detection (CRC checksum)
    • Flow control (credit based)
    • Switching, virtual lanes (VL)
    • Forwarding table computed by the subnet manager
    • Routing is not adaptive
  • Network layer: routing across subnets
    • Not used in the cluster environment
  • Transport layer
    • Reliable/unreliable, connection/datagram
  • Verbs: the interface between adapters and OS/users
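
Credit-based flow control can be illustrated with a toy model: the sender transmits only while it holds credits, and each credit stands for one free buffer at the receiver. This is a simplified sketch, not the credit accounting defined in the specification; `CreditLink` and its methods are hypothetical names.

```python
from collections import deque


class CreditLink:
    """Toy model of one virtual lane with credit-based flow control."""

    def __init__(self, receiver_buffers: int):
        # The sender's view of how many receiver buffers are free.
        self.credits = receiver_buffers
        self.rx_buffer = deque()

    def send(self, packet) -> bool:
        """Transmit if a credit is available; otherwise the sender must wait."""
        if self.credits == 0:
            return False  # nothing is dropped, the packet is simply held back
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def receiver_consume(self):
        """The receiver drains one packet, which returns one credit."""
        packet = self.rx_buffer.popleft()
        self.credits += 1
        return packet


link = CreditLink(receiver_buffers=2)
sent = [link.send(p) for p in ("p0", "p1", "p2")]  # third send has no credit
link.receiver_consume()                            # frees a buffer
retry_ok = link.send("p2")                         # now succeeds
print(sent, retry_ok)
```

Because the sender stops before the receiver runs out of buffers, packets are never dropped for lack of buffer space in this model.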

7
  • Link layer: packet format
  • Local Route Header (LRH): 8 bytes. Used for local
    routing by switches within an IBA subnet
  • Global Route Header (GRH): 40 bytes. Used for
    routing between subnets
  • Base Transport Header (BTH): 12 bytes, for IBA
    transport
  • Reliable Datagram Extended Transport Header
    (RDETH): 4 bytes, only for reliable datagram
  • Datagram Extended Transport Header (DETH): 8
    bytes
  • RDMA Extended Transport Header (RETH): 16 bytes
  • Atomic, ACK, and Atomic ACK extended transport
    headers
  • Immediate Data extended transport header: 4
    bytes, optimized for small packets
  • Invalidate extended transport header
  • Invariant CRC and variant CRC
    • The invariant CRC covers the fields that do not
      change in transit; the variant CRC also covers
      the fields that can change.
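
The header sizes above make it easy to estimate per-packet header overhead. The sketch below sums them for a few header combinations. It assumes that the first packet of an RDMA operation on a reliable connection carries LRH, BTH, and RETH, and that an unreliable datagram SEND carries LRH, BTH, and DETH. CRC bytes are left out because the slide gives no sizes for them. `HEADER_BYTES` and `header_overhead` are illustrative names.

```python
# Header sizes in bytes, as listed on the slide.
HEADER_BYTES = {
    "LRH": 8,     # Local Route Header
    "GRH": 40,    # Global Route Header
    "BTH": 12,    # Base Transport Header
    "RDETH": 4,   # Reliable Datagram Extended Transport Header
    "DETH": 8,    # Datagram Extended Transport Header
    "RETH": 16,   # RDMA Extended Transport Header
    "ImmDt": 4,   # Immediate Data extended transport header
}


def header_overhead(headers) -> int:
    """Sum the sizes of the named headers (CRCs are not counted)."""
    return sum(HEADER_BYTES[h] for h in headers)


local_rdma = header_overhead(["LRH", "BTH", "RETH"])          # within one subnet
global_rdma = header_overhead(["LRH", "GRH", "BTH", "RETH"])  # across subnets
ud_send = header_overhead(["LRH", "BTH", "DETH"])             # unreliable datagram
print(local_rdma, global_rdma, ud_send)
```

The 40-byte GRH more than doubles the header cost of the local case, which is one reason it is only used when a packet has to cross subnets.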

8
  • Local Route Header
  • Switching based on the destination port address
    (LID)
  • Multipath switching by allocating multiple LIDs
    to one port

9
  • Local Route Header
  • Switching based on the destination port address
    (LID).
  • Forwarding table entry: (LID, outgoing port)
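
A destination-based forwarding table can be modelled as a map from destination LID to outgoing port. The sketch below also shows the multipath idea: one end port is given two LIDs, and the switch maps them to different output ports. The `Switch` class, LID values, and port numbers are made up for illustration.

```python
class Switch:
    """Toy switch that forwards on the destination LID only."""

    def __init__(self, name: str):
        self.name = name
        self.forwarding_table = {}  # destination LID -> outgoing port

    def set_route(self, dlid: int, out_port: int) -> None:
        """Install one (LID, outgoing port) entry."""
        self.forwarding_table[dlid] = out_port

    def forward(self, dlid: int) -> int:
        """Return the outgoing port for a packet addressed to `dlid`."""
        return self.forwarding_table[dlid]


sw = Switch("sw1")
# Hypothetical end port that owns LIDs 8 and 9; each LID selects a
# different path through the fabric.
sw.set_route(8, out_port=1)
sw.set_route(9, out_port=2)
print(sw.forward(8), sw.forward(9))
```

A sender picks a path simply by choosing which of the destination's LIDs to put in the Local Route Header.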

10
  • Local Route Header
  • Multipath switching by allocating multiple LIDs
    to one port, see the previous example.
  • GRH: uses the same address format as IPv6
    (16-byte addresses)

11
Subnet management
  • Discover the subnet topology and topology changes,
    compute the paths, assign LIDs, distribute the
    routes, and configure devices
  • Not well defined in the specification
  • Forwarding tables must be computed such that all
    devices in the network can be reached.
  • References
    • A. Bermudez, R. Casado, F. J. Quiles, T. M.
      Pinkston, J. Duato, "Evaluation of a Subnet
      Management Mechanism for InfiniBand Networks,"
      ICPP 2003.
    • A. Vishnu, A. R. Mamidala, H. Jin, D. K. Panda,
      "Performance Modeling of Subnet Management on
      Fat Tree InfiniBand Networks using OpenSM,"
      Workshop on System Management Tools on Large
      Scale Parallel Systems, held in conjunction
      with IPDPS 2005.

12
  • InfiniBand devices and entities related to subnet
    management
  • Devices: channel adapters (CA), host channel
    adapters (HCA), switches, routers
  • Subnet manager (SM): discovers, configures,
    activates, and manages the subnet
  • A subnet management agent (SMA) in every device
    generates and responds to control packets (subnet
    management packets, SMPs) and configures local
    components for subnet management
  • The SM exchanges control packets with the SMAs
    through the subnet management interface (SMI).

14
  • Subnet management packets (SMP)
  • 256 bytes of data
  • Use the unreliable datagram service on the
    management virtual lane (VL 15)
  • Two routing schemes
    • LID routed: switches use the lookup table for
      forwarding
      • Used after the subnet is set up, e.g. to check
        the status of an active port
    • Direct routed: the packet carries the output
      port for each intermediate hop
      • Used for subnet discovery, before the subnet
        is set up
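
The difference between the two routing schemes can be sketched on a small made-up topology. A direct routed packet names the output port at every hop, so it works before any forwarding table exists. A LID routed packet relies on tables that are already configured. The topology, LID 7, and function names are all hypothetical.

```python
# Hypothetical topology: node -> {output port: neighbour node}.
TOPOLOGY = {
    "smhost": {1: "sw1"},
    "sw1": {1: "smhost", 2: "sw2", 3: "hostA"},
    "sw2": {1: "sw1", 2: "hostB"},
    "hostA": {1: "sw1"},
    "hostB": {1: "sw2"},
}


def direct_route(start, port_path):
    """Follow the output port listed for each hop; no tables are needed."""
    node = start
    for port in port_path:
        node = TOPOLOGY[node][port]
    return node


def lid_route(start, dlid, tables, lid_owner):
    """Follow per-node tables (dlid -> output port) until the LID's owner."""
    node = start
    while node != lid_owner[dlid]:
        node = TOPOLOGY[node][tables[node][dlid]]
    return node


# Direct routed: usable during discovery, before tables are set up.
print(direct_route("smhost", [1, 2, 2]))

# LID routed: usable once LID 7 is assigned and tables are distributed.
tables = {"smhost": {7: 1}, "sw1": {7: 2}, "sw2": {7: 2}}
print(lid_route("smhost", 7, tables, lid_owner={7: "hostB"}))
```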

15
  • Subnet management packets (SMP)
  • Define the operation to be performed by the SM
    • Get: get information about a CA, switch, or port
    • Set: set an attribute of a port (e.g. LID)
    • GetResp: the response to a Get or Set
    • Trap: inform the SM about the state of a local
      node
    • An SMA keeps resending a Trap message until it
      receives a TrapRepress packet.
  • Topology information can be obtained by a sweep
    and by periodic Traps.

16
  • Subnet management phases
  • Topology discovery: sending direct routed SMPs to
    every port and processing the responses
  • Path computation: computing valid paths between
    each pair of end nodes
  • Path distribution: configuring the forwarding
    tables

17
  • Subnet discovery
  • The SM starts by sending a direct routed Get SMP
    to its local node. Upon receiving the response,
    the SM sends SMPs with increasing depth.
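
Discovery with increasing depth is essentially a breadth-first search in which every newly found node is remembered by the direct route that reaches it. A minimal sketch, assuming a made-up topology and treating a node's port map as the "response" to a Get SMP; `discover` is a hypothetical name, not an OpenSM function.

```python
from collections import deque


def discover(topology, sm_node):
    """Return {node: direct-route port path from the SM}, breadth first."""
    paths = {sm_node: []}  # the SM's local node is reached with an empty path
    frontier = deque([sm_node])
    while frontier:
        node = frontier.popleft()
        # In a real subnet this step would be a direct routed Get SMP sent
        # along paths[node]; here the reply is just the node's port map.
        for port, neighbour in sorted(topology[node].items()):
            if neighbour not in paths:
                paths[neighbour] = paths[node] + [port]  # one hop deeper
                frontier.append(neighbour)
    return paths


# Hypothetical topology: node -> {output port: neighbour node}.
topology = {
    "smhost": {1: "sw1"},
    "sw1": {1: "smhost", 2: "sw2", 3: "hostA"},
    "sw2": {1: "sw1", 2: "hostB"},
    "hostA": {1: "sw1"},
    "hostB": {1: "sw2"},
}
for node, path in discover(topology, "smhost").items():
    print(node, path)
```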

18
  • Path computation
  • Compute paths between all pairs of nodes
  • For irregular topologies
    • Up/Down routing does not work directly
    • It needs information about both the incoming
      interface and the destination, but InfiniBand
      forwards on the destination only
    • Potential solution
      • find all possible paths
      • remove all possible down links following up
        links in each node
      • find one output port for each destination
    • Other solutions: destination renaming
  • Fat tree topology
    • What the best achievable (optimal) routing is
      also remains unclear.
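
To make "one output port for each destination" concrete, the sketch below builds destination-based forwarding tables from plain shortest paths. This is not Up/Down routing and does nothing about deadlock; it only illustrates what a table computed from topology information looks like. It assumes bidirectional links and a connected, made-up topology, and `compute_tables` is a hypothetical name.

```python
from collections import deque


def compute_tables(topology, lids):
    """topology: node -> {port: neighbour}; lids: LID -> end node.

    Return node -> {LID: output port} along shortest paths.
    """
    tables = {node: {} for node in topology}
    for lid, dest in lids.items():
        # Breadth-first search outward from the destination gives hop counts.
        dist = {dest: 0}
        frontier = deque([dest])
        while frontier:
            node = frontier.popleft()
            for neighbour in topology[node].values():
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    frontier.append(neighbour)
        # Each node picks its lowest-numbered port that gets one hop closer.
        for node in topology:
            if node == dest or node not in dist:
                continue
            for port, neighbour in sorted(topology[node].items()):
                if dist.get(neighbour) == dist[node] - 1:
                    tables[node][lid] = port
                    break
    return tables


topology = {
    "sw1": {1: "sw2", 2: "hostA"},
    "sw2": {1: "sw1", 2: "hostB"},
    "hostA": {1: "sw1"},
    "hostB": {1: "sw2"},
}
tables = compute_tables(topology, lids={5: "hostA", 7: "hostB"})
print(tables["sw1"], tables["sw2"])
```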

19
  • Path distribution
  • Ordering issue: the network may be in an
    inconsistent state when partially updated, which
    may result in deadlock during this period.
    • Traditional solution: no data packets for a
      period of time
    • Alternative: deadlock-free reconfiguration
      schemes
  • How to do this correctly, effectively, and
    incrementally is still open.

20
  • Base transport header

21
  • Verbs
  • OS/users access the adapter through verbs
  • Communication mechanism: Queue Pair (QP)
    • Users can queue up a set of instructions that
      the hardware executes.
    • A pair of queues in each QP: one for send, one
      for receive.
    • Users can post send requests to the send queue
      and receive requests to the receive queue.
    • Three types of send operations: SEND,
      RDMA (WRITE, READ, ATOMIC), MEMORY-BINDING
    • One receive operation (matching SEND)

24
  • Queue Pair
  • The status of the result of an operation
    (send/receive) is stored in the completion queue.
  • Send/receive queues can bind to different
    completion queues.
  • Related system level verbs
    • Open QP, create completion queue, open HCA, open
      protection domain, register memory, allocate
      memory window, etc.
  • User level verbs
    • Post send/receive request, poll for completion.

25
  • To communicate
  • Make system calls to set up everything (open QP,
    bind QP to port, bind completion queues, connect
    local QP to remote QP, register memory, etc.).
  • Post send/receive requests.
  • Check completion.
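
The set up, post, and poll sequence can be mimicked with a toy queue pair model. This is a pure-Python illustration of the control flow, not the verbs API of any real library: `QueuePair`, `process`, and `poll` are hypothetical, `process` stands in for the hardware, and the receive buffer is assumed to be large enough for the message.

```python
from collections import deque


class QueuePair:
    """Toy QP: a send queue, a receive queue, and bound completion queues."""

    def __init__(self, send_cq, recv_cq):
        self.send_queue = deque()
        self.recv_queue = deque()
        self.send_cq = send_cq  # completion queue for send requests
        self.recv_cq = recv_cq  # completion queue for receive requests
        self.remote = None

    def connect(self, remote):
        """Connect the local QP to a remote QP (both directions)."""
        self.remote = remote
        remote.remote = self

    def post_recv(self, buffer: bytearray):
        self.recv_queue.append(buffer)

    def post_send(self, data: bytes):
        self.send_queue.append(data)

    def process(self):
        """Stand-in for the hardware: match queued SENDs with posted receives."""
        while self.send_queue and self.remote.recv_queue:
            data = self.send_queue.popleft()
            buffer = self.remote.recv_queue.popleft()
            buffer[:len(data)] = data
            self.send_cq.append(("send", len(data)))
            self.remote.recv_cq.append(("recv", len(data)))


def poll(cq):
    """Return the next completion entry, or None if the queue is empty."""
    return cq.popleft() if cq else None


# 1. Set up: completion queues, QPs, connection, receive buffer.
cq_a, cq_b = deque(), deque()
qp_a, qp_b = QueuePair(cq_a, cq_a), QueuePair(cq_b, cq_b)
qp_a.connect(qp_b)
buf = bytearray(16)
# 2. Post receive and send requests.
qp_b.post_recv(buf)
qp_a.post_send(b"hello")
qp_a.process()
# 3. Check completion on both sides.
send_done, recv_done = poll(cq_a), poll(cq_b)
print(send_done, recv_done, bytes(buf[:5]))
```

Note that the receive has to be posted before the matching SEND is processed, which reflects the point that buffer management is the user's responsibility.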

26
  • InfiniBand has an almost perfect software/network
    interface (Chien'94 paper)
  • The network subsystem realizes all user level
    functionality.
  • User level access to the network interface: a
    few machine instructions accomplish the
    transmission task without involving the OS.
  • The network supports in-order delivery and fault
    tolerance.
  • Buffer management is pushed out to the user.

27
  • SilverStorm 9024
  • 24 ports 4X (10 Gbps) or 8 ports 12X (30 Gbps)
  • Switch type: cut-through
  • Switch latency: < 140 ns
  • Switch bandwidth: 480 Gbps
  • Forwarding table size: 48K
  • VL support: 8 + 1 management
28
  • SilverStorm 9240
  • 24 expansion slots, each expansion module with
    12 ports 4X or 4 ports 12X (24 x 12 = 288, a
    288 by 288 switch)
  • Switch type: cut-through
  • Switch latency: < 140 ns to < 420 ns
  • Switch bandwidth: 5.76 Tbps
  • Forwarding table size: 48K
  • VL support: 8 + 1 management
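
The quoted switch bandwidths are consistent with counting both directions of every 4X port. That reading is an assumption rather than something the slides state; a quick check, with `aggregate_bandwidth_gbps` as an illustrative name:

```python
def aggregate_bandwidth_gbps(ports: int, port_rate_gbps: float,
                             directions: int = 2) -> float:
    """Total bandwidth, assuming each port is counted once per direction."""
    return ports * port_rate_gbps * directions


bw_9024 = aggregate_bandwidth_gbps(ports=24, port_rate_gbps=10)
ports_9240 = 24 * 12  # 24 expansion slots with 12 4X ports each
bw_9240 = aggregate_bandwidth_gbps(ports=ports_9240, port_rate_gbps=10)
print(bw_9024, "Gbps;", bw_9240 / 1000, "Tbps")
```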