Infiniband architecture - PowerPoint PPT Presentation

Slides: 18
Provided by: rache82
Learn more at: http://www.cs.fsu.edu

1
  • Infiniband architecture
  • Specification (Infiniband architecture
    specification release 1.2, Oct. 5, 2004)
    available from the Infiniband Trade Association
    (http://www.infinibandta.org)
  • Potential improvements

2
  • Infiniband architecture overview

3
  • Infiniband architecture overview
  • Components
  • Links
  • Channel adaptors
  • Switches
  • Routers
  • The specification allows Infiniband wide area
    network, but mostly adopted as a system/storage
    area network.
  • Topology
  • Irregular
  • Regular Fat tree
  • Link speed
  • 2.5 Gbps (1X), 10 Gbps (4X), and 30 Gbps (12X).
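
The link speeds above are per-lane signalling rates multiplied by the link width. A minimal sketch of that arithmetic follows. The 8b/10b encoding factor is an assumption added here (it is not on the slide) to show the usable data rate, and the name `link_rates` is purely illustrative.

```python
# Signalling rate per lane, taken from the slide (2.5 Gbps for a 1X link).
LANE_GBPS = 2.5
# Assumption (not on the slide): 8b/10b line encoding, i.e. 8 data bits
# are carried in every 10 signalled bits.
ENCODING_EFFICIENCY = 8 / 10


def link_rates(width):
    """Return (signalling rate, data rate) in Gbps for a link of the given width."""
    raw = LANE_GBPS * width
    return raw, raw * ENCODING_EFFICIENCY


for width in (1, 4, 12):
    raw, data = link_rates(width)
    print(f"{width}X: {raw:g} Gbps signalling, {data:g} Gbps data")
```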

4
  • Layers somewhat similar to TCP/IP
  • Physical layer
  • Link layer
  • Error detection (CRC checksum)
  • flow control (credit based)
  • switching, virtual lanes (VL),
  • forwarding table computed by subnet manager
  • Not adaptive
  • Network layer: routing across subnets.
  • Of no use in a cluster environment
  • Transport layer
  • Reliable/unreliable, connection/datagram
  • Verbs interface between adaptors and OS/Users
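
Credit-based flow control at the link layer can be illustrated with a toy model: the sender may transmit only while it holds credits, and the receiver hands a credit back each time it frees a buffer. This is a simplified sketch, assuming one credit per packet and a single lane; the class `CreditLink` and its methods are hypothetical names, not part of any Infiniband API.

```python
from collections import deque


class CreditLink:
    """One direction of a link; one credit stands for one free receive buffer."""

    def __init__(self, receiver_buffers):
        self.credits = receiver_buffers  # credits currently held by the sender
        self.rx_buffer = deque()         # packets waiting in the receiver

    def send(self, packet):
        """Transmit only if a credit is available; otherwise the sender stalls."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def drain(self):
        """Receiver consumes one packet and returns one credit to the sender."""
        packet = self.rx_buffer.popleft()
        self.credits += 1
        return packet


link = CreditLink(receiver_buffers=2)
print(link.send("p0"), link.send("p1"), link.send("p2"))  # third send has no credit
link.drain()                                               # receiver frees a buffer
print(link.send("p2"))                                     # now the send goes through
```

Because the sender never transmits without a credit, the receiver never has to drop a packet for lack of buffer space.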

5
  • Packet format
  • Local Route Header (LRH): 8 bytes. Used for local
    routing by switches within an IBA subnet
  • Global Route Header (GRH): 40 bytes. Used for
    routing between subnets
  • Base Transport Header (BTH): 12 bytes, for IBA
    transport
  • Reliable datagram extended transport header
    (RDETH): 4 bytes, just for reliable datagram
  • Datagram extended transport header (DETH): 8
    bytes
  • RDMA extended transport header (RETH): 16 bytes
  • Atomic, ACK, and Atomic ACK extended transport
    headers
  • Immediate data extended transport header: 4
    bytes, optimized for small packets.
  • Invalidate extended transport header
  • Invariant CRC and variant CRC
  • CRCs for the fields that do not change in transit
    and for the fields that do.
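
The header sizes listed above are enough to estimate per-packet overhead. The sketch below sums the headers for a given packet type. The CRC sizes (4-byte invariant CRC, 2-byte variant CRC) are an assumption added here, since the slide does not give them, and the function names are illustrative only.

```python
# Header sizes in bytes, as listed on the slide.
HEADER_BYTES = {
    "LRH": 8, "GRH": 40, "BTH": 12,
    "RDETH": 4, "DETH": 8, "RETH": 16, "ImmDt": 4,
}
# Assumption (not on the slide): sizes of the two CRC fields.
CRC_BYTES = {"ICRC": 4, "VCRC": 2}


def overhead(headers):
    """Total non-payload bytes for a packet carrying the given headers."""
    return sum(HEADER_BYTES[h] for h in headers) + sum(CRC_BYTES.values())


def efficiency(headers, payload_bytes):
    """Fraction of the bytes on the wire that are payload."""
    return payload_bytes / (payload_bytes + overhead(headers))


rdma_write = ["LRH", "BTH", "RETH"]   # RDMA write request within one subnet
ud_send = ["LRH", "BTH", "DETH"]      # datagram send within one subnet
print(overhead(rdma_write), overhead(ud_send))
print(round(efficiency(rdma_write, 2048), 3))
```

Within a single subnet the GRH is absent, which is why the two examples leave it out.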

6
  • Local Route Header
  • Switching based on the destination port address
    (LID)
  • Multipath switching by allocating multiple LIDs
    to one port
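
A small sketch of LID-based switching and of multipath through multiple LIDs: each switch holds a table from destination LID to output port, and giving one destination port two LIDs lets the tables steer those LIDs over different paths. The topology, LID values, and port names below are hypothetical.

```python
# Hypothetical fabric: switch S1 reaches host B through either S2 or S3.
# Host B's port is assigned two LIDs (8 and 9), and each is routed differently.
FORWARDING = {
    "S1": {8: "to_S2", 9: "to_S3"},
    "S2": {8: "to_B", 9: "to_B"},
    "S3": {8: "to_B", 9: "to_B"},
}
# Which node sits at the far end of each (switch, output port).
NEXT_HOP = {
    ("S1", "to_S2"): "S2", ("S1", "to_S3"): "S3",
    ("S2", "to_B"): "B", ("S3", "to_B"): "B",
}


def route(first_switch, dlid):
    """Follow the forwarding tables hop by hop until a non-switch node is reached."""
    path = [first_switch]
    node = first_switch
    while node in FORWARDING:
        out_port = FORWARDING[node][dlid]  # lookup on destination LID only
        node = NEXT_HOP[(node, out_port)]
        path.append(node)
    return path


print(route("S1", 8))
print(route("S1", 9))
```

The sender picks a path simply by choosing which of the destination's LIDs to put in the LRH; the switches themselves stay deterministic.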

7
  • The GRH has the same format as the IPv6 header
    (16-byte addresses)

8
  • Base transport header

9
  • Verbs
  • OS/Users access the adaptor through verbs
  • Communication mechanism: Queue Pair (QP)
  • Support the four types of services, including
    reliable connection service
  • Each connection takes one QP on each end.
  • Each QP has a send queue and a receive queue.
  • Users can post send requests to the send queue
    and receive requests to the receive queue.
  • Three types of send operations: SEND,
    RDMA (WRITE, READ, ATOMIC), and MEMORY-BINDING
  • One receive operation (matching SEND)

10
  • Queue Pair
  • The status of the result of an operation
    (send/receive) is stored in the completion queue.
  • Send/receive queues can bind to different
    completion queues.
  • Related system level verbs
  • Open QP, create completion queue, open HCA, open
    protection domain, register memory, allocate
    memory window, etc.
  • User level verbs
  • post send/receive request, poll for completion.
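
The queue pair model can be sketched as a toy simulation: work requests are posted to send and receive queues, and results appear in completion queues that the user polls. This is not the verbs API; every class and function name here is hypothetical, and `process_one_send` merely stands in for the work the adaptor does.

```python
from collections import deque


class CompletionQueue:
    def __init__(self):
        self.entries = deque()

    def poll(self):
        """Return the oldest completion, or None if the queue is empty."""
        return self.entries.popleft() if self.entries else None


class QueuePair:
    def __init__(self, send_cq, recv_cq):
        self.send_queue = deque()
        self.recv_queue = deque()
        self.send_cq = send_cq   # send and receive may use different CQs
        self.recv_cq = recv_cq
        self.peer = None

    def post_send(self, wr_id, data):
        self.send_queue.append((wr_id, data))

    def post_recv(self, wr_id):
        self.recv_queue.append(wr_id)


def connect(a, b):
    """Each connection takes one QP on each end."""
    a.peer, b.peer = b, a


def process_one_send(qp):
    """Stand-in for the adaptor: match one SEND with the peer's posted receive."""
    wr_id, data = qp.send_queue.popleft()
    recv_wr_id = qp.peer.recv_queue.popleft()   # assumes a receive was posted
    qp.send_cq.entries.append((wr_id, "send complete"))
    qp.peer.recv_cq.entries.append((recv_wr_id, data))


a = QueuePair(CompletionQueue(), CompletionQueue())
b = QueuePair(CompletionQueue(), CompletionQueue())
connect(a, b)
b.post_recv(wr_id=100)
a.post_send(wr_id=1, data=b"hello")
process_one_send(a)
print(a.send_cq.poll(), b.recv_cq.poll())
```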

11
  • To communicate
  • Make system calls to set up everything (open QP,
    bind QP to port, bind completion queues, connect
    local QP to remote QP, register memory, etc.).
  • Post send/receive requests.
  • Check completion.
  • What if a packet arrives before a receive request
    is posted?
  • Not specified in the standard
  • The right response should be a receiver not
    ready (RNR) error. The sender is back-pressured
    in this case.
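
The receiver-not-ready behavior argued for above can be sketched as follows: a SEND that finds no posted receive gets an RNR response, and the sender retries instead of the data being dropped. The retry limit and the `before_retry` hook are hypothetical additions for the demonstration, not something the slide specifies.

```python
from collections import deque


class Receiver:
    def __init__(self):
        self.recv_queue = deque()   # ids of posted receive requests
        self.delivered = []

    def on_send(self, payload):
        """Return "ACK" if a receive request was waiting, else "RNR"."""
        if not self.recv_queue:
            return "RNR"
        self.delivered.append((self.recv_queue.popleft(), payload))
        return "ACK"


def send_with_retry(receiver, payload, max_retries, before_retry=lambda attempt: None):
    """Resend on RNR up to max_retries times; return the number of RNRs seen."""
    for attempt in range(max_retries + 1):
        if receiver.on_send(payload) == "ACK":
            return attempt
        before_retry(attempt)   # hook so the demo can post a receive later
    raise RuntimeError("RNR retry limit exceeded")


rx = Receiver()


def post_late(attempt):
    # The receiving application posts its buffer only after the second RNR.
    if attempt == 1:
        rx.recv_queue.append("buf0")


rnr_count = send_with_retry(rx, b"hello", max_retries=3, before_retry=post_late)
print(rnr_count, rx.delivered)
```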

12
  • Infiniband has a perfect software interface
    (Chien'94 paper)
  • The network subsystem realizes all user level
    functionality.
  • User-level access to the network interface: a
    few machine instructions accomplish the
    transmission task without involving the OS.
  • Network supports in-order delivery and fault
    tolerance.
  • Buffer management is pushed out to the user.

13
  • SilverStorm 9024
  • 24 ports 4X (10 Gbps) or 8 ports 12X (30 Gbps)
  • switch type: cut-through
  • switch latency: < 140 ns
  • switch bandwidth: 480 Gbps
  • forwarding table size: 48K
  • VL support: 8 + 1 management

14
  • SilverStorm 9240
  • 24 expansion slots, each expansion module 12-port
    4X or 4-port 12X (24 x 12 = 288, a 288-by-288
    switch)
  • switch type: cut-through
  • switch latency: < 140 ns to < 420 ns
  • switch bandwidth: 5.76 Tbps
  • forwarding table size: 48K
  • VL support: 8 + 1 management
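
The quoted switch bandwidths for both models are consistent with port count times port speed, counted in both directions. The sketch below checks that arithmetic; the full-duplex factor of two is an assumption about how the figures were derived, and `aggregate_gbps` is an illustrative name.

```python
def aggregate_gbps(ports, port_gbps, full_duplex=True):
    """Aggregate switch bandwidth; full duplex counts both directions of each port."""
    return ports * port_gbps * (2 if full_duplex else 1)


# SilverStorm 9024: 24 ports at 4X, or 8 ports at 12X.
print(aggregate_gbps(24, 10), aggregate_gbps(8, 30))

# SilverStorm 9240: 24 slots x 12 ports at 4X = 288 ports.
ports_9240 = 24 * 12
print(ports_9240, aggregate_gbps(ports_9240, 10) / 1000, "Tbps")
```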

15
  • Potential improvements on Infiniband using
    compiled communication
  • Improving the internal Infiniband fabric
  • Offline routing for static pattern (static SM for
    a reduced traffic pattern) can be beneficial for
    irregular networks.
  • Simplify the layer architecture by having a
    direct link model (for known patterns). The
    header can be simplified, though this may not
    matter much (Infiniband layers are thin).
  • Simplify the protection mechanism.
  • Circuit switch type Infiniband.
  • Reliable communication protocol is still needed.
  • Potential benefits can be evaluated by simulation.

16
  • Improving the messaging software (software to
    hardware interface): no chance.
  • Improving the MPI implementation over Infiniband:
    similar to our current work on Ethernet
  • Message scheduling for collective/point-to-point
    communications based on the network topology.
  • Exploring NIC features (buffers in NIC,
    multicast)
  • Reducing the number of instructions in a library
    routine makes sense. Compiled communication can
    be used to optimize the MPI library.
  • Compiled communication can help improve the
    library implementation (e.g. reducing the number
    of message copies, early request posting, using
    RDMA, etc.).

17
  • One particular project
  • Design algorithms for Infiniband subnet manager
  • Improving routing performance for Infiniband
    subnet manager (SM).
  • Objective: minimize the maximum channel load for
    a given traffic pattern
  • Optimize according to a given pattern: the
    traffic pattern in an application is usually not
    all-to-all
  • Default routing used in IBA SM
  • For a sparse traffic pattern, the maximum channel
    load can usually be minimized using the minimum
    interference principle.
  • Need to extend minimum interference routing for
    load-balanced, deadlock-free routing.
  • The best way to realize an IBA SM is still not
    clear at this time; we can probably do
    something here.
  • Irregular network or Fat tree network
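
The objective above, minimizing the maximum channel load for a given traffic pattern, can be illustrated with a small greedy sketch. For each source/destination pair it picks, among candidate paths, the one that overlaps least with channels already in use. This is only one simple reading of the minimum interference idea, not the project's algorithm, and it ignores deadlock freedom. The topology and all names are hypothetical.

```python
from collections import Counter


def max_channel_load(routes):
    """routes: iterable of paths, each a list of directed channels (u, v)."""
    load = Counter(ch for path in routes for ch in path)
    return max(load.values(), default=0)


def greedy_min_interference(candidates):
    """candidates: {(src, dst): [path, ...]}, every path non-empty.

    For each pair, choose the path whose busiest channel is least loaded
    so far; ties go to the shorter path, then to the earlier candidate.
    """
    load = Counter()
    chosen = {}
    for pair, paths in candidates.items():
        best = min(paths, key=lambda p: (max(load[ch] for ch in p), len(p)))
        chosen[pair] = best
        load.update(best)
    return chosen


def path(*nodes):
    """Turn a node sequence into its list of directed channels."""
    return list(zip(nodes, nodes[1:]))


# Hypothetical two-level tree: hosts A, B under S1; C, D under S2; spines X, Y.
candidates = {
    ("A", "C"): [path("A", "S1", "X", "S2", "C"), path("A", "S1", "Y", "S2", "C")],
    ("B", "D"): [path("B", "S1", "X", "S2", "D"), path("B", "S1", "Y", "S2", "D")],
}
chosen = greedy_min_interference(candidates)
print(max_channel_load(chosen.values()))

# Routing both flows through spine X instead doubles the load on S1->X.
naive = [paths[0] for paths in candidates.values()]
print(max_channel_load(naive))
```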