1
Interconnection network: network interface and a case study
2
Network interface design issue
  • The networking requirements from the user's perspective
  • In-order message delivery
  • Reliable delivery
  • Error control
  • Flow control
  • Deadlock freedom
  • Typical network hardware features
  • Arbitrary delivery order (adaptive/multipath
    routing)
  • Finite buffering
  • Limited fault handling
  • How and where should we bridge the gap?
  • Network hardware? Network systems? Or a
    hardware/systems/software approach?

3
The Internet approach
  • How does the Internet realize these functions?
  • No deadlock issue
  • Reliability, flow control, and in-order delivery
    are handled at the TCP layer.
  • The network layer (IP) provides best-effort
    service.
  • IP is also implemented in software.
  • Drawbacks
  • Too many layers of software
  • Users need to go through the OS to access the
    communication hardware (system calls can cause
    context switching).
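  As a rough illustration of the points above, the sketch below sends one
  message over a TCP socket; the host and port are hypothetical. Every
  sendall() is a system call, so the data crosses into the OS kernel, where
  TCP provides the reliability, ordering, and flow control mentioned earlier.

```python
import socket

# Hypothetical peer address, used only for illustration.
HOST, PORT = "127.0.0.1", 5000

# create_connection() and sendall() are system calls into the OS; the
# kernel's TCP layer handles reliability, in-order delivery, and flow
# control before IP forwards the packets best-effort.
with socket.create_connection((HOST, PORT)) as sock:
    sock.sendall(b"hello over TCP")
```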

4
Approach in HPC networks
  • Where should these functions be realized?
  • High performance networking
  • Most functionality below the network layer is
    implemented in hardware (or nearly so)
  • The hardware provides the APIs for network
    transactions
  • If there is a mismatch between what the network
    provides and what users want, a software
    messaging layer is created to bridge the gap.

5
Messaging Layer
  • Bridges the gap between the hardware functionality
    and the user communication requirements
  • Typical network hardware features
  • Arbitrary delivery order (adaptive/multipath
    routing)
  • Finite buffering
  • Limited fault handling
  • Typical user communication requirements
  • In-order delivery
  • End-to-end flow control
  • Reliable transmission
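  A minimal sketch of one job such a messaging layer performs: restoring
  in-order delivery on top of hardware that may reorder packets
  (adaptive/multipath routing). The sequence-number scheme and the
  ReorderBuffer name are illustrative assumptions, not the interface of any
  particular machine's messaging layer.

```python
class ReorderBuffer:
    """Illustrative receive-side reordering for a messaging layer.

    The network may deliver packets out of order; the sender tags each
    packet with a sequence number, and the receiver releases packets to
    the user strictly in order.
    """

    def __init__(self):
        self.next_seq = 0   # next sequence number the user expects
        self.pending = {}   # out-of-order packets, keyed by sequence number

    def deliver(self, seq, payload):
        """Called for each arriving packet.

        Returns the payloads that can now be handed to the user in order
        (possibly an empty list if a gap remains).
        """
        self.pending[seq] = payload
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


# Example: packets 0..2 arrive as 1, 0, 2 (adaptive routing reordered them).
buf = ReorderBuffer()
assert buf.deliver(1, "B") == []          # gap: still waiting for 0
assert buf.deliver(0, "A") == ["A", "B"]  # gap filled, release in order
assert buf.deliver(2, "C") == ["C"]
```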

6
Messaging Layer
7
Communication cost
  • Communication cost = hardware cost + software
    cost (messaging layer cost)
  • Hardware message time = msize / bandwidth
  • Software time
  • Buffer management
  • End-to-end flow control
  • Running protocols
  • Which one is dominating?
  • Depends on how much the software has to do.
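  A back-of-the-envelope version of the cost model above. The parameter
  values in the example are illustrative assumptions, not measured numbers.

```python
def communication_cost(msg_bytes, bandwidth_bytes_per_s, software_overhead_s):
    """Total cost = hardware transfer time + software (messaging layer) time."""
    hardware_time = msg_bytes / bandwidth_bytes_per_s
    return hardware_time + software_overhead_s


# Illustrative numbers: 1 KB message, 1.2 GB/s link, 5 microseconds of
# software overhead (buffer management, flow control, protocol processing).
cost = communication_cost(1024, 1.2e9, 5e-6)
print(f"total {cost * 1e6:.2f} us")  # the software term dominates for small messages
```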

8
Network software/hardware interaction -- a case
study
  • A case study of communication performance issues
    on the CM-5
  • V. Karamcheti and A. A. Chien, "Software Overhead
    in Messaging Layers: Where Does the Time Go?"
    ACM ASPLOS-VI, 1994.

9
What do we see in the study?
  • The mismatch between the user requirements and
    the network functionality can introduce significant
    software overhead (50-70%).
  • Implications?
  • Should we focus on hardware, software, or
    software/hardware co-design?
  • Improving routing performance may increase the
    software cost
  • Adaptive routing introduces out-of-order packets
  • Exposing low-level network features directly to
    applications is problematic.

10
Summary from the study
  • Designing a communication system requires a
    holistic understanding of both hardware and software
  • Focusing on network hardware may not be
    sufficient. Software overhead can be much larger
    than routing time.
  • It would be ideal for the network to directly
    provide high level services.
  • Newer generations of interconnect hardware try
    to achieve this.

11
Case study
  • IBM Blue Gene/L system
  • InfiniBand

12
Interconnect family share for the 06/2011 Top 500 supercomputers
Interconnect Family    Count   Share (%)   Rmax Sum (GF)   Rpeak Sum (GF)   Processor Sum
Myrinet                    4        0.80          384451           524412           55152
Quadrics                   1        0.20           52840            63795            9968
Gigabit Ethernet         232       46.40        11796979         22042181         2098562
InfiniBand               206       41.20        22980393         32759581         2411516
Mixed                      1        0.20           66567            82944           13824
NUMAlink                   2        0.40          107961           121241           18944
SP Switch                  1        0.20           75760            92781           12208
Proprietary               29        5.80         9841862         13901082         1886982
Fat Tree                   1        0.20          122400           131072            1280
Custom                    23        4.60        13500813         15460859         1271488
Totals                   500      100.00     58930025.59      85179949.00         7779924
13
Overview of the IBM Blue Gene/L System
Architecture
  • Design objectives
  • Hardware overview
  • System architecture
  • Node architecture
  • Interconnect architecture

14
Highlights
  • A 64K-node highly integrated supercomputer based
    on system-on-a-chip technology
  • Two ASICs
  • Blue Gene/L compute (BLC), Blue Gene/L Link (BLL)
  • Distributed memory, massively parallel processing
    (MPP) architecture.
  • Uses the message-passing programming model (MPI).
  • 360 Tflops peak performance
  • Optimized for cost/performance

15
Design objectives
  • Objective 1: a 360-Tflops supercomputer
  • Earth Simulator (Japan, fastest supercomputer
    from 2002 to 2004): 35.86 Tflops
  • Objective 2: power efficiency
  • Performance/rack = performance/watt × watt/rack
  • Watts/rack is roughly constant at around 20 kW
  • Performance/watt determines performance/rack

16
  • Power efficiency
  • 360 Tflops would require roughly 20 megawatts
    with conventional processors
  • Need a low-power processor design (2-10 times
    better power efficiency)
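  A quick sanity check of the performance/rack relation, using only the
  figures quoted on these slides (360 Tflops, roughly 20 MW, about 20 kW per
  rack); the derived rack counts are illustrative, not the actual
  Blue Gene/L configuration.

```python
# Back-of-the-envelope check of the power-efficiency argument.
target_flops = 360e12   # 360 Tflops system target
conv_power_w = 20e6     # ~20 MW if built from conventional processors
rack_power_w = 20e3     # watts per rack, roughly constant (~20 kW)

# Performance/rack = performance/watt * watt/rack
conv_flops_per_watt = target_flops / conv_power_w          # 18 Mflops/W
conv_flops_per_rack = conv_flops_per_watt * rack_power_w   # 0.36 Tflops/rack

print(f"conventional: {conv_flops_per_watt / 1e6:.0f} Mflops/W, "
      f"{conv_flops_per_rack / 1e12:.2f} Tflops/rack")

# A 2-10x improvement in performance/watt raises performance/rack by the
# same factor, shrinking the rack count needed to reach 360 Tflops.
for factor in (2, 10):
    racks = target_flops / (conv_flops_per_rack * factor)
    print(f"{factor}x better efficiency -> {racks:.0f} racks")
```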

17
Design objectives (continued)
  • Objective 3: extreme scalability
  • Optimized for cost/performance → use low-power,
    less powerful processors → need a lot of
    processors
  • Up to 65536 processors.
  • Interconnect scalability
  • Reliability, availability, and serviceability
  • Application scalability

18
Blue Gene/L system components
19
Blue Gene/L Compute ASIC
  • Two PowerPC 440 cores with floating-point
    enhancements
  • 700 MHz
  • Everything you would expect from a typical
    superscalar processor
  • Pipelined microarchitecture with dual instruction
    fetch, decode, and out-of-order issue, dispatch,
    execution, and completion, etc.
  • 1 W each through extensive power management

20
Blue Gene/L Compute ASIC
21
Memory system on a BGL node
  • BG/L only supports the distributed-memory
    paradigm.
  • No need for efficient support of cache coherence
    on each node.
  • Coherence is enforced by software if needed.
  • The two cores operate in one of two modes
  • Communication coprocessor mode
  • Coherence is needed; it is managed in system-level
    libraries
  • Virtual node mode
  • Memory is physically partitioned (not shared).

22
Blue Gene/L networks
  • Five networks.
  • 100 Mbps Ethernet control network for
    diagnostics, debugging, and other management
    tasks
  • 1000 Mbps Ethernet for I/O
  • Three high-bandwidth, low-latency networks for
    data transmission and synchronization
  • 3-D torus network for point-to-point
    communication
  • Collective network for global operations
  • Barrier network
  • All network logic is integrated in the BG/L node
    ASIC
  • Memory mapped interfaces from user space

23
3-D torus network
  • Supports point-to-point communication
  • Link bandwidth 1.4 Gb/s, 6 bidirectional links per
    node (1.2 GB/s).
  • 64x32x32 torus: diameter 32+16+16 = 64 hops, worst
    case hardware latency 6.4 us.
  • Cut-through routing
  • Adaptive routing
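  The diameter figure above can be reproduced with a small helper: in a
  torus, the worst-case hop count is half of each dimension, summed. The
  per-hop latency is inferred from the slide's numbers (6.4 us over 64 hops,
  i.e. about 100 ns per hop) rather than quoted directly.

```python
def torus_diameter(dims):
    """Max hop count between two nodes in a torus: sum of half of each dimension."""
    return sum(d // 2 for d in dims)


dims = (64, 32, 32)
diameter = torus_diameter(dims)   # 32 + 16 + 16 = 64 hops
per_hop_s = 6.4e-6 / diameter     # ~100 ns/hop, inferred from the slide
print(f"diameter = {diameter} hops, "
      f"worst-case latency ~ {diameter * per_hop_s * 1e6:.1f} us")
```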

24
Collective network
  • Binary tree topology, static routing
  • Link bandwidth 2.8 Gb/s
  • Maximum hardware latency 5 us
  • With arithmetic and logic hardware, the network
    can perform integer operations on the data
  • Efficient support for reduce, scan, global sum,
    and broadcast operations
  • Floating-point operations can be done with two
    passes.
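  For intuition about the combining tree, the sketch below performs an
  integer sum pairwise, level by level, the way a binary combining tree
  would; on Blue Gene/L the combining is done by the network hardware
  itself, so this software version is only an analogy.

```python
def tree_reduce_sum(values):
    """Combine values pairwise, level by level, as a binary combining tree would."""
    level = list(values)
    while len(level) > 1:
        # Combine adjacent pairs; an odd element passes through unchanged.
        level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]


assert tree_reduce_sum(range(8)) == sum(range(8))  # 28
# With 64K leaves the tree is about 16 levels deep, which is why the
# hardware latency stays in the microsecond range.
```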

25
Barrier network
  • Hardware support for global synchronization.
  • 1.5 us for a barrier across 64K nodes.

26
IBM Blue Gene/L summary
  • Optimized for cost/performance
  • Limits the range of suitable applications
  • Uses a low-power design
  • Lower frequency, system-on-a-chip
  • Great performance-per-watt metric
  • Scalability support
  • Hardware support for global communication and
    barriers
  • Low-latency, high-bandwidth support