Title: Interconnection network: network interface and a case study
1. Interconnection network: network interface and a case study
2. Network interface design issues
- The networking requirements from the user's perspective
- In-order message delivery
- Reliable delivery
- Error control
- Flow control
- Deadlock free
- Typical network hardware features
- Arbitrary delivery order (adaptive/multipath routing)
- Finite buffering
- Limited fault handling
- How and where should we bridge the gap?
- Network hardware? Network systems? Or a
hardware/systems/software approach?
3. The Internet approach
- How does the Internet realize these functions?
- No deadlock issue
- Reliability, flow control, and in-order delivery are done at the TCP layer
- The network layer (IP) provides best-effort service
- IP is implemented in software as well
- Drawbacks
- Too many layers of software
- Users need to go through the OS to access the communication hardware (system calls can cause context switching); see the sketch below
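To make the system-call path concrete, here is a minimal sketch of sending one message over TCP from user space; the address and port are placeholders. Each of socket(), connect(), and send() is a system call, and TCP/IP then runs in kernel software before the data reaches the network hardware.

    /* Minimal sketch: one TCP message sent from user space.
       socket(), connect(), and send() are all system calls, so each message
       crosses the user/kernel boundary; TCP and IP then run in kernel software.
       The address and port below are placeholders. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);            /* system call */
        struct sockaddr_in peer = {0};
        peer.sin_family = AF_INET;
        peer.sin_port = htons(5000);                          /* placeholder port */
        inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);      /* placeholder address */
        connect(fd, (struct sockaddr *)&peer, sizeof peer);   /* system call */
        const char msg[] = "hello";
        send(fd, msg, sizeof msg, 0);                         /* system call per message */
        close(fd);
        return 0;
    }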
4. Approach in HPC networks
- Where should these functions be realized?
- High performance networking
- Most functionality below the network layer is done in hardware (or almost in hardware)
- This provides the APIs for network transactions
- If there is a mismatch between what the network provides and what users want, a software messaging layer is created to bridge the gap
5. Messaging Layer
- Bridges the gap between the hardware functionality and the user communication requirements (see the sketch below)
- Typical network hardware features
- Arbitrary delivery order (adaptive/multipath routing)
- Finite buffering
- Limited fault handling
- Typical user communication requirement
- In-order delivery
- End-to-end flow control
- Reliable transmission
6. Messaging Layer
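As one concrete illustration, below is a minimal sketch (not the implementation of any particular system) of how a messaging layer can restore in-order delivery on top of hardware that delivers packets out of order, e.g. under adaptive routing. The sequence-number scheme, packet format, and window size are illustrative assumptions.

    /* Sketch: re-sequencing out-of-order packets in a software messaging layer.
       Each packet carries a sequence number; arrivals ahead of the expected
       number are buffered, and delivery to the user proceeds strictly in order.
       WINDOW and the packet format are illustrative assumptions. */
    #include <stdio.h>

    #define WINDOW 8                       /* assumed reorder-buffer size */

    typedef struct { unsigned seq; char payload[32]; } packet_t;

    static packet_t buffer[WINDOW];
    static int      present[WINDOW];
    static unsigned expected = 0;          /* next sequence number to deliver */

    static void deliver(const packet_t *p) {
        printf("delivered seq %u: %s\n", p->seq, p->payload);
    }

    /* Called for every packet the hardware hands up, in arbitrary order. */
    void on_packet_arrival(const packet_t *p) {
        buffer[p->seq % WINDOW] = *p;      /* stash the packet (finite buffering) */
        present[p->seq % WINDOW] = 1;
        while (present[expected % WINDOW] &&
               buffer[expected % WINDOW].seq == expected) {
            deliver(&buffer[expected % WINDOW]);
            present[expected % WINDOW] = 0;
            expected++;
        }
    }

    int main(void) {                       /* arrival order 1, 0, 2 -> delivery 0, 1, 2 */
        packet_t a = {1, "second"}, b = {0, "first"}, c = {2, "third"};
        on_packet_arrival(&a);
        on_packet_arrival(&b);
        on_packet_arrival(&c);
        return 0;
    }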
7. Communication cost
- Communication cost = hardware cost + software cost (messaging layer cost)
- Hardware message time = msize / bandwidth (see the worked example below)
- Software time
- Buffer management
- End-to-end flow control
- Running protocols
- Which one is dominating?
- Depends on how much the software has to do.
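A rough worked example of this cost model (the overhead and bandwidth figures below are illustrative assumptions, not measurements): total per-message time is the fixed software cost plus msize/bandwidth, and for short messages the software term dominates.

    /* Illustrative cost model: t_total = t_software + msize / bandwidth.
       The 10 us overhead and 1.2 GB/s bandwidth are assumed example values. */
    #include <stdio.h>

    int main(void) {
        double sw_overhead_us = 10.0;          /* assumed messaging-layer cost per message */
        double bandwidth_bpus = 1200.0;        /* assumed 1.2 GB/s = 1200 bytes per us */
        double sizes[] = {64, 1024, 65536};    /* message sizes in bytes */

        for (int i = 0; i < 3; i++) {
            double hw = sizes[i] / bandwidth_bpus;
            printf("msize=%6.0f B  hw=%8.2f us  sw=%5.1f us  total=%8.2f us\n",
                   sizes[i], hw, sw_overhead_us, hw + sw_overhead_us);
        }
        return 0;
    }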
8. Network software/hardware interaction -- a case study
- A case study on the communication performance issues on the CM-5
- V. Karamcheti and A. A. Chien, "Software Overhead in Messaging Layers: Where Does the Time Go?", ACM ASPLOS-VI, 1994
9. What do we see in the study?
- The mismatch between the user requirements and the network functionality can introduce significant software overheads (50-70%)
- Implication?
- Should we focus on hardware, software, or software/hardware co-design?
- Improving routing performance may increase software cost
- Adaptive routing introduces out-of-order packets
- Providing low-level network features to applications is problematic
10. Summary from the study
- In the design of the communication system, a holistic understanding must be achieved
- Focusing on network hardware may not be sufficient; software overhead can be much larger than routing time
- It would be ideal for the network to directly provide high-level services
- Newer generations of interconnect hardware try to achieve this
11. Case study
- IBM Bluegene/L system
- InfiniBand
12. Interconnect family share for the 06/2011 Top 500 supercomputers
Interconnect Family    Count    Share (%)    Rmax Sum (GF)    Rpeak Sum (GF)    Processor Sum
Myrinet 4 0.80 384451 524412 55152
Quadrics 1 0.20 52840 63795 9968
Gigabit Ethernet 232 46.40 11796979 22042181 2098562
Infiniband 206 41.20 22980393 32759581 2411516
Mixed 1 0.20 66567 82944 13824
NUMAlink 2 0.40 107961 121241 18944
SP Switch 1 0.20 75760 92781 12208
Proprietary 29 5.80 9841862 13901082 1886982
Fat Tree 1 0.20 122400 131072 1280
Custom 23 4.60 13500813 15460859 1271488
Totals 500 100 58930025.59 85179949.00 7779924
13. Overview of the IBM Blue Gene/L System Architecture
- Design objectives
- Hardware overview
- System architecture
- Node architecture
- Interconnect architecture
14. Highlights
- A 64K-node highly integrated supercomputer based on system-on-a-chip technology
- Two ASICs
- Blue Gene/L Compute (BLC), Blue Gene/L Link (BLL)
- Distributed-memory, massively parallel processing (MPP) architecture
- Uses the message passing programming model (MPI); see the example after this list
- 360 Tflops peak performance
- Optimized for cost/performance
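A minimal MPI point-to-point example of this programming model (standard MPI, nothing BG/L-specific; run with at least two ranks):

    /* Minimal MPI point-to-point exchange on a distributed-memory machine.
       Run with at least two ranks, e.g. "mpirun -np 2 ./a.out". */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);    /* rank 0 -> rank 1 */
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }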
15. Design objectives
- Objective 1: a 360-Tflops supercomputer
- Earth Simulator (Japan, fastest supercomputer from 2002 to 2004): 35.86 Tflops
- Objective 2: power efficiency
- Performance/rack = performance/watt x watt/rack
- Watt/rack is roughly constant at around 20 kW
- Performance/watt therefore determines performance/rack
16. Design objectives (continued)
- Power efficiency
- 360 Tflops would require more than 20 megawatts with conventional processors
- Need low-power processor design (2-10 times better power efficiency); see the worked numbers below
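A back-of-the-envelope check of this argument: the 20 kW/rack figure and the 360-Tflops target come from the slides, while the 64-rack machine size is an assumption used only for illustration.

    /* Back-of-the-envelope: performance/rack = performance/watt * watt/rack.
       20 kW/rack and 360 Tflops are from the slides; 64 racks is an assumption. */
    #include <stdio.h>

    int main(void) {
        double target_tflops  = 360.0;
        double watts_per_rack = 20e3;          /* ~20 kW per rack */
        double racks          = 64.0;          /* assumed machine size */

        double tflops_per_rack = target_tflops / racks;
        double gflops_per_watt = tflops_per_rack * 1e3 / watts_per_rack;

        printf("needed: %.2f Tflops/rack, i.e. about %.2f Gflops/watt\n",
               tflops_per_rack, gflops_per_watt);
        return 0;
    }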
17. Design objectives (continued)
- Objective 3: extreme scalability
- Optimized for cost/performance -> use low-power, less powerful processors -> need a lot of processors
- Up to 65,536 processors
- Interconnect scalability
- Reliability, availability, and serviceability
- Application scalability
18. Blue Gene/L system components
19. Blue Gene/L Compute ASIC
- 2 PowerPC 440 cores with floating-point enhancements
- 700 MHz
- Everything of a typical superscalar processor
- Pipelined microarchitecture with dual instruction fetch, decode, and out-of-order issue, out-of-order dispatch, out-of-order execution, and out-of-order completion, etc.
- 1 W each through extensive power management
20. Blue Gene/L Compute ASIC
21. Memory system on a BG/L node
- BG/L only supports the distributed-memory paradigm
- No need for efficient support for cache coherence on each node
- Coherence enforced by software if needed
- Two cores operate in two modes
- Communication coprocessor mode
- Need coherence, managed in system level libraries
- Virtual node mode
- Memory is physically partitioned (not shared)
22. Blue Gene/L networks
- Five networks.
- 100 Mbps Ethernet control network for diagnostics, debugging, and some other things
- 1000 Mbps Ethernet for I/O
- Three high-bandwidth, low-latency networks for data transmission and synchronization
- 3-D torus network for point-to-point communication
- Collective network for global operations
- Barrier network
- All network logic is integrated in the BG/L node ASIC
- Memory-mapped interfaces from user space (see the sketch after this list)
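A rough sketch of what a memory-mapped, user-space interface means in practice: the injection FIFO is mapped into the process's address space, so handing a packet to the network is a store rather than a system call. The device path and FIFO layout below are purely hypothetical, not the actual BG/L interface.

    /* Hypothetical sketch of a user-space, memory-mapped network interface.
       "/dev/torus0" and the 4 KB FIFO window are made-up placeholders; the point
       is that after mmap(), injecting a packet needs no further system calls. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/dev/torus0", O_RDWR);                 /* hypothetical device */
        if (fd < 0) return 1;
        volatile uint8_t *fifo = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, 0);     /* map injection FIFO */
        if (fifo == MAP_FAILED) return 1;
        const char packet[] = "payload";
        memcpy((void *)fifo, packet, sizeof packet);          /* plain stores, no syscall */
        munmap((void *)fifo, 4096);
        close(fd);
        return 0;
    }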
23. 3-D torus network
- Supports point-to-point communication
- Link bandwidth 1.4 Gb/s, 6 bidirectional links per node (1.2 GB/s)
- 64x32x32 torus: diameter 32+16+16 = 64 hops, worst-case hardware latency 6.4 us (see the calculation below)
- Cut-through routing
- Adaptive routing
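The worst-case hop count follows directly from the torus dimensions: with wraparound links a packet never needs to cross more than half of each dimension, so for 64x32x32 the diameter is 32+16+16 = 64 hops. The per-hop latency used below is an assumption chosen to reproduce the quoted 6.4 us worst case.

    /* Torus diameter: with wraparound links the longest shortest path is dim/2
       hops per dimension, so 64x32x32 gives 32+16+16 = 64 hops. The 100 ns
       per-hop latency is an assumption consistent with the quoted 6.4 us. */
    #include <stdio.h>

    int main(void) {
        int dims[3] = {64, 32, 32};
        int diameter = 0;
        for (int i = 0; i < 3; i++)
            diameter += dims[i] / 2;               /* half of each dimension */
        double per_hop_us = 0.1;                   /* assumed per-hop latency */
        printf("diameter = %d hops, worst-case latency ~ %.1f us\n",
               diameter, diameter * per_hop_us);
        return 0;
    }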
24. Collective network
- Binary tree topology, static routing
- Link bandwidth 2.8 Gb/s
- Maximum hardware latency 5 us
- With arithmetic and logical hardware, can perform integer operations on the data
- Efficient support for reduce, scan, global sum, and broadcast operations (see the MPI example below)
- Floating-point operations can be done with 2 passes
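Global sums are exactly what MPI collectives express; a minimal integer MPI_Allreduce is shown below (standard MPI; whether it is actually routed over the collective network is up to the system's MPI library).

    /* Global integer sum with MPI_Allreduce -- the kind of collective a
       hardware combining tree can accelerate. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, local, global;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        local = rank + 1;                          /* each node contributes one value */
        MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        if (rank == 0) printf("global sum = %d\n", global);
        MPI_Finalize();
        return 0;
    }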
25. Barrier network
- Hardware support for global synchronization.
- 1.5 us for a barrier across 64K nodes (see the timing sketch below)
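A barrier is a single MPI call in user code; the sketch below times it (the iteration count is arbitrary, and whether MPI_Barrier uses the dedicated barrier network depends on the system's MPI library).

    /* Timing global barriers -- the operation a dedicated barrier network
       accelerates. The iteration count is arbitrary. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        double t0 = MPI_Wtime();
        for (int i = 0; i < 1000; i++)
            MPI_Barrier(MPI_COMM_WORLD);
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("average barrier time: %.2f us\n", (t1 - t0) / 1000.0 * 1e6);
        MPI_Finalize();
        return 0;
    }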
26. IBM Blue Gene/L summary
- Optimize cost/performance
- At the cost of limiting applications
- Use low power design
- Lower frequency, system-on-a-chip
- Great performance per watt metric
- Scalability support
- Hardware support for global communication and barriers
- Low-latency, high-bandwidth support