Title: Blue Gene/L system architecture
1. Blue Gene/L system architecture
2. Overview of the IBM Blue Gene/L System Architecture
- Design objectives
- Design approach
- Hardware overview
- System architecture
- Node architecture
- Interconnect architecture
3. Highlights
- A 64K-node highly integrated supercomputer based on system-on-a-chip technology
- Two ASICs: Blue Gene/L Compute (BLC) and Blue Gene/L Link (BLL)
- Distributed memory, massively parallel processing (MPP) architecture
- Uses the message-passing programming model (MPI); a minimal sketch follows this list
- 360 Tflops peak performance
- Optimized for cost/performance
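A minimal sketch of the distributed-memory, message-passing model mentioned above, written against the standard MPI C API (generic MPI, not BG/L-specific code). It passes one integer between two ranks and needs at least two processes (e.g. mpirun -np 2 ./a.out).

/* Minimal sketch of the message-passing model: each rank owns its private
 * memory and data moves only via explicit messages. Generic MPI C code,
 * not BG/L-specific; compile with an MPI wrapper such as mpicc. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of ranks */

    if (size < 2) {
        if (rank == 0) printf("run with at least 2 ranks\n");
    } else if (rank == 0) {
        token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* send to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                           /* receive from rank 0 */
        printf("rank 1 received %d from rank 0\n", token);
    }

    MPI_Finalize();
    return 0;
}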
4. Design objectives
- Objective 1: a 360-Tflops supercomputer
- For comparison: the Earth Simulator (Japan, fastest supercomputer from 2002 to 2004) at 35.86 Tflops
- Objective 2: power efficiency
- Performance/rack = (performance/watt) x (watt/rack)
- Watt/rack is roughly constant at about 20 kW
- Performance/watt therefore determines performance/rack
5. Power efficiency
- 360 Tflops would take more than 20 megawatts with conventional processors
- Need a low-power processor design (2 to 10 times better power efficiency); see the worked arithmetic after this list
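A back-of-the-envelope illustration of the performance/rack relation above, assuming roughly 1,024 nodes per rack (64 racks for 64K nodes) together with the ~20 kW/rack figure from the slide; the numbers are illustrative, not official specifications.

% Worked numbers for performance/rack = performance/watt x watt/rack,
% assuming 64 racks for the 64K-node system; illustrative only.
\[
\frac{\text{perf}}{\text{rack}} = \frac{\text{perf}}{\text{watt}} \times \frac{\text{watt}}{\text{rack}},
\qquad
\frac{360\ \text{Tflops}}{64\ \text{racks}} \approx 5.6\ \tfrac{\text{Tflops}}{\text{rack}}
\;\Rightarrow\;
\frac{\text{perf}}{\text{watt}} \approx \frac{5.6\ \text{Tflops}}{20\ \text{kW}} \approx 0.28\ \tfrac{\text{Gflops}}{\text{W}}.
\]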
6. Design objectives (continued)
- Objective 3: extreme scalability
- Optimizing for cost/performance -> use low-power, less powerful processors -> need a lot of processors
- Up to 65,536 processors
- Interconnect scalability
- Reliability, availability, and serviceability
- Application scalability
7. Application-based design approach
- Limit the types of applications to improve scalability and the cost/performance ratio
- Which applications?
- Applications at the national labs (Lawrence Livermore, Los Alamos, Sandia)
- Simulations of physical phenomena
- Real-time data processing
- Offline data analysis
8. Application scalability issue
- Two types of applications (speedup formulas for both are sketched after this list)
- Strong scaling: fixed problem size
- The data on each node decreases as the number of nodes increases
- Weak scaling: fixed data size on each node
- The problem size increases as the number of nodes increases
- Most applications from the national labs are weak-scaling applications, while commercial HPC applications tend to be strong scaling
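The standard speedup formulas behind the two cases (textbook formulas, not BG/L-specific), with s the serial fraction of the work and p the number of processors:

% Strong scaling (fixed problem size) is bounded by Amdahl's law;
% weak scaling (problem grows with p) follows Gustafson's law.
\[
S_{\text{strong}}(p) = \frac{1}{s + \frac{1-s}{p}} \;\le\; \frac{1}{s},
\qquad
S_{\text{weak}}(p) = s + (1-s)\,p .
\]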
9. Application scalability issue
- Strong scaling: fixed problem size
- Amdahl's law
- Communication-to-computation ratio
- Load balancing
- Small messages
- Global communication dominates
- Memory footprint
- File I/O
10. Application scalability issue
- Weak scaling
- Amdahl's law
- Problem segmentation limits
- Load balancing
- Global communication dominates
- Memory footprint
- File I/O
11. Application scalability issue
- Amdahl's law: usually not a problem for the applications considered
- Problem segmentation limits and the communication-to-computation ratio are determined by the application; not much can be done about them
- Load balancing
- A major limit for both types of applications
- Not much can be done about it
- Global communication dominates
- Major limit for both types of applications
- Calls for efficient hardware support for global
communication
12. Application scalability issue
- Small messages
- Calls for efficient support for small messages
- The memory footprint determines how much memory to put on each node
- File I/O needs parallel I/O support; a minimal MPI-IO sketch follows this list
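A minimal sketch of what parallel I/O support looks like at the application level, using the standard MPI-IO interface (generic MPI code, not the BG/L I/O-node software); the filename out.dat and the block size are illustrative placeholders. Each rank writes its own disjoint block of one shared file.

/* Generic MPI-IO sketch of parallel file output: every rank writes its
 * own block of a shared file at a disjoint byte offset. */
#include <mpi.h>

#define N 1024  /* integers written per rank (illustrative) */

int main(int argc, char **argv)
{
    int rank, buf[N];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < N; i++) buf[i] = rank;  /* fill with this rank's id */

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* Disjoint offsets keep the ranks from overwriting each other. */
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(int);
    MPI_File_write_at(fh, offset, buf, N, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}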
13. Blue Gene/L system components
14. Blue Gene/L Compute ASIC
- Two PowerPC 440 cores with floating-point enhancements (the peak-flops arithmetic is sketched after this list)
- 700 MHz
- Everything of a typical superscalar processor: a pipelined microarchitecture with dual instruction fetch, decode, and out-of-order issue, dispatch, execution, and completion, etc.
- 1 W each through extensive power management
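A hedged sketch of where the 360-Tflops peak figure comes from, assuming the floating-point enhancement lets each core issue two fused multiply-adds (4 flops) per cycle:

% Peak-performance arithmetic under the 4-flops-per-cycle-per-core assumption.
\[
700\ \text{MHz} \times 4\ \tfrac{\text{flops}}{\text{cycle}} = 2.8\ \tfrac{\text{Gflops}}{\text{core}},\qquad
2 \times 2.8 = 5.6\ \tfrac{\text{Gflops}}{\text{node}},\qquad
65{,}536 \times 5.6\ \text{Gflops} \approx 367\ \text{Tflops}.
\]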
15. Blue Gene/L Compute ASIC
16. Memory system on a BG/L node
- BG/L only supports the distributed-memory paradigm
- No need for efficient support for cache coherence on each node
- Coherence is enforced by software if needed
- The two cores operate in one of two modes
- Communication coprocessor mode: coherence is needed, managed in system-level libraries
- Virtual node mode: memory is physically partitioned (not shared)
17. Blue Gene/L networks
- Five networks
- 100 Mbps Ethernet control network for diagnostics, debugging, and other management tasks
- 1000 Mbps (Gigabit) Ethernet for I/O
- Three high-bandwidth, low-latency networks for data transmission and synchronization:
- 3-D torus network for point-to-point communication
- Collective network for global operations
- Barrier network
- All network logic is integrated into the BG/L node ASIC
- Memory-mapped interfaces accessible from user space
18. 3-D torus network
- Supports point-to-point communication (a minimal MPI sketch follows this list)
- Link bandwidth 1.4 Gb/s; 6 bidirectional links per node (1.2 GB/s)
- 64x32x32 torus; diameter 32+16+16 = 64 hops; worst-case hardware latency 6.4 us
- Cut-through routing
- Adaptive routing
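A minimal MPI sketch of point-to-point communication on a 3-D torus: ranks are arranged with MPI_Cart_create and each exchanges a value with its +x neighbor. This is generic application-level MPI, not BG/L system software; the 64x32x32 dimensions mirror the slide, so as written it needs 65,536 ranks (shrink dims[] to try it on a small machine).

/* Generic MPI sketch: arrange ranks as a 3-D torus and exchange a value
 * with the +x neighbor. Dimensions follow the 64x32x32 figure above. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm torus;
    int dims[3]    = {64, 32, 32};   /* X x Y x Z; product must equal the rank count */
    int periods[3] = {1, 1, 1};      /* wrap-around links => a torus */
    int rank, xminus, xplus, sendval, recvval;

    MPI_Init(&argc, &argv);
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1 /* reorder */, &torus);
    MPI_Comm_rank(torus, &rank);

    /* Neighbors one hop away along the X dimension. */
    MPI_Cart_shift(torus, 0, 1, &xminus, &xplus);

    sendval = rank;
    MPI_Sendrecv(&sendval, 1, MPI_INT, xplus, 0,
                 &recvval, 1, MPI_INT, xminus, 0,
                 torus, MPI_STATUS_IGNORE);
    printf("rank %d got %d from its -x neighbor\n", rank, recvval);

    MPI_Finalize();
    return 0;
}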
19. Collective network
- Binary-tree topology, static routing
- Link bandwidth 2.8 Gb/s
- Maximum hardware latency 5 us
- Arithmetic and logic hardware in the tree can perform integer operations on the data
- Efficient support for reduce, scan, global sum, and broadcast operations (see the MPI sketch after this list)
- Floating-point operations can be done with 2 passes
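A minimal MPI sketch of the global operations these networks accelerate: an integer global sum, a broadcast, and a barrier. The code is ordinary MPI and runs anywhere; on BG/L it is the MPI library that maps these calls onto the collective and barrier hardware.

/* Generic MPI sketch of global operations: a global sum, a broadcast,
 * and a barrier. On BG/L the library maps these onto the tree and
 * barrier networks; the code itself is plain MPI. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, local, sum;
    double pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local = rank + 1;
    /* Integer global sum across all ranks. */
    MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) pi = 3.14159265358979;
    /* Broadcast from rank 0 to everyone. */
    MPI_Bcast(&pi, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Global synchronization (the barrier network's job on BG/L). */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0) printf("sum = %d, pi = %f\n", sum, pi);
    MPI_Finalize();
    return 0;
}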
20. Barrier network
- Hardware support for global synchronization
- 1.5 us for a barrier across 64K nodes
21. System-level reliability, availability, and serviceability
- IBM is strong in this area
- Simplicity
- A failing node can be isolated and replaced
- In units of 512 nodes (8x8x8)
- Lots of redundancy
- Flexible partitioning for availability
22. Conclusion
- Optimize cost/performance by limiting the target applications
- Use a low-power design
- Lower frequency, system-on-a-chip integration
- Great performance-per-watt metric
- Scalability support
- Hardware support for global communication and barriers
- Low-latency, high-bandwidth interconnects
- Simplicity for reliability, availability, and serviceability