Blue Gene/L system architecture - PowerPoint PPT Presentation

1
Blue Gene/L system architecture
2
Overview of the IBM Blue Gene/L System
Architecture
  • Design objectives
  • Design approach
  • Hardware overview
  • System architecture
  • Node architecture
  • Interconnect architecture

3
Highlights
  • A 64K-node, highly integrated supercomputer based
    on system-on-a-chip technology
  • Two ASICs: Blue Gene/L Compute (BLC) and Blue
    Gene/L Link (BLL)
  • Distributed-memory, massively parallel processing
    (MPP) architecture
  • Uses the message-passing programming model (MPI)
  • 360 Tflops peak performance
  • Optimized for cost/performance
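A rough sanity check on the 360-Tflops figure (the 4 flops/cycle rate of the
enhanced double FPU is a commonly quoted BG/L number, not stated on this slide):

    700 MHz x 4 flops/cycle        = 2.8 Gflops per core
    2 cores x 2.8 Gflops           = 5.6 Gflops per node
    65,536 nodes x 5.6 Gflops/node = ~367 Tflops peak (quoted as 360 Tflops)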

4
Design objectives
  • Objective 1: a 360-Tflops supercomputer
  • Earth Simulator (Japan, the fastest supercomputer
    from 2002 to 2004): 35.86 Tflops
  • Objective 2: power efficiency
  • Performance/rack = performance/watt x watt/rack
    (worked example below)
  • Watts/rack is roughly constant at around 20 kW
  • Performance/watt therefore determines
    performance/rack
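A worked instance of that identity, using the slide's 20 kW/rack figure (the
0.28 Gflops/W value is an illustrative number chosen to hit the target, not a
figure from the presentation):

    performance/rack = performance/watt x watt/rack
                     = 0.28 Gflops/W x 20,000 W/rack = 5.6 Tflops/rack
    5.6 Tflops/rack x 64 racks = ~358 Tflops, roughly the 360-Tflops target,
    so within a fixed 20 kW rack budget only performance/watt can move the total.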

5
  • Power efficiency
  • 360 Tflops would take more than 20 megawatts with
    conventional processors
  • Need a low-power processor design (2-10 times
    better power efficiency)
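For scale, combining the slide's own numbers (and assuming the usual 1,024
compute nodes per rack, which is not stated here):

    65,536 nodes / 1,024 nodes/rack = 64 racks
    64 racks x 20 kW/rack           = about 1.3 MW for the whole machine,
    versus the more than 20 MW estimated for conventional processors.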

6
Design objectives (continued)
  • Objective 3: extreme scalability
  • Optimized for cost/performance → use low-power,
    less powerful processors → need a lot of
    processors
  • Up to 65,536 processors
  • Interconnect scalability
  • Reliability, availability, and serviceability
  • Application scalability

7
Application-based design approach
  • Limit the types of applications supported to
    improve scalability and the cost/performance
    ratio.
  • Which applications?
  • Applications from the national labs (Lawrence
    Livermore, Los Alamos, Sandia)
  • Simulations of physical phenomena
  • Real-time data processing
  • Offline data analysis

8
Application scalability issue
  • Two types of applications
  • Strong scaling: fixed total problem size.
  • Data on each node decreases as the number of
    nodes increases
  • Weak scaling: fixed data size on each node.
  • Total problem size increases as the number of
    nodes increases.
  • Most applications from the national labs are
    weak-scaling applications, while commercial HPC
    applications tend to be strong scaling (see the
    sketch below).
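A minimal sketch of the two regimes in C (the cell counts and node range are
illustrative values, not taken from the presentation):

    #include <stdio.h>

    /* Per-node and total data volume under strong vs. weak scaling. */
    int main(void) {
        const long total_cells = 1L << 30;    /* strong scaling: fixed global size */
        const long cells_per_node = 1L << 20; /* weak scaling: fixed per-node size */

        for (long nodes = 1024; nodes <= 65536; nodes *= 4) {
            long strong_local = total_cells / nodes;    /* shrinks as nodes increase */
            long weak_total   = cells_per_node * nodes; /* grows as nodes increase   */
            printf("%6ld nodes: strong-scaling local = %8ld cells, "
                   "weak-scaling total = %12ld cells\n",
                   nodes, strong_local, weak_total);
        }
        return 0;
    }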

9
Application scalability issue
  • Strong scaling: fixed problem size
  • Amdahl's law (worked example below)
  • Communication-to-computation ratio
  • Load balancing
  • Small messages
  • Global communication dominates
  • Memory footprint
  • File I/O
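Amdahl's law sets the ceiling under strong scaling (the 99.9% parallel fraction
below is an illustrative assumption):

    speedup(P) = 1 / ((1 - f) + f/P),  where f is the parallel fraction
    with f = 0.999 and P = 65,536:
    speedup = 1 / (0.001 + 0.999/65,536) = 1 / 0.0010152 = about 985
    so even a 0.1% serial fraction caps a 64K-node machine near a 1,000x speedup.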

10
Application scalability issue
  • Weak scaling
  • Amdahl's law
  • Problem segmentation limits
  • Load balancing
  • Global communication dominates
  • Memory footprint
  • File I/O

11
Application scalability issue
  • Amdahl's law: usually not a problem for the
    applications considered.
  • Problem segmentation limits and the
    communication-to-computation ratio are determined
    by the application; not much can be done about
    them (see the note below).
  • Load balancing
  • A major limit for both types of applications.
  • Not much can be done about it.
  • Global communication dominates
  • A major limit for both types of applications
  • Calls for efficient hardware support for global
    communication
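A standard way to see why the communication-to-computation ratio worsens under
strong scaling (assuming a 3-D stencil-style simulation, typical of the physics
codes named earlier; N is the global problem size, P the node count):

    computation per node   ~ N/P          (volume of the local sub-domain)
    halo exchange per node ~ (N/P)^(2/3)  (surface of the local sub-domain)
    comm/comp ratio        ~ (P/N)^(1/3)  which grows as P increases at fixed N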

12
Application scalability issue
  • Small messages
  • Calls for efficient support for small messages
    (see the note below).
  • Memory footprint: determines how much memory must
    be placed in each node
  • File I/O: needs parallel I/O support.
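Why small messages are hard, using the torus link bandwidth quoted later
(1.4 Gb/s, i.e. 175 MB/s) and an assumed few-microsecond software latency:

    time per message ~ latency + size / bandwidth
    a 256-byte message: ~3 µs + 256 B / 175 MB/s = ~3 µs + ~1.5 µs
    so per-message latency, not bandwidth, dominates; strong-scaled codes send
    many such messages, which is why low per-message overhead matters.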

13
Blue Gene/L system components
14
Blue Gene/L Compute ASIC
  • Two PowerPC 440 cores with floating-point
    enhancements
  • 700 MHz
  • Everything expected of a typical superscalar
    processor: a pipelined microarchitecture with dual
    instruction fetch, decode, and out-of-order issue,
    dispatch, execution, and completion
  • 1 W each through extensive power management

15
Blue Gene/L Compute ASIC
16
Memory system on a BG/L node
  • BG/L only supports the distributed-memory
    paradigm.
  • No need for efficient support for cache coherence
    on each node.
  • Coherence is enforced by software if needed.
  • The two cores operate in one of two modes
  • Communication coprocessor mode
  • Needs coherence, managed in system-level libraries
  • Virtual node mode
  • Memory is physically partitioned (not shared);
    see the note below.
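A concrete reading of the two modes (the rank counts follow from the 64K-node
figure; the per-core memory split is implied by the physical partitioning):

    Coprocessor mode:  one MPI rank per node; one core computes while the
                       other drives communication → 65,536 ranks system-wide.
    Virtual node mode: two MPI ranks per node, each core owning its own
                       physically partitioned share of memory → 131,072 ranks.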

17
Blue Gene/L networks
  • Five networks.
  • 100 Mbps Ethernet control network for
    diagnostics, debugging, and other control tasks.
  • 1000 Mbps (Gigabit) Ethernet for I/O
  • Three high-bandwidth, low-latency networks for
    data transmission and synchronization.
  • 3-D torus network for point-to-point
    communication
  • Collective network for global operations
  • Barrier network
  • All network logic is integrated in the BG/L node
    ASIC
  • Memory-mapped interfaces from user space

18
3-D torus network
  • Supports point-to-point communication
  • Link bandwidth 1.4 Gb/s; 6 bidirectional links
    per node (1.2 GB/s).
  • 64x32x32 torus; diameter 32+16+16 = 64 hops,
    worst-case hardware latency 6.4 µs (see the
    sketch below).
  • Cut-through routing
  • Adaptive routing
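A short C sketch of the hop-count arithmetic behind those numbers (the ~100 ns
per hop is inferred from 6.4 µs / 64 hops, not quoted on the slide):

    #include <stdio.h>
    #include <stdlib.h>

    /* Minimum hops between two positions on one ring of size dim,
     * taking the shorter way around the torus. */
    static int ring_hops(int a, int b, int dim) {
        int d = abs(a - b);
        return d < dim - d ? d : dim - d;
    }

    int main(void) {
        const int X = 64, Y = 32, Z = 32;
        /* Worst case: the two nodes are maximally separated in every dimension. */
        int diameter = ring_hops(0, X / 2, X)
                     + ring_hops(0, Y / 2, Y)
                     + ring_hops(0, Z / 2, Z);
        printf("diameter = %d hops\n", diameter);   /* 32 + 16 + 16 = 64 */
        printf("worst-case latency ~ %.1f us at ~100 ns/hop\n", diameter * 0.1);
        return 0;
    }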

19
Collective network
  • Binary tree topology, static routing
  • Link bandwidth 2.8 Gb/s
  • Maximum hardware latency 5 µs
  • With its arithmetic and logic hardware, the
    network can perform integer operations on the
    data
  • Efficient support for reduce, scan, global-sum,
    and broadcast operations
  • Floating-point operations can be done in two
    passes.
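These are exactly the operations behind MPI collectives; a minimal example of
the kind of global sum this network accelerates (plain, portable MPI, not
BG/L-specific code):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Each rank contributes one value; an integer reduction like this can
         * be combined in hardware as the data moves up the collective tree. */
        long local = rank, global_sum = 0;
        MPI_Allreduce(&local, &global_sum, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

        /* Global synchronization, which maps onto the dedicated barrier network. */
        MPI_Barrier(MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks 0..%d = %ld\n", nprocs - 1, global_sum);

        MPI_Finalize();
        return 0;
    }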

20
Barrier network
  • Hardware support for global synchronization.
  • 1.5 µs for a barrier across 64K nodes.

21
System-level reliability, availability, and
serviceability
  • IBM is strong in this area.
  • Simplicity
  • Failing nodes can be isolated and replaced
  • In units of 512 nodes (8x8x8), i.e. 128 such
    units in the full 64K-node system.
  • Lots of redundancy
  • Flexible partitioning for availability

22
Conclusion
  • Optimized cost/performance by limiting the
    application domain.
  • Low-power design
  • Lower clock frequency, system-on-a-chip
    integration
  • Great performance-per-watt metric
  • Scalability support
  • Hardware support for global communication and
    barriers
  • Low-latency, high-bandwidth support
  • Simplicity for reliability, availability, and
    serviceability.