1
Today's topics
  • Single processors and the Memory Hierarchy
  • Busses and Switched Networks
  • Interconnection Network Topologies
  • Multiprocessors
  • Multicomputers
  • Flynn's Taxonomy
  • Modern clusters: hybrid systems

2
Processors and the Memory Hierarchy
  • Registers (1 clock cycle, 100s of bytes)
  • 1st level cache (3-5 clock cycles, 100s KBytes)
  • 2nd level cache (10 clock cycles, MBytes)
  • Main memory (100 clock cycles, GBytes)
  • Disk (milliseconds, 100s of GBytes and up; see the
    locality sketch below)

[Figure: memory hierarchy: CPU registers, separate 1st-level instruction and data caches, and a unified 2nd-level (instruction + data) cache]
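A minimal C sketch, not part of the original slides, makes the latency gap concrete: the array size N = 2048 and the two traversal orders are illustrative assumptions. The row-major loop walks memory with unit stride and mostly hits the caches; the column-major loop strides by a full row and pays main-memory latency far more often.

/* Sketch (not from the slides): the same arithmetic, two loop orders.
 * Row-major traversal walks memory sequentially and stays in the
 * 1st/2nd-level caches; column-major traversal strides by N doubles
 * and misses far more often. */
#include <stdio.h>
#include <stdlib.h>

#define N 2048

double sum_row_major(const double *a)      /* fast: unit stride */
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i * N + j];
    return s;
}

double sum_col_major(const double *a)      /* slow: stride of N doubles */
{
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i * N + j];
    return s;
}

int main(void)
{
    double *a = calloc((size_t)N * N, sizeof *a);
    if (!a) return 1;
    printf("%f %f\n", sum_row_major(a), sum_col_major(a));
    free(a);
    return 0;
}

Timing the two functions with any wall-clock timer typically shows a several-fold difference on commodity hardware, which is the memory hierarchy at work.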
3
IBM Dual Core
From the Intel 64 and IA-32 Architectures
Optimization Reference Manual,
http://www.intel.com/design/processor/manuals/248966.pdf
4
Interconnection Network Topologies - Bus
  • Bus
  • A single shared data path
  • Pros
  • Simplicity
  • Cache coherence is easy (every processor can snoop
    the shared bus)
  • Synchronization is easy
  • Cons
  • Fixed bandwidth shared by all processors
  • Does not scale well

[Figure: a single shared bus connecting several CPUs to global memory]
5
Interconnection Network Topologies - Switch Based
  • Switch based
  • Built from m×n switches
  • Many possible topologies
  • Characterized by three metrics (the diameter is
    measured directly in the sketch below)
  • Diameter
  • Worst-case number of switches between two
    processors
  • Impacts latency
  • Bisection width
  • Minimum number of connections that must be
    removed to split the network into two halves
  • Limits communication bandwidth
  • Edges per switch
  • Best if this is independent of the size of the
    network
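As a hedged illustration of the diameter metric (not from the slides), the sketch below measures the diameter of an arbitrary switch topology by running a breadth-first search from every switch. The 4-node ring adjacency matrix is just an example topology; any other topology can be substituted.

/* Sketch (not from the slides): measure the diameter of an arbitrary
 * switch topology by running a breadth-first search from every node.
 * The topology here is a 4-node ring, purely as an example. */
#include <stdio.h>
#include <string.h>

#define MAXN 64

int n = 4;                       /* number of switches */
int adj[MAXN][MAXN] = {          /* adjacency matrix of a 4-node ring */
    {0, 1, 0, 1},
    {1, 0, 1, 0},
    {0, 1, 0, 1},
    {1, 0, 1, 0},
};

/* longest shortest-path distance from node s, via BFS */
static int eccentricity(int s)
{
    int dist[MAXN], queue[MAXN], head = 0, tail = 0, far = 0;
    memset(dist, -1, sizeof dist);
    dist[s] = 0;
    queue[tail++] = s;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < n; v++)
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                if (dist[v] > far) far = dist[v];
                queue[tail++] = v;
            }
    }
    return far;
}

int main(void)
{
    int diameter = 0;
    for (int s = 0; s < n; s++) {
        int e = eccentricity(s);
        if (e > diameter) diameter = e;
    }
    printf("diameter = %d\n", diameter);   /* prints 2 for the ring */
    return 0;
}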

6
Interconnection Network Topologies - Mesh
  • 2-D Mesh
  • 2-D array of processors
  • Torus/Wraparound Mesh
  • Processors on opposite edges of the mesh are also
    connected
  • Characteristics (n nodes; evaluated in the sketch
    below)
  • Diameter 2(√n - 1) for the mesh, 2⌊√n/2⌋ for the
    torus
  • Bisection width √n for the mesh, 2√n for the torus
  • Switch size 4
  • Number of switches n
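The characteristics above are closed-form in n. The short C sketch below (an assumption of this example: n is a perfect square, here an 8×8 array) just evaluates them so the mesh and the torus can be compared side by side.

/* Sketch: closed-form characteristics of a 2-D mesh and torus with
 * n = k*k nodes, matching the bullets above. */
#include <stdio.h>

int main(void)
{
    int n = 64;                        /* example: an 8 x 8 array of nodes */
    int k = 1;
    while ((k + 1) * (k + 1) <= n)     /* integer square root of n */
        k++;

    printf("mesh : diameter %d, bisection width %d\n",
           2 * (k - 1), k);            /* 2(sqrt(n)-1), sqrt(n) */
    printf("torus: diameter %d, bisection width %d\n",
           2 * (k / 2), 2 * k);        /* 2*floor(sqrt(n)/2), 2*sqrt(n) */
    return 0;
}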

7
Interconnection Network Topologies - Hypercube
  • Hypercube
  • A d-dimensional hypercube has n = 2^d processors.
  • Each processor directly connected to d other
    processors
  • Shortest path between a pair of processors is at
    most d
  • Characteristics (n = 2^d nodes)
  • Diameter d
  • Bisection width n/2
  • Switch size d
  • Number of switches n

[Figures: a 3-D hypercube and a 4-D hypercube]
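A small C sketch (not from the slides) of why the diameter is d: node labels are d-bit numbers, each node's neighbors are the labels that differ in exactly one bit, and dimension-order routing flips one differing bit per hop, so any pair of nodes is connected in at most d hops. The 4-D size and the source/destination labels are arbitrary examples.

/* Sketch (not from the slides): in a d-dimensional hypercube, node i's
 * neighbors are i with one bit flipped, and dimension-order routing
 * corrects one differing bit per hop, so any route takes at most d hops. */
#include <stdio.h>

#define D 4                            /* 4-D hypercube, n = 2^d = 16 nodes */

int main(void)
{
    unsigned src = 0x3, dst = 0xC;     /* example source and destination */

    printf("neighbors of node %u:", src);
    for (int bit = 0; bit < D; bit++)
        printf(" %u", src ^ (1u << bit));
    printf("\n");

    /* route by flipping differing bits, lowest dimension first */
    unsigned cur = src;
    int hops = 0;
    for (int bit = 0; bit < D; bit++)
        if ((cur ^ dst) & (1u << bit)) {
            cur ^= 1u << bit;
            hops++;
            printf("hop %d -> node %u\n", hops, cur);
        }
    printf("reached %u in %d hops (at most d = %d)\n", cur, hops, D);
    return 0;
}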
8
Multistage Networks
  • Butterfly
  • Omega
  • Perfect shuffle
  • Characteristics for an Omega network (n = 2^d nodes)
  • Diameter d-1
  • Bisection width n/2
  • Switch size 2
  • Number of switches d·n/2

[Figure: an 8-input, 8-output Omega network of 2×2 switches]
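A hedged sketch of how a packet self-routes through such an Omega network: each stage performs a perfect shuffle (a left rotation of the d-bit line number) and each 2×2 switch then forwards to its upper or lower output according to the next bit of the destination address, most significant bit first. The 8-port size matches the figure; the source and destination ports are arbitrary examples.

/* Sketch (not from the slides): destination-tag self-routing through an
 * Omega network with n = 2^d inputs.  Each stage does a perfect shuffle
 * (rotate the d-bit line number left by one) and then each 2x2 switch
 * sets the low-order bit from the next bit of the destination address. */
#include <stdio.h>

#define D 3                              /* d = 3, so n = 8 as in the figure */
#define N (1 << D)

int main(void)
{
    unsigned src = 2, dst = 6;           /* example input and output ports */
    unsigned pos = src;

    for (int stage = 0; stage < D; stage++) {
        pos = ((pos << 1) | (pos >> (D - 1))) & (N - 1);  /* perfect shuffle */
        unsigned bit = (dst >> (D - 1 - stage)) & 1u;     /* msb first */
        pos = (pos & ~1u) | bit;         /* switch output: 0 = upper, 1 = lower */
        printf("after stage %d: line %u\n", stage + 1, pos);
    }
    printf("arrived at output %u (wanted %u)\n", pos, dst);
    return 0;
}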
9
Shared Memory
  • One or more memories
  • Global address space (all system memory visible
    to all processors)
  • Transfer of data between processors is usually
    implicit: a processor simply reads from or writes
    to a given address (e.g., OpenMP; see the sketch
    below)
  • Cache-coherency protocol to maintain consistency
    between processors.

[Figure: a UMA (uniform memory access) shared-memory system: several CPUs and memory modules connected by an interconnection network]
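A minimal OpenMP sketch of the implicit, shared-address-space style mentioned above (assuming a compiler with OpenMP support, e.g. gcc -fopenmp): every thread reads and writes the same array through ordinary loads and stores, with no explicit data transfer.

/* Minimal OpenMP sketch: threads share the array through the global
 * address space; communication is implicit.  Compile with -fopenmp. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)   /* threads share a[] implicitly */
    for (int i = 0; i < N; i++) {
        a[i] = 2.0 * i;
        sum += a[i];
    }
    printf("sum = %f (max threads: %d)\n", sum, omp_get_max_threads());
    return 0;
}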
10
Distributed Shared Memory
  • Single address space with implicit communication
  • Hardware support for read/write to non-local
    memories, cache coherency
  • Latency for a memory operation is greater when
    accessing non-local data than when accessing data
    within a CPU's own memory

11
Distributed Memory
  • Each processor has access to its own memory only
  • Data transfer between processors is explicit: the
    user calls message-passing functions (see the MPI
    sketch below)
  • Common libraries for message passing
  • MPI, PVM
  • User has complete control/responsibility for data
    placement and management
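A minimal MPI sketch of the explicit message-passing style described above (assuming an MPI installation; compile with mpicc and run with, e.g., mpirun -np 2): the integer exists only in rank 0's memory until it is explicitly sent and received.

/* Minimal MPI sketch: data moves only through explicit send/receive
 * calls; each rank sees only its own memory. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                   /* lives only on rank 0 ... */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                  /* ... until explicitly received */
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}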

12
Hybrid Systems
  • Distributed memory system with multiprocessor
    shared memory nodes.
  • Most common architecture for the current
    generation of parallel machines (see the combined
    MPI + OpenMP sketch below)

[Figure: a hybrid system: shared-memory nodes, each with several CPUs and local memory, attached through network interfaces to an interconnection network]
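A hedged sketch of how such a hybrid machine is typically programmed, combining the two previous examples: OpenMP threads work within each shared-memory node and MPI moves data between nodes. It assumes an MPI library built with thread support and a compiler invocation such as mpicc -fopenmp.

/* Sketch of the hybrid model: MPI between distributed-memory nodes,
 * OpenMP threads inside each shared-memory node. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;
    #pragma omp parallel for reduction(+:local)   /* shared memory inside the node */
    for (int i = 0; i < 1000000; i++)
        local += 1.0;

    double total = 0.0;                           /* explicit messages across nodes */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("total = %f\n", total);

    MPI_Finalize();
    return 0;
}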
13
Flynn's Taxonomy (figure 2.20 from Quinn)
                         Single data stream      Multiple data streams
  Single instruction     SISD: uniprocessor      SIMD: processor arrays,
                                                 pipelined vector processors
  Multiple instruction   MISD: systolic array    MIMD: multiprocessors,
                                                 multicomputers
14
Top 500 List
  • Some highlights from http://www.top500.org/
  • On the new list, the IBM BlueGene/L system,
    installed at DOE's Lawrence Livermore National
    Laboratory (LLNL), retains the No. 1 spot with a
    Linpack performance of 280.6 teraflops (trillions
    of calculations per second, or Tflop/s).
  • The new No. 2 system is Sandia National
    Laboratories' Cray Red Storm supercomputer, only
    the second system ever recorded to exceed the
    100 Tflop/s mark, with 101.4 Tflop/s. The initial
    Red Storm system was ranked No. 9 in the last
    listing.
  • Slipping to No. 3 from No. 2 last June is the IBM
    eServer Blue Gene Solution system, installed at
    IBM's Thomas Watson Research Center, with 91.20
    Tflop/s Linpack performance.
  • The new No. 5 is the largest system in Europe, an
    IBM JS21 cluster installed at the Barcelona
    Supercomputing Center. The system reached 62.63
    Tflop/s.

15
Linux/Beowulf cluster basics
  • Goal
  • Get supercomputing processing power at the cost
    of a few PCs
  • How
  • Commodity components: PCs and networks
  • Free, open-source software

16
CPU nodes
  • A typical configuration
  • Dual-socket nodes
  • Dual-core AMD or Intel processors
  • 4 GB of memory per node

17
Network Options
From D. K. Panda's Nowlab website at Ohio State
(http://nowlab.cse.ohio-state.edu/), Research
Overview presentation
18
Challenges
  • Cooling
  • Power constraints
  • Reliability
  • System Administration