1
Today's topics
  • Single processors and the Memory Hierarchy
  • Busses and Switched Networks
  • Interconnection Network Topologies
  • Multiprocessors
  • Multicomputers
  • Flynn's Taxonomy
  • Modern clusters: hybrid systems

2
Processors and the Memory Hierarchy
  • Registers (1 clock cycle, 100s of bytes)
  • 1st level cache (3-5 clock cycles, 100s KBytes)
  • 2nd level cache (10 clock cycles, MBytes)
  • Main memory (100 clock cycles, GBytes)
  • Disk (milliseconds, 100s of GBytes and up; see the
    locality sketch below)

[Figure: memory hierarchy: CPU registers, separate 1st-level instruction and data caches, and a unified 2nd-level (instruction + data) cache]
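A minimal C sketch, not part of the original slides, makes the latency gap concrete: the array size N = 2048 and the two traversal orders are illustrative assumptions. The row-major loop walks memory with unit stride and mostly hits the caches; the column-major loop strides by a full row and pays main-memory latency far more often.

/* Sketch (not from the slides): the same arithmetic, two loop orders.
 * Row-major traversal walks memory sequentially and stays in the
 * 1st/2nd-level caches; column-major traversal strides by N doubles
 * and misses far more often. */
#include <stdio.h>
#include <stdlib.h>

#define N 2048

double sum_row_major(const double *a)      /* fast: unit stride */
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i * N + j];
    return s;
}

double sum_col_major(const double *a)      /* slow: stride of N doubles */
{
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i * N + j];
    return s;
}

int main(void)
{
    double *a = calloc((size_t)N * N, sizeof *a);
    if (!a) return 1;
    printf("%f %f\n", sum_row_major(a), sum_col_major(a));
    free(a);
    return 0;
}

Timing the two functions with any wall-clock timer typically shows a several-fold difference on commodity hardware, which is the memory hierarchy at work.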
3
IBM Dual Core
From the Intel 64 and IA-32 Architectures
Optimization Reference Manual,
http://www.intel.com/design/processor/manuals/248966.pdf
4
Interconnection Network Topologies - Bus
  • Bus
  • A single shared data path
  • Pros
  • Simplicity
  • Cache coherence is easy (every processor can snoop
    the shared bus)
  • Synchronization is easy
  • Cons
  • Fixed bandwidth shared by all processors
  • Does not scale well

[Figure: a single shared bus connecting several CPUs to global memory]
5
Interconnection Network Topologies - Switch Based
  • Switch based
  • Built from m×n switches
  • Many possible topologies
  • Characterized by three metrics (the diameter is
    measured directly in the sketch below)
  • Diameter
  • Worst-case number of switches between two
    processors
  • Impacts latency
  • Bisection width
  • Minimum number of connections that must be
    removed to split the network into two halves
  • Limits communication bandwidth
  • Edges per switch
  • Best if this is independent of the size of the
    network
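As a hedged illustration of the diameter metric (not from the slides), the sketch below measures the diameter of an arbitrary switch topology by running a breadth-first search from every switch. The 4-node ring adjacency matrix is just an example topology; any other topology can be substituted.

/* Sketch (not from the slides): measure the diameter of an arbitrary
 * switch topology by running a breadth-first search from every node.
 * The topology here is a 4-node ring, purely as an example. */
#include <stdio.h>
#include <string.h>

#define MAXN 64

int n = 4;                       /* number of switches */
int adj[MAXN][MAXN] = {          /* adjacency matrix of a 4-node ring */
    {0, 1, 0, 1},
    {1, 0, 1, 0},
    {0, 1, 0, 1},
    {1, 0, 1, 0},
};

/* longest shortest-path distance from node s, via BFS */
static int eccentricity(int s)
{
    int dist[MAXN], queue[MAXN], head = 0, tail = 0, far = 0;
    memset(dist, -1, sizeof dist);
    dist[s] = 0;
    queue[tail++] = s;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < n; v++)
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                if (dist[v] > far) far = dist[v];
                queue[tail++] = v;
            }
    }
    return far;
}

int main(void)
{
    int diameter = 0;
    for (int s = 0; s < n; s++) {
        int e = eccentricity(s);
        if (e > diameter) diameter = e;
    }
    printf("diameter = %d\n", diameter);   /* prints 2 for the ring */
    return 0;
}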

6
Interconnection Network Topologies - Mesh
  • 2-D Mesh
  • 2-D array of processors
  • Torus/Wraparound Mesh
  • Processors on opposite edges of the mesh are also
    connected
  • Characteristics (n nodes; evaluated in the sketch
    below)
  • Diameter 2(√n - 1) for the mesh, 2⌊√n/2⌋ for the
    torus
  • Bisection width √n for the mesh, 2√n for the torus
  • Switch size 4
  • Number of switches n
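The characteristics above are closed-form in n. The short C sketch below (an assumption of this example: n is a perfect square, here an 8×8 array) just evaluates them so the mesh and the torus can be compared side by side.

/* Sketch: closed-form characteristics of a 2-D mesh and torus with
 * n = k*k nodes, matching the bullets above. */
#include <stdio.h>

int main(void)
{
    int n = 64;                        /* example: an 8 x 8 array of nodes */
    int k = 1;
    while ((k + 1) * (k + 1) <= n)     /* integer square root of n */
        k++;

    printf("mesh : diameter %d, bisection width %d\n",
           2 * (k - 1), k);            /* 2(sqrt(n)-1), sqrt(n) */
    printf("torus: diameter %d, bisection width %d\n",
           2 * (k / 2), 2 * k);        /* 2*floor(sqrt(n)/2), 2*sqrt(n) */
    return 0;
}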

7
Interconnection Network Topologies - Hypercube
  • Hypercube
  • A d-dimensional hypercube has n = 2^d processors.
  • Each processor directly connected to d other
    processors
  • Shortest path between a pair of processors is at
    most d
  • Characteristics (n = 2^d nodes)
  • Diameter d
  • Bisection width n/2
  • Switch size d
  • Number of switches n

[Figures: a 3-D hypercube and a 4-D hypercube]
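A small C sketch (not from the slides) of why the diameter is d: node labels are d-bit numbers, each node's neighbors are the labels that differ in exactly one bit, and dimension-order routing flips one differing bit per hop, so any pair of nodes is connected in at most d hops. The 4-D size and the source/destination labels are arbitrary examples.

/* Sketch (not from the slides): in a d-dimensional hypercube, node i's
 * neighbors are i with one bit flipped, and dimension-order routing
 * corrects one differing bit per hop, so any route takes at most d hops. */
#include <stdio.h>

#define D 4                            /* 4-D hypercube, n = 2^d = 16 nodes */

int main(void)
{
    unsigned src = 0x3, dst = 0xC;     /* example source and destination */

    printf("neighbors of node %u:", src);
    for (int bit = 0; bit < D; bit++)
        printf(" %u", src ^ (1u << bit));
    printf("\n");

    /* route by flipping differing bits, lowest dimension first */
    unsigned cur = src;
    int hops = 0;
    for (int bit = 0; bit < D; bit++)
        if ((cur ^ dst) & (1u << bit)) {
            cur ^= 1u << bit;
            hops++;
            printf("hop %d -> node %u\n", hops, cur);
        }
    printf("reached %u in %d hops (at most d = %d)\n", cur, hops, D);
    return 0;
}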
8
Multistage Networks
  • Butterfly
  • Omega
  • Perfect shuffle
  • Characteristics for an Omega network (n = 2^d nodes)
  • Diameter d-1
  • Bisection width n/2
  • Switch size 2
  • Number of switches d·n/2

[Figure: an 8-input, 8-output Omega network of 2×2 switches]
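A hedged sketch of how a packet self-routes through such an Omega network: each stage performs a perfect shuffle (a left rotation of the d-bit line number) and each 2×2 switch then forwards to its upper or lower output according to the next bit of the destination address, most significant bit first. The 8-port size matches the figure; the source and destination ports are arbitrary examples.

/* Sketch (not from the slides): destination-tag self-routing through an
 * Omega network with n = 2^d inputs.  Each stage does a perfect shuffle
 * (rotate the d-bit line number left by one) and then each 2x2 switch
 * sets the low-order bit from the next bit of the destination address. */
#include <stdio.h>

#define D 3                              /* d = 3, so n = 8 as in the figure */
#define N (1 << D)

int main(void)
{
    unsigned src = 2, dst = 6;           /* example input and output ports */
    unsigned pos = src;

    for (int stage = 0; stage < D; stage++) {
        pos = ((pos << 1) | (pos >> (D - 1))) & (N - 1);  /* perfect shuffle */
        unsigned bit = (dst >> (D - 1 - stage)) & 1u;     /* msb first */
        pos = (pos & ~1u) | bit;         /* switch output: 0 = upper, 1 = lower */
        printf("after stage %d: line %u\n", stage + 1, pos);
    }
    printf("arrived at output %u (wanted %u)\n", pos, dst);
    return 0;
}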
9
Shared Memory
  • One or more memories
  • Global address space (all system memory visible
    to all processors)
  • Transfer of data between processors is usually
    implicit: a processor simply reads from or writes
    to a given address (e.g., OpenMP; see the sketch
    below)
  • Cache-coherency protocol to maintain consistency
    between processors.

[Figure: a UMA (uniform memory access) shared-memory system: several CPUs and memory modules connected by an interconnection network]
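A minimal OpenMP sketch of the implicit, shared-address-space style mentioned above (assuming a compiler with OpenMP support, e.g. gcc -fopenmp): every thread reads and writes the same array through ordinary loads and stores, with no explicit data transfer.

/* Minimal OpenMP sketch: threads share the array through the global
 * address space; communication is implicit.  Compile with -fopenmp. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)   /* threads share a[] implicitly */
    for (int i = 0; i < N; i++) {
        a[i] = 2.0 * i;
        sum += a[i];
    }
    printf("sum = %f (max threads: %d)\n", sum, omp_get_max_threads());
    return 0;
}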
10
Distributed Shared Memory
  • Single address space with implicit communication
  • Hardware support for read/write to non-local
    memories, cache coherency
  • Latency for a memory operation is greater when
    accessing non-local data than when accessing data
    within a CPU's own memory

11
Distributed Memory
  • Each processor has access to its own memory only
  • Data transfer between processors is explicit: the
    user calls message-passing functions (see the MPI
    sketch below)
  • Common libraries for message passing
  • MPI, PVM
  • User has complete control/responsibility for data
    placement and management
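A minimal MPI sketch of the explicit message-passing style described above (assuming an MPI installation; compile with mpicc and run with, e.g., mpirun -np 2): the integer exists only in rank 0's memory until it is explicitly sent and received.

/* Minimal MPI sketch: data moves only through explicit send/receive
 * calls; each rank sees only its own memory. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                   /* lives only on rank 0 ... */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                  /* ... until explicitly received */
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}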

12
Hybrid Systems
  • Distributed memory system with multiprocessor
    shared memory nodes.
  • Most common architecture for the current
    generation of parallel machines (see the combined
    MPI + OpenMP sketch below)

[Figure: a hybrid system: shared-memory nodes, each with several CPUs and local memory, attached through network interfaces to an interconnection network]
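A hedged sketch of how such a hybrid machine is typically programmed, combining the two previous examples: OpenMP threads work within each shared-memory node and MPI moves data between nodes. It assumes an MPI library built with thread support and a compiler invocation such as mpicc -fopenmp.

/* Sketch of the hybrid model: MPI between distributed-memory nodes,
 * OpenMP threads inside each shared-memory node. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;
    #pragma omp parallel for reduction(+:local)   /* shared memory inside the node */
    for (int i = 0; i < 1000000; i++)
        local += 1.0;

    double total = 0.0;                           /* explicit messages across nodes */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("total = %f\n", total);

    MPI_Finalize();
    return 0;
}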
13
Flynn's Taxonomy (figure 2.20 from Quinn)
                         Single data stream      Multiple data streams
  Single instruction     SISD: uniprocessor      SIMD: processor arrays,
                                                 pipelined vector processors
  Multiple instruction   MISD: systolic array    MIMD: multiprocessors,
                                                 multicomputers
14
Top 500 List
  • Some highlights from http://www.top500.org/
  • On the new list, the IBM BlueGene/L system,
    installed at DOE's Lawrence Livermore National
    Laboratory (LLNL), retains the No. 1 spot with a
    Linpack performance of 280.6 teraflops (trillions
    of calculations per second, or Tflop/s).
  • The new No. 2 system is Sandia National
    Laboratories' Cray Red Storm supercomputer, only
    the second system ever recorded to exceed the
    100 Tflop/s mark, with 101.4 Tflop/s. The initial
    Red Storm system was ranked No. 9 in the last
    listing.
  • Slipping to No. 3 from No. 2 last June is the IBM
    eServer Blue Gene Solution system, installed at
    IBM's Thomas Watson Research Center, with 91.20
    Tflop/s Linpack performance.
  • The new No. 5 is the largest system in Europe, an
    IBM JS21 cluster installed at the Barcelona
    Supercomputing Center. The system reached 62.63
    Tflop/s.

15
Linux/Beowulf cluster basics
  • Goal
  • Get supercomputing processing power at the cost
    of a few PCs
  • How
  • Commodity components: PCs and networks
  • Free, open-source software

16
CPU nodes
  • A typical configuration
  • Dual-socket nodes
  • Dual-core AMD or Intel processors
  • 4 GB of memory per node

17
Network Options
From D. K. Panda's Nowlab website at Ohio State
(http://nowlab.cse.ohio-state.edu/), Research
Overview presentation
18
Challenges
  • Cooling
  • Power constraints
  • Reliability
  • System Administration