Computer Science: An Overview - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
  • Computer Science: An Overview
  • J. Glenn Brookshear
  • Addison-Wesley, 7th edition (July 9, 2002)

2
Parallel Computer Architecture
  • How to use more than 1 CPU to do a job
  • How to connect the CPUs together
  • How to write programs to use the CPUs together

3
Reference Texts
  • Far too many to list; just grab any book with "Parallel" and "Computer"
    or "Architecture" in the title.
  • Some classic references:
  • Computer Architecture and Parallel Processing, Kai Hwang and Faye A.
    Briggs, McGraw-Hill, 1984
  • Computer Architecture: A Quantitative Approach, John L. Hennessy,
    David A. Patterson, David Goldberg, Morgan Kaufmann, 3rd edn, 2002
  • Computer Organization and Design: The Hardware/Software Interface,
    David A. Patterson and John L. Hennessy, Morgan Kaufmann, 2nd edn, 1997
  • Any book by Stallings or Tanenbaum
  • Something interesting:
  • The Architecture of Symbolic Computers, Kogge, McGraw-Hill, 1991

4
First Important Concept
  • Amdahl's Law
  • In layman's terms: using p processors will yield a speedup of less than
    p, because the serial parts of the program and the communication
    overheads always cost something.
  • Superlinear speedup?
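  The usual statement of the law, with f the fraction of the runtime that can
  be parallelised and p the number of processors:

    S(p) = \frac{1}{(1 - f) + f/p},
    \qquad \lim_{p \to \infty} S(p) = \frac{1}{1 - f}

  For example, if 90% of the work parallelises (f = 0.9), the speedup can
  never exceed 10, however many processors are used.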

5
Second Important Concept
  • Flynn's Taxonomy
  • Single Instruction Single Data (SISD)
  • Normal 1-CPU machine
  • Multiple Instruction Single Data (MISD)
  • More than one operation performed simultaneously
    on the same data
  • Not very meaningful/useful
  • Single Instruction Multiple Data (SIMD)
  • Same instruction/s on more than one set of data
  • Regular data sets/strides
  • E.g. weather forecasting, image processing etc.
  • Vector machines, massively parallel (MPP) systems
    etc.
  • Multiple Instruction Multiple Data (MIMD)
  • Independent processors performing different
    operations on different data
  • Most high-performance computers today
  • Clusters

6
Vector Machines
  • Vector units to handle more than one data set at
    one time.
  • Consider A(I) = B(I) + C(I) for I = 1 to N
  • Normal machine
  • Step 1: A(1) = B(1) + C(1)
  • Step 2: A(2) = B(2) + C(2)
  • ...
  • Step N: A(N) = B(N) + C(N)
  • Vector machine with vector length L
  • Step 1: A(1..L) = B(1..L) + C(1..L)
  • Step 2: A(L+1..2L) = B(L+1..2L) + C(L+1..2L)
  • Step 3: A(2L+1..3L) = B(2L+1..3L) + C(2L+1..3L)
  • Step M: A((M-1)L+1..N) = B((M-1)L+1..N) + C((M-1)L+1..N),
    where M = N/L (rounded up)
  • Cray SV series, NEC SX series, Fujitsu VPP series
  • SSE2 on the Pentium 4, AltiVec on the PowerPC
  • The Earth Simulator is a cluster of NEC SX-6s
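  As an illustration (not taken from the slide), the elementwise operation
  above can be written with SSE2 intrinsics, which operate on two doubles per
  instruction; a vectorising compiler would emit similar code from the plain
  scalar loop:

    /* Sketch: A(i) = B(i) + C(i) using SSE2, two doubles per vector register.
       Assumes N is even for brevity; a real version would handle an odd
       tail element with a scalar loop. */
    #include <emmintrin.h>

    void vec_add(double *A, const double *B, const double *C, int N)
    {
        for (int i = 0; i < N; i += 2) {
            __m128d b = _mm_loadu_pd(&B[i]);        /* load B[i], B[i+1] */
            __m128d c = _mm_loadu_pd(&C[i]);        /* load C[i], C[i+1] */
            _mm_storeu_pd(&A[i], _mm_add_pd(b, c)); /* A[i..i+1] = B + C */
        }
    }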

7
MPP Systems
  • Many (in the thousands) small processors
    operating in time step synchronisation on a set
    of data
  • Can be hard to program; no such system is in common use today
  • The CM-1 and CM-2 Connection Machines from Thinking Machines were the
    classic examples
  • The GAAP (geometric array parallel processor) was a favourite of many
    university labs in the 80s
  • The Prism processor from Digital did not take
    off, but many of its concepts were incorporated
    in the design of the Alpha processor.

8
SMP Systems
  • Symmetric Multi-Processing
  • 2 or more CPUs with (almost) equal access to the computer's resources
  • External cache may be shared or private
  • Connected by bus or crossbar switch
  • Difficult to scale to very large sizes
  • Bus: simple to design, but cannot scale due to bus contention
  • Crossbar switch: difficult to design
  • Bus systems: Itanium systems, Alpha TurboLaser series, HP V series,
    UltraSPARC II based Suns
  • Crossbar: Alpha ES series

9
NUMA Systems
  • Non-Uniform Memory Access
  • Attempts to overcome SMP scaling bottleneck
  • Divide CPUs into smaller SMP blocks (normally
    blocks of 4 CPUs today)
  • Each block has its own memory and IO
  • Access to far memory via crossbar
  • SGI Origin series, HP Superdome, Alpha GS series,
    IBM p series, Sun cat series.
  • How to ensure a program stays within the same NUMA block throughout its
    lifetime? (one possible mechanism is sketched below)
  • How to ensure cache coherency among far blocks?
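  One common mechanism on Linux is to pin the process to the CPUs of one
  block with sched_setaffinity, or to start it under a tool such as numactl.
  The sketch below assumes a block consists of CPUs 0-3; the slide does not
  prescribe any particular tool.

    /* Sketch: restrict the calling process to CPUs 0-3, i.e. one 4-CPU NUMA
       block.  Linux-specific; the CPU numbers are illustrative assumptions. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int pin_to_first_block(void)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        for (int cpu = 0; cpu < 4; cpu++)   /* CPUs 0-3: one block of 4 */
            CPU_SET(cpu, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");    /* pid 0 = the calling process */
            return -1;
        }
        return 0;
    }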

10
Distributed Memory Systems
  • Non-shared blocks of one or more CPUs with
    independent resources.
  • Communication between blocks by explicit message
    passing.
  • Cray T3 series, Transputers, most supercomputers
    today, clusters.

11
Simultaneous Multi-Threading
  • CPU resources are usually not fully utilised
    during normal operations.
  • Provide CPU with duplicate sets of registers
  • Feed 2 or more instruction streams into same CPU
  • Hopefully all CPU resources are used by at least
    one thread at any one time.
  • Cray MTA series (128 threads), Pentium 4 Xeons (2 threads)

12
Shared Memory Programming
  • Normally via threads: pthreads, OpenMP
  • Easiest way is to look for loops and distribute
    work across processors.
  • Compilers can normally identify such loops.
  • E.g. matrix multiplication
  • for (i = 0; i < N; i++)
  •   for (j = 0; j < N; j++)
  •     for (k = 0; k < N; k++)
  •       c[i][j] += a[i][k] * b[k][j];
  • Just add one directive before the i-loop to tell the compiler to break
    the outer loop into smaller blocks (a complete compilable version follows
    below)
  • #pragma omp parallel for
  • for (i = 0; i < N; i++)
  •   for (j = 0; j < N; j++)
  •     for (k = 0; k < N; k++)
  •       c[i][j] += a[i][k] * b[k][j];
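  A complete, compilable version of the example above (a sketch: N, the data
  values, and the timing calls are illustrative additions).  Build with e.g.
  gcc -fopenmp matmul.c -o matmul:

    #include <stdio.h>
    #include <omp.h>

    #define N 512

    static double a[N][N], b[N][N], c[N][N];

    int main(void)
    {
        /* Fill a and b with some data; c starts at zero. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                a[i][j] = i + j;
                b[i][j] = i - j;
                c[i][j] = 0.0;
            }

        double t0 = omp_get_wtime();

        /* The single directive from the slide: the iterations of the outer
           i-loop are divided among the available threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    c[i][j] += a[i][k] * b[k][j];

        double t1 = omp_get_wtime();
        printf("N = %d, up to %d threads, %.3f s, c[0][0] = %g\n",
               N, omp_get_max_threads(), t1 - t0, c[0][0]);
        return 0;
    }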

13
Distributed Memory Programming
  • Break program into independent blocks
  • Master node controls execution and
    synchronisation of code blocks on slave nodes.
  • Slaves may need to synchronise with one another
  • May need to distribute data beforehand.
  • MPI (Message Passing Interface): the de facto standard in distributed
    memory programming today (a minimal example follows below)
  • PVM (Parallel Virtual Machine): still around, but not as popular as MPI
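  A minimal MPI sketch of explicit message passing between a master and its
  slaves (the payload values are arbitrary illustrations).  Build with mpicc
  and run with mpirun:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                  /* master: send work, collect acks */
            for (int dest = 1; dest < size; dest++) {
                int work = 100 + dest, ack;
                MPI_Send(&work, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
                MPI_Recv(&ack, 1, MPI_INT, dest, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("master: ack from rank %d\n", ack);
            }
        } else {                          /* slave: receive, acknowledge */
            int work;
            MPI_Recv(&work, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }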

14
Matrix Multiplication
  • Master Node
  • send A and B to slaves
  • distribute jobs to slaves
  • wait for partial results from slaves
  • - if more jobs: send a new job to the slave
  • - if no more jobs: tell the slave to die
  • wait for all slaves to complete work
  • exit
  • Slave Node
  • receive A and B from master
  • receive work row/column information from master
    in order to begin work
  • perform computation
  • send result to master
  • wait for signal from master
  • - if more work: perform as required
  • - if die signal: exit
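  A hedged sketch of the master/slave outline above, using MPI: A and B are
  broadcast whole, each "job" is one row index of C, and message tags are
  used to distinguish work from the "die" signal.  The names N, TAG_WORK and
  TAG_DIE are illustrative assumptions, not from the slides.

    #include <stdio.h>
    #include <mpi.h>

    #define N        256
    #define TAG_WORK 1
    #define TAG_DIE  2

    static double A[N][N], B[N][N], C[N][N];

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)                        /* master fills A and B */
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++) {
                    A[i][j] = i + j;
                    B[i][j] = (i == j);       /* identity: easy to check */
                }

        /* "send A and B to slaves" */
        MPI_Bcast(A, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if (rank == 0) {                      /* ----- master node ----- */
            int next_row = 0;
            /* distribute one initial job per slave (assumes 1..N slaves) */
            for (int s = 1; s < size && next_row < N; s++) {
                MPI_Send(&next_row, 1, MPI_INT, s, TAG_WORK, MPI_COMM_WORLD);
                next_row++;
            }
            /* wait for partial results; send new jobs while any remain */
            for (int done = 0; done < N; done++) {
                double row[N];
                MPI_Status st;
                MPI_Recv(row, N, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                for (int j = 0; j < N; j++)   /* tag carries the row number */
                    C[st.MPI_TAG][j] = row[j];
                if (next_row < N) {
                    MPI_Send(&next_row, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                             MPI_COMM_WORLD);
                    next_row++;
                } else {                      /* no more jobs: tell it to die */
                    MPI_Send(&next_row, 1, MPI_INT, st.MPI_SOURCE, TAG_DIE,
                             MPI_COMM_WORLD);
                }
            }
            printf("C[0][0] = %g (expected 0)\n", C[0][0]);
        } else {                              /* ----- slave node ----- */
            for (;;) {
                int row;
                MPI_Status st;
                MPI_Recv(&row, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_DIE)    /* die signal: exit the loop */
                    break;
                double result[N];
                for (int j = 0; j < N; j++) { /* compute one row of C */
                    result[j] = 0.0;
                    for (int k = 0; k < N; k++)
                        result[j] += A[row][k] * B[k][j];
                }
                /* send the row back; the tag says which row it is */
                MPI_Send(result, N, MPI_DOUBLE, 0, row, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }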