Parallel Programming Models

Transcript and Presenter's Notes (29 slides, provided by barbara179)
1
Parallel Programming Models
  • Overview

2
Parallel Programming Models
  • Historically, programming models were designed
    for a given class of architectures
  • vector computers and vector code
  • SIMD computers and array operations
  • distributed memory computers and message passing
  • shared memory and threads

3
Parallel Programming Models
  • The idea is to make it easy for the programmer
    to get performance on that architecture
  • Include primitives that are natural for the
    architecture
  • The programmer should consider how best to use
    these primitives in code

4
Parallel Programming Models
  • But in order to execute code on a new
    architecture, it must be rewritten
  • Some programming models were written for just one
    vendor's machines!
  • Fortunately, industry standards are now widely
    available
  • Pthreads, OpenMP (shared memory)
  • MPI (message passing)
5
Parallel Programming Models
  • Standards developed for programming a class of
    machines
  • Some adapted for more than one class of machines
  • MPI has been most successful in this respect
  • OpenMP now also runs on different kinds of
    platforms

6
Threads
  • A process has its own address space
  • A process may be executed by a team of threads
  • A thread shares its address space with the other
    threads in the same team
  • But each thread's stack provides space for data
    local (private) to that thread
  • Threads are used for shared memory parallel
    programming and for multitasking

7
Threads
  • One thread per processor for shared memory
    parallel programming
  • One thread per task for time slicing (possibly on
    single processor)

This is our focus
8
How Can We Exploit Threads?
  • A thread programming model must provide (at
    least) the means to
  • Create and destroy threads
  • Distribute the computation among threads
  • Coordinate actions of threads on shared data
  • Name threads
  • (usually) specify which data is shared and which
    is private to a thread

9
Parallel Programming Models
  • Low-level parallel programming
  • Programmer must describe parallelism explicitly
  • Data and computation to be performed on each
    processor are specified exactly by the programmer
  • For shared memory, create threads and specify all
    details of their work and their interactions

10
Parallel Programming Models
  • High-level parallel programming
  • Programmer describes parallelism implicitly
  • Details of data and computation to be performed
    on each processor determined by compiler
  • Compiler creates threads and determines details
    of their work and interactions

11
Parallel Programming Models
  • Low-level parallel programming models
  • pthreads for shared memory, MPI for distributed
    memory
  • High-level parallel programming
  • OpenMP for shared memory, HPF for distributed
    memory

12
How Does OpenMP Enable Us to Exploit Threads?
  • OpenMP provides a thread programming model at a
    high level
  • The user does not need to specify all the details
  • Especially with respect to the assignment of work
    to threads
  • Creation of threads
  • User makes strategic decisions
  • Compiler figures out details

13
Automatic Parallelization
  • Many compilers now have an automatic
    parallelization option for shared memory
    platforms
  • The idea is that the compiler detects dependences
    and constructs parallel threads that respect them
  • Works well on very simple programs
  • But very hard to do on real programs
  • Dynamic improvement (run-time compiling) may help

14
Memory Models
  • The main difference among modern SMP
    architectures is the cost of memory access
  • Uniform memory access (UMA): same cost of access
    from any processor
  • realized in physically (true) shared memory
  • Non-uniform memory access (NUMA): different cost
    of access from different processors
  • true for physically distributed memory, including
    distributed shared memory; also for some SMPs

15
Models of Memory Management
  • Symmetric shared memory: memory is shared, same
    cost of access from any processor/core
  • Non-symmetric shared memory: memory is shared,
    different cost of access from different
    processors/cores
  • Distributed shared memory: memory is shared,
    different cost of access from different
    processors
  • Distributed memory: memory is distributed,
    different cost of access from different processors

16
Shared Memory
  • This means that a variable x, a pointer p, or an
    array a refer to the same object, no matter
    what processor the reference originates from
  • Each processor can access a variable in the same
    amount of time as any other
  • Actually, this second statement is not true on
    some major platforms today, even for some with
    just a few processors
  • We will later discuss programming techniques that
    take access time differences into account

17
Shared Memory
  • All threads access same data space

[Diagram: processors proc1, proc2, proc3, …, procN all accessing variable a in one shared memory space]
18
More Realistic View of Shared Memory Architecture

[Diagram: shared memory holding a; each processor proc1 … procN has its own cache (cache1 … cacheN) between it and memory, and a copy of a also sits in a cache]
19
Cache in Shared Memory
  • Copies of shared data are held in local cache
  • Or even in registers
  • Without extra effort, there may be
    inconsistencies
  • Thread 1 updates variable a
  • Thread 2 needs to use it
  • If thread 1 has not written a back to main
    memory, thread 2 will use a stale value
  • This is the memory consistency problem

20
Distributed Memory
  • It is no longer the case that a variable x, a
    pointer p, or an array a refer to the same
    location, independent of the processor a process
    executes on
  • It can be slow for a process on one processor to
    access data stored in memory associated with a
    different processor

21
Distributed Memory
[Diagram: proc1, proc2, proc3, …, procN connected by a network; each processor has its own local memory holding its own separate variable a]
22
Distributed Shared Memory
  • A variable x, a pointer p, or an array a refers
    to the same location, independent of the
    processor a process executes on
  • It can be slow for a process on one processor to
    access data stored in memory associated with a
    different processor

23
Distributed Shared Memory

[Diagram: proc1 … procN, each with a local cache (cache1 … cacheN) and a local memory (mem1 … memN); the memories together form one shared address space]
24
Important Note
  • Software Distributed Shared Memory can provide
    the illusion of shared memory on a distributed
    memory machine
  • No matter what the implementation, it
    conceptually looks like shared memory
  • There may be some very large performance
    differences

25
Programming vs. Hardware
  • One can implement a shared memory programming
    model
  • on shared or distributed memory hardware
  • (in software or in hardware)
  • One can implement a message passing programming
    model
  • on shared or distributed memory hardware
  • There may be large performance differences

26
Portability of programming models
[Diagram: both the shared memory programming model and the distributed memory programming model can be mapped onto a shared memory machine or a distributed memory machine]
27
Programming Models
  • We look at several programming models
  • stick to standards
  • high-level, implicit parallel programming
  • low-level, explicit parallel programming
  • Goal: understand each model and get experience
    in its usage

28
Summary
  • Different kinds of architectural parallelism and
    memory organization have led to different
    programming models
  • We stick to standards for modern architectures
  • There are several ways to find parallelism in a
    code
  • Programmer has to decide which way is best for a
    program
  • We discuss this soon, but first we get started
    with the API