1
Introduction to Parallel Computing
  • E. van den Berg

2
Overview
  • Part I
  • A: Parallel machines and architectures
  • B: Communication
  • Part II
  • A: Task and data partitioning
  • B: Programming examples

3
Introduction to Parallel Computing
  • Part Ia

4
Why compute in parallel?
  • Speed of a single processor is limited by
  • Laws of physics
  • Speed of light
  • Quantum effects
  • Size of chip
  • Solution: use more processors in parallel

5
Definitions (1)
  • T1 is defined as the time required by the fastest
    possible sequential implementation for solving a
    problem.
  • Tp is defined as the time required by a parallel
    implementation to solve the same problem, using p
    processors.

6
Definitions (2)
  • Speedup = T1 / Tp
  • Efficiency = Speedup / p = T1 / (p Tp)
  • Speedup ≤ p
  • Efficiency ≤ 1
  • Proof: Suppose speedup > p; then T1 > p Tp. We could
    then write a sequential program that emulates the p
    processors and runs in less than T1, contradicting
    the definition of T1 as the fastest possible
    sequential time. Therefore speedup ≤ p. ∎
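
As a worked example of these formulas (the timings below are hypothetical, chosen only for illustration), a minimal C sketch:

```c
#include <stdio.h>

int main(void) {
    double t1 = 120.0;  /* hypothetical fastest sequential time (s) */
    double tp = 20.0;   /* hypothetical parallel time (s)           */
    int    p  = 8;      /* number of processors                     */

    double speedup    = t1 / tp;      /* 6.0, and indeed <= p       */
    double efficiency = speedup / p;  /* 0.75, and indeed <= 1      */

    printf("speedup = %.2f, efficiency = %.2f\n", speedup, efficiency);
    return 0;
}
```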

7
Speedup and efficiency
  • In general speedup lt p, and efficiency lt 1
  • Initialisation
  • Communication of data
  • Unbalanced workload
  • Inherently sequential sections
  • Additional computations
  • Maximum theoretically achievable speedup depends
    on problem

8
Parallel Architectures
  • There are three basic architectures, based on the
    location of memory:
  • Data-parallel
  • Shared memory
  • Distributed memory

9
Data-Parallel
Instructions: c = 10; if (c > 20) a = 20; else a = c;
[Diagram: a master processor broadcasts the instruction
stream to processors P1, P2, P3, ..., Pn, each with its
own memory M1, M2, M3, ..., Mn]
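
A minimal sketch of this style in C with OpenMP (an assumption of this example; the slide itself does not name a language), applying the instruction above to an array, one element per processor:

```c
#include <omp.h>

#define N 1000

/* Data-parallel execution: every processor runs the same
   instruction stream on its own portion of the data. */
void data_parallel_example(const int c[N], int a[N]) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        if (c[i] > 20) a[i] = 20;
        else           a[i] = c[i];
    }
}
```

Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp.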
10
Shared Memory
[Diagram: processors P1, P2, P3, ..., Pn access a single
shared memory through an interconnection network]
11
Distributed Memory
[Diagram: processors P1, P2, P3, ..., Pn, each with its
own local memory M1, M2, M3, ..., Mn, connected by an
interconnection network]
12
Memory Access Times
  • Shared memory
  • Uniform access time
  • Distributed memory
  • Local memory access is fast
  • Remote memory access is (much) slower

13
Shared Memory
[Diagram repeated: processors P1, P2, P3, ..., Pn access
a single shared memory through an interconnection network]
14
Finding maximum value
  • 1. (M) Set global maximum to minus infinity
  • 2. (M) Assign each slave a subset of values
  • 3. (S) In parallel, find the local maximum in the
    assigned set
  • 4. (S) If local maximum > global maximum,
    set global maximum to local maximum
  • Problem: What happens if processors read or write
    the same variable at the same time? (A lock-based
    fix is sketched below.)
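
A minimal pthreads sketch of steps 1-4 with a lock guarding step 4 (the even data split and the sample values are illustrative assumptions):

```c
#include <pthread.h>
#include <stdio.h>
#include <limits.h>

#define NTHREADS 4
#define N 1000

static int data[N];
static int global_max = INT_MIN;          /* step 1: minus infinity     */
static pthread_mutex_t max_lock = PTHREAD_MUTEX_INITIALIZER;

static void *find_max(void *arg) {
    long id = (long)arg;                  /* step 2: each slave a slice */
    int lo = id * (N / NTHREADS), hi = lo + N / NTHREADS;
    int local_max = INT_MIN;
    for (int i = lo; i < hi; i++)         /* step 3: local maximum      */
        if (data[i] > local_max) local_max = data[i];
    pthread_mutex_lock(&max_lock);        /* step 4: the lock makes the */
    if (local_max > global_max)           /* read-compare-write atomic  */
        global_max = local_max;
    pthread_mutex_unlock(&max_lock);
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = (i * 37) % 1000;  /* sample data */
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, find_max, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("global maximum = %d\n", global_max);  /* prints 999 */
    return 0;
}
```

Without the mutex, two slaves could both pass the comparison and then overwrite each other's update, which is exactly the read/write conflict the slide warns about.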

15
Memory Models
  • For memory access there are four models:
  • EREW, Exclusive Read Exclusive Write
  • CREW, Concurrent Read Exclusive Write
  • CRCW, Concurrent Read Concurrent Write
  • ERCW, Exclusive Read Concurrent Write
  • EREW is the most restricted, CRCW the least. The
    ERCW model is unrealistic and not used.

16
Concurrent Write (1)
  • Concurrent write means that more than one processor
    can write to memory at the same time. What should
    be done when multiple processors write to the same
    memory address?

17
Concurrent Write (2)
  • Different methods can be used when there are multiple
    writes to the same address (the priority rule is
    sketched below):
  • Only write when values are identical
  • Write the sum of the values
  • Assign each processor a priority rank, and use
    value of highest ranked processor
  • Select one of the values at random
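
As an illustration of the priority rule, a toy sequential simulation (not a real CRCW machine; all names and values are made up for this sketch):

```c
#include <stdio.h>

/* Among all processors writing to the same address, keep the
   value proposed by the highest-ranked (lowest-rank) one.    */
int resolve_by_priority(const int values[], const int ranks[], int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (ranks[i] < ranks[best]) best = i;  /* smaller rank wins */
    return values[best];
}

int main(void) {
    int values[] = { 7, 42, 13 };  /* concurrent writes from P1..P3 */
    int ranks[]  = { 2, 0, 1 };    /* P2 has the highest priority   */
    printf("written value = %d\n", resolve_by_priority(values, ranks, 3));
    return 0;                      /* prints 42 */
}
```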

18
Cache (1)
  • Cache is a small and fast memory that is located on
    the processor chip. It is added to processors to
    reduce memory delays.

[Diagram: CPU with on-chip cache, connected to main memory]
19
Cache (2)
  • When there is more than one processor, and thus more
    than one cache, the problem of cache coherence
    arises: when two processors write different values
    to the same memory address, the two caches hold
    different values, while memory can hold only one of
    them. Several techniques are available to reduce
    the problem.

20
Network of PCs
  • Specialised hardware that supports multiple CPUs
    is very expensive.
  • Single-processor, off-the-shelf PCs are cheap
  • Connect the computers by a fast network to get
    equivalent performance more cost-effectively
  • Operating systems can emulate a shared address
    space, if needed
  • Data coherence is an important issue
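
On such a network, data moves by explicit message passing. A minimal sketch using MPI (an assumption of this example; the slide does not name a library), run with at least two processes, e.g. mpirun -np 2:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {           /* one PC sends a value ...   */
        int value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {    /* ... another PC receives it */
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```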

21
Idealised Machine
  • The Parallel Random Access Machine (PRAM) model is
    an idealised machine. Each processor is assumed to
    share the same clock cycle, and each instruction
    requires a constant time of one unit to execute.
    Furthermore, memory access is uniform.
  • Note: The most widely used models are PRAM-CREW
    and PRAM-CRCW.

22
Questions?