1
Introduction to Parallel Computing
  • E. van den Berg

2
Overview
  • Part I
  • A: Parallel machines and architectures
  • B: Communication
  • Part II
  • A: Task and data partitioning
  • B: Programming examples

3
Introduction to Parallel Computing
  • Part Ia

4
Why compute in parallel?
  • Speed of a single processor is limited by
  • Laws of physics
  • Speed of light
  • Quantum effects
  • Size of chip
  • Solution: use more processors in parallel

5
Definitions (1)
  • T1 is defined as the time required by the fastest
    possible sequential implementation for solving a
    problem.
  • Tp is defined as the time required by a parallel
    implementation to solve the same problem, using p
    processors.

6
Definitions (2)
  • Speedup = T1 / Tp
  • Efficiency = Speedup / p = T1 / (p Tp)
  • Speedup ≤ p
  • Efficiency ≤ 1
  • Proof: Suppose speedup > p; then T1 > p Tp. We could
    then write a sequential program that emulates the p
    processors and runs in less than T1, contradicting
    the definition of T1 as the fastest possible
    sequential time. Therefore speedup ≤ p. ∎
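
As a worked example of these formulas (the timings below are hypothetical, chosen only for illustration), a minimal C sketch:

```c
#include <stdio.h>

int main(void) {
    double t1 = 120.0;  /* hypothetical fastest sequential time (s) */
    double tp = 20.0;   /* hypothetical parallel time (s)           */
    int    p  = 8;      /* number of processors                     */

    double speedup    = t1 / tp;      /* 6.0, and indeed <= p       */
    double efficiency = speedup / p;  /* 0.75, and indeed <= 1      */

    printf("speedup = %.2f, efficiency = %.2f\n", speedup, efficiency);
    return 0;
}
```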

7
Speedup and efficiency
  • In general speedup lt p, and efficiency lt 1
  • Initialisation
  • Communication of data
  • Unbalanced workload
  • Inherently sequential sections
  • Additional computations
  • Maximum theoretically achievable speedup depends
    on problem

8
Parallel Architectures
  • There are three basic architectures, based on the
    location of memory:
  • Data-parallel
  • Shared memory
  • Distributed memory

9
Data-Parallel
Instructions: c = 10; if (c > 20) a = 20; else a = c;
[Diagram: a master processor broadcasts the instruction
stream to processors P1, P2, P3, ..., Pn, each with its
own memory M1, M2, M3, ..., Mn]
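
A minimal sketch of this style in C with OpenMP (an assumption of this example; the slide itself does not name a language), applying the instruction above to an array, one element per processor:

```c
#include <omp.h>

#define N 1000

/* Data-parallel execution: every processor runs the same
   instruction stream on its own portion of the data. */
void data_parallel_example(const int c[N], int a[N]) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        if (c[i] > 20) a[i] = 20;
        else           a[i] = c[i];
    }
}
```

Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp.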
10
Shared Memory
[Diagram: processors P1, P2, P3, ..., Pn access a single
shared memory through an interconnection network]
11
Distributed Memory
[Diagram: processors P1, P2, P3, ..., Pn, each with its
own local memory M1, M2, M3, ..., Mn, connected by an
interconnection network]
12
Memory Access Times
  • Shared memory
  • Uniform access time
  • Distributed memory
  • Local memory access is fast
  • Remote memory access is (much) slower

13
Shared Memory
[Diagram repeated: processors P1, P2, P3, ..., Pn access
a single shared memory through an interconnection network]
14
Finding maximum value
  • 1. (M) Set global maximum to minus infinity
  • 2. (M) Assign each slave a subset of values
  • 3. (S) In parallel, find the local maximum in the
    assigned set
  • 4. (S) If local maximum > global maximum,
    set global maximum to local maximum
  • Problem: What happens if processors read or write
    the same variable at the same time? (A lock-based
    fix is sketched below.)
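
A minimal pthreads sketch of steps 1-4 with a lock guarding step 4 (the even data split and the sample values are illustrative assumptions):

```c
#include <pthread.h>
#include <stdio.h>
#include <limits.h>

#define NTHREADS 4
#define N 1000

static int data[N];
static int global_max = INT_MIN;          /* step 1: minus infinity     */
static pthread_mutex_t max_lock = PTHREAD_MUTEX_INITIALIZER;

static void *find_max(void *arg) {
    long id = (long)arg;                  /* step 2: each slave a slice */
    int lo = id * (N / NTHREADS), hi = lo + N / NTHREADS;
    int local_max = INT_MIN;
    for (int i = lo; i < hi; i++)         /* step 3: local maximum      */
        if (data[i] > local_max) local_max = data[i];
    pthread_mutex_lock(&max_lock);        /* step 4: the lock makes the */
    if (local_max > global_max)           /* read-compare-write atomic  */
        global_max = local_max;
    pthread_mutex_unlock(&max_lock);
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = (i * 37) % 1000;  /* sample data */
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, find_max, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("global maximum = %d\n", global_max);  /* prints 999 */
    return 0;
}
```

Without the mutex, two slaves could both pass the comparison and then overwrite each other's update, which is exactly the read/write conflict the slide warns about.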

15
Memory Models
  • For memory access there are four models:
  • EREW, Exclusive Read Exclusive Write
  • CREW, Concurrent Read Exclusive Write
  • CRCW, Concurrent Read Concurrent Write
  • ERCW, Exclusive Read Concurrent Write
  • EREW is the most restricted, CRCW the least. The
    ERCW model is unrealistic and not used.

16
Concurrent Write (1)
  • Concurrent write means that more than one processor
    can write to memory at the same time. What should
    be done when multiple processors write to the same
    memory address?

17
Concurrent Write (2)
  • Different methods can be used when there are multiple
    writes to the same address (the priority rule is
    sketched below):
  • Only write when values are identical
  • Write the sum of the values
  • Assign each processor a priority rank, and use
    value of highest ranked processor
  • Select one of the values at random
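
As an illustration of the priority rule, a toy sequential simulation (not a real CRCW machine; all names and values are made up for this sketch):

```c
#include <stdio.h>

/* Among all processors writing to the same address, keep the
   value proposed by the highest-ranked (lowest-rank) one.    */
int resolve_by_priority(const int values[], const int ranks[], int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (ranks[i] < ranks[best]) best = i;  /* smaller rank wins */
    return values[best];
}

int main(void) {
    int values[] = { 7, 42, 13 };  /* concurrent writes from P1..P3 */
    int ranks[]  = { 2, 0, 1 };    /* P2 has the highest priority   */
    printf("written value = %d\n", resolve_by_priority(values, ranks, 3));
    return 0;                      /* prints 42 */
}
```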

18
Cache (1)
  • Cache is a small and fast memory that is located on
    the processor chip. It is added to processors to
    reduce memory delays.

[Diagram: CPU with on-chip cache, connected to main memory]
19
Cache (2)
  • When there is more than one processor, and thus more
    than one cache, the problem of cache coherence
    arises: when two processors write different values
    to the same memory address, the two caches hold
    different values, while memory can hold only one of
    them. Several techniques are available to reduce
    the problem.

20
Network of PCs
  • Specialised hardware that supports multiple CPUs
    is very expensive.
  • Single-processor, off-the-shelf PCs are cheap
  • Connect the computers by a fast network to get
    equivalent performance more cost-effectively
  • Operating systems can emulate a shared address
    space, if needed
  • Data coherence is an important issue
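
On such a network, data moves by explicit message passing. A minimal sketch using MPI (an assumption of this example; the slide does not name a library), run with at least two processes, e.g. mpirun -np 2:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {           /* one PC sends a value ...   */
        int value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {    /* ... another PC receives it */
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```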

21
Idealised Machine
  • The Parallel Random Access Machine (PRAM) model is
    an idealised machine. Each processor is assumed to
    share the same clock cycle, and each instruction
    requires a constant time of one unit to execute.
    Furthermore, memory access is uniform.
  • Note: The most widely used models are PRAM-CREW
    and PRAM-CRCW.

22
Questions?