ECE 1747: Parallel Programming

Slides: 32
Provided by: CITI

1
ECE 1747: Parallel Programming
  • Distributed Shared Memory (DSM)

2
Multiprocessor (SMP)
[Figure: an SMP with three processors (proc1, proc2, proc3); the value X0 is shown in four places]
3
Consistency Models
  • Sequential Consistency
  • All processors observe the same order of memory operations
  • That order must correspond to some serial order
  • The only ordering constraint is that the reads/writes of
    each processor (e.g., P1) appear in that processor's program
    order; there is no restriction on the relative ordering
    between processors.
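
The definition above can be checked on a small example. The sketch below is not from the slides: it assumes a two-processor test program (P1: x = 1; r1 = y and P2: y = 1; r2 = x, with x and y initially 0) and enumerates every serial order that keeps each processor's operations in program order. The names sc_outcomes, run_interleaving and enumerate_sc_outcomes are hypothetical.

```c
#include <stdbool.h>

/* seen[r1][r2] is set when some serial order produces the result (r1, r2). */
typedef struct { bool seen[2][2]; } sc_outcomes;

/* Run one serial order. order[k] names the processor (0 = P1, 1 = P2) that
 * issues its next operation at step k, so program order is always kept. */
static void run_interleaving(const int order[4], sc_outcomes *out)
{
    int x = 0, y = 0, r1 = 0, r2 = 0;
    int pc[2] = {0, 0};               /* next operation of each processor */

    for (int k = 0; k < 4; k++) {
        int p = order[k];
        if (p == 0) {                 /* P1: x = 1; r1 = y; */
            if (pc[0] == 0) x = 1; else r1 = y;
        } else {                      /* P2: y = 1; r2 = x; */
            if (pc[1] == 0) y = 1; else r2 = x;
        }
        pc[p]++;
    }
    out->seen[r1][r2] = true;
}

/* Enumerate all merges of the two 2-operation programs into one serial order. */
sc_outcomes enumerate_sc_outcomes(void)
{
    sc_outcomes out = { .seen = { {false, false}, {false, false} } };

    for (int mask = 0; mask < 16; mask++) {
        int order[4], ones = 0;
        for (int k = 0; k < 4; k++) {
            order[k] = (mask >> k) & 1;
            ones += order[k];
        }
        if (ones == 2)                /* each processor issues exactly two operations */
            run_interleaving(order, &out);
    }
    return out;
}
```

Under these assumptions the results (0,1), (1,0) and (1,1) all occur, but (0,0) never does: no serial order lets both reads run before both writes.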

4
Common consistency protocols
  • Write update
  • Multicast the update to all replicas
  • Write invalidate
  • Invalidate the cached copies in p2, p3
  • p2/p3 take a cache miss if they access X afterwards
  • The miss is served with valid data from another cache
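
The two protocols can be compared with a toy model. This is a minimal sketch under stated assumptions, not a real coherence implementation: one shared word X, three caches indexed 0..2, and hypothetical names (caches, write_update, write_invalidate, cache_read) chosen for illustration only.

```c
#include <stdbool.h>

#define NPROC 3   /* assumed: three processors, indices 0..2 */

typedef struct {
    bool valid[NPROC];   /* does processor i hold a valid copy of X? */
    int  value[NPROC];   /* cached value of X on processor i */
    int  misses;         /* cache misses taken so far */
    int  messages;       /* update/invalidate messages sent so far */
} caches;

void caches_init(caches *c, int x0)
{
    for (int i = 0; i < NPROC; i++) {
        c->valid[i] = true;
        c->value[i] = x0;
    }
    c->misses = 0;
    c->messages = 0;
}

/* Write update: the new value is multicast to every other valid replica. */
void write_update(caches *c, int writer, int v)
{
    c->value[writer] = v;
    c->valid[writer] = true;
    for (int i = 0; i < NPROC; i++) {
        if (i != writer && c->valid[i]) {
            c->value[i] = v;
            c->messages++;
        }
    }
}

/* Write invalidate: every other valid copy is invalidated before the write. */
void write_invalidate(caches *c, int writer, int v)
{
    for (int i = 0; i < NPROC; i++) {
        if (i != writer && c->valid[i]) {
            c->valid[i] = false;
            c->messages++;
        }
    }
    c->value[writer] = v;
    c->valid[writer] = true;
}

/* Read: an invalid copy misses and is refilled from some cache holding valid data. */
int cache_read(caches *c, int p)
{
    if (!c->valid[p]) {
        c->misses++;
        for (int i = 0; i < NPROC; i++) {
            if (c->valid[i]) {
                c->value[p] = c->value[i];
                c->valid[p] = true;
                break;
            }
        }
    }
    return c->value[p];
}
```

In this model a write_invalidate by processor 0 makes the next read on processor 1 miss once, while a later write_update reaches processor 1 without any miss.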

5
Distributed Shared Memory (DSM)
[Figure: processors proc0, proc1, proc2, ..., procN, with memories mem0, mem1, mem2, ..., memN, connected by a network and presented as one shared memory]
6
DSM programming
  • Standard pthread-like synchronizations
  • Barriers
  • Locks
  • Semaphores

7
Sequential SOR
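  • for some number of timesteps/iterations {
  •   /* rows and columns 0 and n are fixed boundary values */
  •   for (i = 1; i < n; i++)
  •     for (j = 1; j < n; j++)
  •       temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] + grid[i][j-1] + grid[i][j+1]);
  •   for (i = 1; i < n; i++)
  •     for (j = 1; j < n; j++)
  •       grid[i][j] = temp[i][j];
  • }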

8
Parallel SOR with Barriers (1 of 2)
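  • void *sor(void *arg) {
  •   int slice = (int)(intptr_t)arg;   /* index of this thread's band of rows */
  •   int from = (slice * (n-1)) / p + 1;
  •   int to = ((slice+1) * (n-1)) / p + 1;
  •   for some number of iterations {

9
Parallel SOR with Barriers (2 of 2)
  •     for (i = from; i < to; i++)
  •       for (j = 1; j < n; j++)
  •         temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] + grid[i][j-1] + grid[i][j+1]);
  •     barrier();   /* all of temp is computed before grid is overwritten */
  •     for (i = from; i < to; i++)
  •       for (j = 1; j < n; j++)
  •         grid[i][j] = temp[i][j];
  •     barrier();   /* all of grid is updated before the next iteration reads it */
  •   }
  • }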
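
For reference, the two slides above can be assembled into a complete program. The sketch below uses POSIX threads and pthread_barrier_wait in place of the slide's barrier(); the grid size N, the thread count P, the boundary values, and the driver run_parallel_sor are assumptions made for illustration, and error handling is omitted for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define N 8   /* assumed size: interior points are 1..N-1, 0 and N are boundary */
#define P 4   /* assumed number of threads */

static double grid[N + 1][N + 1];
static double temp[N + 1][N + 1];
static int iterations;
static pthread_barrier_t bar;

static void *sor(void *arg)
{
    int slice = (int)(intptr_t)arg;
    int from = (slice * (N - 1)) / P + 1;
    int to = ((slice + 1) * (N - 1)) / P + 1;

    for (int it = 0; it < iterations; it++) {
        for (int i = from; i < to; i++)
            for (int j = 1; j < N; j++)
                temp[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                     grid[i][j - 1] + grid[i][j + 1]);
        pthread_barrier_wait(&bar);   /* nobody overwrites grid while others still read it */
        for (int i = from; i < to; i++)
            for (int j = 1; j < N; j++)
                grid[i][j] = temp[i][j];
        pthread_barrier_wait(&bar);   /* grid is complete before the next iteration */
    }
    return NULL;
}

/* Hypothetical driver: top boundary row set to 1.0, everything else 0.0.
 * Runs P threads for iters iterations and returns grid[1][1]. */
double run_parallel_sor(int iters)
{
    pthread_t tid[P];

    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j++)
            grid[i][j] = (i == 0) ? 1.0 : 0.0;
    iterations = iters;

    pthread_barrier_init(&bar, NULL, P);
    for (int s = 0; s < P; s++)
        pthread_create(&tid[s], NULL, sor, (void *)(intptr_t)s);
    for (int s = 0; s < P; s++)
        pthread_join(tid[s], NULL);
    pthread_barrier_destroy(&bar);

    return grid[1][1];
}
```

Because every thread writes only its own rows of temp and grid, and the barriers separate the read phase from the write phase, the result does not depend on thread scheduling.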

10
Sequential Consistency DSM
  • As proposed by Li and Hudak, TOCS 86.
  • Use virtual memory to implement sharing.
  • Shared memory is divided into virtual memory pages.
  • Use an SMP-like coherence protocol.
  • Keep each page in one of three states:
    invalid, read-only, read-write
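
The three page states can be sketched as a small state machine. This is a simplified model under stated assumptions, not the protocol from the cited paper: a single page, two processors, and one fault counted per state change. The names dsm_page, dsm_read, dsm_write and alternating_access_faults are hypothetical.

```c
#define NPROC 2   /* assumed: two processors sharing one page */

typedef enum { PAGE_INVALID, PAGE_READ_ONLY, PAGE_READ_WRITE } page_state;

typedef struct {
    page_state state[NPROC];   /* state of the page on each processor */
    int faults;                /* page faults taken so far */
} dsm_page;

/* Processor 0 starts as the only holder, with write access. */
void dsm_init(dsm_page *pg)
{
    for (int i = 0; i < NPROC; i++)
        pg->state[i] = PAGE_INVALID;
    pg->state[0] = PAGE_READ_WRITE;
    pg->faults = 0;
}

/* Read: only an invalid page faults; a writer elsewhere drops to read-only. */
void dsm_read(dsm_page *pg, int p)
{
    if (pg->state[p] != PAGE_INVALID)
        return;
    pg->faults++;
    for (int i = 0; i < NPROC; i++)
        if (pg->state[i] == PAGE_READ_WRITE)
            pg->state[i] = PAGE_READ_ONLY;
    pg->state[p] = PAGE_READ_ONLY;
}

/* Write: any state other than read-write faults; all other copies are invalidated. */
void dsm_write(dsm_page *pg, int p)
{
    if (pg->state[p] == PAGE_READ_WRITE)
        return;
    pg->faults++;
    for (int i = 0; i < NPROC; i++)
        if (i != p)
            pg->state[i] = PAGE_INVALID;
    pg->state[p] = PAGE_READ_WRITE;
}

/* Processor 0 writes and processor 1 reads the same page, alternately. */
int alternating_access_faults(int rounds)
{
    dsm_page pg;
    dsm_init(&pg);
    for (int r = 0; r < rounds; r++) {
        dsm_write(&pg, 0);
        dsm_read(&pg, 1);
    }
    return pg.faults;
}
```

In this model every round after the first costs two faults, one for the write and one for the read, even if the two processors touch different variables on the page.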

11
SC implementation
  • Synchronous read/write
  • Writes must be propagated before moving on to the
    next operation

12
Read-Write False Sharing
[Figure: two variables, x and y, sharing one page]
13
Read-Write False Sharing (Cont.)
[Figure: timeline of operations w(x), w(x), w(x), r(x), r(y), r(y)]
14
Read-Write False Sharing (Cont.)
[Figure: timeline of operations w(x), w(x), w(x), r(x), r(y), r(y), with a synch point]
15
Weak Consistency (WEAKC)
  • Data modifications are only propagated at the
    time of synchronization.
  • Works fine if the program is properly synchronized
    through system primitives.
  • All programs should be

16
Read-Write False Sharing (Before)
[Figure: timeline of operations w(x), w(x), w(x), r(x), r(y), r(y), with a synch point]
17
Read-Write False Sharing (WEAKC)
[Figure: timeline of operations w(x), w(x), r(y), r(y), r(x), with a synch point]
18
Write-Write False Sharing
[Figure: two variables, x and y, sharing one page]
19
Write-Write False Sharing
[Figure: timeline of operations w(x), w(x), w(x), r(x), w(y), w(y), with a synch point]
20
Write-Write False Sharing (WEAKC)
[Figure: timeline of operations w(x), w(x), w(x), w(y), r(x), w(y), with a synch point]
21
Multiple Writer (MW) Protocols
  • Allow multiple writers per page.
  • Modifications are merged at synchronization
    (according to the WEAKC definition).
  • Modifications are recorded through a mechanism
    called twinning and diffing.
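
Twinning and diffing can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation of any particular DSM system: a page is an array of PAGE_WORDS ints, a diff is a list of (index, value) pairs, and all names (page_diff, make_twin, make_diff, apply_diff, two_writer_merge) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_WORDS 8   /* assumed: a very small page, for illustration */

typedef struct { size_t index; int value; } diff_entry;
typedef struct { diff_entry entry[PAGE_WORDS]; size_t count; } page_diff;

/* Twinning: keep an unmodified copy of the page, made before the first write. */
void make_twin(const int *page, int *twin)
{
    memcpy(twin, page, PAGE_WORDS * sizeof *page);
}

/* Diffing: record every word that differs between the page and its twin. */
page_diff make_diff(const int *page, const int *twin)
{
    page_diff d;
    d.count = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            d.entry[d.count].index = i;
            d.entry[d.count].value = page[i];
            d.count++;
        }
    }
    return d;
}

/* Merging: apply the recorded words to another copy of the page. */
void apply_diff(int *page, const page_diff *d)
{
    for (size_t k = 0; k < d->count; k++)
        page[d->entry[k].index] = d->entry[k].value;
}

/* Two writers modify different words of the same page (x in word 0, y in
 * word 1); both diffs are merged into the caller's copy at synchronization. */
void two_writer_merge(int *page)
{
    int copy_a[PAGE_WORDS], copy_b[PAGE_WORDS];
    int twin_a[PAGE_WORDS], twin_b[PAGE_WORDS];

    make_twin(page, copy_a);    /* each writer gets its own working copy */
    make_twin(page, copy_b);
    make_twin(copy_a, twin_a);  /* and a twin of that copy */
    make_twin(copy_b, twin_b);

    copy_a[0] = 5;              /* writer A: w(x) */
    copy_b[1] = 7;              /* writer B: w(y) */

    page_diff da = make_diff(copy_a, twin_a);
    page_diff db = make_diff(copy_b, twin_b);
    apply_diff(page, &da);
    apply_diff(page, &db);
}
```

Since each diff holds only the words its writer changed, the two diffs do not conflict and neither writer's update is lost.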

22
Write-Write False Sharing and MW
[Figure: timeline of operations w(x), w(x), w(x), w(y), w(y), r(x), with a synch point]
23
Creating a diff (delta)
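[Figure: creating a diff (delta). The page is write-protected until the first w(x), when a twin is made and the page becomes writable; the diff is computed from the page and its twin, and the page is write-protected again.]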
24
Write-Write False Sharing and MW
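[Figure: write-write false sharing under a multiple-writer protocol. The timeline shows w(x), w(x), w(x) and w(y), w(y), a twin for each writer, a synch point, and r(x); the changes to x and y are both present afterwards.]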
25
Release Consistency (RC)
  • Distinguish acquires from releases
  • An ordinary read/write waits until the previous
    acquire has been performed
  • A release waits until the previous reads/writes have
    been performed
  • Acquires/releases are sequentially consistent
    w.r.t. one another

26
Eager and Lazy Release Consistency
  • Eager release consistency: transfer consistency
    information at the release of a lock.
  • Lazy release consistency: transfer consistency
    information at the acquire of a lock.
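
The difference in traffic can be illustrated with a toy count. This sketch rests on strong assumptions that are not in the slides: four processors that all cache the page, a lock passed along a chain of processors, one message per destination at an eager release, and one message per lazy acquire (carried with the lock grant). The names msg_count and count_messages are hypothetical.

```c
#define NPROC 4   /* assumed: four processors, all caching the page */

typedef struct { int eager; int lazy; } msg_count;

/* chain_len processors take the lock one after another; every processor but
 * the first acquires it, and every processor but the last releases it. */
msg_count count_messages(int chain_len)
{
    msg_count m = {0, 0};

    for (int k = 0; k < chain_len; k++) {
        if (k > 0)
            m.lazy += 1;            /* lazy: the acquirer alone receives the information */
        if (k < chain_len - 1)
            m.eager += NPROC - 1;   /* eager: the releaser informs all other processors */
    }
    return m;
}
```

For a chain of four processors this model gives 9 messages for eager release consistency and 3 for lazy release consistency.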

27
Eager Release Consistency
p1: w(x) rel
p2: acq w(x) rel
p3: acq w(x) rel
p4: acq r(x)
28
Lazy Release Consistency
p1: w(x) rel
p2: acq w(x) rel
p3: acq w(x) rel
p4: acq r(x)
29
Lazy Release Consistency
  • The acquiring processor determines which
    modifications it needs to see.

p1: w(x) rel
p2: acq w(y) rel
p3: acq r(x) r(y)
synch
30
Vector Timestamps
p1: w(x) rel (timestamp 0 0 0, then 1 0 0)
p2: acq w(y) rel (timestamp 0 0 0, then 1 1 0)
p3: acq r(x) r(y) (timestamp 0 0 0)
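
The timestamps in the figure suggest how an acquirer can work out what it is missing. The sketch below is an assumption-laden illustration, not a specific system's code: entry i of a vector counts the intervals of processor i that have been seen, an acquirer needs every interval the releaser has seen and it has not, and its own vector then becomes the element-wise maximum. The names vclock, missing_intervals and vclock_merge are hypothetical.

```c
#define NPROC 3   /* assumed: three processors, as in the figure */

typedef struct { int t[NPROC]; } vclock;

/* For each processor i, count the intervals the releaser has seen and the
 * acquirer has not. Fills missing[] and returns the total. */
int missing_intervals(const vclock *acq, const vclock *rel, int missing[NPROC])
{
    int total = 0;

    for (int i = 0; i < NPROC; i++) {
        missing[i] = (rel->t[i] > acq->t[i]) ? rel->t[i] - acq->t[i] : 0;
        total += missing[i];
    }
    return total;
}

/* After the acquire, the acquirer's vector is the element-wise maximum. */
void vclock_merge(vclock *acq, const vclock *rel)
{
    for (int i = 0; i < NPROC; i++)
        if (rel->t[i] > acq->t[i])
            acq->t[i] = rel->t[i];
}
```

With the releaser at 1 1 0 and the acquirer at 0 0 0, the acquirer is missing one interval from p1 (the write to x) and one from p2 (the write to y), and ends at 1 1 0 after the merge.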
31
DSM Summary
  • Relaxed consistency
  • Application's definition of correctness
  • >70% of the performance of corresponding message-passing
    applications