Shared Memory Consistency Models: A Tutorial (Adve & Gharachorloo)

1
Shared Memory Consistency Models: A Tutorial
Adve & Gharachorloo

Robert T. Bauer

2
Shared Memory

Shared memory: a single-address-space abstraction
in a multiprocessor environment.

3
Memory Model

Specifies how reads and writes appear to execute
May vary (and usually does) by level:
Programming language: can provide its own memory
model; for example, Java has one (JMM, JSR 133)
Processor
Memory subsystem

4
Definitions

Sequential (Processor)
Result of an execution is the same as if the
operations had been executed in the order
specified by the program.
Sequentially Consistent (Multiprocessor)
Result of any execution is the same as if the
operations of all the processors were executed in
some sequential order and the operations of each
individual processor appear in the sequence in
the order specified by the program.
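
The sequential-consistency definition above can be checked mechanically for a small program. The Python sketch below enumerates every interleaving of two processors' operations that preserves each processor's program order, and runs each one against a single shared memory. The tuple encoding of operations, the register names r1 and r2, and the helper names are assumptions made for this illustration; the two-flag program mirrors the one used on the write-buffer slides.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one total order of operations against one shared memory."""
    memory = {"flag1": 0, "flag2": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "write":
            memory[var] = arg          # arg is the value to store
        else:
            regs[arg] = memory[var]    # arg is the destination register
    return regs["r1"], regs["r2"]


# P1: flag1 = 1; r1 = flag2      P2: flag2 = 1; r2 = flag1
P1 = [("write", "flag1", 1), ("read", "flag2", "r1")]
P2 = [("write", "flag2", 1), ("read", "flag1", "r2")]

sc_outcomes = {run(s) for s in interleavings(P1, P2)}
print(sorted(sc_outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

In this model the outcome (0, 0), in which both processors would enter the critical section, never appears.
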

5
Uniprocessor
[Figure: a single processor issues memory operations, in program order (sequential), to memory.]
6
Multiprocessor
[Figure: two processors connected to one shared memory, labeled "Sequential Consistency".]
7
Relaxing Sequential Consistency

Program Order
A write followed by a read to a different location
can be reordered
A write followed by a write to a different location
can be reordered
A read followed by a write to (or read from) a
different location can be reordered
Write Atomicity
Another processor's write can be read before
that write is visible to all other processors
A processor's own write can be read before
that write is visible to other processors

8
Uniprocessor with Write Buffer
P1: flag1 = 1; if (flag2 == 0) critical section
P2: flag2 = 1; if (flag1 == 0) critical section
[Figure: a processor with a write buffer between it and memory.]
9
Multiprocessor with Write Buffer
P2: flag2 = 1; if (flag1 == 0) critical section
P1: flag1 = 1; if (flag2 == 0) critical section
[Figure: two processors, each with its own write buffer, in front of a shared memory.]
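
The effect of the write buffers can be reproduced with a small exhaustive simulation. The Python sketch below is a simplified model, not a description of any particular machine: each processor has a FIFO write buffer, a read bypasses buffered writes to other locations, and buffered writes drain to memory at arbitrary times. The operation encoding, the register names r1 and r2, and the function name buffered_outcomes are assumptions for this illustration.

```python
def buffered_outcomes(programs, buffered):
    """Collect the final (r1, r2) values over every possible schedule.

    programs holds one list per processor of ("write", var, value) or
    ("read", var, register) tuples. With buffered=True a write first enters
    that processor's FIFO write buffer and reaches memory at some later step.
    """
    results = set()

    def explore(pcs, buffers, memory, regs):
        moved = False
        for p, program in enumerate(programs):
            if pcs[p] < len(program):      # processor p runs its next op
                moved = True
                kind, var, arg = program[pcs[p]]
                next_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
                if kind == "write" and buffered:
                    queued = buffers[p] + ((var, arg),)
                    explore(next_pcs, buffers[:p] + (queued,) + buffers[p + 1:],
                            memory, regs)
                elif kind == "write":
                    explore(next_pcs, buffers, {**memory, var: arg}, regs)
                else:
                    # The read takes the newest buffered write to the same
                    # location if there is one, otherwise it reads memory.
                    own = [v for a, v in buffers[p] if a == var]
                    value = own[-1] if own else memory[var]
                    explore(next_pcs, buffers, memory, {**regs, arg: value})
            if buffers[p]:                 # oldest buffered write reaches memory
                moved = True
                (var, value), rest = buffers[p][0], buffers[p][1:]
                explore(pcs, buffers[:p] + (rest,) + buffers[p + 1:],
                        {**memory, var: value}, regs)
        if not moved:                      # all ops done, all buffers empty
            results.add((regs["r1"], regs["r2"]))

    n = len(programs)
    explore((0,) * n, ((),) * n, {"flag1": 0, "flag2": 0}, {})
    return results


P1 = [("write", "flag1", 1), ("read", "flag2", "r1")]
P2 = [("write", "flag2", 1), ("read", "flag1", "r2")]
print(sorted(buffered_outcomes([P1, P2], buffered=True)))
```

With buffering enabled the set of outcomes gains (0, 0): both reads can be served from memory while both writes still sit in their buffers, so both processors enter the critical section.
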
10
Memory Barrier

P1
flag1 = 1
mb()
if (flag2 == 0)
critical section

P2
flag2 = 1
mb()
if (flag1 == 0)
critical section

11
Effect of Memory Barrier
P1: flag1 = 1; mb(); if (flag2 == 0) critical section
P2: flag2 = 1; mb(); if (flag1 == 0) critical section
[Figure: two processors, each with its own write buffer, in front of a shared memory.]
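
A barrier can be added to a write-buffer model to see what it rules out. In the Python sketch below, mb() is modeled as an operation that cannot complete until the issuing processor's write buffer is empty; that reading of mb(), the operation encoding, and the function name outcomes_with_barriers are assumptions for this illustration.

```python
def outcomes_with_barriers(programs):
    """Final (r1, r2) values over all schedules of a write-buffer machine.

    Ops are ("write", var, value), ("read", var, register) or ("mb",).
    """
    results = set()

    def explore(pcs, buffers, memory, regs):
        moved = False
        for p, program in enumerate(programs):
            op = program[pcs[p]] if pcs[p] < len(program) else None
            # mb() is blocked while this processor still has buffered writes.
            if op is not None and not (op[0] == "mb" and buffers[p]):
                moved = True
                next_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
                if op[0] == "write":
                    queued = buffers[p] + ((op[1], op[2]),)
                    explore(next_pcs, buffers[:p] + (queued,) + buffers[p + 1:],
                            memory, regs)
                elif op[0] == "read":
                    own = [v for a, v in buffers[p] if a == op[1]]
                    value = own[-1] if own else memory[op[1]]
                    explore(next_pcs, buffers, memory, {**regs, op[2]: value})
                else:                      # mb() with an empty buffer
                    explore(next_pcs, buffers, memory, regs)
            if buffers[p]:                 # oldest buffered write reaches memory
                moved = True
                (var, value), rest = buffers[p][0], buffers[p][1:]
                explore(pcs, buffers[:p] + (rest,) + buffers[p + 1:],
                        {**memory, var: value}, regs)
        if not moved:
            results.add((regs["r1"], regs["r2"]))

    n = len(programs)
    explore((0,) * n, ((),) * n, {"flag1": 0, "flag2": 0}, {})
    return results


P1 = [("write", "flag1", 1), ("mb",), ("read", "flag2", "r1")]
P2 = [("write", "flag2", 1), ("mb",), ("read", "flag1", "r2")]
no_mb = [[op for op in prog if op[0] != "mb"] for prog in (P1, P2)]

print(sorted(outcomes_with_barriers([P1, P2])))  # barrier version
print(sorted(outcomes_with_barriers(no_mb)))     # same code without mb()
```

With the barriers in place, each read is issued only after the preceding flag write has reached memory, so the outcome (0, 0) disappears.
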
12
Write Through Memory Bus
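P1: data = 2000; head = 1
P2: while (head == 0)
[Figure: P1 and P2, each with a write-through cache, connected by an interconnect to memory holding head and data; arrows numbered 1 and 2 mark the order in which P1's writes reach memory.]
P2 sees the write to head before seeing the write to
data
Program order has been relaxed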
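
The consequence for P2 can be worked through by enumerating arrival orders. The Python sketch below assumes that P2 reads data at some point after its loop has observed head == 1, and that the interconnect may deliver P1's two writes to memory in either order; the function name data_values_seen and its parameter are hypothetical.

```python
from itertools import permutations


def data_values_seen(in_order):
    """Values P2 may read from data after its loop sees head == 1."""
    p1_writes = [("data", 2000), ("head", 1)]      # P1's program order
    arrivals = [tuple(p1_writes)] if in_order else permutations(p1_writes)
    seen = set()
    for order in arrivals:
        memory = {"data": 0, "head": 0}
        for location, value in order:
            memory[location] = value               # this write reaches memory
            if memory["head"] == 1:                # P2's loop can exit now
                seen.add(memory["data"])
    return seen


print(data_values_seen(in_order=True))
print(data_values_seen(in_order=False))
```

When arrival order matches program order, P2 can only read 2000. When the write to head overtakes the write to data, P2 can also read the old value 0.
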
13
Late Cache Invalidate Signal

P1's writes arrive at memory in order
The read of data occurs before the
cache-invalidate signal arrives at P2
P2 reads the new value of head
P2 reads the old value of data from its cache
ISSUE
Memory operations need to complete.
The cache-invalidate signal needs to propagate.
Write atomicity has been relaxed

[Figure: two processors with write-through caches on an interconnect; memory holds head and data; numbered steps 1 to 3 include an invalidate for the cached copy of data.]
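
The late-invalidate case can be sketched the same way. The Python model below assumes that P2 holds a stale cached copy of data, reads head directly from memory, and reads data once it has seen head == 1. The event names, the function name, and the wait_for_ack parameter (whether P1's write to head must wait until the invalidate has reached P2) are assumptions for this illustration.

```python
def data_values_with_late_invalidate(wait_for_ack):
    """Values P2 may read from data once it has read head == 1."""
    # P1's writes reach memory in program order. The invalidate for P2's
    # cached copy of data is sent with the data write but may arrive late.
    schedules = [["store data", "invalidate data", "store head"]]
    if not wait_for_ack:
        schedules.append(["store data", "store head", "invalidate data"])
    seen = set()
    for schedule in schedules:
        memory = {"data": 0, "head": 0}
        p2_cache = {"data": 0}                 # stale copy P2 loaded earlier
        for event in schedule:
            if event == "store data":
                memory["data"] = 2000
            elif event == "store head":
                memory["head"] = 1
            else:
                p2_cache.pop("data", None)     # invalidate reaches P2
            if memory["head"] == 1:            # P2's loop can exit here
                # A valid cached copy is used before going to memory.
                seen.add(p2_cache.get("data", memory["data"]))
    return seen


print(data_values_with_late_invalidate(wait_for_ack=True))
print(data_values_with_late_invalidate(wait_for_ack=False))
```

Even though both writes reach memory in order, P2 can still read 0 from its cache unless the write to data is treated as incomplete until the invalidate has arrived.
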
14
Fences
15
Relaxing Write to Read

A read may be reordered ahead of previous writes
IBM 370: prohibits a read from returning the value
of a write before the write is visible to all
processors.
TSO: a processor can read its own write early
It cannot read another processor's write early (the
write must be visible to all processors).
Our write-buffer example is similar in effect
IBM 370 has serialization instructions (so that the
writes propagate and the reads won't be
reordered)
TSO: a read won't be reordered if the instruction
is an RMW, so you can enforce order using a
read-modify-write instruction.

16
Relaxing Write to Read/Write

SPARC PSO
Writes to different locations can be pipelined or
overlapped, and can reach memory or caches out of order
PSO is identical to TSO for write atomicity: a
processor may read its own writes early
Processors cannot read other processors' writes
before they are globally visible
STBAR (store barrier) keeps writes from being
reordered
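
The write-to-write relaxation and the role of STBAR can be illustrated by listing the orders in which one processor's writes may reach memory. The Python sketch below is a simplified model: writes to different locations may drain in any order unless an STBAR separates them, and writes to the same location always stay in order. The encoding of a program as a list of (location, value) pairs and the function name drain_orders are assumptions for this illustration.

```python
from itertools import permutations

STBAR = "stbar"


def drain_orders(program):
    """Orders in which a processor's buffered writes may reach memory.

    program is a list of (location, value) writes, optionally separated
    by STBAR markers.
    """
    writes, epoch = [], 0
    for op in program:
        if op == STBAR:
            epoch += 1                     # later writes start a new epoch
        else:
            writes.append((epoch, op))

    def illegal(first, second):
        # `first` drains before `second`. That is only a problem when it was
        # issued later and either an STBAR or a shared location orders them.
        (e1, (loc1, _)), (e2, (loc2, _)) = writes[first], writes[second]
        return first > second and (e2 < e1 or loc1 == loc2)

    legal = set()
    for perm in permutations(range(len(writes))):
        pairs = ((perm[i], perm[j])
                 for i in range(len(perm)) for j in range(i + 1, len(perm)))
        if not any(illegal(a, b) for a, b in pairs):
            legal.add(tuple(writes[k][1] for k in perm))
    return legal


print(drain_orders([("data", 2000), ("head", 1)]))
print(drain_orders([("data", 2000), STBAR, ("head", 1)]))
```

Without the barrier, head may reach memory before data; with STBAR between the two writes, only the program-order drain remains.
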

17
Weak Ordering

Data operations (reads/writes)
Synchronization operations (fences/barriers)
The model allows:
Reordering of data operations between
synchronization operations
Each processor ensures that synchronization
instructions are not issued until all previous
operations (data and sync) are complete.
Ensures that writes always appear atomic, so no
fence is required to ensure write atomicity

18
Release Consistency

Acquire: a read operation that gains access
to a set of shared locations
Release: a write operation that grants
permission to access a set of shared
locations
Two flavors:
Maintain sequential consistency among special
operations
Maintain processor consistency among special
operations

19
Release Consistency

RCsc
acquire → all, all → release, special → special
If an acquire appears before any operation, program
order is enforced so that the acquire completes
before the following operations.
RCpc
acquire → all, all → release, special → special,
except for a special write followed by a special
read
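
The ordering rules on this slide can be written as a small predicate. The Python sketch below is a simplification: it treats acquire and release as the only special operations, considers operations to different locations only, and uses a hypothetical function name and string encoding.

```python
def must_stay_ordered(first, second, flavor):
    """Is program order enforced from `first` to a later `second`?

    Each operation is "ordinary", "acquire" (a special read) or "release"
    (a special write); flavor is "RCsc" or "RCpc".
    """
    special = {"acquire", "release"}
    if first == "acquire":                 # acquire -> all
        return True
    if second == "release":                # all -> release
        return True
    if first in special and second in special:     # special -> special
        # RCpc drops the order from a special write to a special read.
        return not (flavor == "RCpc"
                    and first == "release" and second == "acquire")
    return False


for flavor in ("RCsc", "RCpc"):
    print(flavor, must_stay_ordered("release", "acquire", flavor))
```

The two flavors differ only in the release-then-acquire case; ordinary operations may still move past a later acquire or an earlier release in both.
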

20
RCpc

Enforcing program order from a write to a following
read requires using RMW operations; if the write
being ordered is ordinary, then the write in the
RMW needs to be a release

21
Just to make it more complicated

Alpha
mb: enforces program order between any memory operations
wmb: enforces program order only among writes
RMO
Barriers take the form (LD|ST) → (LD|ST)
(LD|ST) → LD means that load and store operations
before the barrier must be completed before any
load operation after the barrier. Store
operations after the barrier may be reordered
before the barrier.
Power
SYNC: like Alpha's mb, except that when placed
between two reads to the same location, the
second read may go first.
Power allows writes to be seen early
RMW sequences are used to make writes appear
atomic
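
The RMO barrier described above can be pictured as a mask with one bit per ordered pair of operation types. The Python sketch below is an illustration only; the class Membar and the function barrier_orders are hypothetical names, and the four bits follow the load/store pairings listed on the slide.

```python
from enum import Flag, auto


class Membar(Flag):
    LOAD_LOAD = auto()      # earlier loads before later loads
    LOAD_STORE = auto()     # earlier loads before later stores
    STORE_LOAD = auto()     # earlier stores before later loads
    STORE_STORE = auto()    # earlier stores before later stores


_BIT = {
    ("LD", "LD"): Membar.LOAD_LOAD,
    ("LD", "ST"): Membar.LOAD_STORE,
    ("ST", "LD"): Membar.STORE_LOAD,
    ("ST", "ST"): Membar.STORE_STORE,
}


def barrier_orders(mask, before, after):
    """Does a barrier with this mask order `before` ahead of `after`?"""
    return bool(mask & _BIT[(before, after)])


# The slide's example: loads and stores before the barrier complete
# before any load after it.
mask = Membar.LOAD_LOAD | Membar.STORE_LOAD
print(barrier_orders(mask, "ST", "LD"))   # True
print(barrier_orders(mask, "LD", "ST"))   # False
```

With this mask, a store after the barrier is not ordered against earlier operations, which matches the slide's remark that such stores may be reordered before the barrier.
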

22
Discussion/Conclusion

System-centric: directly exposes ordering and
write-atomicity relaxations. Complicated,
difficult to port.
Programmer-centric: the programmer provides
information to determine what optimizations can
be performed (when reading/writing particular
variables). Compiler complexity increased.
Debugging more difficult.
Relaxed memory models have proven to be effective
in increasing performance; the cost of this
higher performance is greater complexity.