Title: Shared Memory Consistency Models: A Tutorial Adve
Shared Memory Consistency Models: A Tutorial
Adve and Gharachorloo
Shared Memory
- Shared memory: a single address space abstraction in a multiprocessor environment.
Memory Model
- Specifies how reads and writes appear to execute
- May vary (and usually does) by level:
  - Programming language: for example, Java has its own, the Java Memory Model (JMM), specified by JSR 133
  - Processor
  - Memory subsystem
Definitions
- Sequential (processor)
  - The result of an execution is the same as if the operations had been executed in the order specified by the program.
- Sequentially consistent (multiprocessor)
  - The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
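The multiprocessor definition can be checked by brute force on a small program. A sequentially consistent machine may only produce the result of some single interleaving that keeps each processor's own program order. The Python sketch below enumerates those interleavings for the two-flag program from the write-buffer slides: P1 writes flag1 then reads flag2, and P2 writes flag2 then reads flag1. The names `r1`, `r2`, `interleavings`, and `run_sc` are illustrative and do not come from the tutorial.

```python
from itertools import combinations

# Each thread is a list of operations: ("W", var, value) or ("R", var, register).
P1 = [("W", "flag1", 1), ("R", "flag2", "r1")]
P2 = [("W", "flag2", 1), ("R", "flag1", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for chosen in combinations(range(n), len(a)):
        slots = set(chosen)  # positions taken by a's operations
        merged, ia, ib = [], 0, 0
        for i in range(n):
            if i in slots:
                merged.append(a[ia])
                ia += 1
            else:
                merged.append(b[ib])
                ib += 1
        yield merged


def run_sc(order):
    """Execute one total order against a single shared memory."""
    memory = {"flag1": 0, "flag2": 0}
    regs = {}
    for kind, var, x in order:
        if kind == "W":
            memory[var] = x
        else:
            regs[x] = memory[var]
    return regs["r1"], regs["r2"]


sc_outcomes = {run_sc(order) for order in interleavings(P1, P2)}
print(sorted(sc_outcomes))
```

The result set excludes (0, 0). Under SC at least one processor sees the other's flag, and that is the guarantee the write-buffer slides show being lost.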
Uniprocessor
[Figure: a single processor issues memory operations to memory in program order, so execution is sequential]
Multiprocessor
[Figure: several processors share one memory; sequential consistency]
Relaxing Sequential Consistency
- Program order
  - A write followed by a read to a different location can be reordered
  - A write followed by a write to a different location can be reordered
  - A read followed by a write to (or a read from) a different location can be reordered
- Write atomicity
  - Another processor's write can be read before that write is visible to all processors
  - A processor's own writes can be read even though they are not yet visible to other processors
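One way to keep the relaxations straight is to record which of them each model permits. The Python sketch below covers only the models whose relaxations these slides spell out: IBM, TSO, and PSO, plus SC for reference. The enum and helper names are hypothetical.

```python
from enum import Enum


class Relaxation(Enum):
    """The relaxations of sequential consistency listed on this slide."""
    W_TO_R = "write followed by a read to a different location"
    W_TO_W = "write followed by a write to a different location"
    R_TO_RW = "read followed by a read or write to a different location"
    READ_OTHERS_WRITE_EARLY = "read another processor's write early"
    READ_OWN_WRITE_EARLY = "read own write early"


R = Relaxation

# Relaxations permitted by each model, as described in these slides.
MODELS = {
    "SC": frozenset(),
    "IBM": frozenset({R.W_TO_R}),
    "TSO": frozenset({R.W_TO_R, R.READ_OWN_WRITE_EARLY}),
    "PSO": frozenset({R.W_TO_R, R.W_TO_W, R.READ_OWN_WRITE_EARLY}),
}


def allows(model: str, relaxation: Relaxation) -> bool:
    """True if `model` permits `relaxation`."""
    return relaxation in MODELS[model]


def more_relaxed(a: str, b: str) -> bool:
    """True if model `a` permits every relaxation of `b` and at least one more."""
    return MODELS[a] > MODELS[b]


for name, relaxations in MODELS.items():
    print(name, sorted(r.name for r in relaxations))
```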
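Uniprocessor with Write Buffer
[Figure: a processor whose writes pass through a write buffer before reaching memory]
P1: flag1 = 1; if (flag2 == 0) critical section
P2: flag2 = 1; if (flag1 == 0) critical section
- Both processes run on the same processor, and its reads check the write buffer, so a process always sees the other's earlier write.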
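Multiprocessor with Write Buffer
[Figure: two processors, each with its own write buffer in front of a shared memory]
P1: flag1 = 1; if (flag2 == 0) critical section
P2: flag2 = 1; if (flag1 == 0) critical section
- Each write can wait in its own processor's buffer while the following read goes to memory. Both reads can then return 0, and both processors enter the critical section.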
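A tiny Python model makes the failure concrete. In this model, a processor's writes enter a FIFO write buffer and reach memory only when the buffer drains. A read checks the processor's own buffer first and otherwise goes to memory. The schedule below is one timing the hardware may choose, not the only one. The class and variable names are hypothetical.

```python
from collections import deque


class Processor:
    """A processor whose writes go into a FIFO write buffer before memory."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = deque()  # pending (var, value) writes, oldest first

    def write(self, var, value):
        self.buffer.append((var, value))  # not yet visible to other processors

    def read(self, var):
        # A processor sees its own newest buffered write to var, if any.
        for v, value in reversed(self.buffer):
            if v == var:
                return value
        return self.memory[var]

    def drain(self):
        while self.buffer:
            var, value = self.buffer.popleft()
            self.memory[var] = value


memory = {"flag1": 0, "flag2": 0}
p1, p2 = Processor(memory), Processor(memory)

# One possible timing: both writes are still buffered when the reads run.
p1.write("flag1", 1)
p2.write("flag2", 1)
p1_enters = p1.read("flag2") == 0
p2_enters = p2.read("flag1") == 0
p1.drain()
p2.drain()

print(p1_enters, p2_enters, memory)
```

Both processors enter the critical section, even though memory ends up with both flags set.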
Memory Barrier
- P1
  - flag1 = 1
  - mb()
  - if (flag2 == 0) critical section
- P2
  - flag2 = 1
  - mb()
  - if (flag1 == 0) critical section
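A single schedule cannot show that the barrier rules out the bad outcome under every timing. The Python sketch below explores all interleavings of instruction issue and buffer draining on a two-processor write-buffer machine, once without `mb()` and once with it. It assumes `mb()` simply stalls until the issuing processor's write buffer is empty. It records `r1` and `r2` (hypothetical names) as the values each processor reads for the other's flag.

```python
def explore(use_mb):
    """Return every (r1, r2) reachable on a two-processor write-buffer machine.

    r1 is the value P1 reads for flag2, r2 the value P2 reads for flag1.
    Every write in this program stores 1, so buffers only record the variable.
    """
    programs = [
        [("W", "flag1"), ("MB", None), ("R", "flag2")],
        [("W", "flag2"), ("MB", None), ("R", "flag1")],
    ]
    if not use_mb:
        programs = [[op for op in prog if op[0] != "MB"] for prog in programs]
    outcomes = set()

    def step(pcs, buffers, memory, regs):
        finished = True
        for i in (0, 1):
            # Choice A: processor i's buffer retires its oldest write to memory.
            if buffers[i]:
                finished = False
                new_buffers = list(buffers)
                new_buffers[i] = buffers[i][1:]
                step(pcs, tuple(new_buffers), {**memory, buffers[i][0]: 1}, regs)
            # Choice B: processor i issues its next instruction.
            if pcs[i] == len(programs[i]):
                continue
            kind, var = programs[i][pcs[i]]
            if kind == "MB" and buffers[i]:
                continue  # mb() stalls until this processor's buffer is empty
            finished = False
            new_pcs = list(pcs)
            new_pcs[i] += 1
            new_buffers, new_regs = list(buffers), dict(regs)
            if kind == "W":
                new_buffers[i] = buffers[i] + (var,)
            elif kind == "R":
                # A read sees the processor's own buffered write, else memory.
                new_regs[i] = 1 if var in buffers[i] else memory[var]
            step(tuple(new_pcs), tuple(new_buffers), memory, new_regs)
        if finished:
            outcomes.add((regs[0], regs[1]))

    step((0, 0), ((), ()), {"flag1": 0, "flag2": 0}, {})
    return outcomes


print("without mb():", sorted(explore(use_mb=False)))
print("with mb():   ", sorted(explore(use_mb=True)))
```

Without the barrier, (0, 0) is reachable. With it, a read can only run after that processor's own flag is in memory, so at least one processor sees a 1.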
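Effect of Memory Barrier
[Figure: two processors with write buffers; each mb() holds back the following read until that processor's buffered write has reached memory]
P1: flag1 = 1; mb(); if (flag2 == 0) critical section
P2: flag2 = 1; mb(); if (flag1 == 0) critical section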
Write-Through Memory Bus
[Figure: P1 and P2, each with a write-through cache, connected through an interconnect to memory holding head and data]
P1: data = 2000; head = 1;
P2: while (head == 0) {} then read data
- P2 sees the write to head before it sees the write to data
- Program order has been relaxed
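The head/data example can be explored the same way. In the Python sketch below, P1's writes to data and head travel independently and may reach memory in either order. P2 reads data only after its spin loop has seen head == 1. The `wait_for_ack` switch is a hypothetical stand-in for making P1 wait until the write to data has completed before issuing the write to head. The sketch also assumes data starts at 0.

```python
from itertools import permutations


def p2_data_values(wait_for_ack):
    """Values P2 can read for data once its spin loop has seen head == 1."""
    results = set()
    events = ("data reaches memory", "head reaches memory", "P2 reads data")
    for order in permutations(events):
        pos = {event: i for i, event in enumerate(order)}
        # P2 leaves its spin loop only after the new head is in memory.
        if pos["P2 reads data"] < pos["head reaches memory"]:
            continue
        # Preserving program order: head is issued only after data has completed.
        if wait_for_ack and pos["head reaches memory"] < pos["data reaches memory"]:
            continue
        data_is_new = pos["data reaches memory"] < pos["P2 reads data"]
        results.add(2000 if data_is_new else 0)
    return results


print("overlapped writes:", sorted(p2_data_values(wait_for_ack=False)))
print("waiting for completion:", sorted(p2_data_values(wait_for_ack=True)))
```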
Late Cache Invalidate Signal
- P1's writes arrive at memory in order
- P2 reads the new value of head
- P2's read of data occurs before the cache-invalidate signal for data arrives at P2, so P2 reads the old value of data from its cache
- Issue
  - Memory operations need to complete: the cache-invalidate signal must propagate before the next write proceeds
  - Write atomicity has been relaxed
[Figure: P1 and P2 with write-through caches on an interconnect; the invalidate for data reaches P2's cache only after P2 has read data]
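The late invalidation can be replayed with a write-through cache whose incoming invalidations are queued rather than applied at once. This Python sketch follows one fixed schedule that matches the slide:

- P2 already caches the old data.
- P1's write to data leaves an invalidate in flight.
- P2 reads head (a miss) and then data (a hit).

The `wait_for_ack` switch is a hypothetical stand-in for P1 waiting until that invalidate is acknowledged before writing head. The sketch assumes initial values of 0.

```python
class Cache:
    """Write-through cache whose incoming invalidations are queued, not instant."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.pending_invalidates = []

    def read(self, var):
        if var not in self.lines:  # miss: fetch the current value from memory
            self.lines[var] = self.memory[var]
        return self.lines[var]

    def deliver_invalidates(self):
        for var in self.pending_invalidates:
            self.lines.pop(var, None)
        self.pending_invalidates.clear()


def replay(wait_for_ack):
    memory = {"head": 0, "data": 0}
    p2_cache = Cache(memory)
    p2_cache.read("data")  # P2 already holds an old copy of data

    # P1 writes data through to memory; the invalidate for P2 is in flight.
    memory["data"] = 2000
    p2_cache.pending_invalidates.append("data")
    if wait_for_ack:
        p2_cache.deliver_invalidates()  # P1 waits until the invalidate is acknowledged
    # P1 writes head; P2 holds no copy of head, so no invalidate is needed.
    memory["head"] = 1

    seen_head = p2_cache.read("head")  # miss: P2's spin loop sees the new head
    seen_data = p2_cache.read("data")  # may hit the stale copy
    p2_cache.deliver_invalidates()  # a late invalidate finally arrives
    return seen_head, seen_data


print("late invalidate:", replay(wait_for_ack=False))
print("wait for ack:", replay(wait_for_ack=True))
```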
Fences
Relaxing Write to Read
- A read may be reordered with respect to earlier writes (to different locations)
- IBM prohibits a read from returning the value of a write before the write is visible to all processors
- TSO lets a processor read its own write early
  - It cannot read another processor's write early: that write must first be visible to all processors
- Our write buffer example is similar in effect
- IBM provides serialization instructions, so that writes propagate and reads are not reordered
- In TSO, a write followed by a read is not reordered when a read-modify-write (RMW) instruction is used in place of one of them, so order can be enforced with an RMW
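One write-buffered processor model in Python can illustrate three points. It shows a processor reading its own write early (TSO), and a read waiting until the write is visible to all (IBM). It also shows an RMW restoring write-to-read order. There are two simplifying assumptions:

- The `forward=False` mode stands in for the IBM rule by draining the buffer before such a read.
- `rmw` drains the processor's buffer, then reads and writes memory atomically.

The names are hypothetical, and each scenario is one fixed schedule.

```python
from collections import deque


class BufferedProcessor:
    """A processor with a FIFO write buffer in front of shared memory."""

    def __init__(self, memory, forward=True):
        self.memory = memory
        self.forward = forward  # True: TSO-style; False: IBM-style (assumption)
        self.buffer = deque()

    def write(self, var, value):
        self.buffer.append((var, value))

    def drain(self):
        while self.buffer:
            var, value = self.buffer.popleft()
            self.memory[var] = value

    def read(self, var):
        pending = [value for v, value in self.buffer if v == var]
        if pending:
            if self.forward:
                return pending[-1]  # own write returned before others can see it
            self.drain()  # wait until the write is visible to all processors
        return self.memory[var]

    def rmw(self, var, update):
        """Atomic read-modify-write; assumed to drain earlier writes first."""
        self.drain()
        old = self.memory[var]
        self.memory[var] = update(old)
        return old


# 1. Reading one's own write early (TSO) versus waiting for visibility (IBM).
mem = {"x": 0}
tso1, tso2 = BufferedProcessor(mem), BufferedProcessor(mem)
tso1.write("x", 1)
tso_seen = (tso1.read("x"), tso2.read("x"))  # writer sees 1, other still sees 0

mem = {"x": 0}
ibm1, ibm2 = BufferedProcessor(mem, forward=False), BufferedProcessor(mem, forward=False)
ibm1.write("x", 1)
ibm_seen = (ibm1.read("x"), ibm2.read("x"))  # the read first makes the write visible

# 2. Replacing each flag read with an RMW that writes back the value it read.
mem = {"flag1": 0, "flag2": 0}
p1, p2 = BufferedProcessor(mem), BufferedProcessor(mem)
p1.write("flag1", 1)
p2.write("flag2", 1)
p1_enters = p1.rmw("flag2", lambda v: v) == 0
p2_enters = p2.rmw("flag1", lambda v: v) == 0

print(tso_seen, ibm_seen, p1_enters, p2_enters)
```

With the RMWs in place, the schedule that let both processors into the critical section on the write-buffer slides now admits only P1.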
Relaxing Write to Read/Write
- SPARC PSO
  - Writes to different locations can be pipelined or overlapped, so they may reach memory or caches out of order
  - PSO is identical to TSO except for this write-to-write relaxation; as in TSO, a processor can read its own writes early
  - Processors cannot read other processors' writes before they are globally visible
  - STBAR (store barrier) prevents writes from being reordered across it
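PSO's extra relaxation breaks the head/data handoff even without caches. In this Python sketch, P1's program is data = 2000; head = 1, optionally with STBAR between the two writes. Its buffered writes may retire to memory in either order, and P2 reads head and then data in program order. The function name and the initial values of 0 are assumptions for illustration.

```python
from itertools import permutations

# P1's writes, in program order: data = 2000, then head = 1.
P1_WRITES = {"data": 2000, "head": 1}


def p2_outcomes(use_stbar):
    """Every (head, data) pair P2 can observe under a PSO-style write buffer."""
    events = ("retire data", "retire head", "read head", "read data")
    outcomes = set()
    for order in permutations(events):
        pos = {event: i for i, event in enumerate(order)}
        if pos["read head"] > pos["read data"]:
            continue  # P2's two reads stay in program order in this sketch
        if use_stbar and pos["retire head"] < pos["retire data"]:
            continue  # STBAR: stores before it reach memory before stores after it
        memory = {"data": 0, "head": 0}
        seen = {}
        for event in order:
            action, var = event.split()
            if action == "retire":
                memory[var] = P1_WRITES[var]
            else:
                seen[var] = memory[var]
        outcomes.add((seen["head"], seen["data"]))
    return outcomes


print("PSO:", sorted(p2_outcomes(use_stbar=False)))
print("PSO + STBAR:", sorted(p2_outcomes(use_stbar=True)))
```

The pair (1, 0), new head with old data, appears only without STBAR.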
Weak Ordering
- Operations are divided into
  - Data operations (reads/writes)
  - Synchronization operations (fences/barriers)
- The model allows
  - Reordering of operations between synchronization operations
- Each processor ensures that a synchronization instruction is not issued until all previous operations (data and synchronization) are complete, and that operations after it are not issued until it completes
- Writes always appear atomic, so no fence is required to ensure write atomicity
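The weak-ordering rule can be phrased as a check over a straight-line program. Two operations may be performed out of program order only when all of these hold:

- Both are data operations.
- No synchronization operation lies between them.
- They touch different locations. Same-location order is kept so each processor still sees its own program's semantics.

The Python sketch below lists those pairs. The `Op` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str  # e.g. "W x"
    kind: str  # "data" or "sync"
    loc: str   # memory location the operation touches


def reorderable_pairs(program):
    """Pairs of operations that weak ordering lets the hardware reorder."""
    pairs = []
    for i, a in enumerate(program):
        for j in range(i + 1, len(program)):
            b = program[j]
            span = program[i:j + 1]  # a, b, and everything between them
            if all(op.kind == "data" for op in span) and a.loc != b.loc:
                pairs.append((a.name, b.name))
    return pairs


program = [
    Op("W x", "data", "x"),
    Op("W y", "data", "y"),
    Op("R x", "data", "x"),
    Op("unlock", "sync", "s"),
    Op("R z", "data", "z"),
]
print(reorderable_pairs(program))
```

Nothing moves across `unlock`, and "W x" and "R x" keep their order because they share a location.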
Release Consistency
- Acquire: a read memory operation that gains access to a set of shared locations
- Release: a write operation that grants permission for accessing a set of shared locations
- Two flavors
  - RCsc: maintain sequential consistency among special operations
  - RCpc: maintain processor consistency among special operations
Release Consistency
- RCsc
  - acquire -> all, all -> release, special -> special
  - If an acquire precedes an operation in program order, the acquire completes before that operation.
- RCpc
  - acquire -> all, all -> release, special -> special, except for a special write followed by a special read
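The RCsc and RCpc rules can be written as one predicate over the kind of each operation. This Python sketch covers only ordinary operations, acquires (special reads), and releases (special writes). It assumes the two operations access different locations. The names are hypothetical.

```python
from enum import Enum
from itertools import product


class Kind(Enum):
    ORDINARY = "ordinary"
    ACQUIRE = "acquire"  # special read
    RELEASE = "release"  # special write


def rc_must_order(earlier: Kind, later: Kind, flavor: str) -> bool:
    """Whether release consistency keeps `earlier` before `later`.

    flavor is "sc" for RCsc or "pc" for RCpc.
    """
    if flavor not in ("sc", "pc"):
        raise ValueError(f"unknown flavor: {flavor!r}")
    if earlier is Kind.ACQUIRE:
        return True  # acquire -> all
    if later is Kind.RELEASE:
        return True  # all -> release
    if earlier is Kind.RELEASE and later is Kind.ACQUIRE:
        # special -> special, except that RCpc drops special write -> special read
        return flavor == "sc"
    return False  # ordinary operations may move past each other and these fences


for flavor in ("sc", "pc"):
    for a, b in product(Kind, repeat=2):
        print(f"RC{flavor}: {a.value:8} -> {b.value:8} ordered={rc_must_order(a, b, flavor)}")
```

A release followed by an acquire is the only pair on which the two flavors disagree.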
RCpc
- Enforcing program order from a write to a following read requires read-modify-write (RMW) operations; if the write being ordered is an ordinary write, the write in the RMW needs to be a release
Just to Make It More Complicated
- Alpha
  - mb: enforces program order between any memory operations
  - wmb: enforces program order only among writes
- SPARC RMO
  - MEMBAR can order any combination: (LD|ST) -> (LD|ST)
  - For example, LD|ST -> LD means that loads and stores before the barrier must complete before any load after the barrier; stores after the barrier may still be reordered before it
- Power
  - SYNC is like Alpha's mb, except that when it is placed between two reads to the same location, the second read may go first
  - Power allows writes to be seen early by other processors
  - RMW sequences are used to make writes appear atomic
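SPARC V9's MEMBAR takes a mask built from four ordering bits: #LoadLoad, #LoadStore, #StoreLoad, and #StoreStore. That is how RMO expresses any (LD|ST) -> (LD|ST) combination. The Python sketch below models such masks with `enum.Flag`. As a simplification, it also expresses Alpha's mb and wmb in the same terms. The Python names are hypothetical, and the model ignores same-location subtleties such as the Power SYNC case.

```python
from enum import Flag, auto


class Membar(Flag):
    LOAD_LOAD = auto()    # #LoadLoad
    LOAD_STORE = auto()   # #LoadStore
    STORE_LOAD = auto()   # #StoreLoad
    STORE_STORE = auto()  # #StoreStore


# Which mask bit covers an (operation before, operation after) pair.
PAIR = {
    ("LD", "LD"): Membar.LOAD_LOAD,
    ("LD", "ST"): Membar.LOAD_STORE,
    ("ST", "LD"): Membar.STORE_LOAD,
    ("ST", "ST"): Membar.STORE_STORE,
}


def ordered(mask: Membar, before: str, after: str) -> bool:
    """Does a barrier with this mask keep `before` ahead of `after`?"""
    return bool(mask & PAIR[(before, after)])


# The slide's LD|ST -> LD barrier: earlier loads and stores before later loads.
LD_ST_TO_LD = Membar.LOAD_LOAD | Membar.STORE_LOAD

# Alpha, expressed in the same terms: mb orders everything, wmb only stores.
ALPHA_MB = Membar.LOAD_LOAD | Membar.LOAD_STORE | Membar.STORE_LOAD | Membar.STORE_STORE
ALPHA_WMB = Membar.STORE_STORE

for before, after in PAIR:
    print(before, "->", after, ordered(LD_ST_TO_LD, before, after))
```

Under the LD|ST -> LD mask, the LD -> ST and ST -> ST pairs come out unordered. Those are the stores that may move above the barrier.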
Discussion/Conclusion
- System-centric models directly expose ordering and write atomicity relaxations: complicated and difficult to port
- Programmer-centric models: the programmer provides information (about reads and writes of particular variables) that determines which optimizations can be performed; compiler complexity increases and debugging becomes more difficult
- Relaxed memory models have proven effective at increasing performance; the cost of this higher performance is greater complexity