Shared Memory Consistency Models: A Tutorial Adve - PowerPoint PPT Presentation

1
Shared Memory Consistency Models: A Tutorial
Adve & Gharachorloo
  • Robert T. Bauer

2
Shared Memory
  • Shared memory: a single-address-space abstraction
    in a multiprocessor environment.

3
Memory Model
  • Specifies how reads and writes appear to execute
  • May vary (and usually does) by level:
  • Programming language, which can provide its own memory
    model; for example, Java has the JMM (JSR 133)
  • Processor
  • Memory subsystem

4
Definitions
  • Sequential (Processor)
  • Result of an execution is the same as if the
    operations had been executed in the order
    specified by the program.
  • Sequentially Consistent (Multiprocessor)
  • Result of any execution is the same as if the
    operations of all the processors were executed in
    some sequential order and the operations of each
    individual processor appear in the sequence in
    the order specified by the program.
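
The multiprocessor definition can be checked mechanically on a tiny program. The sketch below is illustrative only: the two-processor program (each processor sets its own flag, then reads the other's), the tuple encoding of operations, and the helper names `interleavings` and `run_sc` are assumptions made for this example. It enumerates every global order that preserves each processor's program order and runs it against a single memory.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's internal order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slots = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run_sc(schedule):
    """Execute one global order against a single shared memory."""
    memory = {"flag1": 0, "flag2": 0}
    reads = {}
    for proc, kind, loc, val in schedule:
        if kind == "W":
            memory[loc] = val
        else:  # "R": record the value this processor observes
            reads[proc] = memory[loc]
    return reads["P1"], reads["P2"]

# Hypothetical two-processor program: write own flag, then read the other's.
P1 = [("P1", "W", "flag1", 1), ("P1", "R", "flag2", None)]
P2 = [("P2", "W", "flag2", 1), ("P2", "R", "flag1", None)]

# Each outcome is (value P1 read from flag2, value P2 read from flag1).
sc_outcomes = {run_sc(s) for s in interleavings(P1, P2)}
print(sorted(sc_outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

Under sequential consistency the outcome (0, 0) never appears: at least one processor must observe the other's flag.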

5
Uniprocessor
[Figure: a processor issuing memory operations to memory in program order (sequential)]

6
Multiprocessor
[Figure: two processors sharing one memory, with sequential consistency between them]

7
Relaxing Sequential Consistency
  • Program order:
  • A write followed by a read to a different location
    can be reordered
  • A write followed by a write to a different location
    can be reordered
  • A read followed by a write to (or read from) a
    different location can be reordered
  • Write atomicity:
  • Another processor's write can be read even
    though the write is not yet visible to all other
    processors
  • A processor's own writes can be read even though
    the writes are not yet visible to other processors

8
Uniprocessor with Write Buffer
P1: flag1 = 1; if (flag2 == 0) critical section
P2: flag2 = 1; if (flag1 == 0) critical section
[Figure: processor connected to memory through a write buffer]

9
Multiprocessor with Write Buffer
P1: flag1 = 1; if (flag2 == 0) critical section
P2: flag2 = 1; if (flag1 == 0) critical section
[Figure: two processors, each connected to shared memory through its own write buffer]

10
Memory Barrier
  • P1:
  • flag1 = 1
  • mb()
  • if (flag2 == 0)
  • critical section
  • P2:
  • flag2 = 1
  • mb()
  • if (flag1 == 0)
  • critical section
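
The effect of the write buffers, and of mb(), can be explored with a toy model. The sketch below is an assumption-laden illustration, not the hardware's actual behavior: stores enter a per-processor FIFO buffer and reach memory at some later point, loads are served from the processor's own buffer or from memory, and `mb` is modeled as "wait until my buffer is empty". The names `outcomes` and `dekker` are hypothetical.

```python
def outcomes(programs, initial):
    """Collect the loaded values of every execution on a FIFO write-buffer machine."""
    results = set()

    def explore(pcs, buffers, memory, loads):
        finished = True
        for p, program in enumerate(programs):
            if buffers[p]:
                # The oldest buffered store of processor p reaches memory.
                finished = False
                loc, val = buffers[p][0]
                drained = buffers[:p] + (buffers[p][1:],) + buffers[p + 1:]
                explore(pcs, drained, {**memory, loc: val}, loads)
            if pcs[p] == len(program):
                continue
            finished = False
            kind, loc, val = program[pcs[p]]
            next_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            if kind == "store":
                # A store only enters the local buffer; memory is unchanged for now.
                queued = buffers[:p] + (buffers[p] + ((loc, val),),) + buffers[p + 1:]
                explore(next_pcs, queued, memory, loads)
            elif kind == "load":
                # Newest matching buffered store wins, otherwise read memory.
                pending = [v for l, v in buffers[p] if l == loc]
                value = pending[-1] if pending else memory[loc]
                explore(next_pcs, buffers, memory, loads + ((p, value),))
            elif kind == "mb" and not buffers[p]:
                # Modeled barrier: may complete only once the buffer is empty.
                explore(next_pcs, buffers, memory, loads)
        if finished:
            results.add(tuple(value for _, value in sorted(loads)))

    explore((0,) * len(programs), ((),) * len(programs), dict(initial), ())
    return results

def dekker(with_mb):
    """Run the flag1/flag2 program, optionally with mb() between store and load."""
    mb = [("mb", None, None)] if with_mb else []
    p1 = [("store", "flag1", 1)] + mb + [("load", "flag2", None)]
    p2 = [("store", "flag2", 1)] + mb + [("load", "flag1", None)]
    return outcomes([p1, p2], {"flag1": 0, "flag2": 0})

no_barrier = dekker(with_mb=False)
with_barrier = dekker(with_mb=True)
print((0, 0) in no_barrier, (0, 0) in with_barrier)  # True False
```

Each outcome is (value P1 loaded from flag2, value P2 loaded from flag1). The outcome (0, 0), where both processors enter the critical section, is reachable without the barrier and unreachable with it.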

11
Effect of Memory Barrier
P1: flag1 = 1; mb(); if (flag2 == 0) critical section
P2: flag2 = 1; mb(); if (flag1 == 0) critical section
[Figure: two processors, each connected to shared memory through its own write buffer]

12
Write Through Memory Bus
P1: data = 2000; head = 1
P2: while (head == 0);
[Figure: two processors, each with a write-through cache, connected by an interconnect to memory holding head and data]
P2 sees the write to head before seeing the write to data
Program order has been relaxed

13
Late Cache Invalidate Signal
  • P1's writes arrive in order at memory
  • The read from data occurs before the
    cache-invalidate signal arrives at P2
  • P2 reads the new value of head
  • P2 reads the old value of data from its cache
  • Issue:
  • Memory operations need to complete: the
    cache-invalidate signal needs to propagate
  • Write atomicity has been relaxed
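
A small deterministic model makes the stale read concrete. This is a sketch under stated assumptions: the class `WriteThroughSystem`, its methods, and the scenario function `message_passing` are hypothetical names; writes update memory immediately (write-through), while invalidations to other caches are delivered only when `deliver_invalidations` is called. P2 is assumed to hold an old cached copy of data and no copy of head.

```python
from collections import deque

class WriteThroughSystem:
    """Toy model: one shared memory, per-processor caches, delayed invalidations."""

    def __init__(self, n_procs, initial):
        self.memory = dict(initial)
        self.caches = [dict() for _ in range(n_procs)]
        self.pending = [deque() for _ in range(n_procs)]  # invalidations in flight

    def write(self, proc, loc, val):
        # Write-through: memory and the writer's cache are updated at once,
        # but other caches learn about it only when the invalidation arrives.
        self.memory[loc] = val
        self.caches[proc][loc] = val
        for other, queue in enumerate(self.pending):
            if other != proc:
                queue.append(loc)

    def deliver_invalidations(self, proc):
        # Drop every cache line for which an invalidation was in flight.
        while self.pending[proc]:
            self.caches[proc].pop(self.pending[proc].popleft(), None)

    def read(self, proc, loc):
        if loc not in self.caches[proc]:
            self.caches[proc][loc] = self.memory[loc]  # miss: fetch from memory
        return self.caches[proc][loc]

def message_passing(wait_for_completion):
    """Return the (head, data) values P2 observes after P1 publishes data."""
    P1, P2 = 0, 1
    system = WriteThroughSystem(2, {"head": 0, "data": 0})
    system.read(P2, "data")             # P2 holds an old copy of data
    system.write(P1, "data", 2000)
    if wait_for_completion:
        # Model P1 stalling until its write to data has fully propagated.
        system.deliver_invalidations(P2)
    system.write(P1, "head", 1)
    return system.read(P2, "head"), system.read(P2, "data")

print(message_passing(wait_for_completion=False))  # (1, 0)
print(message_passing(wait_for_completion=True))   # (1, 2000)
```

In the first run P2 sees the new head but the old data, even though memory received both writes in order. In the second run the write to data completes before head is written, and P2 sees both new values.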

[Figure: two processors with write-through caches on an interconnect to memory holding head and data; the invalidate signal for the cached copy of data arrives late]

14
Fences
15
Relaxing Write to Read
  • A read may be reordered with respect to previous writes
  • IBM 370 prohibits a read from returning the value of a
    write before the write is visible to all
    processors.
  • TSO: a processor can read its own write early
  • It cannot read another processor's write early (the
    write must be visible to all processors).
  • The write-buffer example is similar in effect
  • IBM 370 has serialization instructions (so that the
    writes propagate and the reads won't be
    reordered)
  • TSO: operations won't be reordered if the instruction
    is an RMW, so order can be enforced by using a
    read-modify-write instruction.

16
Relaxing Write to Read/Write
  • SPARC PSO
  • Writes to different locations can be pipelined or
    overlapped, and may reach memory or caches out of order
  • PSO is identical to TSO with respect to write
    atomicity: a processor may read its own writes early
  • Processors cannot read other processors' writes
    before they are globally visible
  • STBAR (store barrier) is provided so that writes
    can't get reordered
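
The write-to-write relaxation and the role of STBAR can be sketched with a toy store buffer. Everything here is an illustrative assumption, not SPARC's actual implementation: buffered stores to different locations may drain in any order, an `stbar` entry blocks later stores until all earlier ones have drained, and loads execute in program order. The names `drainable`, `outcomes`, and `message_passing` are hypothetical, and the data/head program is borrowed from the earlier write-through example.

```python
def drainable(buffer):
    """Indices of buffered entries allowed to leave the buffer next."""
    ready = []
    for i, entry in enumerate(buffer):
        earlier = buffer[:i]
        if entry[0] == "stbar":
            if not earlier:  # the barrier retires once everything before it drained
                ready.append(i)
        elif all(e[0] == "store" and e[1] != entry[1] for e in earlier):
            # A store may pass earlier stores to other locations, never an stbar.
            ready.append(i)
    return ready

def outcomes(programs, initial):
    """Collect the loaded values of every execution of the toy machine."""
    results = set()

    def explore(pcs, buffers, memory, loads):
        finished = True
        for p, program in enumerate(programs):
            for i in drainable(buffers[p]):
                finished = False
                entry = buffers[p][i]
                rest = buffers[p][:i] + buffers[p][i + 1:]
                after = {**memory, entry[1]: entry[2]} if entry[0] == "store" else memory
                explore(pcs, buffers[:p] + (rest,) + buffers[p + 1:], after, loads)
            if pcs[p] == len(program):
                continue
            finished = False
            instr = program[pcs[p]]
            next_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            if instr[0] == "load":
                pending = [e[2] for e in buffers[p] if e[0] == "store" and e[1] == instr[1]]
                value = pending[-1] if pending else memory[instr[1]]
                explore(next_pcs, buffers, memory, loads + (value,))
            else:
                # Stores and stbar markers are both queued in the local buffer.
                queued = buffers[p] + (instr,)
                explore(next_pcs, buffers[:p] + (queued,) + buffers[p + 1:], memory, loads)
        if finished:
            results.add(loads)

    explore((0,) * len(programs), ((),) * len(programs), dict(initial), ())
    return results

def message_passing(with_stbar):
    """P1 publishes data then head; P2 loads head then data."""
    barrier = [("stbar",)] if with_stbar else []
    p1 = [("store", "data", 2000)] + barrier + [("store", "head", 1)]
    p2 = [("load", "head"), ("load", "data")]
    return outcomes([p1, p2], {"data": 0, "head": 0})

print((1, 0) in message_passing(False), (1, 0) in message_passing(True))  # True False
```

Each outcome is the pair (head, data) as loaded by P2. Seeing the new head with the old data, (1, 0), is possible only when the two stores are not separated by the barrier.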

17
Weak Ordering
  • Data operations (reads/writes)
  • Synchronization operations (fences/barriers)
  • The model allows:
  • Reordering of operations between synchronization
    operations
  • Each processor ensures that synchronization
    instructions are not issued until all previous
    operations (data and sync) are complete.
  • Ensures that writes always appear atomic, so no
    fence is required to ensure write atomicity

18
Release Consistency
  • Acquire: a read memory operation that gains access
    to a set of shared locations
  • Release: a write operation that grants
    permission for accessing a set of shared
    locations
  • Two flavors:
  • Maintain sequential consistency among "special"
    operations
  • Maintain processor consistency among "special"
    operations

19
Release Consistency
  • RCsc:
  • acquire -> all, all -> release, special -> special
  • If an acquire appears before any operation, program
    order is enforced so that the acquire completes
    before the following operations.
  • RCpc:
  • acquire -> all, all -> release, special -> special,
    except for a special write followed by a special
    read
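
The ordering rules for the two flavors can be written as a small predicate. This is a simplified sketch: it treats acquire as the only special read and release as the only special write, ignores other special operations and same-location dependences, and uses the hypothetical name `must_stay_ordered` and hypothetical operation labels.

```python
SPECIAL = {"acquire", "release"}  # acquire = special read, release = special write

def must_stay_ordered(first, second, flavor="RCsc"):
    """True if program order from `first` to a later `second` is enforced.

    Operations are "read", "write", "acquire", or "release";
    flavor is "RCsc" or "RCpc".
    """
    if flavor not in ("RCsc", "RCpc"):
        raise ValueError("flavor must be 'RCsc' or 'RCpc'")
    if first == "acquire":   # acquire -> all
        return True
    if second == "release":  # all -> release
        return True
    if first in SPECIAL and second in SPECIAL:
        # Only release -> acquire is left here: a special write followed by
        # a special read, which RCpc exempts from special -> special.
        return flavor == "RCsc"
    return False

print(must_stay_ordered("release", "acquire", "RCsc"))  # True
print(must_stay_ordered("release", "acquire", "RCpc"))  # False
print(must_stay_ordered("write", "read"))               # False
```

The only pair on which the two flavors differ in this sketch is a release followed by an acquire.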

20
RC - PC
  • Enforcing program order for a read following a write
    requires using RMW operations; if the write being
    ordered is ordinary, then the write in the RMW needs
    to be a release

21
Just to make it more complicated
  • Alpha
  • mb: enforces program order between any memory operations
  • wmb: enforces program order only among writes
  • RMO
  • (LD ST) (LD ST)
  • LDSTLD means that load and store operations
    before the barrier must be completed before any
    load operation after the barrier. Store
    operations after the barrier may be reordered
    before the barrier.
  • Power
  • SYNC: like Alpha's mb, except that when placed
    between two reads to the same location, the
    second read may go first.
  • Power allows writes to be seen early
  • RMW sequences are used to make writes appear
    atomic

22
Discussion/Conclusion
  • System-centric: directly exposes ordering and
    write-atomicity relaxations. Complicated and
    difficult to port.
  • Programmer-centric: the programmer provides
    information that determines which optimizations can
    be performed (when reading/writing particular
    variables). Compiler complexity increases and
    debugging is more difficult.
  • Relaxed memory models have proven effective
    in increasing performance; the cost of this
    higher performance is greater complexity.