Cache Coherence Protocols in Shared Memory Multiprocessors

Provided by: cmpeBoun
Title: Cache Coherence Protocols in Shared Memory Multiprocessors

1
Cache Coherence Protocols in Shared Memory
Multiprocessors

Mehmet Senvar

2
Outline

Introduction
Background Information
The cache coherence problem
Cache Enforcement Strategies
Consistency models
Simple Solutions
Hardware Protocols
Snooping protocols
Directory-based protocols
Compiler and Software protocols
Future work and conclusions

3
The Cache Coherence Problem

Caches allow greater performance by storing
frequently used data in faster memory
Since all processors share the same address
space, it is possible for more than one processor
to cache an address (or data item) at a time
If one processor updates the data item without
informing the other processor, inconsistencies
may result and cause incorrect executions
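
A minimal sketch of the problem, assuming two private caches in front of one shared memory and no coherence mechanism at all. The class name `Cache` and the address `"x"` are illustrative and not from the slides.

```python
# Shared memory and two private caches with NO coherence mechanism.
memory = {"x": 0}

class Cache:
    def __init__(self):
        self.lines = {}                      # addr -> locally cached value

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from shared memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]              # hit: return the local copy

    def write(self, addr, value):
        self.lines[addr] = value             # update the local copy
        memory[addr] = value                 # write through to memory

p0, p1 = Cache(), Cache()
p0.read("x")                                 # both processors cache x = 0
p1.read("x")
p0.write("x", 42)                            # P0 updates x; P1 is not informed
stale = p1.read("x")                         # P1 hits its old copy
print(stale, memory["x"])                    # the two values disagree
```

P1 keeps reading its old copy even though memory already holds the new value, which is the inconsistency the slide describes.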

4
Cache Coherence Problem
5
Cache Coherence (cont.)

For correct execution, coherence must be enforced
between the caches
Two major factors are
performance
implementation cost
Four primary design issues are
coherence detection strategy
coherence enforcement strategy
precision of block-sharing information
cache block size

6
Cache Enforcement Strategies

A cache enforcement strategy is the mechanism
which makes caches consistent
write-update (WU)
write-invalidate (WI)
hybrid protocols, competitive-update (CU)
Performance of WU and WI varies depending on the
application and the number of writes
Hybrid protocols switch between WU and WI based
on the number of writes to a block
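
One way a competitive-update policy could be sketched: each cached copy counts remote writes it has received since its last local access and gives up (invalidates itself) once that count passes a threshold. The class name `Copy` and the threshold of 2 are assumptions made for illustration; the slides do not specify them.

```python
class Copy:
    """One cache's copy of a block under a competitive-update policy."""

    def __init__(self, value, threshold=2):
        self.value = value
        self.valid = True
        self.threshold = threshold   # assumed value, tuned per system
        self.unused_updates = 0      # remote writes since last local access

    def local_read(self):
        self.unused_updates = 0      # the copy is in use, so keep updating it
        return self.value if self.valid else None

    def remote_write(self, value):
        if not self.valid:
            return                   # nothing to keep coherent
        self.unused_updates += 1
        if self.unused_updates > self.threshold:
            self.valid = False       # too many unused updates: act like WI
        else:
            self.value = value       # still worth updating: act like WU
```

A copy that is read between remote writes keeps receiving updates, while a copy that is never read is eventually invalidated and stops generating update traffic.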

7
Consistency Models

A consistency model defines how the consistency
of data values is maintained
Some consistency models are
sequential consistency
weak consistency
release consistency
Weak consistency models are more efficient to
implement and require fewer coherence messages

8
Shared Caches (1)

Processors share a single cache, essentially
punting the problem
Useful for very small machines
E.g., DPC in the Encore, Alliant FX/8
Problems are limited cache bandwidth and cache
interference
Benefits are fine-grain sharing and prefetch
effects

9
Non-cacheable Items (2)

Make shared data non-cacheable
One of the simplest software solutions
Can also be done in hardware by making cache
locations unreachable

10
Broadcast Writes (3)

Every cache write request is sent to all other
caches
First, it must be determined whether each cache
holds this data
Other copies are either updated or invalidated
Significant additional memory transactions occur

11
Hardware Protocols

Snoop Bus Mechanism
Directory Based Methods
Full Directory
Limited Directory
Chained Directory

12
Snoop Bus Protocol

Snooping protocols rely on a shared bus between
the processors for coherence
On a processor write, the write is passed through
the cache to main memory on the bus
Any processor caching the address may update or
invalidate its cache entry as appropriate
Snooping protocols do not scale well beyond 32
processors because of the shared bus
The choice between WU, WI, and CU is especially
important to reduce communication
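
A minimal sketch of a write-through snooping scheme, assuming a single bus object that every cache registers with. The names `Bus`, `SnoopingCache`, and the `policy` parameter are illustrative, and real snooping hardware works on cache lines and bus transactions rather than Python dictionaries.

```python
class Bus:
    """Shared bus: every write is visible to all attached caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def broadcast_write(self, source, addr, value):
        self.memory[addr] = value                # write through to memory
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr, value)   # the others observe the write

class SnoopingCache:
    def __init__(self, bus, policy="invalidate"):
        self.bus = bus
        self.policy = policy                     # "invalidate" (WI) or "update" (WU)
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:               # miss: fetch over the bus
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop_write(self, addr, value):
        if addr in self.lines:                   # only act if we cache the address
            if self.policy == "update":
                self.lines[addr] = value
            else:
                del self.lines[addr]
```

Every write costs one bus transaction seen by all caches, which is why the shared bus becomes the bottleneck as the processor count grows.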

13
MESI (4-state) Invalidation Protocol

Each line in the cache can be in one of 4 states
Modified (exclusive): only in one cache, modified
Exclusive (unmodified): only in one cache,
unmodified
Shared (unmodified)
Invalid
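
The four states can be sketched as a small transition function. This follows the commonly described MESI transitions and may differ in detail from the diagram on the next slide, which did not survive in this transcript. The event names and the `others_have_copy` flag are assumptions for illustration, and bus actions such as write-backs are omitted.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

def next_state(state, event, others_have_copy=False):
    """Return the next state of one cache line for one event."""
    if event == "local_read":
        if state is State.I:
            # On a read miss the line is loaded Shared if another cache
            # holds it, otherwise Exclusive.
            return State.S if others_have_copy else State.E
        return state                 # read hits do not change the state
    if event == "local_write":
        return State.M               # other copies must be invalidated first
    if event == "remote_read":
        # A modified line is written back; any valid line becomes Shared.
        return State.I if state is State.I else State.S
    if event == "remote_write":
        return State.I               # another cache takes ownership
    raise ValueError(f"unknown event: {event}")
```

For example, a line read by one processor only starts in E, moves to M on a local write without further sharing, and drops to S when another processor reads it.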

14
MESI State Transition Diagram
15
MESI Example
16
Directory-Based Protocols

Directory-based protocols do not rely on a shared
bus to exchange coherence information (use
point-to-point connections)
more scalable (can have hundreds of processors)
each processor can have its own memory
implement weak consistency for efficiency

17
Directory-Based Protocols (cont.)

Each node maintains a directory storing cache
information and memory information
A processor communicates with the directory to
access memory
If a processor requests a non-local memory page,
the directory uses its information to find the
page
Then, it uses messages to retrieve the page and
ensure all other processors have consistent info.
Since the directory maintains which processors
are caching the page, it only needs to send
messages to those processors
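
A minimal sketch of the idea that the directory only contacts the processors that actually cache a block. The class name `Directory`, the message log, and the use of invalidation on writes are assumptions for illustration; a real directory exchanges network messages and tracks more state.

```python
class Directory:
    """Tracks, per block, which nodes hold a cached copy."""

    def __init__(self):
        self.memory = {}
        self.sharers = {}      # block -> set of node ids caching it
        self.messages = []     # log of (kind, node, block) messages sent

    def read(self, node, block):
        self.sharers.setdefault(block, set()).add(node)
        return self.memory.get(block, 0)

    def write(self, node, block, value):
        # Point-to-point invalidations go only to the recorded sharers.
        for other in sorted(self.sharers.get(block, set()) - {node}):
            self.messages.append(("invalidate", other, block))
        self.sharers[block] = {node}     # the writer is now the only holder
        self.memory[block] = value
```

With nodes 0 and 2 caching block "A" and node 1 caching block "B", a write to "A" by node 0 produces a single invalidation to node 2 and leaves node 1 undisturbed.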

18
Directory-Based Protocols (cont.)

Designing a directory requires defining
cache block granularity
cache controller design
directory structure
Cache block granularity is the size of the cache
and the size of a cache line
CC-NUMA machines have a separate, smaller cache
from main memory
COMA machines use a node's entire memory as cache
for remote pages
Block size affects performance (false sharing)
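
A short sketch of why block size matters for false sharing: two different addresses that fall in the same cache block are kept coherent as one unit, even if the processors never touch each other's data. The function names and the example addresses are illustrative.

```python
def block_of(addr, block_size):
    """Index of the cache block that contains a byte address."""
    return addr // block_size

def falsely_shared(addr_a, addr_b, block_size):
    """True if two distinct addresses land in the same cache block."""
    return addr_a != addr_b and block_of(addr_a, block_size) == block_of(addr_b, block_size)

# Two variables 32 bytes apart, written by different processors.
print(falsely_shared(0x100, 0x120, block_size=64))   # same 64-byte block
print(falsely_shared(0x100, 0x120, block_size=32))   # separate 32-byte blocks
```

Larger blocks make it more likely that unrelated variables share a block, so writes by one processor trigger coherence actions on data another processor is using.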

19
Directory-Based Protocols (cont.)

Cache controller is hardware that maintains the
directory and processes memory requests
custom hardware
programmable protocol processor
The directory structure is how the cache and
memory information is organized
(p+1)-bit full directory
linked-list directories
tagged directories

20
Directory Models

Full Directory
Link to all caches for all shared locations
Limited Directory
To some caches having shared data, n < N
Chained (linked) Directory
To one cache, from this cache to others,
single/double link
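
A rough comparison of directory storage per memory block for the three models. The accounting is a common textbook simplification and an assumption here, not taken from the slides: one presence bit per cache plus a dirty bit for the full directory, n pointers of log2(N) bits plus a dirty bit for the limited directory, and a single head pointer plus a dirty bit for the chained directory (the chain links stored in the caches themselves are not counted).

```python
def pointer_bits(num_caches):
    """Bits needed to name one of num_caches caches."""
    return (num_caches - 1).bit_length()

def full_directory_bits(num_caches):
    return num_caches + 1                        # presence bits + dirty bit

def limited_directory_bits(num_caches, num_pointers):
    return num_pointers * pointer_bits(num_caches) + 1

def chained_directory_bits(num_caches):
    return pointer_bits(num_caches) + 1          # head pointer + dirty bit

for n_caches in (16, 64, 1024):
    print(n_caches,
          full_directory_bits(n_caches),
          limited_directory_bits(n_caches, num_pointers=4),
          chained_directory_bits(n_caches))
```

The full directory grows linearly with the number of caches, while the limited and chained forms grow only with the pointer width, at the cost of overflow handling or longer invalidation chains.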

21
Directory Sample (full)
22
Lock-Based Protocols

New work that promises to be more scalable than
directory protocols
Implements scope consistency which is similar to
lazy release consistency
Coherence information exchanged by reading and
writing notices from the lock which protects the
shared memory
Currently implemented in software, similar to
DSM, but may move to hardware if performance
gains can be realized
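
A heavily simplified sketch of exchanging coherence information through the lock, assuming that a releasing node leaves write notices on the lock and an acquiring node invalidates the addresses named in them. The names `NoticeLock` and `Node` are hypothetical, mutual exclusion itself is not modeled, and actual lock-based protocols are considerably more involved.

```python
class NoticeLock:
    """A lock that also carries write notices for the data it protects."""

    def __init__(self):
        self.write_notices = set()   # addresses written under this lock

class Node:
    def __init__(self, memory):
        self.memory = memory         # shared backing store (a dict)
        self.cache = {}              # private copies, not coherent by themselves
        self.dirty = set()           # addresses written since the last release

    def acquire(self, lock):
        # Read the notices: drop every local copy that may be stale.
        for addr in lock.write_notices:
            self.cache.pop(addr, None)

    def release(self, lock):
        # Flush local writes, then leave notices for the next acquirer.
        for addr in self.dirty:
            self.memory[addr] = self.cache[addr]
        lock.write_notices |= self.dirty
        self.dirty = set()

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.memory.get(addr, 0)
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)
```

A node that has not acquired the lock may still see an old value; consistency is only promised to nodes that synchronize through the lock, which is what keeps the message count low.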

23
Software Protocols

Software protocols enforce consistency with
limited hardware support by relying either on the
compiler or specialized software handlers
Similar to distributed shared memory (DSM)
systems but at a lower level
sharing usually in blocks not pages
needs to be more efficient for better performance
architecture support for sharing

24
Classification of Software Protocols

Several criteria distinguish software protocols
dynamism - compile-time or run-time analysis
selectivity - level of coherence actions
restrictiveness - conservative or as-needed
consistency enforcement
adaptivity - whether the protocol can adapt to
access patterns
granularity - size and structure of coherence
data
blocking - program block on which coherence is
enforced
positioning - position of coherence instructions
updating - how memory is updated after a write
checking - how incoherence is detected

25
Software Coherence with Limited Hardware Support

Compiler must generate consistent code as no
hardware coherence is provided
Hardware maintains time tags which are updated on
every write
On a read, compiler generates coherence reads
which check time tags to ensure data is
consistent
Relies on the compiler to detect reads which may
be inconsistent, and the hardware must maintain
these time tags
Using tags, it is also possible to perform
dynamic self-invalidation of blocks
Many techniques based on using these time tags
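
One way to picture a tag-checked read, assuming the time tag is a per-address write counter and that a coherence read compares the tag stored with the cached copy against the current tag. The class and method names are hypothetical, and published time-tag schemes differ in where tags live and how they are compared.

```python
class TaggedMemory:
    def __init__(self):
        self.data = {}
        self.tag = {}                # addr -> time tag, bumped on every write

    def write(self, addr, value):
        self.data[addr] = value
        self.tag[addr] = self.tag.get(addr, 0) + 1

class TaggedCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}              # addr -> (value, tag at fetch time)

    def _fetch(self, addr):
        line = (self.memory.data.get(addr, 0), self.memory.tag.get(addr, 0))
        self.lines[addr] = line
        return line[0]

    def plain_read(self, addr):
        # Emitted where the compiler can prove the cached copy is safe.
        if addr in self.lines:
            return self.lines[addr][0]
        return self._fetch(addr)

    def coherence_read(self, addr):
        # Emitted where the cached copy might be stale: check the tag first.
        cached = self.lines.get(addr)
        if cached is not None and cached[1] == self.memory.tag.get(addr, 0):
            return cached[0]
        return self._fetch(addr)     # stale or missing: self-invalidate, refetch
```

The compiler's job in this picture is to choose between `plain_read` and `coherence_read` at each load, so that tag checks are paid only where another processor may have written.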

26
Software Coherence with Limited Hardware Support
(cont.)

For hardware without time tags, Petersen and Li
developed an algorithm which uses only page
translation hardware and page status tables
Sharing information is maintained by a software
handler at the page-level
On a page access or fault, the software handler
checks the sharing information, updates page
tables, and performs coherence actions
Slower than hardware as software handlers involve
the OS and are on the critical memory access path

27
Enforcing Coherence by Restricting Parallelism

Compilers can also guarantee coherence by
structuring the language to limit parallelism
easier to enforce coherence
limits the programmer and potential parallelism
simplifies compiler design
good performance can be achieved with no hardware
support
Parallel language restrictions include
doall parallel loops
master/slave processes

28
Optimizing Compilers

Optimizing compilers are designed to maintain
coherence with limited hardware support without
overly restricting the programmer
rely on detecting data dependencies
may use synchronization variables (locks,
barriers)
can provide the hardware with hints
can detect when coherence is not needed
may have problems with dynamic sharing
offer good performance, but are hard to design

29
Future Work

Hardware protocols are well defined, and the
directory structure is near optimal
Cost improvements can be obtained by mass
producing cache controller chips
Software protocols are a good area for future
research because they are also applicable at
higher-levels of sharing (DSM, databases, ...)
Optimizing compilers need to be improved to
detect data dependencies and optimize code for
the parallel environment

30
Conclusions

Hardware protocols offer the best performance but
require high hardware costs
Software protocols can be used when there is no
hardware support with a slight performance
penalty
Optimizing compilers can enforce coherence or
provide hints to the hardware
A combination of hardware and compiler
optimizations is the best
