Cache Coherence Protocols in Shared Memory Multiprocessors - PowerPoint PPT Presentation

Provided by: cmpeBoun

1
Cache Coherence Protocols in Shared Memory
Multiprocessors
  • Mehmet Senvar

2
Outline
  • Introduction
  • Background Information
  • The cache coherence problem
  • Cache Enforcement Strategies
  • Consistency models
  • Simple Solutions
  • Hardware Protocols
  • Snooping protocols
  • Directory-based protocols
  • Compiler and Software protocols
  • Future work and conclusions

3
The Cache Coherence Problem
  • Caches allow greater performance by storing
    frequently used data in faster memory
  • Since all processors share the same address
    space, it is possible for more than one processor
    to cache an address (or data item) at a time
  • If one processor updates the data item without
    informing the other processors, the caches become
    inconsistent and programs may execute incorrectly
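The problem can be reproduced with a toy model. This is a minimal sketch under stated assumptions: two private caches sit in front of one shared memory, and no coherence mechanism exists at all. `Memory` and `PrivateCache` are hypothetical names chosen for illustration.

```python
# Minimal sketch: two private caches with NO coherence mechanism.
# Memory and PrivateCache are hypothetical names used only for illustration.

class Memory:
    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.data[addr] = value


class PrivateCache:
    """Caches values locally and never informs other caches of its writes."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # addr -> cached value

    def read(self, addr):
        if addr not in self.lines:      # miss: fill from main memory
            self.lines[addr] = self.memory.read(addr)
        return self.lines[addr]         # hit: value may be stale

    def write(self, addr, value):
        self.lines[addr] = value        # write-back: memory and other caches untouched


mem = Memory()
mem.write(0x40, 1)
p0, p1 = PrivateCache(mem), PrivateCache(mem)
p0.read(0x40)          # both processors now cache X = 1
p1.read(0x40)
p0.write(0x40, 2)      # P0 updates X without informing P1
print(p1.read(0x40))   # prints 1: P1 still sees the old value
```

P1 keeps hitting on its own copy. Nothing in the model ever tells it that P0 changed X.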

4
Cache Coherence Problem
5
Cache Coherence (cont.)
  • For correct execution, coherence must be enforced
    between the caches
  • Two major factors in choosing a scheme are
  • performance
  • implementation cost
  • Four primary design issues are
  • coherence detection strategy
  • coherence enforcement strategy
  • precision of block-sharing information
  • cache block size

6
Cache Enforcement Strategies
  • A cache enforcement strategy is the mechanism
    that keeps the caches consistent
  • write-update (WU)
  • write-invalidate (WI)
  • hybrid protocols, e.g., competitive-update (CU)
  • The performance of WU and WI varies with the
    application and its pattern of writes
  • Hybrid protocols switch between WU and WI based
    on the number of writes to a block
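The hybrid idea can be sketched as follows. This is a minimal competitive-update model. It assumes that each cached copy counts the remote updates it has received since its own processor last used it, and that the copy invalidates itself once the count reaches a threshold. The `THRESHOLD` value and all names are hypothetical, and writes are assumed to be write-through for simplicity.

```python
THRESHOLD = 3  # hypothetical: remote updates tolerated before self-invalidating


class CUCache:
    def __init__(self):
        self.lines = {}   # addr -> value
        self.unused = {}  # addr -> remote updates since last local access

    def read(self, addr, memory):
        if addr not in self.lines:
            self.lines[addr] = memory[addr]
        self.unused[addr] = 0            # local use: the copy is still wanted
        return self.lines[addr]

    def on_remote_write(self, addr, value):
        self.unused[addr] += 1
        if self.unused[addr] >= THRESHOLD:
            del self.lines[addr]         # stop receiving updates (invalidate behaviour)
            del self.unused[addr]
        else:
            self.lines[addr] = value     # keep the copy fresh (update behaviour)


def write(writer, caches, memory, addr, value):
    """Perform a write and return the number of coherence messages sent."""
    memory[addr] = value                 # write-through, assumed for simplicity
    writer.lines[addr] = value
    writer.unused[addr] = 0
    messages = 0
    for c in caches:
        if c is not writer and addr in c.lines:
            c.on_remote_write(addr, value)
            messages += 1
    return messages


memory = {0: 0}
p0, p1 = CUCache(), CUCache()
p1.read(0, memory)                       # P1 caches block 0, then stops using it
sent = [write(p0, [p0, p1], memory, 0, v) for v in range(1, 6)]
print(sent)  # [1, 1, 1, 0, 0]
```

In this scenario pure WU would send five update messages and pure WI one invalidation. CU sends three, then stops. If P1 kept reading the block between writes, its counter would reset and it would keep receiving updates.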

7
Consistency Models
  • A consistency model defines how the consistency
    of data values is maintained
  • Some consistency models are
  • sequential consistency
  • weak consistency
  • release consistency
  • Weak consistency models can be implemented more
    efficiently and generally require fewer coherence
    messages
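What sequential consistency guarantees can be shown with the classic store-buffering litmus test. Each thread stores 1 to its own flag, then reads the other thread's flag. The sketch below enumerates every interleaving that preserves each thread's program order. The thread encoding is a hypothetical one chosen for illustration.

```python
from itertools import combinations

# Each thread is a list of (operation, variable, register) in program order.
T1 = [("store", "x", None), ("load", "y", "r1")]
T2 = [("store", "y", None), ("load", "x", "r2")]


def sc_outcomes(t1, t2):
    """Return every (r1, r2) result reachable under sequential consistency."""
    n = len(t1) + len(t2)
    outcomes = set()
    # Pick the global time slots taken by t1; each thread keeps program order.
    for slots in combinations(range(n), len(t1)):
        mem = {"x": 0, "y": 0}
        regs = {}
        i = j = 0
        for pos in range(n):
            if pos in slots:
                op, var, reg = t1[i]
                i += 1
            else:
                op, var, reg = t2[j]
                j += 1
            if op == "store":
                mem[var] = 1
            else:
                regs[reg] = mem[var]
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes


print(sorted(sc_outcomes(T1, T2)))  # [(0, 1), (1, 0), (1, 1)]
```

Under SC the result r1 = r2 = 0 is impossible. Weaker models that let a load bypass an earlier store, for example through a write buffer, can produce it. That relaxed ordering is the freedom that makes them cheaper to implement.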

8
Shared Caches (1)
  • Processors share a single cache, which sidesteps
    the coherence problem entirely
  • Useful for very small machines, e.g., the DPC in
    the Encore and the Alliant FX/8
  • Drawbacks: limited cache bandwidth and cache
    interference
  • Benefits: fine-grain sharing and prefetch effects

9
Non-cacheable Items (2)
  • Make shared data non-cacheable
  • One of the simplest software solutions
  • The same effect can be achieved in hardware by
    marking those locations as uncacheable

10
Broadcast Writes (3)
  • Every cache write request is sent to all other
    caches
  • Each cache must first determine whether it holds
    a copy of the data
  • Other copies are either updated or invalidated
  • This generates significant additional memory
    traffic

11
Hardware Protocols
  • Snoop Bus Mechanism
  • Directory Based Methods
  • Full Directory
  • Limited Directory
  • Chained Directory

12
Snoop Bus Protocol
  • Snooping protocols rely on a shared bus between
    the processors for coherence
  • On a processor write, the write is passed through
    the cache to main memory on the bus
  • Any processor caching the address may update or
    invalidate its cache entry as appropriate
  • Snooping protocols do not scale well beyond 32
    processors because the shared bus becomes a
    bottleneck
  • The choice between WU, WI, and CU is especially
    important for reducing bus traffic
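A write-through, write-invalidate snooping scheme can be modelled in a few lines. This is a minimal sketch: every write goes onto the shared bus, and every other cache snoops the bus and drops its copy. The `Bus` and `SnoopingCache` names, and the transaction counter, are hypothetical.

```python
class Bus:
    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.transactions = 0

    def write(self, source, addr, value):
        self.transactions += 1
        self.memory[addr] = value        # write-through to main memory
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)  # every cache observes every bus write

    def read(self, addr):
        self.transactions += 1
        return self.memory.get(addr, 0)


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:       # miss: fetch over the bus
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr):
        self.lines.pop(addr, None)       # invalidate the stale copy, if present


bus = Bus({0x40: 1})
p0, p1 = SnoopingCache(bus), SnoopingCache(bus)
p0.read(0x40)
p1.read(0x40)
p0.write(0x40, 2)
print(p1.read(0x40))  # 2: P1 missed and refetched the new value
```

Every write costs a bus transaction here. That cost is why write-back protocols such as MESI try to keep writes local whenever no other cache holds the line.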

13
MESI (4-state) Invalidation Protocol
  • Each line in the cache can be in one of 4 states
  • Modified: present only in this cache, and dirty
    (differs from main memory)
  • Exclusive: present only in this cache, and clean
  • Shared: may be present in several caches, clean
  • Invalid: the line holds no valid data
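The four states can be exercised with a small simulator. This sketch tracks one block across several caches on a bus. The transaction names (BusRd, BusRdX, BusUpgr, Flush) follow common textbook usage and are assumptions, not terms from this presentation. Data values and write-back timing are left out.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class MESIBus:
    """Tracks one cache block's MESI state in each of n caches on a shared bus."""

    def __init__(self, n):
        self.states = [State.I] * n
        self.log = []  # bus transactions, in order

    def _snoop(self, requester, invalidate):
        # Every other cache holding the block reacts to the bus transaction.
        for k, st in enumerate(self.states):
            if k == requester or st is State.I:
                continue
            if st is State.M:
                self.log.append(f"Flush by P{k}")  # dirty data written back / supplied
            self.states[k] = State.I if invalidate else State.S

    def read(self, p):
        if self.states[p] is State.I:  # read miss
            others = any(s is not State.I
                         for k, s in enumerate(self.states) if k != p)
            self.log.append(f"BusRd by P{p}")
            self._snoop(p, invalidate=False)
            self.states[p] = State.S if others else State.E
        # read hit in M, E or S: no bus traffic

    def write(self, p):
        if self.states[p] is State.I:      # write miss
            self.log.append(f"BusRdX by P{p}")
            self._snoop(p, invalidate=True)
        elif self.states[p] is State.S:    # other copies may exist
            self.log.append(f"BusUpgr by P{p}")
            self._snoop(p, invalidate=True)
        # E -> M needs no bus transaction; M stays M
        self.states[p] = State.M


bus = MESIBus(2)
bus.read(0)    # P0: I -> E (no other copy)
bus.read(1)    # P1: I -> S, P0: E -> S
bus.write(0)   # P0: S -> M, P1: S -> I
bus.read(1)    # P1 misses, P0 flushes; both end in S
print([s.name for s in bus.states])  # ['S', 'S']
```

The Exclusive state pays off in the silent E to M transition. A processor that reads and then writes private data never touches the bus for the write.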

14
MESI State Transition Diagram
15
MESI Example
16
Directory-Based Protocols
  • Directory-based protocols do not rely on a shared
    bus to exchange coherence information (use
    point-to-point connections)
  • more scalable (can have hundreds of processors)
  • each processor can have its own memory
  • implement weak consistency for efficiency

17
Directory-Based Protocols (cont.)
  • Each node maintains a directory storing cache
    information and memory information
  • A processor communicates with the directory to
    access memory
  • if a processor requests a non-local memory page,
    the directory uses its information to find the
    page
  • Then, it uses messages to retrieve the page and
    ensure all other processors have consistent info.
  • Since the directory maintains which processors
    are caching the page, it only needs to send
    messages to those processors
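The targeted messaging described above can be sketched with a full-map directory entry. Each entry holds one presence bit per processor plus a dirty bit. The class, method names, and message tuples below are hypothetical, and the model ignores network timing and races.

```python
class FullMapDirectory:
    """Home-node directory: a presence bit per processor plus a dirty bit per block."""

    def __init__(self, num_procs):
        self.num_procs = num_procs
        self.presence = {}   # block -> int bit vector; bit p set if P_p caches the block
        self.dirty = {}      # block -> True if one cache holds a modified copy
        self.messages = []   # point-to-point messages the directory sends

    def read_miss(self, p, block):
        bits = self.presence.get(block, 0)
        if self.dirty.get(block, False):
            owner = bits.bit_length() - 1        # a dirty block has exactly one sharer
            self.messages.append(("fetch", owner, block))
            self.dirty[block] = False
        self.presence[block] = bits | (1 << p)
        self.messages.append(("data", p, block))

    def write_miss(self, p, block):
        bits = self.presence.get(block, 0)
        for q in range(self.num_procs):
            if q != p and bits & (1 << q):       # only processors whose bit is set
                self.messages.append(("invalidate", q, block))
        self.presence[block] = 1 << p
        self.dirty[block] = True
        self.messages.append(("data", p, block))


d = FullMapDirectory(num_procs=8)
d.read_miss(1, block=5)
d.read_miss(3, block=5)
d.messages.clear()
d.write_miss(6, block=5)
print(d.messages)
# [('invalidate', 1, 5), ('invalidate', 3, 5), ('data', 6, 5)]
```

Only P1 and P3 receive invalidations. A snooping bus would have broadcast the write to all eight caches.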

18
Directory-Based Protocols (cont.)
  • Designing a directory requires defining
  • cache block granularity
  • cache controller design
  • directory structure
  • Cache block granularity is the size of the cache
    and the size of a cache line
  • CC-NUMA machines have a cache that is separate
    from, and smaller than, main memory
  • COMA machines use each node's entire memory as a
    cache for remote pages
  • Block size affects performance: larger blocks
    increase false sharing
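False sharing can be seen in a hypothetical trace. P0 repeatedly writes the 4-byte word at byte 0, and P1 repeatedly writes the word at byte 4. The two processors never touch the same data. This sketch assumes write-invalidate with a single valid copy per block and counts how often a write must invalidate the other processor's copy.

```python
def count_invalidations(block_size, rounds=100, word_size=4):
    """P0 writes word 0, P1 writes word 1, alternately; count invalidations."""
    holder = {}          # block -> processor holding the only valid copy
    invalidations = 0
    for _ in range(rounds):
        for p in (0, 1):
            addr = p * word_size             # P0 -> byte 0, P1 -> byte 4
            block = addr // block_size
            if holder.get(block, p) != p:    # other processor owns this block
                invalidations += 1           # its copy must be invalidated
            holder[block] = p
    return invalidations


for size in (4, 8, 64):
    print(size, count_invalidations(size))
# 4 0
# 8 199
# 64 199
```

With 4-byte blocks the words live in different blocks and no coherence traffic occurs. Once the block holds both words, every write after the first invalidates the other copy.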

19
Directory-Based Protocols (cont.)
  • Cache controller is hardware that maintains the
    directory and processes memory requests
  • custom hardware
  • programmable protocol processor
  • The directory structure is how the cache and
    memory information is organized
  • (p+1)-bit full directory
  • linked-list directories
  • tagged directories

20
Directory Models
  • Full Directory
  • Links to all caches for every shared location
  • Limited Directory
  • Pointers to only some of the caches sharing the
    data, n < N
  • Chained (linked) Directory
  • Points to one cache, which points to the next
    sharer, and so on (singly or doubly linked)
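A limited directory keeps at most n sharer pointers per block. The interesting case is pointer overflow. This sketch handles overflow by invalidating the oldest sharer, which is one possible policy. Other designs fall back to broadcasting invalidations instead. All names are hypothetical.

```python
from collections import deque


class LimitedDirectory:
    def __init__(self, n_pointers):
        self.n = n_pointers
        self.pointers = {}       # block -> deque of sharer ids, oldest first
        self.invalidations = []  # (processor, block) invalidation messages sent

    def read_miss(self, p, block):
        sharers = self.pointers.setdefault(block, deque())
        if p in sharers:
            return
        if len(sharers) == self.n:               # no free pointer left
            victim = sharers.popleft()           # evict the oldest sharer
            self.invalidations.append((victim, block))
        sharers.append(p)

    def write_miss(self, p, block):
        sharers = self.pointers.setdefault(block, deque())
        for q in sharers:
            if q != p:
                self.invalidations.append((q, block))
        self.pointers[block] = deque([p])


d = LimitedDirectory(n_pointers=2)
for p in (0, 1, 2):
    d.read_miss(p, block=7)      # the third reader overflows the two pointers
d.write_miss(3, block=7)
print(d.invalidations)  # [(0, 7), (1, 7), (2, 7)]
```

Storage drops from one bit per processor to n pointers of log2(N) bits each. The price is extra invalidations when more than n caches share a block.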

21
Directory Sample (full)
22
Lock-Based Protocols
  • New work that promises to be more scalable than
    directory protocols
  • Implements scope consistency, which is similar to
    lazy release consistency
  • Coherence information is exchanged by reading and
    writing notices attached to the lock that protects
    the shared data
  • Currently implemented in software, similar to
    DSM, but may move to hardware if performance
    gains can be realized
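The write-notice idea can be sketched as follows. This is a simplified illustration in the spirit of scope consistency, not a faithful model of any published system. A processor records the blocks it wrote inside a critical section. On release, it publishes them and attaches them to the lock. The next acquirer invalidates only those blocks. Mutual exclusion itself is not modelled, and all names are hypothetical.

```python
class Lock:
    def __init__(self):
        # Blocks written under this lock. Notices simply accumulate here;
        # a real system would prune them.
        self.write_notices = set()


class Processor:
    def __init__(self, memory):
        self.memory = memory
        self.cache = {}
        self.dirty = set()

    def acquire(self, lock):
        for block in lock.write_notices:
            self.cache.pop(block, None)  # invalidate only what the lock reports

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = self.memory[block]
        return self.cache[block]

    def write(self, block, value):
        self.cache[block] = value
        self.dirty.add(block)

    def release(self, lock):
        for block in self.dirty:
            self.memory[block] = self.cache[block]  # make the writes visible
        lock.write_notices |= self.dirty
        self.dirty.clear()


memory = {"x": 0, "y": 0}
lock = Lock()
a, b = Processor(memory), Processor(memory)
b.read("x")
b.read("y")              # B caches x and y
a.acquire(lock)
a.write("x", 42)
a.release(lock)
b.acquire(lock)          # B drops x (named in a write notice) but keeps y
print(b.read("x"))       # 42
```

No messages go to processors that never acquire the lock. That is the source of the scalability claim.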

23
Software Protocols
  • Software protocols enforce consistency with
    limited hardware support by relying either on the
    compiler or specialized software handlers
  • Similar to distributed shared memory (DSM)
    systems but at a lower level
  • sharing usually in blocks not pages
  • needs to be more efficient for better performance
  • architecture support for sharing

24
Classification of Software Protocols
  • Several criteria distinguish software protocols
  • dynamism - compile-time or run-time analysis
  • selectivity - level of coherence actions
  • restrictiveness - conservative or as-needed
    consistency enforcement
  • adaptivity - can protocol adapt to access
    patterns
  • granularity - size and structure of coherence
    data
  • blocking - program block on which coherence is
    enforced
  • positioning - position of coherence instructions
  • updating - how memory is updated after a write
  • checking - how incoherence is detected

25
Software Coherence with Limited Hardware Support
  • The compiler must generate coherent code, since
    the hardware provides no coherence
  • Hardware maintains time tags, which are updated on
    every write
  • On a read, the compiler generates coherence reads
    that check the time tags to ensure the data is
    consistent
  • Relies on the compiler to detect reads that may
    be inconsistent, and on the hardware to maintain
    these time tags
  • Using tags, it is also possible to perform
    dynamic self-invalidation of blocks
  • Many techniques are based on these time tags
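The time-tag check can be sketched as follows, under a loud simplifying assumption: a global write clock advances on every write, and each address's latest write tag is visible to the reader. How a real design makes that information available cheaply is left out. Each cache line stores the tag from when it was filled or written. A compiler-inserted coherence read hits only if that tag is not older than the last write. All names are hypothetical.

```python
class TimeTagSystem:
    """Shared state: main memory, a write clock, and each address's last-write tag."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.clock = 0
        self.last_write = {addr: 0 for addr in memory}


class TimeTagCache:
    def __init__(self, system):
        self.sys = system
        self.lines = {}  # addr -> (value, time tag)

    def write(self, addr, value):
        s = self.sys
        s.clock += 1                       # a new time tag for every write
        s.memory[addr] = value             # write-through to main memory
        s.last_write[addr] = s.clock
        self.lines[addr] = (value, s.clock)

    def coherence_read(self, addr):
        """Compiler-inserted read for data that may be stale."""
        s = self.sys
        entry = self.lines.get(addr)
        if entry is not None and entry[1] >= s.last_write[addr]:
            return entry[0], "hit"         # tag is not older than the last write
        value = s.memory[addr]             # stale or absent: refetch and re-tag
        self.lines[addr] = (value, s.clock)
        return value, "refetch"


system = TimeTagSystem({"x": 0})
p0, p1 = TimeTagCache(system), TimeTagCache(system)
print(p1.coherence_read("x"))  # (0, 'refetch')
print(p1.coherence_read("x"))  # (0, 'hit')
p0.write("x", 7)
print(p1.coherence_read("x"))  # (7, 'refetch')
```

Reads that the compiler can prove safe would bypass the check and use the cached copy directly.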

26
Software Coherence with Limited Hardware Support
(cont.)
  • If hardware has no time tags, Petersen and Li
    developed an algorithm which uses only page
    translation hardware and page status tables
  • Sharing information is maintained by a software
    handler at the page-level
  • On a page access or fault, the software handler
    checks the sharing information, updates page
    tables, and performs coherence actions
  • Slower than hardware as software handlers involve
    the OS and are on the critical memory access path
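Page-level software coherence can be sketched generically. This is a simplified single-writer invalidation scheme, not the specific algorithm of Petersen and Li. Per-processor protection maps stand in for the page translation hardware, and a shared page status table records which processors hold each page. Any access the protection forbids "traps" into a software handler, which performs the coherence action. All names are hypothetical.

```python
NONE, READ, WRITE = 0, 1, 2   # access rights, ordered by strength


class PageCoherence:
    def __init__(self, num_procs):
        self.prot = [dict() for _ in range(num_procs)]  # per processor: page -> right
        self.status = {}   # page status table: page -> processors with a valid copy
        self.faults = 0

    def access(self, p, page, is_write):
        needed = WRITE if is_write else READ
        if self.prot[p].get(page, NONE) < needed:
            self.handle_fault(p, page, is_write)  # software on the critical path

    def handle_fault(self, p, page, is_write):
        self.faults += 1
        holders = self.status.setdefault(page, set())
        if is_write:
            for q in holders - {p}:
                self.prot[q][page] = NONE         # invalidate all other copies
            holders.clear()
            self.prot[p][page] = WRITE
        else:
            for q in holders:
                if self.prot[q][page] == WRITE:
                    self.prot[q][page] = READ     # downgrade the current writer
            self.prot[p][page] = READ
        holders.add(p)


pc = PageCoherence(num_procs=2)
pc.access(0, 3, is_write=False)   # fault: P0 gets a read copy
pc.access(1, 3, is_write=False)   # fault: P1 gets a read copy
pc.access(0, 3, is_write=True)    # fault: P1 invalidated, P0 may write
pc.access(0, 3, is_write=True)    # no fault: P0 already has write access
pc.access(1, 3, is_write=False)   # fault: P0 downgraded, P1 reads again
print(pc.faults)  # 4
```

Each of those four faults would cost an OS trap in a real system. That cost is why the slide ranks this approach below hardware.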

27
Enforcing Coherence by Restricting Parallelism
  • Compilers can also guarantee coherence by
    structuring the language to limit parallelism
  • easier to enforce coherence
  • limits the programmer and potential parallelism
  • simplifies compiler design
  • good performance can be achieved with no hardware
    support
  • Parallel language restrictions include
  • doall parallel loops
  • master/slave processes

28
Optimizing Compilers
  • Optimizing compilers are designed to maintain
    coherence with limited hardware support without
    overly restricting the programmer
  • rely on detecting data dependencies
  • may use synchronization variables (locks,
    barriers)
  • can provide the hardware with hints
  • can detect when coherence is not needed
  • may have problems with dynamic sharing
  • offer good performance, but are hard to design

29
Future Work
  • Hardware protocols are well defined, and the
    directory structure is near optimal
  • Cost improvements can be obtained by mass
    producing cache controller chips
  • Software protocols are a good area for future
    research because they are also applicable at
    higher levels of sharing (DSM, databases, ...)
  • Optimizing compilers need to be improved to
    detect data dependencies and optimize code for
    the parallel environment

30
Conclusions
  • Hardware protocols offer the best performance but
    require high hardware costs
  • Software protocols can be used when there is no
    hardware support, at a slight performance
    penalty
  • Optimizing compilers can enforce coherence or
    provide hints to the hardware
  • A combination of hardware and compiler
    optimizations gives the best results