Cache Coherence Schemes for Multiprocessors - PowerPoint PPT Presentation

Provided by: OPE169
1
Cache Coherence Schemes for Multiprocessors
Sivakumar M, Osman Unsal
2
  • Consistency
  • Different Directory Schemes
  • Comparison of Directory schemes
  • Hierarchical Directory scheme (in detail)
  • Referenced Papers
  • Directory-Based Cache Coherence in Large-Scale
    Multiprocessors, David Chaiken, Craig Fields,
    Kiyoshi Kurihara and Anant Agarwal
  • A Survey of Cache Coherence Schemes for
    Multiprocessors, Per Stenstrom
  • Cache Consistency and Sequential Consistency,
    James R. Goodman
  • LimitLESS Directories: A Scalable Cache
    Coherence Scheme, David Chaiken, John
    Kubiatowicz and Anant Agarwal
  • A Hierarchical Directory Scheme for Large-Scale
    Cache-Coherent Multiprocessors, a dissertation
    by Yeong-Chang Maa

3
CONSISTENCY
  • Strict Consistency
  • Any read to memory location X returns the value
    stored by the most recent write operation to X
  • Strictly consistent:
      P1: W(x)1
      P2:        R(x)1
  • Not strictly consistent:
      P1: W(x)1
      P2:        R(x)0  R(x)1
  • Sequential Consistency: program order plus memory
    coherence
  • The result of any execution is the same as if
    the operations of all processors were executed in
    some sequential order, and the operations of each
    individual processor appear in this sequence in
    the order specified by its program
  • Two results of the same program, both
    sequentially consistent:
      (a) P1: W(x)1
          P2:        R(x)0  R(x)1
      (b) P1: W(x)1
          P2:        R(x)1  R(x)1
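
The sequential consistency definition on this slide can be checked mechanically for small histories. The sketch below is an illustration only: it assumes the initial value of every location is 0, encodes each operation as a hypothetical `(op, variable, value)` tuple, and simply tries every interleaving that preserves program order, which is only practical for toy examples.

```python
def interleavings(threads):
    """Yield every merge of the per-processor sequences that keeps program order."""
    if all(len(t) == 0 for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [t[0]] + tail


def is_legal(order, initial=0):
    """A serial order is legal if every read returns the latest written value."""
    memory = {}
    for op, var, val in order:
        if op == "W":
            memory[var] = val
        elif memory.get(var, initial) != val:
            return False
    return True


def sequentially_consistent(threads):
    """True if at least one program-order-preserving interleaving is legal."""
    return any(is_legal(order) for order in interleavings(threads))


# The two histories from the slide (P1 first, then P2).
history_a = [[("W", "x", 1)], [("R", "x", 0), ("R", "x", 1)]]
history_b = [[("W", "x", 1)], [("R", "x", 1), ("R", "x", 1)]]
# A made-up counter-example, not from the slides: P2 sees the new value, then the old one.
history_c = [[("W", "x", 1)], [("R", "x", 1), ("R", "x", 0)]]

results = [sequentially_consistent(h) for h in (history_a, history_b, history_c)]
```

Under these assumptions both slide histories pass, while the made-up third history fails because no single serial order explains a read of 1 followed by a read of 0.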

4
CONSISTENCY
  • Causal Consistency
  • Writes that are potentially causally related
    must be seen by all processes in the same order.
    Concurrent writes may be seen in a different
    order on different machines.
  • P1: W(x)1 W(x)3
  • P2: R(x)1 W(x)2
  • P3: R(x)1 R(x)3 R(x)2
  • P4: R(x)1 R(x)2 R(x)3
  • PRAM Consistency
  • Writes done by a single process are received by
    all other processes in the order in which they
    are issued, but writes from different processes
    may be seen in a different order by different
    processes.
  • Processor Consistency
  • For every memory location X, there should be a
    global agreement about the order of writes to X

5
CONSISTENCY
  • Weak Consistency
  • Accesses to synchronization variables are
    sequentially consistent
  • No access to a synchronization variable is
    allowed until all previous writes have completed
    everywhere
  • No data access is allowed until all previous
    accesses to synchronization variables have been
    performed
  • Release Consistency
  • Barrier synchronization Acquire and Release
  • Acquire and Release should be processor
    consistent
  • Lazy release and Eager release consistencies
  • Entry Consistency
  • Locks for each shared variable or element

6
Directory based cache coherence
  • Need
  • Limited Bandwidth
  • Bus cycle times - ring out
  • Scalability
  • Disparity between bus and processor speed
  • Increase in Bandwidth as processor number
    increases
  • Drawback
  • No Broadcast capability
  • Complex protocol

7
Directory Schemes
  • Tang's scheme
  • Full-mapped
  • Each directory entry: N presence bits plus status
    bits, for N processors
  • Memory overhead scales as O(N^2), assuming
    M is proportional to N
  • Censier scheme (Distributed)
  • Stenstrom scheme (Distributed)
  • Limited Directories
  • Classified as Dir_i X, where X may be NB (no
    broadcast) or B (broadcast), and i < N
  • Eviction: pointer replacement
  • Resembles set associative cache and requires
    eviction policy
  • Efficient if memory is referenced by few
    processors
  • Memory overhead scales as O(M i log N)
  • If X is B, more than i copies can exist
    (invalidations are then broadcast); if X is NB,
    at most i copies exist
8
Directory Schemes
  • Chained Directories
  • Make use of pointers like linked lists
  • Complex cache-block replacement
  • splice intermediate cache out of the chain
  • Invalidate the location
  • Variation: doubly linked chain
  • Optimizes replacement process
  • Needs large average message block size
  • Comparison of full-mapped, limited, chained
    schemes
  • Metric: processor utilization
  • Utilization depends on frequency of memory
    references and latency of memory system
  • Latency depends on topology, speed, number of
    processors, memory access latency, frequency and
    size of messages
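
The chained-directory bullets on this slide describe a linked list threaded through the caches. The following minimal sketch models a singly linked chain with the two replacement options listed (splice one cache out, or invalidate the location). The class and method names are hypothetical, the chain is kept in an ordinary dictionary rather than in per-cache pointer fields, and `splice_out` assumes the cache is actually on the chain.

```python
class ChainedDirectory:
    """Sharer chain for one memory block: memory -> head -> ... -> tail."""

    def __init__(self):
        self.head = None   # most recent sharer, pointed to by memory
        self.next = {}     # cache id -> next cache id (None marks the tail)

    def add_sharer(self, cache_id):
        """A new reader becomes the head and points at the old head."""
        self.next[cache_id] = self.head
        self.head = cache_id

    def sharers(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur)
            cur = self.next[cur]
        return out

    def splice_out(self, cache_id):
        """Replacement option 1: unlink one cache from the chain."""
        if self.head == cache_id:
            self.head = self.next.pop(cache_id)
            return
        prev = self.head
        while self.next[prev] != cache_id:   # walk until the predecessor
            prev = self.next[prev]
        self.next[prev] = self.next.pop(cache_id)

    def invalidate_all(self):
        """Replacement option 2 (or a write): invalidate every copy."""
        victims = self.sharers()
        self.head = None
        self.next.clear()
        return victims


chain = ChainedDirectory()
for cache in (0, 1, 2):
    chain.add_sharer(cache)
before = chain.sharers()
chain.splice_out(1)
after_splice = chain.sharers()
invalidated = chain.invalidate_all()
```

The walk in `splice_out` is what makes replacement complex in a singly linked chain; a doubly linked variant would store a predecessor pointer as well and skip the traversal.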

9
Directory Schemes
  • Analysis
  • No coherence: all addresses in the trace are
    treated as not shared. Gives an upper bound
  • Only cache private data: for comparison with
    other schemes
  • P-Thor: minimizes communication and has minimal
    synchronization points
  • Speech: poor performance of limited directories
    due to pointer thrashing
  • Performance improvement by system level
    optimizations
  • Tree barrier structure instead of linear
    barrier
  • Separating read only blocks from read/write
    blocks
  • Reducing the block size

10
Directory Schemes
  • Coarse Vector: Dir_i CV_r
  • Initially behaves as limited directory
  • On pointer overflow, switches to a coarse bit
    vector in which each bit covers a group of r
    processors
  • Dir_0 B
  • 2 status bits for 4 states: Absent; Present1
    (present and clean in only one cache); Present*
    (present and clean in more than one cache);
    PresentM (present and dirty in only one cache)
  • LimitLESS Directory Scheme
  • Combination of hardware and software
    techniques
  • Realizes the performance of a full-map
    directory
  • Memory overhead of a limited directory
  • Sectored Directory: Dir_N/L
  • L sub-blocks share one directory entry
  • Overhead is MN/L
11
Directory Schemes
  • Directory Cache: Dir_a1,a2
  • a1 entries for short limited directory
    pointers
  • a2 entries for long full-map pointers
  • Hierarchical Scheme

12
Hierarchical Cache Coherence Schemes
  • Network Architecture
  • Wilson: hierarchical cache/bus architecture
  • Combination of bus and directory schemes
  • Each higher-level cache contains a copy of all
    blocks cached underneath it
  • Write-invalidate protocol
  • Higher-level caches act as filters
  • Data Diffusion Machine
  • Hierarchy of busses with large processor
    caches
  • Write Invalidate protocol
  • Only state information in higher order caches
  • No global memory and cost effective

13
Hierarchical Full-mapped Directory Schemes
Directory entry fields: descendants presence vector, tag bits,
ackctr, MRU, INV, UP, MRQ, Tr, dirty
  • States of HFMD
  • ABS: no entries in descendants; des.vector and
    Tr bit cleared
  • ABT: descendants' entries being invalidated;
    des.vector and Tr bit cleared
  • RO: read-only entries in the descendants;
    des.vector set, dirty and Tr bits cleared
  • RW: a dirty (read-write) entry is in the
    descendants; des.vector and dirty bit set,
    Tr bit cleared
  • RT: descendant entries have outstanding read
    requests; des.vector and Tr bit set, dirty bit
    cleared
  • WT: descendant entries have outstanding write or
    modify requests; des.vector, dirty bit and
    Tr bit set
  • INV: descendant entries being invalidated from
    directory entry; des.vector cleared, Tr bit and
    INV bit set
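
The HFMD state list above amounts to a small encoding table over four of the directory-entry fields. The sketch below transcribes it as data and looks up which states are compatible with a given bit pattern. It is an illustration only: the `Bits` tuple and the function name are hypothetical, a field the slide does not mention for a state is recorded as `None` ("not stated"), and the remaining fields (ackctr, MRU, UP, MRQ) are not modeled.

```python
from collections import namedtuple

# vector = "descendants presence vector is non-empty"; None = not stated on the slide.
Bits = namedtuple("Bits", "vector dirty tr inv")

HFMD_STATES = {
    "ABS": Bits(vector=False, dirty=None, tr=False, inv=None),
    "ABT": Bits(vector=False, dirty=None, tr=False, inv=None),
    "RO": Bits(vector=True, dirty=False, tr=False, inv=None),
    "RW": Bits(vector=True, dirty=True, tr=False, inv=None),
    "RT": Bits(vector=True, dirty=False, tr=True, inv=None),
    "WT": Bits(vector=True, dirty=True, tr=True, inv=None),
    "INV": Bits(vector=False, dirty=None, tr=True, inv=True),
}


def matching_states(vector, dirty, tr, inv):
    """Return the states whose stated bits agree with the given pattern."""
    given = (vector, dirty, tr, inv)
    return [
        name
        for name, bits in HFMD_STATES.items()
        if all(want is None or want == got for want, got in zip(bits, given))
    ]


dirty_stable = matching_states(vector=True, dirty=True, tr=False, inv=False)
empty_stable = matching_states(vector=False, dirty=False, tr=False, inv=False)
being_invalidated = matching_states(vector=False, dirty=False, tr=True, inv=True)
```

As transcribed, ABS and ABT carry identical settings for these four fields, so the lookup returns both for an empty, non-transient entry; whatever distinguishes them is not recoverable from the slide.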