Lecture 4: Directory Protocols
Author: Rajeev Balasubramonian

Transcript and Presenter's Notes

1
Lecture 4: Directory Protocols
  • Topics: directory-based cache coherence
    implementations

2
Split Transaction Bus
  • What would it take to implement the protocol
    correctly while assuming a split transaction bus?
  • Split transaction bus: a cache puts out a
    request, releases the bus (so others can use the
    bus), and receives its response much later
  • Assumptions:
    • only one request per block can be outstanding
      (request tracking sketched below)
    • separate lines for addr (request) and data
      (response)
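To make the first assumption concrete, here is a minimal C++ sketch of per-cache request tracking on a split transaction bus; the class name and block-address granularity are illustrative, not from the slides.

```cpp
#include <cstdint>
#include <unordered_set>

// Sketch: each cache tracks the block addresses of its outstanding
// split-transaction requests. A new request for a block is allowed
// only if no request for that block is already in flight.
class RequestTable {
public:
    // Returns true if the request may be issued on the request lines.
    bool try_issue(uint64_t block_addr) {
        // Insertion fails if the block already has an outstanding request.
        return outstanding_.insert(block_addr).second;
    }

    // Called when the matching response arrives on the response lines.
    void complete(uint64_t block_addr) {
        outstanding_.erase(block_addr);
    }

private:
    std::unordered_set<uint64_t> outstanding_;
};

int main() {
    RequestTable table;
    bool first  = table.try_issue(0x1000);  // issued
    bool second = table.try_issue(0x1000);  // blocked: already outstanding
    table.complete(0x1000);                 // response received, entry freed
    bool third  = table.try_issue(0x1000);  // may issue again
    return (first && !second && third) ? 0 : 1;
}
```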

3
Split Transaction Bus
[Figure: three processors (Proc 1–3), each with a cache, attached to shared request lines (address) and response lines (data)]
4
Design Issues
  • When does the snoop complete? What if the snoop
    takes a long time?
  • What if the buffer in a processor/memory is
    full? When does the buffer release an entry? Are
    the buffers identical?
  • How does each processor ensure that a block does
    not have multiple outstanding requests?
  • What determines the write order: requests or
    responses?

5
Design Issues II
  • What happens if a processor is arbitrating for
    the bus and witnesses another bus transaction for
    the same address?
  • If the processor issues a read miss and there is
    already a matching read in the request table, can
    we reduce bus traffic? (See the sketch below.)
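One plausible answer, sketched under assumptions (the table layout and merge policy are not from the slides): if a read miss finds a matching outstanding read in the request table, it can merge with it and share the single response instead of issuing a second transaction.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Sketch: a bus-wide request table. A read miss that matches an
// outstanding read merges with it (both caches grab the single
// response), saving one request/response pair on the bus.
struct Outstanding {
    std::vector<int> waiters;  // caches that will take the response
};

class BusRequestTable {
public:
    // Returns true if a new bus request must be issued,
    // false if the miss was merged with an outstanding read.
    bool read_miss(int cache_id, uint64_t block) {
        auto it = pending_.find(block);
        if (it != pending_.end()) {
            it->second.waiters.push_back(cache_id);  // merge: no new traffic
            return false;
        }
        pending_[block].waiters.push_back(cache_id);
        return true;  // this cache arbitrates and issues the request
    }

    // On response, every merged waiter fills its cache from the one reply.
    std::vector<int> response(uint64_t block) {
        auto waiters = std::move(pending_[block].waiters);
        pending_.erase(block);
        return waiters;
    }

private:
    std::unordered_map<uint64_t, Outstanding> pending_;
};

int main() {
    BusRequestTable bus;
    bool a = bus.read_miss(1, 0x40);  // true: cache 1 issues the request
    bool b = bus.read_miss(2, 0x40);  // false: cache 2 merges
    return (a && !b && bus.response(0x40).size() == 2) ? 0 : 1;
}
```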

6
Scalable Multiprocessors
[Figure: n nodes, each with a processor (P1…Pn), cache (C1…Cn), memory (Mem 1…Mem n), and communication assist (CA1…CAn), joined by a scalable interconnection network]
CC-NUMA: cache-coherent non-uniform memory access
7
Directory-Based Protocol
  • For each block, there is a centralized directory
    that maintains the state of the block in
    different caches
  • The directory is co-located with the
    corresponding memory
  • Requests and replies on the interconnect are no
    longer seen by everyone: the directory serializes
    writes

[Figure: two nodes, each with a processor, cache, memory, communication assist, and directory, on the interconnect]
8
Definitions
  • Home node: the node that stores memory and
    directory state for the cache block in question
    (an address-to-home mapping is sketched below)
  • Dirty node: the node that has a cache copy in
    modified state
  • Owner node: the node responsible for supplying
    data (usually either the home or dirty node)
  • Also: exclusive node, local node, requesting
    node, etc.

[Figure: two nodes, each with a processor, cache, memory, communication assist, and directory]
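The home node is typically determined from the physical address itself. A minimal sketch, assuming block-interleaved memory across the nodes (the block size and node count are illustrative, not from the slides):

```cpp
#include <cstdint>

// Sketch: with memory interleaved at block granularity across the
// nodes, the home node falls directly out of the block address.
constexpr uint64_t kBlockSize = 64;   // bytes (assumed)
constexpr uint64_t kNumNodes  = 16;   // node count (assumed)

uint64_t home_node(uint64_t paddr) {
    return (paddr / kBlockSize) % kNumNodes;
}

int main() {
    // Consecutive blocks map to consecutive homes.
    return (home_node(0x0) == 0 && home_node(0x40) == 1) ? 0 : 1;
}
```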
9
Protocol Steps
[Figure: n nodes, each with a processor (P1…Pn), cache (C1…Cn), memory (Mem 1…Mem n), communication assist (CA1…CAn), and directory (Dir), on a scalable interconnection network]
  • What happens on a read miss and a write miss?
  • How is information stored in a directory?
    (A sketch of both follows.)
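A minimal sketch answering both questions, assuming a flat memory-based directory with a presence vector and three stable states (Uncached, Shared, Modified); the message helpers and state machine are illustrative, not the slides' exact protocol.

```cpp
#include <bitset>
#include <cstdio>

constexpr int kNumNodes = 16;

enum class DirState { Uncached, Shared, Modified };

// One directory entry per memory block, co-located with home memory.
struct DirEntry {
    DirState state = DirState::Uncached;
    std::bitset<kNumNodes> presence;  // which nodes hold a cached copy
};

// Illustrative message hooks; a real protocol puts these on the network.
void send_data(int to)       { std::printf("home -> node %d: data\n", to); }
void send_invalidate(int to) { std::printf("home -> node %d: invalidate\n", to); }
void forward_to_owner(int owner, int req) {
    std::printf("home -> node %d: forward request from node %d\n", owner, req);
}

int find_owner(const DirEntry& e) {
    for (int n = 0; n < kNumNodes; ++n)
        if (e.presence.test(n)) return n;
    return -1;
}

// Read miss: memory supplies the data unless a dirty copy exists,
// in which case the owner is asked to supply it.
void read_miss(DirEntry& e, int requestor) {
    if (e.state == DirState::Modified)
        forward_to_owner(find_owner(e), requestor);
    else
        send_data(requestor);
    e.state = DirState::Shared;
    e.presence.set(requestor);
}

// Write miss: all other copies are invalidated before exclusive
// access is granted; handling one write at a time is how the
// directory serializes writes to the block.
void write_miss(DirEntry& e, int requestor) {
    if (e.state == DirState::Shared) {
        for (int n = 0; n < kNumNodes; ++n)
            if (e.presence.test(n) && n != requestor) send_invalidate(n);
        send_data(requestor);
    } else if (e.state == DirState::Modified) {
        forward_to_owner(find_owner(e), requestor);  // owner supplies data
    } else {
        send_data(requestor);  // Uncached: memory supplies data
    }
    e.presence.reset();
    e.presence.set(requestor);
    e.state = DirState::Modified;
}

int main() {
    DirEntry e;
    read_miss(e, 1);   // Uncached -> Shared; memory supplies data
    read_miss(e, 2);   // second sharer added to the presence vector
    write_miss(e, 3);  // invalidates nodes 1 and 2, then Modified at node 3
    return 0;
}
```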

10
Directory Organizations
  • Centralized Directory: one fixed location; a
    bottleneck!
  • Flat Directories: directory info is in a fixed
    place, determined by examining the address; can
    be further categorized as memory-based or
    cache-based
  • Hierarchical Directories: the processors are
    organized as a logical tree structure and each
    parent keeps track of which of its immediate
    children has a copy of the block; less storage
    (?), more searching, can exploit locality (a
    tree sketch follows)
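A minimal sketch of the hierarchical idea, assuming a binary tree whose internal nodes record which child subtrees hold a copy (the shape and traversal are illustrative): a miss climbs toward the root only until a nearby copy is found, which is how locality is exploited.

```cpp
// Sketch of a hierarchical directory: a binary tree over processors,
// where each internal node records, per child, whether that subtree
// holds a copy of the block.
struct TreeNode {
    TreeNode* parent = nullptr;
    TreeNode* child[2] = {nullptr, nullptr};
    bool has_copy[2] = {false, false};  // does child i's subtree have a copy?
    int leaf_proc = -1;                 // processor id, valid at leaves
};

// A miss ascends from the requestor until some sibling subtree has a
// copy, then descends toward it: searching replaces a direct lookup.
TreeNode* find_copy(TreeNode* leaf) {
    for (TreeNode* n = leaf; n->parent; n = n->parent) {
        TreeNode* p = n->parent;
        int sibling = (p->child[0] == n) ? 1 : 0;
        if (p->has_copy[sibling]) {
            n = p->child[sibling];
            while (n->leaf_proc < 0)        // descend toward a copy
                n = n->child[n->has_copy[0] ? 0 : 1];
            return n;
        }
    }
    return nullptr;  // no cached copy anywhere; fetch from memory
}

int main() {
    // Four processors under a 2-level binary tree.
    TreeNode root, l, r, p0, p1, p2, p3;
    root.child[0] = &l;  root.child[1] = &r;
    l.parent = r.parent = &root;
    l.child[0] = &p0; l.child[1] = &p1;
    r.child[0] = &p2; r.child[1] = &p3;
    p0.parent = p1.parent = &l;
    p2.parent = p3.parent = &r;
    p0.leaf_proc = 0; p1.leaf_proc = 1; p2.leaf_proc = 2; p3.leaf_proc = 3;

    // Processor 1 caches the block: mark the path toward the root.
    l.has_copy[1] = true;
    root.has_copy[0] = true;

    // Processor 0 misses; the copy is found one level up, in the sibling.
    return (find_copy(&p0) == &p1) ? 0 : 1;
}
```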

11
Flat Memory-Based Directories
  • Directory is associated with memory and stores
    info for all cache copies
  • A presence vector stores a bit for every
    processor, for every memory block; the overhead
    is a function of memory/block size and processor
    count
  • Reducing directory overhead: (next slide)

12
Flat Memory-Based Directories
  • Reducing directory overhead:
  • Width: pointers that keep track of processor ids
    of sharers (need an overflow strategy), or a
    2-level protocol to combine info for multiple
    processors
  • Height: increase the block size, or track info
    only for blocks that are cached (note: cache
    size << memory size); a worked overhead
    comparison follows
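To make the "width" trade-off concrete, here is a small comparison under assumed parameters (64-byte blocks, 256 processors, 4 pointers of 8 bits each; none of these numbers come from the slides):

```cpp
#include <cstdio>

int main() {
    const double block_bits = 64 * 8;   // 64-byte blocks (assumed)
    const int    procs      = 256;      // processor count (assumed)

    // Full presence vector: one bit per processor per block.
    double full_vector_bits = procs;

    // Limited pointers: 4 sharer pointers of log2(procs) = 8 bits,
    // plus a couple of state bits. Needs an overflow strategy
    // (broadcast, coarse vector, software trap, ...) when sharers > 4.
    double limited_bits = 4 * 8 + 2;

    std::printf("presence vector:  %.1f%% overhead\n",
                100.0 * full_vector_bits / block_bits);  // 50.0%
    std::printf("limited pointers: %.1f%% overhead\n",
                100.0 * limited_bits / block_bits);      // ~6.6%
    return 0;
}
```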

13
Flat Cache-Based Directories
  • The directory at the memory home node only
    stores a pointer to the first cached copy; the
    caches store pointers to the next and previous
    sharers (a doubly linked list)

[Figure: main memory points to Cache 7, which links to Cache 3, which links to Cache 26, forming the doubly linked sharer list]
14
Flat Cache-Based Directories
  • Potentially lower storage, no bottleneck for
    network traffic
  • Invalidates are now serialized (takes longer to
    acquire exclusive access), replacements must
    update the linked list, and race conditions must
    be handled while updating the list (list
    operations sketched below)
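A minimal sketch of the sharer-list operations, in the spirit of SCI-style cache-based directories (the struct names and head-insertion policy are illustrative): new sharers insert at the head, replacements unlink in place, and a writer invalidates by walking the list hop by hop, which is why invalidations serialize.

```cpp
#include <cstdio>

// Sketch: the home holds a pointer to the first sharer; each cache
// holds next/prev pointers, forming a doubly linked list of copies.
struct SharerNode {
    int cache_id;
    SharerNode* next = nullptr;
    SharerNode* prev = nullptr;
};

struct HomeEntry {
    SharerNode* head = nullptr;  // the only pointer stored at the home
};

// A new reader inserts itself at the head of the list.
void add_sharer(HomeEntry& home, SharerNode* node) {
    node->next = home.head;
    if (home.head) home.head->prev = node;
    home.head = node;
}

// Replacement: a cache unlinks itself, patching neighbors' pointers.
void remove_sharer(HomeEntry& home, SharerNode* node) {
    if (node->prev) node->prev->next = node->next;
    else            home.head       = node->next;
    if (node->next) node->next->prev = node->prev;
}

// A writer invalidates sharers by walking the list; each hop is a
// network message, so invalidation latency grows with list length.
void invalidate_all(HomeEntry& home) {
    for (SharerNode* n = home.head; n; n = n->next)
        std::printf("invalidate cache %d\n", n->cache_id);
    home.head = nullptr;
}

int main() {
    HomeEntry home;
    SharerNode c26{26}, c3{3}, c7{7};
    add_sharer(home, &c26);
    add_sharer(home, &c3);
    add_sharer(home, &c7);     // list: 7 -> 3 -> 26, as in the figure
    remove_sharer(home, &c3);  // cache 3 replaces its copy
    invalidate_all(home);      // writer invalidates 7, then 26
    return 0;
}
```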

15
Data Sharing Patterns
  • Two important metrics that guide our design
    choices: invalidation frequency and invalidation
    size; it turns out that invalidation size is
    rarely greater than four
  • Read-only: data constantly read, never updated
    (Raytrace)
  • Producer-consumer: flag-based synchronization,
    updates from neighbors (Ocean)
  • Migratory: reads and writes from a single
    processor for a period of time (global sum)
  • Irregular: unpredictable accesses (distributed
    task queue)

16
Protocol Optimizations
[Figure: C1 attempts to read a block that is in Modified state in C2. Three message flows are compared: a strict request-response protocol (5 numbered messages, all routed through the home memory), intervention forwarding (4 messages: the home forwards an intervention to C2), and reply forwarding (4 messages: C2 replies directly to C1). The flows are enumerated in the sketch below.]
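The figure's three flows can be written out as message sequences. This sketch enumerates them for the "C1 reads a block Modified in C2" case; the exact revision messages follow the usual textbook presentation and are assumptions, not slide content.

```cpp
#include <cstdio>

// Sketch: message sequences for a read miss by C1 to a block that is
// Modified in C2, under the three schemes in the figure. Each string
// is one network message; fewer messages means lower miss latency.
int main() {
    const char* strict_request_response[] = {
        "1: C1 -> Home  (read request)",
        "2: Home -> C1  (owner is C2)",
        "3: C1 -> C2    (read request)",
        "4: C2 -> C1    (data reply)",
        "5: C1 -> Home  (revision: block now Shared)",
    };
    const char* intervention_forwarding[] = {
        "1: C1 -> Home  (read request)",
        "2: Home -> C2  (intervention)",
        "3: C2 -> Home  (data + revision)",
        "4: Home -> C1  (data reply)",
    };
    const char* reply_forwarding[] = {
        "1: C1 -> Home  (read request)",
        "2: Home -> C2  (forwarded request)",
        "3: C2 -> C1    (data reply, directly to the requestor)",
        "4: C2 -> Home  (revision: block now Shared)",
    };

    std::puts("strict request-response (5 messages):");
    for (auto m : strict_request_response) std::puts(m);
    std::puts("intervention forwarding (4 messages):");
    for (auto m : intervention_forwarding) std::puts(m);
    std::puts("reply forwarding (4 messages):");
    for (auto m : reply_forwarding) std::puts(m);
    return 0;
}
```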
17
Serializing Writes for Coherence
  • Potential problem: updates may be re-ordered by
    the network. General solution: do not start the
    next write until the previous one has completed
  • Strategies for buffering writes:
    • buffer at home: requires more storage at the
      home node
    • buffer at requestors: the request is forwarded
      to the previous requestor and a linked list is
      formed
    • NACK and retry: the home node NACKs all
      requests until the outstanding request has
      completed (sketched below)
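A minimal sketch of the NACK-and-retry strategy, assuming a per-block busy indication at the home (the class and method names are illustrative):

```cpp
#include <cstdint>
#include <unordered_set>

// Sketch: the home serializes writes to a block by NACKing any write
// request that arrives while another write to that block is in flight.
class HomeNode {
public:
    enum class Reply { Granted, Nack };

    Reply write_request(uint64_t block) {
        if (busy_.count(block)) return Reply::Nack;  // requestor must retry
        busy_.insert(block);  // start invalidations / ownership transfer
        return Reply::Granted;
    }

    // Called when all invalidation acks for the write have arrived.
    void write_complete(uint64_t block) {
        busy_.erase(block);
    }

private:
    std::unordered_set<uint64_t> busy_;
};

int main() {
    HomeNode home;
    auto a = home.write_request(0x80);  // granted
    auto b = home.write_request(0x80);  // NACKed: previous write pending
    home.write_complete(0x80);
    auto c = home.write_request(0x80);  // retry succeeds
    return (a == HomeNode::Reply::Granted &&
            b == HomeNode::Reply::Nack &&
            c == HomeNode::Reply::Granted) ? 0 : 1;
}
```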
