Implementation and Verification of a Cache Coherence protocol using Spin

1
Implementation and Verification of a Cache
Coherence protocol using Spin
  • Steven Farago

2
Goal
  • To use Spin to design a plausible cache
    coherence protocol
  • Introduce nothing in the Spin model that would
    not be realistic in hardware (e.g. instant global
    knowledge between unrelated state machines)
  • To verify the correctness of the protocol

3
Background
  • Definition: a cache is a small, high-speed memory
    that is used by a single processor. All
    processor memory accesses go through the cache.
  • Problem
  • In a multiprocessor system, each processor could
    have a cache.
  • Each cache could contain (potentially different)
    data for the same addresses.
  • Given this, how to ensure that processors see a
    consistent picture of memory?

4
Coherence protocol
  • A Coherence protocol specifies how caches
    communicate with processors and each other so
    that processors will have a predictable view of
    memory.
  • Caches that always provide this predictable view
    of memory are said to be coherent.

5
A Definition of Coherence
  • A view of memory is coherent if the following
    property holds
  • Given cacheline A, two processors may not see
    storage accesses to A in a conflicting order.
  • Example:
      Processor 0    Processor 1    Processor 2    Processor 3
      Store A, 0     Load A, 0      Load A, 0      Load A, 1
      Store A, 1     Load A, 1      Load A, 0      Load A, 0
                     Coherent       Coherent       NOT Coherent
  • Informally, a processor may not see old data
    after seeing new data.
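
The informal rule above can be sketched as a small check. This is an illustrative Python sketch, not part of the Spin model; the function name `is_coherent` and the assumption that stored values are distinct are introduced here only for the example.

```python
def is_coherent(store_order, observed_loads):
    """Return True if one processor's loads never go back to older data.

    store_order: values stored to cacheline A, oldest first (assumed distinct).
    observed_loads: values that one processor loaded from A, in program order.
    """
    position = {value: i for i, value in enumerate(store_order)}
    newest_seen = -1
    for value in observed_loads:
        if position[value] < newest_seen:
            return False  # old data seen after new data
        newest_seen = position[value]
    return True


stores = [0, 1]                      # Processor 0: Store A, 0 then Store A, 1
print(is_coherent(stores, [0, 1]))   # Processor 1
print(is_coherent(stores, [0, 0]))   # Processor 2
print(is_coherent(stores, [1, 0]))   # Processor 3
```

Processors 1 and 2 pass the check; Processor 3 fails it because it loads 0 after having loaded 1.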

6
Standard Coherence Protocol
  • MESI (Modified, Exclusive, Shared, Invalid)
  • Standard protocol that is supposed to guarantee
    cache coherence
  • Each cacheline in the cache is marked with one of
    these states.
  • Cacheline accesses are only allowed if the cache
    states are correct w.r.t. the coherence protocol
  • Examples:
  • A cacheline that is marked Invalid may not provide
    data to a processor.
  • Cacheline data may not be updated unless the line
    is in the Exclusive or Modified state.
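
The two example rules can be written down as simple predicates on the line state. This is a minimal Python sketch for illustration only; the names `State`, `may_supply_data` and `may_update_data` are assumptions and do not come from the Spin model.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def may_supply_data(state):
    """An Invalid line may not provide data to a processor."""
    return state is not State.INVALID


def may_update_data(state):
    """Line data may only be updated in the Exclusive or Modified state."""
    return state in (State.EXCLUSIVE, State.MODIFIED)
```

For instance, `may_update_data(State.SHARED)` is False, so a Store to a Shared line must first obtain a different state.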

7
System Model
  • Initial version
  • Three state machines
  • ProcessorModel: non-deterministically issues
    Loads and Stores to its cache forever
  • CacheModel: two parts, initially combined into
    a single process
  • MainCache - Services processor requests.
  • Snooper - Responds to messages from the memory
    controller
  • MemoryController - Services requests from each
    cache and maintains coherency among all caches

8
System Model
[Diagram: two Processor boxes, each with its own MainCache and Snooper, and a single MemoryController.]

9
ProcessorModel
  • Simple
  • Continually issues Load/Store requests to
    associated Cache.
  • Communication done via Bus Model.
  • Read requests are blocking
  • Coherence verification done when Load receives
    data (via Spin assert statement)

10
CacheModel
  • Two parts: MainCache and Snooper
  • MainCache services ProcessorModel Load and Store
    requests and initiates contact with the
    MemoryController when an invalid cache state is
    encountered
  • Snooper services independent requests from the
    MemoryController. These requests are necessary
    for the MemoryController to coordinate coherence
    responses.

11
MemoryControllerModel
  • Responsible for servicing Cache requests
  • 3 types of requests:
  • Data request: Cache requires up-to-date data to
    supply to processor
  • Permission-to-store: a Cache may not transition
    to the Modified state w/o the MC's permission
  • A combination of these two
  • All types of requests may require MC to
    communicate with all system caches (via Snooper
    processes) to ensure coherence
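
One way to picture the three request types is as a function from the current line state and the processor operation to the request the cache would send. The slides do not give this mapping, so the table below is an assumption made for illustration (in particular, that a Store to an Invalid line needs both data and permission); the names `Request` and `request_needed` are hypothetical.

```python
from enum import Enum


class Request(Enum):
    DATA = 1
    PERMISSION_TO_STORE = 2
    DATA_AND_PERMISSION = 3


def request_needed(state, op):
    """state: 'M', 'E', 'S' or 'I'; op: 'load' or 'store'.

    Returns the Request to send to the memory controller, or None if the
    cache can service the operation on its own. Assumed mapping, not taken
    from the original model.
    """
    if op == "load":
        return Request.DATA if state == "I" else None
    if op == "store":
        if state == "M":
            return None  # already Modified, no transition needed
        if state == "I":
            return Request.DATA_AND_PERMISSION
        # Exclusive or Shared: data is present, but entering Modified
        # requires the memory controller's permission.
        return Request.PERMISSION_TO_STORE
    raise ValueError("op must be 'load' or 'store'")
```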

12
Implementation of Busses
  • All processes represent independent state
    machines. Need communication mechanism
  • Use Spin depth-1 queues to simulate
    communication.
  • Reads of queues are destructive and blocking, so
    a global bool is needed to indicate bus activity
    (required for polling).
  • A global shared between processes is considered
    valid here because it makes up for differences
    between Spin queues and real busses
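
The idea of a depth-1 queue paired with a pollable activity flag can be sketched as follows. This is a Python illustration, not Promela; the class `Bus` and its method names are assumptions, and where a Spin process would block, the sketch returns False or None instead.

```python
class Bus:
    """Depth-1 queue plus a flag that can be polled without consuming."""

    def __init__(self):
        self._slot = None
        self.busy = False  # stands in for the global bool in the model

    def send(self, msg):
        if self.busy:
            return False  # queue full: a Spin sender would block here
        self._slot = msg
        self.busy = True
        return True

    def receive(self):
        if not self.busy:
            return None  # queue empty: a Spin receiver would block here
        msg, self._slot = self._slot, None
        self.busy = False  # destructive read clears the bus
        return msg
```

A process that only wants to know whether a message is pending reads `busy` and leaves the message in place for its intended receiver.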

13
Problems - Part 1
  • MainCache and Snooper initially implemented as a
    single process.
  • Process nondeterministically determines which to
    execute at each iteration
  • Communication between Processor/Cache and
    Cache/Memory done with blocking queues
  • Blocked receive in MainCache --> Snooper cannot
    execute
  • Leads to deadlock in certain situations

14
Solution 1
  • Split MainCache and Snooper into separate
    processes.
  • Both can access global cacheData and cacheState
    variables independently

15
--> Problems - Part 2
  • As separate processes, Snooper and MainCache
    could change cache state unpredictably.
  • Race condition: Snooper changes cache
    state/data while MainCache is in mid-transaction
    --> MainCache returns invalidated data to processor.

16
Solution 2
  • Add locking mechanism to cache.
  • MainCache or Snooper may only access cache if
    they first lock it.
  • Locking mechanism: for simplicity, cheated by
    using Spin's atomic keyword to implement
    test-and-set on a shared variable.
  • Assumption: real hardware would have some
    similar mechanism available to lock caches.
  • Question: is the revised model now equivalent to
    the original?
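
A test-and-set lock of the kind described can be sketched in Python as below. This is not the Promela code from the model: a `threading.Lock` plays the role that Spin's atomic keyword plays there, making the test and the set one indivisible step. The class name `CacheLock` and its methods are assumptions for illustration.

```python
import threading


class CacheLock:
    """Test-and-set on a shared flag, made indivisible by a guard."""

    def __init__(self):
        self._guard = threading.Lock()  # stands in for Spin's atomic block
        self.locked = False             # the shared variable

    def try_lock(self):
        """Atomically test the flag and set it if it was clear."""
        with self._guard:
            if self.locked:
                return False
            self.locked = True
            return True

    def unlock(self):
        with self._guard:
            self.locked = False
```

MainCache and Snooper would each call `try_lock()` before touching cache state or data and call `unlock()` when the transaction is complete; whichever loses the race retries later.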

17
--> Problem 3
  • Memory controller allows multiple outstanding
    requests from caches.
  • Snooper of cache which has a MainCache request
    outstanding cannot respond to MC queries for
    other outstanding requests (due to locked
    cacheline).
  • Deadlock.

18
Solution 3
  • Disallow multiple outstanding Cache/MC
    transactions.
  • Introduce a global bool variable shared across
    all caches: outstandingBusOp.
  • A cache may only issue requests to the memory
    controller if no requests from other caches are
    outstanding.
  • Global knowledge across all caches is unrealistic.
  • Is this equivalent to retries from the MC?

19
--> Problem 4
  • Previous problems showed up as failures in Spin
    simulation within 1000 steps.
  • Given the last solution, random simulation no
    longer fails within the first 3000 steps.
  • Verification fails after 20000 steps.
  • Cause of the problem is as yet unresolved.

20
Verification
  • How to verify coherence in general?
  • Verify something stronger: a processor will
    never see a conflicting ordering of data if it
    always sees the newest data available in the
    system.
  • For all loads, assert that the data is new.

21
Modeling of Data
  • Concern that modeling data as a random integer
    would cause Spin to run out of memory
  • Model data as a bit with values OLD and NEW.
  • All processor Stores store NEW data.
  • When transitioning to the Modified state, a cache
    will change all other values of data in memory
    and other caches to OLD
  • Global access to data here is strictly part of
    the verification effort, not of the algorithm,
    and is therefore allowed.
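
The OLD/NEW bookkeeping and the per-load assertion can be sketched in Python as follows. This is an illustration of the idea only, not the Promela model; the helper names `mark_others_old` and `check_load` are assumptions.

```python
OLD, NEW = 0, 1


def mark_others_old(caches, writer):
    """Verification-only bookkeeping for a transition to Modified.

    caches: list of data bits, one per cache.
    Returns (memory, caches): the writer holds NEW, memory and every
    other cache hold OLD.
    """
    updated = [OLD] * len(caches)
    updated[writer] = NEW
    return OLD, updated


def check_load(data):
    """The property checked on every Load: the data must be the newest."""
    assert data == NEW, "processor observed stale data"
    return data


memory, caches = mark_others_old([NEW, NEW, NEW], writer=1)
check_load(caches[1])  # passes: cache 1 holds the newest data
```

If the protocol ever let cache 0 or cache 2 hand its OLD bit to a processor, `check_load` would raise, which is the kind of violation the assert in the model is meant to catch.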

22
Debugging
  • Found debugging parallel processes difficult.
  • Made much easier by Spin's message sequence
    diagrams
  • Graphically shows sends and receives of all
    messages.
  • Requires use of Spin queues rather than globals
    for interprocess communication

23
Future work
  • Make existing protocol completely bug free
  • Activate additional features disabled for
    debugging purposes (e.g. bus transaction types)
  • Verify protocol-specific rules:
  • No two caches may be simultaneously Modified
  • Cache Modified or Exclusive --> no other cache is
    Shared
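
The two protocol-specific rules listed above could be expressed as a predicate over the states that all caches hold for one cacheline. This Python sketch is illustrative only; the function name `invariants_hold` and the single-letter state encoding are assumptions, and it checks exactly the two rules stated on the slide and nothing more.

```python
def invariants_hold(states):
    """states: one of 'M', 'E', 'S', 'I' per cache, for the same cacheline."""
    modified = states.count("M")
    exclusive = states.count("E")
    shared = states.count("S")
    # Rule 1: no two caches may be simultaneously Modified.
    if modified > 1:
        return False
    # Rule 2: a Modified or Exclusive copy excludes any Shared copy.
    if (modified + exclusive) >= 1 and shared >= 1:
        return False
    return True


print(invariants_hold(["M", "I", "I"]))
print(invariants_hold(["E", "S", "I"]))
```

The first call satisfies both rules; the second violates the second rule because one cache is Exclusive while another is Shared.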