Title: COMP 206: Computer Architecture and Implementation
- Montek Singh
- Mon, Nov 14, 2005
- Topic: Cache Coherence
Outline
- Cache Coherence
- Reading: HP3 (Hennessy & Patterson, 3rd ed.), Section 6.3 and Appendix I
Cache Coherence
- A common problem with multiple copies of mutable information (in both hardware and software)
- "If a datum is copied and the copy is to match the original at all times, then all changes to the original must cause the copy to be immediately updated or invalidated." (Richard L. Sites, co-architect of DEC Alpha)
[Figure: (a) a copy becomes stale; (b) copies diverge, which is hard to recover from]
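A minimal sketch of the quoted rule, using plain Python dicts as hypothetical stand-ins for "the original" and "a copy". The function names and the `policy` parameter are illustrative assumptions, not taken from the slides; the point is only that a write must either update or invalidate every copy, or the copy goes stale.

```python
def write_original(original, copies, key, value, policy):
    """Write `value` to the original and apply a coherence policy to every copy.

    policy: "none" (copies left alone), "update", or "invalidate".
    """
    original[key] = value
    for copy in copies:
        if policy == "update":
            copy[key] = value          # copy immediately updated
        elif policy == "invalidate":
            copy.pop(key, None)        # copy dropped; next use must re-fetch
        elif policy != "none":
            raise ValueError(f"unknown policy: {policy}")
        # policy "none": the copy silently becomes stale


def read_copy(original, copy, key):
    """Read through a copy, re-fetching from the original if it was invalidated."""
    if key not in copy:
        copy[key] = original[key]
    return copy[key]


for policy in ("none", "update", "invalidate"):
    original, copy = {"x": "A"}, {"x": "A"}
    write_original(original, [copy], "x", "C", policy)
    print(policy, read_copy(original, copy, "x"))
# none A
# update C
# invalidate C
```

Update and invalidate both keep the copy coherent; invalidation defers the cost to the next read of the copy.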
Example of Cache Coherence
- I/O in a uniprocessor with a primary unified cache
  - The main memory (MM) copy and the cache copy of a memory block are not always coherent
  - Write-through (WT) cache
    - MM copy is stale while the write update to MM is in transit
  - Write-back (WB) cache
    - MM copy is stale while the cache copy is Dirty
  - The inconsistency is of no concern if no one else reads or writes the MM copy
  - If I/O is directed to main memory, coherence must be maintained
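A toy sketch of the WB case: an I/O device (hypothetically modeled by `dma_read`) reads MM directly while the cache holds a Dirty copy, and so sees stale data. Flushing Dirty lines before the transfer is used here as one possible remedy; this is an assumption for illustration, since the slide does not say how coherence is maintained (hardware snooping of I/O traffic is another option). The WT "write in transit" case is not modeled.

```python
class WriteBackCache:
    """Toy write-back cache: addr -> (value, dirty). No capacity limit."""

    def __init__(self, memory):
        self.memory = memory     # shared dict standing in for MM
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = (self.memory[addr], False)   # miss: fill Clean
        return self.lines[addr][0]

    def write(self, addr, value):
        self.lines[addr] = (value, True)                     # MM not updated yet

    def flush(self):
        """Write every Dirty line back to MM and mark it Clean."""
        for addr, (value, dirty) in list(self.lines.items()):
            if dirty:
                self.memory[addr] = value
                self.lines[addr] = (value, False)


def dma_read(memory, addr):
    """Hypothetical I/O device reading MM directly, bypassing the cache."""
    return memory[addr]


mm = {0x100: 1}
cache = WriteBackCache(mm)
cache.write(0x100, 2)
print(dma_read(mm, 0x100))   # 1: MM copy is stale while the cache copy is Dirty
cache.flush()
print(dma_read(mm, 0x100))   # 2: coherent again after the write-back
```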
Example of Cache Coherence (contd.)
- Uniprocessor with a split primary cache
  - I-cache contains instructions
  - D-cache contains data
  - Often the contents are disjoint
  - If self-modifying code (SMC) is allowed, the same cache block may appear in both caches, and consistency must be enforced
    - MS-DOS allows self-modifying code
    - Strong motivation for unified caches in Intel i386 and i486
    - Pentium has a split primary cache, and supports SMC by enforcing coherence between the I- and D-caches
- Coordinating primary and secondary caches in a uniprocessor
- Shared-memory multiprocessors
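A toy sketch of the split-cache SMC problem: a store goes through the D-cache while the old instruction is still held in the I-cache. Invalidating the I-cache copy on every store is an assumed, simplified mechanism chosen for illustration, not a description of how Pentium actually enforces I/D coherence; the write-through D-cache is also an assumption.

```python
class SplitCache:
    """Toy split primary cache: separate I-cache and D-cache dicts over one MM dict."""

    def __init__(self, memory, keep_coherent):
        self.memory = memory
        self.icache = {}
        self.dcache = {}
        self.keep_coherent = keep_coherent

    def fetch(self, addr):
        """Instruction fetch goes through the I-cache."""
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        """Data store goes through the D-cache (write-through, for simplicity)."""
        self.dcache[addr] = value
        self.memory[addr] = value
        if self.keep_coherent:
            self.icache.pop(addr, None)   # invalidate any I-cache copy of the block


for coherent in (False, True):
    cpu = SplitCache({0x40: "ADD r1, r2"}, keep_coherent=coherent)
    cpu.fetch(0x40)                    # instruction now cached in the I-cache
    cpu.store(0x40, "SUB r1, r2")      # self-modifying store via the D-cache
    print(coherent, cpu.fetch(0x40))
# False ADD r1, r2
# True SUB r1, r2
```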
Two Snoopy Protocols
- We will discuss two protocols
  - A simple three-state protocol
    - Section 6.3 and Appendix I of HP3
  - The MESI protocol
    - IEEE standard
    - Used by many machines, including Pentium and PowerPC 601
- Snooping
  - Each cache monitors activity on the memory bus
  - and takes actions based on this activity
  - Introduces a fourth category of miss to the 3C model (compulsory, capacity, conflict): coherence misses
- First, we need some notation to discuss the protocols
Notation: Write-Through Cache
Notation: Write-Back Cache
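The notation diagrams for these two slides are not reproduced here. As a rough stand-in, here is a sketch of the usual single-cache block state machines: a WT block needs only Valid/Invalid because MM is always updated, while a WB block adds a Dirty state. The write-no-allocate (WT) and write-allocate (WB) policies, the event names, and the function names are assumptions for illustration, not the slides' notation.

```python
def wt_next(state, event):
    """Write-through block: states "Invalid" and "Valid".

    Assumes write-no-allocate: a write miss updates MM only (MM writes not modeled).
    """
    if event == "read":
        return "Valid"        # read miss fills the block; read hit stays Valid
    if event == "write":
        return state          # every write also goes to MM, so no Dirty state is needed
    if event == "evict":
        return "Invalid"      # nothing to write back: MM is already up to date
    raise ValueError(f"unknown event: {event}")


def wb_next(state, event):
    """Write-back block: states "Invalid", "Clean", "Dirty".

    Assumes write-allocate. Returns (next_state, wrote_back_to_MM).
    """
    if event == "read":
        return ("Clean" if state == "Invalid" else state), False
    if event == "write":
        return "Dirty", False                 # MM copy is now stale
    if event == "evict":
        return "Invalid", state == "Dirty"    # only a Dirty block is written back
    raise ValueError(f"unknown event: {event}")
```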
Three-State Write-Invalidate Protocol
- Minor modification of a WB cache
- Assumptions
  - Single bus and MM
  - Two or more CPUs, each with a WB cache
  - Every cache block is in one of three states: Invalid, Clean, Dirty (called Invalid, Shared, Exclusive in Figure 6.10 of HP3)
  - MM copies of blocks have no state
  - At any moment, a single cache owns the bus (is the bus master)
  - The bus master does not obey its own bus command
  - All misses (reads or writes) are serviced by
    - MM, if all cache copies are Clean, or
    - the only Dirty cache copy (which is then no longer Dirty); in this case the MM copy is written instead of being read
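A minimal simulation sketch of the three-state protocol under the assumptions above, tracking a single memory block across N caches (so replacement misses are not modeled). The class name and the `"BusReadMiss"`/`"BusWriteMiss"` labels are hypothetical stand-ins for the slides' Bus Read Miss and Bus Write Miss.

```python
from enum import Enum


class State(Enum):
    INVALID = "Invalid"
    CLEAN = "Clean"
    DIRTY = "Dirty"


class ThreeStateSystem:
    """N write-back caches on one bus, tracking a single memory block."""

    def __init__(self, n_caches, initial_value=0):
        self.state = [State.INVALID] * n_caches
        self.value = [None] * n_caches     # each cache's copy of the block
        self.memory = initial_value        # MM copy; MM keeps no state
        self.bus_log = []                  # bus transactions, in order

    def _bus_miss(self, master, kind):
        """The bus master issues a miss; every other cache snoops it."""
        self.bus_log.append((kind, master))
        for other, st in enumerate(self.state):
            if other == master:
                continue                   # bus master does not obey its own command
            if st is State.DIRTY:
                # The unique Dirty copy services the miss; MM is written, not read.
                self.memory = self.value[other]
                self.state[other] = (State.CLEAN if kind == "BusReadMiss"
                                     else State.INVALID)
            elif st is State.CLEAN and kind == "BusWriteMiss":
                self.state[other] = State.INVALID
        return self.memory                 # block as now held by MM

    def cpu_read(self, cpu):
        if self.state[cpu] is State.INVALID:           # read miss: visible on bus
            self.value[cpu] = self._bus_miss(cpu, "BusReadMiss")
            self.state[cpu] = State.CLEAN
        return self.value[cpu]                         # read hit: invisible on bus

    def cpu_write(self, cpu, new_value):
        # A write hit on a Dirty block skips this branch, so it is invisible on bus.
        if self.state[cpu] is not State.DIRTY:
            # Write miss, or write hit on a Clean block: a Bus Write Miss
            # invalidates every other copy.
            self._bus_miss(cpu, "BusWriteMiss")
        self.value[cpu] = new_value
        self.state[cpu] = State.DIRTY


system = ThreeStateSystem(2)
system.cpu_read(0)              # cache 0: Clean
system.cpu_write(1, 42)         # cache 1: Dirty, cache 0: Invalid
print(system.cpu_read(0))       # 42, supplied by cache 1; MM updated to 42
print([s.value for s in system.state])  # ['Clean', 'Clean']
```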
Understanding the Protocol
- Only two global states
  - The most up-to-date copy is the MM copy, and all cache copies are Clean
  - The most up-to-date copy is a single unique cache copy in state Dirty
[Figure: example configurations]
  - Bus owner Clean, another Clean copy exists: can read without notifying other caches
  - Bus owner Dirty, no other cache copies: can read or write without notifying other caches
  - Bus owner Clean, no other cache copies: can read without notifying other caches
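The two legal global states and the three example configurations can be expressed as simple checks. This is a sketch over a list of per-cache states for one block; both function names are hypothetical.

```python
def valid_global_state(states):
    """True if the cache states form one of the two legal global states.

    states: list of "Invalid", "Clean", or "Dirty", one per cache (one block).
    """
    dirty = states.count("Dirty")
    if dirty == 0:
        return True   # MM copy is most up to date; every cached copy is Clean
    # Otherwise there must be exactly one Dirty copy and no other cached copy.
    return dirty == 1 and all(s == "Invalid" for s in states if s != "Dirty")


def can_proceed_silently(states, owner, op):
    """Can cache `owner` perform `op` ("read" or "write") with no bus transaction?"""
    mine = states[owner]
    if op == "read":
        return mine in ("Clean", "Dirty")
    if op == "write":
        return mine == "Dirty"
    raise ValueError(f"unknown op: {op}")
```

Note that a Clean owner may never write silently, even when no other copy exists: the protocol has no state recording that fact (MESI's Exclusive state adds it).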
State Diagram of Cache Block (Part 1)
State Diagram of Cache Block (Part 2)
Comparison with Single WB Cache
- Similarities
  - Read hit invisible on bus
  - All misses visible on bus
- Differences
  - In a single WB cache, all misses are serviced by MM; in the three-state protocol, misses are serviced either by MM or by the unique cache holding the only Dirty copy
  - In a single WB cache, a write hit is invisible on the bus; in the three-state protocol, a write hit on a Clean block invalidates all other Clean copies by issuing a Bus Write Miss (a necessary action)
Correctness of Three-State Protocol
- Problem: state transitions of the FSM are supposed to be atomic, but in this protocol they are not, because of the bus
- Example: CPU read miss in Dirty state (the cache frame holds a Dirty block of a different address)
  1. CPU access to cache detects a miss
  2. Request bus
  3. Acquire bus, and change state of cache block
  4. Evict dirty block to MM
  5. Put Bus Read Miss on bus
  6. Receive requested block from MM or another cache
  7. Release bus, and read from cache block just received
- Bus arbitration may cause a gap between steps 2 and 3
  - The whole sequence of operations is no longer atomic
- App. I.1 argues that the protocol works correctly if steps 3-7 are atomic, i.e., if the bus is not a split-transaction bus
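A sketch of what "steps 3-7 are atomic" means on a bus that is not split-transaction: once a cache becomes bus master, it holds the bus until its whole transaction completes. Here a `threading.Lock` is an assumed stand-in for bus ownership. The sketch models only the ordering of bus activity, not data. A split-transaction bus would release ownership between the request (step 5) and the response (step 6), which is the case the correctness argument excludes.

```python
import threading


class Bus:
    """Non-split-transaction bus: holding `lock` means being the bus master."""

    def __init__(self):
        self.lock = threading.Lock()
        self.log = []                  # (cpu, step) records, in bus order


def read_miss_in_dirty_state(bus, cpu):
    # Step 1: CPU access to the cache detects a miss (not modeled).
    # Step 2: request the bus. Arbitration may delay us here, which is the
    #         gap between steps 2 and 3.
    with bus.lock:                     # Step 3: acquire bus; steps 3-7 now atomic
        bus.log.append((cpu, "change block state"))
        bus.log.append((cpu, "evict dirty block to MM"))        # Step 4
        bus.log.append((cpu, "put Bus Read Miss on bus"))       # Step 5
        bus.log.append((cpu, "receive requested block"))        # Step 6
    # Step 7: leaving the `with` releases the bus; the CPU then reads the block.


bus = Bus()
cpus = [threading.Thread(target=read_miss_in_dirty_state, args=(bus, c))
        for c in range(4)]
for t in cpus:
    t.start()
for t in cpus:
    t.join()

# Each CPU's bus-side steps appear as one contiguous, uninterrupted group.
groups = [bus.log[i:i + 4] for i in range(0, len(bus.log), 4)]
print(all(len({cpu for cpu, _ in g}) == 1 for g in groups))   # True
```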
Adding More Bits to Protocols
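- Add a third bit, called Shared, to the Valid and Dirty bits
- Get five states: Modified, Owned, Exclusive, Shared, Invalid (M, O, E, S, I)
  - When Valid is 0 the other two bits do not matter, so the eight bit combinations collapse to five states
- Developed in the context of Futurebus, with the intention of explaining all snoopy protocols, all of which use 3, 4, or 5 states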
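The collapse from three bits to five states can be checked by enumeration. The mapping below is the usual MOESI reading of the bits (Owned = dirty and shared). It is an assumption for illustration, not the bit encoding from the Futurebus specification.

```python
from itertools import product


def moesi_state(valid, dirty, shared):
    """Name the state encoded by the Valid, Dirty, and Shared bits."""
    if not valid:
        return "I"                       # Dirty/Shared are irrelevant when invalid
    if dirty:
        return "O" if shared else "M"    # Owned: dirty but possibly shared
    return "S" if shared else "E"        # clean copies


states = {moesi_state(*bits) for bits in product((False, True), repeat=3)}
print(sorted(states))   # ['E', 'I', 'M', 'O', 'S']
```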
MESI Protocol
- Four-state, write-invalidate
- Improved version of the three-state protocol
  - Clean state split into Exclusive and Shared states
  - Dirty state equivalent to Modified state
- Several slightly different versions of the MESI protocol exist
  - We will describe the version implemented by Futurebus
  - The PowerPC 601 MESI protocol does not support cache-to-cache transfer of blocks
State Diagram of MESI Cache Block (Part 1)
State Diagram of MESI Cache Block (Part 2)
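A minimal MESI simulation sketch for a single block, assuming a shared signal that tells a read-miss requester whether any other cache holds a copy (E if none, S otherwise). A Modified owner supplies the block and MM is updated. The bus labels `"BusRead"`, `"BusReadX"`, and `"Invalidate"` are hypothetical names, and this is a generic MESI variant, not a transcription of the Futurebus state diagrams.

```python
class MESISystem:
    """N caches on one bus, tracking a single block with states "M", "E", "S", "I"."""

    def __init__(self, n_caches, initial_value=0):
        self.state = ["I"] * n_caches
        self.value = [None] * n_caches
        self.memory = initial_value
        self.bus_log = []

    def _snoop(self, master, kind):
        """Broadcast `kind`; return True if any other cache holds a copy (shared signal)."""
        self.bus_log.append((kind, master))
        shared = False
        for other, st in enumerate(self.state):
            if other == master or st == "I":
                continue
            shared = True
            if st == "M":
                self.memory = self.value[other]   # owner supplies the block; MM updated
            # A read leaves other copies Shared; any write request invalidates them.
            self.state[other] = "S" if kind == "BusRead" else "I"
        return shared

    def cpu_read(self, cpu):
        if self.state[cpu] == "I":                       # read miss
            shared = self._snoop(cpu, "BusRead")
            self.value[cpu] = self.memory
            self.state[cpu] = "S" if shared else "E"
        return self.value[cpu]                           # read hit: invisible on bus

    def cpu_write(self, cpu, new_value):
        if self.state[cpu] == "I":                       # write miss: request block, invalidate others
            self._snoop(cpu, "BusReadX")
        elif self.state[cpu] == "S":                     # control signal only, no block transfer
            self._snoop(cpu, "Invalidate")
        # In "E" or "M" the write hit is invisible on the bus.
        self.value[cpu] = new_value
        self.state[cpu] = "M"


m = MESISystem(2)
m.cpu_read(0)          # no other copy: cache 0 goes to E
m.cpu_write(0, 5)      # E -> M with no bus transaction
print(m.bus_log)       # [('BusRead', 0)]
```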
Comparison with Three-State Protocol
- Similarities
  - Read hit invisible on bus
  - All misses handled the same way
- Differences
  - Big improvement in handling write hits
    - Write hit in Exclusive state is invisible on the bus
    - Write hit in Shared state involves no block transfer, only a control signal
- Exclusive state
  - Can be read or written (a write moves the block to Modified)
- Shared state
  - Can be read only
- Modified state
  - Can be read and written
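The write-hit improvement shows up most clearly for private data: one CPU reads a block that no other cache holds, then writes it. The sketch below counts bus transactions for that sequence in each protocol. The function names and transaction labels are hypothetical; the per-state write-hit actions follow the comparison above.

```python
def write_hit_bus_action(protocol, state):
    """Bus action caused by a CPU write hit, or None if the hit is invisible."""
    if protocol == "three-state":
        return {"Clean": "Bus Write Miss", "Dirty": None}[state]
    if protocol == "MESI":
        return {"E": None, "S": "invalidate signal (no block transfer)", "M": None}[state]
    raise ValueError(f"unknown protocol: {protocol}")


def private_read_then_write(protocol):
    """Bus transactions when one CPU reads, then writes, a block no other cache holds."""
    transactions = ["read miss"]   # same in both protocols: misses are visible on the bus
    # With no other copies, the block arrives Clean (three-state) or Exclusive (MESI).
    state = "Clean" if protocol == "three-state" else "E"
    action = write_hit_bus_action(protocol, state)
    if action is not None:
        transactions.append(action)
    return transactions


print(private_read_then_write("three-state"))  # ['read miss', 'Bus Write Miss']
print(private_read_then_write("MESI"))         # ['read miss']
```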
Comments on Write-Invalidate Protocols
- Performance
  - A processor can lose a cache block through invalidation by another processor
  - Average memory access time goes up, since writes to shared blocks take more time (other copies have to be invalidated)
- Implementation
  - The bus and the CPU want to access the same cache simultaneously
    - Either the same block or different blocks, but a conflict nonetheless
  - Three possible solutions
    - Use a single tag array, and accept structural hazards
    - Use two separate tag arrays for bus and CPU, which must now be kept coherent at all times
    - Use a multiported tag array (both Intel Pentium and PowerPC 601 use this solution)