Title: Cache Memories
1 Cache Memories
- Fast processors need fast memories
- Fast RAM (SRAM) is expensive and small (i.e. low memory density)
- DRAM is cheaper and bigger, but slower
- Use a CACHE to get the best of both worlds
2 Computer Memory Hierarchy
3 Cache Memories
- Cache: "hidden" memory
- Cache: a small, very fast memory that holds copies of recently used memory values
- The cache operates transparently to the programmer: it automatically decides what to keep and what to overwrite
- Coherency: ensure that the contents of the cache and the main memory are the SAME (whenever they have to be)
4 Cache Memory Consistency
- Fundamental requirement
- Every read access to a memory address always provides the most up-to-date data at that address
- This requirement has to be satisfied even in a multi-busmaster or multi-processor system, where copies of memory areas may reside in multiple cache memories.
5 Most Code and Data is Local
- Program execution: most code addresses are very close together - think of a loop
- Most data variables are used very frequently
- A block of code will use a small number of variables at a time. Typically, variables are arranged in structures/objects/records, which are stored in a block of memory
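The loop argument above can be sketched numerically. A minimal Python model (the 32-byte line size and 4-byte integers are illustrative assumptions, not from the slides) counts how often one sequential pass over an array touches a new cache line:

```python
LINE_SIZE = 32      # assumed cache line size in bytes
ELEM_SIZE = 4       # assumed 4-byte integers

def sequential_miss_rate(n_elems):
    """Count compulsory misses for one sequential pass over an array."""
    lines_touched = set()
    misses = 0
    for i in range(n_elems):
        line = (i * ELEM_SIZE) // LINE_SIZE   # which line holds element i
        if line not in lines_touched:
            misses += 1                       # first touch of this line: miss
            lines_touched.add(line)
    return misses / n_elems

print(sequential_miss_rate(1024))  # -> 0.125 (1 miss per 8 elements)
```

With 8 elements per line, 7 of every 8 sequential accesses hit in the cache - exactly the locality the cache exploits.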
6 Cache Memory and Main Memory
7 Cache Organisation
- Unified cache for code and data (e.g. i486): more efficient use of resources
- Separate (Harvard) code and data caches (e.g. Pentium): faster because you can access code and data in the same clock cycle
8 Code and Data Cache
9 Cache Hits and Misses
- Cache Hit: if data required by the CPU is in the cache we have a cache hit, otherwise a cache miss
- Cache Hit Rate: the proportion of memory accesses satisfied by the cache; the Miss Rate is the figure more commonly quoted
- To prevent memory bottlenecks the cache miss rate needs to be no more than a few percent
- Cache Line: a block of data held in the cache
- Cache Line Fill: occurs when a block of data is read from main memory into a cache line
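The "few percent" target above follows from the standard average-memory-access-time formula. A sketch with assumed latencies (1-cycle hit, 40-cycle miss penalty; both numbers are illustrative):

```python
def amat(hit_time, miss_penalty, miss_rate):
    """Average memory access time: hit time plus the miss penalty
    weighted by how often we miss."""
    return hit_time + miss_rate * miss_penalty

# Assumed latencies: 1-cycle cache hit, 40-cycle main-memory penalty.
print(amat(1, 40, 0.03125))  # -> 2.25 cycles: ~3% misses keep us near hit speed
print(amat(1, 40, 0.25))     # -> 11.0 cycles: 25% misses erase most of the benefit
```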
10 Cache Definitions
- Cache Line: the smallest unit of storage that can be allocated in a cache. The processor always reads or writes entire cache lines. Popular cache line sizes are 16B-32B
- Cache Set: a group of cache lines into which a given line in memory can be mapped. Every memory address is mapped to a specific set. The number of cache lines per cache set depends on the associativity of the cache. A cache with n cache lines per cache set is called an n-way set-associative cache.
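The mapping of an address to a set can be sketched as follows (the 32-byte line size and 128 sets are assumed example values, not from the slides):

```python
def split_address(addr, line_size, num_sets):
    """Split a byte address into (tag, set index, byte offset).
    line_size and num_sets are assumed to be powers of two."""
    offset = addr % line_size                  # position within the line
    set_index = (addr // line_size) % num_sets # which set the line maps to
    tag = addr // (line_size * num_sets)       # remaining high-order bits
    return tag, set_index, offset

# Assumed geometry: 32-byte lines, 128 sets.
print(split_address(0x1234, 32, 128))  # -> (1, 17, 20)
```

Every address with the same set index competes for the lines of that one set, regardless of its tag.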
11 Cache Organisation
- Direct-mapped cache: 1 cache line per cache set
- 2-way set-associative cache: 2 cache lines per cache set
- 4-way set-associative cache: 4 cache lines per cache set
- 8-way set-associative cache: 8 cache lines per cache set
12 Cache Definitions
- A Cache Entry consists of:
- Cache Directory Entry: contains information about what data is stored in the cache
- Cache Memory Entry: contains the actual cache data (cache lines)
13 Cache Write Strategies
14 Cache Invalidation and Cache Flush
- If another processor or DMA unit writes to a main memory location that is also kept in the cache, the cache controller must perform a Cache Invalidation
- When write-back is used, updated cache data must sometimes be transferred to main memory on demand (e.g. after a DMA request): a Cache Flush
15 Direct Mapped Cache (1)
16 Direct Mapped Cache (2)
- Tag comparison and data access can be performed at the same time: the direct-mapped cache is the fastest organisation
- The Tag RAM is small, so tag access is completed before data access
- Two items with the same cache set address will contend for the use of a single cache entry
- Cache contention can lead to cache thrashing
- Only bits not used to select within the line or to address the cache RAM need to be stored in the tag field
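Contention and thrashing can be demonstrated with a minimal direct-mapped model (assumed toy geometry: 32-byte lines, 8 sets, so addresses 256 bytes apart collide):

```python
# Minimal direct-mapped cache model: one tag per set.
LINE, SETS = 32, 8
cache = [None] * SETS

def access(addr):
    """Return True on a hit; on a miss, fill the line (evicting the occupant)."""
    s = (addr // LINE) % SETS          # set index
    tag = addr // (LINE * SETS)        # tag bits
    if cache[s] == tag:
        return True
    cache[s] = tag                     # line fill evicts the previous occupant
    return False

# Two addresses 256 bytes apart map to set 0 and thrash each other:
hits = sum(access(a) for a in [0, 256] * 8)
print(hits)   # -> 0: every access evicts the line the other one needs
```

In a 2-way set-associative cache the same two addresses would each keep a line and hit after the first access.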
17 2-way Set-Associative Cache (1)
18 2-way Set-Associative Cache (2)
- Effectively 2 x direct-mapped caches in parallel
- Each of two items that were in contention may occupy a separate place in the cache
- Moving from a direct-mapped cache to a 2-way set-associative cache of a given cache size: the set address is one bit smaller and the tag is one bit bigger than in the direct-mapped case. Can you see why?
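One way to see why: for a fixed cache size, doubling the associativity halves the number of sets, so the set index loses one bit and the tag gains one. A sketch assuming 32-bit addresses and an 8 KB cache with 32-byte lines (illustrative numbers):

```python
import math

def field_widths(cache_bytes, line_size, ways, addr_bits=32):
    """Bit widths of (tag, set index, byte offset) for a given cache geometry."""
    num_sets = cache_bytes // (line_size * ways)   # more ways -> fewer sets
    offset_bits = int(math.log2(line_size))
    set_bits = int(math.log2(num_sets))
    tag_bits = addr_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits

# Same 8 KB cache, 32-byte lines, 32-bit addresses:
print(field_widths(8192, 32, 1))   # direct-mapped: (19, 8, 5)
print(field_widths(8192, 32, 2))   # 2-way: (20, 7, 5) - one set bit moved to the tag
```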
19 Cache Line Replacement
- When a cache miss causes a line fill and there is a vacancy in the set, the new line will replace a vacant line
- However, when a cache miss causes a line fill and all lines in the set are occupied, you have to decide which line in the set will be replaced
- This is done through a replacement algorithm. Popular algorithms are:
- Least Recently Used (LRU): controlled by LRU bits in the cache directory
- Random Allocation
- Cyclic
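An LRU policy for a single set can be sketched with an ordered list, newest at the end - a software model of what the LRU bits in the cache directory encode (a 4-way set is assumed):

```python
WAYS = 4   # assumed 4-way set-associative cache

def access(lru, tag):
    """Return True on a hit. On a miss with a full set, evict the LRU line."""
    if tag in lru:
        lru.remove(tag)        # hit: move line to most-recently-used position
        lru.append(tag)
        return True
    if len(lru) == WAYS:
        lru.pop(0)             # evict the least recently used line
    lru.append(tag)            # line fill for the new tag
    return False

lru = []
for t in [1, 2, 3, 4, 1, 5]:   # re-using 1 makes 2 the LRU line; 5 evicts 2
    access(lru, t)
print(lru)                     # -> [3, 4, 1, 5]
```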
20 Cache Consistency (1)
- The challenge in handling caches:
- Data in a cache may not always be the same as the data in main memory
- Remember that the data in the cache is controlled by the CPU that owns the cache
21 Cache Consistency (2)
- Suppose you have a second CPU or another possible Bus Master (e.g. a DMA Controller)
- Suppose this device wants to access some data that's in the CPU's cache
- The second device can only access main memory
- How does it know whether the data it wants to read is up-to-date?
- Solution: Bus Snooping or Inquiry Cycles
22 Cache Consistency (3)
- Inquiry cycles (snoop cycles): initiated by the system to determine whether a line is present in the cache, and what state the line is in.
23 MESI Protocol: What's That?
- A formal mechanism for controlling cache consistency using snooping
- Every cache line is in 1 of 4 MESI states (encoded in 2 bits)
- A cache line can change state through:
- memory read and write cycles
- inquiry cycles
24 MESI States
- Modified: an M-state line is available in only one cache and it is also MODIFIED (different from main memory). An M-state line can be accessed (read/written) without sending a cycle out on the bus
- Exclusive: an E-state line is also available in only one cache in the system, but the line is not MODIFIED (i.e., it is the same as main memory). An E-state line can be accessed (read/written) without generating a bus cycle. A write to an E-state line causes the line to become MODIFIED
25 MESI States
- Shared: this state indicates that the line is potentially shared with other caches (i.e., the same line may exist in more than one cache). A read of an S-state line does not generate bus activity, but a write to a SHARED line generates a write-through cycle on the bus. The write-through cycle may invalidate this line in other caches. A write to an S-state line updates the cache
- Invalid: this state indicates that the line is not available in the cache. A read of this line will be a MISS and may cause the processor to execute a LINE FILL (fetch the whole line into the cache from main memory). A write to an INVALID line causes the processor to execute a write-through cycle on the bus
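The local read/write behaviour described above can be tabulated as a sketch. Inquiry-cycle transitions are omitted, and the next state after an S-state write or a line fill depends on what other caches hold; the entries below are plausible simplifications of the descriptions above, not the full protocol:

```python
# Next state and bus activity for LOCAL processor accesses only,
# simplified from the state descriptions above (inquiry cycles omitted).
MESI = {
    # (state, op): (next_state, bus_cycle)
    ('M', 'read'):  ('M', None),            # no bus cycle needed
    ('M', 'write'): ('M', None),
    ('E', 'read'):  ('E', None),
    ('E', 'write'): ('M', None),            # becomes MODIFIED without a bus cycle
    ('S', 'read'):  ('S', None),
    ('S', 'write'): ('E', 'write-through'), # assumed: other copies get invalidated
    ('I', 'read'):  ('S', 'line-fill'),     # miss; 'S' assumed, could also be E
    ('I', 'write'): ('I', 'write-through'),
}

print(MESI[('E', 'write')])   # -> ('M', None)
```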
26 MESI States
27 MESI State Transitions
28 Cache Consistency and Bus Snooping (Inquiry Cycles) - 1
29 Cache Consistency and Bus Snooping (Inquiry Cycles) - 2
30 Cache Consistency and Bus Snooping (Inquiry Cycles) - 3
31 Cache Consistency and Bus Snooping (Inquiry Cycles) - 4
32 Cache Consistency and Bus Snooping (Inquiry Cycles) - 5
33 Cache Consistency and Bus Snooping (Inquiry Cycles) - 6
34 Cache Consistency and Bus Snooping (Inquiry Cycles) - 7
35 Cache Consistency and Bus Snooping (Inquiry Cycles) - 8
36 Cache Consistency and Bus Snooping (Inquiry Cycles) - 9
37 L2-Caches and the MESI Protocol
- L2-caches are larger than L1 but a bit slower
- The MESI protocol applies to both caches
- Inclusion: all addresses in L1 are also in L2
38 Intel Architecture Caches
39 Pentium L1-Caches
40 Pentium L1 Cache Elements
41 Pentium Page Cacheability
42 Pentium L2-Cache