Title: Lecture 19: Cache Replacement Policy, Line Size, Write Method, and Multilevel Caches
Slide 2: Cache Replacement Policy
- For a direct-mapped cache, a word loaded into the cache goes into one fixed position, replacing whatever was there before.
- For a set-associative or fully associative cache, a word can be loaded into more than one possible location. We need a cache replacement policy to decide which position the new word goes to.
Slide 3: Cache Replacement Policy
- 3 common options:
  - Random replacement
    - Simple
    - But perhaps a high miss rate
  - First in, first out (FIFO)
    - Rationale: the oldest entry is most likely no longer needed
    - Implementation: maintain a queue
  - Least recently used (LRU)
    - Rationale: the entry that has been unused for the longest time is most likely no longer needed
    - Implementation: usually costly to implement
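As an illustrative sketch (not from the slides), LRU replacement for a single cache set can be modeled in Python with an ordered dictionary; the class and method names here are hypothetical:

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement (toy model)."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> line data, least recently used first

    def access(self, tag):
        """Return True on a hit; on a miss, load the tag, evicting the LRU line."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = None               # load the new line (data omitted)
        return False

s = LRUSet(ways=2)
s.access(0b1111)   # miss: set holds 1111
s.access(0b0011)   # miss: set holds 1111, 0011
s.access(0b1111)   # hit: 1111 becomes most recently used
s.access(0b1010)   # miss: evicts 0011, the least recently used line
```

A FIFO policy would differ only in dropping the `move_to_end` call, so a hit does not refresh the entry's position in the queue.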
Slide 4: Assumptions for the following examples
- Assume 1 KB (that is, 1024 bytes) of main memory
- Assume 4-byte words
- Assume 32-word cache
- Assume memory is byte-addressed
- Therefore 1 KB memory needs 10-bit address
Slide 5: Line Size
- Rather than fetching a single word from memory into the cache, fetch a whole block of l words, called a line.
- This takes advantage of spatial locality.
- The number of words in a line is a power of two.
- Example: 4 words in a line
    bit:  9 8 7 6 | 5 4   | 3 2  | 1 0
          Tag     | Index | Line | Word
    (Line = word within the line; Word = byte within the word)
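The tag/index/line/word split can be reproduced with a few shifts and masks. A minimal Python sketch, using the field widths of the worked example (4-bit tag, 2-bit set index, 2-bit word offset, 2-bit byte offset); the function name is hypothetical:

```python
def split_address(addr):
    """Split a 10-bit byte address into (tag, index, word, byte) fields.

    Field widths follow the lecture's example: 4-bit tag, 2-bit set index,
    2-bit word-within-line offset, 2-bit byte-within-word offset.
    """
    byte  = addr        & 0b11    # bits 1-0: byte within word
    word  = (addr >> 2) & 0b11    # bits 3-2: word within line
    index = (addr >> 4) & 0b11    # bits 5-4: set index
    tag   = (addr >> 6) & 0b1111  # bits 9-6: tag
    return tag, index, word, byte

# The address used in the worked example:
tag, index, word, byte = split_address(0b1010110100)
# tag = 1010, index = 11, word = 01, byte = 00
```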
Slide 6: Line Size Example
A 2-way set-associative cache, with 4-word lines, and capacity to contain 32 words.

Address fields (10-bit byte address): bits 9-6 = Tag, bits 5-4 = Index, bits 3-2 = Word within line, bits 1-0 = Byte within word.

Cache contents at index 11 (the other rows, 00 through 10, are omitted):

    Index | Tag 1 | Line 1 | Tag 2 | Line 2
      11  | 1111  | (data) | 0011  | (data)

Scenario: Suppose the CPU requests the word at memory address 1010110100.

Step 1: The index bits are 11, so look at the two tags at index 11. Suppose Tag 1 at index 11 is 1111 and Tag 2 is 0011. Since neither tag matches the tag 1010 requested by the CPU, we load the words at addresses 1010110000 through 1010111100 into the cache.
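The four word addresses fetched on the miss can be computed by clearing the offset bits of the requested address. A sketch assuming 4-byte words and 4-word (16-byte) lines, as in the example; the function name is hypothetical:

```python
def line_word_addresses(addr, line_bytes=16):
    """Byte addresses of the words in the line containing addr.

    Assumes 4-byte words and 4-word (16-byte) lines, as in the example.
    """
    base = addr & ~(line_bytes - 1)           # clear the offset bits
    return [base + 4 * i for i in range(line_bytes // 4)]

line_word_addresses(0b1010110100)
# -> [0b1010110000, 0b1010110100, 0b1010111000, 0b1010111100]
```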
Slide 7: Line Size Example (continued)
Suppose that the cache replacement policy determines that we should replace Set 1; then the line is loaded into Set 1 and the tag is changed.

    Index | Tag 1 | Line 1                                          | Tag 2 | Line 2
      11  | 1010  | 1010110000  1010110100  1010111000  1010111100  | 0011  | (unchanged)

(Only the row at index 11 is shown; Line 1 lists the addresses of its four words.)
Slide 8: Line Size Example (continued)
The CPU memory request was the word at 1010110100. Therefore the second word of the line (bracketed below) is delivered to the CPU.

    Index | Tag 1 | Line 1                                            | Tag 2 | Line 2
      11  | 1010  | 1010110000  [1010110100]  1010111000  1010111100  | 0011  | (data)

If after this the CPU accesses any other word in the line, it will be found in the cache.
Slide 9: Cache Write Method
- Suppose the CPU wants to write a word to memory.
- If the memory unit uses the cache, it has several options.
- 2 commonly used options:
  - Write-through
  - Write-back
Slide 10: Write-Through
- On a write request by the CPU, check whether the old data is in the cache.
- If the old data is in the cache (write hit), write the new data into the cache and also into memory, replacing the old data in both.
- If the old data is not in the cache (write miss), either:
  - Load the line into the cache, and write the new data to both cache and memory (this method is called write-allocate), or
  - Just write the new data to memory and don't load the line into the cache (this method is called no-write-allocate).
- Advantage: keeps cache and memory consistent.
- Disadvantage: needs to stall for a memory access on every memory write.
  - To reduce this problem, use a write buffer. When the CPU wants to write a word to memory, it puts the word into the write buffer and then continues executing the instructions following the memory write. Simultaneously, the write buffer drains the words to memory.
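A toy model of a write-through, no-write-allocate write (an illustrative sketch, not from the slides; the cache here is direct-mapped and stores one word per entry for brevity, and all names are hypothetical):

```python
def write_through(cache, memory, addr, value, index_bits=2, offset_bits=4):
    """Write-through, no-write-allocate (toy model).

    `cache` maps a set index to a (tag, word) pair; `memory` is a list of words.
    """
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    entry = cache.get(index)
    if entry is not None and entry[0] == tag:
        cache[index] = (tag, value)   # write hit: update the cached copy...
    # ...and in every case update memory (on a miss, the line is NOT loaded)
    memory[addr // 4] = value

memory = [0] * 256
cache = {}
write_through(cache, memory, 0b1010110100, 42)
# miss: memory is updated immediately; the cache stays untouched
```

Because memory is updated on every write, cache and memory never disagree, which is exactly the consistency advantage the slide mentions.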
Slide 11: Write-Back
- When an instruction requires a write to memory:
  - If there is a cache hit, write only to the cache. Later, if there is a cache miss and this line needs to be replaced, write the data back to memory.
  - If there is a cache miss, either:
    - Load the line into the cache and write the new data into the cache only, deferring the memory update until the line is evicted (this method is called write-allocate), or
    - Just write the new data to memory and don't load the line into the cache (this method is called no-write-allocate).
- In the write-back method, we have a dirty bit associated with each cache entry. If the dirty bit is 1, we need to write the line back to memory when the entry is replaced. If the dirty bit is 0, we don't need to write back to memory, saving CPU stalls.
- Disadvantage of the write-back approach: inconsistency, since memory can contain stale data.
- Note: write-allocate is usually used with write-back; no-write-allocate is usually used with write-through.
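The dirty-bit mechanism can be sketched with a toy write-back, write-allocate cache (illustrative only, not from the slides; direct-mapped, one word per line for brevity, and all names are hypothetical):

```python
def write_back(cache, memory, addr, value, index_bits=2, offset_bits=4):
    """Write-back, write-allocate (toy model).

    `cache` maps a set index to a dict with tag/addr/data/dirty fields.
    """
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    entry = cache.get(index)
    if entry is not None and entry["tag"] != tag and entry["dirty"]:
        memory[entry["addr"] // 4] = entry["data"]   # evict: flush the dirty line
    # write-allocate: put the new data in the cache, mark it dirty, and defer
    # the memory update until this entry is evicted
    cache[index] = {"tag": tag, "addr": addr, "data": value, "dirty": True}

memory = [0] * 256
cache = {}
write_back(cache, memory, 0b1010110100, 42)
# the write lives only in the cache; memory still holds the stale old value
```

This is the inconsistency the slide warns about: until the dirty line is evicted, memory and cache disagree about that word.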
Slide 12: Cache Loading
- In the beginning, the cache contains junk.
- When the CPU makes a memory access, it compares the tag field of the memory address to the tag in the cache. Even if the tags match, we don't know whether the data is valid.
- Therefore, we add a valid bit to each cache entry.
- In the beginning, all the valid bits are set to 0.
- Later, as data are loaded from memory into the cache, the valid bit for the cache entry is set to 1.
- To check whether a word is in the cache, we have to check both that the cache tag matches the address tag and that the valid bit is 1.
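The hit condition is a conjunction of both checks. A one-function sketch (the field and function names are hypothetical):

```python
def is_hit(entry, addr_tag):
    """A hit requires BOTH a set valid bit and a matching tag."""
    return entry["valid"] == 1 and entry["tag"] == addr_tag

cold = {"valid": 0, "tag": 0b1010}   # junk left over from startup
warm = {"valid": 1, "tag": 0b1010}   # actually loaded from memory

is_hit(cold, 0b1010)   # False: the junk tag happens to match, but valid = 0
is_hit(warm, 0b1010)   # True
```

The `cold` entry shows why the valid bit is needed: without it, leftover junk whose tag happens to match would be returned as real data.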
Slide 13: Instruction and Data Caches
- Can either have a separate Instruction Cache and Data Cache, or one unified cache.
- Advantage of separate caches: the Instruction Cache and Data Cache can be accessed simultaneously in the same cycle, as required by a pipelined datapath.
- Advantage of a unified cache: more flexible, so it may have a higher hit rate.
Slide 14: Multiple-Level Caches
- More levels in the memory hierarchy.
- Can have two levels of cache:
  - The Level-1 cache (or L1 cache, or internal cache) is smaller and faster, and lies in the processor next to the CPU.
  - The Level-2 cache (or L2 cache, or external cache) is larger but slower, and lies outside the processor.
- A memory access first goes to the L1 cache. If the L1 access is a miss, go to the L2 cache. If the L2 access is a miss, go to main memory. If main memory misses, go to virtual memory on the hard disk.
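A standard way to quantify such a hierarchy (not covered explicitly in the slides) is the average memory access time, AMAT; the latencies and miss rates below are illustrative numbers, not from the lecture:

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time for a two-level cache hierarchy (cycles).

    Every access pays the L1 hit time; L1 misses additionally pay the L2
    hit time; L2 misses additionally pay the main-memory latency.
    """
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)

# Illustrative: 1-cycle L1, 10-cycle L2, 100-cycle main memory,
# 5% L1 miss rate, 20% L2 miss rate:
amat(1, 0.05, 10, 0.20, 100)   # -> 1 + 0.05 * (10 + 0.20 * 100) = 2.5 cycles
```

The formula makes the slide's point concrete: even a small, fast L1 with a modest miss rate keeps the average access time close to the L1 hit time.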