Title: Lecture 19: Cache Replacement Policy, Line Size, Write Method, and Multilevel Caches
Slide 2: Cache Replacement Policy
- For a direct-mapped cache, a word loaded into the cache goes into one fixed position, replacing whatever was there before.
- For a set-associative or fully associative cache, a word can be loaded into more than one possible location. We need a cache replacement policy to decide which position the new word goes to.
Slide 3: Cache Replacement Policy
- 3 common options:
  - Random replacement
    - Simple
    - But perhaps a high miss rate
  - First in, first out (FIFO)
    - Rationale: the oldest entry is most likely no longer needed
    - Implementation: maintain a queue
  - Least recently used (LRU)
    - Rationale: the entry that has been unused for the longest time is most likely no longer needed
    - Implementation: usually costly to implement
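As an illustrative sketch (not from the slides), LRU replacement for a single cache set can be modeled in Python with an ordered dictionary; the class and method names here are hypothetical:

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement (toy model)."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> line data, least recently used first

    def access(self, tag):
        """Return True on a hit; on a miss, load the tag, evicting the LRU line."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = None               # load the new line (data omitted)
        return False

s = LRUSet(ways=2)
s.access(0b1111)   # miss: set holds 1111
s.access(0b0011)   # miss: set holds 1111, 0011
s.access(0b1111)   # hit: 1111 becomes most recently used
s.access(0b1010)   # miss: evicts 0011, the least recently used line
```

A FIFO policy would differ only in dropping the `move_to_end` call, so a hit does not refresh the entry's position in the queue.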
Slide 4: Assumptions for the following examples
- Assume 1 KB (that is, 1024 bytes) of main memory
- Assume 4-byte words
- Assume 32-word cache
- Assume memory is byte-addressed
- Therefore 1 KB memory needs 10-bit address
Slide 5: Line Size
- Rather than fetching a single word from memory into the cache, fetch a whole block of l words, called a line.
- This takes advantage of spatial locality.
- The number of words in a line is a power of two.
- Example: 4 words in a line
    bit:  9 8 7 6 | 5 4   | 3 2  | 1 0
          Tag     | Index | Line | Word
    (Line = word within the line; Word = byte within the word)
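The tag/index/line/word split can be reproduced with a few shifts and masks. A minimal Python sketch, using the field widths of the worked example (4-bit tag, 2-bit set index, 2-bit word offset, 2-bit byte offset); the function name is hypothetical:

```python
def split_address(addr):
    """Split a 10-bit byte address into (tag, index, word, byte) fields.

    Field widths follow the lecture's example: 4-bit tag, 2-bit set index,
    2-bit word-within-line offset, 2-bit byte-within-word offset.
    """
    byte  = addr        & 0b11    # bits 1-0: byte within word
    word  = (addr >> 2) & 0b11    # bits 3-2: word within line
    index = (addr >> 4) & 0b11    # bits 5-4: set index
    tag   = (addr >> 6) & 0b1111  # bits 9-6: tag
    return tag, index, word, byte

# The address used in the worked example:
tag, index, word, byte = split_address(0b1010110100)
# tag = 1010, index = 11, word = 01, byte = 00
```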
Slide 6: Line Size Example
A 2-way set-associative cache, with 4-word lines, and capacity to contain 32 words.

Address fields (10-bit byte address): bits 9-6 = Tag, bits 5-4 = Index, bits 3-2 = Word within line, bits 1-0 = Byte within word.

Cache contents at index 11 (the other rows, 00 through 10, are omitted):

    Index | Tag 1 | Line 1 | Tag 2 | Line 2
      11  | 1111  | (data) | 0011  | (data)

Scenario: Suppose the CPU requests the word at memory address 1010110100.

Step 1: The index bits are 11, so look at the two tags at index 11. Suppose Tag 1 at index 11 is 1111 and Tag 2 is 0011. Since neither tag matches the tag 1010 requested by the CPU, we load the words at addresses 1010110000 through 1010111100 into the cache.
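The four word addresses fetched on the miss can be computed by clearing the offset bits of the requested address. A sketch assuming 4-byte words and 4-word (16-byte) lines, as in the example; the function name is hypothetical:

```python
def line_word_addresses(addr, line_bytes=16):
    """Byte addresses of the words in the line containing addr.

    Assumes 4-byte words and 4-word (16-byte) lines, as in the example.
    """
    base = addr & ~(line_bytes - 1)           # clear the offset bits
    return [base + 4 * i for i in range(line_bytes // 4)]

line_word_addresses(0b1010110100)
# -> [0b1010110000, 0b1010110100, 0b1010111000, 0b1010111100]
```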
Slide 7: Line Size Example (continued)
Suppose that the cache replacement policy determines that we should replace Set 1; then the line is loaded into Set 1 and the tag is changed.

    Index | Tag 1 | Line 1                                          | Tag 2 | Line 2
      11  | 1010  | 1010110000  1010110100  1010111000  1010111100  | 0011  | (unchanged)

(Only the row at index 11 is shown; Line 1 lists the addresses of its four words.)
Slide 8: Line Size Example (continued)
The CPU memory request was the word at 1010110100. Therefore the second word of the line (bracketed below) is delivered to the CPU.

    Index | Tag 1 | Line 1                                            | Tag 2 | Line 2
      11  | 1010  | 1010110000  [1010110100]  1010111000  1010111100  | 0011  | (data)

If after this the CPU accesses any other word in the line, it will be found in the cache.
Slide 9: Cache Write Method
- Suppose the CPU wants to write a word to memory.
- If the memory unit uses the cache, it has several options.
- 2 commonly used options:
  - Write-through
  - Write-back
Slide 10: Write-Through
- On a write request by the CPU, check whether the old data is in the cache.
- If the old data is in the cache (write hit), write the new data into the cache and also into memory, replacing the old data in both.
- If the old data is not in the cache (write miss), either:
  - Load the line into the cache, and write the new data to both cache and memory (this method is called write-allocate), or
  - Just write the new data to memory and don't load the line into the cache (this method is called no-write-allocate).
- Advantage: keeps cache and memory consistent.
- Disadvantage: needs to stall for a memory access on every memory write.
  - To reduce this problem, use a write buffer. When the CPU wants to write a word to memory, it puts the word into the write buffer and then continues executing the instructions following the memory write. Simultaneously, the write buffer drains the words to memory.
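A toy model of a write-through, no-write-allocate write (an illustrative sketch, not from the slides; the cache here is direct-mapped and stores one word per entry for brevity, and all names are hypothetical):

```python
def write_through(cache, memory, addr, value, index_bits=2, offset_bits=4):
    """Write-through, no-write-allocate (toy model).

    `cache` maps a set index to a (tag, word) pair; `memory` is a list of words.
    """
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    entry = cache.get(index)
    if entry is not None and entry[0] == tag:
        cache[index] = (tag, value)   # write hit: update the cached copy...
    # ...and in every case update memory (on a miss, the line is NOT loaded)
    memory[addr // 4] = value

memory = [0] * 256
cache = {}
write_through(cache, memory, 0b1010110100, 42)
# miss: memory is updated immediately; the cache stays untouched
```

Because memory is updated on every write, cache and memory never disagree, which is exactly the consistency advantage the slide mentions.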
Slide 11: Write-Back
- When an instruction requires a write to memory:
  - If there is a cache hit, write only to the cache. Later, if there is a cache miss and this line needs to be replaced, write the data back to memory.
  - If there is a cache miss, either:
    - Load the line into the cache and write the new data into the cache only, deferring the memory update until the line is evicted (this method is called write-allocate), or
    - Just write the new data to memory and don't load the line into the cache (this method is called no-write-allocate).
- In the write-back method, we have a dirty bit associated with each cache entry. If the dirty bit is 1, we need to write the line back to memory when the entry is replaced. If the dirty bit is 0, we don't need to write back to memory, saving CPU stalls.
- Disadvantage of the write-back approach: inconsistency, since memory can contain stale data.
- Note: write-allocate is usually used with write-back; no-write-allocate is usually used with write-through.
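The dirty-bit mechanism can be sketched with a toy write-back, write-allocate cache (illustrative only, not from the slides; direct-mapped, one word per line for brevity, and all names are hypothetical):

```python
def write_back(cache, memory, addr, value, index_bits=2, offset_bits=4):
    """Write-back, write-allocate (toy model).

    `cache` maps a set index to a dict with tag/addr/data/dirty fields.
    """
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    entry = cache.get(index)
    if entry is not None and entry["tag"] != tag and entry["dirty"]:
        memory[entry["addr"] // 4] = entry["data"]   # evict: flush the dirty line
    # write-allocate: put the new data in the cache, mark it dirty, and defer
    # the memory update until this entry is evicted
    cache[index] = {"tag": tag, "addr": addr, "data": value, "dirty": True}

memory = [0] * 256
cache = {}
write_back(cache, memory, 0b1010110100, 42)
# the write lives only in the cache; memory still holds the stale old value
```

This is the inconsistency the slide warns about: until the dirty line is evicted, memory and cache disagree about that word.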
Slide 12: Cache Loading
- In the beginning, the cache contains junk.
- When the CPU makes a memory access, it compares the tag field of the memory address to the tag in the cache. Even if the tags match, we don't know whether the data is valid.
- Therefore, we add a valid bit to each cache entry.
- In the beginning, all the valid bits are set to 0.
- Later, as data are loaded from memory into the cache, the valid bit for the cache entry is set to 1.
- To check whether a word is in the cache, we have to check both that the cache tag matches the address tag and that the valid bit is 1.
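The hit condition is a conjunction of both checks. A one-function sketch (the field and function names are hypothetical):

```python
def is_hit(entry, addr_tag):
    """A hit requires BOTH a set valid bit and a matching tag."""
    return entry["valid"] == 1 and entry["tag"] == addr_tag

cold = {"valid": 0, "tag": 0b1010}   # junk left over from startup
warm = {"valid": 1, "tag": 0b1010}   # actually loaded from memory

is_hit(cold, 0b1010)   # False: the junk tag happens to match, but valid = 0
is_hit(warm, 0b1010)   # True
```

The `cold` entry shows why the valid bit is needed: without it, leftover junk whose tag happens to match would be returned as real data.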
Slide 13: Instruction and Data Caches
- Can either have a separate Instruction Cache and Data Cache, or one unified cache.
- Advantage of separate caches: the Instruction Cache and Data Cache can be accessed simultaneously in the same cycle, as required by a pipelined datapath.
- Advantage of a unified cache: more flexible, so it may have a higher hit rate.
Slide 14: Multiple-Level Caches
- More levels in the memory hierarchy.
- Can have two levels of cache:
  - The Level-1 cache (or L1 cache, or internal cache) is smaller and faster, and lies in the processor next to the CPU.
  - The Level-2 cache (or L2 cache, or external cache) is larger but slower, and lies outside the processor.
- A memory access first goes to the L1 cache. If the L1 access is a miss, go to the L2 cache. If the L2 access is a miss, go to main memory. If main memory misses, go to virtual memory on the hard disk.
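A standard way to quantify such a hierarchy (not covered explicitly in the slides) is the average memory access time, AMAT; the latencies and miss rates below are illustrative numbers, not from the lecture:

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time for a two-level cache hierarchy (cycles).

    Every access pays the L1 hit time; L1 misses additionally pay the L2
    hit time; L2 misses additionally pay the main-memory latency.
    """
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)

# Illustrative: 1-cycle L1, 10-cycle L2, 100-cycle main memory,
# 5% L1 miss rate, 20% L2 miss rate:
amat(1, 0.05, 10, 0.20, 100)   # -> 1 + 0.05 * (10 + 0.20 * 100) = 2.5 cycles
```

The formula makes the slide's point concrete: even a small, fast L1 with a modest miss rate keeps the average access time close to the L1 hit time.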