Title: Cache Memory
1 Cache Memory
2 Outline
- General concepts
- 3 ways to organize cache memory
- Issues with writes
- Suggested Reading 6.4
3 Cache Memory
- History
- At the very beginning, 3 levels
- Registers, main memory, disk storage
- 10 years later, 4 levels
- Registers, SRAM cache, main DRAM memory, disk storage
- Modern processors, 4-5 levels
- Registers, SRAM L1, L2 (,L3) cache, main DRAM memory, disk storage
- Cache memories
- are small, fast SRAM-based memories
- are managed by hardware automatically
- can be on-chip, on-die, or off-chip
4 Cache Memory
[Figure, pp. 488: bus structure connecting the CPU and caches. The CPU chip contains the register file, ALU, L1 cache, and bus interface; a cache bus connects to the L2 cache; the bus interface connects over the system bus to the I/O bridge, and over the memory bus to main memory.]
5 Cache Memory
- L1 cache is on-chip
- L2 cache was off-chip until a few years ago
- L3 cache can be off-chip or on-chip
- The CPU looks for data first in L1, then in L2, then in main memory
- Caches hold frequently accessed blocks of main memory
6 Inserting an L1 cache between the CPU and main memory
7 Generic Cache Memory Organization
8 Cache Memory
9 Cache Memory
10 Addressing caches
11 Direct-mapped cache
- The simplest kind of cache
- Characterized by exactly one line per set (a struct sketch follows below)
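As a concrete picture, here is a minimal C sketch of this organization; the struct layout and the parameter values S = 4 and B = 2 are illustrative choices (they match the simulation later in the deck), not definitions from the slides. A line holds a valid bit, a tag, and a B-byte block, and a direct-mapped cache is an array of S sets with one line each.

#include <stdint.h>

#define S 4   /* number of sets      */
#define B 2   /* block size in bytes */

/* One cache line: valid bit, tag, and a B-byte data block. */
typedef struct {
    int      valid;
    uint64_t tag;
    uint8_t  block[B];
} cache_line_t;

/* Direct-mapped cache: S sets, exactly one line per set (E = 1). */
typedef struct {
    cache_line_t sets[S];
} dm_cache_t;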
12 Accessing direct-mapped caches
- Set selection
- Use the set index bits to determine the set of interest
13 Accessing direct-mapped caches
- Line matching and word extraction
- Find a valid line in the selected set with a matching tag (line matching)
- Then extract the word (word selection), as in the sketch below
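A minimal sketch of these steps, reusing the types from the earlier struct sketch; the bit widths (1 offset bit, 2 set-index bits) and the function name are illustrative, matching the small cache used in the simulation slides.

#include <stdbool.h>
#include <stdint.h>

#define B_BITS 1   /* b = log2(B): block offset bits */
#define S_BITS 2   /* s = log2(S): set index bits    */

/* Returns true on a hit and stores the addressed byte in *out;
 * returns false on a miss (the block must then be fetched from
 * the next level of the memory hierarchy).                      */
bool dm_read_byte(dm_cache_t *c, uint64_t addr, uint8_t *out)
{
    uint64_t offset = addr & ((1u << B_BITS) - 1);             /* block offset  */
    uint64_t set    = (addr >> B_BITS) & ((1u << S_BITS) - 1); /* set selection */
    uint64_t tag    = addr >> (B_BITS + S_BITS);               /* remaining bits */

    cache_line_t *line = &c->sets[set];
    if (line->valid && line->tag == tag) {   /* line matching   */
        *out = line->block[offset];          /* word extraction */
        return true;
    }
    return false;
}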
14 Accessing direct-mapped caches
15 Line Replacement on Misses in Direct-Mapped Caches
- If the cache misses
- Retrieve the requested block from the next level in the memory hierarchy
- Store the new block in one of the cache lines of the set indicated by the set index bits
16 Line Replacement on Misses in Direct-Mapped Caches
- If the set is full of valid cache lines
- One of the existing lines must be evicted
- For a direct-mapped cache
- Each set contains only one line
- The current line is replaced by the newly fetched line
17 Direct-mapped cache simulation
- M = 16 byte addresses
- B = 2 bytes/block, S = 4 sets, E = 1 entry/set
18 Direct-mapped cache simulation
M = 16 byte addresses, B = 2 bytes/block, S = 4 sets, E = 1 entry/set
Address trace (reads): 0 [0000], 1 [0001], 13 [1101], 8 [1000], 0 [0000]
20 Direct-mapped cache simulation
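The trace above can be replayed with a small, self-contained C program; this is a sketch that only tracks valid bits and tags (no data), modeling S = 4 sets, E = 1 line/set, B = 2 bytes/block with 4-bit addresses. It reports a miss, a hit, and then three more misses; the final access to address 0 misses again because the access to address 8 evicted its block.

#include <stdio.h>

int main(void)
{
    /* Direct-mapped cache: S = 4 sets, E = 1 line/set, B = 2 bytes/block.
     * 4-bit addresses: 1 tag bit, 2 set-index bits, 1 block-offset bit.   */
    int      valid[4] = {0};
    unsigned tag[4]   = {0};

    unsigned trace[] = {0, 1, 13, 8, 0};      /* address trace (reads) */
    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++) {
        unsigned addr = trace[i];
        unsigned set  = (addr >> 1) & 0x3;    /* middle two bits       */
        unsigned t    = addr >> 3;            /* high bit              */

        if (valid[set] && tag[set] == t) {
            printf("addr %2u: hit  (set %u)\n", addr, set);
        } else {
            printf("addr %2u: miss (set %u)\n", addr, set);
            valid[set] = 1;                   /* fetch the block and      */
            tag[set]   = t;                   /* replace the current line */
        }
    }
    return 0;
}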
21 Why use middle bits as index?
- High-Order Bit Indexing
- Adjacent memory lines would map to the same cache entry
- Poor use of spatial locality
- Middle-Order Bit Indexing
- Consecutive memory lines map to different cache lines
- Can hold a C-byte region of the address space in the cache at one time (compare the two indexings in the sketch below)
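The contrast is easy to see by computing both indexings for consecutive block addresses. This sketch reuses the 4-bit-address, 4-set, 2-byte-block parameters from the simulation; with only 8 blocks, high-order indexing packs adjacent blocks into the same set, while middle-bit indexing spreads any 4 consecutive blocks across all 4 sets.

#include <stdio.h>

int main(void)
{
    /* 4-bit addresses, B = 2 bytes/block, S = 4 sets: block numbers have
     * 3 bits, and the set index uses 2 of them.                          */
    for (unsigned block = 0; block < 8; block++) {
        unsigned middle = block & 0x3;   /* middle bits: low 2 bits of the block number */
        unsigned high   = block >> 1;    /* high-order bits: top 2 bits                 */
        printf("block %u -> set %u (middle-bit)  set %u (high-order)\n",
               block, middle, high);
    }
    return 0;
}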
22 Set associative caches
- Characterized by more than one line per set
23 Accessing set associative caches
- Set selection
- Identical to the direct-mapped cache
24 Accessing set associative caches
- Line matching and word selection
- Must compare the tag in each valid line in the selected set, as in the sketch below
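A sketch of the line-matching and word-selection step for a 2-way set; E = 2 is an illustrative choice, and the types mirror the earlier direct-mapped sketch but are repeated here so the fragment stands alone.

#include <stdbool.h>
#include <stdint.h>

#define E 2   /* lines per set (2-way set associative) */
#define B 2   /* bytes per block                       */

typedef struct {
    int      valid;
    uint64_t tag;
    uint8_t  block[B];
} line_t;

typedef struct {
    line_t lines[E];
} set_t;

/* After set selection (identical to the direct-mapped case), the tag must
 * be compared against every valid line in the selected set.               */
bool sa_match(const set_t *set, uint64_t tag, unsigned offset, uint8_t *out)
{
    for (int i = 0; i < E; i++) {
        if (set->lines[i].valid && set->lines[i].tag == tag) {
            *out = set->lines[i].block[offset];   /* word selection */
            return true;                          /* hit            */
        }
    }
    return false;                                 /* miss           */
}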
25 Fully associative caches
- Characterized by all of the lines being in a single set
- No set index bits in the address
26 Accessing fully associative caches
- Line matching and word selection
- Must compare the tag in each valid line
27 Issues with Writes
- Write hits (both policies are sketched below)
- Write through
- The cache updates its copy
- Immediately writes the corresponding cache block to memory
- Write back
- Defers the memory update as long as possible
- Writes the updated block to memory only when it is evicted from the cache
- Maintains a dirty bit for each cache line
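A sketch of the two write-hit policies for a single line, assuming a per-line dirty bit as described above; the memory-write helper is a hypothetical stand-in for the next level of the hierarchy.

#include <stdint.h>
#include <stdio.h>

typedef struct {
    int      valid;
    int      dirty;      /* write-back: block is newer than memory */
    uint64_t tag;
    uint8_t  block[2];
} line_t;

/* Hypothetical stub standing in for a real write to main memory. */
static void write_byte_to_memory(uint64_t addr, uint8_t byte)
{
    printf("mem[%llu] <- %u\n", (unsigned long long)addr, (unsigned)byte);
}

/* Write-through: update the cache copy and memory immediately. */
void write_hit_through(line_t *line, uint64_t addr, unsigned off, uint8_t byte)
{
    line->block[off] = byte;
    write_byte_to_memory(addr, byte);
}

/* Write-back: update only the cache copy and mark the line dirty;
 * memory is updated later, when the line is evicted.              */
void write_hit_back(line_t *line, unsigned off, uint8_t byte)
{
    line->block[off] = byte;
    line->dirty = 1;
}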
28 Issues with Writes
- Write misses (both policies are sketched below)
- Write-allocate
- Loads the corresponding memory block into the cache
- Then updates the cache block
- No-write-allocate
- Bypasses the cache
- Writes the word directly to memory
- Combination
- Write through, no-write-allocate
- Write back, write-allocate
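And a matching sketch of the two write-miss policies; the stubs for the next level of the hierarchy are hypothetical, and write-allocate is shown paired with write-back while no-write-allocate is paired with write-through, following the combinations listed above.

#include <stdint.h>
#include <string.h>

#define B 2   /* bytes per block */

typedef struct {
    int      valid, dirty;
    uint64_t tag;
    uint8_t  block[B];
} line_t;

/* Hypothetical stubs standing in for the next level of the hierarchy. */
static void fetch_block_from_memory(uint64_t addr, uint8_t *buf) { (void)addr; memset(buf, 0, B); }
static void write_byte_to_memory(uint64_t addr, uint8_t byte)    { (void)addr; (void)byte; }

/* Write-allocate: load the containing block into the cache, then update it. */
void write_miss_allocate(line_t *line, uint64_t addr, uint64_t tag,
                         unsigned off, uint8_t byte)
{
    fetch_block_from_memory(addr & ~(uint64_t)(B - 1), line->block);
    line->valid = 1;
    line->tag   = tag;
    line->block[off] = byte;   /* update the freshly loaded block      */
    line->dirty = 1;           /* defer the memory update (write-back) */
}

/* No-write-allocate: bypass the cache, write the word directly to memory. */
void write_miss_no_allocate(uint64_t addr, uint8_t byte)
{
    write_byte_to_memory(addr, byte);
}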
29 Multi-level caches
30 Cache performance metrics
- Miss Rate
- Fraction of memory references not found in the cache (misses / references)
- Typical numbers: 3-10% for L1
- Hit Rate
- Fraction of memory references found in the cache (1 - miss rate)
31 Cache performance metrics
- Hit Time
- Time to deliver a line in the cache to the processor (includes the time to determine whether the line is in the cache)
- Typical numbers
- 1-2 clock cycles for L1
- 5-10 clock cycles for L2
- Miss Penalty
- Additional time required because of a miss
- Typically 25-100 cycles for main memory
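Putting these metrics together, average memory access time can be estimated as hit time + miss rate x miss penalty. A short sketch using values drawn from the typical ranges above; the specific numbers are illustrative, not from the slides.

#include <stdio.h>

int main(void)
{
    /* Illustrative values taken from the typical ranges on this slide. */
    double hit_time     = 2.0;    /* cycles for an L1 hit          */
    double miss_rate    = 0.05;   /* 5% of references miss in L1   */
    double miss_penalty = 100.0;  /* cycles to reach main memory   */

    /* Average memory access time = hit time + miss rate * miss penalty. */
    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat);   /* 2 + 0.05 * 100 = 7.0 cycles */
    return 0;
}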
32 Cache performance metrics
- Cache size
- Hit rate vs. hit time
- Block size
- Spatial locality vs. temporal locality
- Associativity
- Thrashing
- Cost
- Speed
- Miss penalty
- Write strategy
- Write-through: simple, cheaper read misses; write-back: fewer transfers