Title: Memory Hierarchy Design (Chapter 5)
1. Memory Hierarchy Design (Chapter 5)
2. Background
- 1980: no caches
- 1995: two levels of caches
- 2004: even three levels of caches
- Why?
- The Processor-Memory gap
3. Processor-Memory Gap
[Figure: performance (log scale) vs. year, 1980-2000. Processor (µProc) performance grows at ~60%/yr while DRAM performance grows at ~7%/yr, so the gap widens every year. Source: lecture handouts, Prof. John Kubiatowicz, CS252, U.C. Berkeley]
4. Because
- Memory speed is a limiting factor in performance
- Caches are small and fast
- Caches leverage the principle of locality:
  - Temporal locality: data that has been referenced recently tends to be re-referenced soon
  - Spatial locality: data close (in the address space) to recently referenced data tends to be referenced soon
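A small illustration (mine, not from the slides): summing a row-major C matrix row by row walks consecutive addresses and uses every word of each fetched cache block, while the column-by-column version strides through memory and wastes most of each block.

#include <stdio.h>

#define N 1024
static double a[N][N];  /* row-major: a[i][j] and a[i][j+1] are adjacent */

int main(void) {
    double sum = 0.0;

    /* Good spatial locality: consecutive iterations read consecutive
       addresses, so each fetched cache block is fully used. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Poor spatial locality: a stride of N*8 bytes between accesses,
       so each cache block yields only one element before eviction. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];

    printf("%f\n", sum);
    return 0;
}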
5. Review
- Cache block: the minimum unit of information that can be present in the cache (several contiguous memory positions)
- Cache hit: the requested data can be found in the cache
- Cache miss: the requested data cannot be found in the cache
- The four design questions:
  - Where can a block be placed?
  - How can a block be found?
  - Which block should be replaced?
  - What happens on a write?
6. Where can a block be placed?
Suppose we need to place block 10 in an 8-block cache:
- Direct mapped (1-way): 10 mod 8 = block 2
- 2-way set associative: 10 mod 4 = set 2
- 4-way set associative: 10 mod 2 = set 0
- Fully associative (8-way, in this case): anywhere

Placement: set = (block address) mod (# sets), where (# sets) = (cache size in blocks) / (# ways).
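As a minimal sketch, this placement rule maps directly to code; block 10 and the 8-block cache are the slide's example, everything else is illustrative.

#include <stdio.h>

/* Set index for a given block address: set = block mod (# sets),
   where # sets = (# blocks in cache) / (# ways). */
unsigned set_index(unsigned block_addr, unsigned num_blocks, unsigned ways) {
    unsigned num_sets = num_blocks / ways;
    return block_addr % num_sets;
}

int main(void) {
    /* Reproduce the slide's examples for block 10 in an 8-block cache. */
    printf("1-way: set %u\n", set_index(10, 8, 1)); /* 10 mod 8 = 2 */
    printf("2-way: set %u\n", set_index(10, 8, 2)); /* 10 mod 4 = 2 */
    printf("4-way: set %u\n", set_index(10, 8, 4)); /* 10 mod 2 = 0 */
    /* 8-way (fully associative): one set, so 10 mod 1 = 0. */
    printf("8-way: set %u\n", set_index(10, 8, 8));
    return 0;
}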
7. How can a block be found?
Look at the address! The block address splits into a tag and an index, followed by the block offset:

[ Tag | Index | Block Offset ]   (Tag + Index = Block Address)

- Index: determines the set (there is no index in fully associative caches)
- Block offset: determines the offset within the block
- Tag: the block's unique id, like a primary key
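A hedged sketch of decoding these fields, assuming 64-byte blocks and 64 sets (illustrative parameters, not from the slides):

#include <stdio.h>
#include <stdint.h>

/* Illustrative geometry: 64-byte blocks, 64 sets. */
#define OFFSET_BITS 6   /* log2(block size) */
#define INDEX_BITS  6   /* log2(# sets)     */

int main(void) {
    uint32_t addr = 0x12345678;

    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    /* tag + index together form the block address. */
    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}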
8. Which block should be replaced?
- Random
- Least Recently Used (LRU)
  - True LRU may be too costly to implement in hardware (it requires a stack)
  - Simplified LRU (a software model of exact LRU follows this list)
- First In, First Out (FIFO)
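A minimal software model of exact LRU for one set, using per-way age counters (illustrative only; real hardware typically uses cheaper approximations such as tree pseudo-LRU):

#include <stdio.h>
#include <stdint.h>

#define WAYS 4

struct set {
    uint32_t tag[WAYS];
    uint8_t  age[WAYS];  /* 0 = most recently used */
};

/* Called on a hit or a fill: mark `way` most recent, age the others. */
static void touch(struct set *s, int way) {
    for (int w = 0; w < WAYS; w++)
        if (s->age[w] < s->age[way])
            s->age[w]++;
    s->age[way] = 0;
}

/* Victim selection: the least recently used way (largest age). */
static int victim(const struct set *s) {
    int v = 0;
    for (int w = 1; w < WAYS; w++)
        if (s->age[w] > s->age[v])
            v = w;
    return v;
}

int main(void) {
    struct set s = { .tag = {0}, .age = {0, 1, 2, 3} };
    touch(&s, 2);                           /* way 2 becomes most recent */
    printf("victim: way %d\n", victim(&s)); /* way 3 is now the oldest   */
    return 0;
}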
9. What happens on a write?
- Write through: every time a block is written, the new value is propagated to the next memory level
  - Easier to implement
  - Makes displacement simple and fast
  - Reads never have to wait for a displacement to finish
  - Writes may have to wait → use a write buffer
- Write back: the new value is propagated to the next memory level only when the block is displaced
  - Makes writes fast
  - Uses less memory bandwidth
    - A dirty bit may save additional bandwidth: no need to write back clean blocks
  - Saves power
10. What happens on a write? (cont.)
- Write allocate (fetch on write): the entire block is brought into the cache
- No write allocate (write around): the written word is sent directly to the next memory level
- Write policy and write-miss policy are independent, but usually:
  - Write back → write allocate
  - Write through → no write allocate
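A sketch of both usual pairings on a store; the cache_* and memory_write helpers are hypothetical stand-ins for real cache machinery, declared but left undefined:

#include <stdbool.h>

enum policy { WRITE_THROUGH_NO_ALLOCATE, WRITE_BACK_ALLOCATE };

/* Hypothetical helpers (assumed, not a real API). */
bool cache_lookup(unsigned addr);
void cache_fill(unsigned addr);          /* fetch block into the cache */
void cache_write(unsigned addr, int v);  /* update the cached copy     */
void cache_set_dirty(unsigned addr);
void memory_write(unsigned addr, int v); /* next memory level          */

void store(enum policy p, unsigned addr, int v) {
    bool hit = cache_lookup(addr);

    if (p == WRITE_THROUGH_NO_ALLOCATE) {
        if (hit)
            cache_write(addr, v);  /* keep the cached copy in sync */
        memory_write(addr, v);     /* always propagate downstream  */
    } else { /* WRITE_BACK_ALLOCATE */
        if (!hit)
            cache_fill(addr);      /* write allocate: fetch on write  */
        cache_write(addr, v);
        cache_set_dirty(addr);     /* written back only when displaced */
    }
}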
11. Cache Performance
- AMAT = (hit time) + (miss rate) × (miss penalty)

Which system is faster?

               Unified cache    Split I-cache   Split D-cache
Size           32KB             16KB            16KB
Miss rate      1.99%            0.64%           6.47%
Hit time       1 (I) / 2 (D)    1               1
Miss penalty   50               50              50

75% of accesses are instruction references.
12. Solution
- AMAT(split) = 0.75 × (1 + 0.0064 × 50) + 0.25 × (1 + 0.0647 × 50) = 2.05
- AMAT(unified) = 0.75 × (1 + 0.0199 × 50) + 0.25 × (2 + 0.0199 × 50) = 2.24
- Miss rate(split) = 0.75 × 0.64% + 0.25 × 6.47% = 2.10%
- Miss rate(unified) = 1.99%
- Although split has a higher miss rate, it is faster on average!
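The same arithmetic as a runnable check, with the values taken straight from the table above:

#include <stdio.h>

/* AMAT = hit time + miss rate x miss penalty, weighted 75% instruction
   references / 25% data references. */
int main(void) {
    double penalty = 50.0;

    /* Split: 16KB I-cache (0.64% misses), 16KB D-cache (6.47% misses). */
    double split = 0.75 * (1 + 0.0064 * penalty)
                 + 0.25 * (1 + 0.0647 * penalty);

    /* Unified 32KB: 1.99% misses; data hits take 2 cycles (port conflict). */
    double unified = 0.75 * (1 + 0.0199 * penalty)
                   + 0.25 * (2 + 0.0199 * penalty);

    printf("AMAT split   = %.3f cycles\n", split);   /* 2.049; slide: 2.05 */
    printf("AMAT unified = %.3f cycles\n", unified); /* 2.245; slide: 2.24 */
    return 0;
}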
13. Processor Performance
- CPU time = (proc cycles + mem stall cycles) × (clock cycle time)
- proc cycles = IC × CPI
- mem stall cycles = (mem accesses) × (miss rate) × (miss penalty)

What is the total CPU time, including the caches, as a function of IC and clock cycle time?

CPI (proc) = 2.0
Miss penalty = 50 cycles
Miss rate = 2%
Mem refs/inst = 1.33
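One way to work the numbers (my check, not part of the slides): stalls add 1.33 × 0.02 × 50 = 1.33 cycles per instruction, so CPU time = IC × (2.0 + 1.33) × clock cycle time = 3.33 × IC × clock cycle time.

#include <stdio.h>

/* Effective CPI = base CPI + (refs/inst) x (miss rate) x (miss penalty),
   using the slide's numbers. Total CPU time = IC x 3.33 x clk cycle time. */
int main(void) {
    double cpi       = 2.0;   /* processor CPI, ignoring misses      */
    double penalty   = 50.0;  /* cycles                              */
    double miss_rate = 0.02;
    double refs_inst = 1.33;  /* memory references per instruction   */

    double stall_cpi = refs_inst * miss_rate * penalty;  /* 1.33 */
    printf("effective CPI = %.2f\n", cpi + stall_cpi);   /* 3.33 */
    return 0;
}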
14. Processor Performance
- AMAT has a large impact on performance
- If CPI decreases, mem stall cycles represent a larger fraction of total cycles
- If the clock cycle time decreases, mem stall cycles represent more cycles

Note: in out-of-order execution processors, part of the memory access latency is overlapped with computation.
15. Improving Cache Performance

- AMAT = (hit time) + (miss rate) × (miss penalty)
- Reducing hit time:
  - Small and simple caches
  - No address translation
  - Pipelined cache access
  - Trace caches
16. Improving Cache Performance

- AMAT = (hit time) + (miss rate) × (miss penalty)
- Reducing miss rate:
  - Larger block size
  - Larger cache size
  - Higher associativity
  - Way prediction or pseudo-associative caches
  - Compiler optimizations (code/data layout)
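As an example of the compiler/layout item above, a hedged sketch of loop blocking (tiling) for matrix multiply; N and the tile size B are illustrative and assume B divides N:

#define N 512
#define B 32   /* chosen so a few BxB tiles fit in cache */

/* Tiling shrinks the working set so each BxB tile stays resident
   in cache while it is reused, cutting the miss rate. */
void matmul_blocked(double C[N][N], double A[N][N], double Bm[N][N]) {
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int kk = 0; kk < N; kk += B)
                /* Operate on one BxB tile at a time. */
                for (int i = ii; i < ii + B; i++)
                    for (int j = jj; j < jj + B; j++) {
                        double sum = C[i][j];
                        for (int k = kk; k < kk + B; k++)
                            sum += A[i][k] * Bm[k][j];
                        C[i][j] = sum;
                    }
}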
17. Improving Cache Performance

- AMAT = (hit time) + (miss rate) × (miss penalty)
- Reducing miss penalty:
  - Multilevel caches (see the decomposition after this list)
  - Critical word first
  - Read miss before write miss
  - Merging write buffers
  - Victim caches
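With two levels, for instance, the AMAT formula expands recursively: AMAT = hit time(L1) + miss rate(L1) × (hit time(L2) + miss rate(L2) × miss penalty(L2)). Most L1 misses then pay only the small L2 hit time instead of the full memory penalty.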
18. Improving Cache Performance

- AMAT = (hit time) + (miss rate) × (miss penalty)
- Reducing miss rate and miss penalty:
  - Increase parallelism:
    - Non-blocking caches
    - Prefetching:
      - Hardware
      - Software
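As one software-prefetching sketch: GCC/Clang expose a __builtin_prefetch hint (compiler-specific); the prefetch distance of 8 iterations here is a tuning assumption, not a rule.

/* Hint the hardware to start fetching a[i+8] while a[i] is summed,
   hiding part of the miss latency behind computation. */
double sum_with_prefetch(const double *a, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        if (i + 8 < n)
            __builtin_prefetch(&a[i + 8]);  /* fetch ahead of use */
        sum += a[i];
    }
    return sum;
}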