1
Memory Hierarchy Design, Chapter 5
  • Karin Strauss

2
Background
  • 1980: no caches
  • 1995: two levels of cache
  • 2004: even three levels of cache
  • Why?
  • The processor-memory gap

3
Processor-Memory Gap
[Figure: processor vs. DRAM performance, 1980-2000, log-scale performance axis. Processor (µProc) performance improves about 60%/yr while DRAM improves only about 7%/yr, so the gap widens every year. Source: lecture handouts of Prof. John Kubiatowicz, CS252, U.C. Berkeley]
4
Because
  • Memory speed is a limiting factor in performance
  • Caches are small and fast
  • Caches leverage the principle of locality (see the sketch below)
  • Temporal locality: data that has been referenced
    recently tends to be re-referenced soon
  • Spatial locality: data close (in the address
    space) to recently referenced data tends to be
    referenced soon
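A minimal sketch of both kinds of locality (my own illustration, not from the slides), assuming a row-major NumPy array so that rows really are contiguous in memory:

    import numpy as np

    a = np.ones((1024, 1024))   # C (row-major) order: each row is contiguous

    # Spatial locality: the inner loop walks consecutive addresses, so each
    # cache block fetched on a miss also serves the next several accesses.
    total = 0.0
    for i in range(1024):
        for j in range(1024):
            total += a[i, j]    # 'total' is reused every iteration: temporal
                                # locality keeps it in a register or cache

    # Swapping the loops (an a[j, i] access pattern) would stride a full row
    # between accesses, defeating spatial locality and raising the miss rate.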

5
Review
  • Cache block: the minimum unit of information that
    can be present in the cache (several contiguous
    memory positions)
  • Cache hit: the requested data can be found in the cache
  • Cache miss: the requested data cannot be found in
    the cache
  • The four design questions:
  • Where can a block be placed?
  • How can a block be found?
  • Which block should be replaced?
  • What happens on a write?

6
Where can a block be placed?
Suppose we need to place block 10 in an 8-block cache:

Direct mapped (1-way): 10 mod 8 = 2 (block frame 2)
2-way set associative: 10 mod 4 = 2 (set 2)
4-way set associative: 10 mod 2 = 0 (set 0)
Fully associative (8-way, in this case): anywhere

Placement: set = (block address) mod (# sets),
where (# sets) = (# cache blocks) / (# ways)
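A small sketch of the placement arithmetic, assuming the slide's 8-block cache:

    # Which set does block 10 map to, for each associativity?
    def set_index(block_addr, cache_blocks, ways):
        num_sets = cache_blocks // ways    # (# sets) = (# blocks) / (# ways)
        return block_addr % num_sets       # set = block address mod (# sets)

    for ways in (1, 2, 4, 8):
        print(f"{ways}-way: set {set_index(10, 8, ways)}")
    # 1-way: set 2, 2-way: set 2, 4-way: set 0, 8-way: set 0 (the only set)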
7
How can a block be found?
Look at the address! It splits into fields:

    |        Block Address        | Block Offset |
    |     Tag     |     Index     | Block Offset |

Tag: the block's unique id (its "primary key")
Index: determines the set (no index in fully associative caches)
Block offset: determines the position of the data within the block
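A minimal sketch of the field extraction, with assumed parameters (64-byte blocks, 128 sets; these numbers are for illustration, not from the slides):

    BLOCK_BYTES = 64    # -> 6 offset bits  (assumption for illustration)
    NUM_SETS    = 128   # -> 7 index bits   (assumption for illustration)
    OFFSET_BITS = BLOCK_BYTES.bit_length() - 1
    INDEX_BITS  = NUM_SETS.bit_length() - 1

    def split_address(addr):
        offset = addr & (BLOCK_BYTES - 1)                # position in block
        index  = (addr >> OFFSET_BITS) & (NUM_SETS - 1)  # which set
        tag    = addr >> (OFFSET_BITS + INDEX_BITS)      # block's unique id
        return tag, index, offset

    print(split_address(0x1234ABCD))   # -> (tag, index, offset)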
8
Which block should be replaced?
  • Random
  • Least Recently Used (LRU), sketched below
  • True LRU may be too costly to implement in
    hardware (requires a recency stack)
  • Simplified (pseudo-)LRU
  • First In, First Out (FIFO)
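A minimal sketch of LRU bookkeeping for one cache set, using Python's OrderedDict as the recency stack the slide alludes to:

    from collections import OrderedDict

    class LRUSet:
        def __init__(self, ways):
            self.ways = ways
            self.blocks = OrderedDict()   # tags, most recently used at the end

        def access(self, tag):
            if tag in self.blocks:             # hit: move to most-recent spot
                self.blocks.move_to_end(tag)
                return "hit"
            if len(self.blocks) >= self.ways:  # miss in a full set:
                self.blocks.popitem(last=False)  # evict the least recent
            self.blocks[tag] = True
            return "miss"

    s = LRUSet(ways=2)
    print([s.access(t) for t in [1, 2, 1, 3, 2]])
    # ['miss', 'miss', 'hit', 'miss', 'miss']  (3 evicts 2, so 2 misses again)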

9
What happens on a write?
  • Write through: every time a block is written, the
    new value is propagated to the next memory level
  • Easier to implement
  • Makes displacement simple and fast
  • Reads never have to wait for a displacement to
    finish
  • Writes may have to wait → use a write buffer
  • Write back: the new value is propagated to the next
    memory level only when the block is displaced
  • Makes writes fast
  • Uses less memory bandwidth
  • Dirty bit may save additional bandwidth:
  • no need to write back clean blocks
  • Saves power

10
What happens on a write? (cont.)
  • Write allocate (fetch on write)
  • The entire block is brought into the cache
  • No write allocate (write around)
  • The written word is sent to the next memory level
  • Write policy and write-miss policy are
    independent, but usually:
  • Write back → write allocate (sketched below)
  • Write through → no write allocate
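A minimal one-block sketch (my own illustration, not from the slides) of write back combined with write allocate; the dirty bit defers memory traffic until displacement:

    class OneBlockCache:
        def __init__(self, memory):
            self.memory = memory   # next memory level, modeled as a dict
            self.addr = None       # which block is currently cached
            self.data = None
            self.dirty = False

        def _displace(self):
            if self.dirty and self.addr is not None:
                self.memory[self.addr] = self.data  # write back dirty blocks only
            self.dirty = False

        def write(self, addr, value):
            if self.addr != addr:                   # write miss
                self._displace()
                self.addr = addr                    # write allocate: fetch block
                self.data = self.memory.get(addr)
            self.data = value                       # write hits stay local...
            self.dirty = True                       # ...and just set the dirty bit

    mem = {0: 10, 8: 20}
    c = OneBlockCache(mem)
    c.write(0, 11); c.write(0, 12)   # two writes, zero memory traffic
    c.write(8, 21)                   # displacing block 0 writes 12 back
    print(mem)                       # {0: 12, 8: 20}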

11
Cache Performance
  • AMAT = (hit time) + (miss rate) × (miss penalty)

Which system is faster?

                    Unified cache   Split I-cache   Split D-cache
    Size            32 KB           16 KB           16 KB
    Miss rate       1.99%           0.64%           6.47%
    Hit time        I: 1 / D: 2     1               1
    Miss penalty    50              50              50

75% of accesses are instruction references.
12
Solution
  • AMAT(split) = 0.75 × (1 + 0.64% × 50)
    + 0.25 × (1 + 6.47% × 50)
  • AMAT(split) ≈ 2.05
  • AMAT(unified) = 0.75 × (1 + 1.99% × 50)
    + 0.25 × (2 + 1.99% × 50)
  • AMAT(unified) ≈ 2.24
  • Miss rate(split) = 0.75 × 0.64% + 0.25 × 6.47%
    = 2.10%
  • Miss rate(unified) = 1.99%
  • Although split has a higher miss rate, it is
    faster on average!
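A quick check of the arithmetic above:

    def amat(hit_time, miss_rate, miss_penalty):
        return hit_time + miss_rate * miss_penalty

    # 75% instruction references, 25% data references (from the table)
    split   = 0.75 * amat(1, 0.0064, 50) + 0.25 * amat(1, 0.0647, 50)
    unified = 0.75 * amat(1, 0.0199, 50) + 0.25 * amat(2, 0.0199, 50)
    print(f"{split:.2f} {unified:.2f}")   # 2.05 2.24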

13
Processor Performance
  • CPU time = (proc cycles + mem stall cycles) × (clk cycle
    time)
  • proc cycles = IC × CPI
  • mem stall cycles = (mem accesses) × (miss rate) × (miss
    penalty)

What is the total CPU time, including cache stalls,
as a function of IC and clk cycle time?

    CPI (proc)      2.0
    Miss penalty    50 cycles
    Miss rate       2%
    Mem refs/inst   1.33
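Working the example: memory stalls fold into an effective CPI, so CPU time stays a simple function of IC and clock cycle time:

    CPI, refs_per_inst, miss_rate, penalty = 2.0, 1.33, 0.02, 50
    effective_cpi = CPI + refs_per_inst * miss_rate * penalty
    print(f"{effective_cpi:.2f}")   # 3.33
    # CPU time = IC × 3.33 × (clk cycle time): misses add 1.33 cycles per
    # instruction, about two-thirds of the base CPI.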
14
Processor Performance
  • AMAT has a large impact on performance
  • If CPI decreases, mem stall cycles represent a
    larger fraction of the total cycles
  • If the clock cycle time decreases, the same memory
    latency translates into more stall cycles

Note: in out-of-order execution processors, part of the
memory access latency is overlapped with
computation
15
Improving Cache Performance
  • AMAT = (hit time) + (miss rate) × (miss penalty)
  • Reducing hit time:
  • Small and simple caches
  • No address translation
  • Pipelined cache access
  • Trace caches

16
Improving Cache Performance
  • AMAT = (hit time) + (miss rate) × (miss penalty)
  • Reducing miss rate:
  • Larger block size
  • Larger cache size
  • Higher associativity
  • Way prediction or pseudo-associative caches
  • Compiler optimizations (code/data layout); see the
    tiling sketch below
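As an example of one such compiler/data-layout optimization (the tile size B below is an assumption for illustration), loop tiling keeps a small block of the data resident in cache while it is worked on:

    import numpy as np

    N, B = 1024, 64                      # B chosen so a BxB tile fits in cache
    a = np.zeros((N, N))

    for ii in range(0, N, B):            # visit the matrix tile by tile
        for jj in range(0, N, B):
            tile = a[ii:ii+B, jj:jj+B]   # a view of one BxB block
            tile += 1.0                  # all work on the block happens while
                                         # it is still cache-resident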

17
Improving Cache Performance
  • AMAT = (hit time) + (miss rate) × (miss penalty)
  • Reducing miss penalty:
  • Multilevel caches
  • Critical word first
  • Read miss before write miss
  • Merging write buffers
  • Victim caches

18
Improving Cache Performance
  • AMAT = (hit time) + (miss rate) × (miss penalty)
  • Reducing miss rate and miss penalty:
  • Increase parallelism
  • Non-blocking caches
  • Prefetching
  • Hardware
  • Software