EECS/CS 370 presentation

1
EECS/CS 370

Cache Interactions

2
Short Lecture

How are caches integrated into a pipelined implementation?
Replace the instruction memory with an Icache.
Replace the data memory with a Dcache.
Issues:
Memory accesses now have variable latency (hit vs. miss).
Both caches may miss at the same time.
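A toy model can make the two issues concrete: an access either hits (no stall) or misses (stall for the miss penalty), and in one cycle the fetch and the memory-stage access may both miss. This is a sketch under stated assumptions, not something the slides specify: `stall_cycles` and `shared_memory_port` are hypothetical names, and the idea that a single shared path to memory services the two refills back to back (while independent paths overlap them) is an illustrative assumption.

```python
# Hypothetical toy model: extra stall cycles in one pipeline cycle, given
# whether the Icache fetch and the Dcache access miss.

def stall_cycles(icache_miss: bool, dcache_miss: bool,
                 miss_penalty: int, shared_memory_port: bool) -> int:
    if icache_miss and dcache_miss:
        # Both caches miss at the same time. Assumption: with one shared path
        # to memory the two refills are serviced back to back; with
        # independent paths they overlap completely.
        return 2 * miss_penalty if shared_memory_port else miss_penalty
    if icache_miss or dcache_miss:
        # A single miss: the pipeline waits for the miss penalty.
        return miss_penalty
    # Both hit: no extra stall.
    return 0


print(stall_cycles(False, False, 40, shared_memory_port=True))  # 0
print(stall_cycles(False, True, 40, shared_memory_port=True))   # 40
print(stall_cycles(True, True, 40, shared_memory_port=True))    # 80
print(stall_cycles(True, True, 40, shared_memory_port=False))   # 40
```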

3
Pipeline with caches
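[Figure: five-stage pipelined datapath with the instruction memory replaced by an ICache and the data memory by a Dcache; shows the PC, register file (R0-R7), MUXes, and pipeline registers IF/ID, ID/EX, EX/Mem, Mem/WB. nand 1 2 5 is at the ICache; pipeline stage labels: add, lw, nand.]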
4
Pipeline with caches after miss
[Figure: the pipelined datapath with caches (PC, ICache, register file R0-R7, Dcache, pipeline registers IF/ID, ID/EX, EX/Mem, Mem/WB) after the miss. nand 1 2 5 is still at the ICache; pipeline stage labels: add, lw, noop.]
5
Pipeline with caches after hit
[Figure: the pipelined datapath with caches (PC, ICache, register file R0-R7, Dcache, pipeline registers IF/ID, ID/EX, EX/Mem, Mem/WB) after the hit. Pipeline stage labels: nand, add, lw.]
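The stall shown in the pipeline-with-caches slides can be sketched as a cycle-by-cycle model of a blocking Dcache. This is a hypothetical illustration, not the course simulator: instruction names other than lw are placeholders, the lw is assumed to sit in MEM when it misses, and `miss_cycles` is an assumed miss length. While the miss is outstanding, IF through MEM hold their instructions and a noop is sent into WB; when the access finally hits, every instruction advances one stage.

```python
# Hypothetical sketch of a blocking Dcache miss in a 5-stage in-order pipeline.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def step(pipe, next_fetch, mem_stall):
    """Advance one cycle; return (new_pipe, whether next_fetch was consumed)."""
    if mem_stall:
        # IF, ID, EX and MEM hold their instructions; a bubble enters WB.
        frozen = dict(pipe)
        frozen["WB"] = "noop"
        return frozen, False
    # Normal advance: every instruction moves one stage to the right.
    advanced = {"IF": next_fetch}
    for earlier, later in zip(STAGES, STAGES[1:]):
        advanced[later] = pipe[earlier]
    return advanced, True


def run_miss(pipe, fetch_queue, miss_cycles):
    """Simulate the load in MEM missing for miss_cycles cycles, then hitting."""
    queue = list(fetch_queue)
    history = []
    for cycle in range(miss_cycles + 1):
        stalled = cycle < miss_cycles
        next_fetch = queue[0] if queue else "noop"
        pipe, fetched = step(pipe, next_fetch, stalled)
        if fetched and queue:
            queue.pop(0)
        history.append(pipe)
    return history


start = {"IF": "i4", "ID": "i3", "EX": "i2", "MEM": "lw", "WB": "i1"}
for snapshot in run_miss(start, ["i5", "i6"], miss_cycles=2):
    print([snapshot[s] for s in STAGES])
```

With `miss_cycles=2`, the first two snapshots are both `['i4', 'i3', 'i2', 'lw', 'noop']`, and the third is `['i5', 'i4', 'i3', 'i2', 'lw']`: the load reaches WB only after the miss is serviced.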
6
Blocking Caches on miss?

What about out-of-order execution pipelines?
There is no need to stop the add instruction from
executing, but we can't let it write back to the
register file. Why?
What about another load instruction?
Blocking caches don't allow other cache references
while a miss is being serviced.
Non-blocking caches allow other cache references while a miss is being serviced.
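The difference between blocking and non-blocking caches can be sketched with a toy timing model. Everything here is an assumption for illustration: reference i is issued at cycle i, a hit takes 1 cycle, a miss takes 1 + `miss_penalty` cycles, references are independent, and the non-blocking cache can track any number of outstanding misses. `finish_times` is a hypothetical helper name.

```python
# Toy timing model (hypothetical): blocking vs non-blocking cache.

def finish_times(refs, miss_penalty, blocking):
    """Return the completion cycle of each reference (True = hit, False = miss)."""
    free_at = 0  # first cycle a blocking cache can accept another reference
    done = []
    for issue, hit in enumerate(refs):
        latency = 1 if hit else 1 + miss_penalty
        # A blocking cache serves nothing else until the current access finishes.
        start = max(issue, free_at) if blocking else issue
        end = start + latency
        if blocking:
            free_at = end
        done.append(end)
    return done


trace = [False, True, True]  # a miss followed by two hits
print(finish_times(trace, miss_penalty=10, blocking=True))   # [11, 12, 13]
print(finish_times(trace, miss_penalty=10, blocking=False))  # [11, 2, 3]
```

With a 10-cycle penalty, the blocking cache makes both hits wait behind the miss, while the non-blocking cache lets them complete under the outstanding miss (often called hit under miss).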

7
Sample Question (PH pg 565)

Assume an instruction cache miss rate for gcc of
2% and a data cache miss rate of 4%. If a
machine has a CPI of 2 without any memory stalls
and the miss penalty is 40 cycles for all misses,
determine how much faster a machine would run
with a perfect cache that never missed. Assume 36%
of instructions are loads/stores.

Note: the PH pipeline cannot execute instructions out of order (OoO).
8
Answer (PH pg 566)

Let I be the instruction count.
Icache stall cycles = I × 0.02 × 40 = 0.8I
Dcache stall cycles = I × 0.36 × 0.04 × 40 = 0.576I
Total memory stall cycles = 1.376I
CPI with perfect memory = 2.0
CPI with memory stalls = 2.0 + 1.376 = 3.376
Perfect memory performance is better by 3.376/2 ≈ 1.69
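The same arithmetic as a small function, assuming (as the note about the PH pipeline implies) that misses never overlap with each other or with execution. `cpi_with_stalls` and its parameter names are hypothetical.

```python
def cpi_with_stalls(base_cpi, icache_miss_rate, dcache_miss_rate,
                    mem_op_fraction, miss_penalty):
    """Average CPI including memory stalls (in-order pipeline, no overlap)."""
    # Every instruction is fetched, so every instruction can take an Icache miss.
    icache_stalls = icache_miss_rate * miss_penalty
    # Only loads/stores access the Dcache.
    dcache_stalls = mem_op_fraction * dcache_miss_rate * miss_penalty
    return base_cpi + icache_stalls + dcache_stalls


base = 2.0
with_stalls = cpi_with_stalls(base, 0.02, 0.04, 0.36, 40)
print(f"CPI with stalls: {with_stalls:.3f}")               # 3.376
print(f"Perfect-cache speedup: {with_stalls / base:.3f}")  # 1.688
```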

9
Cache questions

Hit rate of LC2K1 code fragments
Direct-mapped, associative, block size
Overhead requirements for caches
Tag, valid, dirty, LRU
Address bit usage (tag, index, offset)
Types of misses and how to reduce them
Compulsory
Capacity
Conflict
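For the address-bit and overhead questions, a minimal sketch, assuming byte addresses and power-of-two block sizes and set counts. The overhead accounting is one common convention, stated as an assumption: one valid bit per block, one dirty bit per block for a write-back cache, and ceil(log2(ways)) LRU counter bits per block. Function names and the example cache are hypothetical.

```python
def split_address(addr, block_size, num_sets):
    """Split a byte address into (tag, set index, block offset).

    Assumes block_size and num_sets are powers of two.
    """
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1  # 0 for a fully associative cache
    offset = addr & (block_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


def overhead_bits(addr_bits, cache_bytes, block_size, ways, write_back=True):
    """Total tag + valid + dirty + LRU bits for the whole cache (assumed accounting)."""
    num_blocks = cache_bytes // block_size
    num_sets = num_blocks // ways
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    lru_bits = (ways - 1).bit_length()  # ceil(log2(ways)); 0 for direct-mapped
    dirty_bits = 1 if write_back else 0
    per_block = tag_bits + 1 + dirty_bits + lru_bits  # the 1 is the valid bit
    return per_block * num_blocks


# Example: 32-bit addresses, 1 KiB, 2-way set-associative, 16-byte blocks
# (64 blocks, 32 sets: 4 offset bits, 5 index bits, 23 tag bits).
print(split_address(0x1234, block_size=16, num_sets=32))  # (9, 3, 4)
print(overhead_bits(32, 1024, 16, ways=2))                # 1664
```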

10
Victim cache
Victim cache organization:
Direct-mapped cache
Small fully associative cache (the victim cache)
Blocks evicted from the direct-mapped cache
are placed in the victim cache.
Both caches are searched in parallel.
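A victim cache can be sketched in a few lines, assuming the cache tracks only block numbers (no data) and the victim cache evicts its oldest entry when full. `DirectMappedWithVictim` is a hypothetical class; the hardware searches both structures in parallel, while this software model simply checks them in turn.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Direct-mapped cache backed by a small fully associative victim cache."""

    def __init__(self, num_sets, block_size, victim_entries):
        self.num_sets = num_sets
        self.block_size = block_size
        self.lines = [None] * num_sets   # block number held in each set
        self.victim = OrderedDict()      # block number -> None, oldest first
        self.victim_entries = victim_entries
        self.misses = 0

    def _to_victim(self, block):
        """Place a block evicted from the direct-mapped cache in the victim cache."""
        if block is None or self.victim_entries == 0:
            return
        if len(self.victim) >= self.victim_entries:
            self.victim.popitem(last=False)  # drop the oldest victim
        self.victim[block] = None

    def access(self, addr):
        block = addr // self.block_size
        index = block % self.num_sets
        if self.lines[index] == block:
            return "hit"
        evicted = self.lines[index]
        self.lines[index] = block
        if block in self.victim:
            # Victim hit: swap the block back into the direct-mapped cache.
            del self.victim[block]
            self._to_victim(evicted)
            return "victim hit"
        self.misses += 1
        self._to_victim(evicted)
        return "miss"


trace = [0, 64, 0, 64, 0, 64]  # blocks 0 and 4 both map to set 0
for entries in (0, 1):
    cache = DirectMappedWithVictim(num_sets=4, block_size=16, victim_entries=entries)
    results = [cache.access(a) for a in trace]
    print(entries, cache.misses, results)
```

On this ping-pong trace the plain direct-mapped cache misses on all six references, while a single victim entry leaves only the two compulsory misses; the remaining conflict misses become victim hits.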
