EE108b Review Session 6 - PowerPoint PPT Presentation

About This Presentation

Title:

EE108b Review Session 6

Slides: 18

Provided by: DGE97

Tags: ee108b | hints | review | session

Transcript and Presenter's Notes

Title: EE108b Review Session 6

1
EE108b Review Session 6

2
Agenda

3
Admin

4
HW 4 Hints

HW 4 Problem 1
Equation for write-through cache:
AMATThisLevel
  = HitRateThisLevel * HitTimeThisLevel
    + MissRateThisLevel * (HitTimeThisLevel + AMATNextLevel)
  = HitTimeThisLevel * (HitRateThisLevel + MissRateThisLevel)
    + MissRateThisLevel * AMATNextLevel
  = HitTimeThisLevel + MissRateThisLevel * AMATNextLevel
Hit time is 1 cycle if unspecified, not 0
seconds!
Perfect write-through buffer: no matter how much
data the CPU feeds into the buffer, it never
causes a stall.
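
As a quick numeric check of the write-through equation, here is a minimal Python sketch. The function names and the sample numbers (1-cycle hit, 5% miss rate, 100-cycle next level) are illustrative assumptions, not values from the homework.

```python
def amat_write_through(hit_time, miss_rate, amat_next_level):
    """Simplified form: HitTime + MissRate * AMATNextLevel.

    Assumes a perfect write buffer, so writes never stall the CPU.
    All times must use the same unit (cycles here).
    """
    return hit_time + miss_rate * amat_next_level


def amat_expanded(hit_time, miss_rate, amat_next_level):
    """Unsimplified form: a hit costs HitTime, a miss costs
    HitTime plus the AMAT of the next level."""
    hit_rate = 1.0 - miss_rate
    return hit_rate * hit_time + miss_rate * (hit_time + amat_next_level)


# Hypothetical numbers, chosen only to exercise the formula.
simplified = amat_write_through(hit_time=1, miss_rate=0.05, amat_next_level=100)
expanded = amat_expanded(hit_time=1, miss_rate=0.05, amat_next_level=100)
print(simplified, expanded)
```

Both forms give the same result (about 6 cycles for these numbers), which is the point of the algebra: the hit time is paid on every access, hit or miss.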

5
HW 4 Hints

Equation for write-back cache:
AMATThisLevel
  = HitTimeThisLevel
    + MissRateThisLevel * AMATNextLevel
    + MissRateThisLevel * DirtyRateThisLevel * AMATNextLevel
CPI equation:
CPIoverall
  = CPIbase
    + InstrReadFreq * CyclesPerInstrRead
    + DataReadFreq * CyclesPerDataRead
    + DataWriteFreq * CyclesPerDataWrite
CyclesPerWhateverAction
  = AMATWhateverAction * ClockFrequency
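
The write-back and CPI equations can be sketched in Python as follows. The function names and every number in the usage lines are hypothetical; only the structure of the formulas comes from the slide.

```python
def amat_write_back(hit_time, miss_rate, dirty_rate, amat_next_level):
    """HitTime + MissRate*AMATNext + MissRate*DirtyRate*AMATNext.

    The last term is the extra trip to the next level needed to
    write back a dirty block that is being evicted.
    """
    fetch = miss_rate * amat_next_level
    write_back = miss_rate * dirty_rate * amat_next_level
    return hit_time + fetch + write_back


def cycles_per_action(amat_seconds, clock_hz):
    """Convert an AMAT given in seconds into clock cycles."""
    return amat_seconds * clock_hz


def cpi_overall(cpi_base,
                instr_read_freq, cycles_per_instr_read,
                data_read_freq, cycles_per_data_read,
                data_write_freq, cycles_per_data_write):
    """Base CPI plus the per-instruction cost of each access type."""
    return (cpi_base
            + instr_read_freq * cycles_per_instr_read
            + data_read_freq * cycles_per_data_read
            + data_write_freq * cycles_per_data_write)


# Hypothetical numbers: 1-cycle hit, 5% misses, half of the evicted
# blocks dirty, 100-cycle next level.
print(amat_write_back(1, 0.05, 0.5, 100))
# A 4 ns AMAT on a 2 GHz clock, expressed in cycles.
print(cycles_per_action(4e-9, 2e9))
```

Keeping the unit conversion in its own function makes it harder to mix seconds and cycles in the CPI sum.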

6
HW 4 Hints

HW 4 Problem 2
Part a: consider the relative frequency with
which hits and misses are encountered.
Part b: what happens to the AMAT?
Part c: consider what happens when the block size
is greater than 1 word, e.g. B words in general.
Part d: what will these elements map to in a
direct-mapped cache?
Part f: what is the relationship between the
array size and the step size such that, in the
2-way set associative case, data would compete
for the same cache line, but not so with a 4-way
set associative cache?
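
For Parts d and f it can help to experiment with a tiny cache model. The sketch below is a minimal set-associative simulator in Python. The function name `count_misses`, the LRU replacement policy, and the 8-line cache with 4-byte blocks are all assumptions made for illustration, not parameters from the homework.

```python
from collections import OrderedDict


def count_misses(addresses, num_lines, ways, block_bytes):
    """Count misses for a byte-address trace in an LRU cache.

    num_lines is the total number of cache lines; the number of
    sets is num_lines // ways (ways=1 gives a direct-mapped cache).
    """
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_bytes      # block number in memory
        index = block % num_sets         # which set the block maps to
        tag = block // num_sets          # identifies the block in its set
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)       # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict least recently used
            lines[tag] = True
    return misses


# Three blocks, 16 bytes apart, touched in a loop 10 times.
trace = [0, 16, 32] * 10
print(count_misses(trace, num_lines=8, ways=2, block_bytes=4))
print(count_misses(trace, num_lines=8, ways=4, block_bytes=4))
```

With this trace all three blocks land in the same set in both configurations. The 2-way cache misses on all 30 accesses because three blocks rotate through two ways, while the 4-way cache misses only on the first 3 accesses.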

7
HW 4 Hints

HW 4 Problem 3: simple problem
I-cache: no writes, so no Valid bit needed
D-cache: 1 Valid bit and 1 Dirty bit for the
write-back
HW 4 Problem 4
Part a) e.data = load_from_memory(a): replace
this line to simulate multiple-word blocks
(you will also need to add input arguments)

8
HW 4 Hints

HW 4 Problem 5
Part a) Lots of information; not everything is
relevant.
Ex: L1 cache block size 32 => G = 5
Ex: L2 cache 1024 lines => I = 10
Parts b, c, d) Calculate the AMAT for instruction
and data reads/writes. Use the hints!
Hint 1: First work out the AMAT of each level of
the memory hierarchy.
Hint 2: The AMAT is the sum of the TLB AMAT and
the cache AMAT. Do not blindly apply equations;
think about the sequence of events for an
instruction read. How should the penalty of a
page fault be incorporated into your calculation?

9
HW 4 Hints

10
Midterm Q2c

The first choice is a 2 GHz processor. To emulate
a saddu_b or a saddu instruction on this
processor, you need 3 arithmetic, one load, and
one branch instruction.
The execution frequencies and CPIs for the
various MIPS instruction classes are:
Instruction class                   Frequency   CPI
Arithmetic (addu, ori, slt, etc.)   40%         1
Load/store (lw, sw, etc.)           40%         3
Branch (beq, bne, j, etc.)          20%         2
You also note that 30% of the branch instructions
are due to the emulation of saddu_b and saddu.
The alternative is to design your own MIPS
processor that implements saddu_b and saddu in
hardware.
Assume that the CPI of saddu_b and saddu is 1
cycle.
What is the minimum clock frequency needed for
the new processor?
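
One way to set up this calculation is sketched below in Python. It rests on several assumptions that the slide does not spell out: each emulation sequence contains exactly one branch, the hardware instruction replaces the whole 5-instruction sequence, all other instructions keep their CPI, and "minimum" means matching the execution time of the 2 GHz processor. All names in the code are hypothetical.

```python
# Baseline mix from the slide: class -> (fraction of instructions, CPI).
BASE_MIX = {
    "arith": (0.40, 1),
    "load_store": (0.40, 3),
    "branch": (0.20, 2),
}
BASE_CLOCK_HZ = 2e9

# Instructions in one software emulation of saddu / saddu_b.
EMULATION_COST = {"arith": 3, "load_store": 1, "branch": 1}
EMULATED_BRANCH_SHARE = 0.30  # share of branches that belong to emulation
HW_SADDU_CPI = 1


def min_clock_hz():
    """Clock rate at which the new design matches the old run time."""
    base_cpi = sum(frac * cpi for frac, cpi in BASE_MIX.values())
    # Assumption: one emulation sequence per emulation branch.
    emulations = EMULATED_BRANCH_SHARE * BASE_MIX["branch"][0]
    # Cycles per ORIGINAL instruction on the new processor.
    new_cycles = emulations * HW_SADDU_CPI
    for cls, (frac, cpi) in BASE_MIX.items():
        remaining = frac - emulations * EMULATION_COST[cls]
        new_cycles += remaining * cpi
    # Equal time: new_cycles / f_new == base_cpi / f_base.
    return BASE_CLOCK_HZ * new_cycles / base_cpi


print(f"{min_clock_hz() / 1e9:.2f} GHz")
```

Under these assumptions the old processor spends 2.0 cycles per original instruction and the new one 1.58, so the sketch reports about 1.58 GHz. A different reading of the problem statement would change the bookkeeping but not the method.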

11
HW 3 Problem 5c

Consider the following code snippet:
lw $t0, 0($t3)
add $t0, $t1, $t2
sw $t0, 16($t3)
lw $t0, 32($t3)
addi $t0, $t0, 14
Suppose a type of 0-displacement addressing mode
is implemented in the MIPS processor for the lw
and sw instructions, i.e. in place of
lw $t0, 32($t3)
we have
addiu $t3, $t3, 32
lw $t0, $t3

12
HW 3 Problem 5c

In the new addressing mode, however, address
calculation and data memory access never take
place in the same instruction. Consequently, we
can combine the 2 stages into one, i.e. the
pipeline stages become:
Instruction fetch
Instruction decode and register read
EITHER execution OR data memory access
Write back
In other words, instructions span only 4 clock
cycles instead of 5.

13
HW 3 Problem 5c

CPI of all the instructions is 1 (except lw, sw)
CPI of lw and sw is 1.5, because of cache misses
ONLY
Instruction frequency: loads 21%, stores 12%,
ALU 46%, jumps 21%
13% of loads and 11% of stores use a
0-displacement
25% chance for a load to be followed by a
dependent instruction. For example, lw $t0,
32($t3) followed by addi $t0, $t0, 14

14
Practice Problem

15
Practice Problem

Page offset
We need to be able to select any 1 of the 16 KB
(16384 bytes) in a page, so the page offset needs
to be 14 bits wide.
Virtual page number
The address is 40 bits wide in total and 14 bits
are used for the page offset, so the virtual page
number is 26 bits wide.
TLB index (plus tag)
The TLB is 2-way set associative, and there are
256 entries in total. This means that there must
be 128 sets of 2 entries each, indexable using 7
bits; this leaves 19 bits for the tag.
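
The field widths above can be double-checked with a short Python sketch. The function name `address_fields` and its parameter names are hypothetical; the inputs are the numbers from this slide, and every size is assumed to be a power of two.

```python
def log2_exact(n):
    """Return log2(n) for a power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def address_fields(virtual_addr_bits, page_bytes, tlb_entries, tlb_ways):
    """Split a virtual address into offset, VPN, TLB index, and TLB tag."""
    offset_bits = log2_exact(page_bytes)
    vpn_bits = virtual_addr_bits - offset_bits
    tlb_sets = tlb_entries // tlb_ways   # sets, not entries per set
    index_bits = log2_exact(tlb_sets)
    tag_bits = vpn_bits - index_bits     # rest of the VPN is the tag
    return {
        "offset": offset_bits,
        "vpn": vpn_bits,
        "tlb_index": index_bits,
        "tlb_tag": tag_bits,
    }


fields = address_fields(virtual_addr_bits=40, page_bytes=16 * 1024,
                        tlb_entries=256, tlb_ways=2)
print(fields)
```

For the 40-bit address, 16 KB pages, and 256-entry 2-way TLB this reproduces the 14, 26, 7, and 19 bit widths worked out above.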

16
Practice Problem

17
HW 3 Problem 6

Label1: lw  $1, 40($6)
        beq $2, $3, Label2    # Taken
        add $1, $6, $4
Label2: beq $1, $2, Label1    # Not taken
        sw  $2, 20($4)
        and $1, $1, $4
Draw the pipeline execution diagram for this code
assuming no delay slots and branches that execute
in the EX stage.
Draw the pipeline execution diagram for this code
assuming delay slots are used.
You do not know the branch target until the
Decode stage.