EE108b Review Session 6 - PowerPoint PPT Presentation

Slides: 18
Provided by: DGE97

1
EE108b Review Session 6
  • February 20, 2009
  • Raunaq Shah

2
Agenda
  • Admin
  • HW 4 Hints
  • Midterm Q2c or HW 3 Question 5c
  • Practice Problem
  • HW 3 Question 6 (time permitting)

3
Admin
  • HW 4
  • Due 24 Feb (
  • Submit in pairs
  • PA 2.2
  • Due 3 Mar
  • Submit individually
  • Lab 4
  • Due 10 Mar
  • Submit in pairs

4
HW 4 Hints
  • HW 4 Problem 1
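  • Equation for write through cache
    \[
    \begin{aligned}
    \mathrm{AMAT}_{\mathrm{ThisLevel}}
      &= \mathrm{HitRate}_{\mathrm{ThisLevel}} \times \mathrm{HitTime}_{\mathrm{ThisLevel}}
       + \mathrm{MissRate}_{\mathrm{ThisLevel}} \times (\mathrm{HitTime}_{\mathrm{ThisLevel}} + \mathrm{AMAT}_{\mathrm{NextLevel}}) \\
      &= \mathrm{HitTime}_{\mathrm{ThisLevel}} \times (\mathrm{HitRate}_{\mathrm{ThisLevel}} + \mathrm{MissRate}_{\mathrm{ThisLevel}})
       + \mathrm{MissRate}_{\mathrm{ThisLevel}} \times \mathrm{AMAT}_{\mathrm{NextLevel}} \\
      &= \mathrm{HitTime}_{\mathrm{ThisLevel}}
       + \mathrm{MissRate}_{\mathrm{ThisLevel}} \times \mathrm{AMAT}_{\mathrm{NextLevel}}
    \end{aligned}
    \]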
  • Hit time is 1 cycle if unspecified, but not 0
    seconds!
  • Perfect write-through buffer: no matter how much
    data the CPU feeds into the buffer, it never
    causes a stall.
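
The simplification in the write-through equation can be checked numerically. The following is a minimal Python sketch, not part of the original slides. The function names and the sample numbers (95% hit rate, 1-cycle hit time, 100-cycle next-level AMAT) are assumptions chosen only for illustration.

```python
import math

def amat_write_through_expanded(hit_rate, hit_time, amat_next):
    """Expanded form: a hit costs hit_time, a miss costs hit_time + next-level AMAT."""
    miss_rate = 1.0 - hit_rate
    return hit_rate * hit_time + miss_rate * (hit_time + amat_next)

def amat_write_through(miss_rate, hit_time, amat_next):
    """Simplified form: HitTime + MissRate * AMAT_NextLevel."""
    return hit_time + miss_rate * amat_next

# Illustrative numbers only (assumed, not taken from HW 4).
expanded = amat_write_through_expanded(hit_rate=0.95, hit_time=1.0, amat_next=100.0)
simplified = amat_write_through(miss_rate=0.05, hit_time=1.0, amat_next=100.0)

# Both forms agree because HitRate + MissRate = 1.
assert math.isclose(expanded, simplified)
print(simplified)
```

Note that the hit time is paid on every access, hit or miss, which is why it appears without a rate factor in the simplified form.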

5
HW 4 Hints
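  • Equation for write back cache
    \[
    \mathrm{AMAT}_{\mathrm{ThisLevel}}
      = \mathrm{HitTime}_{\mathrm{ThisLevel}}
      + \mathrm{MissRate}_{\mathrm{ThisLevel}} \times \mathrm{AMAT}_{\mathrm{NextLevel}}
      + \mathrm{MissRate}_{\mathrm{ThisLevel}} \times \mathrm{DirtyRate}_{\mathrm{ThisLevel}} \times \mathrm{AMAT}_{\mathrm{NextLevel}}
    \]
  • CPI equation
    \[
    \begin{aligned}
    \mathrm{CPI}_{\mathrm{overall}}
      &= \mathrm{CPI}_{\mathrm{base}}
       + \mathrm{InstrReadFreq} \times \mathrm{CyclesPerInstrRead} \\
      &\quad + \mathrm{DataReadFreq} \times \mathrm{CyclesPerDataRead}
       + \mathrm{DataWriteFreq} \times \mathrm{CyclesPerDataWrite}
    \end{aligned}
    \]
    \[
    \mathrm{CyclesPerWhateverAction}
      = \mathrm{AMAT}_{\mathrm{WhateverAction}} \times \mathrm{ClockFrequency}
    \]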
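
The write-back and CPI equations can be sketched as small Python helpers. This is an illustrative sketch only: the function names, parameter names, and sample values are assumptions, and the AMAT passed to `cycles_per_action` is assumed to be in seconds with the clock frequency in Hz.

```python
import math

def amat_write_back(hit_time, miss_rate, dirty_rate, amat_next):
    """HitTime + MissRate*AMAT_next + MissRate*DirtyRate*AMAT_next.

    The last term is the extra next-level access needed to write back
    a dirty block that is evicted on a miss.
    """
    return (hit_time
            + miss_rate * amat_next
            + miss_rate * dirty_rate * amat_next)

def cycles_per_action(amat_seconds, clock_hz):
    """Convert an access time in seconds to cycles: AMAT * ClockFrequency."""
    return amat_seconds * clock_hz

def cpi_overall(cpi_base,
                instr_read_freq, cycles_per_instr_read,
                data_read_freq, cycles_per_data_read,
                data_write_freq, cycles_per_data_write):
    """CPI_base plus the frequency-weighted cycles of each kind of memory access."""
    return (cpi_base
            + instr_read_freq * cycles_per_instr_read
            + data_read_freq * cycles_per_data_read
            + data_write_freq * cycles_per_data_write)

# Illustrative numbers only (assumed, not taken from HW 4).
wb_amat = amat_write_back(hit_time=1.0, miss_rate=0.1, dirty_rate=0.5, amat_next=20.0)
cycles = cycles_per_action(amat_seconds=2e-9, clock_hz=1e9)
print(wb_amat, cycles)
```

Keeping the unit conversion in its own function makes it harder to mix AMAT values in seconds with AMAT values in cycles.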

6
HW 4 Hints
  • HW 4 Problem 2
  • Part a: consider the relative frequency with
    which hits and misses are encountered.
  • Part b: what happens to the AMAT?
  • Part c: consider what happens when the block size
    is greater than 1 word, e.g. B words in general.
  • Part d: what will these elements map to in a
    direct mapped cache?
  • Part f: what is the relationship between the
    array and step size values so that in the 2-way
    set associative case data would compete for the
    same cache line, but not so with a 4-way set
    associative cache?

7
HW 4 Hints
  • HW 4 Problem 3: Simple problem
  • I cache: No writes so no Valid bit needed
  • D cache: 1 Valid bit and 1 Dirty bit for the
    Write Back
  • HW 4 Problem 4
  • Part a) e.data = load_from_memory(a): replace
    this to simulate multiple word blocks
  • (you will also need to add input arguments)

8
HW 4 Hints
  • HW 4 Problem 5
  • Part a) Lots of information; not everything is
    relevant
  • Ex: L1 cache block size = 32, so G = 5
  • Ex: L2 cache has 1024 lines, so I = 10
  • Part b,c,d) Calculate AMAT for instruction and
    data RWs. Use hints!
  • Hint 1: First work out the AMAT of each level of
    the memory hierarchy.
  • Hint 2: The AMAT is the sum of TLB AMAT and cache
    AMAT. Do not blindly apply equations. Think about
    the sequence of events for an instruction read:
    how should the penalty of a page fault be
    incorporated into your calculation?
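
The two examples in Part a are base-2 logarithms: a block size of 32 needs 5 offset bits, and 1024 lines need 10 index bits. A minimal Python sketch of that calculation follows. The helper name `field_bits` is hypothetical, and the variable names `g_bits` and `i_bits` simply mirror the G and I used on the slide.

```python
def field_bits(count):
    """Number of address bits needed to select one of `count` items.

    `count` must be a positive power of two, as cache sizes normally are.
    """
    if count <= 0 or count & (count - 1):
        raise ValueError("count must be a positive power of two")
    return count.bit_length() - 1

g_bits = field_bits(32)    # L1 block size of 32
i_bits = field_bits(1024)  # L2 cache with 1024 lines
print(g_bits, i_bits)
```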

9
HW 4 Hints
  • Part e) longest latency
  • TLB Misses
  • Page fault occurs on page table

10
Midterm Q2c
  • The first choice is a 2 GHz processor. To emulate
    a saddu_b or a saddu instruction on this
    processor, you need 3 arithmetic, one load, and
    one branch instruction
  • Execution frequencies for the various MIPS
    instructions:
    Instruction class                   Frequency   CPI
    Arithmetic (addu, ori, slt, etc)    40%         1
    Load/store (lw, sw, etc)            40%         3
    Branch (beq, bne, j, etc)           20%         2
  • You also note that 30% of the branch instructions
    are due to the emulation of saddu_b and saddu.
  • The alternative is to design your own MIPS
    processor that implements saddu_b and saddu in
    hardware.
  • Assume that the CPI of saddu_b and saddu is 1
    cycle
  • What is the minimum clock frequency needed for
    the new processor?
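
One way to set up this calculation is sketched below in Python. It is not the official solution. It assumes that the listed frequencies are fractions of the dynamic instruction count on the 2 GHz processor, that every emulation contributes exactly one branch, and that "minimum clock frequency" means the frequency at which the new processor matches the execution time of the 2 GHz one. All variable names are hypothetical.

```python
import math

# Instruction mix on the 2 GHz processor: class -> (fraction of instructions, CPI)
mix = {"arith": (0.40, 1), "load_store": (0.40, 3), "branch": (0.20, 2)}
old_clock_hz = 2e9

# Each emulated saddu / saddu_b costs 3 arithmetic + 1 load + 1 branch.
emulation = {"arith": 3, "load_store": 1, "branch": 1}

# 30% of branches come from emulation and each emulation has one branch,
# so this is the number of emulations per original instruction.
emulations = 0.30 * mix["branch"][0] / emulation["branch"]

# Average cycles per original instruction on the 2 GHz processor.
old_cycles = sum(frac * cpi for frac, cpi in mix.values())

# Cycles spent inside emulation sequences, per original instruction.
emulation_cycles = emulations * sum(n * mix[cls][1] for cls, n in emulation.items())

# New design: drop the emulation sequences, add one 1-cycle instruction each.
new_cycles = old_cycles - emulation_cycles + emulations * 1

# Equal execution time: new_cycles / f_new = old_cycles / f_old
min_new_clock_hz = old_clock_hz * new_cycles / old_cycles
print(min_new_clock_hz / 1e9, "GHz")
```

Under these assumptions the new processor executes fewer cycles for the same program, so it can run at a lower clock frequency and still match the original execution time.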

11
HW 3 Problem 5c
  • Consider the following code snippet:
  • lw $t0, 0($t3)
  • add $t0, $t1, $t2
  • sw $t0, 16($t3)
  • lw $t0, 32($t3)
  • addi $t0, $t0, 14
  • Suppose a type of 0-displacement addressing mode
    is implemented in the MIPS processor for the lw
    and sw instructions, i.e. in place of
  • lw $t0, 32($t3)
  • we have
  • addiu $t3, $t3, 32
  • lw $t0, $t3

12
HW 3 Problem 5c
  • In the new addressing mode, however, address
    calculation and data memory access never take
    place in the same instruction. Consequently, we
    can combine the 2 stages into one, i.e.
  • Instruction fetch
  • Instruction decode and register read
  • EITHER execution OR data memory access
  • Write back
  • In other words, instructions are made to span
    over only 4 clock cycles instead of 5

13
HW 3 Problem 5c
  • CPI of all the instructions is 1 (except lw, sw)
  • CPI of lw, sw is 1.5 because of cache misses ONLY
  • Instruction frequency: Loads 21%, Stores 12%, ALU
    46%, Jump 21%
  • 13% of loads and 11% of stores use a
    0-displacement
  • 25% chance for a load to be followed by a
    dependent instruction. For example, lw $t0,
    32($t3) followed by addi $t0, $t0, 14

14
Practice Problem
  • Consider a virtual memory system with the
    following properties
  • 40-bit virtual byte address
  • 16 KB pages
  • 36-bit physical byte address
  • 2-way set associative TLB
  • 256 TLB entries in total
  • Show the virtual-to-physical mapping with a figure

15
Practice Problem
  • Page offset
  • We need to be able to choose 1 of 16 KB (16384
    bytes); to do so, the page offset needs to be
    14 bits wide
  • Virtual page number
  • The address is 40-bits wide in total, 14 is used
    for page offset, so the virtual page number is
    26-bits wide.
  • TLB index (plus tag)
  • The TLB is 2-way set associative, and there are
    256 entries in total. This means that there must
    be 128 sets of 2 entries each, indexable using 7
    bits; this leaves 19 bits for the tag
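
The field widths derived above can be reproduced with a short Python sketch. The variable names and the `split_virtual_address` helper are hypothetical. The physical page number width (36 - 14 = 22 bits) is not stated on the slide and is derived here from the given address sizes.

```python
virtual_addr_bits = 40
physical_addr_bits = 36
page_size_bytes = 16 * 1024
tlb_entries = 256
tlb_ways = 2

# log2 of a power of two via bit_length
page_offset_bits = page_size_bytes.bit_length() - 1
vpn_bits = virtual_addr_bits - page_offset_bits
ppn_bits = physical_addr_bits - page_offset_bits   # derived, not on the slide

tlb_sets = tlb_entries // tlb_ways
tlb_index_bits = tlb_sets.bit_length() - 1
tlb_tag_bits = vpn_bits - tlb_index_bits

def split_virtual_address(vaddr):
    """Split a virtual address into (TLB tag, TLB index, page offset)."""
    offset = vaddr & (page_size_bytes - 1)
    vpn = vaddr >> page_offset_bits
    index = vpn & (tlb_sets - 1)
    tag = vpn >> tlb_index_bits
    return tag, index, offset

print(page_offset_bits, vpn_bits, ppn_bits, tlb_index_bits, tlb_tag_bits)
```

The page offset passes through translation unchanged, so only the virtual page number is split into a TLB tag and index.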

16
Practice Problem
17
HW 3 Problem 6
  • Label1: lw $1,40($6)
  • beq $2,$3,Label2 # Taken
  • add $1, $6, $4
  • Label2: beq $1,$2,Label1 # Not Taken
  • sw $2,20($4)
  • and $1,$1,$4
  • Draw the pipeline execution diagram for this code
    assuming no delay slots and branches execute in
    the EX stage
  • Draw the pipeline execution diagram for this code
    assuming delay slots are used
  • You do not know the branch target until the
    Decode stage