CS/ECE 365 COMPUTER ARCHITECTURE

1
CS/ECE 365 COMPUTER ARCHITECTURE
  • Soundararajan Ezekiel
  • Department of Computer Science
  • Ohio Northern University

2
Performance of a single-cycle CPU with FP
instructions
  • Suppose we have an FP unit that requires 8 ns for
    an FP add and 16 ns for an FP multiply
  • Memory: 2 ns
  • ALU and adders: 2 ns (excluding the FP ALU)
  • Register file (read or write): 1 ns
  • Find the performance ratio between the
    variable-clock and single-clock implementations

3
Assumptions
  • All loads take the same time and comprise 31% of
    the instructions
  • All stores take the same time and comprise 21%
  • R-format instructions: 27%
  • Branches: 5%
  • Jumps: 2%
  • FP add and subtract take the same time and
    together comprise 7%
  • FP multiply and divide take the same time and
    together comprise 7%

4
Instr class   IM   Reg read   ALU op   DM   Reg write   Total (ns)
R-format       2      1          2      0       1          6
lw             2      1          2      2       1          8
sw             2      1          2      2       -          7
branch         2      1          2      -       -          5
jump           2      -          -      -       -          2
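
The totals in the table above can be recomputed from the functional-unit delays. The following is a minimal sketch; the dictionary keys and the function name are made up for illustration, and only the delays and the units used per class come from the slides.

```python
# Functional-unit delays in ns, as given on the slides.
DELAY = {"IM": 2, "RegRead": 1, "ALU": 2, "DM": 2, "RegWrite": 1}

# Units each instruction class passes through (key names are illustrative).
UNITS_USED = {
    "R-format": ["IM", "RegRead", "ALU", "RegWrite"],
    "lw":       ["IM", "RegRead", "ALU", "DM", "RegWrite"],
    "sw":       ["IM", "RegRead", "ALU", "DM"],
    "branch":   ["IM", "RegRead", "ALU"],
    "jump":     ["IM"],
}


def instruction_time(units):
    """Sum the delays of the units on this instruction's path."""
    return sum(DELAY[u] for u in units)


TOTALS = {cls: instruction_time(units) for cls, units in UNITS_USED.items()}

for cls, total in TOTALS.items():
    print(f"{cls:9s} {total} ns")
```

Listing the units per class, rather than hard-coding the totals, makes it easy to see why lw is the longest integer instruction: it is the only class that uses all five units.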
5
Answer
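  • Cycle time for the single-cycle machine: FP
    multiply takes 2 + 1 + 16 + 1 = 20 ns (the longest
    instruction time)
  • Time for an FP add instruction: 2 + 1 + 8 + 1 =
    12 ns
  • Average cycle time for the variable-cycle
    machine: 8 × 31% + 7 × 21% + 6 × 27% + 5 × 5% +
    2 × 2% + 12 × 7% + 20 × 7% = 8.1 ns
  • Ratio: 20 / 8.1 = 2.469, so the variable-clock
    machine is faster than the single-clock machine
    by a factor of 2.469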
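
The arithmetic in this answer can be checked with a short script. This is a sketch only: the variable names are illustrative, and the per-class times and percentages are the ones stated on the slides.

```python
# (instruction class, time in ns, share of instruction mix in percent)
MIX = [
    ("lw",         8, 31),
    ("sw",         7, 21),
    ("R-format",   6, 27),
    ("branch",     5,  5),
    ("jump",       2,  2),
    ("FP add/sub", 12, 7),   # 2 + 1 + 8 + 1
    ("FP mul/div", 20, 7),   # 2 + 1 + 16 + 1
]

# Single-clock machine: every instruction takes as long as the slowest one.
single_cycle_ns = max(time for _, time, _ in MIX)

# Variable-clock machine: each instruction takes only its own time,
# so the average cycle is the mix-weighted mean.
variable_avg_ns = sum(time * pct for _, time, pct in MIX) / 100

ratio = single_cycle_ns / variable_avg_ns

print(f"single-cycle clock:   {single_cycle_ns} ns")
print(f"variable-clock mean:  {variable_avg_ns:.1f} ns")
print(f"performance ratio:    {ratio:.3f}")
```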

6
A multicycle implementation
  • From the above example, we break each instruction
    into a series of steps corresponding to the
    functional unit operations that are needed
  • Use these steps to create a multicycle
    implementation
  • Each step takes 1 clock cycle
  • This allows a functional unit to be used more
    than once per instruction, as long as it is used
    on different clock cycles
  • This sharing reduces the hardware requirement

7
the simple datapath for the MIPS architecture
8
High level view of multicycle datapath
[Figure: PC, memory (address; instruction or data), instruction register, memory data register, registers (data), A, B, ALU, ALUOut]
9
Differences from the single-cycle datapath
  • A single memory unit is used for both
    instructions and data
  • There is a single ALU, rather than an ALU and two
    adders
  • One or more registers are added after every major
    functional unit to hold the output of that unit
    until the value is used in a subsequent clock
    cycle

10
Note
  • At the end of a clock cycle, all data that are
    used in subsequent clock cycles must be stored in
    a state element
  • Data used by subsequent instructions in a later
    clock cycle are stored in one of the
    programmer-visible state elements (register
    file, PC, memory)
  • Data used by the same instruction in a later
    cycle must be stored in one of the additional
    registers

11
Position of the additional registers
  • The position of the additional registers is
    determined by two factors:
  • 1. What combinational units will fit in a clock
    cycle
  • 2. What data are needed in later cycles
    implementing the instruction

12
Temporary registers
  • The instruction register (IR) saves the output
    of the memory for an instruction read
  • The memory data register (MDR) saves the output
    of the memory for a data read
  • The A and B registers hold the register operand
    values read from the register file
  • The ALUOut register holds the output of the ALU
13
Write control signal
  • All the registers except the IR hold data only
    between a pair of adjacent clock cycles and thus
    do not need a write control signal
  • The IR needs to hold the instruction until the
    end of execution of that instruction and thus
    needs a write control signal
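
The difference between a register that is rewritten on every clock edge and one guarded by a write control signal can be sketched with a toy model. The class, the method name, and the sample memory outputs below are hypothetical and are not taken from the slides.

```python
class Register:
    """Toy clocked register. If it has a write-enable input, it only
    captures new data when that input is asserted; otherwise it
    captures whatever is on its input at every clock edge."""

    def __init__(self, name, has_write_enable):
        self.name = name
        self.has_write_enable = has_write_enable
        self.value = None

    def clock_edge(self, data_in, write_enable=False):
        if self.has_write_enable and not write_enable:
            return  # hold the old value
        self.value = data_in


ir = Register("IR", has_write_enable=True)     # must survive the whole instruction
mdr = Register("MDR", has_write_enable=False)  # only needed for one cycle

# Hypothetical memory outputs over three consecutive cycles, and the
# write-enable pattern for the IR (asserted only during instruction fetch).
memory_outputs = ["lw-instruction", "don't-care", "loaded-data"]
ir_write = [True, False, False]

history = []
for mem_out, write in zip(memory_outputs, ir_write):
    ir.clock_edge(mem_out, write_enable=write)
    mdr.clock_edge(mem_out)
    history.append((ir.value, mdr.value))

for cycle, (ir_val, mdr_val) in enumerate(history, start=1):
    print(f"cycle {cycle}: IR={ir_val!r:18} MDR={mdr_val!r}")
```

In this model the IR keeps the fetched instruction across all three cycles, while the MDR simply follows the memory output.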

14
Multiplexors
  • Several functional units are shared for different
    purposes
  • So we need to add new multiplexors and expand
    existing ones
  • One memory is used for both instructions and
    data, so we add one mux
  • This mux selects between the 2 sources for a
    memory address, namely the PC (instruction
    access) and ALUOut (data access)

15
Replacing the ALUs
  • Replacing the 3 ALUs of the single-cycle datapath
    with a single ALU means it must accommodate all
    the inputs that used to go to the 3 different
    ALUs
  • This requires 2 changes
  • First, an additional mux is added for the first
    ALU input. The multiplexor chooses between the A
    register and the PC

16
  • Second, the multiplexor on the second ALU input
    is changed from a 2-way to a 4-way mux
  • The two additional inputs to the multiplexor are
    the constant 4 (used to increment the PC) and the
    sign-extended and shifted offset field
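
The two ALU input multiplexors described above can be modelled as plain selection functions. This is a sketch under stated assumptions: the select-signal names (alu_src_a, alu_src_b) and their numeric encodings are chosen here for illustration, and the two original inputs of the second mux are assumed to be the B register and the sign-extended offset.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_input_a(alu_src_a, pc, a_reg):
    """2-way mux on the first ALU input: 0 selects the PC, 1 selects A."""
    return a_reg if alu_src_a else pc


def alu_input_b(alu_src_b, b_reg, offset16):
    """4-way mux on the second ALU input (encoding assumed):
    0 -> B register, 1 -> constant 4,
    2 -> sign-extended offset, 3 -> sign-extended offset shifted left by 2."""
    ext = sign_extend16(offset16)
    return [b_reg, 4, ext, ext << 2][alu_src_b]


pc, a_reg, b_reg = 100, 7, 9
offset = 0xFFFC  # 16-bit encoding of -4

# PC increment: PC + 4
next_pc = alu_input_a(0, pc, a_reg) + alu_input_b(1, b_reg, offset)

# Branch target: (PC + 4) + (sign-extended offset << 2)
branch_target = alu_input_a(0, next_pc, a_reg) + alu_input_b(3, b_reg, offset)

# R-format add: A + B
r_result = alu_input_a(1, pc, a_reg) + alu_input_b(0, b_reg, offset)

print(next_pc, branch_target, r_result)
```

The same adder logic serves all three purposes; only the mux selections change from one clock cycle to the next.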

17
Multicycle datapath for MIPS that handles the basic
instructions
18
Control signals
  • Because the datapath shown above takes multiple
    clock cycles per instruction, it requires a
    different set of control signals
  • The PC, memory, registers, and IR each need a
    write control signal
  • The memory also needs a read signal
  • Each two-input mux needs a single control line
  • Each four-input mux needs two control lines
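
The number of control lines per multiplexor follows from the number of inputs it must distinguish: an n-input mux needs ceil(log2(n)) select lines. A small sketch (the function name is illustrative):

```python
def select_lines(num_inputs):
    """Number of control lines needed to select one of num_inputs mux inputs."""
    if num_inputs < 2:
        raise ValueError("a multiplexor needs at least two inputs")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2.
    return (num_inputs - 1).bit_length()


for n in (2, 4):
    print(f"{n}-input mux: {select_lines(n)} control line(s)")
```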

19
Multicycle datapath for MIPS that handles the basic
instructions (Fig. 5.32)
20
Jump and branch instructions
  • The multicycle datapath still requires additions
    to support branches and jumps
  • After these additions, Figure 5.33 shows the
    complete multicycle datapath

21
Figure 5.33
22
performance of multicycle implementation
23
  • The correct answer should consider
  • the clock cycle time as well as
  • the execution time per instruction

24
Example
  • In estimating the performance of the single-cycle
    implementation, we assumed that only the major
    functional units had any delay (i.e., the delays
    of the muxes, control unit, PC access, sign
    extension unit, and wires were considered
    negligible). Assume that we change the delays
    specified in the last class such that we use a
    different type of adder for simple addition:
  • ALU: 2 ns
  • Adder for PC + 4: X ns
  • Adder for branch address computation: Y ns

25
What would the cycle time be if X = 3, Y = 3?
  • The key is to understand that the length of the
    longest path in the combinational logic
    determines the cycle time. We compute the length
    of the longest path for each instruction and then
    take the one with the maximum value.
  • At present the lw instruction has the longest
    path, at 8 ns
  • Answer: this change does not alter the current
    maximum of 8 ns

26
What would the cycle time be if X = 5 and Y = 5?
  • Consider the beq instruction: it now needs
    X + Y = 10 ns
  • So the cycle time is 10 ns

27
What would the cycle time be if X = 1 and Y = 8?
  • You may think beq needs X + Y = 9 ns
  • But that is not correct
  • Think about it
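
One way to explore these three cases is to model the longest combinational path explicitly. The sketch below is an assumed model, not the official solution: it supposes that the branch-address adder needs both PC + 4 (ready after X ns) and the offset from the fetched instruction (ready after the 2 ns instruction memory access), and that muxes, sign extension, and shifting are free. All names are illustrative.

```python
# Unit delays in ns from the earlier slides.
IM, REG_READ, ALU, DM, REG_WRITE = 2, 1, 2, 2, 1


def cycle_time(x, y):
    """Longest path in ns for PC + 4 adder delay x and branch adder delay y,
    under the assumptions stated in the lead-in."""
    pc_plus_4 = x
    # The branch adder cannot start until both of its inputs are ready.
    branch_target = max(pc_plus_4, IM) + y
    # The comparison path of beq: fetch, register read, ALU subtract.
    branch_compare = IM + REG_READ + ALU
    paths = {
        "lw": IM + REG_READ + ALU + DM + REG_WRITE,
        "sw": IM + REG_READ + ALU + DM,
        "R-format": IM + REG_READ + ALU + REG_WRITE,
        "beq": max(branch_compare, branch_target),
        "jump": IM,
        "PC + 4": pc_plus_4,
    }
    return max(paths.values())


for x, y in [(3, 3), (5, 5), (1, 8)]:
    print(f"X={x}, Y={y}: cycle time {cycle_time(x, y)} ns")
```

Under these assumptions the first two cases reproduce the 8 ns and 10 ns answers from the slides. For X = 1, Y = 8 the model gives 10 ns rather than 9 ns, because the offset is not available until the instruction fetch finishes; treat that as one plausible reading of why X + Y is not the right answer.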

28
SUMMARY