CS/ECE 365 Computer Architecture

1
CS/ECE 365 Computer Architecture
  • Soundararajan Ezekiel
  • Department of Computer Science
  • Ohio Northern University

2
Simple Implementation Scheme
  • In this lecture we build a simple datapath and
    its control by assembling the datapath segments
    from the last class and adding control lines
  • We cover:
  • load word (lw)
  • store word (sw)
  • branch on equal (beq)
  • ALU instructions (add, sub, and, or, set on less
    than)
  • an extension for jump (j)

3
Creating a single datapath
  • Assumption: every instruction takes 1 clock
    cycle
  • No datapath resource can be used more than once
    per instruction, so any element needed more than
    once must be duplicated
  • Memory: one for instructions and one for data
  • To share a datapath element between 2 different
    instruction classes, use a multiplexor (data
    selector)

4
Multiplexor
[Figure: multiplexor symbol with data inputs a
(selected by 0) and b (selected by 1), select input
d, and output c]
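A minimal sketch of a two-input multiplexor, assuming integer signals and a select value of 0 or 1. The function name mux2 and the sample values are hypothetical, chosen only for illustration.

```python
def mux2(in0, in1, select):
    """Two-input multiplexor: pass in0 when select is 0, in1 when select is 1."""
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return in1 if select else in0


# Hypothetical use: choose the second ALU operand.
register_value = 7          # operand used by an R-type instruction
sign_extended_offset = 100  # operand used by lw and sw
print(mux2(register_value, sign_extended_offset, 0))
print(mux2(register_value, sign_extended_offset, 1))
```

The same selector idea is what lets one ALU serve both R-type and memory instructions.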
5
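The datapath for R-type instructions
[Figure: the instruction supplies Read Reg1, Read
Reg2, and Write Reg to the register file (control:
RegWrite); Read data1 and Read data2 feed the ALU
(3-bit ALU operation control), which produces zero
and result; the result returns to Write data]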
6
Datapath for lw and sw: a register access,
followed by a memory address calculation, then a
read or write from memory, and a write into the
register file if the instruction is a load
[Figure: register file (Read reg1, Read reg2, Write
reg, Write data, RegWrite), sign-extend unit (16 to
32 bits), ALU (3-bit ALU operation, zero, result),
and data memory (address, Write data, Read data,
MemWrite, MemRead)]
7
Difference
  • The arithmetic-logic (R-type) datapath and the
    memory datapath are quite similar
  • Key differences:
  • 1. The second input of the ALU is
  • a register for R-type instructions
  • the sign-extended lower half of the instruction
    for memory instructions
  • 2. The value stored into the destination register
  • comes from the ALU for R-type instructions
  • comes from memory for a load

8
Combined datapath for memory and R-type instructions
9
Note
  • Only a single register file and a single ALU
  • 2 different sources for the second ALU input
  • 2 different sources for the data stored into the
    register file
  • So we use 2 multiplexors:
  • one for the ALU input
  • one for the data input to the register file

10
Add one more portion
  • We can add the instruction fetch portion of the
    datapath
  • It includes a memory for instructions, separate
    from the memory for data
  • It requires both an adder and an ALU: the adder
    increments the PC while the ALU executes the
    instruction in the same clock cycle

11
A portion of the datapath used for fetching
instructions and incrementing the PC
[Figure: the PC drives the instruction address
input of the instruction memory, which outputs the
instruction; an adder (Add) computes PC + 4]
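A minimal sketch of the fetch step, assuming the instruction memory is modeled as a Python list of 32-bit words indexed by byte address divided by 4. The name fetch and the two sample words are illustrative assumptions.

```python
def fetch(instruction_memory, pc):
    """Read the instruction at byte address pc and compute PC + 4."""
    instruction = instruction_memory[pc // 4]  # instruction memory read
    next_pc = (pc + 4) & 0xFFFFFFFF            # dedicated adder, not the ALU
    return instruction, next_pc


# Two sample words: add $t1, $t2, $t3 and lw $t1, 4($t2).
imem = [0x014B4820, 0x8D490004]
instruction, next_pc = fetch(imem, 0)
print(hex(instruction), next_pc)
```

Keeping the PC increment in its own adder leaves the ALU free for the instruction in the same cycle.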
12
The instruction fetch portion of the datapath is added
13
Add branch datapath
[Figure: PC + 4 from the instruction datapath and
the sign-extended (16 to 32 bits) offset shifted
left by 2 feed an adder (Add) whose sum is the
branch target; the register file (Read Reg1, Read
Reg2, Write Reg, Write data, RegWrite) feeds the
ALU (3-bit ALU operation), whose zero output goes
to the branch control logic]
14
the simple datapath for the MIPS architecture
15
  • We have completed the single datapath
  • We can now add the control unit
  • The control unit must be able to take inputs and
    generate a write signal for each state element,
    the selector control for each multiplexor, and
    the ALU control
  • ALU control is different in a number of ways, and
    it is useful to design it first, before we
    design the rest of the control unit

16
The ALU Control
  • The ALU has three control inputs
  • Only five of the eight possible input
    combinations are used:
  • 000 --- AND
  • 001 --- OR
  • 010 --- add
  • 110 --- subtract
  • 111 --- set on less than
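A minimal sketch of an ALU driven by these five control codes. It assumes 32-bit unsigned wraparound for results and a signed comparison for set on less than; the names alu and to_signed are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(control, a, b):
    """Return (result, zero) for the 3-bit control codes listed above."""
    if control == 0b000:
        result = a & b
    elif control == 0b001:
        result = a | b
    elif control == 0b010:
        result = a + b
    elif control == 0b110:
        result = a - b
    elif control == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unused ALU control code")
    result &= MASK32                 # keep the result to 32 bits
    return result, int(result == 0)  # zero output is 1 when result is 0


print(alu(0b010, 5, 3))
print(alu(0b110, 5, 5))
```

The zero output is what the branch logic uses later for beq.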

17
  • Depending on the instruction class, the ALU
    needs to perform one of these 5 functions
  • lw, sw -> the ALU computes the memory address by
    addition
  • R-type instructions -> one of the 5 actions,
    depending on the 6-bit funct field
  • branch equal -> the ALU performs a subtraction

18
ALUOp
  • ALUOp is a 2-bit control field that indicates
    whether the operation to be performed should be
    add (00) for loads and stores, subtract (01) for
    beq, or determined by the operation encoded in
    the funct field (10) for R-type instructions

19
Instruction  ALUOp  Instruction       Funct   Desired     ALU control
opcode              operation         field   ALU action  input
LW           00     load word         xxxxxx  add         010
SW           00     store word        xxxxxx  add         010
beq          01     branch equal      xxxxxx  subtract    110
R-type       10     add               100000  add         010
R-type       10     subtract          100010  subtract    110
R-type       10     AND               100100  AND         000
R-type       10     OR                100101  OR          001
R-type       10     set on less than  101010  slt         111
20
Truth table for the 3 ALU control bits (called
Operation)
ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  Operation
0       0       X   X   X   X   X   X   010
X       1       X   X   X   X   X   X   110
1       X       X   X   0   0   0   0   010
1       X       X   X   0   0   1   0   110
1       X       X   X   0   1   0   0   000
1       X       X   X   0   1   0   1   001
1       X       X   X   1   0   1   0   111
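The truth table can be sketched as a small function. This is an illustration under stated assumptions: the name alu_control is hypothetical, and the don't-care bits F5 and F4 are ignored by masking funct to its low four bits.

```python
def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and 6-bit funct field to the 3-bit ALU control."""
    alu_op1 = (alu_op >> 1) & 1
    alu_op0 = alu_op & 1
    if alu_op1 == 0 and alu_op0 == 0:
        return 0b010              # lw, sw: add
    if alu_op0 == 1:
        return 0b110              # beq: subtract
    # ALUOp1 is 1: only F3..F0 matter, F5 and F4 are don't-cares.
    by_low_funct_bits = {
        0b0000: 0b010,            # add
        0b0010: 0b110,            # subtract
        0b0100: 0b000,            # AND
        0b0101: 0b001,            # OR
        0b1010: 0b111,            # set on less than
    }
    return by_low_funct_bits[funct & 0b1111]  # KeyError if not in the table


print(format(alu_control(0b10, 0b101010), "03b"))
```

A dictionary lookup stands in for the logic gates; funct values outside the table raise KeyError rather than guessing an operation.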
21
Designing the main control unit
  • We described how to design an ALU control that
    uses the function code and a 2-bit signal as
    its control inputs
  • Next we identify the fields of an instruction
    and the control lines needed for the datapath
    we designed

22
The 3 instruction classes use 2 different
instruction formats
R-type instruction
Field         0      rs     rt     rd     shamt  funct
Bit position  31-26  25-21  20-16  15-11  10-6   5-0
Load or store instruction
Field         35 or 43  rs     rt     address
Bit position  31-26     25-21  20-16  15-0
Branch instruction
Field         4      rs     rt     address
Bit position  31-26  25-21  20-16  15-0
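A minimal sketch of extracting these fields from a 32-bit instruction word with shifts and masks. The function name decode and the dictionary layout are assumptions for illustration; the sample word encodes add $t1, $t2, $t3, using the standard MIPS register numbers $t1 = 9, $t2 = 10, $t3 = 11.

```python
def decode(instruction):
    """Split a 32-bit MIPS instruction word into its bit fields."""
    return {
        "op": (instruction >> 26) & 0x3F,     # bits 31-26
        "rs": (instruction >> 21) & 0x1F,     # bits 25-21
        "rt": (instruction >> 16) & 0x1F,     # bits 20-16
        "rd": (instruction >> 11) & 0x1F,     # bits 15-11 (R-type only)
        "shamt": (instruction >> 6) & 0x1F,   # bits 10-6 (R-type only)
        "funct": instruction & 0x3F,          # bits 5-0 (R-type only)
        "address": instruction & 0xFFFF,      # bits 15-0 (lw, sw, beq)
    }


# add $t1, $t2, $t3: op 0, rs = 10, rt = 11, rd = 9, funct = 100000
word = (0 << 26) | (10 << 21) | (11 << 16) | (9 << 11) | 0b100000
fields = decode(word)
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], fields["funct"])
```

All fields are extracted for every word; which ones are meaningful depends on the opcode.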
23
Note
  • R format:
  • all R-type instructions have opcode 0
  • 3 register operands: rs, rt, and rd
  • fields rs and rt are the sources; rd is the
    destination
  • the ALU function is in the funct field
  • shamt is used for shifts; we ignore it here
  • load: opcode 35, rs is the base register, rt is
    the destination register
  • store: opcode 43, rs is the base register, rt is
    the source register whose value is stored in
    memory
  • beq: opcode 4, registers rs and rt are the source
    registers that are compared for equality

24
Observations
  • There are some major observations about these
    instruction formats
  • The op field (opcode) is always contained in bits
    31-26; we refer to it as Op5-0
  • The 2 registers to be read are always specified
    by rs and rt, at positions 25-21 and 20-16; this
    is true for R-type, beq, and store
  • The base register for load and store
    instructions is always in bit positions 25-21
    (rs)
  • The 16-bit offset for beq, lw, and sw is always
    in bits 15-0
  • The destination register is in one of 2 places:
    for lw it is rt (20-16), for R-type it is rd
    (15-11), so we need a multiplexor to select
    which field supplies the write register number

25
Operations on the datapath
  • We will see how each instruction uses the
    datapath
  • R-type instruction (add $t1, $t2, $t3)
  • 4 steps to execute an R-type instruction:
  • 1. The instruction is fetched from the
    instruction memory (IM) and the PC is
    incremented
  • 2. The 2 registers $t2 and $t3 are read from the
    register file
  • 3. The ALU operates on the data read from the
    register file, using the function code (funct
    field)
  • 4. The result from the ALU is written into the
    register file, using bits 15-11 of the
    instruction to select the destination register
    ($t1)
  • Figures 5.21-5.32 show these operations
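The four R-type steps can be sketched as one function call, which mirrors the fact that they all happen within a single clock cycle. This is an illustration under assumptions: memory is a list of words, the register file is a list of 32 integers, and the names step_r_type and FUNCT_OPS are hypothetical. Writes to register 0 are not special-cased.

```python
MASK32 = 0xFFFFFFFF


def signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


# funct field -> operation, for the five R-type instructions covered here
FUNCT_OPS = {
    0b100000: lambda a, b: a + b,                       # add
    0b100010: lambda a, b: a - b,                       # sub
    0b100100: lambda a, b: a & b,                       # and
    0b100101: lambda a, b: a | b,                       # or
    0b101010: lambda a, b: int(signed(a) < signed(b)),  # slt
}


def step_r_type(imem, regs, pc):
    """Execute one R-type instruction; return the next PC."""
    instruction = imem[pc // 4]            # 1. fetch, increment PC
    next_pc = pc + 4
    rs = (instruction >> 21) & 0x1F        # 2. read two registers
    rt = (instruction >> 16) & 0x1F
    a, b = regs[rs], regs[rt]
    funct = instruction & 0x3F             # 3. ALU operation from funct
    result = FUNCT_OPS[funct](a, b) & MASK32
    rd = (instruction >> 11) & 0x1F        # 4. write result to rd (15-11)
    regs[rd] = result
    return next_pc


regs = [0] * 32
regs[10], regs[11] = 5, 7                  # $t2 = 5, $t3 = 7
pc = step_r_type([0x014B4820], regs, 0)    # add $t1, $t2, $t3
print(regs[9], pc)
```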

26
Note
  • This implementation is combinational
  • That is, it is not really a series of 4 distinct
    steps
  • It operates in a single clock cycle

27
Execution of load word
  • Example: lw $t1, offset($t2)
  • 5 steps (figures 5.24 and 5.25 on page 368):
  • 1. The instruction is fetched from IM and the PC
    is incremented
  • 2. Register $t2 is read from the register file
  • 3. The ALU computes the sum of the value read
    from the register file and the sign-extended
    lower 16 bits of the instruction (offset)
  • 4. The sum from the ALU is used as the address
    for the data memory
  • 5. The data from the memory unit is written into
    the register file; the destination register is
    given by bits 20-16 of the instruction ($t1)
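A minimal sketch of the five load-word steps. It assumes the data memory is a dictionary keyed by byte address that holds whole words, and the register file is a list of 32 integers; the names step_lw and sign_extend16 are hypothetical.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def step_lw(imem, regs, dmem, pc):
    """Execute one lw instruction; return the next PC."""
    instruction = imem[pc // 4]                      # 1. fetch, increment PC
    next_pc = pc + 4
    base = regs[(instruction >> 21) & 0x1F]          # 2. read base register (rs)
    offset = sign_extend16(instruction & 0xFFFF)     # 3. ALU adds base + offset
    address = (base + offset) & 0xFFFFFFFF
    data = dmem[address]                             # 4. sum addresses data memory
    regs[(instruction >> 16) & 0x1F] = data          # 5. write to rt (bits 20-16)
    return next_pc


regs = [0] * 32
regs[10] = 0x100                                     # $t2 holds the base address
dmem = {0x104: 0xDEADBEEF}
lw_word = (35 << 26) | (10 << 21) | (9 << 16) | 4    # lw $t1, 4($t2)
next_pc = step_lw([lw_word], regs, dmem, 0)
print(hex(regs[9]), next_pc)
```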

28
Operation of the branch-on-equal instruction
  • Example: beq $t1, $t2, offset
  • 4 steps (figure 5.26 on page 369):
  • 1. The instruction is fetched from IM and the PC
    is incremented
  • 2. The 2 registers $t1 and $t2 are read from the
    register file
  • 3. The ALU performs a subtract on the data values
    read from the register file. The value of PC + 4
    is added to the sign-extended lower 16 bits of
    the instruction (offset) shifted left by 2; the
    result is the branch target address
  • 4. The zero result from the ALU is used to decide
    which adder result to store into the PC
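A minimal sketch of the branch steps: the ALU subtracts, a separate adder forms the branch target, and the zero output selects the next PC. The names step_beq and sign_extend16 are hypothetical, and memory is modeled as a list of words.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def step_beq(imem, regs, pc):
    """Execute one beq instruction; return the next PC."""
    instruction = imem[pc // 4]                        # 1. fetch, compute PC + 4
    pc_plus_4 = pc + 4
    a = regs[(instruction >> 21) & 0x1F]               # 2. read rs and rt
    b = regs[(instruction >> 16) & 0x1F]
    zero = int(((a - b) & 0xFFFFFFFF) == 0)            # 3a. ALU subtract
    offset = sign_extend16(instruction & 0xFFFF)
    target = (pc_plus_4 + (offset << 2)) & 0xFFFFFFFF  # 3b. branch target adder
    return target if zero else pc_plus_4               # 4. zero selects next PC


beq_word = (4 << 26) | (9 << 21) | (10 << 16) | 3      # beq $t1, $t2, 3
regs = [0] * 32
regs[9] = regs[10] = 42
taken_pc = step_beq([beq_word], regs, 0)
regs[10] = 43
not_taken_pc = step_beq([beq_word], regs, 0)
print(taken_pc, not_taken_pc)
```

The offset counts instructions, so it is shifted left by 2 to become a byte distance from PC + 4.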

29
Final step
  • Using all of this information, figure 5.27 on
    page 372 shows the complete single-cycle
    implementation

30
Why the single-cycle implementation is not used
  • Although it works correctly, it is not used in
    modern designs because it is inefficient
  • The clock cycle must be the same length for every
    instruction; in this case CPI = 1
  • The clock cycle is determined by the longest
    possible path in the machine, which is the load:
    it uses 5 functional units in series
    (instruction memory, register file, ALU, data
    memory, register file)
  • Although CPI = 1, overall performance is not very
    good, since several instruction classes could
    fit in a shorter clock cycle

31
Performance of single-cycle machines
  • Assume that the operation times for the major
    functional units in this implementation are the
    following:
  • memory units: 2 ns
  • ALU and adders: 2 ns
  • register file (read or write): 1 ns
  • Assume that the multiplexors, control unit, PC
    accesses, sign-extension unit, and wires have no
    delay

32
  • 1. An implementation in which every instruction
    operates in 1 clock cycle of a fixed length
  • 2. An implementation in which every instruction
    executes in 1 clock cycle using a variable-length
    clock, which for each instruction is only as long
    as it needs to be
  • To compare the performance, assume the following
    instruction mix: 24% loads, 12% stores, 44%
    R-format instructions, 18% branches, and 2% jumps

33
Answer
  • Compare CPU time
  • CPU time = IC × CPI × clock cycle time
  • Since CPI = 1,
  • CPU execution time = IC × clock cycle time

34
Instr class  IM  Reg read  ALU op  DM  Reg write  Total (ns)
R-format     2   1         2       0   1          6
lw           2   1         2       2   1          8
sw           2   1         2       2   0          7
branch       2   1         2       0   0          5
jump         2   0         0       0   0          2
35
  • The clock cycle for the machine with a single
    fixed clock for all instructions will be 8 ns
  • For the variable clock, the average cycle is
    8 × 24% + 7 × 12% + 6 × 44% + 5 × 18% + 2 × 2%
    ≈ 6.3 ns
  • CPU performance ratio: 8 / 6.3 ≈ 1.27
  • The variable-clock implementation will be 1.27
    times faster than the single-clock-cycle one
  • Drawback: implementing a variable clock cycle
    is very difficult
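The comparison can be reproduced with a short calculation using the unit delays and instruction mix given earlier. The dictionary names are hypothetical. Without intermediate rounding the average cycle comes out as 6.34 ns and the ratio as about 1.26; the 1.27 above results from rounding the average to 6.3 ns first.

```python
# Delay of each functional unit in ns (from the assumptions above).
UNIT_NS = {"IM": 2, "RegRead": 1, "ALU": 2, "DM": 2, "RegWrite": 1}

# Functional units used by each instruction class.
PATHS = {
    "R-format": ["IM", "RegRead", "ALU", "RegWrite"],
    "lw": ["IM", "RegRead", "ALU", "DM", "RegWrite"],
    "sw": ["IM", "RegRead", "ALU", "DM"],
    "branch": ["IM", "RegRead", "ALU"],
    "jump": ["IM"],
}

# Instruction mix as fractions of all executed instructions.
MIX = {"R-format": 0.44, "lw": 0.24, "sw": 0.12, "branch": 0.18, "jump": 0.02}

latency_ns = {name: sum(UNIT_NS[u] for u in units) for name, units in PATHS.items()}
fixed_clock_ns = max(latency_ns.values())              # longest path sets the clock
variable_clock_ns = sum(latency_ns[c] * MIX[c] for c in MIX)
speedup = fixed_clock_ns / variable_clock_ns

print(latency_ns)
print(fixed_clock_ns, round(variable_clock_ns, 2), round(speedup, 2))
```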