CS/ECE 365 Computer Architecture presentation

1
CS/ECE 365 Computer Architecture

Soundararajan Ezekiel
Department of Computer Science
Ohio Northern University

2
Simple Implementation Scheme

In this lecture we will build a simple datapath and
its control by assembling the datapath segments from
the last class and adding control lines
We cover
load word (lw)
store word (sw)
branch equal (beq)
ALU instructions (add, sub, and, or, set on less
than)
an enhancement for jump (j)

3
Creating a single datapath

Assumption: every instruction takes 1 clock
cycle
No datapath resource can be used more than
once per instruction, so any element needed more
than once must be duplicated
This is why we need two memories, one for
instructions and one for data
To share a datapath element between 2 different
instruction classes, we use a
multiplexor (data selector)

4
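Multiplexor
[Figure: multiplexor symbol, drawn twice, with inputs a (selected by 0) and b (selected by 1) and the remaining lines labeled c and d]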
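
A multiplexor can be sketched in a few lines of Python. This is only an illustration: the function name mux2 and its argument names are assumptions made here, not labels taken from the slide.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """Two-input multiplexor: pass in0 when select is 0, in1 when select is 1."""
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return in1 if select else in0


# The same two data inputs, routed by the select line.
print(mux2(0, 7, 9))
print(mux2(1, 7, 9))
```

The select line is the only thing that changes between the two calls, which is how one datapath element can be shared between two instruction classes.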
5
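The datapath for R-type instructions
[Figure: the instruction supplies Read Reg1, Read Reg2 and Write Reg to the register file (Write data input, RegWrite control); Read data1 and Read data2 feed the ALU (3-bit ALU operation control), which outputs zero and the result.]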
6
Datapath for lw and sw: a register access,
followed by a memory address calculation, then a
read or write from memory, and a write into the
register file if the instruction is a load
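[Figure: the register file (Read reg1, Read reg2, Write reg, Write data; RegWrite control) and the 16-to-32-bit sign-extend unit feed the ALU (3-bit ALU operation control; zero and result outputs). The ALU result is the address for the data memory (Write data input, Read data output; MemWrite and MemRead controls).]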
7
Difference

The arithmetic-logic (R-type) datapath and the memory
datapath are quite similar
Key differences
1. The second input to the ALU is
a register for R-type instructions
the sign-extended lower half of the instruction for
memory instructions
2. The value stored into the destination register
comes from the ALU for R-type instructions
comes from memory for a load

8
Combined datapath for memory and R-type instructions
9
Note

There is only a single register file and a single ALU
There are 2 different sources for the second ALU input
There are 2 different sources for the data stored into the
register file
So we use 2 multiplexors
one for the ALU input
one for the data input to the register file
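
The two multiplexors can be sketched as follows. The select names alu_src and mem_to_reg are assumptions made for this illustration; the slide itself does not name the control lines.

```python
def second_alu_input(alu_src: int, read_data2: int, sign_ext_imm: int) -> int:
    """Mux on the ALU input: register value (0) or sign-extended immediate (1)."""
    return sign_ext_imm if alu_src else read_data2


def register_write_data(mem_to_reg: int, alu_result: int, mem_data: int) -> int:
    """Mux on the register file input: ALU result (0) or data memory output (1)."""
    return mem_data if mem_to_reg else alu_result


# R-type: the ALU uses a register, and the ALU result is written back.
print(second_alu_input(0, read_data2=11, sign_ext_imm=8))
print(register_write_data(0, alu_result=30, mem_data=99))
# Load: the ALU uses the offset, and the memory data is written back.
print(second_alu_input(1, read_data2=11, sign_ext_imm=8))
print(register_write_data(1, alu_result=30, mem_data=99))
```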

10
Add one more portion

We can add the instruction fetch portion of the
datapath
It includes a memory for instructions and a separate
memory for data
It requires both an adder and an ALU, since the
adder is used to increment the PC while the
ALU is used for executing the instruction in the
same clock cycle
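
A minimal sketch of the fetch step, assuming the instruction memory is modeled as a Python dict keyed by byte address (a simplification chosen here, not something the slides specify). The two instruction words are the standard MIPS encodings of add $t1, $t2, $t3 and lw $t1, 8($t2).

```python
MASK32 = 0xFFFFFFFF


def fetch(pc: int, instruction_memory: dict) -> tuple:
    """Read the instruction at pc and compute pc + 4 with a dedicated adder."""
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & MASK32  # instructions are 4 bytes wide
    return instruction, next_pc


# Hypothetical two-instruction program used only for this example.
program = {0: 0x014B4820, 4: 0x8D490008}

instruction, pc = fetch(0, program)
print(hex(instruction), pc)
instruction, pc = fetch(pc, program)
print(hex(instruction), pc)
```

The PC increment never goes through the main ALU here, mirroring the separate adder in the datapath.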

11
A portion of the datapath used for fetching
instructions and incrementing the PC
[Figure: the PC supplies the instruction address to the instruction memory, which outputs the instruction; an adder (Add) adds 4 to the PC.]
12
The instruction fetch portion of the datapath is added
13
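Add branch datapath
[Figure: the register file (Read Reg1, Read Reg2, Write Reg, Write data; RegWrite control) feeds the ALU (3-bit ALU operation control), whose zero output goes to the branch control logic. The lower 16 bits of the instruction are sign-extended to 32 bits, shifted left by 2, and added to PC + 4 from the instruction datapath; the sum is the branch target.]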
14
The simple datapath for the MIPS architecture
15

We have completed the single datapath
Now we can add the control unit
The control unit must be able to take inputs and
generate a write signal for each state element,
the selector control for each multiplexor, and
the ALU control
The ALU control is different in a number of ways, and
it is useful to design it first, before we
design the rest of the control unit

16
The ALU Control

The ALU has three control inputs
Only five of the eight possible input
combinations are used
000 --- AND
001 --- OR
010 --- add
110 --- subtract
111 --- set on less than

Depending on the instruction class, the ALU will
need to perform one of these 5 functions
lw, sw: we use the ALU to compute the memory
address by addition
R-type instructions: one of the 5 actions,
depending on the 6-bit funct field
branch equal: the ALU performs a subtraction
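
The five ALU functions can be sketched as below, assuming 32-bit operands and treating set on less than as a signed comparison. The helper names alu and to_signed are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int) -> tuple:
    """Return (result, zero) for the 3-bit ALU control codes listed above."""
    a &= MASK32
    b &= MASK32
    if control == 0b000:
        result = a & b
    elif control == 0b001:
        result = a | b
    elif control == 0b010:
        result = (a + b) & MASK32
    elif control == 0b110:
        result = (a - b) & MASK32
    elif control == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unused ALU control combination")
    return result, int(result == 0)


print(alu(0b010, 5, 3))   # add
print(alu(0b110, 5, 5))   # subtract; zero output is set when the inputs are equal
```

The zero output of the subtraction is what the branch logic later uses for beq.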

18
ALUOp

ALUOp indicates whether the operation to be
performed should be add (00) for loads and stores,
subtract (01) for beq, or determined by the
operation encoded in the funct field (10)

19
Instruction  ALUOp  Instruction       Funct   Desired     ALU control
opcode              operation         field   ALU action  input
LW           00     load word         xxxxxx  add         010
SW           00     store word        xxxxxx  add         010
beq          01     branch equal      xxxxxx  subtract    110
R-type       10     add               100000  add         010
R-type       10     subtract          100010  subtract    110
R-type       10     AND               100100  AND         000
R-type       10     OR                100101  OR          001
R-type       10     set on less than  101010  slt         111
20
Truth table for the 3 ALU control bits (called
Operation)
ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  Operation
0       0       X   X   X   X   X   X   010
X       1       X   X   X   X   X   X   110
1       X       X   X   0   0   0   0   010
1       X       X   X   0   0   1   0   110
1       X       X   X   0   1   0   0   000
1       X       X   X   0   1   0   1   001
1       X       X   X   1   0   1   0   111
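
The truth table can be written as a small lookup. This sketch only accepts the three ALUOp values that the table is built for (00, 01, 10) and matches the full 6-bit funct field, so it does not model the don't-care entries; the function name alu_control is an assumption made here.

```python
# funct field -> 3-bit ALU control, for ALUOp = 10 (R-type)
FUNCT_TO_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # set on less than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Map the 2-bit ALUOp and the 6-bit funct field to the ALU control bits."""
    if alu_op == 0b00:      # lw, sw: add to form the address
        return 0b010
    if alu_op == 0b01:      # beq: subtract to compare
        return 0b110
    if alu_op == 0b10:      # R-type: decided by the funct field
        return FUNCT_TO_CONTROL[funct & 0x3F]
    raise ValueError("ALUOp value not covered by this sketch")


print(format(alu_control(0b00, 0), "03b"))
print(format(alu_control(0b10, 0b101010), "03b"))
```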
21
Designing the main control unit

We described how to design an ALU control that uses
the function code and a 2-bit signal as its control
inputs
Now we identify the fields of an instruction
and the control lines that are needed for the
datapath we designed

22
The 3 instruction classes use 2 different
instruction formats

R-type instruction
Field         0      rs     rt     rd     shamt  funct
Bit position  31-26  25-21  20-16  15-11  10-6   5-0

Load or store instruction
Field         35 or 43  rs     rt     address
Bit position  31-26     25-21  20-16  15-0

Branch instruction
Field         4      rs     rt     address
Bit position  31-26  25-21  20-16  15-0
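
The bit positions above translate directly into shifts and masks. The sketch below extracts every field from a 32-bit instruction word regardless of format; the function name decode and the dictionary layout are choices made for this example.

```python
def decode(instruction: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields of both formats."""
    return {
        "opcode":  (instruction >> 26) & 0x3F,   # bits 31-26
        "rs":      (instruction >> 21) & 0x1F,   # bits 25-21
        "rt":      (instruction >> 16) & 0x1F,   # bits 20-16
        "rd":      (instruction >> 11) & 0x1F,   # bits 15-11 (R-type only)
        "shamt":   (instruction >> 6) & 0x1F,    # bits 10-6  (R-type only)
        "funct":   instruction & 0x3F,           # bits 5-0   (R-type only)
        "address": instruction & 0xFFFF,         # bits 15-0  (lw, sw, beq)
    }


# add $t1, $t2, $t3 and lw $t1, 8($t2); $t1, $t2, $t3 are registers 9, 10, 11.
print(decode(0x014B4820))
print(decode(0x8D490008))
```

Which of these fields is meaningful depends on the opcode, which is why the control unit looks at bits 31-26 first.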
23
Note

R format
all have opcode 0
3 register operands: rs, rt and rd
fields rs and rt are sources, rd is the destination
the ALU function is in the funct field
shamt is used for shifts; we will ignore it here
load: opcode 35, rs is the base register, rt is the
destination register
store: opcode 43, rt is the source register whose value
is stored in memory
beq: opcode 4, registers rs and rt are the source
registers that are compared for equality

24
Observations

There are some major observations about this
instruction format
The op field (opcode) is always contained in bits
31-26; we refer to it as Op5-0
The 2 registers to be read are always
specified by rs and rt, in positions 25-21 and
20-16; this is true for R-type, beq and store
The base register for load and store instructions is
always in bit positions 25-21 (rs)
The 16-bit offset for beq, lw and sw is always in bits 15-0
The destination register is in one of 2 places: for lw
it is rt (20-16), for R-type it is rd (15-11), so we
need a multiplexor to select between them

25
Operations on datapath

We will see how each instruction uses the
datapath
R-type instruction (add $t1, $t2, $t3)
4 steps to execute an R-type instruction
1. An instruction is fetched from the instruction memory
and the PC is incremented
2. The 2 registers $t2 and $t3 are read from the register file
3. The ALU operates on the data read from the
register file, using the function code
4. The result from the ALU is written into the
register file, using bits 15-11 of the instruction to
select the destination register ($t1)
Figures 5.21-5.32 show these operations
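
The four steps can be traced in a short simulation. This is a sketch under simplifying assumptions: the register file is a 32-entry Python list, the instruction memory is a dict keyed by byte address, and the name run_r_type is made up for the example.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def run_r_type(pc: int, instruction_memory: dict, regs: list) -> int:
    """Execute one R-type instruction and return the incremented PC."""
    # Step 1: fetch the instruction and increment the PC.
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & MASK32
    if (instruction >> 26) & 0x3F != 0:
        raise ValueError("not an R-type instruction")
    # Step 2: read the two source registers named by rs and rt.
    a = regs[(instruction >> 21) & 0x1F]
    b = regs[(instruction >> 16) & 0x1F]
    # Step 3: the ALU operates on them, as selected by the funct field.
    funct = instruction & 0x3F
    if funct == 0b100000:
        result = a + b
    elif funct == 0b100010:
        result = a - b
    elif funct == 0b100100:
        result = a & b
    elif funct == 0b100101:
        result = a | b
    elif funct == 0b101010:
        result = int(to_signed(a) < to_signed(b))
    else:
        raise ValueError("funct value not covered by this sketch")
    # Step 4: write the result to the register named by bits 15-11 (rd).
    regs[(instruction >> 11) & 0x1F] = result & MASK32
    return next_pc


regs = [0] * 32
regs[10], regs[11] = 5, 7                      # $t2 = 5, $t3 = 7
pc = run_r_type(0, {0: 0x014B4820}, regs)      # add $t1, $t2, $t3
print(regs[9], pc)
```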

26
Note

Note that this implementation is combinational
That is, it is not really a series of 4 distinct
steps
The datapath operates in a single clock cycle

27
Execution of load word

Example: lw $t1, offset($t2)
5 steps (figures 5.24 and 5.25 on page 368)
1. An instruction is fetched from the instruction memory
and the PC is incremented
2. Register $t2 is read from the register file
3. The ALU computes the sum of the value read from
the register file and the sign-extended lower
16 bits of the instruction (offset)
4. The sum from the ALU is used as the address for
the data memory
5. The data from the memory unit is written into the
register file; the destination register is given
by bits 20-16 of the instruction ($t1)
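
Steps 2 to 5 of the load can be sketched as follows, assuming the data memory is a dict keyed by byte address and that the instruction has already been fetched. The names execute_lw and sign_extend16 are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_lw(instruction: int, regs: list, data_memory: dict) -> None:
    """Carry out the register read, address calculation, memory read and write-back."""
    rs = (instruction >> 21) & 0x1F            # base register
    rt = (instruction >> 16) & 0x1F            # destination register
    offset = sign_extend16(instruction)        # lower 16 bits, sign-extended
    address = (regs[rs] + offset) & MASK32     # ALU performs an add
    regs[rt] = data_memory[address]            # memory data goes to the register file


regs = [0] * 32
regs[10] = 100                                 # $t2 holds the base address
memory = {108: 42, 96: 7}
execute_lw(0x8D490008, regs, memory)           # lw $t1, 8($t2)
print(regs[9])
execute_lw(0x8D49FFFC, regs, memory)           # lw $t1, -4($t2)
print(regs[9])
```

The second call shows why the offset must be sign-extended: the field 0xFFFC has to be read as -4, not 65532.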

28
Operation of the branch-on-equal instruction

Example: beq $t1, $t2, offset
4 steps (figure 5.26 on page 369)
1. An instruction is fetched from the instruction memory
and the PC is incremented
2. The 2 registers $t1 and $t2 are read from the register file
3. The ALU performs a subtract on the data values
read from the register file. The value of PC + 4 is
added to the sign-extended lower 16 bits of the
instruction (offset) shifted left by 2; the result
is the branch target address
4. The zero result from the ALU is used to decide
which adder result to store into the PC
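
The branch decision can be sketched like this, assuming the instruction has already been fetched from address pc. The names execute_beq and sign_extend16 are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_beq(instruction: int, pc: int, regs: list) -> int:
    """Return the next PC: the branch target if rs == rt, otherwise PC + 4."""
    rs = (instruction >> 21) & 0x1F
    rt = (instruction >> 16) & 0x1F
    pc_plus_4 = (pc + 4) & MASK32
    # Second adder: PC + 4 plus the word offset converted to a byte offset.
    branch_target = (pc_plus_4 + (sign_extend16(instruction) << 2)) & MASK32
    # The ALU subtracts; its zero output selects which adder result is kept.
    zero = ((regs[rs] - regs[rt]) & MASK32) == 0
    return branch_target if zero else pc_plus_4


regs = [0] * 32
regs[9], regs[10] = 5, 5                              # $t1 == $t2
print(hex(execute_beq(0x112A0003, 0x100, regs)))      # beq $t1, $t2, 3 (taken)
regs[10] = 6
print(hex(execute_beq(0x112A0003, 0x100, regs)))      # not taken
```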

29
Final step

Using all of this information, figure 5.27 on
page 372 shows the complete single-cycle
implementation

30
Why the single-cycle implementation is not used

Although it works correctly, it is
not used in modern designs because it is inefficient
The clock cycle must be the same for every
instruction, so in this case CPI = 1
The clock cycle is determined by the longest possible
path in the machine, which is the load: it uses 5
functional units in series (instruction memory, register
file, ALU, data memory, register file)
Although CPI = 1, overall performance is not very
good, since several instruction classes could fit in a
shorter clock cycle

31
Performance of single-cycle machines

Assume that the operation times for the major
functional units in this implementation are the
following
Memory units: 2 ns
ALU and adders: 2 ns
Register file (read or write): 1 ns
Assume that the multiplexors, control unit, PC accesses,
sign extension unit, and wires have no delay

Which of the following implementations is faster, and by
how much?
1. An implementation in which every instruction
operates in 1 clock cycle of a fixed length
2. An implementation where every instruction
executes in 1 clock cycle using a variable-length
clock, which for each instruction is only as long
as it needs to be
To compare the performance, assume the following
instruction mix: 24% loads, 12% stores, 44%
R-format instructions, 18% branches, and 2% jumps

33
Answer

Compare the CPU times
CPU time = IC × CPI × clock cycle time
Since CPI = 1,
CPU execution time = IC × clock cycle time

34
All times are in ns
Instr class  IM  Reg read  ALU op  DM  Reg write  Total
R-format     2   1         2       0   1          6
lw           2   1         2       2   1          8
sw           2   1         2       2              7
branch       2   1         2                      5
jump         2                                    2
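
The totals in the table are the sums of the functional-unit delays along each instruction class's path. The sketch below recomputes them; the dictionary names and the unit labels are choices made for this example.

```python
# Delay of each functional unit in ns, as assumed on the slide.
UNIT_NS = {"IM": 2, "RegRead": 1, "ALU": 2, "DM": 2, "RegWrite": 1}

# Functional units used by each instruction class, in order.
PATHS = {
    "R-format": ["IM", "RegRead", "ALU", "RegWrite"],
    "lw":       ["IM", "RegRead", "ALU", "DM", "RegWrite"],
    "sw":       ["IM", "RegRead", "ALU", "DM"],
    "branch":   ["IM", "RegRead", "ALU"],
    "jump":     ["IM"],
}


def path_ns(instruction_class: str) -> int:
    """Total delay along the path used by one instruction class."""
    return sum(UNIT_NS[unit] for unit in PATHS[instruction_class])


for name in PATHS:
    print(name, path_ns(name))

# A fixed clock must accommodate the slowest class.
print("fixed clock cycle:", max(path_ns(name) for name in PATHS))
```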
35

The clock cycle for the machine with a single clock
for all instructions will be 8 ns
For the variable clock, the average clock cycle is
8 × 24% + 7 × 12% + 6 × 44% + 5 × 18% + 2 × 2% ≈ 6.3 ns
CPU performance ratio = 8 / 6.3 ≈ 1.27
The variable-clock implementation will be 1.27 times
faster than the single clock cycle
Drawback: implementing a variable clock cycle
is very difficult
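
The comparison can be checked numerically. This sketch uses the per-class times and the instruction mix from the slides; the variable names are assumptions. Note that the unrounded average is 6.34 ns, which gives a ratio of about 1.26; the slides round the average to 6.3 ns first, which gives 1.27.

```python
# Clock cycle needed by each instruction class, in ns.
LATENCY_NS = {"lw": 8, "sw": 7, "R-format": 6, "branch": 5, "jump": 2}

# Instruction mix as fractions of all executed instructions.
MIX = {"lw": 0.24, "sw": 0.12, "R-format": 0.44, "branch": 0.18, "jump": 0.02}

# Fixed clock: every instruction takes as long as the slowest one.
fixed_cycle = max(LATENCY_NS.values())

# Variable clock: average cycle time weighted by the instruction mix.
variable_cycle = sum(LATENCY_NS[name] * MIX[name] for name in MIX)

# With CPI = 1 and the same instruction count, the CPU time ratio
# equals the ratio of the clock cycle times.
speedup = fixed_cycle / variable_cycle

print(f"fixed: {fixed_cycle} ns, variable: {variable_cycle:.2f} ns, ratio: {speedup:.2f}")
```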
