Title: CS/ECE 365 Computer Architecture
1CS/ECE 365 Computer Architecture
- Soundararajan Ezekiel
- Department of Computer Science
- Ohio Northern University
2Simple Implementation Scheme
- In this lecture we will build a simple datapath and its control by assembling the datapath segments from the last class and adding control lines
- We cover
- load word (lw)
- store word (sw)
- branch equal (beq)
- ALU instructions (add, sub, and, or, set on less than)
- We then enhance the datapath for jump (j)
3Creating single datapath
- Assumption: every instruction takes 1 clock cycle
- No datapath resource can be used more than once per instruction; if an element is needed more than once, it must be duplicated
- Memory: one for instructions and one for data
- To share a datapath element between 2 different instruction classes, we use a multiplexor (data selector)
4Multiplexor
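[Figure: multiplexor diagram, drawn twice, with inputs a (select value 0) and b (select value 1) and signals labeled c and d]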
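The behaviour of a two-input multiplexor can be sketched in a few lines of Python. This is a minimal illustration only; the function name mux2 and its parameter names are assumptions made for this sketch and do not come from the slides.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """Two-input multiplexor (data selector).

    Passes in0 to the output when select is 0 and in1 when select is 1.
    The names here are hypothetical, chosen only for this sketch.
    """
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return in1 if select else in0


if __name__ == "__main__":
    a, b = 0x1111, 0x2222
    print(hex(mux2(0, a, b)))  # input a is selected
    print(hex(mux2(1, a, b)))  # input b is selected
```

The select line is the only control input, which is why each multiplexor added to the datapath later costs exactly one extra control signal.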
5The data path for R-type instruction
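[Figure: R-type datapath. The instruction supplies Read Reg1, Read Reg2, and Write Reg to the register file; Read data1 and Read data2 feed the ALU (3-bit ALU operation control, zero output); the ALU result returns to the register file as Write data, enabled by RegWrite]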
6Datapath for lw and sw: a register access, followed by a memory address calculation, then a read or write from memory, and a write into the register file if the instruction is a load
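[Figure: datapath for loads and stores. Register file (Read reg1, Read reg2, Write reg, Write data, RegWrite; outputs Read data 1 and Read data 2), sign extend unit (16 to 32 bits), ALU (3-bit ALU operation, zero output, result used as the address), and data memory (address, Write data, Read data, MemWrite, MemRead)]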
7Difference
- The arithmetic-logic (R-type) datapath and the memory datapath are quite similar
- Key differences
- 1. The second input to the ALU
- a register for R-type instructions
- the sign-extended lower half of the instruction for memory instructions
- 2. The value stored into a destination register
- comes from the ALU for R-type instructions
- comes from memory for a load
8Combined datapath for memory and R-type instructions
9Note
- only a single register file and a single ALU
- 2 different sources for the second ALU input
- 2 different sources for the data stored into the register file
- we can use 2 multiplexors
- one for the ALU input
- one for the data input to the register file
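The two multiplexors in this note can be sketched as follows. The select-signal names alu_src and mem_to_reg follow common textbook naming and are assumptions here, since the slides do not name these control lines; the helper mux2 is also hypothetical.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """Two-input multiplexor: in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def alu_second_input(alu_src: int, read_data2: int, sign_ext_imm: int) -> int:
    # alu_src = 0: second register value (R-type)
    # alu_src = 1: sign-extended lower half of the instruction (lw/sw)
    return mux2(alu_src, read_data2, sign_ext_imm)


def register_write_data(mem_to_reg: int, alu_result: int, mem_read_data: int) -> int:
    # mem_to_reg = 0: value comes from the ALU (R-type)
    # mem_to_reg = 1: value comes from data memory (load)
    return mux2(mem_to_reg, alu_result, mem_read_data)


if __name__ == "__main__":
    # R-type: the ALU sees the register value, the register file gets the ALU result
    print(alu_second_input(0, read_data2=7, sign_ext_imm=100))
    print(register_write_data(0, alu_result=12, mem_read_data=999))
```

With these two selectors, one register file and one ALU can serve both instruction classes.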
10Add one more portion
- We can add the instruction fetch portion of the datapath
- It includes a memory for instructions and a separate memory for data
- It requires both an adder and an ALU, since the adder increments the PC while the ALU executes the instruction in the same clock cycle
11A portion of datapath used for fetching
instructions and incrementing the PC
[Figure: the PC drives the Instruction address input of the instruction memory, which outputs the Instruction; an adder (Add) adds 4 to the PC]
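A minimal sketch of the fetch portion, assuming a word-addressed dictionary stands in for the instruction memory. The function name fetch and the sample instruction words are hypothetical and serve only as placeholders.

```python
def fetch(instruction_memory: dict, pc: int):
    """Read the instruction at address pc and compute PC + 4.

    The dedicated adder increments the PC, so the ALU stays free
    to execute the fetched instruction in the same clock cycle.
    """
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF  # 32-bit wraparound
    return instruction, next_pc


if __name__ == "__main__":
    # Placeholder 32-bit words; their meaning does not matter for fetching.
    imem = {0: 0x012A4020, 4: 0x8D490008}
    word, pc = fetch(imem, 0)
    print(hex(word), pc)
```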
12The instruction fetch portion of the datapath is added
13Add branch datapath
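[Figure: branch datapath. PC+4 from the instruction datapath and the sign-extended (16 to 32 bits) offset shifted left by 2 feed an adder whose sum is the branch target; the register file (Read Reg1, Read Reg2, Write Reg, Write data, RegWrite) supplies Read data1 and Read data2 to the ALU (3-bit ALU operation), whose zero output goes to the branch control logic]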
14The simple datapath for the MIPS architecture
15- We have completed the simple datapath
- We can now add the control unit
- The control unit must be able to take inputs and generate a write signal for each state element, the selector control for each multiplexor, and the ALU control
- The ALU control is different in a number of ways, and it is useful to design it first, before we design the rest of the control unit
16The ALU Control
- The ALU has three control inputs
- Only five of the possible eight input combinations are used
- 000 --- AND
- 001 --- OR
- 010 --- add
- 110 --- subtract
- 111 --- set on less than
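The five ALU functions listed above can be sketched as a small Python function. This is an illustrative model only: the names alu, to_signed, and MASK are assumptions for this sketch, and it assumes 32-bit operands with a signed comparison for set on less than.

```python
MASK = 0xFFFFFFFF  # operands are treated as 32-bit values


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int):
    """Return (result, zero) for the five control codes used on the slide."""
    a &= MASK
    b &= MASK
    if control == 0b000:      # AND
        result = a & b
    elif control == 0b001:    # OR
        result = a | b
    elif control == 0b010:    # add
        result = (a + b) & MASK
    elif control == 0b110:    # subtract
        result = (a - b) & MASK
    elif control == 0b111:    # set on less than (signed comparison assumed)
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unused ALU control combination")
    zero = 1 if result == 0 else 0
    return result, zero
```

The zero output is what the branch logic later uses: subtracting two equal values gives a result of 0 and therefore zero = 1.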
17- Depending on the instruction class, the ALU will need to perform one of these 5 functions
- lw and sw: we use the ALU to compute the memory address by addition
- R-type instructions: one of the 5 actions
- branch equal: the ALU performs a subtraction
- For R-type instructions, the action is selected by the 6-bit function field
18ALUOp
- It indicates whether the operation to be performed should be add (00) for loads and stores, subtract (01) for beq, or determined by the operation encoded in the function field (10)
19
Instruction opcode | ALUOp | Instruction operation | Funct field | Desired ALU action | ALU control input
LW                 | 00    | load word             | xxxxxx      | add                | 010
SW                 | 00    | store word            | xxxxxx      | add                | 010
beq                | 01    | branch equal          | xxxxxx      | subtract           | 110
R-type             | 10    | add                   | 100000      | add                | 010
R-type             | 10    | subtract              | 100010      | subtract           | 110
R-type             | 10    | AND                   | 100100      | AND                | 000
R-type             | 10    | OR                    | 100101      | OR                 | 001
R-type             | 10    | set on less than      | 101010      | slt                | 111
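The mapping from ALUOp and the funct field to the 3-bit ALU control input can be sketched as a table lookup. The names alu_control and FUNCT_TO_CONTROL are assumptions for this sketch; the bit patterns are the ones listed on the slide.

```python
# funct field -> ALU control input, for R-type instructions (ALUOp = 10)
FUNCT_TO_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # set on less than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return the 3-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:   # lw, sw: add to form the address; funct is ignored
        return 0b010
    if alu_op == 0b01:   # beq: subtract to compare; funct is ignored
        return 0b110
    if alu_op == 0b10:   # R-type: the funct field decides
        return FUNCT_TO_CONTROL[funct]
    raise ValueError("ALUOp value not used in this design")
```

Splitting the decision this way keeps the main control unit small: it only has to produce the 2-bit ALUOp, and this second level combines it with the funct field.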
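20Truth table for the 3 ALU control bits (called Operation)
ALUOp1 | ALUOp0 | F5 | F4 | F3 | F2 | F1 | F0 | Operation
0      | 0      | X  | X  | X  | X  | X  | X  | 010
X      | 1      | X  | X  | X  | X  | X  | X  | 110
1      | X      | X  | X  | 0  | 0  | 0  | 0  | 010
1      | X      | X  | X  | 0  | 0  | 1  | 0  | 110
1      | X      | X  | X  | 0  | 1  | 0  | 0  | 000
1      | X      | X  | X  | 0  | 1  | 0  | 1  | 001
1      | X      | X  | X  | 1  | 0  | 1  | 0  | 111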
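Because of the don't-care entries, the truth table reduces to a few gates. The sketch below uses one possible minimization; the equations are an assumption of this example rather than something stated on the slide, and they rely on the ALUOp combination 11 never occurring. The function name alu_control_bits is hypothetical.

```python
def alu_control_bits(alu_op1: int, alu_op0: int, funct: int) -> int:
    """Gate-level sketch of the ALU control; returns the 3-bit Operation value."""
    f0 = funct & 1
    f1 = (funct >> 1) & 1
    f2 = (funct >> 2) & 1
    f3 = (funct >> 3) & 1
    # F5 and F4 are don't-cares in every row, so they are never examined.
    op2 = alu_op0 | (alu_op1 & f1)
    op1 = (1 - alu_op1) | (1 - f2)   # NOT ALUOp1 OR NOT F2
    op0 = alu_op1 & (f3 | f0)
    return (op2 << 2) | (op1 << 1) | op0


if __name__ == "__main__":
    # R-type subtract: ALUOp = 10, funct = 100010
    print(format(alu_control_bits(1, 0, 0b100010), "03b"))
```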
21Designing the main control unit
- We described how to design an ALU control that uses the function code and a 2-bit signal as its control inputs
- We will now identify the fields of an instruction and the control lines that are needed for the datapath we designed
22The 3 instruction classes use 2 different instruction formats
R-type instruction
Field        | 0     | rs    | rt    | rd    | shamt | funct
Bit position | 31-26 | 25-21 | 20-16 | 15-11 | 10-6  | 5-0
Load or store instruction
Field        | 35 or 43 | rs    | rt    | address
Bit position | 31-26    | 25-21 | 20-16 | 15-0
Branch instruction
Field        | 4     | rs    | rt    | address
Bit position | 31-26 | 25-21 | 20-16 | 15-0
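The field layout above can be turned into a small decoder using shifts and masks. This is a sketch: the function name decode and the two hand-assembled example words are assumptions made for illustration.

```python
def decode(instruction: int) -> dict:
    """Split a 32-bit instruction word into the fields of both formats."""
    return {
        "op": (instruction >> 26) & 0x3F,     # bits 31-26
        "rs": (instruction >> 21) & 0x1F,     # bits 25-21
        "rt": (instruction >> 16) & 0x1F,     # bits 20-16
        "rd": (instruction >> 11) & 0x1F,     # bits 15-11 (R-type only)
        "shamt": (instruction >> 6) & 0x1F,   # bits 10-6  (R-type only)
        "funct": instruction & 0x3F,          # bits 5-0   (R-type only)
        "address": instruction & 0xFFFF,      # bits 15-0  (load, store, branch)
    }


if __name__ == "__main__":
    # Hypothetical example words, assembled by hand for this sketch:
    # 0x012A4020 has opcode 0 and funct 100000 (an R-type add)
    # 0x8D490008 has opcode 35 (a load) with a 16-bit address field of 8
    for word in (0x012A4020, 0x8D490008):
        print(decode(word))
```

All fields are extracted unconditionally, which mirrors the hardware: the bits are always routed, and the control signals decide which of them are actually used.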
23Note
- R format
- all have opcode 0
- 3 register operands: rs, rt, and rd
- fields rs and rt are sources; rd is the destination
- the ALU function is in the funct field
- shamt is used for shifts; we will ignore it here
- load: opcode 35; rs is the base register, rt is the destination register
- store: opcode 43; rt is the source register whose value is stored in memory
- beq: opcode 4; registers rs and rt are the source registers, which are compared for equality
24Observations
- There are some major observations about this instruction format
- the op field (opcode) is always contained in bits 31-26; we refer to it as Op5-0
- the 2 registers to be read are always specified by rs and rt, at positions 25-21 and 20-16; this is true for R-type, beq, and store
- the base register for load and store instructions is always in bit positions 25-21 (rs)
- the 16-bit offset for beq, lw, and sw is always in bits 15-0
- the destination register is in one of 2 places: for lw it is rt (20-16), for R-type it is rd (15-11); we need a multiplexor to select
25Operations on datapath
- We will see how each instruction uses the datapath
- R-type instruction (add $t1, $t2, $t3)
- 4 steps to execute an R-type instruction
- An instruction is fetched from the IM and the PC is incremented
- 2 registers, $t2 and $t3, are read from the register file
- The ALU operates on the data read from the register file, using the function code
- The result from the ALU is written into the register file, using bits 15-11 of the instruction to select the destination register ($t1)
- Figures 5.21-5.32 show these operations
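The four steps can be traced in a short Python sketch for the add case only. The function name execute_add and the list-based register file are assumptions of this example; register numbers 9, 10, and 11 are the conventional MIPS numbers for $t1, $t2, and $t3.

```python
def execute_add(regs: list, pc: int, rs: int, rt: int, rd: int) -> int:
    """Model one R-type add and return the next PC."""
    next_pc = (pc + 4) & 0xFFFFFFFF       # step 1: fetch, PC is incremented
    a, b = regs[rs], regs[rt]             # step 2: read two source registers
    result = (a + b) & 0xFFFFFFFF         # step 3: ALU operation chosen by funct
    regs[rd] = result                     # step 4: write back to rd (bits 15-11)
    return next_pc


if __name__ == "__main__":
    regs = [0] * 32
    regs[10], regs[11] = 7, 5             # $t2 = 7, $t3 = 5
    pc = execute_add(regs, 0x40, rs=10, rt=11, rd=9)
    print(regs[9], hex(pc))               # $t1 and the next PC
```

The statements run in sequence here only because this is software; in the datapath the same values propagate through combinational logic within one clock cycle.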
26Note
- note this implementation is combinational
- that is it is not really a series of 4 distinct
steps - it operates in a single clock cycle
27Execution of load word
- Example: lw $t1, offset($t2)
- 5 steps (figures 5.24 and 5.25 on page 368)
- An instruction is fetched from the IM and the PC is incremented
- Register $t2 is read from the register file
- The ALU computes the sum of the value read from the register file and the sign-extended lower 16 bits of the instruction (offset)
- The sum from the ALU is used as the address for the data memory
- The data from the memory unit is written into the register file; the destination register is given by bits 20-16 of the instruction ($t1)
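The five steps of a load can be sketched the same way. The names execute_lw and sign_extend16, and the dictionary used as data memory, are assumptions made for this illustration.

```python
def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_lw(regs: list, data_memory: dict, pc: int,
               rs: int, rt: int, offset16: int) -> int:
    """Model one lw and return the next PC."""
    next_pc = (pc + 4) & 0xFFFFFFFF                          # 1: fetch, PC + 4
    base = regs[rs]                                          # 2: read base register
    address = (base + sign_extend16(offset16)) & 0xFFFFFFFF  # 3: ALU adds offset
    data = data_memory[address]                              # 4: read data memory
    regs[rt] = data                                          # 5: write rt (bits 20-16)
    return next_pc


if __name__ == "__main__":
    regs = [0] * 32
    regs[10] = 0x1000                     # base register
    dmem = {0x0FFC: 42}
    execute_lw(regs, dmem, 0, rs=10, rt=9, offset16=0xFFFC)  # offset of -4
    print(regs[9])
```

A negative offset such as 0xFFFC shows why the sign extension matters: without it the address would be 0x1000 + 0xFFFC instead of 0x1000 - 4.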
28Operation of the branch-on-equal instruction
- Example: beq $t1, $t2, offset
- 4 steps (figure 5.26 on page 369)
- An instruction is fetched from the IM and the PC is incremented
- 2 registers, $t1 and $t2, are read from the register file
- The ALU performs a subtract on the data values read from the register file. The value of PC+4 is added to the sign-extended lower 16 bits of the instruction (offset) shifted left by 2; the result is the branch target address
- The zero result from the ALU is used to decide which adder result to store into the PC
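A sketch of the next-PC selection for beq, under the same modelling assumptions as before. The names beq_next_pc, sign_extend16, and MASK are hypothetical.

```python
MASK = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def beq_next_pc(regs: list, pc: int, rs: int, rt: int, offset16: int) -> int:
    """Return the value stored into the PC after a beq."""
    pc_plus_4 = (pc + 4) & MASK                      # first adder
    difference = (regs[rs] - regs[rt]) & MASK        # ALU subtracts
    zero = difference == 0                           # ALU zero output
    # second adder: PC + 4 plus the sign-extended offset shifted left by 2
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK
    return target if zero else pc_plus_4             # zero selects the adder result


if __name__ == "__main__":
    regs = [0] * 32
    regs[9] = regs[10] = 5                           # equal, so the branch is taken
    print(hex(beq_next_pc(regs, 0x100, rs=9, rt=10, offset16=3)))
```

The shift by 2 converts the word offset in the instruction into a byte offset, which is why the target is relative to PC+4 in units of 4 bytes.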
29Final step
- Using all of this information, figure 5.27 on page 372 shows the complete single-cycle implementation
30Why the single-cycle implementation is not used
- Although it works correctly, it is not used in modern designs because it is inefficient
- The clock cycle must have the same length for every instruction; in this case CPI = 1
- The clock cycle is determined by the longest possible path in the machine, which is the load: it uses 5 functional units in series (IM, register file, ALU, DM, register file)
- Although CPI = 1, overall performance is not very good, since several instruction classes could fit in a shorter clock cycle
31Performance of single-cycle machines
- Assume that the operation times for the major functional units in this implementation are the following
- Memory units: 2 ns
- ALU and adders: 2 ns
- Register file (read or write): 1 ns
- Assume that the multiplexors, control unit, PC accesses, sign extension unit, and wires have no delay
32- Compare the performance of 2 implementations
- 1. An implementation in which every instruction operates in 1 clock cycle of a fixed length
- 2. An implementation where every instruction executes in 1 clock cycle using a variable-length clock, which for each instruction is only as long as it needs to be
- To compare the performance, assume the following instruction mix: 24% loads, 12% stores, 44% R-format instructions, 18% branches, and 2% jumps
33Answer
- Compare CPU time
- CPU time = IC * CPI * Clock cycle time
- Since CPI = 1
- CPU execution time = IC * Clock cycle time
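34
Instruction class | IM (ns) | Reg read (ns) | ALU op (ns) | DM (ns) | Reg write (ns) | Total (ns)
R-format          | 2       | 1             | 2           | 0       | 1              | 6
lw                | 2       | 1             | 2           | 2       | 1              | 8
sw                | 2       | 1             | 2           | 2       | 0              | 7
branch            | 2       | 1             | 2           | 0       | 0              | 5
jump              | 2       | 0             | 0           | 0       | 0              | 2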
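The totals in this table follow from summing the delays of the functional units each instruction class uses. The sketch below recomputes them; the dictionary names and unit labels are assumptions chosen for this example, while the delays are the ones assumed on the earlier slide.

```python
# Delay of each functional unit in ns (from the stated assumptions)
UNIT_NS = {"IM": 2, "RegRead": 1, "ALU": 2, "DM": 2, "RegWrite": 1}

# Functional units on the path of each instruction class
UNITS_USED = {
    "R-format": ["IM", "RegRead", "ALU", "RegWrite"],
    "lw":       ["IM", "RegRead", "ALU", "DM", "RegWrite"],
    "sw":       ["IM", "RegRead", "ALU", "DM"],
    "branch":   ["IM", "RegRead", "ALU"],
    "jump":     ["IM"],
}


def path_ns(instruction_class: str) -> int:
    """Total delay in ns along the path used by one instruction class."""
    return sum(UNIT_NS[unit] for unit in UNITS_USED[instruction_class])


if __name__ == "__main__":
    for name in UNITS_USED:
        print(name, path_ns(name), "ns")
    # A fixed clock must accommodate the slowest class
    print("fixed clock:", max(path_ns(n) for n in UNITS_USED), "ns")
```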
35- The clock cycle for a machine with a single clock for all instructions will be 8 ns
- For the variable clock, the average cycle is 8 * 24% + 7 * 12% + 6 * 44% + 5 * 18% + 2 * 2% = 6.34, or about 6.3 ns
- CPU performance ratio: 8 / 6.3 = 1.27
- The variable-clock implementation will be 1.27 times faster than the single clock cycle
- Drawback: implementing a variable clock cycle is very difficult
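The comparison can be checked numerically. This sketch uses the instruction mix and per-class times given earlier; the variable names are assumptions. Without intermediate rounding the average cycle is 6.34 ns and the ratio is about 1.26; rounding the average to 6.3 ns before dividing gives the 1.27 quoted on the slide.

```python
# Fraction of executed instructions in each class (instruction mix)
MIX = {"lw": 0.24, "sw": 0.12, "R-format": 0.44, "branch": 0.18, "jump": 0.02}

# Time each class needs in ns
LATENCY_NS = {"lw": 8, "sw": 7, "R-format": 6, "branch": 5, "jump": 2}

# Fixed clock: every instruction takes as long as the slowest one
fixed_cycle_ns = max(LATENCY_NS.values())

# Variable clock: weighted average over the instruction mix
variable_cycle_ns = sum(MIX[c] * LATENCY_NS[c] for c in MIX)

# IC and CPI = 1 are the same for both machines, so the CPU time ratio
# reduces to the ratio of the clock cycle times
speedup = fixed_cycle_ns / variable_cycle_ns

if __name__ == "__main__":
    print(f"fixed: {fixed_cycle_ns} ns")
    print(f"variable (average): {variable_cycle_ns:.2f} ns")
    print(f"speedup: {speedup:.2f}")
```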