Title: Single-cycle, Multi-cycle FSM controller, Multi-cycle microcontroller
1. EECS 322 Computer Architecture
Single-cycle, Multi-cycle FSM controller, Multi-cycle microcontroller
2. MIPS instruction formats
Arithmetic: add rd,rs,rt
    Fields: op | rs | rt | rd | ... | funct
Data Transfer: lw rd,offset(rs); sw rd,offset(rs)
    Fields: op | rs | rt | Address
Conditional branch: beq rd,rs,raddr
    Fields: op | rs | rt | Address (PC-relative addressing, uses PC)
Unconditional jump: j addr
    Fields: op | Address (Pseudodirect addressing, uses PC)
Immediate operand
    Fields: op | rs | rt | Immediate
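The field layouts above can be sketched as bit slicing on a 32-bit instruction word. This is a minimal illustration, not part of the original slides: the function name `decode` and the dictionary keys are hypothetical, and the bit positions follow the standard MIPS encoding (op in IR[31-26], rs in IR[25-21], rt in IR[20-16], rd in IR[15-11], funct in IR[5-0]).

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":    (instr >> 26) & 0x3F,   # IR[31-26]
        "rs":    (instr >> 21) & 0x1F,   # IR[25-21]
        "rt":    (instr >> 16) & 0x1F,   # IR[20-16]
        "rd":    (instr >> 11) & 0x1F,   # IR[15-11], R-format only
        "funct": instr & 0x3F,           # IR[5-0],   R-format only
        "imm":   instr & 0xFFFF,         # IR[15-0],  immediate / branch offset
        "addr":  instr & 0x03FFFFFF,     # IR[25-0],  jump address
    }


# add $8, $9, $10 encoded by hand: op=0, rs=9, rt=10, rd=8, funct=0x20
add_word = (0 << 26) | (9 << 21) | (10 << 16) | (8 << 11) | 0x20
add_fields = decode(add_word)

# lw $8, 4($9): op=0x23, rs=9, rt=8, offset=4
lw_fields = decode(0x8D280004)

print(add_fields)
print(lw_fields)
```

Every field is extracted regardless of format; which fields are meaningful depends on the opcode, as in the table of formats.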
3. Single Cycle Implementation
- Calculate instruction cycle time assuming negligible delays except:
  - memory (2ns), ALU and adders (2ns), register file access (1ns)
Adder1: PC ← PC + 4
Adder2: PC ← PC + signext(IR[15-0]) << 2
Adder3: Arithmetic ALU
Single Cycle = 2 adders + 1 ALU
4. Single/Multi-Clock Comparison
add  6ns = Fetch(2ns) + RegR(1ns) + ALU(2ns) + RegW(1ns)
lw   8ns = Fetch(2ns) + RegR(1ns) + ALU(2ns) + MemR(2ns) + RegW(1ns)
sw   7ns = Fetch(2ns) + RegR(1ns) + ALU(2ns) + MemW(2ns)
beq  5ns = Fetch(2ns) + RegR(1ns) + ALU(2ns)
j    2ns = Fetch(2ns)
Architectural changes improve performance without speeding up the clock!
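The per-instruction times can be recomputed from the component delays. The sketch below is illustrative only: the names `DELAY_NS`, `STEPS` and `latency_ns` are hypothetical, and it assumes a register write costs the same 1ns as any other register file access, which is what makes the add and lw totals come out to 6ns and 8ns.

```python
# Component delays in ns (memory 2, ALU 2, register file access 1).
DELAY_NS = {"Fetch": 2, "RegR": 1, "ALU": 2, "MemR": 2, "MemW": 2, "RegW": 1}

# Functional units each instruction class passes through, in order.
STEPS = {
    "add": ["Fetch", "RegR", "ALU", "RegW"],
    "lw":  ["Fetch", "RegR", "ALU", "MemR", "RegW"],
    "sw":  ["Fetch", "RegR", "ALU", "MemW"],
    "beq": ["Fetch", "RegR", "ALU"],
    "j":   ["Fetch"],
}


def latency_ns(instr: str) -> int:
    """Sum the delays of every unit the instruction uses."""
    return sum(DELAY_NS[step] for step in STEPS[instr])


latencies = {instr: latency_ns(instr) for instr in STEPS}

# A single-cycle design must fit the slowest instruction in one clock period.
single_cycle_period_ns = max(latencies.values())

print(latencies)
print(single_cycle_period_ns)
```

With one fixed clock, every instruction pays the lw time even though j needs only a fetch.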
5. Some Design Trade-offs
High level design techniques
- Algorithms: change instruction usage; minimize Σ (n_instruction × t_instruction)
- Architecture: Datapath, FSM, Microprogramming
  - adders: ripple versus carry lookahead
  - multiplier types, ...
Lower level design techniques (closer to physical design)
- clocking: single versus multi clock
- technology
  - layout tools: better place and route
  - process technology: 0.5 micron to .18 micron
6. Single-cycle problems
- Single Cycle Problems
  - what if we had a more complicated instruction like floating point? (fadd = 30ns, fmul = 100ns)
  - wasteful of area (2 adders + 1 ALU)
- One Solution
  - use a smaller cycle time (if the technology can do it)
  - have different instructions take different numbers of cycles
  - a multicycle datapath (1 ALU)
- Multi-cycle approach
  - We will be reusing functional units: ALU used to increment PC (Adder1) and to compute address (Adder2)
  - Memory used for instruction and data
7. Reality Check: Intel 8086 clock cycles
Arithmetic
    3        add  reg16, reg16
    118-133  mul  dx:ax, reg16   (very slow!!)
    128-154  imul dx:ax, reg16
    114-162  div  dx:ax, reg16
    165-184  idiv dx:ax, reg16
Data Transfer
    14       mov  reg16, mem16
    15       mov  mem16, reg16
Conditional Branch
    4/16     je   displacement8
Unconditional Jump
    15       jmp  segment:offset16
8. Multi-cycle Datapath
Multi-cycle = 1 ALU + Controller
9. Multi-cycle Datapath with controller
10. Multi-cycle: 5 execution steps
- T1 (a,lw,sw,beq,j) Instruction Fetch
- T2 (a,lw,sw,beq,j) Instruction Decode and Register Fetch
- T3 (a,lw,sw,beq,j) Execution, Memory Address Calculation, or Branch Completion
- T4 (a,lw,sw) Memory Access or R-type instruction completion
- T5 (a,lw) Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
11. Multi-cycle Approach
All operations in each clock cycle Ti are done in parallel, not sequentially! For example, in T1, IR ← Memory[PC] and PC ← PC + 4 are done simultaneously!
[Figure: timeline T1 T2 T3 T4 T5]
Between clock cycles T2 and T3 the microcode sequencer will do a dispatch 1.
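One way to picture "done in parallel" is that every transfer in a cycle reads the register values from before the clock edge, and all results are committed together. The sketch below models T1 this way; the function `t1_fetch`, the dictionary-based register state and the toy memory contents are assumptions made for illustration, not part of the slides.

```python
def t1_fetch(state: dict, memory: dict) -> dict:
    """Model T1: IR <- Memory[PC] and PC <- PC + 4 in the same clock cycle."""
    old = dict(state)              # snapshot: right-hand sides see pre-clock values
    new = dict(state)
    new["IR"] = memory[old["PC"]]  # uses the old PC, not the incremented one
    new["PC"] = old["PC"] + 4
    return new                     # both registers change at the clock edge


# Toy word-addressed memory holding two instruction words.
memory = {0: 0x012A4020, 4: 0x8D280004}
state = {"PC": 0, "IR": 0}

state = t1_fetch(state, memory)
print(hex(state["IR"]), state["PC"])
```

Because both right-hand sides read the snapshot, the order of the two assignments inside the function does not matter, which is the point of the slide.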
12. Multi-cycle using Microprogramming
[Figure: two controller organizations, labels only]
Finite State Machine (hardwired control): Combinational control logic; State register; Next state; Inputs from instruction register opcode field; Outputs: Datapath control outputs
Microcode controller: Microcode storage (firmware); Microprogram counter; Adder (input 1); Address select logic; Sequencing control; Inputs from instruction register opcode field; Outputs: Datapath control outputs
Requires microcode memory to be faster than main
memory
13. Microcode Trade-offs
- Distinction between specification and implementation is sometimes blurred
- Specification Advantages
  - Easy to design and write (maintenance)
  - Design architecture and microcode in parallel
- Implementation (off-chip ROM) Advantages
  - Easy to change since values are in memory
  - Can emulate other architectures
  - Can make use of internal registers
- Implementation Disadvantages, SLOWER now that
  - Control is implemented on same chip as processor
  - ROM is no longer faster than RAM
  - No need to go back and make changes
14. Microinstruction format
15. Microinstruction format: Maximally vs. Minimally Encoded
- No encoding
  - 1 bit for each datapath operation
  - faster, requires more memory (logic)
  - used for Vax 780: an astonishing 400K of memory!
- Lots of encoding
  - send the microinstructions through logic to get control signals
  - uses less memory, slower
- Historical context of CISC
  - Too much logic to put on a single chip with everything else
  - Use a ROM (or even RAM) to hold the microcode
  - It's easy to add new instructions
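The memory-versus-speed trade-off can be sketched for a single microinstruction field. In the example below the four select signals and all function names are hypothetical: with no encoding the field stores one bit per signal, while an encoded field stores a short code that must pass through decode logic before it can drive the datapath.

```python
# Hypothetical mutually exclusive control signals for one field.
SIGNALS = ["SelB", "Sel4", "SelExt", "SelExtSh"]


def unencoded_width(n_signals: int) -> int:
    """No encoding: one microcode bit per datapath operation."""
    return n_signals


def encoded_width(n_signals: int) -> int:
    """Full encoding: just enough bits to number the choices."""
    return max(1, (n_signals - 1).bit_length())


def decode_field(code: int, n_signals: int) -> list:
    """Extra logic level needed by the encoded form: code -> one-hot signals."""
    return [1 if i == code else 0 for i in range(n_signals)]


wide = unencoded_width(len(SIGNALS))
narrow = encoded_width(len(SIGNALS))
one_hot = decode_field(2, len(SIGNALS))

print(wide, narrow, one_hot)
```

The unencoded word is wider but drives the signals directly; the encoded word is narrower but adds the delay of `decode_field`, matching the "faster, more memory" versus "less memory, slower" bullets.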
16. Microprogramming program
17. Microprogramming program overview
T1       T2        (Dispatch 1)   T3          (Dispatch 2)   T4           T5
Fetch    Fetch1                   Mem1                       LW2          LW21
                                                             SW2
                                  Rformat1                   Rformat11
                                  BEQ1
                                  JUMP1
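The overview can be exercised with a small sequencer model that counts how many cycles each instruction class takes. This is a sketch under stated assumptions: the Seq values (Seq, D1, D2, Fetch) are the ones given on the per-step slides, while the row order in `ORDER`, the dispatch tables keyed by mnemonic, and the function `trace` are hypothetical stand-ins for the address select logic.

```python
# Sequencing field of each microinstruction, as given on the step slides.
SEQ = {
    "Fetch": "Seq", "Fetch1": "D1",
    "Mem1": "D2", "LW2": "Seq", "LW21": "Fetch", "SW2": "Fetch",
    "Rformat1": "Seq", "Rformat11": "Fetch",
    "BEQ1": "Fetch", "JUMP1": "Fetch",
}

# Assumed storage order; "Seq" means "go to the next row".
ORDER = ["Fetch", "Fetch1", "Mem1", "LW2", "LW21", "SW2",
         "Rformat1", "Rformat11", "BEQ1", "JUMP1"]

# Assumed dispatch tables (a real design indexes these by opcode bits).
DISPATCH1 = {"add": "Rformat1", "lw": "Mem1", "sw": "Mem1",
             "beq": "BEQ1", "j": "JUMP1"}
DISPATCH2 = {"lw": "LW2", "sw": "SW2"}


def trace(instr: str) -> list:
    """Return the microinstruction labels executed for one instruction."""
    label, visited = "Fetch", []
    while True:
        visited.append(label)
        seq = SEQ[label]
        if seq == "Fetch":          # instruction done, next cycle refetches
            return visited
        if seq == "Seq":
            label = ORDER[ORDER.index(label) + 1]
        elif seq == "D1":
            label = DISPATCH1[instr]
        else:                       # "D2"
            label = DISPATCH2[instr]


cycles = {instr: len(trace(instr)) for instr in DISPATCH1}
print(trace("lw"))
print(cycles)
```

Each visited label corresponds to one clock cycle, so the trace length is the cycle count of the instruction.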
18. Microprogram stepping: T1 Fetch
(Done in parallel) IR ← MEMORY[PC]; PC ← PC + 4
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
Fetch | add | pc   | 4    | -     | ReadPC | ALU     | Seq
19. T2 Fetch 1
A ← Reg[IR[25-21]]; B ← Reg[IR[20-16]]
ALUOut ← PC + signext(IR[15-0]) << 2
Label | ALU | SRC1 | SRC2  | RCntl | Memory | PCwrite | Seq
-     | add | pc   | ExtSh | Read  | -      | -       | D1
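The ALUOut computation in T2 can be written out as integer arithmetic. In this sketch the helper names `sign_extend16` and `branch_target` are hypothetical, and `pc` is assumed to already hold PC + 4 because the increment happened in T1.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, ir: int) -> int:
    """ALUOut <- PC + signext(IR[15-0]) << 2, wrapped to 32 bits."""
    offset_words = sign_extend16(ir)          # IR[15-0]
    return (pc + (offset_words << 2)) & 0xFFFFFFFF


forward = branch_target(0x1004, 0x0003)   # 3 instructions ahead
backward = branch_target(0x1004, 0xFFFF)  # offset -1: one instruction back

print(hex(forward), hex(backward))
```

The shift by 2 converts an offset counted in instructions into a byte offset, since each instruction is 4 bytes.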
20. T3 Dispatch 1: Mem1
ALUOut ← A + sign_extend(IR[15-0])
Label | ALU | SRC1 | SRC2  | RCntl | Memory | PCwrite | Seq
Mem1  | add | A    | ExtSh | -     | -      | -       | D2
21. T4 Dispatch 2: LW2
MDR ← Memory[ALUOut]
Label | ALU | SRC1 | SRC2 | RCntl | Memory  | PCwrite | Seq
LW2   | -   | -    | -    | -     | ReadALU | -       | Seq
22. T5 LW21
Reg[IR[20-16]] ← MDR
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
-     | -   | -    | -    | W MDR | -      | -       | Fetch
23. T4 Dispatch 2: SW2
Memory[ALUOut] ← B
Label | ALU | SRC1 | SRC2 | RCntl | Memory   | PCwrite | Seq
SW2   | -   | -    | -    | -     | WriteALU | -       | Fetch
24. T3 Dispatch 1: Rformat1
ALUOut ← A op(IR[31-26]) B
Label  | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
Rf...1 | op  | A    | B    | -     | -      | -       | Seq
25. T4 Dispatch 1: Rformat11
Reg[IR[15-11]] ← ALUOut
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
-     | -   | -    | -    | W ALU | -      | -       | Fetch
26. T3 Dispatch 1: BEQ1
If (A - B == 0) PC ← ALUOut
ALUOut = Address computed in T2!
Label | ALU  | SRC1 | SRC2 | RCntl | Memory | PCwrite  | Seq
BEQ1  | subt | A    | B    | -     | -      | ALUOut-0 | Fetch
27. T3 Dispatch 1: Jump1
PC ← PC[31-28] || IR[25-0] << 2
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
Jump1 | -   | -    | -    | -     | -      | Jaddr   | Fetch
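The jump address formation can be checked with a few lines of bit manipulation. The function name `jump_target` is hypothetical; the sketch assumes 32-bit addresses and that `pc` is the already incremented PC.

```python
def jump_target(pc: int, ir: int) -> int:
    """PC <- PC[31-28] || (IR[25-0] << 2)."""
    upper = pc & 0xF0000000            # keep PC[31-28]
    lower = (ir & 0x03FFFFFF) << 2     # IR[25-0] as a byte address (28 bits)
    return upper | lower


# j with a 26-bit address field of 0x10, fetched from the 0x4xxxxxxx region.
target = jump_target(0x40001004, 0x08000010)
print(hex(target))
```

Only the low 28 bits come from the instruction, so a jump stays inside the 256 MB region selected by the top four PC bits.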
28. The Big Picture