Title: CS 161 Computer Architecture, Chapter 5, Lecture 12
1 CS 161 Computer Architecture, Chapter 5, Lecture 12
- Instructor: L.N. Bhuyan
- www.cs.ucr.edu/bhuyan
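2 Datapath Control Points
[Figure: multicycle datapath. The PC reaches memory through the IorD mux. Memory output is latched in the instruction register (IR) and the memory data register (MDR). The register file reads IR[25:21] and IR[20:16] into A and B and writes to IR[20:16] or IR[15:11] (RegDst) with data from ALUOut or MDR (MemtoReg). The ALU takes PC or A (ALUSrcA) and B, 4, sign-extended IR[15:0], or that value shifted left 2 (ALUSrcB 0-3), under ALUOp plus the funct field IR[5:0]. Results go to ALUOut and to the PC through the PCSrc mux. Control points: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSrc, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst.]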
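3 Multicycle Instruction Execution
- All instructions execute in 3-5 cycles
- 3 cycles: beq
- 4 cycles: R-type, sw
- 5 cycles: lw
- Cycle 1: fetch instruction, PC ← PC + 4
- Cycle 2: decode, fetch registers, compute branch target
- Cycle 3: execute (R-type) / compute memory address (lw, sw) / complete branch (beq)
- Cycle 4: access memory (lw, sw) / complete R-type (write result to register)
- Cycle 5: (lw only) write the loaded memory data back to the register file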
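The cycle counts above are enough to total the cycles for an instruction sequence and compute its average CPI. The sketch below assumes only the per-class counts from this slide. The dictionary key "rtype" and the function names are hypothetical.

```python
from typing import Dict, List

# Cycles per instruction class on the multicycle machine (counts from the slide above).
CYCLES_PER_CLASS: Dict[str, int] = {"beq": 3, "rtype": 4, "sw": 4, "lw": 5}

def total_cycles(program: List[str]) -> int:
    """Sum the cycles for a list of instruction classes executed in order."""
    return sum(CYCLES_PER_CLASS[cls] for cls in program)

def average_cpi(program: List[str]) -> float:
    """Average clock cycles per instruction for the given sequence."""
    return total_cycles(program) / len(program)

if __name__ == "__main__":
    mix = ["rtype", "rtype", "lw", "beq"]
    print(total_cycles(mix), average_cpi(mix))  # 4+4+5+3 = 16 cycles, CPI 4.0
```

Because each class has a fixed count, the multicycle CPI is just a weighted average over the instruction mix.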
4 Summary
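5 Implementing a Finite State Machine
- Internal storage (current state register)
- Two combinational circuits: the next-state function and the output function
[Figure: the inputs and the current state register feed the next-state function, whose result is loaded into the current state register on each clock edge; the output function produces the outputs from the current state.]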
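The structure above can be sketched in Python as a state register plus two functions. This is a minimal Moore-style model, assuming (as for this controller) that outputs depend only on the current state. The class name `MooreFSM` and the toy counter are hypothetical illustrations, not the lecture's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

S = TypeVar("S")  # state type
I = TypeVar("I")  # input type
O = TypeVar("O")  # output type

@dataclass
class MooreFSM(Generic[S, I, O]):
    state: S                            # contents of the current state register
    next_state_fn: Callable[[S, I], S]  # combinational: (state, inputs) -> next state
    output_fn: Callable[[S], O]         # combinational: state -> outputs

    def outputs(self) -> O:
        """Outputs depend only on the current state (Moore machine)."""
        return self.output_fn(self.state)

    def clock(self, inputs: I) -> None:
        """Clock edge: the state register loads the next-state function's result."""
        self.state = self.next_state_fn(self.state, inputs)

if __name__ == "__main__":
    # Toy example: a mod-3 counter that advances only when enable is True
    # and raises its output while in state 2.
    counter = MooreFSM(state=0,
                       next_state_fn=lambda s, en: (s + 1) % 3 if en else s,
                       output_fn=lambda s: s == 2)
    for en in [True, False, True]:
        counter.clock(en)
    print(counter.state, counter.outputs())  # 2 True
```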
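6 FSM Diagram for Multicycle Machine
- State 0 (cycle 1, instruction fetch; start new instruction): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 1, ALUOp = 0, PCWrite, PCSrc = 0; go to state 1
- State 1 (cycle 2, instruction decode/register fetch): ALUSrcA = 0, ALUSrcB = 3, ALUOp = 0; go to state 2 (lw/sw), state 6 (R-format) or state 8 (beq)
- State 2 (cycle 3, memory address computation for lw/sw): ALUSrcA = 1, ALUSrcB = 2, ALUOp = 0
- State 6 (cycle 3, R-format execution): ALUSrcA = 1, ALUSrcB = 0, ALUOp = 2
- State 8 (cycle 3, branch completion): ALUSrcA = 1, ALUSrcB = 0, ALUOp = 1, PCWriteCond, PCSrc = 1; return to state 0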
7 FSM Controller Execution Cycles 3-5
- State 3 (cycle 4, lw memory access; from state 2): MemRead, IorD = 1; go to state 4
- State 5 (cycle 4, sw memory access; from state 2): MemWrite, IorD = 1; return to state 0
- State 7 (cycle 4, R-format completion; from state 6): RegDst = 1, RegWrite, MemtoReg = 0; return to state 0
- State 4 (cycle 5, lw write-back): RegDst = 0, RegWrite, MemtoReg = 1; return to state 0
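8 Add Jump
- Note:
- don't care if not mentioned (for multiplexer selects; write-enable and memory-control signals that are not mentioned must stay deasserted)
- asserted if name only
- otherwise exact value
- How many state bits will we need?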
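The complete controller, with jump added, can be written down as an output table plus a next-state function. This is a sketch under stated assumptions:

- The opcodes are the standard MIPS values.
- The jump state is numbered 9 and sets PCWrite with PCSrc = 2.
- Unmentioned write enables default to 0, and other unmentioned signals are don't cares.

The names `OUTPUTS`, `next_state`, `control_word` and `cycles_for` are hypothetical.

```python
import math
from typing import Dict, Optional

# MIPS opcodes (instruction bits 31:26).
OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B

# Width in bits of each datapath control point; together they make 16 control outputs.
WIDTHS = {"PCWrite": 1, "PCWriteCond": 1, "IorD": 1, "MemRead": 1, "MemWrite": 1,
          "IRWrite": 1, "MemtoReg": 1, "PCSrc": 2, "ALUOp": 2, "ALUSrcB": 2,
          "ALUSrcA": 1, "RegWrite": 1, "RegDst": 1}

# Assumption: unmentioned write enables and memory controls are deasserted (0);
# every other unmentioned signal is a don't care (None).
ENABLES = {"PCWrite", "PCWriteCond", "MemRead", "MemWrite", "IRWrite", "RegWrite"}

# Values named in each state ("asserted if name only" -> 1); state 9 is the assumed jump state.
OUTPUTS: Dict[int, Dict[str, int]] = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1, "ALUSrcB": 1,
        "ALUOp": 0, "PCWrite": 1, "PCSrc": 0},                 # instruction fetch
    1: {"ALUSrcA": 0, "ALUSrcB": 3, "ALUOp": 0},               # decode, register fetch
    2: {"ALUSrcA": 1, "ALUSrcB": 2, "ALUOp": 0},               # memory address computation
    3: {"MemRead": 1, "IorD": 1},                              # lw memory access
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},            # lw write-back
    5: {"MemWrite": 1, "IorD": 1},                             # sw memory access
    6: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 2},               # R-format execution
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},            # R-format completion
    8: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 1,
        "PCWriteCond": 1, "PCSrc": 1},                         # branch completion
    9: {"PCWrite": 1, "PCSrc": 2},                             # jump completion
}

def control_word(state: int) -> Dict[str, Optional[int]]:
    """Output function: every control point's value in `state` (None = don't care)."""
    word: Dict[str, Optional[int]] = {n: (0 if n in ENABLES else None) for n in WIDTHS}
    word.update(OUTPUTS[state])
    return word

def next_state(state: int, opcode: int) -> int:
    """Next-state function; only states 1 and 2 look at the opcode."""
    if state == 0:
        return 1
    if state == 1:
        # Raises KeyError for opcodes this subset does not implement.
        return {OP_LW: 2, OP_SW: 2, OP_RTYPE: 6, OP_BEQ: 8, OP_J: 9}[opcode]
    if state == 2:
        return 3 if opcode == OP_LW else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8 and 9 finish the instruction

def cycles_for(opcode: int) -> int:
    """Clock cycles from fetch (state 0) until the FSM returns to state 0."""
    state, cycles = 0, 0
    while True:
        state = next_state(state, opcode)
        cycles += 1
        if state == 0:
            return cycles

# 10 states need ceil(log2(10)) = 4 state bits.
STATE_BITS = math.ceil(math.log2(len(OUTPUTS)))

if __name__ == "__main__":
    print({op: cycles_for(op) for op in (OP_LW, OP_SW, OP_RTYPE, OP_BEQ, OP_J)})
    print(STATE_BITS, sum(WIDTHS.values()))  # 4 16
```

Running each opcode through `cycles_for` reproduces the 3/4/5-cycle counts, with jump taking 3 cycles.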
9 Simple Questions
- How many cycles will it take to execute this code?
    lw  $t2, 0($t3)
    lw  $t3, 4($t3)
    beq $t2, $t3, Label   # assume not taken
    add $t5, $t2, $t3
    sw  $t5, 8($t3)
    Label: ...
- What is going on during the 8th cycle of execution?
- In what cycle does the actual addition of $t2 and $t3 take place?
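The questions above can be worked through with the per-class cycle counts (lw 5, R-type and sw 4, beq 3), assuming the branch is not taken so all five instructions run. The helper names are hypothetical.

```python
from typing import List, Tuple

# Cycles per instruction on the multicycle machine.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

# The code from the question; beq is assumed not taken, so all five instructions execute.
PROGRAM = ["lw", "lw", "beq", "add", "sw"]

def start_cycle(program: List[str], index: int) -> int:
    """1-based global cycle in which instruction `index` is fetched."""
    return 1 + sum(CYCLES[op] for op in program[:index])

def locate(program: List[str], cycle: int) -> Tuple[int, int]:
    """Map a 1-based global cycle to (instruction index, step within that instruction)."""
    for index, op in enumerate(program):
        first = start_cycle(program, index)
        if cycle < first + CYCLES[op]:
            return index, cycle - first + 1
    raise ValueError("cycle is past the end of the program")

if __name__ == "__main__":
    print(sum(CYCLES[op] for op in PROGRAM))  # 21 cycles in total
    print(locate(PROGRAM, 8))                 # (1, 3): second lw, step 3 (address computation)
    print(start_cycle(PROGRAM, 3) + 2)        # 16: step 3 of add, where the ALU adds $t2 and $t3
```

The results:

- Cycle 8 is step 3 of the second lw, where the ALU computes $t3 + 4.
- The add's ALU operation (state 6) falls in cycle 16.
- The beq also uses the ALU on $t2 and $t3 (a subtraction for the comparison) in cycle 13.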
10 Implementing the FSM Controller
- PLA or ROM implementation of both next-state and output functions
[Figure: control logic block. Inputs: the instruction register opcode field (Op5-Op0) and the state register (S3-S0). Outputs: the datapath control points (PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSrc, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst) and the next-state bits NS3-NS0, which feed back into the state register.]
11 PLA Implementation
12 ROM Implementation
- ROM = "Read Only Memory"
- Values of memory locations are fixed ahead of time
- A ROM can be used to implement a truth table
- If the address is m bits, we can address 2^m entries in the ROM
- Our outputs are the bits of data that the address points to
- 2^m is the "height" (number of entries), and n is the "width" (bits per entry, equal to the number of outputs)
13 ROM Implementation
- How many inputs are there? 6 bits for opcode + 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)
- How many outputs are there? 16 datapath-control outputs + 4 state bits = 20 outputs
- ROM is 2^10 x 20 = 20K bits (and a rather unusual size, so go for the next size chip)
- Rather wasteful due to lots of don't-care situations => the 16 control outputs depend only on the state, not the opcode, so each set of control values is repeated for all 64 opcodes
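The idea of a ROM as a stored truth table can be sketched by tabulating a function over every m-bit address. The toy 3-input majority/parity table is a hypothetical example. So is the controller address layout that puts the opcode in the high 6 bits and the state in the low 4; the slide only says the 10 address lines are the opcode plus the state.

```python
from typing import Callable, List

def build_rom(m: int, n: int, f: Callable[[int], int]) -> List[int]:
    """Tabulate f for every m-bit address; each entry keeps the low n bits (the outputs)."""
    mask = (1 << n) - 1
    return [f(address) & mask for address in range(1 << m)]

def rom_bits(m: int, n: int) -> int:
    """ROM size in bits: 2**m entries ("height") times n bits per entry ("width")."""
    return (1 << m) * n

def controller_address(opcode: int, state: int) -> int:
    """Hypothetical address layout: opcode in bits 9..4, state in bits 3..0."""
    return ((opcode & 0x3F) << 4) | (state & 0xF)

def majority_and_parity(address: int) -> int:
    """Toy 3-input, 2-output truth table: bit 1 = majority, bit 0 = parity."""
    ones = bin(address).count("1")
    return (int(ones >= 2) << 1) | (ones & 1)

if __name__ == "__main__":
    rom = build_rom(3, 2, majority_and_parity)
    print(rom)                          # [0, 1, 1, 2, 1, 2, 2, 3]
    print(rom_bits(10, 20))             # 20480 bits for the 2^10 x 20 controller ROM
    print(controller_address(0x23, 2))  # lw opcode (35), state 2 -> address 562
```

Reading the ROM is just indexing: the outputs for any input combination are `rom[address]`, whether or not that combination can actually occur.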
14 ROM vs PLA
- Break up the table into two parts:
- 4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
- 10 bits tell you the 4 next-state bits: 2^10 x 4 bits of ROM
- Total 4.3K bits of ROM => lots of savings
- PLA is much smaller:
- can share product terms
- only needs entries that produce an active output
- can take into account don't cares
- Size is (inputs x product terms) + (outputs x product terms). For this example: (10 x 17) + (20 x 17) = 510 PLA cells
- PLA cells are usually about the size of a ROM cell (slightly bigger)
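The size comparison above is plain arithmetic. The sketch below reproduces it using only the slide's numbers (10 inputs, 16 control outputs plus 4 next-state bits, 17 product terms). The helper names are hypothetical.

```python
def rom_bits(address_bits: int, width: int) -> int:
    """ROM size in bits: 2**address_bits entries, each `width` bits wide."""
    return (1 << address_bits) * width

def pla_cells(inputs: int, outputs: int, product_terms: int) -> int:
    """PLA size: AND plane (inputs x terms) plus OR plane (outputs x terms)."""
    return inputs * product_terms + outputs * product_terms

# One ROM addressed by 6 opcode bits + 4 state bits, producing 16 control + 4 next-state bits.
single_rom = rom_bits(10, 20)                   # 20480 bits
# Split: 4 state bits -> 16 control outputs; 10 bits -> 4 next-state bits.
split_rom = rom_bits(4, 16) + rom_bits(10, 4)   # 256 + 4096 = 4352 bits
# PLA with 10 inputs, 20 outputs and 17 product terms (the count used on the slide).
pla = pla_cells(10, 20, 17)                     # 170 + 340 = 510 cells

if __name__ == "__main__":
    print(single_rom, split_rom, pla)  # 20480 4352 510
```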
15 Alternative to FSM for Multi-cycle?
- MIPS-lite has (about) 7 instructions, 10 FSM states
- Real machines have 100 or more instructions; real controllers have hundreds, or even thousands, of states!
- Problem: the FSM bubble diagram becomes too large
16 Observation about Real Machines
- Machine language: the next instruction to be executed is usually implied
- The PC register determines the instruction
- The next instruction is always at PC + 4 (unless branch or jump)
- FSM controller: often only one exit arc from the current state to the next state
- Suppose we borrow the idea from machine language and represent each control step as some kind of instruction?
- Leads to microprogrammed control
[Figure: a chain of control steps n, n+1, n+2, n+3]
17 Microprogrammed Control
- In microprogrammed control, FSM states become microinstructions of a microprogram (microcode)
- One FSM state = one microinstruction
- Usually represent each microinstruction textually, like an assembly instruction
- The FSM current state register becomes the microprogram counter (micro-PC)
- Normal sequencing: add 1 to the micro-PC to get the next microinstruction
- Microprogram branch: separate logic determines the next microinstruction
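The sequencing rules above (micro-PC + 1 normally, separate logic for branches) can be sketched as a micro-PC and a table of microinstructions. Several things here are assumptions rather than the lecture's definitions:

- The labels (Fetch, Mem1, LW2, SW2, Rformat1, BEQ1, JUMP1).
- The sequencing-field values.
- Dispatch tables indexed by opcode as the "separate logic".

The datapath fields of each microinstruction are omitted, and one microinstruction stands for one FSM state.

```python
from enum import Enum
from typing import List, Optional, Tuple

class Seq(Enum):
    """Sequencing field of a microinstruction (hypothetical encoding)."""
    NEXT = "Seq"              # normal sequencing: micro-PC + 1
    FETCH = "Fetch"           # branch back to the first microinstruction
    DISPATCH1 = "Dispatch 1"  # branch through dispatch table 1, indexed by opcode
    DISPATCH2 = "Dispatch 2"  # branch through dispatch table 2, indexed by opcode

# MIPS opcodes (instruction bits 31:26).
OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B

# Hypothetical microprogram: (label, sequencing field) per microinstruction.
MICROPROGRAM: List[Tuple[Optional[str], Seq]] = [
    ("Fetch", Seq.NEXT),        # 0: fetch instruction, PC <- PC + 4
    (None, Seq.DISPATCH1),      # 1: decode, register fetch, branch target
    ("Mem1", Seq.DISPATCH2),    # 2: memory address computation
    ("LW2", Seq.NEXT),          # 3: read memory
    (None, Seq.FETCH),          # 4: write loaded data to register
    ("SW2", Seq.FETCH),         # 5: write memory
    ("Rformat1", Seq.NEXT),     # 6: ALU operation
    (None, Seq.FETCH),          # 7: write ALU result to register
    ("BEQ1", Seq.FETCH),        # 8: compare and conditionally update PC
    ("JUMP1", Seq.FETCH),       # 9: load jump address into PC
]
LABELS = {label: addr for addr, (label, _) in enumerate(MICROPROGRAM) if label}

# Dispatch tables: the separate logic that picks the target of a microprogram branch.
DISPATCH1 = {OP_LW: LABELS["Mem1"], OP_SW: LABELS["Mem1"], OP_RTYPE: LABELS["Rformat1"],
             OP_BEQ: LABELS["BEQ1"], OP_J: LABELS["JUMP1"]}
DISPATCH2 = {OP_LW: LABELS["LW2"], OP_SW: LABELS["SW2"]}

def next_upc(upc: int, opcode: int) -> int:
    """Compute the next micro-PC from the current microinstruction's sequencing field."""
    seq = MICROPROGRAM[upc][1]
    if seq is Seq.NEXT:
        return upc + 1
    if seq is Seq.FETCH:
        return LABELS["Fetch"]
    if seq is Seq.DISPATCH1:
        return DISPATCH1[opcode]
    return DISPATCH2[opcode]

def trace(opcode: int) -> List[int]:
    """Micro-PC values visited for one instruction, starting from Fetch."""
    upc, visited = LABELS["Fetch"], [LABELS["Fetch"]]
    while True:
        upc = next_upc(upc, opcode)
        if upc == LABELS["Fetch"]:
            return visited
        visited.append(upc)

if __name__ == "__main__":
    print(trace(OP_LW))  # [0, 1, 2, 3, 4]
    print(trace(OP_SW))  # [0, 1, 2, 5]
```

The visited micro-PC values match the FSM state sequences (for example 0, 1, 2, 3, 4 for lw), so the state register has become a counter plus a little dispatch logic.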
18 Microprogramming vs. Hardwired Control
- Microprogramming offers flexibility for design and architectural changes: the control memory (ROM) can be reprogrammed or replaced. Hardwired control is difficult to design for a complex instruction set architecture, and once it is designed, no further change is possible.
- Microprogramming is slow because the control memory is accessed in every cycle, and memory access is slow. Hardwired control is fast because the cycle time depends on the combinational logic delay of the control unit, which is much less than the memory access time.
19 Microprogramming
- What are the microinstructions?
20 Microprogramming
- A specification methodology
- Appropriate if hundreds of opcodes, modes, cycles, etc.
- Signals specified symbolically using microinstructions
- Will two implementations of the same architecture have the same microcode?
- What would a microassembler do?
21 Microinstruction Format
22 Maximally vs. Minimally Encoded
- No encoding:
- 1 bit for each datapath operation
- faster, requires more memory (logic)
- used for the VAX 780: an astonishing 400K of memory!
- Lots of encoding:
- send the microinstructions through logic to get control signals
- uses less memory, slower
- Historical context of CISC:
- Too much logic to put on a single chip with everything else
- Use a ROM (or even RAM) to hold the microcode
- It's easy to add new instructions
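The trade-off above can be made concrete. Without encoding, a microinstruction spends one bit per datapath operation. With encoding, a group of k mutually exclusive operations needs only ceil(log2 k) bits, but a decoder must expand it. The field sizes (4, 3 and 8 choices) are hypothetical, and the sketch ignores any extra "no-op" code a real encoding might reserve.

```python
import math
from typing import List

def decode(field: int, width: int) -> List[int]:
    """Decoder logic: turn an encoded width-bit field into 2**width one-hot control lines."""
    if not 0 <= field < (1 << width):
        raise ValueError("field value does not fit in the given width")
    lines = [0] * (1 << width)
    lines[field] = 1
    return lines

def unencoded_bits(choices: List[int]) -> int:
    """No encoding: one microinstruction bit per datapath operation."""
    return sum(choices)

def encoded_bits(choices: List[int]) -> int:
    """Encoding: each group of k mutually exclusive operations needs ceil(log2 k) bits."""
    return sum(math.ceil(math.log2(k)) for k in choices)

if __name__ == "__main__":
    # Hypothetical microinstruction with three fields selecting among 4, 3 and 8 operations.
    fields = [4, 3, 8]
    print(unencoded_bits(fields), encoded_bits(fields))  # 15 7
    print(decode(2, 2))                                   # [0, 0, 1, 0]
```

The encoded form saves control-store bits per microinstruction. The `decode` step is the extra logic, and its delay is the source of the "slower" entry.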
23 Microcode Trade-offs
- The distinction between specification and implementation is sometimes blurred
- Specification advantages:
- Easy to design and write
- Design architecture and microcode in parallel
- Implementation (off-chip ROM) advantages:
- Easy to change since values are in memory
- Can emulate other architectures
- Can make use of internal registers
- Implementation disadvantages: SLOWER now that
- Control is implemented on the same chip as the processor
- ROM is no longer faster than RAM
- No need to go back and make changes
24 Historical Perspective
- In the '60s and '70s microprogramming was very important for implementing machines
- This led to more sophisticated ISAs and the VAX
- In the '80s RISC processors based on pipelining became popular
- Pipelining the microinstructions is also possible!
- Implementations of IA-32 architecture processors since the 486 use:
- hardwired control for simpler instructions (few cycles, FSM control implemented using PLA or random logic)
- microcoded control for more complex instructions (large numbers of cycles, central control store)
- The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store
25 Pentium 4
- Somewhere in all that control we must handle complex instructions
- The processor executes simple microinstructions, 70 bits wide (hardwired)
- 120 control lines for the integer datapath (400 for floating point)
- If an instruction requires more than 4 microinstructions to implement, control comes from the microcode ROM (8000 microinstructions)
- It's complicated!
26 Chapter 5 Summary
- If we understand the instructions, we can build a simple processor!
- If instructions take different amounts of time, multi-cycle is better
- Datapath implemented using:
- Combinational logic for arithmetic
- State-holding elements to remember bits
- Control implemented using:
- Combinational logic for single-cycle implementation
- Finite state machine for multi-cycle implementation
27 Pipelining (Chap 6)
- Techniques illustrated in Chapter 5 are at the heart of every computer
- All recent computers, however, go beyond the techniques of Chapter 5 and use pipelining to improve performance
- By overlapping execution of multiple instructions, pipelining can achieve:
- throughput close to 1 instruction per clock cycle (like the single-cycle machine)
- with a clock cycle time determined by the delay of individual datapath components (like the multi-cycle machine)
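The throughput claim above can be checked with a little arithmetic. The numbers are hypothetical: 5 stages of equal 200 ps delay, 1000 instructions, no stalls or hazards, and no pipeline-register overhead. Under those assumptions, an ideal pipeline takes (k + N - 1) cycles of one stage delay each. A single-cycle machine takes N cycles of k stage delays each.

```python
def single_cycle_time(n: int, stages: int, stage_ps: int) -> int:
    """Single-cycle machine: one instruction per clock, but the clock spans all stages."""
    return n * stages * stage_ps

def pipelined_time(n: int, stages: int, stage_ps: int) -> int:
    """Ideal pipeline: fill the pipe once, then finish one instruction per stage-length cycle."""
    return (stages + n - 1) * stage_ps

if __name__ == "__main__":
    # Hypothetical numbers: 5 equal stages of 200 ps each, 1000 instructions, no stalls.
    n, k, t = 1000, 5, 200
    print(single_cycle_time(n, k, t))  # 1000000 ps
    print(pipelined_time(n, k, t))     # 200800 ps, close to one instruction per 200 ps cycle
```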