CS 161 Computer Architecture, Chapter 5, Lecture 12

1
CS 161 Computer Architecture, Chapter 5, Lecture 12

Instructor: L.N. Bhuyan
www.cs.ucr.edu/bhuyan

2
Datapath Control Points
[Figure: multicycle datapath with its control points. Datapath elements: PC, memory, IR, MDR, register file (Regs), A, B, ALU, ALUOut, multiplexers, shift left 2 (<< 2). Instruction fields: [25-21], [20-16], [15-11], [15-0], funct [5-0]. Control signals: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSrc, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst.]
3
Multicycle Instruction Execution

All instructions execute in 3-5 cycles:
3 cycles: beq
4 cycles: R-type, sw
5 cycles: lw
Step 1: fetch instruction, PC = PC + 4
Step 2: decode, fetch registers, compute branch target
Step 3: execute / compute address / complete branch
Step 4: access memory / complete R-type
Step 5: (lw) write the memory data to the register (write-back)
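The cycle counts above can be turned into a small calculation. The following is a minimal sketch, not part of the original slides: the function names and the instruction mix in the usage note are hypothetical, chosen only to illustrate how per-class cycle counts combine.

```python
# Cycles per instruction class, as listed on the slide.
CYCLES = {"beq": 3, "R-type": 4, "sw": 4, "lw": 5}


def total_cycles(instructions):
    """Total cycles for a sequence of instruction classes (no overlap between instructions)."""
    return sum(CYCLES[cls] for cls in instructions)


def average_cpi(mix):
    """Average cycles per instruction; mix maps class -> fraction of executed instructions."""
    return sum(fraction * CYCLES[cls] for cls, fraction in mix.items())


# Hypothetical mix (an assumption for illustration, not data from the lecture).
example_mix = {"lw": 0.25, "sw": 0.25, "R-type": 0.25, "beq": 0.25}
example_cpi = average_cpi(example_mix)
```

With an equal mix of the four classes, the average works out to 4 cycles per instruction; a load-heavy mix would push it toward 5.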

4
Summary
5
Implementing a Finite State Machine

Internal storage (current state register)
Two combinational circuits:
next-state function
output function

[Figure: FSM block diagram. Labels: Inputs, Next-state function, Next state, Current state register, Clock, Output function, Outputs.]
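The structure on this slide (a state register plus a next-state function and an output function) can be sketched in a few lines. This is an illustrative sketch only: the class name, the method name `clock`, and the 3-state counter example are hypothetical, and the outputs are assumed to depend on the current state alone.

```python
class MooreFSM:
    """Current-state register plus two combinational functions."""

    def __init__(self, next_state_fn, output_fn, initial_state=0):
        self.state = initial_state          # internal storage (current state register)
        self.next_state_fn = next_state_fn  # combinational: (state, inputs) -> next state
        self.output_fn = output_fn          # combinational: state -> outputs

    def clock(self, inputs):
        """One clock cycle: emit outputs for the current state, then latch the next state."""
        outputs = self.output_fn(self.state)
        self.state = self.next_state_fn(self.state, inputs)
        return outputs


# Hypothetical example: a 3-state counter that advances only when the input is 1.
def count_next(state, go):
    return (state + 1) % 3 if go else state


def count_out(state):
    return {"is_zero": state == 0}


fsm = MooreFSM(count_next, count_out)
trace = [fsm.clock(go)["is_zero"] for go in (1, 1, 0, 1)]
```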
6
FSM diagram for Multicycle Machine
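Cycle 1, state 0 (instruction fetch, start new instruction): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 1, ALUOp = 0, PCWrite, PCSrc = 0
Cycle 2, state 1 (decode / register fetch): ALUSrcA = 0, ALUSrcB = 3, ALUOp = 0
From state 1: lw/sw goes to state 2, R-format goes to state 6, beq goes to state 8
Cycle 3:
State 2 (memory access, address computation): ALUSrcA = 1, ALUSrcB = 2, ALUOp = 0
State 6 (R-format execution): ALUSrcA = 1, ALUSrcB = 0, ALUOp = 2
State 8 (branch completion): ALUSrcA = 1, ALUSrcB = 0, ALUOp = 1, PCWriteCond, PCSrc = 1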
7
FSM controller execution cycles 3-5
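Cycle 4:
State 3 (lw, from state 2; memory access, step 4): MemRead, IorD = 1
State 5 (sw, from state 2; memory access, step 4; then to state 0): MemWrite, IorD = 1
State 7 (from state 6; R-format completion, step 4; then to state 0): RegDst = 1, RegWrite, MemtoReg = 0
Cycle 5:
State 4 (lw, from state 3; write-back, step 5; then to state 0): RegDst = 0, RegWrite, MemtoReg = 1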
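The state diagram on these two slides can be written down as two tables, one for the outputs of each state and one for the transitions. The sketch below uses the state numbers and signal values from the diagram. The dictionary layout, the short opcode names ("lw", "sw", "R", "beq"), and the helper `trace` are assumptions made for illustration.

```python
# Output function: signals asserted or set in each state (unlisted signals are don't cares).
OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 1, "ALUOp": 0, "PCWrite": 1, "PCSrc": 0},
    1: {"ALUSrcA": 0, "ALUSrcB": 3, "ALUOp": 0},
    2: {"ALUSrcA": 1, "ALUSrcB": 2, "ALUOp": 0},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 2},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 1, "PCWriteCond": 1, "PCSrc": 1},
}


def next_state(state, op):
    """Next-state function; op is one of 'lw', 'sw', 'R', 'beq'."""
    if state == 0:
        return 1
    if state == 1:
        return {"lw": 2, "sw": 2, "R": 6, "beq": 8}[op]
    if state == 2:
        return 3 if op == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7 and 8 return to fetch


def trace(op):
    """List of states visited while executing one instruction, starting at fetch."""
    states = [0]
    while True:
        nxt = next_state(states[-1], op)
        if nxt == 0:
            return states
        states.append(nxt)
```

The length of each trace matches the cycle counts given earlier: 5 for lw, 4 for sw and R-format, 3 for beq.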
8
Add Jump

Note (how to read the state diagram):
a signal is a don't care if not mentioned
asserted if only its name appears
otherwise set to the exact value shown
How many state bits will we need?
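One way to check the state-bit question is to count states and take the ceiling of the base-2 logarithm. A minimal sketch, assuming that adding jump contributes exactly one extra state to the nine states (0-8) of the diagram; the function name is hypothetical.

```python
def state_bits(num_states):
    """Smallest number of bits b such that 2**b >= num_states."""
    return max(1, (num_states - 1).bit_length())


bits_without_jump = state_bits(9)   # states 0-8 from the diagram
bits_with_jump = state_bits(10)     # assumption: jump adds one more state
```

Under this assumption the answer is the same with or without jump, since both 9 and 10 states fit in 4 bits but not in 3.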

9
Simple Questions

How many cycles will it take to execute this code?
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label   # assume not taken
add $t5, $t2, $t3
sw $t5, 8($t3)
Label: ...
What is going on during the 8th cycle of execution?
In what cycle does the actual addition of $t2 and $t3 take place?
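These questions can be worked through by laying the instructions end to end, one step per cycle. The following is a sketch under two assumptions: the branch is not taken, so all five instructions execute, and add is a 4-cycle R-type instruction. The helper names are hypothetical.

```python
# Cycles per instruction on the multicycle machine (add is an R-type instruction).
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

# The code from the slide, in execution order (branch assumed not taken).
PROGRAM = ["lw", "lw", "beq", "add", "sw"]


def timeline(program):
    """Entry k-1 is (instruction index, mnemonic, step number) for cycle k."""
    slots = []
    for index, op in enumerate(program):
        for step in range(1, CYCLES[op] + 1):
            slots.append((index, op, step))
    return slots


slots = timeline(PROGRAM)
total = len(slots)            # total cycles for the whole sequence
eighth = slots[8 - 1]         # what is happening in cycle 8
# Step 3 of add is the execute step, where the ALU adds the two registers.
add_cycle = next(cycle for cycle, (_, op, step) in enumerate(slots, start=1)
                 if op == "add" and step == 3)
```

Under these assumptions the sequence takes 5 + 5 + 3 + 4 + 4 = 21 cycles, cycle 8 is step 3 of the second lw (address computation), and the addition happens in cycle 16.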

10
Implementing the FSM controller
PLA or ROM implementation of both next-state and output functions
[Figure: control logic block. Inputs: instruction register opcode field (Op5-Op0) and state register (S3-S0). Outputs: datapath control points (PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSrc, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst) and next state (NS3-NS0).]
11
PLA Implementation
12
ROM Implementation

ROM = "Read Only Memory"
values of memory locations are fixed ahead of time
A ROM can be used to implement a truth table
if the address is m bits, we can address 2^m entries in the ROM
our outputs are the bits of data that the address points to
m sets the "height" (2^m entries), and n is the "width", equal to the number of outputs

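To make the truth-table idea concrete, here is a small sketch in which a list plays the role of the ROM: the index is the m-bit address and each entry is an n-bit output word. The full-adder example and all names are hypothetical, chosen only because its truth table is small.

```python
M = 3  # address bits: a, b, carry-in
N = 2  # output bits: carry-out, sum


def full_adder_word(addr):
    """Truth-table row for one address: returns the 2-bit word (carry-out, sum)."""
    a, b, cin = (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
    total = a + b + cin
    carry, s = total >> 1, total & 1
    return (carry << 1) | s


# "Program" the ROM once, ahead of time: 2**M entries, each N bits wide.
rom = [full_adder_word(addr) for addr in range(2 ** M)]


def read(addr):
    """A ROM read is just a lookup; no logic is evaluated at read time."""
    return rom[addr]
```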
13
ROM Implementation

How many inputs are there? 6 bits for opcode + 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)
How many outputs are there? 16 datapath-control outputs + 4 state bits = 20 outputs
ROM is 2^10 x 20 = 20K bits (a rather unusual size, so go for the next size chip)
Rather wasteful due to lots of don't-care situations: the datapath-control outputs depend only on the state, not on the opcode

14
ROM vs PLA

Break up the table into two parts:
4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
10 bits tell you the 4 next-state bits: 2^10 x 4 bits of ROM
Total: 4.3K bits of ROM => lots of savings
PLA is much smaller:
can share product terms
only need entries that produce an active output
can take into account don't cares
Size is (#inputs x #product-terms) + (#outputs x #product-terms)
For this example: (10 x 17) + (20 x 17) = 510 PLA cells
PLA cells are usually about the size of a ROM cell (slightly bigger)
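The size comparison can be reproduced directly from the numbers on these slides. A minimal sketch; the function names are hypothetical, and the figure of 17 product terms is taken from the slide as given.

```python
def rom_bits(address_bits, width):
    """A ROM with 2**address_bits entries of `width` bits each."""
    return (2 ** address_bits) * width


def pla_cells(inputs, outputs, product_terms):
    """PLA size: AND-plane cells plus OR-plane cells."""
    return inputs * product_terms + outputs * product_terms


single_rom = rom_bits(10, 20)                   # one ROM for everything
split_rom = rom_bits(4, 16) + rom_bits(10, 4)   # output ROM + next-state ROM
pla = pla_cells(inputs=10, outputs=20, product_terms=17)
```

The single ROM needs 20,480 bits (20K), the split ROMs need 256 + 4,096 = 4,352 bits (about 4.3K), and the PLA needs 510 cells.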

15
Alternative to FSM for Multi-cycle?

MIPS-lite has (about) 7 instructions, 10 FSM states
Real machines have 100 or more instructions; real controllers have hundreds, or even thousands, of states!
Problem: FSM bubble diagram too large

16
Observation about real machines

Machine language: next instruction to be executed is usually implied
PC register determines instruction
next instruction always at PC + 4 (unless branch or jump)
FSM controller: often only one exit arc from current state to next state
Suppose we borrow the idea from machine language and represent each control step as some kind of instruction?
Leads to microprogrammed control

[Figure: chain of sequential states n, n+1, n+2, n+3]
17
Micro-programmed Control

In microprogrammed control, FSM states become microinstructions of a microprogram (microcode)
one FSM state = one microinstruction
usually represent each microinstruction textually, like an assembly instruction
FSM current state register becomes the microprogram counter (micro-PC)
normal sequencing: add 1 to micro-PC to get next microinstruction
microprogram branch: separate logic determines next microinstruction
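The two sequencing modes above (add 1, or let separate logic choose) can be sketched as a tiny micro-PC model. Everything below is an illustrative assumption: the microinstruction labels, the sequencing keywords, and the two dispatch tables are hypothetical and do not come from the slides.

```python
# Hypothetical microprogram: (label, sequencing). Datapath fields are omitted.
MICROPROGRAM = [
    ("Fetch", "seq"),         # 0
    ("Decode", "dispatch1"),  # 1: branch on opcode
    ("Mem1", "dispatch2"),    # 2: branch on lw vs sw
    ("LW2", "seq"),           # 3
    ("LW3", "fetch"),         # 4
    ("SW2", "fetch"),         # 5
    ("Rformat1", "seq"),      # 6
    ("Rformat2", "fetch"),    # 7
    ("BEQ1", "fetch"),        # 8
]

# The "separate logic": tables mapping opcode -> next micro-PC.
DISPATCH = {
    "dispatch1": {"lw": 2, "sw": 2, "R": 6, "beq": 8},
    "dispatch2": {"lw": 3, "sw": 5},
}


def next_micro_pc(upc, op):
    """Normal sequencing adds 1; 'fetch' restarts at 0; otherwise use a dispatch table."""
    sequencing = MICROPROGRAM[upc][1]
    if sequencing == "seq":
        return upc + 1
    if sequencing == "fetch":
        return 0
    return DISPATCH[sequencing][op]


def run(op):
    """Labels of the microinstructions executed for one machine instruction."""
    upc, labels = 0, []
    while True:
        labels.append(MICROPROGRAM[upc][0])
        upc = next_micro_pc(upc, op)
        if upc == 0:
            return labels
```

Most entries use plain sequencing or a return to fetch, which reflects the earlier observation that most states have only one exit arc.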

18
Microprogramming Vs Hardwired Control

Microprogramming offers flexibility for design and architectural changes: the control memory (ROM) can be reprogrammed or replaced. Hardwired control is difficult to design for a complex instruction set architecture, and once it is designed, no further change is possible.
Microprogramming is slow because the control memory is accessed in every cycle, and memory access is slow. Hardwired control is fast because the cycle time depends on the combinational logic delay of the control unit, which is much less than memory access time.

19
Microprogramming

What are the microinstructions?

20
Microprogramming

A specification methodology
appropriate if hundreds of opcodes, modes, cycles, etc.
signals specified symbolically using microinstructions
Will two implementations of the same architecture have the same microcode?
What would a microassembler do?

21
Microinstruction format
22
Maximally vs. Minimally Encoded

No encoding:
1 bit for each datapath operation
faster, requires more memory (logic)
used for VAX 780: an astonishing 400K of memory!
Lots of encoding:
send the microinstructions through logic to get control signals
uses less memory, slower
Historical context of CISC:
Too much logic to put on a single chip with everything else
Use a ROM (or even RAM) to hold the microcode
It's easy to add new instructions
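The memory-versus-logic trade-off can be illustrated with a single microinstruction field that selects one of several mutually exclusive operations. This is a sketch with hypothetical names: an unencoded field spends one bit per operation and drives the control lines directly, while an encoded field is narrower but must pass through a decoder.

```python
def unencoded_width(num_ops):
    """No encoding: one bit per datapath operation."""
    return num_ops


def encoded_width(num_ops):
    """Full encoding: just enough bits to number the operations."""
    return max(1, (num_ops - 1).bit_length())


def decode(field_value, num_ops):
    """The extra logic an encoded field needs: recover one control line per operation."""
    return [1 if i == field_value else 0 for i in range(num_ops)]


# Hypothetical field choosing among 4 mutually exclusive operations.
wide = unencoded_width(4)     # bits stored per microinstruction, no decoder
narrow = encoded_width(4)     # bits stored per microinstruction, decoder required
lines = decode(2, 4)          # control lines recovered from the encoded value 2
```

The encoded form stores half as many bits for this field but adds the decoder delay to every cycle.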

23
Microcode Trade-offs

Distinction between specification and implementation is sometimes blurred
Specification advantages:
Easy to design and write
Design architecture and microcode in parallel
Implementation (off-chip ROM) advantages:
Easy to change since values are in memory
Can emulate other architectures
Can make use of internal registers
Implementation disadvantages: SLOWER, now that
control is implemented on the same chip as the processor
ROM is no longer faster than RAM
there is no need to go back and make changes

24
Historical Perspective

In the '60s and '70s, microprogramming was very important for implementing machines
This led to more sophisticated ISAs and the VAX
In the '80s, RISC processors based on pipelining became popular
Pipelining the microinstructions is also possible!
Implementations of IA-32 architecture processors since the 486 use:
hardwired control for simpler instructions (few cycles, FSM control implemented using PLA or random logic)
microcoded control for more complex instructions (large numbers of cycles, central control store)
The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store

25
Pentium 4

Somewhere in all that control we must handle
complex instructions
Processor executes simple microinstructions, 70
bits wide (hardwired)
120 control lines for integer datapath (400 for
floating point)
If an instruction requires more than 4 microinstructions to implement, control comes from the microcode ROM (8000 microinstructions)
It's complicated!

26
Chapter 5 Summary

If we understand the instructions, we can build a simple processor!
If instructions take different amounts of time, multi-cycle is better
Datapath implemented using:
Combinational logic for arithmetic
State-holding elements to remember bits
Control implemented using:
Combinational logic for single-cycle implementation
Finite state machine for multi-cycle implementation

27
Pipelining (Chap 6)

Techniques illustrated in chapter 5 are at the
heart of every computer
All recent computers, however, go beyond
techniques of chapter 5, and use pipelining to
improve performance
By overlapping execution of multiple
instructions, pipelining can achieve
throughput close to 1 instruction per clock
cycle (like single-cycle machine)
with a clock cycle time determined by the delay
of individual datapath components (like
multi-cycle machine)