CS 161 Computer Architecture Chapter 5 Lecture 12 - PowerPoint PPT Presentation

About This Presentation
Slides: 28
Provided by: davep173
Learn more at: http://www.cs.ucr.edu

Transcript and Presenter's Notes


1
CS 161 Computer Architecture, Chapter 5, Lecture 12
  • Instructor L.N. Bhuyan
  • www.cs.ucr.edu/bhuyan

2
Datapath Control Points
[Figure: multicycle datapath (PC, memory, IR, MDR, registers, A, B, ALU with zero output, ALUOut, shift left 2, multiplexers; instruction fields 25-21, 20-16, 15-11, 15-0 and 5-0 (funct)) with control points PCWriteCond, PCWrite, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSrc, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst]
3
Multicycle Instruction Execution
  • All instructions execute in 3-5 cycles:
  • 3 cycles: beq
  • 4 cycles: R-type, sw
  • 5 cycles: lw
  • Step 1: fetch instruction, PC = PC + 4
  • Step 2: decode, fetch registers, compute branch target
  • Step 3: execute / compute memory address / complete branch
  • Step 4: access memory / complete R-type
  • Step 5 (lw only): write the loaded data back to the register file
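The cycle counts above can be captured in a small table. The sketch below is a hypothetical Python model and does not come from the slides. It lists the steps each instruction class goes through and computes an average CPI for an instruction mix. The mix percentages are invented purely for illustration.

```python
# Steps of the multicycle implementation, as listed on the slide.
STEPS = {
    1: "fetch instruction, PC = PC + 4",
    2: "decode, fetch registers, compute branch target",
    3: "execute / compute memory address / complete branch",
    4: "access memory / complete R-type",
    5: "write loaded data back to register (lw only)",
}

# Cycles per instruction class (from the slide).
CYCLES = {"beq": 3, "R-type": 4, "sw": 4, "lw": 5}


def steps_for(op):
    """Return the step descriptions an instruction of class `op` goes through."""
    return [STEPS[i] for i in range(1, CYCLES[op] + 1)]


def average_cpi(mix):
    """Weighted average cycles per instruction; `mix` maps class -> fraction."""
    total = sum(mix.values())
    return sum(CYCLES[op] * frac for op, frac in mix.items()) / total


# Hypothetical instruction mix, for illustration only.
example_mix = {"lw": 0.25, "sw": 0.10, "R-type": 0.50, "beq": 0.15}
print(steps_for("beq"))
print(average_cpi(example_mix))  # about 4.1
```

Because every class shares steps 1 and 2, the CPI of a multicycle machine always lies between 3 and 5 for this instruction set. The mix only decides where in that range it falls.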

4
Summary
5
Implementing a Finite State Machine
  • internal storage (current state register)
  • two combinational circuits
  • next-state function and output function

[Figure: FSM built from a current state register (clocked), a next-state function, and an output function, with inputs, next state, and outputs labeled]
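The structure on this slide can be modeled in software: a state register plus two combinational functions. The Python sketch below is hypothetical. It assumes a Moore-style machine, where outputs depend only on the current state, as they do in the multicycle controller. The "two consecutive 1s" detector is a toy example and is not from the slides.

```python
class MooreFSM:
    """A state register plus a next-state function and an output function."""

    def __init__(self, initial_state, next_state_fn, output_fn):
        self.state = initial_state          # the current state register
        self.next_state_fn = next_state_fn  # combinational: (state, inputs) -> next state
        self.output_fn = output_fn          # combinational: state -> outputs (Moore)

    def outputs(self):
        return self.output_fn(self.state)

    def clock(self, inputs):
        """One clock edge: load the state register with the next state."""
        self.state = self.next_state_fn(self.state, inputs)
        return self.state


# Toy example (hypothetical): output 1 once two consecutive 1 inputs are seen.
def seen_two_ones_next(state, bit):
    if bit == 0:
        return 0
    return min(state + 1, 2)


fsm = MooreFSM(0, seen_two_ones_next, lambda s: 1 if s == 2 else 0)
trace = []
for b in [1, 0, 1, 1, 1, 0]:
    fsm.clock(b)
    trace.append(fsm.outputs())
print(trace)  # [0, 0, 0, 1, 1, 0]
```

Keeping the two functions separate mirrors the hardware. Either function could later be replaced by a ROM or PLA without touching the state register.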
6
FSM diagram for Multicycle Machine
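  • State 0 (cycle 1, start new instruction): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 1, ALUOp = 0, PCWrite, PCSrc = 0
  • State 1 (cycle 2): ALUSrcA = 0, ALUSrcB = 3, ALUOp = 0; the next state depends on the opcode: lw/sw go to state 2, R-format goes to state 6, beq goes to state 8
  • State 2 (cycle 3, memory access: address computation): ALUSrcA = 1, ALUSrcB = 2, ALUOp = 0
  • State 6 (cycle 3, R-format execution): ALUSrcA = 1, ALUSrcB = 0, ALUOp = 2
  • State 8 (cycle 3, branch completion): ALUSrcA = 1, ALUSrcB = 0, ALUOp = 1, PCWriteCond, PCSrc = 1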
7
FSM controller execution cycles 3-5
  • From state 2, lw: state 3 (cycle 4, memory access): MemRead, IorD = 1; then state 4 (cycle 5, write-back): RegDst = 0, RegWrite, MemtoReg = 1; then to state 0
  • From state 2, sw: state 5 (cycle 4, memory access): MemWrite, IorD = 1; then to state 0
  • From state 6: state 7 (cycle 4, R-format completion): RegDst = 1, RegWrite, MemtoReg = 0; then to state 0
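The two FSM slides can be turned into a software model of the controller. The Python sketch below is a hypothetical model, not a hardware description. It makes three simplifying assumptions:

  • It identifies instructions by class name ("lw", "sw", "R-type", "beq") instead of decoding real opcode bits.
  • It leaves out the jump state.
  • It represents "asserted" as 1.

Running it reproduces the 3 to 5 cycle counts from the multicycle execution slide.

```python
# Control signals set in each state (from the two FSM slides).
# Signals not listed are don't cares or deasserted; 1 means "asserted".
OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1, "ALUSrcB": 1,
        "ALUOp": 0, "PCWrite": 1, "PCSrc": 0},
    1: {"ALUSrcA": 0, "ALUSrcB": 3, "ALUOp": 0},
    2: {"ALUSrcA": 1, "ALUSrcB": 2, "ALUOp": 0},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 2},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 1, "PCWriteCond": 1, "PCSrc": 1},
}


def next_state(state, op):
    """Next-state function; `op` is an instruction class name (assumed encoding)."""
    if state == 0:
        return 1
    if state == 1:
        return {"lw": 2, "sw": 2, "R-type": 6, "beq": 8}[op]
    if state == 2:
        return 3 if op == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7 and 8 finish the instruction


def run(op):
    """Return the states visited while executing one instruction of class `op`."""
    state, visited = 0, [0]
    while True:
        state = next_state(state, op)
        if state == 0:
            return visited
        visited.append(state)


for op in ["lw", "sw", "R-type", "beq"]:
    states = run(op)
    print(op, states, len(states), "cycles")
```

Each state visited corresponds to one clock cycle. The length of each trace therefore equals the instruction's cycle count.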
8
Add Jump
  • Note:
  • don't care if not mentioned
  • asserted if name only
  • otherwise exact value
  • How many state bits will we need?
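The state-bit question is a short calculation. The sketch below assumes a dense binary encoding of the states. It also assumes that adding jump adds exactly one state, giving states 0-9, which matches the "10 FSM states" quoted for MIPS-lite. A one-hot encoding would instead need one flip-flop per state.

```python
def state_bits(num_states):
    """Minimum number of flip-flops for a binary encoding of `num_states` states."""
    # Binary encoding: the largest state number is num_states - 1.
    return (num_states - 1).bit_length()


print(state_bits(9))   # states 0-8 without jump: 4
print(state_bits(10))  # assuming one extra state for jump (0-9): 4
```

Adding jump therefore does not change the answer: 4 bits can encode up to 16 states.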

9
Simple Questions
  • How many cycles will it take to execute this code?
        lw  $t2, 0($t3)
        lw  $t3, 4($t3)
        beq $t2, $t3, Label    # assume not taken
        add $t5, $t2, $t3
        sw  $t5, 8($t3)
    Label: ...
  • What is going on during the 8th cycle of execution?
  • In what cycle does the actual addition of $t2 and $t3 take place?
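These questions can be checked with a short Python sketch. It uses the cycle counts from the multicycle execution slide and assumes the branch is not taken. It also assumes each instruction takes exactly its nominal number of cycles, with no extra memory or stall cycles. The `timeline` helper is hypothetical.

```python
# Cycles per instruction class in the multicycle implementation.
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3}
STEP_NAMES = {
    1: "fetch instruction, PC = PC + 4",
    2: "decode, fetch registers, compute branch target",
    3: "execute / compute memory address / complete branch",
    4: "access memory / complete R-type",
    5: "write back (lw)",
}

# The code from the slide, with beq assumed not taken.
program = [
    ("lw $t2, 0($t3)", "lw"),
    ("lw $t3, 4($t3)", "lw"),
    ("beq $t2, $t3, Label", "beq"),
    ("add $t5, $t2, $t3", "R-type"),
    ("sw $t5, 8($t3)", "sw"),
]


def timeline(prog):
    """Yield (global cycle number, instruction text, step number) for each cycle."""
    cycle = 0
    for text, kind in prog:
        for step in range(1, CYCLES[kind] + 1):
            cycle += 1
            yield cycle, text, step


rows = list(timeline(program))
print("total cycles:", len(rows))                         # 21
cycle8 = rows[7]
print("cycle 8:", cycle8[1], "-", STEP_NAMES[cycle8[2]])  # second lw, step 3
add_exec = next(c for c, text, step in rows if text.startswith("add") and step == 3)
print("add executes in cycle", add_exec)                  # 16
```

Under these assumptions the answers are as follows. The code takes 21 cycles. Cycle 8 is step 3 of the second lw, which computes the address $t3 + 4. The add instruction performs the addition of $t2 and $t3 in cycle 16.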

10
Implementing the FSM controller
PLA or ROM implementation of both next-state and output functions
[Figure: a PLA or ROM with inputs from the instruction register opcode field (Op5-Op0) and the state register (S3-S0); outputs are the datapath control points PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSrc, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and the next-state bits NS3-NS0]
11
PLA Implementation
12
ROM Implementation
  • ROM = "Read Only Memory"
  • values of memory locations are fixed ahead of time
  • A ROM can be used to implement a truth table
  • if the address is m bits, we can address 2^m entries in the ROM
  • our outputs are the bits of data that the address points to; the ROM is 2^m entries "high" and n bits "wide", where n equals the number of outputs
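A ROM used as a truth table can be sketched in Python. The sketch stores 2^m fixed words of n bits each, and a read simply indexes by address. The `ROM` class is hypothetical. The full-adder truth table is an illustrative choice and does not come from the slides: 3 inputs give m = 3, and 2 outputs give n = 2.

```python
class ROM:
    """A ROM with m address bits and n-bit words: 2**m entries fixed ahead of time."""

    def __init__(self, m, n, contents):
        if len(contents) != 2 ** m:
            raise ValueError("a ROM with m address bits needs 2**m entries")
        if any(word < 0 or word >= 2 ** n for word in contents):
            raise ValueError("every word must fit in n bits")
        self.m, self.n = m, n
        self._words = tuple(contents)  # immutable: read-only after construction

    def read(self, address):
        return self._words[address]


def full_adder(address):
    """Truth-table row: address bits (a, b, carry_in) -> word (carry_out, sum)."""
    a, b, cin = (address >> 2) & 1, (address >> 1) & 1, address & 1
    # a + b + cin is 0..3; its 2-bit binary form is exactly (carry_out, sum).
    return a + b + cin


# Fill the ROM from the truth table: 2**3 = 8 entries, 2 output bits each.
rom = ROM(3, 2, [full_adder(addr) for addr in range(2 ** 3)])
print(format(rom.read(0b111), "02b"))  # 11: carry 1, sum 1
print(format(rom.read(0b011), "02b"))  # 10: carry 1, sum 0
```

Every input combination gets a row, whether or not it matters. This is the source of the ROM's waste when many entries are don't cares.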

13
ROM Implementation
  • How many inputs are there? 6 bits for opcode + 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)
  • How many outputs are there? 16 datapath-control outputs + 4 state bits = 20 outputs
  • ROM is 2^10 x 20 = 20K bits (and a rather unusual size, so go for the next size chip)
  • Rather wasteful due to lots of don't-care situations: the 16 datapath-control outputs depend only on the state, not on the opcode
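The single-ROM size works out as follows. This is plain arithmetic using the bit counts from this slide. Here "K" means 1024.

```python
OPCODE_BITS = 6
STATE_BITS = 4
CONTROL_OUTPUTS = 16

address_lines = OPCODE_BITS + STATE_BITS    # 10
entries = 2 ** address_lines                # 1024
word_width = CONTROL_OUTPUTS + STATE_BITS   # 20 (16 control + 4 next-state bits)
rom_bits = entries * word_width
print(address_lines, entries, word_width, rom_bits)  # 10 1024 20 20480
print(rom_bits / 1024, "Kbit")                       # 20.0 Kbit
```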

14
ROM vs PLA
  • Break up the table into two parts:
  • 4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
  • 10 bits tell you the 4 next-state bits: 2^10 x 4 bits of ROM
  • Total 4.3K bits of ROM, a large saving
  • PLA is much smaller: it can share product terms, it only needs entries that produce an active output, and it can take don't cares into account
  • Size is (inputs x product terms) + (outputs x product terms). For this example: (10 x 17) + (20 x 17) = 510 PLA cells. PLA cells are usually about the size of a ROM cell (slightly bigger)
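The split-ROM and PLA figures can be checked with the same kind of arithmetic. The 17 product terms are taken from the slide and are not derived here.

```python
OPCODE_BITS, STATE_BITS, CONTROL_OUTPUTS = 6, 4, 16

# Split ROM: outputs depend only on the state; next state depends on state + opcode.
output_rom = 2 ** STATE_BITS * CONTROL_OUTPUTS                 # 16 x 16 = 256
next_state_rom = 2 ** (OPCODE_BITS + STATE_BITS) * STATE_BITS  # 1024 x 4 = 4096
print(output_rom + next_state_rom)  # 4352 bits, about 4.3K

# PLA: (inputs x product terms) + (outputs x product terms).
PRODUCT_TERMS = 17
inputs = OPCODE_BITS + STATE_BITS         # 10
outputs = CONTROL_OUTPUTS + STATE_BITS    # 20
pla_cells = inputs * PRODUCT_TERMS + outputs * PRODUCT_TERMS
print(pla_cells)  # 510 PLA cells
```

Most of the split-ROM saving comes from the output table. Because it is indexed by the 4 state bits only, its 16-bit words are not repeated for every opcode.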

15
Alternative to FSM for Multi-cycle?
  • MIPS-lite has (about) 7 instructions and 10 FSM states
  • Real machines have 100 or more instructions; real controllers have hundreds, or even thousands, of states!
  • Problem: the FSM bubble diagram becomes too large

16
Observation about real machines
  • Machine language: the next instruction to be executed is usually implied
  • the PC register determines the instruction
  • the next instruction is always at PC + 4 (unless there is a branch or jump)
  • FSM controller: often only one exit arc from the current state to the next state
  • Suppose we borrow the idea from machine language and represent each control step as some kind of instruction?
  • This leads to microprogrammed control

[Figure: states n, n+1, n+2, n+3 in sequence]
17
Microprogrammed Control
  • In microprogrammed control, FSM states become microinstructions of a microprogram (microcode)
  • one FSM state = one microinstruction
  • each microinstruction is usually represented textually, like an assembly instruction
  • the FSM current state register becomes the microprogram counter (micro-PC)
  • normal sequencing: add 1 to the micro-PC to get the next microinstruction
  • microprogram branch: separate logic determines the next microinstruction
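The sequencing rules above can be sketched as a tiny microsequencer in Python. Everything in the sketch is hypothetical: the microinstruction labels, the dispatch-table names, and the three sequencing kinds ("seq", "fetch", "dispatch"). The microprogram addresses are chosen to line up with FSM states 0-8 from the earlier FSM slides, and the control-signal strings copy those slides. The sketch shows that "add 1 to the micro-PC" plus a few opcode-indexed branches is enough to replace the FSM's next-state logic.

```python
# Hypothetical microprogram: (label, control signals, sequencing). Addresses 0-8
# mirror FSM states 0-8. Sequencing: "seq" = micro-PC + 1, "fetch" = go to 0,
# ("dispatch", table) = branch on the instruction class via a lookup table.
DISPATCH_1 = {"lw": 2, "sw": 2, "R-type": 6, "beq": 8}  # used after decode
DISPATCH_2 = {"lw": 3, "sw": 5}                          # used after address calc

MICROPROGRAM = [
    ("Fetch", "MemRead ALUSrcA=0 IorD=0 IRWrite ALUSrcB=1 ALUOp=0 PCWrite PCSrc=0", "seq"),
    ("Decode", "ALUSrcA=0 ALUSrcB=3 ALUOp=0", ("dispatch", DISPATCH_1)),
    ("Mem1", "ALUSrcA=1 ALUSrcB=2 ALUOp=0", ("dispatch", DISPATCH_2)),
    ("LW2", "MemRead IorD=1", "seq"),
    ("LW3", "RegDst=0 RegWrite MemtoReg=1", "fetch"),
    ("SW2", "MemWrite IorD=1", "fetch"),
    ("Rformat1", "ALUSrcA=1 ALUSrcB=0 ALUOp=2", "seq"),
    ("Rformat2", "RegDst=1 RegWrite MemtoReg=0", "fetch"),
    ("BEQ1", "ALUSrcA=1 ALUSrcB=0 ALUOp=1 PCWriteCond PCSrc=1", "fetch"),
]


def next_upc(upc, op):
    """Microsequencer: choose the next micro-PC value."""
    seq = MICROPROGRAM[upc][2]
    if seq == "seq":
        return upc + 1   # normal sequencing: add 1 to the micro-PC
    if seq == "fetch":
        return 0         # start the next instruction
    _, table = seq       # microprogram branch: dispatch on the instruction class
    return table[op]


def trace(op):
    """Labels of the microinstructions executed for one instruction of class `op`."""
    upc, labels = 0, []
    while True:
        labels.append(MICROPROGRAM[upc][0])
        upc = next_upc(upc, op)
        if upc == 0:
            return labels


print(trace("lw"))  # ['Fetch', 'Decode', 'Mem1', 'LW2', 'LW3']
```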

18
Microprogramming Vs Hardwired Control
  • Microprogramming offers flexibility for design and architectural changes: the control memory (ROM) can be reprogrammed or replaced. Hardwired control is difficult to design for a complex instruction set architecture, and once it is designed, no further change is possible.
  • Microprogramming is slow because the control memory is accessed in every cycle, and memory access is slow. Hardwired control is fast because the cycle time depends on the combinational logic delay of the control unit, which is much less than the memory access time.

19
Microprogramming
  • What are the microinstructions ?

20
Microprogramming
  • A specification methodology
  • appropriate if there are hundreds of opcodes, modes, cycles, etc.
  • signals are specified symbolically using microinstructions
  • Will two implementations of the same architecture have the same microcode?
  • What would a microassembler do?
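One way to picture a microassembler is as a program that turns symbolic microinstructions into binary control words. The Python sketch below rests on several assumptions:

  • The field order and widths in `FIELDS` are invented for illustration. They happen to total the 16 datapath-control bits counted on the ROM slides.
  • The notation follows the FSM slides: a bare name means asserted, and NAME=value sets a multi-bit field.
  • Unmentioned fields are emitted as 0, which resolves don't cares to 0.

```python
# Hypothetical field layout (name, width in bits), most significant field first.
FIELDS = [
    ("PCWrite", 1), ("PCWriteCond", 1), ("IorD", 1), ("MemRead", 1),
    ("MemWrite", 1), ("IRWrite", 1), ("MemtoReg", 1), ("PCSrc", 2),
    ("ALUOp", 2), ("ALUSrcB", 2), ("ALUSrcA", 1), ("RegWrite", 1), ("RegDst", 1),
]


def assemble(symbolic):
    """Translate e.g. 'MemRead ALUSrcB=1' into one integer control word."""
    values = {}
    for token in symbolic.split():
        name, _, value = token.partition("=")
        values[name] = int(value) if value else 1  # bare name means asserted
    word = 0
    for name, width in FIELDS:
        value = values.pop(name, 0)                # unmentioned field -> 0
        if value >= 2 ** width:
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    if values:
        raise ValueError(f"unknown fields: {sorted(values)}")
    return word


fetch = "MemRead ALUSrcA=0 IorD=0 IRWrite ALUSrcB=1 ALUOp=0 PCWrite PCSrc=0"
print(format(assemble(fetch), "016b"))  # 1001010000001000
```

A real microassembler would also resolve symbolic labels for branch targets. This sketch handles only the control-signal fields.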

21
Microinstruction format
22
Maximally vs. Minimally Encoded
  • No encoding:
  • 1 bit for each datapath operation
  • faster, requires more memory (logic)
  • used for the VAX 780: an astonishing 400K of memory!
  • Lots of encoding:
  • send the microinstructions through logic to get control signals
  • uses less memory, slower
  • Historical context of CISC:
  • too much logic to put on a single chip with everything else
  • use a ROM (or even RAM) to hold the microcode
  • it's easy to add new instructions
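The memory-versus-logic trade-off can be illustrated for a single control field that chooses among k mutually exclusive operations. With no encoding, the field needs k bits, one per operation. With encoding, it needs only enough bits to number the choices, plus a decoder in the path, which is the "slower" part. The Python sketch below is illustrative. Using ALUSrcB (values 0-3 on the FSM slides) as the example field is an assumption.

```python
def unencoded_width(num_choices):
    """No encoding: one control bit per datapath operation (one-hot)."""
    return num_choices


def encoded_width(num_choices):
    """Encoded field: just enough bits to number the choices (at least 1)."""
    return max(1, (num_choices - 1).bit_length())


def decode(field_value, num_choices):
    """The extra logic an encoded field needs: field value -> one-hot control bits."""
    if not 0 <= field_value < num_choices:
        raise ValueError("field value out of range")
    return [1 if i == field_value else 0 for i in range(num_choices)]


# ALUSrcB selects one of 4 ALU inputs (values 0-3 on the FSM slides).
print(unencoded_width(4), encoded_width(4))  # 4 2
print(decode(3, 4))                          # [0, 0, 0, 1]
```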

23
Microcode Trade-offs
  • The distinction between specification and implementation is sometimes blurred
  • Specification advantages:
  • easy to design and write
  • design architecture and microcode in parallel
  • Implementation (off-chip ROM) advantages:
  • easy to change since values are in memory
  • can emulate other architectures
  • can make use of internal registers
  • Implementation disadvantages: SLOWER now that
  • control is implemented on the same chip as the processor
  • ROM is no longer faster than RAM
  • there is no need to go back and make changes

24
Historical Perspective
  • In the 60s and 70s microprogramming was very
    important for implementing machines
  • This led to more sophisticated ISAs and the VAX
  • In the 80s RISC processors based on pipelining
    became popular
  • Pipelining the microinstructions is also
    possible!
  • Implementations of IA-32 architecture processors since the 486 use:
  • hardwired control for simpler instructions
    (few cycles, FSM control implemented using PLA
    or random logic)
  • microcoded control for more complex
    instructions (large numbers of cycles, central
    control store)
  • The IA-64 architecture uses a RISC-style ISA and
    can be implemented without a large central
    control store

25
Pentium 4
  • Somewhere in all that control we must handle
    complex instructions
  • Processor executes simple microinstructions, 70
    bits wide (hardwired)
  • 120 control lines for integer datapath (400 for
    floating point)
  • If an instruction requires more than 4 microinstructions to implement, control comes from the microcode ROM (8000 microinstructions)
  • It's complicated!

26
Chapter 5 Summary
  • If we understand the instructions, we can build a simple processor!
  • If instructions take different amounts of time, multi-cycle is better
  • Datapath implemented using:
  • combinational logic for arithmetic
  • state-holding elements to remember bits
  • Control implemented using:
  • combinational logic for a single-cycle implementation
  • a finite state machine for a multi-cycle implementation

27
Pipelining (Chap 6)
  • Techniques illustrated in Chapter 5 are at the heart of every computer
  • All recent computers, however, go beyond the techniques of Chapter 5 and use pipelining to improve performance
  • By overlapping the execution of multiple instructions, pipelining can achieve:
  • throughput close to 1 instruction per clock
    cycle (like single-cycle machine)
  • with a clock cycle time determined by the delay
    of individual datapath components (like
    multi-cycle machine)
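The three design points can be compared on the five-instruction sequence from the questions slide. The Python sketch below is deliberately idealized and uses invented numbers:

  • Every datapath step is assumed to take the same hypothetical 200 ps.
  • Register overhead is ignored.
  • The pipeline is assumed to have no hazards, although the dependences on $t2 and $t3 would cause stalls in practice.

The figures illustrate the shape of the argument, not real performance.

```python
STEP_DELAY_PS = 200  # hypothetical delay of one datapath step
NUM_STEPS = 5        # fetch, decode, execute, memory, write-back

CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3}


def single_cycle_ps(program):
    # One long cycle per instruction, long enough for lw's five steps.
    return len(program) * NUM_STEPS * STEP_DELAY_PS


def multicycle_ps(program):
    # Short cycles; each instruction uses only the steps it needs.
    return sum(CYCLES[op] for op in program) * STEP_DELAY_PS


def ideal_pipeline_ps(program):
    # Fill the pipeline once, then finish one instruction per short cycle.
    return (NUM_STEPS - 1 + len(program)) * STEP_DELAY_PS


prog = ["lw", "lw", "beq", "R-type", "sw"]  # the sequence from the questions slide
print(single_cycle_ps(prog), multicycle_ps(prog), ideal_pipeline_ps(prog))
# 5000 4200 1800
```

For long programs the pipeline's fill cost becomes negligible. The time per instruction then approaches one short cycle, combining the single-cycle CPI with the multicycle clock.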