CS1104 - PowerPoint PPT Presentation

About This Presentation
Title:

CS1104

Slides: 41
Provided by: samarjitch1

Transcript and Presenter's Notes

Title: CS1104


1
CS1104 Computer Organization
  • PART 2 Computer Architecture
  • Lecture 7
  • Multicycle Control and Datapath

2
Single Cycle Implementation
  • Calculate the cycle time assuming negligible delays
    except for memory (2 ns), ALU and adders (2 ns), and
    register file access (1 ns)

3
Why is a single-cycle implementation not used?
  • Assume the following access times: memory (2 ns),
    ALU and adders (2 ns), register file access (1 ns)
  • Fixed-length clock: the longest instruction is
    lw, which requires 8 ns
  • Load uses five functional units: instruction
    memory, register file, ALU, data memory, and the
    register file once again (2 + 1 + 2 + 2 + 1 = 8 ns)
  • Hence, the clock cycle is 8 ns
  • Clock cycle is determined by the longest path in
    the machine (lw in this case)
  • However, several other instructions could fit
    into a shorter clock cycle

4
Why is a single-cycle implementation not used?
  • R-type: Instruction fetch, Reg access, ALU, Reg
    access
  • Load: Instruction fetch, Reg access, ALU, Mem
    access, Reg access
  • Store: Instruction fetch, Reg access, ALU, Mem
    access
  • Branch: Instruction fetch, Reg access, ALU
  • Jump: Instruction fetch

Note the difference between Load and Jump. This
difference becomes even more significant if there
are floating-point instructions.
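The per-class path lengths can be checked with a few lines of Python. This is only a sketch of the arithmetic: the unit sequences and delays are taken from the slides, while the names DELAY_NS, PATHS and path_delay are made up for illustration.

```python
# Delays assumed in the slides (ns); all other delays are treated as negligible.
DELAY_NS = {"mem": 2, "alu": 2, "reg": 1}

# Functional units used by each instruction class, in order of use.
PATHS = {
    "R-type": ["mem", "reg", "alu", "reg"],
    "Load":   ["mem", "reg", "alu", "mem", "reg"],
    "Store":  ["mem", "reg", "alu", "mem"],
    "Branch": ["mem", "reg", "alu"],
    "Jump":   ["mem"],
}


def path_delay(units):
    """Sum the delays of the functional units along one path."""
    return sum(DELAY_NS[u] for u in units)


latency = {name: path_delay(units) for name, units in PATHS.items()}

# A single-cycle machine must fit its slowest instruction into one cycle.
clock_cycle_ns = max(latency.values())

for name, ns in latency.items():
    print(f"{name:7s} {ns} ns")
print(f"single-cycle clock = {clock_cycle_ns} ns")
```

Under these delays a jump needs only 2 ns of work but still occupies the full 8 ns cycle set by the load.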
5
Multicycle implementation: Basics
  • In the previous slide, the execution of each
    instruction was broken into several steps
  • In a multicycle implementation, each such step
    executes in 1 clock cycle
  • Hence, different instructions require different
    numbers of clock cycles
  • Advantages:
    • More efficient
    • A functional unit can be used more than once per
      instruction, as long as it is used in different
      clock cycles (so less hardware is required)
  • But the design is more complex

6
Single-Cycle versus Multicycle
  • In a multicycle architecture:
    • Single memory unit for both instructions and data
    • Single ALU, rather than one ALU and two adders
    • One or more registers added after each functional
      unit to hold the output of that unit until the
      value is used in the next clock cycle

[Figure: multicycle architecture versus single-cycle architecture]
7
Multicycle implementation: Additional Registers
  • Instruction Register (IR), Memory Data Register (MDR),
    registers A and B at the output of the register file,
    and ALUOut at the output of the ALU
  • At the end of each clock cycle, the data to be
    used in subsequent clock cycles is stored in a
    state element:
    • data to be used by subsequent instructions in a
      later clock cycle is stored in a
      programmer-visible state element such as the
      register file, the PC, or memory
    • data used by the same instruction in a later
      cycle is stored in one of the additional
      registers

8
Multicycle implementation: Basics
  • Each clock cycle can accommodate at most one of
    the following operations:
    • a memory access
    • a register file access (two reads or one write)
    • an ALU operation
  • Hence, any data produced by one of the above
    three functional units must be saved into a
    temporary register for use in a later cycle

9
Multicycle implementation: Additional Registers
[Figure: multicycle datapath. Labels: PC, Address, Memory, Instruction or data, Data, Instruction register, Memory data register, Register, Registers, A, B, ALU, ALUOut]
All registers except the Instruction register
(IR) hold data only between a pair of adjacent
clock cycles (and hence do not need a write
control signal)
10
Multicycle implementation: Examples
ALU used to compute PC = PC + 4
The same ALU is also used for R-type
instructions, branch address computation,
computing memory address in the case of lw/sw
instructions
11
Multicycle Approach: Summary
  • Break up the instructions into steps; each step
    takes a cycle:
    • balance the amount of work to be done
    • restrict each cycle to use only one major
      functional unit
  • At the end of a cycle:
    • store values for use in later cycles (easiest
      thing to do)
    • introduce additional internal registers
  • Notice that we distinguish:
    • processor state: programmer-visible registers
    • internal state: programmer-invisible registers
      (like IR, MDR, A, B, and ALUOut)

12
Multicycle implementation: Steps
  • Instruction fetch
  • Instruction decode and register fetch
  • Execution, memory address computation, or branch
    completion
  • Memory access or R-type instruction completion
  • Memory read completion

The first two steps are common to all instructions.
Instructions take from 3 to 5 cycles!
13
Step 1: Instruction Fetch
  • Use the PC to get the instruction and put it in the
    Instruction Register
  • Increment the PC by 4 and put the result back in
    the PC
  • Can be described succinctly using RTL
    ("Register-Transfer Language"):
      IR = Memory[PC]
      PC = PC + 4
  • Can we figure out the values of the control signals?
  • What is the advantage of updating the PC now?

This step is common for all instructions
(obviously!)
14
Step 2: Instruction Decode and Register Fetch
  • Read registers rs and rt in case we need them
  • Compute the branch address in case the
    instruction is a branch
  • The previous two actions are done optimistically (no
    harm is done)
  • RTL:
      A = Reg[IR[25-21]]
      B = Reg[IR[20-16]]
      ALUOut = PC + (sign-extend(IR[15-0]) << 2)
  • We aren't setting any control lines based on the
    instruction type (we are busy "decoding" it in
    our control logic)

This step is also common for all instructions
15
Step 3 (instruction dependent)
  • The ALU performs one of four functions, based on
    instruction type:
    • Memory reference: ALUOut = A + sign-extend(IR[15-0])
    • R-type: ALUOut = A op B
    • Branch: if (A == B) PC = ALUOut
    • Jump: PC = PC[31-28] || (IR[25-0] << 2), where ||
      denotes concatenation

16
Step 4 (R-type or memory access)
  • Loads and stores access memory:
      MDR = Memory[ALUOut]   (load)
      or
      Memory[ALUOut] = B     (store)
  • R-type instructions finish:
      Reg[IR[15-11]] = ALUOut
  • The write actually takes place at the
    end of the cycle, on the clock edge

17
Step 5: Write-back step
  • Memory read completion step:
      Reg[IR[20-16]] = MDR

18
Summary execution steps
Steps taken to execute any instruction class
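The five steps can also be traced in software. The following is a minimal sketch, not the lecture's datapath: it assumes the standard MIPS opcode values, a memory modeled as a dict of words keyed by byte address, and an R-type that always adds (the funct field is ignored). The names run_instruction and sign_extend16 are hypothetical.

```python
# Opcode values assumed from the standard MIPS encoding.
LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02


def sign_extend16(value):
    """Interpret a 16-bit field as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def run_instruction(pc, reg, mem):
    """Execute one instruction step by step; return (new_pc, cycles)."""
    # Step 1: instruction fetch
    ir = mem[pc]
    pc = pc + 4
    # Step 2: decode and register fetch (done for every instruction)
    opcode = ir >> 26
    a = reg[(ir >> 21) & 0x1F]
    b = reg[(ir >> 16) & 0x1F]
    imm = sign_extend16(ir & 0xFFFF)
    alu_out = pc + (imm << 2)          # branch target, computed optimistically
    # Step 3: instruction dependent
    if opcode == J:
        return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2), 3
    if opcode == BEQ:
        return (alu_out if a == b else pc), 3
    if opcode == RTYPE:
        alu_out = a + b                # assumption: only "add" is modeled
        reg[(ir >> 11) & 0x1F] = alu_out   # Step 4: R-type completion
        return pc, 4
    if opcode not in (LW, SW):
        raise ValueError(f"opcode {opcode:#x} is not modeled")
    alu_out = a + imm                  # memory address computation
    if opcode == SW:
        mem[alu_out] = b               # Step 4: memory write
        return pc, 4
    mdr = mem[alu_out]                 # Step 4: memory read
    reg[(ir >> 16) & 0x1F] = mdr       # Step 5: write-back
    return pc, 5


# Example: lw $2, 8($1) with $1 = 100 and the word 42 stored at address 108.
reg = [0] * 32
reg[1] = 100
mem = {0: (LW << 26) | (1 << 21) | (2 << 16) | 8, 108: 42}
new_pc, cycles = run_instruction(0, reg, mem)
print(new_pc, cycles, reg[2])          # 4 5 42
```

The returned cycle counts match the 3 to 5 cycle range: jump and branch finish in 3, R-type and store in 4, load in 5.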
19
Determining the values of the control signals
for each of Steps 1-5 (we will show only Step
1)
20
Step 1: Instruction Fetch Step
  • IR = Memory[PC]
  • PC = PC + 4

MemRead = 1
IorD = 0
IRWrite = 1
21
Step 1: Instruction Fetch Step
  • IR = Memory[PC]
  • PC = PC + 4

Increment PC by 4: ALUSrcA = 0, ALUSrcB = 01,
ALUOp = 00 (for ALU to ADD)
22
Step 1: Instruction Fetch Step
  • IR = Memory[PC]
  • PC = PC + 4

Store the incremented instruction address back to the PC:
PCSource = 00, PCWrite = 1
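Collected in one place, the Step 1 signal values can be written as a small table and applied to a toy datapath. This is a sketch only: the signal values are the ones listed on the three slides above, but the mux meanings (IorD = 0 selects the PC as the memory address, ALUSrcA = 0 selects the PC, ALUSrcB = 01 selects the constant 4, PCSource = 00 selects the ALU result) are assumptions read off the RTL, and fetch_step is a hypothetical helper.

```python
FETCH_SIGNALS = {
    "MemRead": 1, "IorD": 0, "IRWrite": 1,         # IR = Memory[PC]
    "ALUSrcA": 0, "ALUSrcB": 0b01, "ALUOp": 0b00,  # ALU computes PC + 4
    "PCSource": 0b00, "PCWrite": 1,                # PC = ALU result
}

# Mux and ALU settings this sketch knows how to interpret (fetch step only).
MODELED = {"IorD": 0, "ALUSrcA": 0, "ALUSrcB": 0b01, "ALUOp": 0b00, "PCSource": 0b00}


def fetch_step(pc, memory, signals=FETCH_SIGNALS):
    """Apply the fetch-step control signals; return (new_pc, ir)."""
    for name, value in MODELED.items():
        if signals[name] != value:
            raise NotImplementedError(f"only the fetch setting of {name} is modeled")
    # Memory is read at address PC and latched into IR.
    ir = memory[pc] if signals["MemRead"] and signals["IRWrite"] else None
    # The ALU adds PC and the constant 4.
    alu_result = pc + 4
    # The PC is updated only when PCWrite is asserted.
    new_pc = alu_result if signals["PCWrite"] else pc
    return new_pc, ir


print(fetch_step(8, {8: 0xABCD}))  # (12, 43981)
```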
23
Determining the values of the control
signals: Steps 2-5 are similar (please work
them out on your own)
24
Designing the Control Unit
25
Finite state machines (FSMs)
  • A finite state machine consists of:
    • a set of states
    • a next-state function (determined by the current
      state and the input)
    • an output function (determined by the current state
      and possibly the input)

26
Finite state machines (FSMs)
  • State is an abstraction
  • You may consider the state of an FSM to be a
    variable or a function, or a collection of
    variables or functions
  • If the output depends only on the current state,
    then it is a Moore machine. If the output depends
    on the state and the input, then it is a Mealy
    machine

[Figure: state diagram with the labels "output 0" and "output 1"]
This machine has two states. How does the output
behave when the input = 1?
27
Moore machine
  • The output function depends only on the current
    state
  • The next state function depends on the current
    state and the input
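A Moore machine is easy to express as two lookup tables, one for the next state and one for the output. The sketch below is illustrative only: the two-state example, in which an input of 1 toggles the state and an input of 0 holds it, is a hypothetical machine and not necessarily the one drawn in the lecture.

```python
class MooreMachine:
    """Output depends only on the current state; next state on state and input."""

    def __init__(self, start, next_state, output):
        self.state = start
        self.next_state = next_state   # dict: (state, input) -> state
        self.output = output           # dict: state -> output

    def step(self, inp):
        """Advance one clock cycle and return the output of the new state."""
        self.state = self.next_state[(self.state, inp)]
        return self.output[self.state]


# Hypothetical two-state machine: input 1 toggles the state, input 0 holds it.
toggle = MooreMachine(
    start="S0",
    next_state={("S0", 0): "S0", ("S0", 1): "S1",
                ("S1", 0): "S1", ("S1", 1): "S0"},
    output={"S0": 0, "S1": 1},
)

outputs = [toggle.step(bit) for bit in [1, 1, 0, 1]]
print(outputs)  # [1, 0, 0, 1]
```

Because the output table is indexed by state alone, the same input can produce different outputs depending on where the machine currently is.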

28
Implementing the Control
  • The values of the control signals depend upon:
    • what instruction is being executed
    • which step is being performed
  • Use the information we have accumulated (e.g., the
    control signals for Step 1) to specify a finite
    state machine (FSM):
    • specify the finite state machine graphically, or
    • use microprogramming
  • The implementation can be derived from the
    specification

29
FSM high level view
Start/reset
Instruction fetch, decode and register fetch
Memory access instructions
R-type instructions
Branch instruction
Jump instruction
30
FSM implementation of the control unit
31
FSM for memory reference instructions
32
FSMs for other instructions
Branch instruction
Jump instruction
R-type instructions
33
The Full FSM for the Control Unit
Obtained by simply joining the FSMs in the
previous slides
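The joined FSM can be sketched as a next-state function. The state names below are invented for illustration and follow the five execution steps; the lecture's diagram may label or number its states differently. The sketch covers only the state sequencing, not the control outputs.

```python
# Where the decode state dispatches to, per instruction class (names assumed).
DISPATCH = {
    "lw": "mem_addr", "sw": "mem_addr",
    "r-type": "execute", "beq": "branch", "j": "jump",
}

# States after which the machine returns to instruction fetch.
FINAL_STATES = {"write_back", "mem_write", "rtype_completion", "branch", "jump"}


def next_state(state, instr_class):
    """Next-state function: depends on the current state and the instruction."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        return DISPATCH[instr_class]
    if state == "mem_addr":
        return "mem_read" if instr_class == "lw" else "mem_write"
    if state == "mem_read":
        return "write_back"
    if state == "execute":
        return "rtype_completion"
    if state in FINAL_STATES:
        return "fetch"
    raise ValueError(f"unknown state {state!r}")


def trace(instr_class):
    """List the states visited by one instruction, starting at fetch."""
    states = ["fetch"]
    while True:
        nxt = next_state(states[-1], instr_class)
        if nxt == "fetch":
            return states
        states.append(nxt)


for cls in DISPATCH:
    print(cls, len(trace(cls)), trace(cls))
```

With these names the machine has ten states in total, so four state bits are enough to encode them.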
34
Finite State Machine for Control
  • Implementation

35
PLA (programmable logic array) Implementation
  • Inputs: opcode, current state
  • AND plane (computes minterms)
  • OR plane (computes sum terms)
  • Outputs: datapath control, next state
36
ROM Implementation
  • ROM "Read Only Memory"
  • values of memory locations are fixed ahead of
    time
  • A ROM can be used to implement a truth table
  • if the address is m bits, we can address 2^m
    entries in the ROM
  • our outputs are the bits of data that the address
    points to

Example ROM with m = 3 address bits and n = 4 data bits:

address (m bits)   data (n bits)
0 0 0              0 0 1 1
0 0 1              1 1 0 0
0 1 0              1 1 0 0
0 1 1              1 0 0 0
1 0 0              0 0 0 0
1 0 1              0 0 0 1
1 1 0              0 1 1 0
1 1 1              0 1 1 1

The m address bits set the "height" (2^m entries), and n is the "width"
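In software terms a ROM is just a fixed lookup table indexed by the address. The sketch below loads the example contents from this slide (3 address bits, 4 data bits); rom_read is a hypothetical helper name.

```python
ADDRESS_BITS = 3   # m: the ROM has 2**m entries
DATA_BITS = 4      # n: each entry is n bits wide

# Entry i holds the data word stored at address i.
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]


def rom_read(address):
    """Return the data word stored at the given address."""
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address out of range")
    return ROM[address]


print(format(rom_read(0b110), "04b"))  # 0110
```

Each output bit is one column of a truth table whose inputs are the address bits, which is why a ROM can implement any combinational function of its address lines.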
37
ROM Implementation
  • How many inputs are there? 6 bits for opcode + 4
    bits for state = 10 address lines (i.e., 2^10 =
    1024 different addresses)
  • How many outputs are there? 16 datapath-control
    outputs + 4 state bits = 20 outputs
  • ROM is 2^10 x 20 = 20K bits (very large and a
    rather unusual size)
  • Rather wasteful, since for lots of the entries
    the outputs are the same, i.e., the opcode is often
    ignored

38
ROM Implementation
  • Cheaper implementation:
    • Exploit the fact that the FSM is a Moore machine
      => control outputs depend only on the current state
      and not on other incoming control signals!
    • Next state depends on all inputs
  • Break up the table into two parts:
    • 4 state bits tell you the 16 outputs: 2^4 x 16
      bits of ROM
    • 10 bits tell you the 4 next-state bits: 2^10 x 4
      bits of ROM
    • Total: about 4.3K bits of ROM
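The two ROM sizes can be checked with a short calculation. This is only the arithmetic from the slides; the variable names are made up for illustration, and 1K is taken to mean 1024 bits.

```python
opcode_bits = 6        # inputs from the instruction
state_bits = 4         # current-state inputs and next-state outputs
control_outputs = 16   # datapath-control outputs

address_bits = opcode_bits + state_bits

# One big ROM: every address stores control outputs plus next state.
single_rom_bits = 2 ** address_bits * (control_outputs + state_bits)

# Split design: control outputs indexed by state only (Moore machine),
# next state indexed by state and opcode.
output_rom_bits = 2 ** state_bits * control_outputs
next_state_rom_bits = 2 ** address_bits * state_bits
split_rom_bits = output_rom_bits + next_state_rom_bits

print(single_rom_bits, single_rom_bits / 1024)   # 20480 20.0
print(split_rom_bits, split_rom_bits / 1024)     # 4352 4.25
```

The split design needs 4352 bits, which the slide rounds to 4.3K, compared with 20K bits for the single table.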

39
Other implementation options
  • Microprogramming
  • Read Section 5.7 of the textbook (in the CD)
  • This is not included in the syllabus

40
Required Reading
  • Textbook (3rd edition)
  • Section 5.5 and Section 5.7 (optional)
  • 2nd edition of the textbook
  • Section 5.4 and Section 5.5 (optional)