Pipelining - PowerPoint PPT Presentation

About This Presentation

Title:

Pipelining

Description:

... Kaufmann Publishers. Pipelining. Multiple instructions are overlapped in execution. Instruction fetch and execution is divided into steps. ... – PowerPoint PPT presentation

Slides: 47

Provided by: toda76

Transcript and Presenter's Notes

Title: Pipelining

1
Pipelining

Multiple instructions are overlapped in
execution.
Instruction fetch and execution are divided into
steps.
A stage in the pipeline takes care of a step.
All stages in the pipeline operate concurrently.
We must have separate resources for each stage.
In MIPS:
five steps
each step takes from 1 to 2 ns
the nonpipelined execution of an instruction
takes from 5 to 8 ns
the pipelined execution of an instruction takes
10 ns (every instruction passes through all five
stages, each clocked at 2 ns)
if the pipeline is full, a new result is produced
every 2 ns
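
The timing figures above can be checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the 8 ns default assumes that every nonpipelined instruction takes the slowest case quoted on the slide.

```python
def nonpipelined_time_ns(n_instructions, instr_time_ns=8):
    """Each instruction finishes before the next one starts.

    instr_time_ns=8 is an assumption: the slowest case from the slide.
    """
    return n_instructions * instr_time_ns


def pipelined_time_ns(n_instructions, stages=5, stage_time_ns=2):
    """One instruction enters the pipeline per clock cycle.

    The first instruction needs `stages` cycles; each later one
    completes one cycle after its predecessor.
    """
    if n_instructions == 0:
        return 0
    return (stages + n_instructions - 1) * stage_time_ns


for n in (1, 3, 1000):
    print(n, nonpipelined_time_ns(n), pipelined_time_ns(n))
```

Under these assumptions a single instruction is slower in the pipeline (10 ns against 8 ns), while for 1000 instructions the ratio approaches 8/2 = 4, which is below the ideal of 5 because the steps are not equally long.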

2
Pipelining

Improve performance by increasing instruction
throughput
The ideal speedup equals the number of stages in the
pipeline. We don't always achieve it.

3
Pipelining

Notice that the register file operations take 1 ns:
writing is done during the first half of the
clock cycle
reading is done during the second half of the
clock cycle
This will help us later.
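
A toy model may make the split-cycle behaviour concrete. The class and method names below are hypothetical, not taken from the slides; the only behaviour modelled is "write in the first half, read in the second half".

```python
class RegisterFile:
    """Toy 32-entry register file with split-cycle access (illustrative only)."""

    def __init__(self):
        self.regs = [0] * 32

    def clock_cycle(self, write=None, reads=()):
        """Run one clock cycle.

        write: optional (register_number, value) pair, done in the first half.
        reads: register numbers to read, done in the second half.
        Returns the values read, in order.
        """
        if write is not None:
            number, value = write
            if number != 0:  # MIPS register $0 is hard-wired to zero
                self.regs[number] = value
        return [self.regs[r] for r in reads]


rf = RegisterFile()
# A value written and read in the same cycle is already visible to the read.
print(rf.clock_cycle(write=(8, 42), reads=(8, 9)))
```

Because the write happens first, an instruction reading a register in the same cycle in which an earlier instruction writes it sees the new value.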

4
Pipelining

What makes it easy:
all instructions are the same length
just a few instruction formats
memory operands appear only in loads and stores
aligned data: one memory access per data
item
Hazards make it hard:
the next instruction cannot execute in the following
clock cycle
structural, control and data hazards
We'll build a simple pipeline and look at these
issues.
We'll talk about modern processors and what
really makes it hard:
exception handling
trying to improve performance with out-of-order
execution, etc.

5
Hazards

structural hazards
competition in accessing hardware resources
e.g. accessing the memory at the same time
control hazards
problems in controlling the program flow
e.g. branch instructions
data hazards
accessing data that is not yet complete
e.g. an instruction depends on a previous one

6
Resolving Structural Hazard

Suppose that we had a single memory instead of
two memories.
Data accesses to the memory would then coincide
with instruction fetches.
Some structural hazards can be resolved with
extra hardware. If not, stall the pipeline.

7
Resolving Control Hazard by Pipeline Stall

Assumption: there is enough extra hardware so that all
branch computations are ready in stage 2.
The next instruction is stalled one extra clock
cycle before starting.
For longer pipelines we often cannot resolve the
branch in the second stage, so we need a
better solution.

8
Resolving Control Hazard by Prediction

Simple approach: always predict that branches
will not be taken
right: the pipeline proceeds at full speed
wrong: the pipeline stalls
Another approach: predict that branches to an
earlier address are taken
usually right in the case of a loop
Dynamic prediction: keep a history for each
branch as taken or untaken.
When the guess is wrong, instructions following
the wrongly guessed branch must have no effect.
The pipeline must be restarted from the proper
address.
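
The "branches to an earlier address are taken" rule can be sketched in a few lines. The function names and the example addresses are made up for illustration; the loop is assumed to end with a branch that is taken nine times and then falls through once.

```python
def predict_taken(branch_pc, target_pc):
    """Static heuristic: a branch to an earlier address is predicted taken."""
    return target_pc < branch_pc


def count_correct(branch_pc, target_pc, outcomes):
    """Count how many actual outcomes (True = taken) match the static prediction."""
    prediction = predict_taken(branch_pc, target_pc)
    return sum(1 for taken in outcomes if taken == prediction)


# Loop-closing branch at a hypothetical address 0x40 jumping back to 0x20.
loop_outcomes = [True] * 9 + [False]
print(count_correct(0x40, 0x20, loop_outcomes), "of", len(loop_outcomes))
```

In this example the prediction is wrong only on the final iteration, when the loop exits.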

9
Resolving Control Hazard by Prediction

[Figure: predicting branch not taken; pipeline timing when the prediction is correct and when it is wrong]

10
Resolving Control Hazard by Delayed Branch

The next sequential instruction is always
executed.
Assemblers and compilers usually fill the branch
delay slots:
an earlier instruction is moved into the delay
slot
if none is found, a NOP is inserted

11
Data Hazard

An example:
add $s0, $t0, $t1
sub $t2, $s0, $t3
add writes in stage 5
sub reads data in stage 2
three stalls required
We cannot rely on compilers to avoid data hazards
by rearranging the instruction sequence:
these dependencies happen just too often
the delay is just too long
Solution: forwarding or bypassing
we don't need to wait for the instruction to
complete
get the missing item early from the internal
resources
Stalls are still needed in some instruction
sequences.

12
Forwarding

[Figure: forwarding, no stalls]
[Figure: load-use data hazard, one stall]
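
The stall counts quoted in these slides (three without forwarding, none with forwarding, one for a load followed by a use) can be reproduced with a simple cycle count. This is a sketch under stated assumptions: a five-stage pipeline (IF, ID, EX, MEM, WB), the producer fetched in cycle 0, and a hypothetical function name and parameters chosen for this example.

```python
def stalls_needed(producer_is_load, distance=1, forwarding=True,
                  split_cycle_regfile=False):
    """Stall cycles for a consumer `distance` instructions after its producer.

    The producer occupies IF..WB in cycles 0..4. Without stalls, the
    consumer is in ID in cycle distance+1 and in EX in cycle distance+2.
    """
    if forwarding:
        # The value exists at the end of EX (cycle 2) for ALU results or
        # at the end of MEM (cycle 3) for loads; the consumer needs it in EX.
        ready_after = 3 if producer_is_load else 2
        needed_in = distance + 2
    else:
        # The value is written in WB (cycle 4); the consumer reads in ID.
        # With a split-cycle register file the read may share the WB cycle.
        ready_after = 3 if split_cycle_regfile else 4
        needed_in = distance + 1
    return max(0, ready_after + 1 - needed_in)


print(stalls_needed(producer_is_load=False, forwarding=False))  # add then sub
print(stalls_needed(producer_is_load=False))                    # with forwarding
print(stalls_needed(producer_is_load=True))                     # load-use
```

If the register file writes in the first half of the cycle and reads in the second half, the count without forwarding drops from three to two in this model.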
13
Instruction steps mapped onto the datapath

14
Pipelined Datapath

Reuse of functional units in every clock cycle
Additional hardware:
separation of pipeline stages by pipeline
registers
extra functional units where several instructions
need one at the same time (to remove structural
hazards)
Extended control:
strict sequencing of instructions (every
instruction goes through all stages)
check for hazards
introduce stalls to remove hazards

15
Problems

Usually data moves from left to right. Data
moving from right to left affects later
instructions:
write back into the register file can lead to
data hazards
selection of the next value of the PC leads to
control hazards

16
Pipelined Datapath
17
Pipelined Datapath

We must add wide enough pipeline registers to
store all the data.
The write register number must be passed along with the
instruction through the pipeline registers.
Adders:
single cycle: present
multicycle: absent (the ALU took care of the
calculations)
pipeline: present

18
Graphically Representing Pipelines

Can help with answering questions like
how many cycles does it take to execute this
code?
what is the ALU doing during cycle 4?
use this representation to help understand
datapaths

19
Traditional Pipeline Diagram

Not as informative as the previous one

20
Pipelined Control
21
Pipelined control

Data travels through the pipeline stages
All data belonging to an instruction must be kept
together
Information transfer only through pipeline
registers
Control information must travel with the
instruction

22
Pipelined control

Instruction fetch / PC increment
identical for all instructions:
read instruction memory
write PC
Instruction decode / register file read
identical for all instructions
Execution / address calculation
signals: RegDst, ALUOp, ALUSrc
Memory access
signals: Branch, MemRead, MemWrite
Write back
signals: MemtoReg, RegWrite

23
Pipelined Control

Pass control signals along just like the data

[Figure: pipeline registers extended to carry the control signals]
24
Datapath with Control
25
Dependencies

Problem with starting the next instruction before
the present one has finished:
dependencies that go backward in time are data
hazards

26
Software Solution

Have the compiler guarantee no hazards
Insert no-operations (nop):
sub $2, $1, $3
nop
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Problem: this really slows us down!
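
How a compiler might place the no-operations can be sketched as follows. This is an illustration, not the method of any particular compiler: the function name, the tuple layout and the `min_gap` parameter are assumptions, and `min_gap=3` assumes a register file that can be written and read in the same cycle, so a consumer must sit at least three slots after its producer.

```python
def insert_nops(program, min_gap=3):
    """Insert nops so every register is read at least `min_gap` slots after it is written.

    program: list of (text, dest_register_or_None, source_registers).
    Returns the list of instruction texts with "nop" entries added.
    """
    out = []          # emitted instruction texts
    last_write = {}   # register number -> index in `out` of its latest writer
    for text, dest, sources in program:
        # Earliest slot at which all source registers are safe to read.
        earliest = max((last_write[r] + min_gap
                        for r in sources if r in last_write), default=0)
        while len(out) < earliest:
            out.append("nop")
        out.append(text)
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out


program = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]
for line in insert_nops(program):
    print(line)
```

For this sequence the sketch places two nops after the sub, so five instructions occupy seven slots.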

27
Dependency Detection

Hazard conditions:
EX/MEM.RegisterRd = ID/EX.RegisterRs   (next instruction)
EX/MEM.RegisterRd = ID/EX.RegisterRt   (next instruction)
MEM/WB.RegisterRd = ID/EX.RegisterRs   (after two instructions)
MEM/WB.RegisterRd = ID/EX.RegisterRt   (after two instructions)

28
Forwarding

register file forwarding to handle read/write to
same register
ALU forwarding

29
ALU without Forwarding
30
ALU with Forwarding
31
Forwarding MUX Control Values

MUX control     Source    Explanation
ForwardA = 00   ID/EX     1st ALU operand comes from the register file
ForwardA = 10   EX/MEM    1st ALU operand is forwarded from the prior ALU result
ForwardA = 01   MEM/WB    1st ALU operand is forwarded from data memory or an earlier ALU result
ForwardB = 00   ID/EX     2nd ALU operand comes from the register file
ForwardB = 10   EX/MEM    2nd ALU operand is forwarded from the prior ALU result
ForwardB = 01   MEM/WB    2nd ALU operand is forwarded from data memory or an earlier ALU result
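
The selection of the MUX control values can be sketched as a small function. The function and parameter names are hypothetical. Two conditions go beyond what this transcript states and are assumptions added for the example: the earlier instruction must actually write a register (RegWrite), and register $0 is never forwarded.

```python
def forward_select(ex_mem_regwrite, ex_mem_rd,
                   mem_wb_regwrite, mem_wb_rd, id_ex_src):
    """Return the 2-bit MUX control value for one ALU operand.

    id_ex_src is ID/EX.RegisterRs for ForwardA, ID/EX.RegisterRt for ForwardB.
    """
    # Most recent result first: the instruction directly ahead (EX/MEM).
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_src:
        return 0b10
    # Otherwise the instruction two ahead (MEM/WB).
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_src:
        return 0b01
    # No hazard: take the operand read from the register file.
    return 0b00


# sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX (Rs = 2, Rt = 5).
forward_a = forward_select(True, 2, False, 0, id_ex_src=2)
forward_b = forward_select(True, 2, False, 0, id_ex_src=5)
print(format(forward_a, "02b"), format(forward_b, "02b"))
```

Checking EX/MEM before MEM/WB means that when both stages hold a result for the same register, the more recent one is chosen.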

32
Data Hazards and Stalls

Forwarding does not solve all problems.
Load word can still cause a hazard:
lw $2, 20($1)
and $4, $2, $5
An instruction tries to read a register following
a load instruction that writes to the same
register.
We need a hazard detection unit to stall the
pipeline.

33
Data Hazards and Stalls
34
Hazard Detection

if (ID/EX.MemRead and
((ID/EX.RegisterRt = IF/ID.RegisterRs) or
(ID/EX.RegisterRt = IF/ID.RegisterRt)))
    stall the pipeline
first line: check for load instructions
next two lines: check if the register to be loaded is
used by the instruction currently in the ID stage
We can stall the pipeline by keeping an
instruction in the same stage.
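
The load-use test translates directly into code. The function name and argument names below are made up for this sketch; the fields mirror the pipeline register fields named in the condition.

```python
def load_use_hazard(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the register a load in EX will write."""
    return bool(id_ex_memread and
                (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


# lw $2, 20($1) is in EX (Rt = 2); and $4, $2, $5 is in ID (Rs = 2, Rt = 5).
print(load_use_hazard(True, 2, 2, 5))
# The same register numbers without a load in EX: no stall is requested.
print(load_use_hazard(False, 2, 2, 5))
```

When the test is true, the instruction in ID stays where it is for one cycle, after which forwarding can supply the loaded value.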

35
Stalling
36
Hazard Detection Unit

The hazard detection unit stalls if the load-use
hazard test is true.

37
Branch Hazards

When we decide to branch, other instructions are
in the pipeline!
We are predicting branch not taken:
we need to add hardware for flushing instructions if
we are wrong

38
Reducing the Delay of Branches

Move the branch decision earlier in the pipeline, so
that fewer instructions need be flushed.
Select the branch address either at the
end of the EX stage (two-cycle penalty) or at the
end of the ID stage (one-cycle penalty)
Move the branch address adder to the ID stage
Branch detection in the ID stage:
EXCLUSIVE-OR of the bits of the registers
OR of the results
Clear the instruction field in the IF/ID pipeline
register: this creates a NOP
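
The equality test used for branch detection (XOR the register bits, then OR the results) can be written out bit by bit. The function name and the 32-bit default width are assumptions for this sketch; hardware would do the same thing with gates rather than a loop.

```python
def registers_equal(a, b, width=32):
    """Compare two register values using only XOR and OR, as the gates would."""
    mask = (1 << width) - 1
    diff = (a ^ b) & mask          # XOR: a 1 in every position where the bits differ
    any_diff = 0
    for i in range(width):         # OR all the XOR outputs together
        any_diff |= (diff >> i) & 1
    return any_diff == 0           # equal when no bit differs


print(registers_equal(0x12345678, 0x12345678))
print(registers_equal(1, 3))
```

A beq would be taken when this returns True, and a bne when it returns False.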

39
Flushing Instructions

40
Dynamic Branch Prediction

Analyse the branch history:
keep a list of recent branch instructions
save the low-order bits of the address only; this limits
the precision, but it's only a prediction
Action:
if the branch is taken, set the mark
if the branch is not taken, reset the mark
Prediction accuracy is limited (wrong twice
per loop)
Improvement: 2-bit prediction scheme
the prediction must be wrong twice before it is
changed
better prediction for loops (wrong once per
loop)
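
The difference between the two schemes shows up in a short simulation. This is a sketch under assumptions: the 2-bit scheme is modelled as a saturating counter (one common variant, not necessarily the exact state machine on the original slide), both predictors start out predicting taken, and the loop branch is taken nine times and then not taken, with the loop run three times.

```python
def mispredictions_1bit(outcomes, state=True):
    """1-bit scheme: remember only the last outcome of the branch."""
    wrong = 0
    for taken in outcomes:
        if state != taken:
            wrong += 1
        state = taken              # set the mark if taken, reset it otherwise
    return wrong


def mispredictions_2bit(outcomes, counter=3):
    """2-bit saturating counter (0..3); predict taken when counter >= 2."""
    wrong = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            wrong += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return wrong


outcomes = ([True] * 9 + [False]) * 3   # the loop is executed three times
print(mispredictions_1bit(outcomes), mispredictions_2bit(outcomes))
```

Once warmed up, the 1-bit predictor misses both the loop exit and the first iteration of the next run, while the 2-bit counter misses only the exit.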

41
Exceptions