Enhancing Performance with Pipelining - PowerPoint PPT Presentation

Slides: 73

Provided by: zanziba

1
Enhancing Performance with Pipelining
Slides developed by Rami Abielmona and modified
by Miodrag Bolic High-Level Computer Systems
Design
2
Presentation Outline (1)

What is pipelining?
Pipeline Taxonomies
Instruction Pipelines
MIPS Instruction Pipeline
Pipeline Hazards
MIPS Pipelined Datapath
Load Word Instruction Example
Pipeline Datapath Example
Pipeline Control
Pipeline Instruction Example

3
Presentation Outline (2)

Pipeline Hazards
Control Hazards
Data Hazards
Detecting Data Hazards
Resolving Data Hazards
Forwarding Example
Stalling Example
Branch Hazards
Branching Example
Key terms

4
What is Pipelining? (1)

There are two main ways to increase the
performance of a processor through high-level
system architecture
Increasing the memory access speed
Increasing the number of supported concurrent
operations
Pipelining!
Parallelism?
Pipelining is the process by which instructions
are parallelized over several overlapping stages
of execution, in order to maximize datapath
efficiency

5
What is Pipelining? (2)

Pipelining is analogous to many everyday
scenarios
Car manufacturing process
Batch laundry jobs
Basically, any assembly-line operation applies
Two important concepts
New inputs are accepted at one end before
previously accepted inputs appear as outputs at
the other end
The number of operations performed per second is
increased, even though the elapsed time needed to
perform any one operation remains the same

6
What is Pipelining? (3)

Looking at the textbook's example, we have a
4-stage pipeline of laundry tasks
Place one dirty load of clothes into washer
Place the washed clothes into a dryer
Place a dry load on a table and fold
Put the clothes away
Graphically speaking
Sequential (top) vs.
Pipelined (bottom) execution
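
The sequential versus pipelined comparison can be sketched numerically. This is a minimal sketch, assuming 4 loads and 30 minutes per stage; neither number appears on the slide, and the function names are hypothetical.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: float) -> float:
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: float) -> float:
    """A new load starts every stage time; the last load still needs all stages."""
    if loads == 0:
        return 0.0
    return (stages + loads - 1) * stage_minutes


# Assumed values for illustration only: 4 loads, 4 stages, 30 minutes per stage.
seq = sequential_minutes(4, 4, 30)
pipe = pipelined_minutes(4, 4, 30)
print(seq, pipe)  # 480 210
```

Each individual load still takes 120 minutes in both cases; only the number of loads finished per unit time changes.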

7
Pipeline Taxonomies

There are two types of pipelines used in computer
systems
Arithmetic pipelines
Used to pipeline data intensive functionalities
Instruction pipelines
Used to pipeline the basic instruction fetch and
execute sequence
Other classifications include
Linear vs. nonlinear pipelines
Presence (or lack) of feedforward and feedback
paths between stages
Static vs. dynamic pipelines
Dynamic pipelines are multifunctional, taking on
a different form depending on the function being
executed
Scalar vs. vector pipelines
Vector pipelines specifically target computations
using vector data

8
MIPS Instruction Pipeline (1)

Let us now introduce the pipeline we're working
with
It's a 5-stage, linear, static, scalar
instruction pipeline, consisting of the following
steps
Fetch instruction from Memory (IF)
Read registers while decoding the instruction
(ID)
Execute the operation or calculate an address
(EX)
Access an operand in data memory (MEM)
Write the result into a register (WB)
Again, theoretically, pipeline speedup = number
of stages in pipeline

9
MIPS Instruction Pipeline (2)

Inst. Fetch (2ns), Reg. read/write (1ns), ALU op.
(2ns), Data access (2ns)
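
The effect of these stage latencies on the clock period can be sketched as follows. The latencies come from the slide; the mapping of instruction classes to the stages they use is an assumption for illustration, as are all function names.

```python
# Stage latencies in ns from the slide (register read = ID, register write = WB).
STAGE_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# Stages each instruction class uses. This mapping is an assumption.
STAGES_USED = {
    "lw": ("IF", "ID", "EX", "MEM", "WB"),
    "sw": ("IF", "ID", "EX", "MEM"),
    "R-type": ("IF", "ID", "EX", "WB"),
    "beq": ("IF", "ID", "EX"),
}


def instruction_latency_ns(kind: str) -> int:
    """Time for one instruction when its stages run back to back."""
    return sum(STAGE_NS[stage] for stage in STAGES_USED[kind])


def single_cycle_clock_ns() -> int:
    """A single-cycle clock must fit the slowest instruction."""
    return max(instruction_latency_ns(kind) for kind in STAGES_USED)


def pipelined_clock_ns() -> int:
    """A pipelined clock must fit the slowest stage."""
    return max(STAGE_NS.values())


print(single_cycle_clock_ns())  # 8
print(pipelined_clock_ns())     # 2
```

Under these assumptions the steady-state speedup is 8 / 2 = 4 rather than 5, because the stages are not perfectly balanced.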

10
Single Cycle, Multiple Cycle, vs. Pipeline 1
[Figure: clock timing diagrams comparing three implementations
Single Cycle Implementation (Cycles 1-2): Load, Store, Waste
Multiple Cycle Implementation (Cycles 1-10): Load, Store, R-type
Pipeline Implementation: Load, Store, R-type]
11
Why Pipeline?

Suppose
100 instructions are executed
The single cycle machine has a cycle time of 45
ns
The multicycle and pipeline machines have cycle
times of 10 ns
The multicycle machine has a CPI of 4.6
Single Cycle Machine
45 ns/cycle x 1 CPI x 100 inst = 4500 ns
Multicycle Machine
10 ns/cycle x 4.6 CPI x 100 inst = 4600 ns
Ideal pipelined machine
10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain)
= 1040 ns
Ideal pipelined vs. single cycle speedup
4500 ns / 1040 ns = 4.33
What has not yet been considered?
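
The arithmetic on this slide can be reproduced with a short sketch. The numbers are the slide's own; the function names are hypothetical.

```python
def single_cycle_ns(instructions: int, cycle_ns: float) -> float:
    """One long cycle per instruction (CPI = 1)."""
    return instructions * cycle_ns


def multicycle_ns(instructions: int, cycle_ns: float, cpi: float) -> float:
    """Short cycles, but several of them per instruction on average."""
    return instructions * cpi * cycle_ns


def ideal_pipeline_ns(instructions: int, cycle_ns: float, drain_cycles: int) -> float:
    """One instruction completes per cycle, plus cycles to drain the pipeline."""
    return (instructions + drain_cycles) * cycle_ns


single = single_cycle_ns(100, 45)       # 4500 ns
multi = multicycle_ns(100, 10, 4.6)     # about 4600 ns
piped = ideal_pipeline_ns(100, 10, 4)   # 1040 ns
print(round(single / piped, 2))         # 4.33
```

The sketch models the ideal case only: it assumes no stalls or flushes, which is what the slide's closing question points at.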

12
MIPS Instruction Pipeline (3) 2

What makes it easy?
all instructions are the same length
just a few instruction formats
memory operands appear only in loads and stores
What makes it hard?
structural hazards: suppose we had only one
memory
control hazards: need to worry about branch
instructions
data hazards: an instruction depends on a
previous instruction
We'll build a simple pipeline and look at these
issues

13
Pipeline Hazards 1

structural hazards: attempt to use the same
resource two different ways at the same time
E.g., two instructions try to read the same
memory at the same time
data hazards: attempt to use an item before it is
ready
instruction depends on result of prior
instruction still in the pipeline
add r1, r2, r3
sub r4, r2, r1
control hazards: attempt to make a decision
before the condition is evaluated
branch instructions
beq r1, loop
add r1, r2, r3
Can always resolve hazards by waiting
pipeline control must detect the hazard
take action (or delay action) to resolve hazards

14
MIPS Pipelined Datapath (1)

What do we need to split the datapath into
stages?

15
MIPS Pipelined Datapath (2)

Pipeline registers (buffers) are similar to those
in the multicycle processor design

16
Load Word Instruction (1)

Instruction fetch stage

17
Load Word Instruction (2)

Instruction decode and register file read stage

18
Load Word Instruction (3)

Execute or address calculation stage

19
Load Word Instruction (4)

Memory access stage

20
Load Word Instruction (5)

Write back stage

21
Load Word Corrected Datapath

Write register number comes from the MEM/WB
pipeline register along with the data

22
Graphical Representations
Multiple-clock cycle (vs. single-clock cycle)
pipelined diagrams
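
The two diagram styles can be sketched as plain data. This is a minimal sketch for an ideal 5-stage pipeline with no stalls; the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def multi_cycle_diagram(instructions):
    """One row per instruction, one column per clock cycle."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage  # instruction i enters IF in cycle i + 1
        rows.append((name, cells))
    return rows


def single_cycle_view(instructions, cycle):
    """Snapshot of which instruction occupies each stage in one cycle (1-based)."""
    view = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s
        view[stage] = instructions[i] if 0 <= i < len(instructions) else None
    return view


for name, cells in multi_cycle_diagram(["lw", "sub"]):
    print(f"{name:4}", " ".join(f"{c:3}" for c in cells))
print(single_cycle_view(["lw", "sub"], 2))
```

The multiple-clock-cycle form shows the whole program over time, while the single-clock-cycle form is one column of that table drawn on the datapath.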
23
Pipeline Datapath Example (1)

Single-cycle pipeline diagram with one
instruction on the pipeline

24
Pipeline Datapath Example (2)

Single-cycle pipeline diagram with two
instructions on the pipeline

25
Pipeline Control (1)

What control signals are required?
First, notice that the pipeline registers are
written every clock cycle, hence do not require
explicit control signals. Otherwise, stage by stage
Instruction fetch and PC increment
Again, asserted at every clock cycle
Instruction decode and register file read
Again, asserted at every clock cycle
Execution and address calculation
Need to select the result register, the ALU
operation, and either Read data 2 or the
sign-extended immediate for the ALU
Memory access
Need to read from memory, write to memory or
complete branch
Write back
Need to send back either ALU result or memory
value to the register file

26
Pipeline Control (2)
27
Pipeline Control (3)
28
Pipeline Datapath with Control
29
Pipeline Instruction Example (1)
30
Pipeline Instruction Example (2)
31
Pipeline Instruction Example (3)
32
Pipeline Instruction Example (4)
33
Pipeline Instruction Example (5)
34
Pipeline Instruction Example (6)
35
Pipeline Instruction Example (7)
36
Pipeline Instruction Example (8)
37
Pipeline Instruction Example (9)
38
Pipeline Hazards

Structural hazard
Occurs when a combination of instructions is not
supported by the datapath
For example, a unified memory unit would need to
be accessed in stages 1 (IF) and 4 (MEM), which
would cause a contention
Pipeline outright fails in the presence of
structural hazards
Control hazard
Occurs when a decision is made based on the
results of one instruction, while others are
executing
For example, a branch instruction is either taken
or not
Solutions that exist are stalling and predicting
Data hazard
Occurs when an instruction depends on the results
of an instruction resident on the pipeline
For example, adding two register contents and
storing their result into a third register, then
using that register's contents for another
operation
Solutions that exist are based on forwarding

39
Control Hazards - Stalling

Three major solutions
Stall
Predict
Delayed branch slot
Stalling involves always waiting for the PC to be
updated with the correct address before moving on
A pipeline stall (or bubble) allows us to perform
this wait
Quite costly, as we have to stall even if the
branch fails

40
Control Hazards - Predicting

Predicting involves guessing whether the branch
is taken or not, and acting on that guess
If correct, then proceed with normal pipeline
execution
If incorrect, then stall pipeline execution

41
Control Hazards - Delayed branch

Delayed branch involves executing the next
sequential instruction with the branch taking
place after that delayed branch slot
The assembler automatically adjusts the
instructions to make it transparent to the
programmer
The instruction has to be safe, as in it
shouldn't affect the branch
Longer pipelines require the use of more branch
delay slots
Actual MIPS architecture solution

42
Data Hazards - Forwarding (1)

Forwarding involves providing the inputs to a
stage of one instruction before the completion of
another instruction
Valid if destination stage is later in time than
the source stage
Left diagram shows typical forwarding scenario
(add then sub)
Right diagram shows that we still need a stall in
the case of a load-use data hazard (load then
R-type)

43
Data Hazards - Forwarding (2)

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $14, 100($2)

44
Data Hazards - Crude Solution

We could insert no operation (nop) instructions
to delay the pipeline execution until the correct
result is in the register file
sub $2, $1, $3
nop
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $14, 100($2)
Too slow as it adds extra useless clock cycles
In reality, we try to find useful instructions to
execute between data-dependent instructions, but
this happens too often to be efficient
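
The nop-insertion idea can be sketched as a small pass over an instruction list. This is a minimal sketch, not an actual assembler pass: it assumes a consumer must sit at least three slots after the producer (two nops between adjacent dependent instructions, relying on the register file being written and read in the same cycle), and it uses only the first four instructions of the sequence above. The tuple encoding `(op, dest, sources)` and the function name are hypothetical.

```python
NOP = ("nop", None, ())


def insert_nops(program, gap=2):
    """Pad with nops so a register reader is at least gap + 1 slots after its writer."""
    out = []
    for op, dest, srcs in program:
        needed = 0
        # Check the last `gap` emitted slots for a writer of one of our sources.
        for distance, (_, prev_dest, _) in enumerate(reversed(out[-gap:]), start=1):
            if prev_dest is not None and prev_dest != 0 and prev_dest in srcs:
                needed = max(needed, gap - distance + 1)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out


program = [
    ("sub", 2, (1, 3)),
    ("and", 12, (2, 5)),
    ("or", 13, (6, 2)),
    ("add", 14, (2, 2)),
]
for op, dest, srcs in insert_nops(program):
    print(op, dest, srcs)
```

Only the `and` needs padding here: once it has been pushed back by two slots, the `or` and `add` are already far enough from the `sub`.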

45
Data Hazards - Detection (1)

Let us try to formalize detecting a data hazard
1. EX/MEM.RegisterRd = ID/EX.RegisterRs
2. EX/MEM.RegisterRd = ID/EX.RegisterRt
3. MEM/WB.RegisterRd = ID/EX.RegisterRs
4. MEM/WB.RegisterRd = ID/EX.RegisterRt
sub $2, $1, $3
and $12, $2, $5 (data hazard of type 1)
or $13, $6, $2 (data hazard of type 4)
add $14, $2, $2 (no data hazard: register file)
sw $14, 100($2) (no data hazard: correct operation)

46
Data Hazards - Detection (2)

Two modifications are in order
Firstly, we don't have to forward all the time!
Some instructions don't write registers (e.g.
beq)
Use RegWrite signal in WB control block to
determine condition
Secondly, the $0 register must always return 0
Can't prevent the programmer from using it as a
destination register
Use RegisterRd to determine if $0 is being used
If (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
ForwardA = 10
If (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
ForwardB = 10
If (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
If (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
Let us examine the hardware changes to our
datapath
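
Before turning to the hardware, the four forwarding conditions can be sketched in software. This is a minimal sketch with hypothetical parameter names mirroring the pipeline register fields. One assumption goes beyond the slide: when both the EX/MEM and MEM/WB conditions match, the sketch lets EX/MEM win, since it holds the more recent result.

```python
def forwarding_unit(ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd,
                    id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) as 2-bit mux selects; 0b00 means no forwarding."""
    forward_a = 0b00
    forward_b = 0b00

    # MEM/WB conditions (ForwardA/B = 01).
    if mem_wb_regwrite and mem_wb_rd != 0:
        if mem_wb_rd == id_ex_rs:
            forward_a = 0b01
        if mem_wb_rd == id_ex_rt:
            forward_b = 0b01

    # EX/MEM conditions (ForwardA/B = 10). Checked last so they override
    # MEM/WB; this priority is an assumption, not stated on the slide.
    if ex_mem_regwrite and ex_mem_rd != 0:
        if ex_mem_rd == id_ex_rs:
            forward_a = 0b10
        if ex_mem_rd == id_ex_rt:
            forward_b = 0b10

    return forward_a, forward_b


# sub writes register 2 and is in EX/MEM; and (rs=2, rt=5) is in ID/EX.
print(forwarding_unit(True, 2, False, 0, 2, 5))   # (2, 0)
# sub is now in MEM/WB, and (rd=12) is in EX/MEM; or (rs=6, rt=2) is in ID/EX.
print(forwarding_unit(True, 12, True, 2, 6, 2))   # (0, 1)
```

The two calls correspond to the type 1 and type 4 hazards identified earlier for the `and` and `or` instructions.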

47
Data Hazards - Forwarding Unit (1)
48
Data Hazards - Forwarding Unit (2)

Remember that there is no hazard in the WB stage,
because the register file is able to be written
and read in the same stage

49
Data Hazards - Forwarding Unit (3)
50
Data Hazards - Forwarding Unit (4)
51
Forwarding Example (1)
52
Forwarding Example (2)
53
Forwarding Example (3)
54
Forwarding Example (4)
55
Data Hazards - Stalling (1)

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

56
Data Hazards - Stalling (2)

Let us try to formalize detecting a stalling data
hazard
If (ID/EX.MemRead and ((ID/EX.RegisterRt =
IF/ID.RegisterRs) or (ID/EX.RegisterRt =
IF/ID.RegisterRt)))
On the condition being true, we stall the
pipeline!
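
The stall condition can be sketched as a single predicate. This is a minimal sketch; the parameter names are hypothetical and simply mirror the pipeline register fields in the condition above.

```python
def must_stall(id_ex_memread: bool, id_ex_rt: int, if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID needs the value a load in EX has not yet read."""
    return bool(id_ex_memread and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


# A load writing register 2 is in EX; the next instruction reads registers 2 and 5.
print(must_stall(True, 2, 2, 5))    # True
# Same registers, but the instruction in EX is not a load, so forwarding suffices.
print(must_stall(False, 2, 2, 5))   # False
```

The first call matches the load followed by `and` in the stalling sequence shown earlier.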

57
Data Hazards - Stalling (3)
58
Stalling Example (1)
59
Stalling Example (2)
60
Stalling Example (3)
61
Stalling Example (4)
62
Stalling Example (5)
63
Stalling Example (6)
64
Branch Hazards

Other instructions are on the pipeline when we
find out whether we take the branch or not!

65
Branch Hazards - Stalling (1)

Two solutions
Assume branch is not taken
Dynamic branch prediction
We've already discussed the first solution
Note that three instruction stages have to be
flushed when the branch is taken
Done similarly to a data hazard stall (control
values set to 0s)
We can increase branch performance by moving the
branch decision to the ID stage (rather than the
MEM stage)
Branch target address calculated by moving adder
into ID stage
Branch decision done by comparing Rs and Rt
Flushing the IF stage instruction involves nop
instructions
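
The benefit of moving the branch decision from MEM to ID can be sketched as an effective-CPI estimate. This is a rough sketch under the assume-not-taken scheme, where only taken branches pay a flush penalty (3 instructions when decided in MEM, 1 when decided in ID). The branch and taken fractions below are hypothetical values chosen for illustration, not figures from the slides.

```python
def cpi_with_branches(branch_fraction: float, taken_fraction: float,
                      flush_penalty: int) -> float:
    """Base CPI of 1 plus the average number of flushed slots per instruction."""
    return 1.0 + branch_fraction * taken_fraction * flush_penalty


# Hypothetical workload: 20% branches, half of them taken.
decided_in_mem = cpi_with_branches(0.20, 0.50, flush_penalty=3)
decided_in_id = cpi_with_branches(0.20, 0.50, flush_penalty=1)
print(round(decided_in_mem, 2), round(decided_in_id, 2))  # 1.3 1.1
```

The estimate ignores data-hazard stalls, so it isolates the branch cost only.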