Title: Pipelining
1Pipelining
- Multiple instructions are overlapped in execution.
- Instruction fetch and execution are divided into steps.
- A stage in the pipeline takes care of one step.
- All stages in the pipeline operate concurrently.
- We must have separate resources for each stage.
- In MIPS
- five steps
- each step takes from 1 to 2 ns
- the nonpipelined execution of an instruction takes from 5 to 8 ns
- the pipelined execution of an instruction takes 10 ns (all instructions are executed in a similar way)
- if the pipeline is full, there is a new output every 2 ns
2Pipelining
- Improve performance by increasing instruction throughput.
- Ideal speedup is the number of stages in the pipeline. We do not always achieve it.
3Pipelining
- Notice that
- the register file operations take 1 ns
- writing is done during the first half of the clock cycle
- reading is done during the second half of the clock cycle
- This will help us later
4Pipelining
- What makes it easy
- all instructions are the same length
- just a few instruction formats
- memory operands appear only in loads and stores
- aligned data: one memory access for one data item
- Hazards make it hard
- the next instruction cannot execute in the following clock cycle
- structural, control and data hazards
- We'll build a simple pipeline and look at these issues
- We'll talk about modern processors and what really makes it hard
- exception handling
- trying to improve performance with out-of-order execution, etc.
5Hazards
- structural hazards
- competition in accessing hardware resources
- e.g. accessing the memory at the same time
- control hazards
- problems in controlling the program flow
- e.g. branch instructions
- data hazards
- accessing data that is not yet complete
- e.g. an instruction depends on a previous one
6Resolving Structural Hazard
- Suppose that we had a single memory instead of two memories.
- Data accesses to the memory would then coincide with instruction fetches.
- Some structural hazards can be resolved with extra hardware. If not, stall the pipeline.
7Resolving Control Hazard by Pipeline Stall
- Assumption: enough extra hardware so that all branch computations are ready in stage 2.
- The next instruction is stalled one extra clock cycle before starting.
- For longer pipelines we often cannot resolve the branch in the second stage, so we need another, better solution.
8Resolving Control Hazard by Prediction
- Simple approach: always predict that branches will fail (are not taken)
- right: the pipeline proceeds at full speed
- wrong: the pipeline stalls
- Another approach: predict that branches to an earlier address are taken
- usually right in the case of a loop
- Dynamic prediction: keep a history for each branch as taken or untaken.
- When the guess is wrong, instructions following the wrongly guessed branch must have no effect. The pipeline must be restarted from the proper address.
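The static rule "branches to an earlier address are taken" fits in one comparison. The sketch below assumes the branch address and its target are both known at prediction time; the function name and the addresses in the usage lines are hypothetical.

```python
def predict_taken(branch_pc, target_pc):
    """Static heuristic: a backward branch (typically the end of a loop) is predicted taken."""
    return target_pc < branch_pc

# A branch at 0x40 jumping back to 0x20 looks like a loop and is predicted taken.
print(predict_taken(0x40, 0x20))
# A forward branch to 0x60 is predicted not taken.
print(predict_taken(0x40, 0x60))
```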
9Resolving Control Hazard by Prediction
- Figure: pipeline timing when the prediction is correct and when it is wrong
10Resolving Control Hazard by Delayed Branch
- The next sequential instruction is always executed.
- Assemblers and compilers usually fill the branch delay slots.
- an earlier instruction is moved into the delay slot
- if none is found, insert a NOP
11Data Hazard
- An example
- add $s0, $t0, $t1
- sub $t2, $s0, $t3
- add writes in stage 5
- sub reads data in stage 2
- three stalls required
- We cannot rely on compilers to avoid data hazards by rearranging the instruction sequence
- these dependencies happen just too often
- the delay is just too long
- Solution: forwarding or bypassing
- we don't need to wait for the instruction to complete
- get the missing item early from the internal resources
- Stalls are still needed in some instruction sequences
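The stall count for the add/sub example can be derived from the stage numbers. The sketch below is an illustration with made-up names; it assumes no forwarding and counts stages from 1. The `split_register_file` flag models a register file that writes in the first half of a cycle and reads in the second half, which saves one bubble.

```python
def stalls_needed(write_stage, read_stage, distance=1, split_register_file=False):
    """Bubbles needed between a producer and a consumer `distance` instructions later.

    write_stage: stage in which the producer writes its result (1-based)
    read_stage:  stage in which the consumer reads its operands (1-based)
    split_register_file: True if a write and a read of the same register
        can share one clock cycle (write first half, read second half)
    """
    gap = write_stage - read_stage - (distance - 1)
    if split_register_file:
        gap -= 1
    return max(0, gap)

# add writes in stage 5, sub (the very next instruction) reads in stage 2.
print(stalls_needed(5, 2))                            # read must follow the write cycle
print(stalls_needed(5, 2, split_register_file=True))  # read shares the write cycle
```

With the read strictly after the write cycle the function gives three bubbles, matching the count on the slide; with the half-cycle register file it gives two.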
12Forwarding
- forwarding an ALU result to the next instruction: no stalls
- load-use data hazard: one stall
13Instruction steps mapped onto the datapath
14Pipelined Datapath
- Reuse of functional units in every clock cycle
- Additional hardware
- Separation of pipeline stages by pipeline registers
- Functional units, if used by several instructions at the same time (for removing structural hazards)
- Extended control
- Strict sequentialisation of instructions (every instruction goes through all stages)
- Check for hazards
- Introduce stalls to remove hazards
15Problems
- Usually data moves from left to right; data moving from right to left affects later instructions
- Write back into the register file can lead to data hazards
- Selection of the next value of the PC leads to control hazards
16Pipelined Datapath
17Pipelined Datapath
- We must add wide enough pipeline registers to store all the data.
- The write register number must be passed along from the instruction.
- Adders
- single cycle: present
- multicycle: absent (the ALU took care of the calculations)
- pipeline: present
18Graphically Representing Pipelines
- Can help with answering questions like
- how many cycles does it take to execute this code?
- what is the ALU doing during cycle 4?
- Use this representation to help understand datapaths
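Both questions can be answered mechanically for an ideal pipeline. The following sketch assumes five stages named IF, ID, EX, MEM and WB, no stalls, and one instruction entering per cycle; the two-instruction program and the helper names are illustrative only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_at(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) in `cycle` (1-based), or None."""
    pos = cycle - 1 - instr_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

def total_cycles(n_instructions):
    """Cycles needed to run n instructions through a stall-free pipeline."""
    return len(STAGES) + n_instructions - 1 if n_instructions else 0

def diagram(instructions):
    """One text row per instruction, one four-character column per clock cycle."""
    cycles = total_cycles(len(instructions))
    rows = []
    for i, name in enumerate(instructions):
        cells = [(stage_at(i, c) or "").ljust(4) for c in range(1, cycles + 1)]
        rows.append(f"{name:<18}" + "".join(cells).rstrip())
    return "\n".join(rows)

program = ["lw $10, 20($1)", "sub $11, $2, $3"]  # illustrative instructions
print(diagram(program))
print("cycles:", total_cycles(len(program)))
print("in EX during cycle 4:", [p for i, p in enumerate(program) if stage_at(i, 4) == "EX"])
```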
19Traditional Pipeline Diagram
- Not as informative as the previous one
20Pipelined Control
21Pipelined control
- Data travels through the pipeline stages
- All data belonging to an instruction must be kept together
- Information transfer only through pipeline registers
- Control information must travel with the instruction
22Pipelined control
- Instruction fetch / PC Increment
- identical for all instructions
- read instruction memory
- write PC
- Instruction decode / Register file read
- identical for all instructions
- Execution / address calculation
- signals RegDst, ALUOp, ALUSrc
- Memory access
- signals Branch, MemRead, MemWrite
- Write Back
- signals MemtoReg, RegWrite
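The grouping of signals by stage suggests how control information can shrink as an instruction moves on. The sketch below is an assumption-laden illustration: the signal values are the usual textbook settings for these four instruction classes, not values given on the slides, and `decode` and `advance` are hypothetical helper names.

```python
# Control values per instruction class; None marks a don't-care.
CONTROL = {
    "R-format": dict(RegDst=1, ALUOp=0b10, ALUSrc=0, Branch=0,
                     MemRead=0, MemWrite=0, MemtoReg=0, RegWrite=1),
    "lw":       dict(RegDst=0, ALUOp=0b00, ALUSrc=1, Branch=0,
                     MemRead=1, MemWrite=0, MemtoReg=1, RegWrite=1),
    "sw":       dict(RegDst=None, ALUOp=0b00, ALUSrc=1, Branch=0,
                     MemRead=0, MemWrite=1, MemtoReg=None, RegWrite=0),
    "beq":      dict(RegDst=None, ALUOp=0b01, ALUSrc=0, Branch=1,
                     MemRead=0, MemWrite=0, MemtoReg=None, RegWrite=0),
}

# Which stage consumes which signals, following the list above.
STAGE_SIGNALS = {
    "EX":  ("RegDst", "ALUOp", "ALUSrc"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB":  ("MemtoReg", "RegWrite"),
}

def decode(kind):
    """Group an instruction's control signals by the stage that uses them."""
    values = CONTROL[kind]
    return {stage: {name: values[name] for name in names}
            for stage, names in STAGE_SIGNALS.items()}

def advance(bundle, finished_stage):
    """Drop the signals of the stage just completed; the rest travel on."""
    return {stage: sig for stage, sig in bundle.items() if stage != finished_stage}

id_ex = decode("lw")             # contents of the ID/EX control field
ex_mem = advance(id_ex, "EX")    # EX signals are used up
mem_wb = advance(ex_mem, "MEM")  # only the write-back signals remain
print(mem_wb)
```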
23Pipelined Control
- Pass control signals along just like the data
24Datapath with Control
25Dependencies
- Problem with starting the next instruction before the present one is finished
- dependencies that go backward in time are data hazards
26Software Solution
- Have compiler guarantee no hazards
- Insert no-operations (nop) where necessary
- sub $2, $1, $3
- nop
- nop
- and $12, $2, $5
- or $13, $6, $2
- add $14, $2, $2
- sw $15, 100($2)
- Problem: this really slows us down!
27Dependency Detection
- Hazard conditions
- EX/MEM.RegisterRd = ID/EX.RegisterRs (next instruction)
- EX/MEM.RegisterRd = ID/EX.RegisterRt (next instruction)
- MEM/WB.RegisterRd = ID/EX.RegisterRs (after two instructions)
- MEM/WB.RegisterRd = ID/EX.RegisterRt (after two instructions)
28Forwarding
- register file forwarding to handle read/write to the same register
- ALU forwarding
29ALU without Forwarding
30ALU with Forwarding
31Forwarding MUX Control Values
- MUX control: source; explanation
- ForwardA = 00: ID/EX; the 1st ALU operand comes from the register file
- ForwardA = 10: EX/MEM; the 1st ALU operand is forwarded from the prior ALU result
- ForwardA = 01: MEM/WB; the 1st ALU operand is forwarded from data memory or an earlier ALU result
- ForwardB = 00: ID/EX; the 2nd ALU operand comes from the register file
- ForwardB = 10: EX/MEM; the 2nd ALU operand is forwarded from the prior ALU result
- ForwardB = 01: MEM/WB; the 2nd ALU operand is forwarded from data memory or an earlier ALU result
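The MUX control values can be produced by comparing register numbers held in the pipeline registers. The sketch below is one possible formulation, not the slides' circuit: besides the register comparisons it assumes two extra conditions that the slides do not list, namely that the earlier instruction actually writes a register (RegWrite) and that its destination is not register 0. The function name is hypothetical.

```python
def forward_select(id_ex_reg, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Return the 2-bit MUX control (as a string) for one ALU operand.

    id_ex_reg is ID/EX.RegisterRs when computing ForwardA
    and ID/EX.RegisterRt when computing ForwardB.
    """
    # The most recent result wins, so EX/MEM is checked first.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg:
        return "10"  # forward the prior ALU result from EX/MEM
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg:
        return "01"  # forward from MEM/WB (data memory or an earlier ALU result)
    return "00"      # no hazard: use the register file value from ID/EX

# sub $2, $1, $3 followed by and $12, $2, $5, with the "and" in EX:
forward_a = forward_select(2, ex_mem_rd=2, ex_mem_regwrite=True,
                           mem_wb_rd=0, mem_wb_regwrite=False)
forward_b = forward_select(5, ex_mem_rd=2, ex_mem_regwrite=True,
                           mem_wb_rd=0, mem_wb_regwrite=False)
print(forward_a, forward_b)
```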
32Data Hazards and Stalls
- Forwarding does not solve all problems.
- Load word can still cause a hazard
- lw $2, 20($1)
- and $4, $2, $5
- An instruction tries to read a register following a load instruction that writes to the same register.
- We need a hazard detection unit to stall the pipeline.
33Data Hazards and Stalls
34Hazard Detection
- if (ID/EX.MemRead and
- ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
- (ID/EX.RegisterRt = IF/ID.RegisterRt)))
- stall the pipeline
- check for load instructions
- check if the register to be loaded is part of the current instruction
- We can stall the pipeline by keeping an instruction in the same stage.
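The load-use test above translates directly into a boolean function. This is a minimal sketch; the parameter names mirror the pipeline register fields and the function name is made up.

```python
def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True if the instruction in ID must wait for a load that is currently in EX."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)

# lw $2, 20($1) is in EX (it loads into Rt = 2); and $4, $2, $5 is in ID (Rs = 2, Rt = 5).
print(load_use_stall(True, 2, 2, 5))
# The same register numbers behind a non-load instruction do not need this stall.
print(load_use_stall(False, 2, 2, 5))
```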
35Stalling
36Hazard Detection Unit
- The hazard detection unit stalls if the load-use
hazard test is true.
37Branch Hazards
- When we decide to branch, other instructions are in the pipeline!
- We are predicting branch not taken
- need to add hardware for flushing instructions if we are wrong
38Reducing the Delay of Branches
- Move the branch decision earlier in the pipeline, so that fewer instructions need to be flushed.
- Select the branch address either at
- the end of the EX stage (two-cycle penalty) or at
- the end of the ID stage (one-cycle penalty)
- Move the branch address adder to the ID stage
- Branch detection in the ID stage
- EXCLUSIVE-OR of the bits of the registers
- OR of the results
- Clear the instruction field in the IF/ID pipeline register: this creates a NOP
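The XOR-then-OR equality test for the ID stage can be mimicked with integer bit operations. This sketch assumes 32-bit registers and uses a hypothetical function name; hardware would do the same with one XOR gate per bit and an OR tree.

```python
def registers_equal(a, b, width=32):
    """Equality test built from a bitwise XOR followed by an OR over all result bits."""
    mask = (1 << width) - 1
    diff = (a ^ b) & mask   # bit i is 1 exactly where the two registers differ
    any_diff = diff != 0    # OR-reduction of all difference bits
    return not any_diff

print(registers_equal(0x1234, 0x1234))  # beq would be taken
print(registers_equal(0x1234, 0x1235))  # beq would fall through
```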
39Flushing Instructions
40Dynamic Branch Prediction
- Analyse the branch history
- keep a list of recent branch instructions
- save the low-order bits of the address only; this limits the precision, but it's only a prediction
- Action
- if the branch is taken, set the mark
- if the branch is not taken, reset the mark
- Prediction accuracy is limited (twice wrong in a loop)
- Improvement: 2-bit prediction scheme
- a prediction must be wrong twice before it is changed
- better prediction for loops (once wrong in a loop)
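The difference between the two schemes shows up when a loop is executed several times. The sketch below simulates a single branch; it assumes a 2-bit saturating counter as one common way to realise "wrong twice before changing", which the slides do not spell out, and the loop shape (taken nine times, then not taken, repeated three times) is invented for illustration.

```python
def mispredictions_1bit(outcomes, state=False):
    """1-bit scheme: predict whatever the branch did last time."""
    wrong = 0
    for taken in outcomes:
        wrong += (state != taken)
        state = taken  # set the mark if taken, reset it if not taken
    return wrong

def mispredictions_2bit(outcomes, counter=0):
    """2-bit saturating counter: values 0-1 predict not taken, 2-3 predict taken."""
    wrong = 0
    for taken in outcomes:
        wrong += ((counter >= 2) != taken)
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return wrong

# Loop branch: taken 9 times, then falls through; the whole loop runs 3 times.
loop = ([True] * 9 + [False]) * 3

# Start both predictors already "warmed up" towards taken.
print(mispredictions_1bit(loop, state=True))
print(mispredictions_2bit(loop, counter=3))
```

After warm-up, the 1-bit scheme misses both the loop exit and the first iteration of the next run, while the 2-bit scheme misses only the exit.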
41Exceptions
- Some exception types
- Overflow
- Illegal opcode
- Invoking an operating system service
- I/O device request
- Actions
- Load PC with exception handling address
- Flush instructions from the pipeline
- Leave registers untouched
- Save offending instruction address in EPC
42Exceptions
43Final Data Path
44Superscalar and Dynamic Pipelining
- Superpipelining
- Increased number of pipeline stages
- Superscalar
- Increased number of parallel units
- Multiple instructions issued in one cycle
- Parallel instructions must be independent
- Usually not all units are replicated, which imposes limitations
- Dynamic pipeline scheduling
- Rescheduling of instructions by hardware to avoid pipeline stalls
- Out-of-order execution is possible
- Speculative execution and dynamic branch prediction
45Superscalar MIPS
- Two instructions in parallel: (ALU operation OR branch) AND (load OR store)
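The pairing rule can be written as a simple check on instruction classes. The sketch below is deliberately incomplete: the opcode sets are a small illustrative selection, the function name is hypothetical, and data dependencies between the two instructions are not checked.

```python
ALU_OR_BRANCH = {"add", "sub", "and", "or", "slt", "beq"}  # illustrative subset
LOAD_OR_STORE = {"lw", "sw"}

def can_issue_together(first_op, second_op):
    """True if one instruction is ALU/branch and the other is a load or store."""
    return ((first_op in ALU_OR_BRANCH and second_op in LOAD_OR_STORE) or
            (first_op in LOAD_OR_STORE and second_op in ALU_OR_BRANCH))

print(can_issue_together("add", "lw"))   # one of each class
print(can_issue_together("add", "sub"))  # both need the ALU slot
```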
46Dynamically Scheduled Pipeline