Title: CSCE 212 Chapter 6 Enhancing Performance with Pipelining
1 CSCE 212 Chapter 6 Enhancing Performance with Pipelining
- Instructor Jason D. Bakos
2 Pipelining
3 MIPS Pipeline
- Basic idea
- Execute multiple instructions in parallel
- Split instruction execution into 5 stages
- Instructions execute in assembly-line fashion
[Figure: five-stage MIPS pipeline datapath with stages fetch, decode, execute, memory, write back. Pipeline registers shown: instruction register; A and B registers, with control for execute/memory/wb and rs/rt/rd; R register, with control for memory/wb and rs/rt/rd; MDR register, with control for wb and rs/rt/rd.]
4 Pipelined MIPS
5 Pipelined MIPS
6 Pipelined Control
7 Pipelined Control
8 Pipelined Control
9 MIPS ISA
- MIPS pipeline stages
- Fetch (F)
- read next instruction from memory, increment address counter
- assume 1 cycle to access memory
- Decode (D)
- read register operands, resolve instruction into control signals, compute branch target
- Execute (E)
- execute arithmetic/resolve branches
- Memory (M)
- perform load/store accesses to memory, take branches
- assume 1 cycle to access memory
- Write back (W)
- write arithmetic results to register file
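The stage list above can be turned into a small timing sketch. This is an illustrative model only, assuming no hazards and one cycle per stage; the function names `ideal_schedule`, `total_cycles`, and `render` are made up for this example.

```python
STAGES = ["F", "D", "E", "M", "W"]

def ideal_schedule(num_instructions):
    """Map each instruction index to {cycle: stage} for a hazard-free pipeline."""
    schedule = {}
    for i in range(num_instructions):
        # Instruction i is fetched in cycle i + 1 and advances one stage per cycle.
        schedule[i] = {i + 1 + s: STAGES[s] for s in range(len(STAGES))}
    return schedule

def total_cycles(num_instructions):
    """The first instruction needs 5 cycles; each later one finishes 1 cycle after."""
    return len(STAGES) + num_instructions - 1 if num_instructions > 0 else 0

def render(num_instructions):
    """Draw one row per instruction and one column per clock cycle."""
    sched = ideal_schedule(num_instructions)
    last = total_cycles(num_instructions)
    rows = []
    for i in range(num_instructions):
        rows.append(" ".join(sched[i].get(c, ".") for c in range(1, last + 1)))
    return "\n".join(rows)

print(render(3))
```

Under these assumptions three instructions finish in 7 cycles rather than 15, which is the assembly-line effect the slides describe.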
10 Hazards
- Hazards are data flow problems that arise as a result of pipelining
- They limit the amount of parallelism and sometimes induce penalties that prevent the pipeline from completing one instruction per clock cycle
- Structural hazards
- Two operations require a single piece of hardware
- Structural hazards can be overcome by adding additional hardware
- Control hazards
- Conditional control instructions are not resolved until late in the pipeline, requiring subsequent instruction fetches to be predicted
- Fetched instructions are flushed if the prediction does not hold (make sure no state change has occurred)
- Branch hazards can use dynamic prediction/speculation, branch delay slot
- Data hazards
- An instruction in one pipeline stage is dependent on data computed in another pipeline stage
11 Hazards
- Data hazards
- Register values are read in decode and written during write-back
- A RAW hazard occurs when dependent instructions are separated by fewer than 2 slots
- Examples (producing instruction and its stage, followed by the dependent instruction in decode)
- ADD 2,X,X (E) followed by ADD X,2,X (D)
- ADD 2,X,X (M) followed by ADD X,2,X (D)
- ADD 2,3,4 (W) followed by ADD X,2,3 (D)
- In most cases, data is generated in the same stage in which it is required (EX)
- Data forwarding
- ADD 2,X,X (M) followed by ADD X,2,X (E)
- ADD 2,X,X (W) followed by ADD X,2,X (E)
- ADD 2,3,4 (out-of-pipe) followed by ADD X,2,3 (E)
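The RAW rule above can be sketched as a simple scan over a program. This is a rough model, not the hardware's detection logic: it assumes registers are read in decode and written in write-back, that a write and a read of the same register in the same cycle succeed, and it reports the stall each dependent pair would need in isolation without forwarding. The tuple layout and the name `raw_hazards` are hypothetical.

```python
def raw_hazards(program):
    """program: list of (dest, sources) tuples in program order.

    Returns (producer index, consumer index, register, stalls) tuples, where
    stalls is the number of bubbles the pair would need without forwarding.
    """
    hazards = []
    for j, (_, sources) in enumerate(program):
        for reg in dict.fromkeys(sources):  # drop repeated source registers
            # Only the two preceding instructions can still be ahead of
            # write-back when instruction j reaches decode.
            for i in range(j - 1, max(-1, j - 3), -1):
                if program[i][0] == reg:
                    hazards.append((i, j, reg, 3 - (j - i)))
                    break  # the nearest earlier writer is the one that matters
    return hazards

program = [
    ("2", ("3", "4")),  # add 2,3,4
    ("5", ("2", "6")),  # add 5,2,6  reads 2 one slot after it is produced
    ("7", ("2", "5")),  # add 7,2,5  reads 2 two slots later, 5 one slot later
    ("8", ("2", "9")),  # add 8,2,9  far enough from the write of 2
]
for hazard in raw_hazards(program):
    print(hazard)
```

With forwarding from M and W into E, as in the forwarding examples above, none of these ALU-to-ALU dependences would need a stall.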
12 Load Hazards
- Stalls are required when data is not produced in the same stage in which it is needed by a subsequent instruction
- Example
- LW 2, 0(X) (M)
- ADD X, 2 (E)
- When this occurs, insert a bubble into the EX stage and stall F and D
- LW 2, 0(X) (W)
- NOOP (M)
- ADD X, 2 (E)
- Forward from W to E
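The load-hazard rule above can be sketched in software as a pass that inserts a NOOP between a load and an immediately following instruction that reads the loaded register. Real hardware does this with a hazard detection unit that stalls F and D; the instruction tuples and the name `insert_load_bubbles` are assumptions made for illustration.

```python
NOOP = ("noop", None, ())

def insert_load_bubbles(program):
    """program: list of (opcode, dest, sources) tuples in program order.

    Returns a new list with a NOOP after every load whose result is read
    by the very next instruction (a load-use hazard).
    """
    scheduled = []
    for op, dest, sources in program:
        if scheduled:
            prev_op, prev_dest, _ = scheduled[-1]
            # The loaded value is only available after M, one cycle too late
            # for an instruction that is already entering E.
            if prev_op == "lw" and prev_dest in sources:
                scheduled.append(NOOP)
        scheduled.append((op, dest, sources))
    return scheduled

program = [
    ("lw", "2", ("X",)),        # LW 2, 0(X)
    ("add", "X", ("X", "2")),   # ADD X, 2
]
for instruction in insert_load_bubbles(program):
    print(instruction[0])
```

After the bubble, the loaded value can be forwarded from W to E, matching the second half of the example.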
13 Data Hazards: Forwarding
14 Data Hazards: Stalling for Load Hazard
15 Control Hazards
- Need to make a branch decision based on data that has yet to be produced
- add 2,3,4
- beqz 2,loop
- In which stage is the branch resolved?
- Approaches
- stall
- insert bubbles after all branches
- always predict untaken
- if taken, instructions entering DEC and EX (and MEM?) transfer as NOOPs
- branch delay slot
- instruction following branch is always executed
- dynamic branch predictors
16 Control Hazards
- Instructions are fetched every clock cycle
- Branch decisions happen in the EX stage
- Solutions
- Assume branch not taken (a misprediction performs a flush of IF, ID, EX by inserting a nop into the pipeline registers on the clock edge)
- Reduce the delay by moving the branch decision up
- Requires additional hardware (comparators, etc.)
- Might increase cycle time, since register read and resolution are now in series and must be performed in half a cycle to allow for parallel register writes!
- Requires forwarding and stall hardware for new data hazards
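To see why moving the branch decision up matters, the cost of the predict-not-taken scheme above can be estimated with a simple CPI model. This is a back-of-the-envelope sketch: the branch and taken fractions are hypothetical, the penalty of 3 corresponds to the flush of IF, ID, EX described above, and the penalty of 1 is an assumed value for an earlier branch decision.

```python
def effective_cpi(branch_fraction, taken_fraction, flush_penalty, base_cpi=1.0):
    """Average CPI when every taken branch costs flush_penalty wasted cycles.

    Assumes predict-not-taken, so only taken branches pay the penalty.
    """
    return base_cpi + branch_fraction * taken_fraction * flush_penalty

# Hypothetical workload: 20% branches, half of them taken.
late_decision = effective_cpi(0.2, 0.5, flush_penalty=3)
early_decision = effective_cpi(0.2, 0.5, flush_penalty=1)
print(round(late_decision, 3), round(early_decision, 3))
```

The model ignores any increase in cycle time from the extra comparison hardware, which the slide notes as the main risk of resolving branches earlier.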
17 Example
[Figure: pipeline timing diagram for the instructions below, clock cycles 1 through 15]
- add 6,5,2
- lw 7,0(6)
- addi 7,7,10
-
- add 6,4,2
- sw 7,0(6)
- addi 2,2,4
- blt 2,3,loop
- add 6,5,2
8 instructions, 15 - 4 = 11 cycles, CPI = 11/8
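The CPI figure above follows from discounting the cycles needed to fill the pipeline. A minimal sketch of that arithmetic, assuming the 4 subtracted cycles are the fill cycles of a 5-stage pipeline (the function name `steady_state_cpi` is made up here):

```python
from fractions import Fraction

def steady_state_cpi(total_cycles, num_instructions, pipeline_depth=5):
    """CPI after discounting the pipeline_depth - 1 cycles needed to fill the pipe."""
    fill_cycles = pipeline_depth - 1
    return Fraction(total_cycles - fill_cycles, num_instructions)

cpi = steady_state_cpi(15, 8)
print(cpi, float(cpi))  # prints: 11/8 1.375
```

The 3 cycles above the ideal 8 are the stall and flush cycles caused by the hazards in the loop.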
18 Moving up Branch Resolution
19 Moving up Branch Resolution
20 Scheduling the Branch Delay Slot
21 Dynamic Branch Prediction
- Assume taken/not-taken (static)
- Loops have branches that are usually taken
- When wrong, we flush pipeline stages
- Deeper pipelines have higher branch penalties (misprediction penalty)
- Solution
- Look up address of branch to check if branch was previously taken
- One-bit schemes
- Two-bit schemes (must be wrong twice to change prediction)
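The one-bit and two-bit schemes above can be compared with a small simulation. This is a sketch under stated assumptions: the table is an unbounded dictionary keyed by branch address, the two-bit scheme is modeled as a saturating counter (one common variant, where two consecutive mispredictions are needed to flip a saturated prediction), and all class and function names are hypothetical.

```python
class OneBitPredictor:
    """Remembers only the last outcome of each branch."""
    def __init__(self):
        self.table = {}

    def predict(self, pc):
        return self.table.get(pc, False)  # unseen branches: predict not taken

    def update(self, pc, taken):
        self.table[pc] = taken

class TwoBitPredictor:
    """Saturating counter 0..3 per branch; predicts taken when the counter >= 2."""
    def __init__(self):
        self.table = {}

    def predict(self, pc):
        return self.table.get(pc, 0) >= 2

    def update(self, pc, taken):
        counter = self.table.get(pc, 0)
        self.table[pc] = min(3, counter + 1) if taken else max(0, counter - 1)

def mispredictions(predictor, pc, outcomes):
    """Count wrong predictions for one branch over a sequence of outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses

# A loop branch taken 9 times and then not taken, with the loop run 3 times.
outcomes = ([True] * 9 + [False]) * 3
print(mispredictions(OneBitPredictor(), 0x40, outcomes))
print(mispredictions(TwoBitPredictor(), 0x40, outcomes))
```

On a loop branch the one-bit scheme mispredicts twice per execution of the loop (at the exit and again on re-entry), while the two-bit counter, once warmed up, mispredicts only at the exit.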