Title: Csci 136 Computer Architecture II
1Csci 136 Computer Architecture II Branch
Hazards, Exceptions
- Xiuzhen Cheng
- cheng_at_gwu.edu
2Announcement
- Homework assignment 10, due before class, April 12
- Readings: Sections 6.4-6.5
- Problems: 6.17-6.19, 6.21-6.22, 6.33-6.36, 6.39-6.40 (six of them will be graded; your TA will give hints in the lab sections)
- Project 3 is due on April 10, 2005
- Quiz 4: April 12, 2005
- Final: Thursday, May 12, 12:40AM-2:40PM
- Note: you must pass the final to pass this course!
3The Big Picture Where are We Now?
- The Five Classic Components of a Computer
- Current Topics
- Control/Branch Hazard
- Exceptions
[Figure: the five classic components of a computer: Processor (Control and Datapath), Memory, Input, Output]
4Review on Data Hazards, Forwarding, Stall
- When does a data hazard happen?
- Data dependencies
- Using forwarding to overcome data hazards
- Data is available after ALU stage
- Forwarding conditions
- Stall the pipeline for load-use instructions
- Data is available after MEM stage (lw instruction)
- Hazard detection conditions
- Why in ID stage?
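The review points above can be made concrete with a minimal sketch of the two checks, assuming the usual textbook formulation of the load-use stall condition and of the forwarding control for the first ALU operand. The function and parameter names are hypothetical; they only mirror the pipeline-register field names.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register that a load in EX will write.

    Assumed textbook condition: ID/EX.MemRead and
    (ID/EX.RegisterRt == IF/ID.RegisterRs or ID/EX.RegisterRt == IF/ID.RegisterRt).
    """
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


def forward_a(ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd, id_ex_rs):
    """Select the first ALU operand.

    0b10 = forward from EX/MEM, 0b01 = forward from MEM/WB, 0b00 = register file.
    The EX/MEM check comes first so the most recent result wins.
    """
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return 0b10
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return 0b01
    return 0b00
```

The stall check sits in the ID stage because that is the last point at which the dependent instruction can still be held back before it reaches the ALU.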
5Review on Data Hazards
6Review on Data Hazards, Forwarding, Stall
[Figure: pipelined datapath diagram; labels include PC+4 and Sign-extend]
7LW and SW
- lw $5, 0($15); beq $5, $0, Exit; sw $5, 100($15)
- lw $5, 0($15); add $8, $8, $8; sw $5, 100($15)
8SW is in MEM Stage
[Figure: pipelined datapath with sw in the MEM stage, preceded by lw; labels include Sign-Ext, EX/MEM, Data memory]
- MEM/WB.RegWrite and EX/MEM.MemWrite and MEM/WB.RegisterRd = EX/MEM.RegisterRd and MEM/WB.RegisterRd ≠ 0
9SW is In EX Stage
[Figure: pipelined datapath with sw in the EX stage, preceded by lw; labels include Sign-Ext]
- ID/EX.MemWrite and MEM/WB.RegWrite and MEM/WB.RegisterRd = ID/EX.RegisterRt and MEM/WB.RegisterRd ≠ 0
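The two store-forwarding conditions on slides 8 and 9 can be sketched as predicates. This takes the comparisons to be "equal" and "not equal to zero"; the function names are hypothetical and the arguments stand for the pipeline-register fields named on the slides.

```python
def forward_to_store_in_mem(mem_wb_reg_write, ex_mem_mem_write, mem_wb_rd, ex_mem_rd):
    """sw is in MEM: forward the value being written back into the store data."""
    return bool(
        mem_wb_reg_write
        and ex_mem_mem_write
        and mem_wb_rd == ex_mem_rd
        and mem_wb_rd != 0          # register $0 is never forwarded
    )


def forward_to_store_in_ex(id_ex_mem_write, mem_wb_reg_write, mem_wb_rd, id_ex_rt):
    """sw is in EX: forward the value being written back into the store's rt operand."""
    return bool(
        id_ex_mem_write
        and mem_wb_reg_write
        and mem_wb_rd == id_ex_rt
        and mem_wb_rd != 0
    )
```

For example, with `lw $5, 0($15)` immediately followed by `sw $5, 100($15)`, the lw is in WB while the sw is in MEM, so `forward_to_store_in_mem(True, True, 5, 5)` holds.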
10More Cases
- lw $15, 0($8) followed by sw $5, 100($15): load-use, stall the pipeline
- R-Type followed by sw?
- The result from R-Type will be saved into memory
- R-Type will overwrite base register for sw
11An Example
- 40 lw $2, 20($1)
- 44 and $4, $2, $5
- 48 or $8, $2, $4
- Clock Cycle 1
- Clock Cycle 2
- Clock Cycle 3
- Clock Cycle 4
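A minimal sketch of how this three-instruction example moves through the pipeline, assuming a five-stage pipeline with full forwarding in which only a load followed immediately by a use costs one stall cycle. The `schedule` function and the instruction encoding are hypothetical helpers for illustration, and branches are not modeled.

```python
def schedule(program):
    """Return, per instruction, a dict mapping stage name to the clock cycles it occupies.

    Each instruction is a dict with keys 'text', 'load', 'dest', 'srcs'.
    """
    timeline = []
    for i, ins in enumerate(program):
        if i == 0:
            if_start, id_start, ex = 1, 2, 3
        else:
            prev, prev_t = program[i - 1], timeline[i - 1]
            if_start = prev_t["ID"][0]   # fetched when the previous instruction enters ID
            id_start = prev_t["EX"][0]   # decoded when the previous instruction enters EX
            # Load-use hazard: the previous instruction is a load whose
            # destination this instruction reads, so hold it in ID one extra cycle.
            uses_load = prev["load"] and prev["dest"] != 0 and prev["dest"] in ins["srcs"]
            ex = prev_t["EX"][0] + 1 + (1 if uses_load else 0)
        timeline.append({
            "IF": list(range(if_start, id_start)),
            "ID": list(range(id_start, ex)),
            "EX": [ex],
            "MEM": [ex + 1],
            "WB": [ex + 2],
        })
    return timeline


program = [
    {"text": "lw  $2, 20($1)", "load": True,  "dest": 2, "srcs": (1,)},
    {"text": "and $4, $2, $5", "load": False, "dest": 4, "srcs": (2, 5)},
    {"text": "or  $8, $2, $4", "load": False, "dest": 8, "srcs": (2, 4)},
]

for ins, stages in zip(program, schedule(program)):
    print(ins["text"], stages)
```

Under these assumptions the `and` stays in ID for clock cycles 3 and 4 and the `or` stays in IF for the same two cycles, which is the bubble that appears in the EX stage in clock cycle 4.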
12Clock 1
[Figure: pipelined datapath in clock cycle 1, showing lw $2, 20($1); labels include PC+4 and Sign-extend]
13Clock 2
[Figure: pipelined datapath in clock cycle 2, showing lw $2, 20($1) and and $4, $2, $5 with control-signal and register-number values]
14Clock 3
[Figure: pipelined datapath in clock cycle 3, showing lw $2, 20($1), and $4, $2, $5, and or $8, $2, $4 with control-signal and register-number values]
15Clock 4
[Figure: pipelined datapath in clock cycle 4, showing lw $2, 20($1), a bubble, and $4, $2, $5, and or $8, $2, $4 with control-signal and register-number values]
16Clock 5
[Figure: pipelined datapath in clock cycle 5, showing lw $2, 20($1), a bubble, and $4, $2, $5, and or $8, $2, $4 with control-signal and register-number values]
17Branch Hazards
Control hazard: an attempt to make a decision before the condition is evaluated
18Branch Hazards
19Observations
- Branch decision does not occur until the MEM stage, so 3 clock cycles (CCs) are wasted (current design, non-optimized)
- Is it possible to reduce branch delay?
- YES
- In EXE stage?
- Two CCs branch delay
- In ID Stage?
- One CC branch delay
- How? For beq $x, $y, label: compute $x xor $y, then OR all the bits, which is much faster than an ALU operation. Also, we have a separate ALU to compute the branch address.
- 3 strategies
- Delayed branch, static branch prediction, dynamic branch prediction
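The ID-stage comparison described above (XOR the two registers, then OR all the bits) can be sketched as follows, together with the standard MIPS branch-target computation, PC + 4 plus the sign-extended 16-bit offset shifted left by 2. The function names are hypothetical, and the loop stands in for what hardware does with a single wide OR gate.

```python
MASK32 = 0xFFFFFFFF


def regs_equal(x, y):
    """Equality test for beq: XOR the operands, then OR-reduce the 32 result bits."""
    diff = (x ^ y) & MASK32
    any_bit_set = 0
    for bit in range(32):
        any_bit_set |= (diff >> bit) & 1
    return any_bit_set == 0      # no differing bit means the registers are equal


def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, imm16):
    """Branch address computed by the separate adder: PC + 4 + (offset << 2)."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32
```

As a usage example, a beq at address 40 with an offset field of 7 gives `branch_target(40, 7)`, which is 72.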
20Delayed Branch
- Will always execute the instruction following the branch.
- Only one will be executed
- Done by compiler or assembler
- 50% success rate
- Losing popularity
- Why?
- More pipeline stages
- Superscalar
21Scheduling the Branch Delay Slot
Independent instruction: best choice.
B is good when the branch is likely to be taken.
It must be OK to execute the sub instruction when the branch goes in the unexpected direction.
22Static Branch Prediction
- Assume the branch will not be taken. If the prediction is wrong, clear the effect of sequential instruction execution.
- How to discard instructions in the pipeline?
- Branch decision is made at MEM stage: instructions in IF, ID, EX stages need to be discarded.
- Branch decision is made at ID stage: only flush the IF/ID pipeline register!
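The flush cost follows directly from where the branch is resolved: every stage behind the branch holds a sequentially fetched instruction that must be discarded. A minimal sketch, assuming the five-stage pipeline used throughout these slides (the helper name is hypothetical):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stages_to_flush(resolve_stage):
    """Stages holding wrongly fetched instructions when a not-taken prediction fails."""
    return STAGES[:STAGES.index(resolve_stage)]


for stage in ("MEM", "EX", "ID"):
    flushed = stages_to_flush(stage)
    print(f"decision in {stage}: flush {flushed}, penalty {len(flushed)} cycle(s)")
```

Resolving in MEM discards the instructions in IF, ID and EX; resolving in ID leaves only the instruction in IF, which is why a single flush of the IF/ID register is enough.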
23Static Branch Prediction
24Static Branch Prediction
[Figure: pipelined datapath with the IF.Flush control signal]
25Pipelined Branch An Example
[Figure: pipelined datapath handling a branch; labels include IF.Flush and the addresses 36, 40, 44 and 72]
26Pipelined Branch An Example
[Figure: pipelined datapath in the following clock cycle; label: 72]
27Dynamic Branch Prediction
- Static branch prediction is crude!
- Take history into consideration
- If a branch was taken last time, then fetch the new instruction from the same place as last time
- Branch prediction buffer: indexed by the lower bits of the branch instruction's address
- This memory contains a bit (or bits) which tells whether the branch was recently taken or not
- Is the prediction correct? Any bad effect?
- 1-bit prediction scheme
- 2-bit prediction scheme
28Observation
- Since we move the branch decision to the ID stage, we need to copy forwarding control related hardware to the ID stage too!
- Beq following lw
- Hazard detection unit should work.
29In-Class Exercise
- Consider a loop branch that branches nine times in a row, then is not taken once. What is the prediction accuracy for this branch, assuming the prediction bit for this branch remains in the prediction buffer?
- With 1-bit prediction?
- With 2-bit prediction?
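One way to check an answer to this exercise is to simulate both schemes. The sketch below is an illustration under stated assumptions: the 1-bit scheme predicts whatever the branch did last time, and the 2-bit scheme is modeled as a saturating counter (one common variant, not necessarily the exact state machine used in class). The loop is run several times and only the last run is measured, so the predictor state left over from earlier runs is in effect.

```python
def one_bit_accuracy(outcomes, warmup):
    """Accuracy of a 1-bit predictor over outcomes[warmup:]."""
    predict_taken = False
    correct = 0
    for i, taken in enumerate(outcomes):
        if i >= warmup and predict_taken == taken:
            correct += 1
        predict_taken = taken                  # remember only the last outcome
    return correct / (len(outcomes) - warmup)


def two_bit_accuracy(outcomes, warmup):
    """Accuracy of a 2-bit saturating counter over outcomes[warmup:]."""
    counter = 0                                # 0, 1 predict not taken; 2, 3 predict taken
    correct = 0
    for i, taken in enumerate(outcomes):
        if i >= warmup and (counter >= 2) == taken:
            correct += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / (len(outcomes) - warmup)


loop = [True] * 9 + [False]                    # taken nine times, then not taken once
outcomes = loop * 3                            # two warm-up runs, one measured run

print("1-bit:", one_bit_accuracy(outcomes, warmup=20))
print("2-bit:", two_bit_accuracy(outcomes, warmup=20))
```

In steady state the 1-bit scheme mispredicts twice per run (the first and the last iteration), while the 2-bit counter mispredicts only the loop exit.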
30Performance Comparison
- Compare the performance of single-cycle, multi-cycle and pipelined datapath
- 200ps for memory access, 100ps for ALU operation, 50ps for register file access
- 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU ops
- For pipelined datapath,
- 50% of loads are immediately followed by an instruction that uses the result
- Branch delay on misprediction is 1 clock cycle and 25% of branches are mispredicted
- Jump delay is 1 clock cycle
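A worked sketch of this comparison, reading the instruction-mix and hazard figures as percentages. Two things are assumptions not stated on the slide: the per-class cycle counts for the multi-cycle design (taken from the usual textbook multi-cycle datapath) and the pipelined clock being set by the slowest stage, 200 ps.

```python
# Instruction mix from the slide, read as fractions of all instructions.
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
MEM_PS, ALU_PS, REG_PS = 200, 100, 50

# Single-cycle: the clock must fit the slowest instruction (lw):
# instruction fetch + register read + ALU + data memory + register write.
single_clock = MEM_PS + REG_PS + ALU_PS + MEM_PS + REG_PS
single_time = 1.0 * single_clock

# Multi-cycle: one functional unit per cycle, so the clock fits the slowest unit.
# ASSUMPTION: cycles per instruction class follow the textbook multi-cycle design.
MULTI_CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}
multi_clock = max(MEM_PS, ALU_PS, REG_PS)
multi_cpi = sum(MIX[k] * MULTI_CYCLES[k] for k in MIX)
multi_time = multi_cpi * multi_clock

# Pipelined: base CPI of 1 plus the average stall cycles per class.
# ASSUMPTION: the pipelined clock equals the slowest stage (200 ps).
PIPE_CPI = {
    "load": 1 + 0.50 * 1,     # half of the loads cause a one-cycle load-use stall
    "store": 1,
    "alu": 1,
    "branch": 1 + 0.25 * 1,   # a quarter of the branches pay one cycle
    "jump": 1 + 1,            # every jump pays one cycle
}
pipe_clock = multi_clock
pipe_cpi = sum(MIX[k] * PIPE_CPI[k] for k in MIX)
pipe_time = pipe_cpi * pipe_clock

for name, t in (("single-cycle", single_time), ("multi-cycle", multi_time), ("pipelined", pipe_time)):
    print(f"{name}: {t:.1f} ps per instruction")
```

Under these assumptions the average time per instruction is about 600 ps for single-cycle, 824 ps for multi-cycle (CPI 4.12) and 234.5 ps for the pipelined design (CPI about 1.17).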
31Exceptions
- Exceptions: events other than branch or jump that change the normal flow of instruction execution
- Arithmetic overflow, undefined instruction, etc.
- Internal to the processor
- Interrupts: from external sources, such as I/O interrupts
- Use arithmetic overflow as an example
- When an overflow is detected, we need to transfer control to the exception handling routine at location 0x80000180 immediately, because we do not want this invalid value to contaminate other registers or memory locations
- Similar idea as branch hazard
- Detected in the EX stage
- De-assert all control signals in EX and ID stages, flush IF/ID
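A minimal sketch of what the EX stage checks for a signed 32-bit add and how the next PC is chosen when overflow is detected. The function names are hypothetical; the overflow rule (operands with the same sign, result with a different sign) is the standard two's-complement test.

```python
MASK32 = 0xFFFFFFFF
HANDLER_PC = 0x80000180      # exception handler address given on the slide


def add32(a, b):
    """Return (result, overflow) for a signed add of two 32-bit patterns."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    sign_a, sign_b, sign_r = (a >> 31) & 1, (b >> 31) & 1, (result >> 31) & 1
    # Overflow: both operands have the same sign but the result's sign differs.
    overflow = sign_a == sign_b and sign_r != sign_a
    return result, overflow


def next_pc(pc_plus_4, overflow):
    """On overflow, fetch from the handler instead of continuing sequentially."""
    return HANDLER_PC if overflow else pc_plus_4
```

For example, `add32(0x7FFFFFFF, 1)` reports an overflow, so `next_pc` redirects the fetch to 0x80000180 while the instructions behind the add are flushed.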
32Exceptions
[Figure: pipelined datapath extended for exception handling; label: 80000180]
33Example
- sub $11, $2, $4
- and $12, $2, $5
- or $13, $2, $6
- add $1, $2, $1 -- overflow occurs
- slt $15, $6, $7
- lw $16, 50($7)
- Exception handling routine
- 0x80000180 sw $25, 1000($0)
- 0x80000184 sw $26, 1004($0)
34Example
[Figure: pipelined datapath in clock cycle 6 of the exception example; label: 80000180]
35Example
[Figure: pipelined datapath in clock cycle 7 of the exception example; label: 80000180]
36Questions?