Title: COMP 206: Computer Architecture and Implementation
1 COMP 206: Computer Architecture and Implementation
- Montek Singh
- Wed., Sep 15, 2004 and Mon., Sep 20, 2004
- Topic: Pipelining (Intermediate Concepts)
2 Outline
- Pipelining basics (contd.)
- Pipelining example
- Pipelining notation and terminology
- Hazards
- Structural hazards
- Data hazards
- Hazard resolution
- Reading: Appendix A (HP3)
3 How About Control Signals?
- Key observation: Control Signals at Stage N = Func(Instr. at Stage N), for N = Exec, Mem, or WrB
- Example: Control Signals at Exec Stage = Func(Load's Exec)
- What about Ifetch and Reg/Dec?
[Figure: pipelined datapath across the Ifetch, Reg/Dec, Exec, and Mem stages, with the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr pipeline registers, the register file (RFile), Exec Unit, and Data Mem. The Exec stage is computing the Load's address with ExtOp=1, ALUSrc=1, ALUOp=Add, RegDst=0; the other labeled control signals are Branch, MemWr, MemtoReg, and RegWr.]
4 Pipeline Control
- Main Control generates control signals during Reg/Dec
- Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
- Control signals for Mem (MemWr, Branch) are used 2 cycles later
- Control signals for WrB (MemtoReg, RegWr) are used 3 cycles later
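The delay pattern above can be sketched in a few lines of Python. This is only an illustration: the decode table is hypothetical and simplified, and the write-back group is assumed to be MemtoReg and RegWr. The point is that all control bits are produced once in Reg/Dec and then travel with the instruction through the ID/Ex, Ex/Mem, and Mem/Wr registers, shedding the bits that have already been used.

```python
# Hypothetical decode table: signal names come from the slides, but the
# values are simplified assumptions for illustration only.
CONTROL = {
    "lw": {"exec": {"ExtOp": 1, "ALUSrc": 1},
           "mem": {"MemWr": 0, "Branch": 0},
           "wrb": {"MemtoReg": 1, "RegWr": 1}},
    "sw": {"exec": {"ExtOp": 1, "ALUSrc": 1},
           "mem": {"MemWr": 1, "Branch": 0},
           "wrb": {"MemtoReg": 0, "RegWr": 0}},
}


def trace_control(op):
    """Return {cycles after Reg/Dec: (stage, signals consumed there)}."""
    id_ex = CONTROL[op]                      # latched at the end of Reg/Dec
    used = {1: ("Exec", id_ex["exec"])}
    ex_mem = {"mem": id_ex["mem"], "wrb": id_ex["wrb"]}  # Exec bits dropped
    used[2] = ("Mem", ex_mem["mem"])
    mem_wr = {"wrb": ex_mem["wrb"]}          # Mem bits dropped
    used[3] = ("WrB", mem_wr["wrb"])
    return used


for delay, (stage, signals) in trace_control("lw").items():
    print(f"{delay} cycle(s) after Reg/Dec: {stage} uses {signals}")
```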
5 A More Extensive Pipelining Example
- End of Cycle 4: Load's Mem, R-type's Exec, Store's Reg, Beq's Ifetch
- End of Cycle 5: Load's WrB, R-type's Mem, Store's Exec, Beq's Reg
- End of Cycle 6: R-type's WrB, Store's Mem, Beq's Exec
- End of Cycle 7: Store's WrB, Beq's Mem
6 Pipelining Example: End of Cycle 4
- 0: Load's Mem; 4: R-type's Exec; 8: Store's Reg; 12: Beq's Ifetch
7 Pipelining Example: End of Cycle 5
- 0: Lw's Wr; 4: R's Mem; 8: Store's Exec; 12: Beq's Reg; 16: R's Ifetch
8 Pipelining Example: End of Cycle 6
- 4: R's Wr; 8: Store's Mem; 12: Beq's Exec; 16: R's Reg; 20: R's Ifetch
9 Pipelining Example: End of Cycle 7
- 8: Store's Wr; 12: Beq's Mem; 16: R's Exec; 20: R's Reg; 24: R's Ifetch
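The per-cycle snapshots above follow a simple rule, sketched here in Python. The sketch assumes that the instruction at position i (counting from 0) enters Ifetch in cycle i + 1 and that nothing stalls; the function name and the instruction labels are illustrative, not part of the lecture.

```python
STAGES = ["Ifetch", "Reg", "Exec", "Mem", "Wr"]


def occupancy(program, cycle):
    """Map each in-flight instruction to the stage it occupies in `cycle`.

    Assumes instruction i (0-based) enters Ifetch in cycle i + 1 and that
    the pipeline never stalls.
    """
    result = {}
    for i, name in enumerate(program):
        stage_index = cycle - (i + 1)
        if 0 <= stage_index < len(STAGES):   # skip not-yet-fetched / retired
            result[name] = STAGES[stage_index]
    return result


# Labels are "address: instruction class", as on the slides.
program = ["0: Load", "4: R-type", "8: Store", "12: Beq",
           "16: R-type", "20: R-type", "24: R-type"]

for cycle in range(4, 8):
    print(f"End of cycle {cycle}: {occupancy(program, cycle)}")
```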
10 CPU Designs Summary
- Disadvantages of the Single Cycle Processor
- Long cycle time
- Cycle time wasted for the faster instructions
- Multiple Clock Cycle Processor
- Divide the instructions into smaller steps
- Execute each step (instead of the entire instruction) in 1 cycle
- Pipelined Processor
- Natural enhancement of the multiple clock cycle processor
- Each functional unit used only once per instruction
- If an instruction is going to use a functional unit, it must use it at the same stage as all other instructions
- Pipeline Control
- Each stage's control signals depend ONLY on the instruction that is currently in that stage
11 Single Cycle vs. Multiple Cycle vs. Pipelined
[Figure: clock timing diagrams for three implementations. Single Cycle Implementation (Cycles 1-2): Load, then Store, with part of the Store's cycle marked as waste. Multiple Cycle Implementation (Cycles 1-10): Load (Ifetch, Reg, Exec, Mem, Wr), then Store (Ifetch, Reg, Exec, Mem), then R-type (Ifetch, ...). Pipelined Implementation: Load, Store, and R-type overlapped, each passing through Ifetch, Reg, Exec, Mem, Wr.]
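A rough way to compare the three implementations is to count time in units of one short (multi-cycle) clock period. The sketch below does this for the Load, Store, R-type sequence. The step counts are assumptions: Load = 5 and Store = 4 match the figure labels, R-type = 4 is assumed, and the single-cycle clock is assumed to be as long as the slowest instruction (5 steps).

```python
# Steps per instruction class in the multi-cycle design (R-type is assumed).
MULTI_CYCLE_STEPS = {"Load": 5, "Store": 4, "R-type": 4}
PIPELINE_DEPTH = 5


def single_cycle_time(program):
    """One long cycle per instruction, sized for the slowest class."""
    return len(program) * max(MULTI_CYCLE_STEPS.values())


def multi_cycle_time(program):
    """Each instruction takes only as many short cycles as it needs."""
    return sum(MULTI_CYCLE_STEPS[instr] for instr in program)


def pipelined_time(program):
    """Fill the pipe once, then finish one instruction per cycle (no stalls)."""
    return PIPELINE_DEPTH + len(program) - 1


program = ["Load", "Store", "R-type"]
print("single cycle:", single_cycle_time(program))
print("multi cycle: ", multi_cycle_time(program))
print("pipelined:   ", pipelined_time(program))
```

Under these assumptions the three-instruction sequence takes 15, 13, and 7 short-cycle units respectively; the pipelined advantage grows as the sequence gets longer.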
12 Pipelining Notation, Terminology etc.
- Time
- Discrete time steps
- Represented as 1, 2, 3, ...
- Space
- Pipe stages or segments (things that do processing)
- Represented as P, Q, R, S (or F, D, X, M, W for the MIPS pipeline)
- Operands
- Instructions or data items
- Things that flow through, and are processed by, the pipeline
- Represented as a, b, c, ...
- In drawing pipelines, we conceal the obvious fact that each operand undergoes some changes in each pipe stage
13 Notations for Describing Pipelines
- Space-time diagram, or Gantt chart
- Reservation table by stages
- Rows represent pipeline stages
- Unbounded one way
- Notation of HP3
- Reservation table by instructions
- Rows represent operands
- Unbounded both ways
14 Basic Terms
- Filling a pipeline
- Flushing or draining a pipeline
- Stage or segment delay
- Each stage may have a different stage delay
- Beat time (= max stage delay), or clock cycle time
- Number of stages
- End-to-end latency
- = number of stages × beat time
- Stages are separated by latches (registers)
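The two formulas above (beat time and end-to-end latency) can be checked with a tiny calculation. The stage delays below are made-up numbers chosen only for illustration, and latch overhead is ignored.

```python
def beat_time(stage_delays):
    """Clock cycle time: the slowest stage sets the beat."""
    return max(stage_delays)


def end_to_end_latency(stage_delays):
    """Every stage takes one full beat, even the faster ones."""
    return len(stage_delays) * beat_time(stage_delays)


# Hypothetical stage delays in ns for a 5-stage pipe (F, D, X, M, W).
delays = [2, 1, 2, 2, 1]
print("beat time:", beat_time(delays), "ns")
print("end-to-end latency:", end_to_end_latency(delays), "ns")
print("sum of raw stage delays:", sum(delays), "ns")
```

Note that the pipelined latency (10 ns here) exceeds the sum of the raw stage delays (8 ns), because unbalanced stages are all stretched to the beat time.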
15 Speedup and Throughput of a Pipeline
16 Pipeline Hazards: Structural Hazard
- A relation between two instructions indicating that the two instructions may want to use the same hardware resource (function unit, register file port, shared bus, cache port, etc.) at the same time
- In principle, can always be eliminated by duplicating resources
- Low hardware utilization
- Increased cost
- MIPS pipeline as designed so far does not have structural hazards
- But we had to avoid it (see example later)
- Usually occurs when a functional unit is not fully pipelined (e.g., in floating point pipeline)
17 Example: Unified I- and D-Memory
These diagrams are invalid: structural hazard on single memory port
Pipeline diagrams with hazards resolved
18 Resolving Structural Hazards
- Early resolution (scheduling)
- Done well before the collision could occur, and usually at a place different from where the collision could happen
- Example: instructions are delayed in the ID stage
- Late resolution
- Done at the place where the collision might happen
- Done just before the collision is about to happen
- Example: using an arbiter or a priority encoder
- One instruction wins
- Others are denied access, stall, and wait for their next chance
- Why allow structural hazards in the first place?
- Reduce cost
- Reduce unit latency (by avoiding pipeline latch delays)
- Hazards may be infrequent (make common case fast)
19 Example: Cost of Structural Hazard
Suppose that 40% of the instruction mix are loads or stores, and that the ideal CPI of the pipelined machine is 1. Assume that the machine with the structural hazard has a clock rate that is 5% higher than the clock rate of the machine without the hazard. Which pipeline is faster, and by how much?
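One way to work this example is sketched below. It relies on an assumption the slide text does not state: every load or store on the machine with the hazard costs exactly one stall cycle. Times are expressed relative to the cycle time of the hazard-free machine.

```python
IDEAL_CPI = 1.0
MEM_FRACTION = 0.40        # loads and stores in the instruction mix
STALL_PER_MEM_OP = 1       # assumption: one stall cycle per load/store
CLOCK_RATE_RATIO = 1.05    # hazard machine's clock rate vs. hazard-free one


def avg_instruction_time(cpi, cycle_time):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time


# Machine without the hazard: ideal CPI, reference cycle time of 1.
time_without = avg_instruction_time(IDEAL_CPI, 1.0)

# Machine with the hazard: extra stalls, but a slightly shorter cycle.
cpi_with = IDEAL_CPI + MEM_FRACTION * STALL_PER_MEM_OP
time_with = avg_instruction_time(cpi_with, 1.0 / CLOCK_RATE_RATIO)

ratio = time_with / time_without
print(f"CPI with hazard: {cpi_with:.2f}")
print(f"hazard-free machine is {ratio:.2f}x faster")
```

Under this assumption the faster clock does not make up for the stalls: the machine without the structural hazard comes out about 1.33 times faster.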
20 Data Hazard Setup
D(u) = domain of instruction u: the set of all memory locations, registers (including implicit ones), flags, condition codes etc. that may be read by instruction u
R(u) = range of instruction u: the set of all memory locations, registers (including implicit ones), flags, condition codes etc. that may be written by instruction u
- u < v is a relation that means that instruction u precedes instruction v in the original program order (i.e., on an unpipelined machine)
- The relation < is irreflexive, anti-symmetric, and transitive
[Figure labels: Instruction u; Instruction v]
21 Data Hazard Definition
Given two instructions u and v, such that u < v, there is a data hazard between them if any of the following conditions holds:
- RAW (read after write): R(u) ∩ D(v) ≠ ∅
- WAR (write after read): D(u) ∩ R(v) ≠ ∅
- WAW (write after write): R(u) ∩ R(v) ≠ ∅
The existence of one of these conditions means that a change in the order of reading/writing operands by the instructions, relative to the order seen by sequentially executing instructions on an unpipelined machine, could violate the intended semantics
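The definition can be turned into a small set-based check. The sketch below assumes the usual three conditions (RAW when the range of u meets the domain of v, WAR when the domain of u meets the range of v, WAW when the two ranges meet); the dictionary layout and the two sample instructions are illustrative.

```python
def hazards(u, v):
    """Classify data hazards between u and v, where u precedes v.

    Each instruction is a dict with a 'reads' set (its domain) and a
    'writes' set (its range).
    """
    found = set()
    if u["writes"] & v["reads"]:
        found.add("RAW")    # v reads something u writes
    if u["reads"] & v["writes"]:
        found.add("WAR")    # v overwrites something u reads
    if u["writes"] & v["writes"]:
        found.add("WAW")    # both write the same location
    return found


# Illustrative pair: ADD R1, R2, R3 followed by SUB R2, R1, R4.
add = {"reads": {"R2", "R3"}, "writes": {"R1"}}
sub = {"reads": {"R1", "R4"}, "writes": {"R2"}}
print(sorted(hazards(add, sub)))
```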
22 Why Data Hazards Occur
- Pipelining changes relative timing of instructions
- Reads and writes occur at fixed positions of the pipeline
- So, if two instructions are too close (function of pipeline structure), order of reads and writes could change and produce incorrect values
- This instruction sequence exchanges values in R1 and R2
- On unpipelined MIPS, back-to-back execution of sequence produces correct results
- On current pipelined MIPS, initiation of sequence in consecutive cycles produces incorrect results
- Reads are early, writes are late, so RAW dependences would be violated
XOR R2, R2, R1
XOR R1, R1, R2
XOR R2, R2, R1
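The failure can be reproduced with a toy timing model. The sketch below assumes a 5-stage pipe with no forwarding and no interlocks, in which the instruction issued in cycle c reads its registers in cycle c + 1 (D stage) and writes its result in cycle c + 4 (W stage). The function names and the test values 5 and 9 are illustrative.

```python
# (dest, src1, src2) for the three-XOR exchange of R1 and R2.
PROGRAM = [("R2", "R2", "R1"), ("R1", "R1", "R2"), ("R2", "R2", "R1")]


def run_sequential(r1, r2):
    """Unpipelined execution: each instruction finishes before the next."""
    regs = {"R1": r1, "R2": r2}
    for dst, a, b in PROGRAM:
        regs[dst] = regs[a] ^ regs[b]
    return regs


def run_pipelined_no_interlock(r1, r2):
    """Issue one instruction per cycle; read in D, write in W, no bypass."""
    regs = {"R1": r1, "R2": r2}
    events = []
    for i in range(len(PROGRAM)):
        issue = i + 1
        events.append((issue + 1, "read", i))    # D stage
        events.append((issue + 4, "write", i))   # W stage
    results = {}
    for _cycle, kind, i in sorted(events):
        dst, a, b = PROGRAM[i]
        if kind == "read":
            results[i] = regs[a] ^ regs[b]       # uses whatever is there now
        else:
            regs[dst] = results[i]
    return regs


print("sequential:", run_sequential(5, 9))
print("pipelined :", run_pipelined_no_interlock(5, 9))
```

In this model all three reads happen (cycles 2, 3, 4) before the first write (cycle 5), so every instruction sees the original register values and the exchange fails.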
23 Data Dependence and Hazards
- True (value, flow) dependence between instructions u and v means u produces a result value that v uses
- This is a producer-consumer relationship
- This is a dependence based on values, not on the names of the containers of the values
- Every true dependence is a RAW hazard
- Not every RAW hazard is a true dependence
- Any RAW hazard that cannot be removed by renaming is a true dependence
Original program: 1: A = B + C; 2: A = D + E; 3: G = A + H
True dependence: (2,3). RAW hazards: (1,3), (2,3)
Renamed program: 1: X = B + C; 2: A = D + E; 3: G = A + H
True dependence: (2,3). RAW hazard: (2,3)
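The distinction between RAW hazards (based on names) and true dependences (based on values) can be checked mechanically. In the sketch below each statement is reduced to a destination name and a set of source names; the operators do not matter for the analysis. The function names are illustrative.

```python
def raw_hazards(program):
    """All pairs (u, v), u before v, where u writes a name that v reads."""
    pairs = []
    for i, (dst_u, _srcs_u) in enumerate(program):
        for j in range(i + 1, len(program)):
            if dst_u in program[j][1]:
                pairs.append((i + 1, j + 1))
    return pairs


def true_dependences(program):
    """Pairs (u, v) where v reads the value that u was the last to write."""
    last_writer = {}
    pairs = set()
    for j, (dst, srcs) in enumerate(program, start=1):
        for name in srcs:
            if name in last_writer:
                pairs.add((last_writer[name], j))
        last_writer[dst] = j
    return sorted(pairs)


original = [("A", {"B", "C"}), ("A", {"D", "E"}), ("G", {"A", "H"})]
renamed = [("X", {"B", "C"}), ("A", {"D", "E"}), ("G", {"A", "H"})]

for label, prog in (("original", original), ("renamed", renamed)):
    print(label, "RAW:", raw_hazards(prog), "true:", true_dependences(prog))
```

Renaming the destination of statement 1 removes the RAW hazard (1,3) but leaves the true dependence (2,3) untouched.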
24 More on Hazards
- RAW hazards corresponding to value dependences are most difficult to deal with, since they can never be eliminated
- The second instruction is waiting for information produced by the first instruction
- WAR and WAW hazards are name dependences
- Two instructions happen to use the same register (name), although they don't have to
- Can often be eliminated by renaming, either in software or hardware
- Implies the use of additional resources, hence additional cost
- Renaming is not always possible: implicit operands such as accumulator, PC, or condition codes cannot be renamed
- These hazards don't cause problems for MIPS pipeline
- Relative timing does not change even with pipelined execution, because reads occur early and writes occur late in pipeline
25 The Precedence Relation
- Consider a straight line program in original program order
- Define a relation D (the dependence relation) between pairs of instructions (u, v) as follows
- D(u, v) if and only if (u < v), and there is a WAR, WAW, or RAW hazard between instructions u and v
- D is irreflexive and anti-symmetric but not transitive
- Define the precedence relation P as the transitive closure of the dependence relation D
- P is irreflexive, anti-symmetric, and transitive
- Represent P by graph of its transitive reduction
- precedence graph
- If P(u,v), then u must precede v in execution
- the two instructions cannot be interchanged, and in a pipeline they must maintain a sufficient distance
ADD R4, R5, R6
ADD R3, R4, R5
ADD R2, R3, R7
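The three ADD instructions show why D is not transitive while P is. The sketch below builds D from register names only (memory operands are ignored, which is enough for this example) and then takes its transitive closure; the helper names are illustrative.

```python
from itertools import product


def dependence(program):
    """D(u, v): u precedes v and they share a RAW, WAR, or WAW hazard."""
    d = set()
    for u in range(len(program)):
        for v in range(u + 1, len(program)):
            write_u, reads_u = program[u]
            write_v, reads_v = program[v]
            if write_u in reads_v or write_v in reads_u or write_u == write_v:
                d.add((u + 1, v + 1))
    return d


def transitive_closure(relation):
    """Smallest transitive relation containing `relation`."""
    closure = set(relation)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, e) in product(list(closure), repeat=2):
            if b == c and (a, e) not in closure:
                closure.add((a, e))
                changed = True
    return closure


# (destination, sources) for ADD R4,R5,R6; ADD R3,R4,R5; ADD R2,R3,R7.
program = [("R4", {"R5", "R6"}), ("R3", {"R4", "R5"}), ("R2", {"R3", "R7"})]
D = dependence(program)
P = transitive_closure(D)
print("D =", sorted(D))
print("P =", sorted(P))
```

Here D contains (1,2) and (2,3) but not (1,3), whereas P adds (1,3): instruction 1 must still precede instruction 3 even though the two share no register hazard directly.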
26 Example of Precedence Relation
1: ADD R1, R7, R8
2: SW 2000(R9), R8
3: LW R3, 0(R1)
4: LW R4, 3000(R9)
5: ADD R5, R3, R4
6: MUL R6, R5, R5
Assume that registers R7, R8, R9 are already initialized such that (R7) = (R8) = (R9) = 2000 holds
[Figure: precedence graph on nodes 1 to 6]
27 Data Hazard Effect on Pipelining
1: ADD R1, R2, R3
2: SUB R4, R5, R1
3: AND R6, R1, R7
4: OR R8, R1, R9
5: XOR R10, R1, R11
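To see which of these instructions are affected, the sketch below uses a simple timing model: an instruction issued in cycle c reads its registers in cycle c + 1 (D) and writes in cycle c + 4 (W), with no forwarding and no stalls. The `split_regfile` flag models a register file that is written in the first half of a cycle and read in the second half. All names are illustrative.

```python
def stale_readers(program, split_regfile):
    """Names of instructions that would read a not-yet-written register."""
    stale = []
    pending_write = {}                 # register -> cycle of its W stage
    for i, (name, dst, srcs) in enumerate(program):
        issue = i + 1
        read_cycle = issue + 1         # D stage
        for reg in srcs:
            w = pending_write.get(reg)
            if w is None:
                continue
            too_early = read_cycle < w
            same_cycle_miss = read_cycle == w and not split_regfile
            if too_early or same_cycle_miss:
                stale.append(name)
                break
        pending_write[dst] = issue + 4  # W stage
    return stale


program = [
    ("ADD", "R1", ("R2", "R3")),
    ("SUB", "R4", ("R5", "R1")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
    ("XOR", "R10", ("R1", "R11")),
]
print("plain register file:", stale_readers(program, split_regfile=False))
print("split-cycle file   :", stale_readers(program, split_regfile=True))
```

In this model SUB and AND always read R1 too early, OR is rescued by the split-cycle register file, and XOR is far enough away to be safe either way.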
28 Value Forwarding/Bypassing
- There is slack in how soon a value is actually available and how late it is actually required in the pipeline
- Result of R-type available at end of X stage
- Operand of dependent R-type not needed until beginning of X stage
- Communication of values among instructions happens through register file
- Globally known names of containers of values
- Accessed at fixed stages of pipeline (read in D, written in W)
- Forwarding/bypassing/short-circuiting corresponds to establishing a direct path between the producer of a value and its consumer, bypassing the container
- Allows us to exploit slack
- Requires additional resources (forwarding paths and controller)
- Identify all forwarding paths needed on MIPS (Figure in book is incomplete)
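As a rough illustration of what the forwarding controller decides, the sketch below selects the source of one ALU operand. It follows a common textbook formulation (compare the operand's register number against the destination held in the EX/MEM and MEM/WB pipeline registers, preferring the younger producer) and is not a complete list of the forwarding paths MIPS needs. The field names `reg_write` and `rd` are assumptions.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where one ALU operand should come from.

    ex_mem and mem_wb describe the instructions one and two stages ahead:
    dicts with 'reg_write' (bool) and 'rd' (destination register number).
    Register 0 is hard-wired to zero on MIPS and is never forwarded.
    """
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/MEM"     # most recent producer wins
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/WB"
    return "REGFILE"        # no hazard: use the value read in D


# ADD R1,R2,R3 is now in Mem; the next instruction needs R1 in Exec.
producer = {"reg_write": True, "rd": 1}
idle = {"reg_write": False, "rd": 0}
print(forward_select(1, producer, idle))
print(forward_select(5, producer, idle))
```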
29 Forwarding Example 2
1: ADD R1, R2, R3
2: LW R4, 0(R1)
3: SW 12(R1), R4
30 Forwarding and Stalling Example 3
L1: LW R2, 40(R8)
L2: LW R3, 60(R8)
A: ADD R4, R2, R3
S: SW 60(R8), R4
- Load has a latency of one cycle that cannot be hidden, as seen between L2 and A
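The one-cycle load latency can be counted with a very small model. The sketch assumes full forwarding, so the only remaining stalls are load-use pairs: a load immediately followed by an instruction that reads the loaded register costs one cycle. It also treats every source register as needed at the start of X, which is a simplification. The tuple layout is illustrative.

```python
def count_stalls(program):
    """Stall cycles with full forwarding; only load-use pairs stall."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dst, _prev_srcs = prev
        _cur_op, _cur_dst, cur_srcs = cur
        if prev_op == "LW" and prev_dst in cur_srcs:
            stalls += 1     # loaded value arrives one cycle too late
    return stalls


# (opcode, destination, sources); SW has no register destination.
example3 = [
    ("LW", "R2", ("R8",)),          # L1
    ("LW", "R3", ("R8",)),          # L2
    ("ADD", "R4", ("R2", "R3")),    # A: needs R3 right after L2
    ("SW", None, ("R8", "R4")),     # S: R4 is forwarded from A
]
print("stall cycles:", count_stalls(example3))
```

Only the L2 to A pair stalls; L1's result has one extra cycle to arrive, and A's result reaches S through forwarding.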
31 Forwarding and Stalling Example 4
L: LW R1, 0(R1)
S: SUB R4, R1, R5
A: AND R6, R1, R7
O: OR R8, R1, R9
No forwarding needed from L to A: can resolve this by writing register file in first half of cycle and reading it in second half of cycle.
32 Load Data Forwarding
[Figure: forwarding datapath with the ID/EX, EX/MEM, and MEM/WB pipeline registers, Registers, Data Memory, and a Forwarding Unit; labeled signals: Rs, Rt, Rd, Forward A, Forward B.]