COMP 206: Computer Architecture and Implementation - PowerPoint PPT Presentation

About This Presentation

Title:

COMP 206: Computer Architecture and Implementation

Description:

Title: Lecture 6 Author: Montek Singh Last modified by: Montek Singh Created Date: 3/13/2000 2:52:39 AM Document presentation format: Letter Paper (8.5x11 in) – PowerPoint PPT presentation

Slides: 30

Provided by: Montek9

Learn more at: https://www.cs.unc.edu

Transcript and Presenter's Notes

1
COMP 206: Computer Architecture and Implementation

Montek Singh
Mon, Sep 19, 2005
Topic: Pipelining (Intermediate Concepts)

2
Outline

Pipelining basics (contd.)
Pipelining example
Pipelining notation and terminology
Hazards
Structural hazards
Data hazards
Hazard resolution
Reading: Appendix A (HP3)

3
How About Control Signals?

Key observation: control signals at Stage N =
Func(instruction at Stage N), for N = Exec, Mem, or
WrB.
Control signals at Exec stage = Func(Load's Exec)
What about Ifetch and Reg/Dec?

[Figure: pipelined datapath with a load in the Exec stage. Stages: Ifetch, Reg/Dec, Exec, Mem. Pipeline registers: IF/ID, ID/Ex, Ex/Mem (holding the load's address), Mem/Wr. Control signals shown: ExtOp=1, ALUSrc=1, ALUOp=Add, RegDst=0, MemWr, Branch, MemtoReg, RegWr.]
4
Pipeline Control

Main Control generates control signals during
Reg/Dec
Control signals for Exec (ExtOp, ALUSrc, ...) are
used 1 cycle later
Control signals for Mem (MemWr, Branch) are used
2 cycles later
Control signals for WrB (MemtoReg, RegWr) are used
3 cycles later
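
The delays listed above can be modelled as control bundles shifting through the pipeline registers. The following is only a minimal sketch: the `CONTROL` table, the `simulate` helper, and the signal values for `lw` are assumptions for illustration, not taken from the slides.

```python
# Control bundles for a load, grouped by the stage that consumes them.
# The values are assumed, following the usual MIPS load datapath.
CONTROL = {
    "lw": {
        "exec": {"ExtOp": 1, "ALUSrc": 1, "ALUOp": "Add", "RegDst": 0},
        "mem": {"MemWr": 0, "Branch": 0},
        "wrb": {"MemtoReg": 1, "RegWr": 1},
    },
}


def simulate(ops):
    """Return, per cycle, which instruction's signals each stage is using."""
    id_ex = ex_mem = mem_wr = None  # pipeline registers start empty
    trace = []
    for op in list(ops) + [None] * 3:  # three extra cycles to drain
        # Main Control decodes the instruction currently in Reg/Dec.
        decoded = None if op is None else {"op": op, **CONTROL[op]}
        trace.append({
            "Reg/Dec": op,
            "Exec": id_ex and (id_ex["op"], id_ex["exec"]),
            "Mem": ex_mem and (ex_mem["op"], ex_mem["mem"]),
            "WrB": mem_wr and (mem_wr["op"], mem_wr["wrb"]),
        })
        # Clock edge: every bundle moves one pipeline register onward.
        mem_wr, ex_mem, id_ex = ex_mem, id_ex, decoded
    return trace


trace = simulate(["lw"])
for offset, row in enumerate(trace):
    print(f"decode cycle + {offset}:", row)
```

In this model the Exec, Mem, and WrB bundles of the load show up 1, 2, and 3 cycles after its Reg/Dec cycle.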

5
A More Extensive Pipelining Example

End of Cycle 4: Load's Mem, R-type's Exec,
Store's Reg, Beq's Ifetch
End of Cycle 5: Load's WrB, R-type's Mem, Store's
Exec, Beq's Reg
End of Cycle 6: R-type's WrB, Store's Mem, Beq's
Exec
End of Cycle 7: Store's WrB, Beq's Mem
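
The cycle-by-cycle occupancy listed above follows from a simple rule: an instruction issued in cycle i is in stage (c - i) during cycle c. A small sketch, assuming the four instructions are issued in consecutive cycles 1 to 4 with no stalls (the names `STAGES`, `PROGRAM`, and `occupancy` are hypothetical):

```python
STAGES = ["Ifetch", "Reg", "Exec", "Mem", "WrB"]
PROGRAM = ["Load", "R-type", "Store", "Beq"]  # issued in cycles 1, 2, 3, 4


def occupancy(cycle, program=PROGRAM):
    """Map each in-flight instruction to the stage it occupies in `cycle`."""
    result = {}
    for i, name in enumerate(program):
        stage_index = cycle - (i + 1)  # 0 in the cycle the instruction is fetched
        if 0 <= stage_index < len(STAGES):
            result[name] = STAGES[stage_index]
    return result


for cycle in range(4, 8):
    print("End of cycle", cycle, occupancy(cycle))
```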

6
Pipelining Example: End of Cycle 4

0: Load's Mem; 4: R-type's Exec; 8: Store's
Reg; 12: Beq's Ifetch

7
Pipelining Example: End of Cycle 5

0: Lw's Wr; 4: R's Mem; 8: Store's Exec; 12:
Beq's Reg; 16: R's Ifetch

8
Pipelining Example: End of Cycle 6

4: R's Wr; 8: Store's Mem; 12: Beq's Exec; 16:
R's Reg; 20: R's Ifetch

9
Pipelining Example: End of Cycle 7

8: Store's Wr; 12: Beq's Mem; 16: R's Exec;
20: R's Reg; 24: R's Ifetch

10
CPU Designs Summary

Disadvantages of the Single Cycle Processor
Long cycle time
Cycle time wasted for the faster instructions
Multiple Clock Cycle Processor
Divide the instructions into smaller steps
Execute each step (instead of the entire
instruction) in 1 cycle
Pipelined Processor
Natural enhancement of the multiple clock cycle
processor
Each functional unit used only once per
instruction
If an instruction is going to use a functional
unit
it must use it at the same stage as all other
instructions
Pipeline Control
Each stage's control signals depend ONLY on the
instruction that is currently in that stage

11
Single Cycle vs. Multiple Cycle vs. Pipelined
[Figure: clock-by-clock comparison of the three designs. Single Cycle Implementation: Load, then Store with part of its cycle wasted (Cycles 1-2). Multiple Cycle Implementation: Load, Store, R-type in sequence across Cycles 1-10, each stepping through Ifetch, Reg, Exec, Mem, Wr as needed. Pipelined Implementation: Load, Store, R-type overlapped, each passing through Ifetch, Reg, Exec, Mem, Wr.]
12
Pipelining Notation, Terminology etc.

Time
Discrete time steps
Represented as 1, 2, 3, ...
Space
Pipe stages or segments (things that do
processing)
Represented as P, Q, R, S (or F, D, X, M, W for
the MIPS pipeline)
Operands
Instructions or data items
Things that flow through, and are processed by,
the pipeline
Represented as a, b, c, ...
In drawing pipelines, we conceal the obvious fact
that each operand undergoes some changes in each
pipe stage

13
Notations for Describing Pipelines

Space-time diagram,
or Gantt chart
Reservation table by stages
Rows represent pipeline
stages
Unbounded one way

Notation of HP3
Reservation table by
instructions
Rows represent operands
Unbounded both ways

14
Basic Terms

Filling a pipeline
Flushing or draining a pipeline
Stage or segment delay
Each stage may have a different stage delay
Beat time (= max stage delay), or clock cycle
time
Number of stages
End-to-end latency
= number of stages × beat time
Stages are separated by latches (registers)
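
As a quick illustration of the terms above, here is a sketch that computes beat time and end-to-end latency from per-stage delays. The delay values and function names are made up for the example, and latch overhead is ignored.

```python
def beat_time(stage_delays):
    """Beat time (clock cycle time) is set by the slowest stage."""
    return max(stage_delays)


def end_to_end_latency(stage_delays):
    """Latency of one operand through the pipeline: stages x beat time."""
    return len(stage_delays) * beat_time(stage_delays)


delays_ns = [2, 1, 2, 3, 1]  # hypothetical stage delays in ns
print("beat time:", beat_time(delays_ns), "ns")
print("end-to-end latency:", end_to_end_latency(delays_ns), "ns")
print("sum of stage delays:", sum(delays_ns), "ns")
```

With unequal stages, the pipelined latency (15 ns here) exceeds the plain sum of stage delays (9 ns), because every stage is clocked at the pace of the slowest one.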

15
Speedup and Throughput of a Pipeline
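
A sketch of the usual textbook model, under stated assumptions: k stages of equal beat time t, n operands, no stalls. The pipeline then needs (k + n - 1) beats, against n × k beats unpipelined. The function names are hypothetical, and this may not be the exact formulation the slide used.

```python
def pipelined_time(k, n, t):
    """k beats to fill the pipe for the first operand, then one per operand."""
    return (k + n - 1) * t


def speedup(k, n):
    """Unpipelined time (n * k beats) over pipelined time (k + n - 1 beats)."""
    return (n * k) / (k + n - 1)


def throughput(k, n, t):
    """Operands completed per unit time."""
    return n / pipelined_time(k, n, t)


for n in (1, 10, 100, 10_000):
    print(n, round(speedup(5, n), 3), round(throughput(5, n, 1.0), 3))
```

As n grows, the speedup approaches the number of stages k and the throughput approaches one operand per beat.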
16
Pipeline Hazards: Structural Hazard

A relation between two instructions indicating
that
the two instructions may want to use the same
hardware resource (function unit, register file
port, shared bus, cache port, etc.)
at the same time
In principle, eliminated by duplicating resources
Low hardware utilization
Increased cost
MIPS pipeline as designed so far does not have
structural hazard
But we had to avoid it (see example later)
Usually occurs when a functional unit is not
fully pipelined (e.g., in floating point pipeline)

17
Example: Unified I- and D-Memory
These diagrams are invalid: structural hazard on
single memory port
Pipeline diagrams with hazards resolved
18
Resolving Structural Hazards

Early resolution (scheduling)
Done well before the collision could occur, and
usually at a place different from where the
collision could happen
Example: instructions are delayed in the ID stage
Late resolution
Done at the place where the collision might
happen
Done just before the collision is about to happen
Example: using an arbiter or a priority encoder
One instruction wins
Others are denied access, stall, and wait for
their next chance
Why allow structural hazards in the first place?
Reduce cost
Reduce unit latency (by avoiding pipeline latch
delays)
Hazards may be infrequent (make common case
fast)

19
Example: Cost of Structural Hazard
Suppose that 40% of the instruction mix are loads or
stores, and that the ideal CPI of the pipelined
machine is 1. Assume that the machine with the
structural hazard has a clock rate that is 5%
higher than the clock rate of the machine
without the hazard. Which pipeline is faster,
and by how much?
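
One way to work this example, as a sketch: assume (this is not stated above) that every load or store on the machine with the hazard costs exactly one stall cycle. Average instruction time is then CPI × clock cycle time for each machine.

```python
def avg_instr_time(cpi, clock_cycle):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * clock_cycle


mem_fraction = 0.40      # loads and stores in the instruction mix
stall_cycles = 1         # ASSUMPTION: one stall per load/store
ideal_cpi = 1.0
cycle_no_hazard = 1.0    # normalised clock cycle of the hazard-free machine
cycle_hazard = cycle_no_hazard / 1.05  # clock rate 5% higher

t_no_hazard = avg_instr_time(ideal_cpi, cycle_no_hazard)
t_hazard = avg_instr_time(ideal_cpi + mem_fraction * stall_cycles, cycle_hazard)
ratio = t_hazard / t_no_hazard

print(f"hazard-free machine is {ratio:.2f}x faster")
```

Under that assumption the CPI with the hazard is 1.4, so the hazard-free machine comes out roughly 1.33 times faster despite its slower clock.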
20
Data Hazard Setup
D(u) = domain of instruction u: the set of
all memory locations, registers
(including implicit ones), flags, condition
codes, etc. that may be read by
instruction u
R(u) = range of instruction u: the set of
all memory locations, registers
(including implicit ones), flags, condition
codes, etc. that may be written by
instruction u

u < v is a relation that means that instruction
u precedes instruction v in the original program
order (i.e., on an unpipelined machine)
The relation < is irreflexive, anti-symmetric,
and transitive

[Figure: instruction u with D(u) and R(u); instructions u and v]
21
Data Hazard Definition
Given two instructions u and v, such that u < v,
there is a data hazard between them if any of the
following conditions holds
The existence of one of these conditions means
that a change in the order of reading/writing
operands by the instructions from the order seen
by sequentially executing instructions on
an unpipelined machine could violate the intended
semantics
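
A sketch of the usual set-based way to state these conditions in terms of D(u) and R(u): RAW when R(u) ∩ D(v) is non-empty, WAR when D(u) ∩ R(v) is non-empty, WAW when R(u) ∩ R(v) is non-empty. This formulation is assumed here, and the `Instr` type and its field names are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class Instr(NamedTuple):
    name: str
    reads: FrozenSet[str]   # D(u): locations the instruction may read
    writes: FrozenSet[str]  # R(u): locations the instruction may write


def data_hazards(u, v):
    """Hazard kinds between u and v, assuming u < v in program order."""
    kinds = set()
    if u.writes & v.reads:
        kinds.add("RAW")  # v reads something u writes
    if u.reads & v.writes:
        kinds.add("WAR")  # v writes something u reads
    if u.writes & v.writes:
        kinds.add("WAW")  # both write the same location
    return kinds


add = Instr("ADD R1, R2, R3", frozenset({"R2", "R3"}), frozenset({"R1"}))
sub = Instr("SUB R4, R5, R1", frozenset({"R5", "R1"}), frozenset({"R4"}))
mov = Instr("ADD R2, R6, R7", frozenset({"R6", "R7"}), frozenset({"R2"}))

print(data_hazards(add, sub))  # RAW on R1
print(data_hazards(add, mov))  # WAR on R2
```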
22
Why Data Hazards Occur

Pipelining changes relative timing of
instructions
Reads and writes occur at fixed positions of the
pipeline
So, if two instructions are too close (function
of pipeline structure), order of reads and writes
could change and produce incorrect values
This instruction sequence exchanges values in R1
and R2
On unpipelined MIPS, back-to-back execution of
sequence produces correct results
On current pipelined MIPS, initiation of sequence
in consecutive cycles produces incorrect results
Reads are early, writes are late, so RAW hazards
would be violated

XOR R2, R2, R1
XOR R1, R1, R2
XOR R2, R2, R1
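
The effect can be reproduced with a toy model. This is a sketch under simplifying assumptions: a 5-stage pipeline with no forwarding and no interlocks, registers read in stage 2 and written in stage 5, writes happening before reads within a cycle, and the XOR result computed from whatever was read in stage 2. The function names are hypothetical.

```python
PROGRAM = [("R2", "R2", "R1"), ("R1", "R1", "R2"), ("R2", "R2", "R1")]
READ_STAGE, WRITE_STAGE = 2, 5


def run_sequential(regs, program=PROGRAM):
    """Unpipelined execution: each XOR finishes before the next starts."""
    regs = dict(regs)
    for dst, a, b in program:
        regs[dst] = regs[a] ^ regs[b]
    return regs


def run_pipelined_no_interlock(regs, program=PROGRAM):
    """One instruction issued per cycle; reads early, writes late."""
    regs = dict(regs)
    pending = {}  # instruction index -> result waiting for its write stage
    last_cycle = len(program) + WRITE_STAGE - 1
    for cycle in range(1, last_cycle + 1):
        w = cycle - WRITE_STAGE  # instruction in its write stage this cycle
        if 0 <= w < len(program):
            regs[program[w][0]] = pending[w]
        r = cycle - READ_STAGE   # instruction in its read stage this cycle
        if 0 <= r < len(program):
            _, a, b = program[r]
            pending[r] = regs[a] ^ regs[b]
    return regs


start = {"R1": 5, "R2": 9}
print(run_sequential(start))              # values exchanged
print(run_pipelined_no_interlock(start))  # all three XORs read stale values
```

In the pipelined run all three reads happen before the first write, so both registers end up holding R1 XOR R2 instead of being swapped.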
23
Data Dependence and Hazards

True (value, flow) dependence between
instructions u and v means u produces a result
value that v uses
This is a producer-consumer relationship
This is a dependence based on values, not on the
names of the containers of the values
Every true dependence is a RAW hazard
Not every RAW hazard is a true dependence
Any RAW hazard that cannot be removed by renaming
is a true dependence

Original program:
1: A = B+C
2: A = D+E
3: G = A+H
True dependence: (2,3). RAW hazards: (1,3), (2,3)

Renamed program:
1: X = B+C
2: A = D+E
3: G = A+H
True dependence: (2,3). RAW hazard: (2,3)
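
The two listings can be checked mechanically. In this sketch a RAW hazard is any earlier writer of a name that a later instruction reads, while a true dependence links a read only to the most recent writer of that name. The program encoding (destination, set of sources) and the function names are hypothetical, and the operators are left out because only the names matter.

```python
def raw_hazards(program):
    """Pairs (i, j), i < j, where instruction j reads a name that i writes."""
    pairs = []
    for i, (dst_i, _) in enumerate(program, start=1):
        for j, (_, srcs_j) in enumerate(program, start=1):
            if i < j and dst_i in srcs_j:
                pairs.append((i, j))
    return pairs


def true_dependences(program):
    """Pairs (i, j) where i is the most recent writer of a name j reads."""
    last_writer = {}
    pairs = []
    for j, (dst, srcs) in enumerate(program, start=1):
        for name in sorted(srcs):
            if name in last_writer:
                pairs.append((last_writer[name], j))
        last_writer[dst] = j
    return pairs


original = [("A", {"B", "C"}), ("A", {"D", "E"}), ("G", {"A", "H"})]
renamed = [("X", {"B", "C"}), ("A", {"D", "E"}), ("G", {"A", "H"})]

for label, prog in (("original", original), ("renamed", renamed)):
    print(label, "RAW:", raw_hazards(prog), "true:", true_dependences(prog))
```

Renaming the destination of instruction 1 removes the (1,3) RAW hazard, while the (2,3) pair survives because it carries an actual value.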
24
More on Hazards

RAW hazards corresponding to value dependences
are most difficult to deal with, since they can
never be eliminated
The second instruction is waiting for information
produced by the first instruction
WAR and WAW hazards are name dependences
Two instructions happen to use the same register
(name), although they don't have to
Can often be eliminated by renaming, either in
software or hardware
Implies the use of additional resources, hence
additional cost
Renaming is not always possible: implicit
operands such as accumulator, PC, or condition
codes cannot be renamed
These hazards don't cause problems for MIPS
pipeline
Relative timing does not change even with
pipelined execution, because reads occur early
and writes occur late in pipeline

25
The Precedence Relation

Consider a straight line program in original
program order
Define a relation D (the dependence relation)
between pairs of instructions (u, v) as follows
D(u, v) if and only if (u < v), and there is a
WAR, WAW, or RAW hazard between instructions u
and v
D is irreflexive and anti-symmetric but not
transitive
Define the precedence relation P as the
transitive closure of the dependence relation D
P is irreflexive, anti-symmetric, and transitive
Represent P by the graph of its transitive reduction:
the precedence graph
If P(u,v), then u must precede v in execution:
the two instructions cannot be interchanged, and
in a pipeline they must maintain a sufficient
distance

ADD R4, R5, R6
ADD R3, R4, R5
ADD R2, R3, R7
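
For the three ADDs, D and P can be computed directly. This is a sketch with a hypothetical encoding (destination register, set of source registers) that considers register hazards only; the function names are illustrative.

```python
from itertools import product

# (destination, sources) for the three ADD instructions, numbered 1..3
program = [
    ("R4", {"R5", "R6"}),
    ("R3", {"R4", "R5"}),
    ("R2", {"R3", "R7"}),
]


def dependence_relation(program):
    """D(u, v): u < v and there is a RAW, WAR, or WAW hazard between them."""
    D = set()
    for u, v in product(range(len(program)), repeat=2):
        if u < v:
            (wu, ru), (wv, rv) = program[u], program[v]
            raw = wu in rv
            war = wv in ru
            waw = wu == wv
            if raw or war or waw:
                D.add((u + 1, v + 1))
    return D


def transitive_closure(rel):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(rel)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


D = dependence_relation(program)
P = transitive_closure(D)
print("D =", sorted(D))
print("P =", sorted(P))
```

Here D holds (1,2) and (2,3) but not (1,3), which shows that D is not transitive; the closure P adds (1,3).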
26
Example of Precedence Relation
1: ADD R1, R7, R8
2: SW 2000(R9), R8
3: LW R3, 0(R1)
4: LW R4, 3000(R9)
5: ADD R5, R3, R4
6: MUL R6, R5, R5
[Figure: precedence graph on nodes 1 to 6]
Assume that registers R7, R8, R9 are already
initialized such that (R7) + (R8) = (R9) + 2000 holds
27
Data Hazard Effect on Pipelining
1: ADD R1, R2, R3
2: SUB R4, R5, R1
3: AND R6, R1, R7
4: OR R8, R1, R9
5: XOR R10, R1, R11
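
A sketch of which consumers of R1 would read a stale value if nothing were done about the hazard. Assumptions for illustration: a 5-stage pipeline, registers read in stage 2 and written in stage 5, one instruction issued per cycle, no forwarding, and a register file that writes before it reads within a cycle. The names are hypothetical.

```python
READ_STAGE, WRITE_STAGE = 2, 5

# (destination, sources) for instructions 1..5
program = [
    ("R1", {"R2", "R3"}),    # 1: ADD
    ("R4", {"R5", "R1"}),    # 2: SUB
    ("R6", {"R1", "R7"}),    # 3: AND
    ("R8", {"R1", "R9"}),    # 4: OR
    ("R10", {"R1", "R11"}),  # 5: XOR
]


def stale_readers(program, producer=0):
    """Instruction numbers that read the producer's result before it is written."""
    dst = program[producer][0]
    write_cycle = (producer + 1) + WRITE_STAGE - 1
    stale = []
    for i in range(producer + 1, len(program)):
        _, srcs = program[i]
        read_cycle = (i + 1) + READ_STAGE - 1
        if dst in srcs and read_cycle < write_cycle:
            stale.append(i + 1)
    return stale


print(stale_readers(program))  # instructions that would see the old R1
```

Under these assumptions only SUB and AND are affected; OR reads in the same cycle as the write, and XOR reads one cycle later.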
28
Value Forwarding/Bypassing

There is slack in how soon a value is actually
available and how late it is actually required in
the pipeline
Result of R-type available at end of X stage
Operand of dependent R-type not needed until
beginning of X stage
Communication of values among instructions
happens through register file
Globally known names of containers of values
Accessed at fixed stages of pipeline (read in D,
written in W)
Forwarding/bypassing/short-circuiting corresponds
to establishing a direct path between the
producer of a value and its consumer, bypassing
the container
Allows us to exploit slack
Requires additional resources (forwarding paths
and controller)
Identify all forwarding paths needed on MIPS
(Figure in book is incomplete)

29
Forwarding Example 2
1: ADD R1, R2, R3
2: LW R4, 0(R1)
3: SW 12(R1), R4
30
Forwarding and Stalling Example 3
L1: LW R2, 40(R8)
L2: LW R3, 60(R8)
A: ADD R4, R2, R3
S: SW 60(R8), R4

Load has a latency of one cycle that cannot
be hidden, as seen between L2 and A
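
The one-cycle load latency can be counted with a simple check on adjacent instructions. This is a sketch that assumes full forwarding, so the only stall is a load immediately followed by an instruction that reads the loaded register; the tuple encoding and `load_use_stalls` are hypothetical.

```python
def load_use_stalls(program):
    """Count stalls caused by a load immediately followed by a consumer."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        op, dst, _ = prev
        if op == "LW" and dst in curr[2]:
            stalls += 1
    return stalls


# (opcode, destination, sources); a store has no destination register
example3 = [
    ("LW", "R2", {"R8"}),          # L1
    ("LW", "R3", {"R8"}),          # L2
    ("ADD", "R4", {"R2", "R3"}),   # A
    ("SW", None, {"R8", "R4"}),    # S
]

print(load_use_stalls(example3))
```

The single stall comes from the L2/A pair; L1's result is far enough ahead of A to be forwarded, and A's result reaches S through a forwarding path.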

31
Forwarding and Stalling Example 4
L: LW R1, 0(R1)
S: SUB R4, R1, R5
A: AND R6, R1, R7
O: OR R8, R1, R9
No forwarding needed from L to A; can resolve
this by writing the register file in the first half of
the cycle and reading it in the second half.
32
Load Data Forwarding
[Figure: forwarding datapath. Labels shown: pipeline registers ID/EX, EX/MEM, MEM/WB; Registers; Data Memory; Forwarding Unit with Rs, Rt, Rd; Forward A and Forward B.]
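
A sketch of the selection logic a forwarding unit of this kind typically implements, following the common textbook formulation rather than anything stated on the slide. The dictionary fields `RegWr` and `Rd`, the return labels, and the function names are assumptions for illustration.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where one ALU operand of the instruction in EX comes from.

    ex_mem and mem_wb describe the two older instructions: whether each
    writes a register (RegWr) and which one (Rd). Register 0 is never
    forwarded because it always reads as zero on MIPS.
    """
    if ex_mem["RegWr"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src_reg:
        return "EX/MEM"   # newest value wins
    if mem_wb["RegWr"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == src_reg:
        return "MEM/WB"
    return "REGFILE"      # no hazard: use the value read in decode


def forwarding_unit(rs, rt, ex_mem, mem_wb):
    """Compute both operand selects for the instruction currently in EX."""
    return {
        "ForwardA": forward_select(rs, ex_mem, mem_wb),
        "ForwardB": forward_select(rt, ex_mem, mem_wb),
    }


# SUB R4, R5, R1 in EX, right behind ADD R1, R2, R3 (now in EX/MEM)
selects = forwarding_unit(
    rs=5, rt=1,
    ex_mem={"RegWr": 1, "Rd": 1},
    mem_wb={"RegWr": 0, "Rd": 0},
)
print(selects)
```

Checking EX/MEM before MEM/WB makes the most recently produced value win when both older instructions write the same register.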
