COMP 206: Computer Architecture and Implementation - PowerPoint PPT Presentation

Provided by: Montek5
Learn more at: http://www.cs.unc.edu

1
COMP 206: Computer Architecture and Implementation
  • Montek Singh
  • Wed., Sep 15, 2004 & Mon., Sep 20, 2004
  • Topic: Pipelining (Intermediate Concepts)

2
Outline
  • Pipelining basics (contd.)
  • Pipelining example
  • Pipelining notation and terminology
  • Hazards
    • Structural hazards
    • Data hazards
    • Hazard resolution
  • Reading: Appendix A (HP3)

3
How About Control Signals?
  • Key observation: control signals at stage N =
    Func(instruction at stage N), for N = Exec, Mem, or
    WrB
  • Control signals at Exec stage = Func(Load's Exec)
  • What about Ifetch and Reg/Dec?

[Figure: pipelined datapath with the Ifetch, Reg/Dec, Exec, Mem, and Wr stages separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers. A load is shown in the Exec stage with ExtOp=1, ALUSrc=1, ALUOp=Add, and RegDst=0. The control signals Branch, MemWr, MemtoReg, and RegWr appear in the later stages.]
4
Pipeline Control
  • Main Control generates control signals during
    Reg/Dec
  • Control signals for Exec (ExtOp, ALUSrc, ...) are
    used 1 cycle later
  • Control signals for Mem (MemWr, Branch) are used
    2 cycles later
  • Control signals for WrB (MemtoReg, RegWr) are used
    3 cycles later
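The slide's timing can be checked with a small simulation. In this minimal sketch, main control decodes each instruction in Reg/Dec. The resulting control bundle then shifts through the ID/Ex, Ex/Mem, and Mem/Wr registers, and each stage consumes only its own fields. The per-opcode signal values in `CONTROL_TABLE`, the helper names, and the one-fetch-per-cycle, no-stall timing are all illustrative assumptions.

```python
# Sketch: control bundles decoded in Reg/Dec travel down the pipeline registers.
# Signal names follow the slides; the per-opcode values are illustrative assumptions.
EXEC_FIELDS = ("ExtOp", "ALUSrc", "ALUOp", "RegDst")
MEM_FIELDS = ("MemWr", "Branch")
WRB_FIELDS = ("MemtoReg", "RegWr")

CONTROL_TABLE = {  # hypothetical decode table; don't-care bits are set to 0
    "lw":  dict(ExtOp=1, ALUSrc=1, ALUOp="Add", RegDst=0, MemWr=0, Branch=0, MemtoReg=1, RegWr=1),
    "sw":  dict(ExtOp=1, ALUSrc=1, ALUOp="Add", RegDst=0, MemWr=1, Branch=0, MemtoReg=0, RegWr=0),
    "add": dict(ExtOp=0, ALUSrc=0, ALUOp="Add", RegDst=1, MemWr=0, Branch=0, MemtoReg=0, RegWr=1),
}


def pick(signals, fields):
    """Keep only the named control fields."""
    return {name: signals[name] for name in fields}


def run(program):
    """Clock the pipeline, fetching one instruction per cycle starting in cycle 1.

    Returns (cycle, stage, opcode, signals) for every cycle in which a stage
    consumes control signals from the pipeline register in front of it.
    """
    id_ex = ex_mem = mem_wr = None  # each holds (opcode, remaining control signals)
    log = []
    for cycle in range(1, len(program) + 5):
        # Each stage uses only the bundle latched in front of it.
        if id_ex:
            log.append((cycle, "Exec", id_ex[0], pick(id_ex[1], EXEC_FIELDS)))
        if ex_mem:
            log.append((cycle, "Mem", ex_mem[0], pick(ex_mem[1], MEM_FIELDS)))
        if mem_wr:
            log.append((cycle, "WrB", mem_wr[0], pick(mem_wr[1], WRB_FIELDS)))
        # Instruction i is fetched in cycle i + 1 and decoded in cycle i + 2.
        i = cycle - 2
        decoded = None
        if 0 <= i < len(program):
            decoded = (program[i], CONTROL_TABLE[program[i]])
        # Clock edge: shift the bundles, dropping fields that are no longer needed.
        mem_wr = (ex_mem[0], pick(ex_mem[1], WRB_FIELDS)) if ex_mem else None
        ex_mem = (id_ex[0], pick(id_ex[1], MEM_FIELDS + WRB_FIELDS)) if id_ex else None
        id_ex = decoded
    return log


for entry in run(["lw"]):
    print(entry)
```

For a single `lw` decoded in cycle 2, the log shows the Exec fields used in cycle 3, the Mem fields in cycle 4, and the WrB fields in cycle 5. These are 1, 2, and 3 cycles after decode, as the slide states.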

5
A More Extensive Pipelining Example
  • End of Cycle 4: Load's Mem, R-type's Exec,
    Store's Reg, Beq's Ifetch
  • End of Cycle 5: Load's WrB, R-type's Mem, Store's
    Exec, Beq's Reg
  • End of Cycle 6: R-type's WrB, Store's Mem, Beq's
    Exec
  • End of Cycle 7: Store's WrB, Beq's Mem
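The per-cycle snapshots follow from a simple rule, sketched below. Assume instruction i (0-based) enters Ifetch in cycle i + 1 and that nothing stalls, as in this example. Then at the end of cycle c it occupies stage c - 1 - i. The function name `occupancy` is hypothetical.

```python
STAGES = ["Ifetch", "Reg", "Exec", "Mem", "WrB"]


def occupancy(program, cycle):
    """Stage held by each instruction at the end of `cycle`.

    Assumes one instruction enters Ifetch per cycle (instruction i in cycle i + 1)
    and no stalls.
    """
    where = {}
    for i, name in enumerate(program):
        stage = cycle - 1 - i
        if 0 <= stage < len(STAGES):
            where[name] = STAGES[stage]
    return where


program = ["Load", "R-type", "Store", "Beq"]
for cycle in range(4, 8):
    print(cycle, occupancy(program, cycle))
```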

6
Pipelining Example: End of Cycle 4
  • 0: Load's Mem; 4: R-type's Exec; 8: Store's
    Reg; 12: Beq's Ifetch

7
Pipelining Example: End of Cycle 5
  • 0: Lw's Wr; 4: R's Mem; 8: Store's Exec; 12:
    Beq's Reg; 16: R's Ifetch

8
Pipelining Example: End of Cycle 6
  • 4: R's Wr; 8: Store's Mem; 12: Beq's Exec; 16:
    R's Reg; 20: R's Ifetch

9
Pipelining Example: End of Cycle 7
  • 8: Store's Wr; 12: Beq's Mem; 16: R's Exec;
    20: R's Reg; 24: R's Ifetch

10
CPU Designs Summary
  • Disadvantages of the Single Cycle Processor
    • Long cycle time
    • Cycle time wasted for the faster instructions
  • Multiple Clock Cycle Processor
    • Divide the instructions into smaller steps
    • Execute each step (instead of the entire
      instruction) in 1 cycle
  • Pipelined Processor
    • Natural enhancement of the multiple clock cycle
      processor
    • Each functional unit used only once per
      instruction
    • If an instruction is going to use a functional
      unit, it must use it at the same stage as all
      other instructions
    • Pipeline Control: each stage's control signal
      depends ONLY on the instruction that is
      currently in that stage

11
Single Cycle vs. Multiple Cycle vs. Pipelined
[Figure: timing diagrams comparing three implementations.
Single Cycle Implementation: Load and Store each take one long clock cycle, with wasted time in the Store cycle.
Multiple Cycle Implementation: over cycles 1-10, Load, Store, and R-type proceed one step per cycle through Ifetch, Reg, Exec, Mem, Wr.
Pipelined Implementation: Load, Store, and R-type overlap, each passing through Ifetch, Reg, Exec, Mem, Wr one cycle behind the previous instruction.]
12
Pipelining Notation, Terminology etc.
  • Time
    • Discrete time steps
    • Represented as 1, 2, 3, ...
  • Space
    • Pipe stages or segments (things that do
      processing)
    • Represented as P, Q, R, S (or F, D, X, M, W for
      the MIPS pipeline)
  • Operands
    • Instructions or data items
    • Things that flow through, and are processed by,
      the pipeline
    • Represented as a, b, c, ...
  • In drawing pipelines, we conceal the obvious fact
    that each operand undergoes some changes in each
    pipe stage

13
Notations for Describing Pipelines
  • Space-time diagram, or Gantt chart
  • Reservation table by stages
    • Rows represent pipeline stages
    • Unbounded one way
  • Notation of HP3: reservation table by instructions
    • Rows represent operands
    • Unbounded both ways
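Both reservation tables can be generated mechanically for an ideal pipeline. This sketch assumes one operand enters per time step and there are no stalls. In the by-stages table the rows stay fixed at five and only the columns grow with time. In the by-instructions table, adding operands adds both rows and columns. The function names are hypothetical, and operands are lettered a, b, c, ..., so the sketch assumes at most 26 of them.

```python
STAGES = ["F", "D", "X", "M", "W"]


def by_stages(n_operands):
    """One row per stage: each cell names the operand in that stage at time t."""
    t_max = n_operands + len(STAGES) - 1
    rows = []
    for s, stage in enumerate(STAGES):
        cells = []
        for t in range(1, t_max + 1):
            i = t - 1 - s  # operand index occupying stage s at time t
            cells.append(chr(ord("a") + i) if 0 <= i < n_operands else ".")
        rows.append(stage + " " + " ".join(cells))
    return rows


def by_instructions(n_operands):
    """One row per operand: each cell names the stage the operand is in at time t."""
    t_max = n_operands + len(STAGES) - 1
    rows = []
    for i in range(n_operands):
        cells = []
        for t in range(1, t_max + 1):
            s = t - 1 - i  # stage index of operand i at time t
            cells.append(STAGES[s] if 0 <= s < len(STAGES) else ".")
        rows.append(chr(ord("a") + i) + " " + " ".join(cells))
    return rows


print("\n".join(by_stages(3)))
print()
print("\n".join(by_instructions(3)))
```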

14
Basic Terms
  • Filling a pipeline
  • Flushing or draining a pipeline
  • Stage or segment delay
    • Each stage may have a different stage delay
  • Beat time (= max stage delay), or clock cycle
    time
  • Number of stages
  • End-to-end latency
    • = number of stages × beat time
  • Stages are separated by latches (registers)
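Here is a direct transcription of the two definitions. The stage delays are hypothetical, and latch (register) overhead is ignored, which is a simplifying assumption of this sketch.

```python
def pipeline_timing(stage_delays):
    """Return (beat time, end-to-end latency) for the given per-stage delays.

    beat time = max stage delay; latency = number of stages x beat time.
    Latch overhead is ignored (an assumption of this sketch).
    """
    beat = max(stage_delays)
    latency = len(stage_delays) * beat
    return beat, latency


delays_ns = [2.0, 1.5, 2.0, 2.0, 1.0]  # hypothetical, unbalanced 5-stage pipeline
beat, latency = pipeline_timing(delays_ns)
print(beat, latency, sum(delays_ns))  # 2.0 10.0 8.5
```

With unbalanced stages, the latency (10.0 ns) exceeds the sum of the stage delays (8.5 ns), because every stage is padded out to the beat.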

15
Speedup & Throughput of a Pipeline
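The formulas on this slide did not survive in the transcript. The sketch below instead assumes the usual textbook model: k equal stages, beat time τ, no latch overhead, no stalls, and an unpipelined unit that takes kτ per operand. Under that model, n operands take (k + n - 1)τ, the speedup is nk / (k + n - 1), which approaches k, and the throughput approaches 1/τ.

```python
from fractions import Fraction


def pipelined_time(k, n, beat):
    """n operands through k stages: k beats to fill, then one result per beat."""
    return (k + n - 1) * beat


def speedup(k, n):
    """Speedup over an unpipelined unit assumed to take k * beat per operand."""
    return Fraction(n * k, k + n - 1)


def throughput(k, n, beat):
    """Operands completed per unit time."""
    return Fraction(n) / pipelined_time(k, n, beat)


print(speedup(5, 1), speedup(5, 100), speedup(5, 10**6) < 5)  # 1 125/26 True
```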
16
Pipeline Hazards: Structural Hazard
  • A relation between two instructions indicating
    that the two instructions may want to use the
    same hardware resource (function unit, register
    file port, shared bus, cache port, etc.) at the
    same time
  • In principle, can always be eliminated by
    duplicating resources
    • Low hardware utilization
    • Increased cost
  • MIPS pipeline as designed so far does not have a
    structural hazard
    • But we had to avoid it (see example later)
  • Usually occurs when a functional unit is not
    fully pipelined (e.g., in floating point pipeline)

17
Example: Unified I- and D-Memory
These diagrams are invalid: structural hazard on
single memory port
Pipeline diagrams with hazards resolved
18
Resolving Structural Hazards
  • Early resolution (scheduling)
    • Done well before the collision could occur, and
      usually at a place different from where the
      collision could happen
    • Example: instructions are delayed in the ID stage
  • Late resolution
    • Done at the place where the collision might
      happen
    • Done just before the collision is about to happen
    • Example: using an arbiter or a priority encoder
      • One instruction wins
      • Others are denied access, stall, and wait for
        their next chance
  • Why allow structural hazards in the first place?
    • Reduce cost
    • Reduce unit latency (by avoiding pipeline latch
      delays)
    • Hazards may be infrequent (make common case
      fast)
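Late resolution on the unified, single-ported memory can be sketched with a simple priority arbiter. Whenever a load or store in M needs the port, it wins, and instruction fetch waits one cycle, which leaves a bubble in the pipeline. The sketch assumes one-cycle F, D, X, M, W stages and no other hazards. `MEMORY_OPS` and `fetch_cycles` are hypothetical names.

```python
MEMORY_OPS = {"lw", "sw"}


def fetch_cycles(program):
    """Cycle in which each instruction is fetched from a single-ported memory.

    An instruction fetched in cycle c is in M in cycle c + 3. When a load/store
    in M needs the port, the arbiter grants it and the fetch waits.
    """
    fetched = []
    cycle = 0
    while len(fetched) < len(program):
        cycle += 1
        port_busy = any(
            program[j] in MEMORY_OPS and c + 3 == cycle
            for j, c in enumerate(fetched)
        )
        if not port_busy:
            fetched.append(cycle)  # fetch gets the port this cycle
    return fetched


print(fetch_cycles(["lw", "add", "add", "add", "add"]))  # [1, 2, 3, 5, 6]
```

The fetch in cycle 4 is denied because the `lw` fetched in cycle 1 occupies the port in its M stage.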

19
Example: Cost of Structural Hazard
Suppose that 40% of the instruction mix are loads or
stores, and that the ideal CPI of the pipelined
machine is 1. Assume that the machine with the
structural hazard has a clock rate that is 5%
higher than the clock rate of the machine
without the hazard. Which pipeline is faster,
and by how much?
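One way to work the example is to compare average time per instruction, which is CPI × clock period. This sketch assumes each load or store costs exactly one stall cycle, because its data access collides with an instruction fetch. It also normalizes the hazard-free machine's clock period to 1.

```python
from fractions import Fraction

mem_fraction = Fraction(40, 100)    # loads + stores in the instruction mix
stall_per_mem_op = 1                # assumption: one stall cycle per data access
clock_speedup = Fraction(105, 100)  # hazard machine's clock rate is 5% higher

# Average time per instruction = CPI * clock period (hazard-free period = 1)
time_no_hazard = Fraction(1) * 1
time_hazard = (1 + mem_fraction * stall_per_mem_op) * (1 / clock_speedup)

ratio = time_hazard / time_no_hazard
print(ratio, float(ratio))  # 4/3 1.3333333333333333
```

Under these assumptions, the machine without the structural hazard is about 1.33 times faster. The 5% faster clock does not make up for a CPI of 1.4.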
20
Data Hazard Setup
D(u): domain of instruction u, the set of
all memory locations, registers
(including implicit ones), flags, condition
codes, etc. that may be read by
instruction u
R(u): range of instruction u, the set of
all memory locations, registers
(including implicit ones), flags, condition
codes, etc. that may be written by
instruction u
  • u < v is a relation meaning that instruction
    u precedes instruction v in the original program
    order (i.e., on an unpipelined machine)
  • The relation < is irreflexive, anti-symmetric,
    and transitive
21
Data Hazard Definition
Given two instructions u and v, such that u < v,
there is a data hazard between them if any of the
following conditions holds:
  • R(u) ∩ D(v) ≠ ∅ (RAW: read after write)
  • D(u) ∩ R(v) ≠ ∅ (WAR: write after read)
  • R(u) ∩ R(v) ≠ ∅ (WAW: write after write)
The existence of one of these conditions means
that changing the order in which the instructions
read and write operands, relative to the order seen
when the instructions execute sequentially on
an unpipelined machine, could violate the intended
semantics.
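The three hazard conditions reduce to set intersections on each instruction's domain and range. This sketch assumes the standard RAW/WAR/WAW conditions stated in terms of D(u) and R(u). The example instruction pair is hypothetical.

```python
def hazard_kinds(u, v):
    """u, v: (domain, range) pairs of sets, with u preceding v in program order."""
    (d_u, r_u), (d_v, r_v) = u, v
    kinds = set()
    if r_u & d_v:
        kinds.add("RAW")  # v reads something u writes
    if d_u & r_v:
        kinds.add("WAR")  # v writes something u reads
    if r_u & r_v:
        kinds.add("WAW")  # both write the same location
    return kinds


add = ({"R2", "R3"}, {"R1"})  # ADD R1, R2, R3
sub = ({"R1", "R4"}, {"R2"})  # SUB R2, R1, R4 (hypothetical example)
print(sorted(hazard_kinds(add, sub)))  # ['RAW', 'WAR']
```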
22
Why Data Hazards Occur
  • Pipelining changes relative timing of
    instructions
    • Reads and writes occur at fixed positions of the
      pipeline
    • So, if two instructions are too close (a function
      of pipeline structure), the order of reads and
      writes could change and produce incorrect values
  • This instruction sequence exchanges values in R1
    and R2
    • On unpipelined MIPS, back-to-back execution of
      the sequence produces correct results
    • On current pipelined MIPS, initiation of the
      sequence in consecutive cycles produces incorrect
      results
    • Reads are early and writes are late, so the order
      required by the RAW hazards is violated

XOR R2, R2, R1
XOR R1, R1, R2
XOR R2, R2, R1
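The failure can be reproduced by replaying the register-file reads and writes in cycle order. This sketch assumes no forwarding and one instruction issued per cycle. Under these assumptions, instruction i reads in ID in cycle i + 2 and writes in WB in cycle i + 5, and a register is written in the first half of a cycle and read in the second half. The register values 5 and 9 are arbitrary.

```python
def run_sequential(program, regs):
    """Unpipelined execution: each XOR completes before the next starts."""
    regs = dict(regs)
    for dst, a, b in program:
        regs[dst] = regs[a] ^ regs[b]
    return regs


def run_pipelined_no_forwarding(program, regs):
    """Replay register-file accesses in time order: (cycle, half, instr, kind).

    Half 0 (write) sorts before half 1 (read), so a read sees writes from the
    same or earlier cycles only.
    """
    regs = dict(regs)
    events = []
    for i in range(len(program)):
        events.append((i + 2, 1, i, "read"))   # ID stage
        events.append((i + 5, 0, i, "write"))  # WB stage
    operands = {}
    for _cycle, _half, i, kind in sorted(events):
        dst, a, b = program[i]
        if kind == "read":
            operands[i] = (regs[a], regs[b])
        else:
            x, y = operands[i]
            regs[dst] = x ^ y
    return regs


swap = [("R2", "R2", "R1"), ("R1", "R1", "R2"), ("R2", "R2", "R1")]
start = {"R1": 5, "R2": 9}
print(run_sequential(swap, start))               # {'R1': 9, 'R2': 5}
print(run_pipelined_no_forwarding(swap, start))  # {'R1': 12, 'R2': 12}
```

All three reads happen before the first write, so every instruction sees the original values, and the swap fails.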
23
Data Dependence and Hazards
  • True (value, flow) dependence between
    instructions u and v means u produces a result
    value that v uses
    • This is a producer-consumer relationship
    • This is a dependence based on values, not on the
      names of the containers of the values
  • Every true dependence is a RAW hazard
  • Not every RAW hazard is a true dependence
  • Any RAW hazard that cannot be removed by renaming
    is a true dependence

Original program: 1: A = B op C   2: A = D op E   3: G = A op H
  True dependence: (2,3)   RAW hazards: (1,3), (2,3)
Renamed program: 1: X = B op C   2: A = D op E   3: G = A op H
  True dependence: (2,3)   RAW hazard: (2,3)
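The distinction between RAW hazards and true dependences can be computed directly. A RAW hazard exists with every earlier writer of a name that v reads. The true dependence is only with the most recent such writer, because that writer produces the value v actually consumes. This sketch works on register-like names only, and the function name is hypothetical.

```python
def raw_and_true(program):
    """program: list of (dest, set of sources); results use 1-based numbering."""
    raw, true = set(), set()
    for v, (_, sources) in enumerate(program):
        for u in range(v):
            if program[u][0] in sources:
                raw.add((u + 1, v + 1))  # an earlier write to a name v reads
        for name in sources:
            writers = [u for u in range(v) if program[u][0] == name]
            if writers:
                true.add((writers[-1] + 1, v + 1))  # the value v actually consumes
    return raw, true


original = [("A", {"B", "C"}), ("A", {"D", "E"}), ("G", {"A", "H"})]
renamed = [("X", {"B", "C"}), ("A", {"D", "E"}), ("G", {"A", "H"})]
print(raw_and_true(original))
print(raw_and_true(renamed))
```

Renaming the first destination to X removes the spurious RAW hazard (1,3) and leaves only the true dependence (2,3).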
24
More on Hazards
  • RAW hazards corresponding to value dependences
    are the most difficult to deal with, since they
    can never be eliminated
    • The second instruction is waiting for information
      produced by the first instruction
  • WAR and WAW hazards are name dependences
    • Two instructions happen to use the same register
      (name), although they don't have to
    • Can often be eliminated by renaming, either in
      software or hardware
      • Implies the use of additional resources, hence
        additional cost
      • Renaming is not always possible: implicit
        operands such as accumulator, PC, or condition
        codes cannot be renamed
    • These hazards don't cause problems for the MIPS
      pipeline
      • Relative timing does not change even with
        pipelined execution, because reads occur early
        and writes occur late in the pipeline
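Renaming can be sketched as giving every write a fresh name and redirecting later reads to the newest version. After renaming, no name is written twice, so no WAW hazards remain. No instruction writes a name that an earlier instruction reads, so no WAR hazards remain either. The RAW (true) dependence survives. The fresh names `P0`, `P1`, ... are hypothetical, and mapping results back to architectural registers is omitted.

```python
def rename(program):
    """Give every write a fresh name so that no name is written twice.

    program: list of (dest, [sources]). Fresh names P0, P1, ... are hypothetical.
    """
    latest = {}  # architectural name -> fresh name holding its current value
    renamed = []
    for k, (dest, sources) in enumerate(program):
        new_sources = [latest.get(s, s) for s in sources]  # read the newest version
        fresh = f"P{k}"
        latest[dest] = fresh
        renamed.append((fresh, new_sources))
    return renamed


program = [
    ("R1", ["R2", "R3"]),  # 1: writes R1
    ("R2", ["R1", "R4"]),  # 2: RAW on R1 with 1; WAR on R2 with 1
    ("R1", ["R5", "R6"]),  # 3: WAW on R1 with 1; WAR on R1 with 2
]
for instr in rename(program):
    print(instr)
```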

25
The Precedence Relation
  • Consider a straight-line program in original
    program order
  • Define a relation D (the dependence relation)
    between pairs of instructions (u, v) as follows:
    • D(u, v) if and only if (u < v), and there is a
      WAR, WAW, or RAW hazard between instructions u
      and v
    • D is irreflexive and anti-symmetric but not
      transitive
  • Define the precedence relation P as the
    transitive closure of the dependence relation D
    • P is irreflexive, anti-symmetric, and transitive
    • Represent P by the graph of its transitive
      reduction: the precedence graph
  • If P(u,v), then u must precede v in execution:
    the two instructions cannot be interchanged, and
    in a pipeline they must maintain a sufficient
    distance

ADD R4, R5, R6
ADD R3, R4, R5
ADD R2, R3, R7
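For the three ADDs, D, its transitive closure P, and the transitive reduction of P can all be computed directly. The result shows why D is not transitive. Instructions 1 and 3 share no register, so (1,3) is not in D, but (1,3) is in P through instruction 2. The sketch tracks register names only, and the function names are hypothetical.

```python
from itertools import product


def dependence(program):
    """D(u, v): u < v and a RAW, WAR, or WAW hazard on a register (1-based)."""
    D = set()
    for v, (dest_v, src_v) in enumerate(program):
        for u in range(v):
            dest_u, src_u = program[u]
            if dest_u in src_v or dest_v in src_u or dest_u == dest_v:
                D.add((u + 1, v + 1))
    return D


def transitive_closure(rel):
    P = set(rel)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(P, P) if b == c} - P
        if not new:
            return P
        P |= new


def transitive_reduction(P):
    nodes = {x for pair in P for x in pair}
    return {(a, b) for (a, b) in P
            if not any((a, c) in P and (c, b) in P for c in nodes)}


program = [("R4", {"R5", "R6"}),  # ADD R4, R5, R6
           ("R3", {"R4", "R5"}),  # ADD R3, R4, R5
           ("R2", {"R3", "R7"})]  # ADD R2, R3, R7
D = dependence(program)
P = transitive_closure(D)
print(sorted(D), sorted(P), sorted(transitive_reduction(P)))
# [(1, 2), (2, 3)] [(1, 2), (1, 3), (2, 3)] [(1, 2), (2, 3)]
```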
26
Example of Precedence Relation
1: ADD R1, R7, R8
2: SW 2000(R9), R8
3: LW R3, 0(R1)
4: LW R4, 3000(R9)
5: ADD R5, R3, R4
6: MUL R6, R5, R5
[Figure: precedence graph over instructions 1-6]
Assume that registers R7, R8, R9 are already
initialized such that (R7) + (R8) = (R9) + 2000 holds
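Memory locations belong to an instruction's domain and range just like registers. Because of the stated assumption, the store (2) writes the same address that load 3 reads. This creates a memory RAW hazard. The second load reads a different address, so it is independent of the store. In this sketch the register values are hypothetical; they were chosen only to satisfy (R7) + (R8) = (R9) + 2000.

```python
from itertools import combinations

regs = {"R7": 100, "R8": 3900, "R9": 2000}  # hypothetical: (R7)+(R8) == (R9)+2000
r1 = regs["R7"] + regs["R8"]                 # value instruction 1 writes to R1


def mem(addr):
    """Represent a memory location as a set element."""
    return ("M", addr)


# (domain, range) of each instruction, with effective addresses resolved
instrs = {
    1: ({"R7", "R8"}, {"R1"}),                    # ADD R1, R7, R8
    2: ({"R9", "R8"}, {mem(regs["R9"] + 2000)}),  # SW 2000(R9), R8
    3: ({"R1", mem(r1 + 0)}, {"R3"}),             # LW R3, 0(R1)
    4: ({"R9", mem(regs["R9"] + 3000)}, {"R4"}),  # LW R4, 3000(R9)
    5: ({"R3", "R4"}, {"R5"}),                    # ADD R5, R3, R4
    6: ({"R5"}, {"R6"}),                          # MUL R6, R5, R5
}

D = {
    (u, v)
    for u, v in combinations(sorted(instrs), 2)
    if instrs[u][1] & instrs[v][0]   # RAW
    or instrs[u][0] & instrs[v][1]   # WAR
    or instrs[u][1] & instrs[v][1]   # WAW
}
print(sorted(D))  # [(1, 3), (2, 3), (3, 5), (4, 5), (5, 6)]
```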
27
Data Hazard Effect on Pipelining
1: ADD R1, R2, R3
2: SUB R4, R5, R1
3: AND R6, R1, R7
4: OR R8, R1, R9
5: XOR R10, R1, R11
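Which of these consumers read R1 too early can be computed from the stage timing. This sketch assumes no forwarding and one issue per cycle. Under these assumptions, instruction k (0-based) reads in ID in cycle k + 2 and writes in WB in cycle k + 5, and the register file is written in the first half of a cycle and read in the second half.

```python
program = [
    ("ADD", "R1", ["R2", "R3"]),
    ("SUB", "R4", ["R5", "R1"]),
    ("AND", "R6", ["R1", "R7"]),
    ("OR", "R8", ["R1", "R9"]),
    ("XOR", "R10", ["R1", "R11"]),
]


def stale_reads(program):
    """RAW pairs whose consumer reads the register file before the producer writes it.

    A read in cycle c sees a write in cycle w only if w <= c.
    """
    pairs = []
    for v, (name_v, _, sources) in enumerate(program):
        for u, (name_u, dest_u, _) in enumerate(program[:v]):
            if dest_u in sources and u + 5 > v + 2:
                pairs.append((name_u, name_v))
    return pairs


print(stale_reads(program))  # [('ADD', 'SUB'), ('ADD', 'AND')]
```

OR reads R1 in the same cycle that ADD writes it, so the split-cycle register file is enough. XOR reads a cycle later.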
28
Value Forwarding/Bypassing
  • There is slack between how soon a value is
    actually available and how late it is actually
    required in the pipeline
    • Result of R-type available at end of X stage
    • Operand of dependent R-type not needed until
      beginning of X stage
  • Communication of values among instructions
    happens through the register file
    • Globally known names of containers of values
    • Accessed at fixed stages of pipeline (read in D,
      written in W)
  • Forwarding/bypassing/short-circuiting corresponds
    to establishing a direct path between the
    producer of a value and its consumer, bypassing
    the container
    • Allows us to exploit slack
    • Requires additional resources (forwarding paths
      and controller)
  • Identify all forwarding paths needed on MIPS
    (Figure in book is incomplete)

29
Forwarding Example 2
1: ADD R1, R2, R3
2: LW R4, 0(R1)
3: SW 12(R1), R4
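Each forwarding path can be found by asking where the producer sits when the consumer needs the value. This sketch assumes one issue per cycle with no stalls and stage numbering F=1 through W=5. It also assumes an ALU result exists at the end of X and loaded data at the end of M. Address and ALU inputs are needed at the start of X, and store data at the start of M. The names `STAGE`, `HOLDER`, and `forwarding_path` are hypothetical.

```python
STAGE = {"F": 1, "D": 2, "X": 3, "M": 4, "W": 5}
HOLDER = {4: "EX/MEM", 5: "MEM/WB"}  # pipeline register holding the value, by producer's stage


def forwarding_path(producer, consumer, produced_at, needed_at):
    """producer, consumer: 0-based positions in the program.

    Returns the pipeline register to forward from, "register file" if the
    value has already been written back, or None if a stall is required.
    """
    need_cycle = consumer + STAGE[needed_at]         # consumer enters that stage
    ready_cycle = producer + STAGE[produced_at] + 1  # first cycle the value is usable
    if need_cycle < ready_cycle:
        return None
    producer_stage = need_cycle - producer           # where the producer is then
    return HOLDER.get(producer_stage, "register file")


paths = [
    ("ADD -> LW (R1, address)", forwarding_path(0, 1, "X", "X")),
    ("ADD -> SW (R1, address)", forwarding_path(0, 2, "X", "X")),
    ("LW -> SW (R4, store data)", forwarding_path(1, 2, "M", "M")),
]
for label, source in paths:
    print(label, "from", source)
```

Under these assumptions, Example 2 needs three paths: EX/MEM to the ALU input, MEM/WB to the ALU input, and MEM/WB to the data-memory input for the store data.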
30
Forwarding & Stalling Example 3
L1: LW R2, 40(R8)
L2: LW R3, 60(R8)
A: ADD R4, R2, R3
S: SW 60(R8), R4
  • Load has a latency of one cycle that cannot
    be hidden, as seen between L2 and A
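With full forwarding, the only delay that cannot be hidden is an instruction that reads a load's destination immediately after the load. In this sketch, that check is applied conservatively to every source operand, including store data. The tuple layout and the function name are hypothetical.

```python
def load_use_stalls(program):
    """Labels of instructions that must wait one cycle behind a load.

    program: list of (label, opcode, dest, sources); dest is None for stores.
    """
    return [
        cur[0]
        for prev, cur in zip(program, program[1:])
        if prev[1] == "LW" and prev[2] in cur[3]
    ]


program = [
    ("L1", "LW", "R2", ["R8"]),
    ("L2", "LW", "R3", ["R8"]),
    ("A", "ADD", "R4", ["R2", "R3"]),
    ("S", "SW", None, ["R8", "R4"]),
]
print(load_use_stalls(program))  # ['A']
```

L1's result reaches A by forwarding because L2 sits between them. L2's result arrives one cycle too late, so A stalls once.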

31
Forwarding & Stalling Example 4
L: LW R1, 0(R1)
S: SUB R4, R1, R5
A: AND R6, R1, R7
O: OR R8, R1, R9
No forwarding is needed from L to A: this can be resolved by
writing the register file in the first half of a
cycle and reading it in the second half.
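The claim about A can be checked by tracking when each instruction reads the register file once the load-use stall is inserted. This sketch assumes the following:
- one fetch per cycle starting in cycle 1;
- a one-cycle interlock that holds the dependent instruction, and everything behind it, for one cycle;
- a register file written in the first half of a cycle and read in the second half.

The tuple layout and the function name are hypothetical.

```python
def load_consumers(program):
    """For each reader of a load's result: 'forward' or 'regfile'.

    program: list of (label, opcode, dest, sources).
    """
    id_cycle, delay = [], 0
    for k, (_, _, _, sources) in enumerate(program):
        prev = program[k - 1] if k > 0 else None
        if prev and prev[1] == "LW" and prev[2] in sources:
            delay += 1                   # held in ID for one extra cycle
        id_cycle.append(k + 2 + delay)   # last cycle spent in ID (register read)
    wb_cycle = [c + 3 for c in id_cycle]  # ID -> EX -> MEM -> WB
    result = {}
    for v, (label, _, _, sources) in enumerate(program):
        for u in range(v):
            if program[u][1] == "LW" and program[u][2] in sources:
                ok = id_cycle[v] >= wb_cycle[u]  # write-first, read-second half
                result[label] = "regfile" if ok else "forward"
    return result


program = [
    ("L", "LW", "R1", ["R1"]),
    ("S", "SUB", "R4", ["R1", "R5"]),
    ("A", "AND", "R6", ["R1", "R7"]),
    ("O", "OR", "R8", ["R1", "R9"]),
]
print(load_consumers(program))  # {'S': 'forward', 'A': 'regfile', 'O': 'regfile'}
```

After the stall, A reads its registers in cycle 5, the same cycle in which L writes back. The split-cycle register file therefore delivers the value without a forwarding path.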
32
Load Data Forwarding
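[Figure: datapath with the ID/EX, EX/MEM, and MEM/WB pipeline registers, the Registers, and the Data Memory. A Forwarding Unit uses the Rs, Rt, and Rd fields to drive the Forward A and Forward B multiplexers.]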
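The forwarding unit's decision for each ALU operand can be sketched as a priority check. If the instruction in EX/MEM will write the operand's register, its result is used; otherwise MEM/WB is checked; otherwise the register-file value carried in ID/EX is used. The sketch assumes the usual textbook conditions, since the figure does not spell them out. It skips R0 because MIPS hardwires it to zero. `WriteInfo`, `select`, and `forwarding_unit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class WriteInfo:
    """Fields of a pipeline register that the forwarding unit inspects."""
    reg_write: bool  # RegWr control bit carried along with the instruction
    rd: int          # destination register number (after the RegDst choice)


def select(src, ex_mem, mem_wb):
    """Mux select for one ALU operand whose register number is `src`."""
    # The newer result (EX/MEM) is checked first so it takes priority.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "MEM/WB"
    return "ID/EX"  # no match: use the value read from the register file


def forwarding_unit(rs, rt, ex_mem, mem_wb):
    """Return (Forward A, Forward B) for the instruction currently in ID/EX."""
    return select(rs, ex_mem, mem_wb), select(rt, ex_mem, mem_wb)


# SUB R4, R1, R5 in EX one cycle after a load-use stall: a bubble sits in
# EX/MEM, and LW R1, 0(R1) sits in MEM/WB with its loaded data.
bubble = WriteInfo(reg_write=False, rd=0)
load = WriteInfo(reg_write=True, rd=1)
print(forwarding_unit(rs=1, rt=5, ex_mem=bubble, mem_wb=load))  # ('MEM/WB', 'ID/EX')
```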