Instruction Level Parallelism and Tomasulo

ENGS 116 Lecture 8

1
Instruction Level Parallelism and Tomasulo's Approach
  • Vincent H. Berk
  • October 7, 2005
  • Reading for today: chapter A.8
  • Reading for Monday: chapter 3.2-3.6
  • Homework 2 due Friday the 14th: 2.8, A.2, A.13,
    3.6ab, 3.10, 4.5, 4.8 (4.13 optional)

2
Instruction Level Parallelism
  • Pipeline CPI = Ideal pipeline CPI + Structural
    stalls + Data hazard stalls + Control stalls
  • Reduce stalls, reduce CPI
  • Reduce CPI, increase IPC
  • Instruction-level parallelism (ILP) seeks to
    reduce stalls
  • Loop-level parallelism is easiest to see:
      for (i = 1; i <= 100; i = i + 1) {
          A[i] = B[i] + C[i];
          D[i] = E[i] * F[i];
      }
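
The CPI relation on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `pipeline_cpi` and all stall figures are assumptions chosen for the example, not numbers from the lecture. It shows that cutting data hazard stalls lowers CPI and raises IPC (IPC = 1 / CPI).

```python
def pipeline_cpi(ideal_cpi, structural, data_hazard, control):
    """Pipeline CPI = ideal CPI + stall cycles per instruction from each source."""
    return ideal_cpi + structural + data_hazard + control

# Hypothetical stall contributions (cycles per instruction), for illustration only.
before = pipeline_cpi(1.0, structural=0.05, data_hazard=0.30, control=0.15)
# Same machine after (assumed) better scheduling removes some data hazard stalls.
after = pipeline_cpi(1.0, structural=0.05, data_hazard=0.10, control=0.15)

print(f"before: CPI = {before:.2f}, IPC = {1 / before:.3f}")
print(f"after:  CPI = {after:.2f}, IPC = {1 / after:.3f}")
```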

3
Instruction Level Parallelism
  • ILP in SW (static) or HW (dynamic)
  • HW-intensive ILP dominates the desktop and server
    markets
  • SW (compiler-intensive) approaches are more likely
    to be seen in embedded systems

4
Dependences
  • Two instructions are parallel if they can execute
    simultaneously in a pipeline without causing any
    stalls (assuming no structural hazards) and can
    be reordered
  • Two instructions that are dependent are not
    parallel and cannot be reordered
  • Types of dependences:
    • Data dependences
    • Name dependences
    • Control dependences

5
Dependences
  • Dependences are properties of programs
  • Hazards are properties of the pipeline
    organization
  • Dependence indicates the potential for a hazard
  • Compiler concerned about dependences in program,
    whether or not a HW hazard occurs depends on a
    given pipeline

6
Review of Hazards
  • Consider instructions i and j, where i occurs
    before j.
  • RAW (read after write): j tries to read a
    source before i writes it, so j incorrectly gets
    the old value
  • WAW (write after write): j tries to write an
    operand before it is written by i (only possible
    in pipelines that write in more than one pipe
    stage or allow an instruction to proceed even
    when a previous instruction is stalled)
  • WAR (write after read): j tries to write a
    destination before it is read by i, so i
    incorrectly gets the new value (only possible
    when some instructions can write results early in
    the pipeline and other instructions can read
    sources late in the pipeline)
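
The three definitions above reduce to comparing the destination and source registers of i and j. The following is a minimal sketch under stated assumptions: the `Instr` tuple and the `hazards` function are hypothetical helpers, only register operands are considered (no memory addresses), and the sample instructions are MIPS-style FP operations picked for illustration.

```python
from typing import NamedTuple, Optional, Set, Tuple

class Instr(NamedTuple):
    op: str
    dest: Optional[str]       # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]     # registers read

def hazards(i: Instr, j: Instr) -> Set[str]:
    """Potential hazards between i and a later instruction j (registers only)."""
    found = set()
    if i.dest is not None and i.dest in j.srcs:
        found.add("RAW")      # j reads what i writes
    if j.dest is not None and j.dest in i.srcs:
        found.add("WAR")      # j writes what i reads
    if i.dest is not None and i.dest == j.dest:
        found.add("WAW")      # both write the same register
    return found

div = Instr("DIV.D", "F0", ("F2", "F4"))
add = Instr("ADD.D", "F6", ("F0", "F8"))
sub = Instr("SUB.D", "F8", ("F10", "F14"))
mul = Instr("MUL.D", "F6", ("F10", "F8"))

for first, second in [(div, add), (add, sub), (add, mul), (div, sub)]:
    print(first.op, "->", second.op, sorted(hazards(first, second)))
```

Whether any of these potential hazards turns into an actual stall depends on the pipeline, which is the distinction the previous slide draws between dependences and hazards.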

7
Data Dependences
  • (True) data dependences (RAW if a hazard for HW):
    • Instruction i produces a result used by
      instruction j, or
    • Instruction j is data dependent on instruction k,
      and instruction k is data dependent on
      instruction i.
  • Easy to determine for registers (fixed names)
  • Hard for memory:
    • Does 100(R4) = 20(R6)?
    • From different loop iterations, does 20(R6) =
      20(R6)?

8
Name Dependences
  • Another kind of dependence, called name
    dependence: two instructions use the same name but
    don't exchange data
  • Antidependence (WAR if a hazard for HW):
    instruction j writes a register or memory
    location that instruction i reads from, and
    instruction i is executed first
  • Output dependence (WAW if a hazard for HW):
    instruction i and instruction j write the same
    register or memory location; ordering between
    the instructions must be preserved

9
Name Dependences
  • Hard for memory accesses:
    • Does 100(R4) = 20(R6)?
    • From different loop iterations, does 20(R6) =
      20(R6)?
  • Example of renaming:
      Original              Renamed
      DIV.D F0,F2,F4        DIV.D F0,F2,F4
      ADD.D F6,F0,F8        ADD.D S,F0,F8
      S.D   F6,0(R1)        S.D   S,0(R1)
      SUB.D F8,F10,F14      SUB.D T,F10,F14
      MUL.D F6,F10,F8       MUL.D F6,F10,T
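
Renaming can also be done mechanically. The sketch below is one simple scheme, not the hand renaming shown on the slide: it gives every destination a fresh name (`P0`, `P1`, ... are made-up names) and redirects later reads to the newest name. It is therefore more aggressive than the slide, which renames only F6 to S and F8 to T, but it removes the same WAR and WAW dependences. The tuple encoding and the function `rename` are assumptions for illustration.

```python
from itertools import count

def rename(program):
    """Give every destination a fresh name; sources follow the latest mapping."""
    fresh = (f"P{n}" for n in count())
    current = {}                       # architectural register -> latest name
    renamed = []
    for op, dest, srcs in program:
        # Sources are looked up before the destination is remapped.
        new_srcs = tuple(current.get(s, s) for s in srcs)
        new_dest = dest
        if dest is not None:
            new_dest = next(fresh)
            current[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed

program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D",   None, ("F6", "R1")),     # store: reads F6 and the address register
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]

renamed = rename(program)
for instr in renamed:
    print(instr)
```

After renaming, no two instructions write the same name, and SUB.D no longer writes a register that ADD.D reads, so only the true (RAW) dependences remain.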

10
Control Dependence
  • Final kind of dependence, called control
    dependence
  • Example:
      if p1 {S1;}
      if p2 {S2;}
  • S1 is control dependent on p1 and S2 is control
    dependent on p2 but not on p1.
  • Note that S2 could be data dependent on S1.

11
Control Dependences
  • Two (obvious) constraints on control dependences:
    • An instruction that is control dependent on a
      branch cannot be moved before the branch so that
      its execution is no longer controlled by the
      branch
    • An instruction that is not control dependent on a
      branch cannot be moved to after the branch so
      that its execution is controlled by the branch
  • Control dependences are often relaxed to get
    parallelism; we get the same effect if we preserve
    the order of exceptions and the data flow

12
Hardware Schemes for ILP
  • Why in hardware at run time?
    • Works when dependences are not known at compile
      time
    • Simplifies the compiler
    • Allows code for one machine to run well on
      another
  • Key idea: allow instructions behind a stall to
    proceed
      DIVD F0, F2, F4
      ADDD F10, F0, F8
      SUBD F12, F8, F14
  • Enables out-of-order execution => out-of-order
    completion
  • ID stage checks for both structural hazards and
    data dependences
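
The effect of letting SUBD pass the stalled ADDD can be seen with a very coarse timing model. This is a sketch under loud assumptions: the latencies (40, 2, 2 cycles) are invented, issue bandwidth and structural hazards are ignored, and "in order" is modeled simply as "an instruction cannot start before the one ahead of it has started". The function `finish_times` is a hypothetical helper.

```python
# (name, dest, sources, latency in cycles); latencies are assumed, not from the slides.
PROGRAM = [
    ("DIVD", "F0",  ("F2", "F4"),  40),
    ("ADDD", "F10", ("F0", "F8"),   2),
    ("SUBD", "F12", ("F8", "F14"),  2),
]

def finish_times(program, in_order):
    ready = {}        # register -> cycle at which its new value is available
    prev_start = 0
    finish = {}
    for name, dest, srcs, latency in program:
        # An instruction can start once all of its source values exist.
        start = max([ready.get(s, 0) for s in srcs])
        if in_order:
            # Stuck behind the previous instruction if that one is stalled.
            start = max(start, prev_start)
        prev_start = start
        ready[dest] = start + latency
        finish[name] = start + latency
    return finish

print("in order:    ", finish_times(PROGRAM, in_order=True))
print("out of order:", finish_times(PROGRAM, in_order=False))
```

In this model ADDD must wait for F0 either way, but SUBD only waits in the in-order case; out of order it finishes long before DIVD, which is the out-of-order completion the slide mentions.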

13
Hardware Schemes for ILP
  • Out-of-order execution divides the ID stage in two:
    • 1. Issue: decode instructions, check for
      structural hazards
    • 2. Read operands: wait until no data hazards,
      then read operands

14
Tomasulo's Algorithm
  • For IBM 360/91, about 3 years after CDC 6600
  • Goal: high performance without special compilers
  • Differences between IBM 360 and CDC 6600 ISA:
    • IBM has only 2 register specifiers/instruction
      vs. 3 in CDC 6600
    • IBM has 4 FP registers vs. 8 in CDC 6600
  • Differences between Tomasulo's algorithm and
    scoreboard:
    • Control and buffers (called reservation stations)
      distributed with functional units vs. centralized
      in scoreboard
    • Registers in instructions replaced by pointers to
      reservation station buffer
    • HW renaming of registers to avoid WAR, WAW
      hazards
    • Common data bus (CDB) broadcasts results to
      functional units
    • Loads and stores treated as functional units as
      well
  • Ideas later used in Alpha 21264, HP 8000, MIPS 10000,
    Pentium III, PowerPC 604, ...

15
Three Stages of Tomasulo's Algorithm
  • 1. Issue: get instruction from FP operation
    queue
    • If a reservation station is free, issue the
      instruction and send operands (renames registers).
  • 2. Execution: operate on operands (EX)
    • When both operands are ready, execute; if not
      ready, watch the common data bus for the result.
  • 3. Write result: finish execution (WB)
    • Write on the common data bus to all awaiting units;
      mark reservation station available.
  • Common data bus: data + source ("come from" bus)
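
The three stages can be sketched as a small cycle-by-cycle simulation. This is a simplified model, not the 360/91 design: latencies are invented, loads and stores are left out, there is one shared pool of stations, at most one result is written on the CDB per cycle, and the tag names `RS0`, `RS1`, ... and the function `simulate` are hypothetical.

```python
import operator

OPS = {"ADDD": operator.add, "SUBD": operator.sub,
       "MULTD": operator.mul, "DIVD": operator.truediv}
LATENCY = {"ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}   # assumed cycle counts

def simulate(program, regs, num_stations=3):
    """Run (op, dest, src1, src2) tuples; return (registers, write cycle per instruction)."""
    regs = dict(regs)
    producer = {}      # register -> tag of the station that will write it
    stations = {}      # tag -> per-station state, in issue order
    write_cycle = {}
    pc, cycle = 0, 0
    while pc < len(program) or stations:
        cycle += 1
        done = [t for t, s in stations.items() if s["left"] == 0]
        # 2. Execute: stations holding both operands count down their latency.
        for s in stations.values():
            if s["waits"] == [None, None] and s["left"] > 0:
                s["left"] -= 1
        # 3. Write result: the oldest finished station broadcasts (tag, value).
        if done:
            tag = done[0]
            s = stations.pop(tag)              # station becomes available
            value = OPS[s["op"]](*s["vals"])
            for other in stations.values():    # waiting stations capture the value
                for k in (0, 1):
                    if other["waits"][k] == tag:
                        other["vals"][k], other["waits"][k] = value, None
            if producer.get(s["dest"]) == tag: # only the latest writer updates the register
                regs[s["dest"]] = value
                del producer[s["dest"]]
            write_cycle[s["idx"]] = cycle
        # 1. Issue: in program order, only if a reservation station is free.
        if pc < len(program) and len(stations) < num_stations:
            op, dest, *srcs = program[pc]
            waits = [producer.get(r) for r in srcs]   # tag if the value is pending
            vals = [None if w else regs[r] for r, w in zip(srcs, waits)]
            stations[f"RS{pc}"] = {"op": op, "vals": vals, "waits": waits,
                                   "left": LATENCY[op], "dest": dest, "idx": pc}
            producer[dest] = f"RS{pc}"                # rename the destination
            pc += 1
    return regs, write_cycle

program = [("DIVD", "F0", "F2", "F4"),
           ("ADDD", "F10", "F0", "F8"),
           ("SUBD", "F12", "F8", "F14")]
regs = {"F0": 0.0, "F2": 8.0, "F4": 2.0, "F8": 3.0,
        "F10": 0.0, "F12": 0.0, "F14": 1.0}
final_regs, write_cycle = simulate(program, regs)
print(final_regs)
print(write_cycle)
```

With these inputs SUBD writes its result well before DIVD, while ADDD waits for the DIVD broadcast and picks its operand off the bus rather than from the register file.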

16
Tomasulo Organization
[Figure: block diagram of the Tomasulo organization. The FP op queue (fed from the instruction unit) and the FP registers drive the operation bus and operand bus; load buffers (from memory) and store buffers (to memory) sit alongside; reservation stations feed the FP adders and FP multipliers; results return on the common data bus (CDB).]

17
Reservation Station Components
  • Op: operation to perform in the unit (e.g., + or -)
  • Qj, Qk: reservation stations producing the source
    registers
  • Vj, Vk: values of the source operands
  • Rj, Rk: flags indicating when Vj, Vk are ready
  • Busy: indicates that the reservation station and
    its FU are busy
  • Register result status: indicates which
    functional unit will write each register, if one
    exists. Blank when no pending instructions will
    write that register.
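
These fields map naturally onto a small record type. The sketch below is an assumption-laden illustration: the class `ReservationStation`, the helper `issue`, and the register values are made up, and the Rj/Rk ready flags are represented implicitly (an operand is ready when its Q field is `None`) rather than as separate fields. The `status` dictionary plays the role of the register result status table, with a missing key standing for "blank".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None    # operand values, valid once the matching Q is None
    vk: Optional[float] = None
    qj: Optional[str] = None      # station that will produce the operand, if pending
    qk: Optional[str] = None

    def ready(self) -> bool:
        """Both operands present (stands in for the Rj, Rk flags)."""
        return self.busy and self.qj is None and self.qk is None

    def capture(self, tag: str, value: float) -> None:
        """Snoop the CDB: take the value if this station is waiting on `tag`."""
        if self.qj == tag:
            self.vj, self.qj = value, None
        if self.qk == tag:
            self.vk, self.qk = value, None

def issue(rs, op, src_j, src_k, dest, regs, status):
    """Fill a free station; `status` maps register -> station that will write it."""
    rs.busy, rs.op = True, op
    rs.qj, rs.qk = status.get(src_j), status.get(src_k)
    rs.vj = regs[src_j] if rs.qj is None else None
    rs.vk = regs[src_k] if rs.qk is None else None
    status[dest] = rs.name        # later readers of dest will wait on this station

regs = {"F0": 0.0, "F2": 1.5, "F4": 2.0, "F6": 0.0}   # illustrative values
status = {}                                            # blank: no pending writes
mult1, add1 = ReservationStation("Mult1"), ReservationStation("Add1")

issue(mult1, "MULTD", "F2", "F4", "F0", regs, status)
issue(add1, "ADDD", "F0", "F2", "F6", regs, status)
print(add1)                       # Qj names Mult1, Vj is still empty
add1.capture("Mult1", 3.0)        # Mult1 broadcasts its result on the CDB
print(add1)
```

Note that Add1 never reads F0 from the register file: it holds a pointer to Mult1 and takes the value from the broadcast, which is how the register name gets replaced by a reservation station tag.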

18
Tomasulo Example Cycle 1

19
Tomasulo Example Cycle 2

20
Tomasulo Example Cycle 3
Register names are renamed in reservation
stations. Load1 completing: who is waiting for
Load1?

21
Tomasulo Example Cycle 4
Load2 completing: who is waiting for it?

22
Tomasulo Example Cycle 5

23
Tomasulo Example Cycle 6

24
Tomasulo Summary
  • Reservation stations: renaming to a larger set of
    registers + buffering source operands
    • Prevents registers from becoming a bottleneck
    • Avoids WAR, WAW hazards of scoreboard
    • Allows loop unrolling in HW
  • Not limited to basic blocks
    (integer units get ahead, beyond branches)
  • Lasting contributions:
    • Dynamic scheduling
    • Register renaming
    • Load/store disambiguation
  • 360/91 descendants are Pentium III, PowerPC 604,
    MIPS R10000, HP-PA 8000, Alpha 21264