Computer System Architecture: Dependency and OOO Execution

Provided by: SMI107


1
Computer System Architecture: Dependency and OOO Execution
  • Lynn Choi
  • School of Electrical Engineering

2
Instruction-Level Parallelism
  • ILP (Instruction Level Parallelism)
  • The program characteristic that allows the
    overlapped or parallel execution of instructions
  • Data dependences and control dependences limit
    the ILP
  • As long as instruction A is (data and control)
    independent of instruction B, A and B can be
    executed in parallel
  • Processors exploit ILP to improve performance
  • Two approaches
  • Hardware approach: relies on hardware to discover
    and exploit the parallelism dynamically at
    runtime
  • Pipelining: overlapping the execution of
    instructions in different pipeline stages
  • Out-of-order execution
  • Superscalar processors
  • Software approach: relies on the compiler to find
    and expose the parallelism at compile time
  • Code scheduling (local and global scheduling)
  • Loop unrolling, software pipelining, trace
    scheduling
  • Changing the order of instructions to reduce the
    number of stall cycles
  • VLIW processors

3
Three Forms of Data Dependence
  • True dependence (Read-After-Write, RAW)
  • Also called flow dependence
  • Requires a pipeline interlock
  • Data bypass (forwarding) can reduce the producer
    latency
  • Makes values generated by FUs immediately
    available to dependent instructions
  • Output dependence (Write-After-Write, WAW)
  • Anti-dependence (Write-After-Read, WAR)
  • Output and anti-dependences are called false
    dependences
  • They require a pipeline interlock or register
    renaming
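
The three forms above can be told apart by comparing register names. The following is a minimal sketch, assuming a hypothetical three-operand instruction format (one destination, two sources); the `Instr` type, the `dependences` helper, and the register names are illustrative and not taken from the slides.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple

def dependences(first: Instr, second: Instr) -> set:
    """Dependences of `second` on `first`, where `first` is earlier in program order."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # true (flow) dependence: second reads what first writes
    if first.dest == second.dest:
        found.add("WAW")  # output dependence: both write the same register
    if second.dest in first.srcs:
        found.add("WAR")  # anti-dependence: second overwrites what first reads
    return found

# Hypothetical instruction sequence used only for illustration.
i1 = Instr("ADD", "R1", ("R2", "R3"))
i2 = Instr("SUB", "R4", ("R1", "R5"))
i3 = Instr("ADD", "R1", ("R6", "R7"))

print(dependences(i1, i2))  # {'RAW'}
print(dependences(i1, i3))  # {'WAW'}
print(dependences(i2, i3))  # {'WAR'}
```

Only the RAW case carries data from one instruction to the other; the WAW and WAR cases exist because a register name is reused, which is why renaming can remove them.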

4
Control Dependence
  • A control dependence determines the ordering of
    an instruction i with respect to a branch
    instruction
  • Every instruction, except for those in the first
    basic block of the program, is control dependent
    on some set of branches
  • Example
  • if p1 { S1 }
  • if p2 { S2 }
  • S1 is control dependent on p1, and S2 is control
    dependent on p2 but not on p1.
  • Control dependence imposes the following two
    constraints
  • An instruction that is control dependent on a
    branch cannot be moved before the branch (its
    execution would no longer be controlled by the
    branch)
  • An instruction that is not control dependent on a
    branch cannot be moved after the branch (its
    execution would become controlled by the branch)

5
In-Order Pipeline
  • In-order issue
  • If an instruction is stalled in the pipeline, no
    later instruction can proceed. However, once
    issued to an FU, an instruction generally need
    not stall.
  • Instructions can complete out of order
  • Dependency resolution mechanisms
  • Pipeline interlock
  • Needs reg-id comparators between the sources and
    destinations of the instruction in the REG stage
    and the destinations of the instructions in the
    EXE and WRB stages
  • The comparators are needed for both interlock and
    bypass
  • Scoreboard
  • A busy bit for each register
  • For long-latency operations such as MEM
    operations
  • Instead of using comparators, check the
    scoreboard for operand availability
  • Comparators are still needed for bypass!
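
The busy-bit scoreboard can be sketched in a few lines. This is an illustrative model only: the `BusyBits` class, its method names, and the 32-register size are assumptions, and bypass is not modeled.

```python
class BusyBits:
    """One busy bit per register (hypothetical 32-register machine)."""

    def __init__(self, num_regs: int = 32):
        self.busy = [False] * num_regs

    def operands_ready(self, srcs) -> bool:
        # A source is available only if no earlier instruction still has
        # a pending write to it.
        return not any(self.busy[r] for r in srcs)

    def issue(self, dest: int) -> None:
        # Mark the destination busy when a long-latency operation issues.
        self.busy[dest] = True

    def writeback(self, dest: int) -> None:
        # Clear the bit when the result is written to the register file.
        self.busy[dest] = False

sb = BusyBits()
sb.issue(1)                              # e.g. a load that will write R1
stalled = not sb.operands_ready([1, 2])  # a consumer of R1 must wait
sb.writeback(1)
released = sb.operands_ready([1, 2])     # the consumer may now proceed
print(stalled, released)                 # True True
```

The lookup replaces the per-stage register comparison for interlock, which is why it scales better for operations with long or variable latency.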

6
Example
  • FET-DEC-REG-EXE-WRB
  • What kind of dependence violations are possible?
  • Single-issue 5-stage in-order pipeline with the
    following pipelined FUs
  • 2 INT units (1-cycle INT operation)
  • 1 FP unit (4-cycle FP operation)
  • 2 MEM pipelines (2-cycle MEM operation)
  • How many comparators do you need for the previous
    example?
  • RAW
  • 2 srcs × 2 stages (E, W) × 2 INT = 8
  • 2 srcs × 2 stages (E4, W) × 1 FP = 4
  • 2 srcs × 2 stages (E2, W) × 2 MEM = 8
  • WAW
  • 1 dest × 2 stages (E, W) × 2 INT = 4
  • 1 dest × 2 stages (E4, W) × 1 FP = 2
  • 1 dest × 2 stages (E2, W) × 2 MEM = 4
  • WAW hazards can happen only in the MEM and FP
    pipelines, so the 4 WAW comparators for the INT
    pipelines can be removed.
  • How many more comparators are needed for a 2-issue
    in-order superscalar pipeline?
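
The single-issue comparator counts listed above follow one formula: registers compared per instruction × stages checked × number of pipelines. The sketch below only reproduces that arithmetic; the `comparators` helper and the totals are illustrative, and it does not attempt the 2-issue question.

```python
def comparators(regs_per_instr: int, stages: int, pipelines: int) -> int:
    """Comparators needed for one pipeline type."""
    return regs_per_instr * stages * pipelines

# Number of pipelines of each type, as given in the example.
units = {"INT": 2, "FP": 1, "MEM": 2}

raw = {u: comparators(2, 2, n) for u, n in units.items()}  # 2 sources, 2 stages
waw = {u: comparators(1, 2, n) for u, n in units.items()}  # 1 destination, 2 stages

raw_total = sum(raw.values())
waw_total = sum(waw.values())
# WAW hazards cannot occur in the INT pipelines, so those comparators drop out.
waw_needed = waw_total - waw["INT"]

print(raw_total, waw_total, waw_needed)  # 20 10 6
```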

7
Out-Of-Order Machines
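  • Anti-dependence (WAR) hazards can occur in OOO
    machines
  • DIV F0, F2, F4
  • ADD F10, F0, F8
  • SUB F8, F8, F14
  • ADD waits for DIV to produce F0, so SUB may finish
    first; it must not overwrite F8 before ADD has
    read it
  • Different approaches
  • Scoreboarding
  • Tomasulo's algorithm
  • Register Update Unit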
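
To see why the anti-dependence on F8 matters, the three instructions can be executed in two different orders on a toy register file. This is a sketch under simplifying assumptions: operands are read at the moment an instruction executes, there is no renaming, and the initial register values are made up for illustration.

```python
def run(order, regs):
    """Execute instructions in the given order; operands are read at execution time."""
    regs = dict(regs)
    ops = {
        "DIV": lambda a, b: a / b,
        "ADD": lambda a, b: a + b,
        "SUB": lambda a, b: a - b,
    }
    for op, dest, s1, s2 in order:
        regs[dest] = ops[op](regs[s1], regs[s2])
    return regs

# Hypothetical initial register contents.
init = {"F0": 0.0, "F2": 8.0, "F4": 2.0, "F8": 10.0, "F10": 0.0, "F14": 3.0}

div = ("DIV", "F0", "F2", "F4")
add = ("ADD", "F10", "F0", "F8")
sub = ("SUB", "F8", "F8", "F14")

in_order = run([div, add, sub], init)
# SUB runs ahead while ADD is still waiting for the slow DIV.
reordered = run([sub, div, add], init)

print(in_order["F10"], reordered["F10"])  # 14.0 11.0
```

In the reordered run ADD reads the value of F8 that SUB has already overwritten, so F10 is wrong. Each of the approaches listed above prevents this, either by delaying the write of SUB or by giving it different storage.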

8
Scoreboarding - CDC6600 -
  • Scoreboard
  • One bit per register indicates whether or not
    there is a pending update
  • Pipeline stalls on WAW and WAR dependences
  • FET-DEC/ISS-REG-EXE-WRB
  • ISS stage: checks for WAW and structural hazards
  • Pipeline stalls on output dependence
  • Allows only 1 pending update per register
  • REG stage
  • Resolves RAW hazards
  • Instructions are sent to FUs out of order
  • WRB stage
  • Once execution completes, checks for WAR
    hazards before writing the result
  • Instruction buffers
  • Instruction buffer between the FET and DEC/ISS
    stages
  • Can be omitted
  • (Centralized) instruction window between the ISS
    and REG stages
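
The ISS and WRB checks described above can be written as two predicates over the instructions in flight. This is a simplified sketch, not the CDC 6600 control logic: the `InFlight` record, the function names, and the FU names (`div`, `alu0`, `alu1`, `mul`) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class InFlight:
    fu: str
    dest: str
    srcs: tuple
    has_read: bool = False  # set once the REG stage has read the operands

def can_issue(new_fu: str, new_dest: str, in_flight) -> bool:
    """ISS: stall on a structural hazard (FU busy) or a WAW hazard (dest pending)."""
    return all(i.fu != new_fu and i.dest != new_dest for i in in_flight)

def can_write_back(instr: InFlight, older) -> bool:
    """WRB: stall on a WAR hazard, i.e. while an older instruction has not yet
    read the register that `instr` is about to overwrite."""
    return all(i.has_read or instr.dest not in i.srcs for i in older)

div = InFlight("div", "F0", ("F2", "F4"), has_read=True)   # executing
add = InFlight("alu0", "F10", ("F0", "F8"))                # waiting for F0
sub = InFlight("alu1", "F8", ("F8", "F14"), has_read=True)

print(can_issue("alu1", "F8", [div, add]))  # True: free FU, no pending write to F8
print(can_issue("mul", "F0", [div, add]))   # False: WAW on F0
print(can_write_back(sub, [div, add]))      # False: ADD has not read F8 yet
add.has_read = True
print(can_write_back(sub, [div, add]))      # True
```

SUB is allowed to issue and execute ahead of ADD, but its result is held back at WRB, which is the stall on WAR that renaming later removes.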

9
Tomasulo's Algorithm - Reservation Station
  • Used in the IBM 360/91 floating-point unit (1967)
  • Three ideas
  • OOO execution using reservation stations (RS)
  • Distributed instruction windows
  • Register renaming to remove anti- and output
    dependences
  • Read the available input operands from the RF and
    store them in the RS (WAR removal)
  • Assign new storage for the output (WAW removal)
  • Pipeline does not stall on WAW and WAR hazards
  • Data forwarding using the common data bus
  • Bypass the data directly to the waiting
    instructions in the RS
  • Both the register file and the RS entries (source
    and dest) monitor the result bus and update
    their data when a matching tag is found
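
The tag-matching broadcast on the common data bus can be sketched as follows. This is a minimal model under stated assumptions: the `Operand` and `Station` types, the `broadcast` function, the station names (`Div1`, `Add1`, `Add2`), and the register values are all illustrative, and freeing of RS entries is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Operand:
    value: Optional[float] = None
    tag: Optional[str] = None  # RS entry that will produce the value; None means ready

@dataclass
class Station:
    name: str
    op: str
    src1: Operand
    src2: Operand

    def ready(self) -> bool:
        return self.src1.tag is None and self.src2.tag is None

def broadcast(tag, value, stations, reg_status, reg_file):
    """Common data bus: every waiting operand and register with a matching tag
    captures the value."""
    for rs in stations:
        for operand in (rs.src1, rs.src2):
            if operand.tag == tag:
                operand.value, operand.tag = value, None
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            reg_file[reg] = value
            del reg_status[reg]

reg_file = {"F0": 0.0, "F2": 8.0, "F4": 2.0, "F8": 10.0, "F10": 0.0, "F14": 3.0}
reg_status = {"F0": "Div1", "F10": "Add1", "F8": "Add2"}  # pending producers
div = Station("Div1", "DIV", Operand(8.0), Operand(2.0))
add = Station("Add1", "ADD", Operand(tag="Div1"), Operand(10.0))  # F8 copied at issue
sub = Station("Add2", "SUB", Operand(10.0), Operand(3.0))
stations = [div, add, sub]

broadcast("Add2", 10.0 - 3.0, stations, reg_status, reg_file)  # SUB finishes first
ready_before = add.ready()
broadcast("Div1", 8.0 / 2.0, stations, reg_status, reg_file)   # DIV finishes later
add_result = add.src1.value + add.src2.value

print(ready_before, add.ready(), add_result, reg_file["F8"])  # False True 14.0 7.0
```

Because ADD copied the old value of F8 into its reservation station at issue, SUB can update F8 early without disturbing it, which is the WAR removal described above.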

10
Tomasulo's Algorithm
  • FET-DEC/REN/ISS-REG-EXE-WRB-COM
  • REN/ISS stage: checks for structural hazards (a
    free reservation station entry), reads the
    available operands from the register file
    (renaming for WAR hazards), and assigns an RS
    entry for the destination (renaming for WAW
    hazards)
  • REG stage: monitors the common data bus and reads
    operands into the RS when a tag matches, then
    determines the highest-priority operation among
    the ready operations (wakeup and select)
  • EXE stage: executes and forwards the result to
    the RS and RF
  • Instruction buffers
  • Instruction queue between the FET and DEC/ISS
    stages
  • Can be omitted
  • Reservation stations between the ISS and REG
    stages
  • Reorder buffer between the WRB and COM stages
  • Not in the original proposal (IBM 360/91)
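
The role of the reorder buffer between WRB and COM can be illustrated with a queue that retires only from its head. This is a sketch under assumptions: the `ReorderBuffer` class, its dictionary-based entries, and the example values are hypothetical, and exceptions and branch recovery are not modeled.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # kept in program order

    def allocate(self, dest: str) -> dict:
        entry = {"dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def commit(self, reg_file: dict) -> list:
        """COM: retire completed entries from the head only, so the register
        file is updated in program order."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            reg_file[entry["dest"]] = entry["value"]
            retired.append(entry["dest"])
        return retired

rob = ReorderBuffer()
regs = {}
e_div = rob.allocate("F0")
e_add = rob.allocate("F10")
e_sub = rob.allocate("F8")

e_sub.update(value=7.0, done=True)   # SUB writes back first
first = rob.commit(regs)             # nothing retires: DIV is still at the head
e_div.update(value=4.0, done=True)
second = rob.commit(regs)            # only DIV retires
e_add.update(value=14.0, done=True)
third = rob.commit(regs)             # ADD and SUB retire together

print(first, second, third)  # [] ['F0'] ['F10', 'F8']
```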

11
Renaming
  • Removes anti and output dependencies
  • Allows more than one pending update
  • Several forms of renaming
  • Tomasulo's algorithm
  • Reservation stations provide the additional
    storage that removes name dependences; the common
    data bus provides data bypass
  • Reorder buffer with associative lookup
  • Associative lookup maps the reg id to the reorder
    buffer entry as soon as an entry is allocated
  • Register map table with a separate physical
    register file
  • Register map table (DEC 21264)
  • Register alias table (Intel P6)

12
Renaming
  • Assign one physical register to every
    instruction with a destination register
  • With 80 instructions in flight (reorder buffer
    size)
  • You need roughly 80 physical registers (branches
    and stores need none)
  • Physical registers are single-assignment
    registers
  • Register renaming involves data dependence
    checking among the instructions that are
    simultaneously being renamed
  • Renaming bandwidth is limited by
  • Data dependence checking
  • Number of read ports needed for the register map
    table
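
Renaming with a register map table and a free list can be sketched as below. The `rename` function, the physical register names (`P0` to `P8`), and the initial mapping are assumptions for illustration; freeing of physical registers at commit is not modeled. Processing the group sequentially stands in for the dependence checking among instructions renamed in the same cycle.

```python
def rename(program, map_table, free_list):
    """Rename in program order: sources use the current mapping, and every
    destination gets a fresh, single-assignment physical register."""
    renamed = []
    for op, dest, srcs in program:
        new_srcs = tuple(map_table[s] for s in srcs)  # look up sources first
        phys = free_list.pop(0)                       # allocate a new register
        map_table[dest] = phys                        # later readers see the new name
        renamed.append((op, phys, new_srcs))
    return renamed

# Hypothetical initial mapping and free list.
map_table = {"F0": "P0", "F2": "P1", "F4": "P2", "F8": "P3", "F10": "P4", "F14": "P5"}
free_list = ["P6", "P7", "P8"]

program = [
    ("DIV", "F0", ("F2", "F4")),
    ("ADD", "F10", ("F0", "F8")),
    ("SUB", "F8", ("F8", "F14")),
]

renamed = rename(program, map_table, free_list)
for instr in renamed:
    print(instr)
# ('DIV', 'P6', ('P1', 'P2'))
# ('ADD', 'P7', ('P6', 'P3'))
# ('SUB', 'P8', ('P3', 'P5'))
```

After renaming, ADD reads P3 while SUB writes P8, so the anti-dependence on F8 is gone; the true dependence of ADD on DIV survives as the shared name P6.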

13
Renaming

14
Rename Example (P6)

15
Rename Example (P6)

16
Rename Example (P6)

17
Rename Example (P6)

18
PowerPC 620 - OOO example -

19
DEC 21264 - OOO example -

20
DEC 21264 - OOO example -

21
Intel P6 - OOO example -

22
Exercises and Discussion
  • There can be many instruction buffers in an OOO
    processor. Name those buffers and explain their
    functions.
  • What happens on a branch misprediction in OOO
    processors?