Lecture 13: Dynamic Scheduling - PowerPoint PPT Presentation

Provided by: andrew638
1
Lecture 13: Dynamic Scheduling
  • Last Time
    • Executing instructions in parallel
    • Static scheduling: reducing the impact of data hazards
  • Today
    • Dynamic scheduling
    • Out-of-order issue
    • Register renaming
    • Reservation stations
    • Reorder buffer

2
The Problem with Static Scheduling
  • In-order execution
    • an unexpectedly long latency blocks ready
      instructions from executing
  • binaries need to be rescheduled for each new
    implementation
  • the small number of named registers becomes a
    bottleneck

    LW  R1, C       // miss: 50 cycles
    LW  R2, D
    MUL R3, R1, R2
    SW  R3, C
    LW  R4, B       // ready
    ADD R5, R4, R9
    SW  R5, A
    LW  R6, F
    LW  R7, G
    ADD R8, R6, R7
    SW  R8, E
3
Dynamic Scheduling
  • Determine the execution order of instructions at
    run time
  • Schedule with knowledge of run-time variable
    latency
    • cache misses
  • Compatibility advantages
    • avoid the need to recompile old binaries
    • avoid the bottleneck of small named register sets
      • but still need to deal with spills
  • Significant hardware complexity

4
Dynamic Scheduling: Basic Concept
[Figure: the sequential instruction stream (addressed by the IP), a window of instructions waiting on operands and resources, issue logic, execution resources, the register file, and instructions waiting to commit.]
Sequential instruction stream:
    LW  R1, A
    LW  R2, B
    ADD R3, R1, R2
    SW  R3, C
    LW  R4, 8(A)
    LW  R5, 8(B)
    ADD R6, R4, R5
    SW  R6, 8(C)
    LW  R7, 16(A)
    LW  R8, 16(B)
    ADD R9, R7, R8
    SW  R9, 16(C)
    LW  R10, 24(A)
    LW  R11, 24(B)
In the pictured snapshot, the window holds ADD R3,R1,R2 through LW R11,24(B), while LW R4,8(A) and LW R5,8(B) are shown apart from the window.
5
Example
  • 10-cycle data memory (cache) miss
  • 3-cycle MUL latency
  • 2-cycle ADD latency
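A minimal sketch of how these latencies play out, assuming the example runs the slide-2 sequence with its first load missing. The `schedule` function and the remaining parameters are hypothetical: load hits and stores take 1 cycle, one instruction enters the window per cycle, at most one instruction issues per cycle, and the window and functional units are unlimited. "In order" here means issue blocks on the oldest unready instruction; "dynamic" means any ready instruction may issue.

```python
# Latencies in cycles. The 10-cycle miss, 3-cycle MUL and 2-cycle ADD come
# from the slide; the 1-cycle load hit and store are assumptions.
LAT = {"LW_MISS": 10, "LW": 1, "MUL": 3, "ADD": 2, "SW": 1}

# The slide-2 sequence as (op, dest, register sources). Memory operands
# (C, D, B, ...) are constant addresses, so they add no register sources.
PROGRAM = [
    ("LW_MISS", "R1", []),        # LW  R1, C   (cache miss)
    ("LW", "R2", []),             # LW  R2, D
    ("MUL", "R3", ["R1", "R2"]),  # MUL R3, R1, R2
    ("SW", None, ["R3"]),         # SW  R3, C
    ("LW", "R4", []),             # LW  R4, B
    ("ADD", "R5", ["R4", "R9"]),  # ADD R5, R4, R9
    ("SW", None, ["R5"]),         # SW  R5, A
    ("LW", "R6", []),             # LW  R6, F
    ("LW", "R7", []),             # LW  R7, G
    ("ADD", "R8", ["R6", "R7"]),  # ADD R8, R6, R7
    ("SW", None, ["R8"]),         # SW  R8, E
]


def schedule(program, in_order):
    """Return the cycle in which the last instruction completes."""
    ready_at = {}           # register -> cycle its new value is available
    issue_cycles = set()    # cycles already used by an issuing instruction
    last_issue = -1
    finish = 0
    for i, (op, dest, srcs) in enumerate(program):
        # Assumption: one instruction enters the window per cycle, so
        # instruction i cannot issue before cycle i.
        t = max([i] + [ready_at.get(r, 0) for r in srcs])
        if in_order:
            t = max(t, last_issue + 1)  # may not overtake an older instruction
        while t in issue_cycles:        # single issue: one instruction per cycle
            t += 1
        issue_cycles.add(t)
        last_issue = t
        done = t + LAT[op]
        if dest is not None:
            ready_at[dest] = done
        finish = max(finish, done)
    return finish


print(schedule(PROGRAM, in_order=True))   # 23
print(schedule(PROGRAM, in_order=False))  # 15
```

Under these assumptions everything from LW R4 onward stops waiting behind the MUL that depends on the miss. The model ignores WAR/WAW hazards (every destination here is distinct) and memory ordering.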

6
Implementation Issues
  • Instruction window
    • fixed number of instruction slots (e.g., 32)
      • generic, or
      • partitioned over execution units
    • fetch the next sequential instruction whenever a
      slot is free
    • mark input and output registers busy
    • slots monitor register status and execution-unit
      reservation tables
  • Issue when
    • all input operands are available
    • the output operand (register) is not busy due to
      an earlier instruction (avoids WAW and WAR hazards)
    • an execution unit is available
  • Commit when
    • all previous instructions have committed
    • why?

7
Register Scoreboard
[Figure: register file with a valid bit per register (0 if a write is pending)]
  • Tracks register writes
    • busy = pending write
  • Detects hazards for the scheduler

ADD R3,R1,R2
  • Wait until R1 is valid
  • Mark R3 valid when complete

SUB R4,R0,R3
  • Wait for R3

What about this sequence?
    LD  R3, 0(R0)
    ADD R4, R3, R5
    LD  R3, 4(R0)
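A minimal valid-bit scoreboard, sketched in Python with registers numbered 0 to 7 (an assumption). Following the issue rule from the implementation-issues slide, it also refuses to issue an instruction whose destination already has a pending write, which is one answer to the sequence above: the second LD R3 waits until the first one completes. The class and method names are hypothetical, and WAR hazards are not modeled.

```python
class Scoreboard:
    """One valid bit per register; False means a write is pending."""

    def __init__(self, num_regs=8):
        self.valid = [True] * num_regs

    def can_issue(self, dest, srcs):
        # RAW: every source must be valid.
        # WAW: the destination must not already have a pending write.
        sources_ok = all(self.valid[s] for s in srcs)
        dest_ok = dest is None or self.valid[dest]
        return sources_ok and dest_ok

    def issue(self, dest, srcs):
        if not self.can_issue(dest, srcs):
            return False
        if dest is not None:
            self.valid[dest] = False   # mark the write as pending
        return True

    def complete(self, dest):
        self.valid[dest] = True        # result written back: register valid again


sb = Scoreboard()
print(sb.issue(3, [0]))         # LD  R3, 0(R0)  -> True
print(sb.can_issue(4, [3, 5]))  # ADD R4, R3, R5 -> False (waits for R3)
print(sb.can_issue(3, [0]))     # LD  R3, 4(R0)  -> False (R3 already pending)
sb.complete(3)
print(sb.can_issue(4, [3, 5]))  # -> True
```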
8
Implementing a Simple Instruction Window
    ADD R3,R1,R2
    SW  R3,0(C)
    ADD R6,R4,R5
    SW  R6,8(C)
    LW  R7,16(A)

    issue              dst           src1        src2
    order   op     (result reg)   reg   rdy   reg   rdy
    3       ADD    R3             R1    0     R2    1
    5       SW     -              R3    0     C     1
    2       ADD    R6             R4    0     R5    0
    4       SW     -              R6    0     C     1
    1       LW     R7             A     1     -     1

Results are generated in the sequence R4, R7, R5, R1, R6, R3.
Such slots are often called reservation stations; their source fields hold a register name and a value.
9
Implementing a Simple Instruction Window (2)
  • Add an instruction to the window
    • only when its destination register is not busy
    • mark the destination register busy
    • check the status of the source registers and set
      the ready bits
  • When each result is generated
    • compare its destination register field to the
      source register fields of all waiting instructions
    • update the ready bits
    • mark the destination register not busy
  • Issue an instruction when
    • an execution resource is available
    • all source operands are ready
  • Result
    • issues instructions out of order as soon as their
      source registers are available
    • allows only one operation in the window per
      destination register
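The add, wakeup, and issue rules above, sketched in Python on the slide-8 window. Assumptions not taken from the slides: R1, R4 and R5 are being written by older instructions outside the window, base registers such as A and C are ready, and an execution resource is always free. `Window`, `add`, `result`, and `issue_ready` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    op: str
    dest: Optional[str]
    srcs: List[str]
    ready: List[bool]


class Window:
    def __init__(self, busy=()):
        self.busy = set(busy)      # registers with a pending write
        self.slots = []

    def add(self, op, dest, srcs):
        # Add only when the destination register is not busy.
        if dest is not None and dest in self.busy:
            return False
        ready = [s not in self.busy for s in srcs]  # set ready bits from register status
        if dest is not None:
            self.busy.add(dest)                     # mark destination busy
        self.slots.append(Slot(op, dest, list(srcs), ready))
        return True

    def result(self, reg):
        # Compare the result register to every waiting source field.
        for slot in self.slots:
            for k, src in enumerate(slot.srcs):
                if src == reg:
                    slot.ready[k] = True
        self.busy.discard(reg)                      # destination no longer busy

    def issue_ready(self):
        # Assumption: an execution resource is always available.
        issued = [s for s in self.slots if all(s.ready)]
        self.slots = [s for s in self.slots if not all(s.ready)]
        return issued


# Slide-8 window; R1, R4 and R5 are assumed to be written by older
# instructions that are still in flight.
w = Window(busy={"R1", "R4", "R5"})
w.add("ADD", "R3", ["R1", "R2"])
w.add("SW", None, ["R3", "C"])
w.add("ADD", "R6", ["R4", "R5"])
w.add("SW", None, ["R6", "C"])
w.add("LW", "R7", ["A"])

order = w.issue_ready()
for reg in ["R4", "R7", "R5", "R1", "R6", "R3"]:  # result sequence on the slide
    w.result(reg)
    order += w.issue_ready()
print([(s.op, s.dest or s.srcs[0]) for s in order])
# [('LW', 'R7'), ('ADD', 'R6'), ('ADD', 'R3'), ('SW', 'R6'), ('SW', 'R3')]
```

The resulting issue order matches the issue-order column in the slide-8 table, from LW R7 first to SW R3 last.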

10
Register Renaming (1)
What about this sequence?
    1  LW  R1, 0(R4)
    2  ADD R2, R1, R3
    3  LW  R1, 4(R4)
    4  ADD R5, R1, R3
We can't add instruction 3 to the window, since R1 is
already busy. We need two R1s!
11
Register Renaming (2)
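[Figure: a rename table translates virtual (architectural) registers to physical registers. Physical registers shown as (value, bit): P1 (A, 0), P2 (5, 1), P3 (C, 1), P4 (0, 1), P5 (E, 0), P6 (F, 1), P7 (3, 1), P8 (2, 0).]
Add a tag field to each register that translates the
virtual register name to a physical register name.
In window:
    LW  R1, 0(R4)
    ADD R2, R1, R3
Next instruction:
    LW  R1, 4(R4)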
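A minimal rename-table sketch with a free list of physical registers, applied to the slide-10 sequence. The initial mapping (R1 to R5 held in P1 to P5), the ten physical registers, and the class name are assumptions; reclaiming physical registers when instructions commit is left out.

```python
class Renamer:
    def __init__(self, arch_regs, num_phys):
        # Assumed initial state: the i-th architectural register lives in P(i+1).
        self.table = {r: f"P{i + 1}" for i, r in enumerate(arch_regs)}
        self.free = [f"P{i + 1}" for i in range(len(arch_regs), num_phys)]

    def rename(self, dest, srcs):
        # Translate sources first, so an instruction such as ADD R1, R1, R1
        # reads the old R1 before R1 gets a new physical register.
        phys_srcs = [self.table[s] for s in srcs]
        phys_dest = None
        if dest is not None:
            if not self.free:
                raise RuntimeError("no free physical register: rename must stall")
            phys_dest = self.free.pop(0)
            self.table[dest] = phys_dest
        return phys_dest, phys_srcs


r = Renamer(["R1", "R2", "R3", "R4", "R5"], num_phys=10)
program = [
    ("R1", ["R4"]),        # LW  R1, 0(R4)
    ("R2", ["R1", "R3"]),  # ADD R2, R1, R3
    ("R1", ["R4"]),        # LW  R1, 4(R4)
    ("R5", ["R1", "R3"]),  # ADD R5, R1, R3
]
renamed = [r.rename(dest, srcs) for dest, srcs in program]
for phys_dest, phys_srcs in renamed:
    print(phys_dest, phys_srcs)
# P6 ['P4']
# P7 ['P6', 'P3']
# P8 ['P4']
# P9 ['P8', 'P3']
```

The two writes to R1 land in P6 and P8, so the second LW no longer waits for R1 to become free, and each ADD reads the copy produced by its own LW.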
12
Register Renaming (3)
    slot  op    dst   src1  rdy   src2  rdy
    S1    LW    P5    data  1     -     1
    S2    ADD   P2    P5    0     data  1
    S3    LW    P4    data  1     -     1
    S4    ADD   P6    P4    0     data  1
  • When a result is generated, compare its tag to the
    not-ready source fields, and grab the data on a match
  • Add an instruction to the window even if its
    destination register is busy
  • When adding an instruction to the window:
    • read the data of non-busy source registers
    • retain the read tags of busy source registers
    • record the slot number as the destination
      register's write tag
Sequence:
    LW  R1, 0(R4)
    ADD R2, R1, R3
    LW  R1, 4(R4)
    ADD R5, R1, R3
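A sketch of the tag-and-broadcast scheme in the style of Tomasulo's algorithm, using the slot names S1 to S4 as tags as described above. Register values, memory contents, and the function names are hypothetical. The second load is made to finish first to show why a register only accepts a broadcast from its latest writer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Operand:
    tag: Optional[str]  # slot whose result we wait for; None once the value is present
    value: int = 0


@dataclass
class Station:
    op: str
    srcs: List[Operand]
    imm: int = 0


# Hypothetical initial register values and memory contents.
regs = {r: Operand(None, v) for r, v in
        {"R1": 0, "R2": 0, "R3": 10, "R4": 100, "R5": 0}.items()}
MEM = {100: 7, 104: 20}
stations = {}


def add(slot, op, dest, srcs, imm=0):
    # Read the data of non-busy sources; keep the tags of busy ones.
    operands = [Operand(regs[s].tag, regs[s].value) for s in srcs]
    stations[slot] = Station(op, operands, imm)
    # The destination now waits on this slot, even if it was already busy.
    regs[dest] = Operand(slot, regs[dest].value)


def broadcast(slot, value):
    # Compare the result tag to every not-ready source field; grab data on a match.
    for st in stations.values():
        for opnd in st.srcs:
            if opnd.tag == slot:
                opnd.tag, opnd.value = None, value
    # Only a register whose latest writer is this slot takes the value.
    for reg in regs.values():
        if reg.tag == slot:
            reg.tag, reg.value = None, value


def execute(slot):
    st = stations.pop(slot)
    if any(o.tag is not None for o in st.srcs):
        raise RuntimeError(f"{slot} issued before its operands were ready")
    if st.op == "LW":
        result = MEM[st.srcs[0].value + st.imm]
    else:  # ADD
        result = st.srcs[0].value + st.srcs[1].value
    broadcast(slot, result)


add("S1", "LW", "R1", ["R4"], imm=0)   # LW  R1, 0(R4)
add("S2", "ADD", "R2", ["R1", "R3"])   # ADD R2, R1, R3
add("S3", "LW", "R1", ["R4"], imm=4)   # LW  R1, 4(R4)
add("S4", "ADD", "R5", ["R1", "R3"])   # ADD R5, R1, R3
for slot in ["S3", "S1", "S2", "S4"]:  # the second load finishes first
    execute(slot)
print({r: regs[r].value for r in ("R1", "R2", "R5")})
# {'R1': 20, 'R2': 17, 'R5': 30}
```

R1 ends up holding the second load's value (20) even though the first load finishes later, while S2 still receives the first load's value (7) through its tag.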
13
Example Execution
    LW  R1, 0(R2)
    ADD R1, R1, R1
    SW  R1, 0(R2)
    ADD R1, R3, R3
    SW  R1, 4(R2)
    ADD R1, R2, R2
    SW  R1, 8(R2)
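One way to see what renaming buys in this example: list the register hazards between instruction pairs before and after giving every write a fresh name. The `hazards` and `rename` helpers and the P<index> naming are hypothetical, instructions are numbered from 0, and dependences through memory are ignored.

```python
def hazards(program):
    """Return (RAW, WAR, WAW) sets of (earlier, later) register-hazard pairs."""
    raw, war, waw = set(), set(), set()
    last_writer = {}   # register -> index of its most recent writer
    readers = {}       # register -> indices that read it since that write
    for j, (dest, srcs) in enumerate(program):
        for s in srcs:
            if s in last_writer:
                raw.add((last_writer[s], j))
        if dest is not None:
            war.update((i, j) for i in readers.get(dest, []))
            if dest in last_writer:
                waw.add((last_writer[dest], j))
        for s in srcs:
            readers.setdefault(s, []).append(j)
        if dest is not None:
            last_writer[dest] = j
            readers[dest] = []   # a new write starts a new value
    return raw, war, waw


def rename(program):
    """Give every write a fresh (hypothetical) physical name P<index>."""
    table, out = {}, []
    for k, (dest, srcs) in enumerate(program):
        new_srcs = [table.get(s, s) for s in srcs]  # unwritten registers keep their name
        new_dest = None
        if dest is not None:
            new_dest = f"P{k}"
            table[dest] = new_dest
        out.append((new_dest, new_srcs))
    return out


# Slide-13 sequence as (dest, register sources).
PROGRAM = [
    ("R1", ["R2"]),        # 0: LW  R1, 0(R2)
    ("R1", ["R1", "R1"]),  # 1: ADD R1, R1, R1
    (None, ["R1", "R2"]),  # 2: SW  R1, 0(R2)
    ("R1", ["R3", "R3"]),  # 3: ADD R1, R3, R3
    (None, ["R1", "R2"]),  # 4: SW  R1, 4(R2)
    ("R1", ["R2", "R2"]),  # 5: ADD R1, R2, R2
    (None, ["R1", "R2"]),  # 6: SW  R1, 8(R2)
]

before = hazards(PROGRAM)
after = hazards(rename(PROGRAM))
print([sorted(h) for h in before])
# [[(0, 1), (1, 2), (3, 4), (5, 6)], [(2, 3), (4, 5)], [(0, 1), (1, 3), (3, 5)]]
print([sorted(h) for h in after])
# [[(0, 1), (1, 2), (3, 4), (5, 6)], [], []]
```

Only the four RAW pairs survive renaming, so instructions 3-4 and 5-6 can proceed while the chain 0-1-2 waits on the load.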
14
Some Issues
  • How do we rename several (2-4) instructions per
    cycle?
  • How do we make sure that the correct value winds
    up in the register?
  • How do we make sure events (exceptions) are
    handled in the right order?
  • When can we move a load past a store?

15
Retirement and Reorder Buffers
  • Must commit instructions in order
    • check exceptions
    • update visible register state
    • update memory
  • Maintain slots as a circular buffer
    • commit the instruction at the head when it is
      finished
    • fetch new instructions to the tail

[Figure: circular buffer of slots with Head and Tail pointers]
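A minimal circular reorder buffer in Python (class and method names hypothetical). A full design would typically also hold each instruction's result and exception status, checking exceptions and updating registers and memory at commit; this sketch tracks only whether each entry has finished.

```python
class ReorderBuffer:
    """Circular buffer: allocate at the tail, commit finished entries from the head."""

    def __init__(self, size):
        self.entries = [None] * size   # each entry is [name, finished]
        self.head = 0                  # oldest uncommitted instruction
        self.tail = 0                  # next free slot
        self.count = 0

    def allocate(self, name):
        """Add a newly fetched instruction at the tail; None means the buffer is full."""
        if self.count == len(self.entries):
            return None
        slot = self.tail
        self.entries[slot] = [name, False]
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return slot

    def finish(self, slot):
        self.entries[slot][1] = True   # execution done, possibly out of order

    def commit(self):
        """Retire finished instructions from the head, strictly in program order."""
        retired = []
        while self.count and self.entries[self.head][1]:
            retired.append(self.entries[self.head][0])
            self.entries[self.head] = None
            self.head = (self.head + 1) % len(self.entries)
            self.count -= 1
        return retired


rob = ReorderBuffer(4)
a, b, c = rob.allocate("A"), rob.allocate("B"), rob.allocate("C")
rob.finish(c)
print(rob.commit())   # [] : C is done, but A (at the head) is not
rob.finish(a)
print(rob.commit())   # ['A']
rob.finish(b)
print(rob.commit())   # ['B', 'C']
```

Because the head only moves past finished entries, an instruction that finishes early (C) still retires after every older one.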
16
Dynamic Scheduling and Memory Operations
  • A store cannot update memory until the instruction
    commits
    • but its value can be used by subsequent loads
      before commit
  • A load cannot execute before a preceding store
    unless they are known to access different addresses
    • disambiguation
    • hard at compile time, easy at run time

Memory Conflict Resolution (Memory Order Buffer)
    SW  R4, 0(R3)
    LW  R5, 0(R6)
    ADD R7, R5, R8
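A sketch of the two memory rules as a queue of uncommitted stores, run on the SW/LW pair above. Addresses, register values, memory contents, and method names are hypothetical. Scanning from the youngest older store backwards, a load waits if it meets a store whose address is still unknown, forwards the value of the first store to the same address, and otherwise reads memory.

```python
class StoreQueue:
    """Uncommitted stores in program order; an address of None is not yet computed."""

    def __init__(self):
        self.stores = []   # [address or None, value], oldest first

    def add_store(self, addr, value):
        self.stores.append([addr, value])
        return len(self.stores) - 1

    def resolve(self, index, addr):
        self.stores[index][0] = addr   # the store's address register became ready

    def load(self, addr, memory, older):
        """Try a load that follows the first `older` stores in the queue."""
        # Check the youngest older store first: it holds the latest value.
        for st_addr, st_val in reversed(self.stores[:older]):
            if st_addr is None:
                return ("wait", None)        # cannot disambiguate yet
            if st_addr == addr:
                return ("forward", st_val)   # use the store's value before it commits
        return ("memory", memory.get(addr, 0))

    def commit_oldest(self, memory):
        addr, value = self.stores.pop(0)
        memory[addr] = value                 # memory changes only at commit


memory = {200: 1}                      # hypothetical contents
sq = StoreQueue()
s = sq.add_store(None, 42)             # SW R4, 0(R3): R3 not ready yet, R4 = 42
print(sq.load(200, memory, older=1))   # LW R5, 0(R6), R6 = 200 -> ('wait', None)
sq.resolve(s, 100)                     # R3 = 100: a different address
print(sq.load(200, memory, older=1))   # ('memory', 1): the load may go ahead
print(sq.load(100, memory, older=1))   # if R6 were 100 -> ('forward', 42)
sq.commit_oldest(memory)
print(memory[100])                     # 42
```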
17
Some History: Dynamic Scheduling Then and Now
  • IBM 360/91
    • Reservation stations (register renaming)
    • Tomasulo's algorithm
    • optimized for storage-to-register instructions
  • CDC 6600
    • Scoreboard
  • Intel P6 (Pentium Pro/Pentium II)
    • Converts CISC instructions into one or more
      RISC-like micro-operations
    • Reservation stations (register renaming)
    • In-order retirement

18
Next Time
  • Prediction/Speculation
    • Branch prediction
      • Static
      • Direction
      • Target
  • Case Study
    • PowerPC 620