1
Exploring Wakeup-Free Instruction Scheduling
  • Jie S. Hu, N. Vijaykrishnan, and Mary Jane Irwin
  • Microsystems Design Lab
  • The Pennsylvania State University

2
Outline
  • Motivation
  • Case study: Cyclone
  • Towards high-performance wakeup-free scheduler
  • A general model
  • Employing pre-check scheme
  • A segmented issue queue
  • Conclusions and future work

3
Superscalar Issue Queue
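[Figure: issue queue wakeup logic. Entries inst0 to instN-1 each hold left and right operand tags (opd tagL, opd tagR) and ready flags (rdyL, rdyR); result tags tag1 to tagIW are matched against the operand tags and the match results are ORed.]
Wakeup logic delay $= T_{tagdrive} + T_{tagmatch} + T_{matchOR}$
$T_{tagdrive} = c_0 + (c_1 + c_2 \times IW) \times N + (c_3 + c_4 \times IW + c_5 \times IW^2) \times N^2$
$T_{tagmatch},\ T_{matchOR} = c_0 + c_1 \times IW + c_2 \times IW^2$
(S. Palacharla et al., ISCA24)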
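
As a rough illustration of how the wakeup delay model on this slide scales, the sketch below evaluates the three terms for an issue width IW and a window size N. The coefficient values are placeholders chosen only for illustration, not the technology-dependent constants from Palacharla et al., and the function and parameter names are assumptions.

```python
def wakeup_delay(iw, n,
                 drive=(1.0, 0.1, 0.01, 0.001, 0.0001, 0.00001),
                 match=(1.0, 0.1, 0.01),
                 match_or=(1.0, 0.1, 0.01)):
    """Evaluate T_tagdrive + T_tagmatch + T_matchOR.

    iw: issue width (IW), n: issue queue size (N).
    All coefficient tuples are hypothetical placeholders.
    """
    c0, c1, c2, c3, c4, c5 = drive
    # Tag drive grows linearly and quadratically with the window size.
    t_tagdrive = c0 + (c1 + c2 * iw) * n + (c3 + c4 * iw + c5 * iw ** 2) * n ** 2
    m0, m1, m2 = match
    t_tagmatch = m0 + m1 * iw + m2 * iw ** 2
    o0, o1, o2 = match_or
    t_matchor = o0 + o1 * iw + o2 * iw ** 2
    return t_tagdrive + t_tagmatch + t_matchor
```

With any positive coefficients, the result grows with both the window size and the issue width, which is the scaling concern the slide raises.
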
4
Superscalar Issue Queue
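[Figure: issue queue selection logic, drawn as a selection tree with a root cell and links from/to other subtrees]
Selection logic delay: $T_{selection} = c_0 + c_1 \times \log_4 N$ (S. Palacharla et al., ISCA24)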
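
The logarithmic term can be read as the depth of a selection tree. The sketch below assumes 4-input cells, which is what the base-4 logarithm suggests, and rounds the depth up to whole levels. The constants c0 and c1 are placeholders and the function names are hypothetical.

```python
def selection_tree_levels(n, fan_in=4):
    """Number of tree levels needed to cover n issue queue entries."""
    levels, covered = 0, 1
    while covered < n:
        covered *= fan_in
        levels += 1
    return levels


def selection_delay(n, c0=1.0, c1=1.0):
    """Approximate T_selection = c0 + c1 * log4(N) using whole tree levels."""
    return c0 + c1 * selection_tree_levels(n)
```

Under this approximation a 16-entry queue needs two levels and a 64-entry queue needs three, so shrinking the pool of selection candidates directly shortens the selection path.
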
5
Challenges in Dynamic Instruction Scheduling
  • Broadcast-based dynamic scheduler
    • Higher complexity
    • Power hungry
    • A major limiter to clock frequency, given increasing
      issue queue size, issue width, and wire delay, and
      fewer logic levels per pipeline stage
  • Complexity-effective issue
    • Speculative wakeup (Stark et al.)
    • Dependency-chain-based ordering (Canal/Gonzalez,
      ICS 00/01; Michaud/Seznec, HPCA01)
    • Segmented issue queue (Raasch et al., ISCA 2002)
    • Wakeup-free dynamic scheduler (Ernst et al., ISCA 2003)
      • Lower complexity
      • Lower power consumption
      • Better scalability
      • Must trade away some performance

6
Our Goals
  • Explore the predictability of instruction issue
    latency
  • Identify the performance impediments in
    wakeup-free architectures
  • Design high-performance wakeup-free schedulers

7
Cyclone: Conflict in the Main Queue
[Charts: FP benchmarks; Int benchmarks. Chart label: Order Enforced]
Enforce ordered placement to avoid conflict between instructions with different latencies
8
Possible Structural Problems
  • Instruction promotion/forwarding incurs conflicts
    along the path
  • Very limited instruction pool for selection
    • Only entries in column 0 of the main queue can be
      issued
    • Ready instructions (not in column 0) are delayed
      due to conflicts
  • A limited number of issue ports has less tolerance
    for instructions mispredicted as ready, which
    • Waste issue ports
    • Prevent ready instructions from issuing
    • Compete with newly decoded instructions due to
      replay

9
A General Model: WF-Replay
How to relax the structural constraints?
  • Collapsing issue queue without promotion
  • Conventional random selection logic
  • Given much wider issue width
  • Instruction is removed if no replay is needed
[Figure: WF-Replay scheduler. Labels: From decoder, Rename / Pre-schedule, Timing Table, Wakeup-Free Issue Queue (each entry with a lat field), Selection Logic, to FUs, from FUs, register file ready bits, replay?]
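
A minimal sketch of one WF-Replay scheduling cycle follows, under several assumptions that the transcript does not confirm: the replay check is collapsed into the issue cycle, selection takes requesting entries in queue order, and a replayed entry simply stays in the queue and requests again. `Entry`, `wf_replay_cycle`, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Entry:
    name: str
    srcs: tuple   # source register names
    lat: int      # predicted issue latency from pre-scheduling


def wf_replay_cycle(queue, ready_bits, issue_width):
    """Run one cycle; return (issued, replayed) instruction names."""
    # Entries whose predicted latency has counted down to 0 request selection.
    requests = [e for e in queue if e.lat == 0][:issue_width]
    issued, replayed = [], []
    for e in requests:
        if all(ready_bits.get(r, False) for r in e.srcs):
            issued.append(e.name)
            queue.remove(e)          # removed only when no replay is needed
        else:
            replayed.append(e.name)  # falsely ready: used an issue port for nothing
    for e in queue:
        if e.lat > 0:
            e.lat -= 1               # remaining entries keep counting down
    return issued, replayed
```

With a narrow issue width, a falsely ready entry near the head of the queue can keep winning selection and block a truly ready entry behind it, which is the issue port competition examined on the following slides.
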
10
Instruction Pre-scheduling
[Figure: pre-scheduling logic spanning the Rename/PSCHED0 and PSCHED1 stages. Labels: Timing Table, Register Mapping Table, instructions I0 to I3 each with a max unit producing lat0 to lat3, dep check, MUX control, reschedule?]
Adapted from Cyclone, D. Ernst et al., ISCA03
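
The max units in the figure suggest that each instruction's predicted issue latency is the largest predicted ready time among its source operands. The sketch below illustrates that idea for one decode group. It processes instructions in program order as a stand-in for the dep check and MUX control, and it does not model the per-cycle countdown of the timing table or the "reschedule?" path. The function name and the tuple layout are assumptions.

```python
def preschedule(group, timing_table):
    """Predict issue latencies for one decode group.

    group: list of (dest, srcs, exec_latency) tuples in program order.
    timing_table: dict mapping a register to the predicted number of
                  cycles until its value is ready (updated in place).
    """
    predicted = []
    for dest, srcs, exec_latency in group:
        # Issue latency is the max over the source operands' ready times;
        # registers missing from the table are treated as already ready.
        lat = max((timing_table.get(r, 0) for r in srcs), default=0)
        predicted.append(lat)
        # Later instructions in the same group see this producer's result.
        timing_table[dest] = lat + exec_latency
    return predicted
```

For example, with `r1` predicted ready in 2 cycles, an instruction reading `r1` and `r2` with a 1-cycle execution latency gets latency 2, and a dependent instruction in the same group gets latency 3.
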
11
Latency Triggered Selection
[Figure: Wakeup-Free Issue Queue whose entries each carry a lat field, feeding a selection tree with a root cell]
12
WF-Replay IPC (F4-I8 vs F4-I4)
WF-Replay loses 9.7% performance (IPC) relative to Base when
the issue width is reduced to 4 instructions per cycle
[Charts: Issue Width 8; Issue Width 4]
13
Competition at Issue Ports?
[Charts: Issue Width 8; Issue Width 4]
14
Precheck to Avoid Competition
  • Competition at issue ports may delay instructions
    that are predicted to be ready
  • Delayed instructions may in turn compete with
    the instructions that depend on them
  • This causes more instructions to become falsely ready or to be
    delayed
  • Wider issue ports can avoid unnecessary
    competition, at the cost of higher complexity
  • Solution: prevent falsely ready instructions
    from being selected by pre-checking register ready bits

15
WF-Precheck Scheduler
  • Pre-check register ready bits when the predicted
    latency reaches 0
  • Selection requests are filtered by the ry bit
  • Only truly ready instructions are issued
  • Replay is traded for pre-check
[Figure: WF-Precheck scheduler. Labels: From decoder, Rename / Pre-schedule, Timing Table, Wakeup-Free Issue Queue (each entry with lat and ry fields), Selection Logic, Issuing, to FUs, from Mem., Register Ready Bit Register]
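
The filtering step can be sketched as follows, assuming the ry bit is computed from the register ready bits once an entry's predicted latency reaches 0 and that selection takes filtered requests in queue order. The function name and the dictionary layout of queue entries are hypothetical.

```python
def precheck_cycle(queue, ready_bits, issue_width):
    """Run one WF-Precheck cycle; return the names of issued instructions.

    queue: list of dicts with keys 'name', 'srcs', and 'lat' (updated in place).
    ready_bits: dict mapping a register name to True when its value is ready.
    """
    issued, remaining = [], []
    for e in queue:
        # ry is set only when the predicted latency has expired AND every
        # source register is actually ready, so no falsely ready request
        # ever reaches the selection logic.
        ry = e["lat"] == 0 and all(ready_bits.get(r, False) for r in e["srcs"])
        if ry and len(issued) < issue_width:
            issued.append(e["name"])
        else:
            e["lat"] = max(e["lat"] - 1, 0)
            remaining.append(e)
    queue[:] = remaining
    return issued
```

Unlike a replay-based scheme, an entry whose operands are not yet ready never occupies an issue port here, so a truly ready entry behind it can still be selected.
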
16
Complexity of Pre-checking
On average, 40.2% of instructions have both
source operands ready and 45.4% of instructions have
one source operand ready at the pre-schedule stage.
The pre-check request rate is less than 2 per cycle.
17
Issue Port Competition (F4-I4)
18
WF-Precheck IPC (F4-I4)
19
Impact of Load Related Predictions
20
How about Selection Logic?
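[Figure: issue queue selection logic, drawn as a selection tree with a root cell and links from/to other subtrees]
Selection logic delay: $T_{selection} = c_0 + c_1 \times \log_4 N$ (S. Palacharla et al., ISCA24)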
21
WF-Segment Issue Queue
[Figure: WF-Segment issue queue. Labels: from decoder, Rename / Pre-scheduling, Time Table, Dispatch Routing, queue segments for predicted latencies >4, 3-4, 1-2, and 0 (entries with ry bits), Switchback path, Register Ready Bits, Selection Logic, 4 issue ports, to FUs, from FUs / Mem.]
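
The segment labels in the figure suggest that dispatch routing places each instruction by its predicted latency. The sketch below covers only that bucketing step. How entries move between segments over time and what the switchback path does are not described in the transcript and are not modeled. The function names are hypothetical.

```python
def segment_for(lat):
    """Map a predicted issue latency to a segment label from the figure."""
    if lat > 4:
        return ">4"
    if lat >= 3:
        return "3-4"
    if lat >= 1:
        return "1-2"
    return "0"


def dispatch(insts):
    """Group (name, predicted_latency) pairs by segment."""
    segments = {">4": [], "3-4": [], "1-2": [], "0": []}
    for name, lat in insts:
        segments[segment_for(lat)].append(name)
    return segments
```

If selection only has to consider one segment rather than the whole queue (an assumption about the design), the number of candidates N in the selection delay model shrinks accordingly.
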
22
WF-Segment Issue Queue
On average, WF-Segment trades a 3% IPC loss relative to
WF-Precheck, and a 5% loss relative to Base, for
optimized selection logic.
23
Conclusions
  • Explored and identified the performance impediments
    in wakeup-free scheduling
  • High-performance wakeup-free dynamic schedulers
    • WF-Replay eliminates structural constraints
    • WF-Precheck avoids unnecessary competition at
      issue ports
    • WF-Segment optimizes selection logic for high
      clock speed

24
Future Work
  • Routing complexity analysis in WF-Segment
    scheduler
  • Power analysis for wakeup-free schedulers
  • Sophisticated pre-scheduler

25
Thank You!
26
Wire Delay Challenges
  • Increasing pipeline depth for high performance
  • Clock period (FO4) decreases dramatically
  • Cross-chip wire delay will be up to 10 cycles as
    technology shrinks

(M. S. Hrishikesh et al., ISCA29; Stephen W. Keckler et al., ISSCC03)
27
Precheck as A Single Stage
28
Load/Store Dependence Predictor