Transcript and Presenter's Notes

Title: Lecture 8: Instruction Fetch, ILP Limits


1
Lecture 8: Instruction Fetch, ILP Limits
  • Today: advanced branch prediction, limits of ILP
  • (Sections 3.4-3.5, 3.8-3.14)

2
1-Bit Prediction
  • For each branch, keep track of what happened last time
    and use that outcome as the prediction
  • What are prediction accuracies for branches 1 and 2 below?

    while (1) {
      for (i = 0; i < 10; i++) { ... }    branch-1
      for (j = 0; j < 20; j++) { ... }    branch-2
    }
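
  As an illustration (a minimal sketch, not from the original slides; the
  1024-entry table and the PC indexing are assumptions), a 1-bit predictor
  can be written in C as:

    #include <stdbool.h>
    #include <stdint.h>

    #define TABLE_SIZE 1024               /* assumed table size for the example */

    static bool last_outcome[TABLE_SIZE]; /* one bit of state per entry */

    /* Predict: repeat whatever the entry this branch maps to did last time. */
    bool predict_1bit(uint32_t branch_pc) {
        return last_outcome[branch_pc % TABLE_SIZE];
    }

    /* Update with the actual outcome once the branch resolves. */
    void update_1bit(uint32_t branch_pc, bool taken) {
        last_outcome[branch_pc % TABLE_SIZE] = taken;
    }

  With the usual assumption that the loop-back branch is taken while the
  loop continues, a 1-bit predictor mispredicts twice per pass through each
  loop (the exit, and the first iteration after re-entry), giving roughly
  8/10 = 80% accuracy for branch-1 and 18/20 = 90% for branch-2.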

3
2-Bit Prediction
  • For each branch, maintain a 2-bit saturating counter:
  • if the branch is taken: counter = min(3, counter+1)
  • if the branch is not taken: counter = max(0, counter-1)
  • If (counter >= 2), predict taken, else predict not taken
  • Advantage: a few atypical branches will not influence the
    prediction (a better measure of the common case)
  • Especially useful when multiple branches share the same
    counter (some bits of the branch PC are used to index
    into the branch predictor)
  • Can be easily extended to N bits (in most processors, N=2)
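
  A minimal sketch of this 2-bit scheme (not from the original slides; the
  1024-entry table and PC indexing are assumptions for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    #define TABLE_SIZE 1024               /* assumed; real sizes vary by design */

    /* Counter states: 0,1 => predict not taken; 2,3 => predict taken. */
    static uint8_t counters[TABLE_SIZE];

    bool predict_2bit(uint32_t branch_pc) {
        return counters[branch_pc % TABLE_SIZE] >= 2;
    }

    void update_2bit(uint32_t branch_pc, bool taken) {
        uint8_t *c = &counters[branch_pc % TABLE_SIZE];
        if (taken) {
            if (*c < 3) (*c)++;           /* counter = min(3, counter+1) */
        } else {
            if (*c > 0) (*c)--;           /* counter = max(0, counter-1) */
        }
    }

  A single atypical outcome only moves the counter by one step, so a branch
  that is almost always taken stays on the predict-taken side.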

4
Correlating Predictors
  • Basic branch prediction: maintain a 2-bit saturating
    counter for each entry (or use 10 branch PC bits to index
    into one of 1024 counters); this captures the recent
    common case for each branch
  • Can we take advantage of additional information?
  • If a branch recently went 0,1,1,1,1, expect 0; if it
    recently went 1,1,1,0,1, expect 1; can we have a
    separate counter for each case?
  • If the previous branches went 0,1, expect 0; if the
    previous branches went 1,1, expect 1; can we have
    a separate counter for each case?
  • Hence, build correlating predictors

5
Local/Global Predictors
  • Instead of maintaining a counter for each branch to
    capture the common case,
  • Maintain a counter for each branch and surrounding pattern
  • If the surrounding pattern belongs to the branch being
    predicted, the predictor is referred to as a local predictor
  • If the surrounding pattern includes neighboring branches,
    the predictor is referred to as a global predictor

6
Global Predictor
[Figure] A single register (e.g., 00110101; 8 bits) keeps track of recent
history for all branches. These 8 history bits, together with 6 bits of
the branch PC, index into a table of 16K entries of 2-bit saturating
counters.
Also referred to as a two-level predictor
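
  For concreteness, a sketch of this global two-level predictor in C, using
  the 8 history bits and 6 PC bits from the figure; forming the index by
  concatenation is an assumption of this example (other designs XOR the
  history and PC, as in gshare):

    #include <stdbool.h>
    #include <stdint.h>

    #define HIST_BITS   8                              /* global history bits (figure) */
    #define PC_BITS     6                              /* branch PC bits (figure) */
    #define NUM_ENTRIES (1 << (HIST_BITS + PC_BITS))   /* 16K counters */

    static uint16_t global_history;                    /* shift register of recent outcomes */
    static uint8_t  counters[NUM_ENTRIES];             /* 2-bit saturating counters */

    static uint32_t ghist_index(uint32_t pc) {
        uint32_t hist = global_history & ((1u << HIST_BITS) - 1);
        uint32_t pcb  = pc & ((1u << PC_BITS) - 1);
        return (hist << PC_BITS) | pcb;                /* {history, PC bits} */
    }

    bool predict_global(uint32_t pc) {
        return counters[ghist_index(pc)] >= 2;
    }

    void update_global(uint32_t pc, bool taken) {
        uint8_t *c = &counters[ghist_index(pc)];
        if (taken) { if (*c < 3) (*c)++; }
        else       { if (*c > 0) (*c)--; }
        global_history = (uint16_t)((global_history << 1) | (taken ? 1 : 0));
    }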
7
Local Predictor
Also a two-level predictor, but one that only uses local histories at
the first level.
[Figure] Use 6 bits of the branch PC to index into a local history table
of 64 entries, each holding a 14-bit history for a single branch
(e.g., 10110111011001); the 14-bit history then indexes into the next
level, a table of 16K entries of 2-bit saturating counters.
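
  A corresponding sketch of the local two-level predictor, with the
  64-entry history table and 14-bit histories from the figure (the update
  ordering, counters before history, is an assumption of this example):

    #include <stdbool.h>
    #include <stdint.h>

    #define PC_BITS      6
    #define HIST_ENTRIES (1 << PC_BITS)            /* 64 local history registers */
    #define HIST_BITS    14
    #define CTR_ENTRIES  (1 << HIST_BITS)          /* 16K 2-bit counters */

    static uint16_t local_history[HIST_ENTRIES];   /* per-branch history shift registers */
    static uint8_t  counters[CTR_ENTRIES];         /* 2-bit saturating counters */

    bool predict_local(uint32_t pc) {
        uint16_t hist = local_history[pc & (HIST_ENTRIES - 1)] & (CTR_ENTRIES - 1);
        return counters[hist] >= 2;
    }

    void update_local(uint32_t pc, bool taken) {
        uint16_t *h = &local_history[pc & (HIST_ENTRIES - 1)];
        uint8_t  *c = &counters[*h & (CTR_ENTRIES - 1)];
        if (taken) { if (*c < 3) (*c)++; }
        else       { if (*c > 0) (*c)--; }
        *h = (uint16_t)((*h << 1) | (taken ? 1 : 0));   /* record the new outcome */
    }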
8
Tournament Predictors
  • A local predictor might work well for some branches or
    programs, while a global predictor might work well for others
  • Provide one of each and maintain another predictor to
    identify which predictor is best for each branch

[Figure] The branch PC indexes a local predictor, a global predictor, and
a tournament predictor (a table of 2-bit saturating counters); the
tournament predictor controls a mux that selects between the local and
global predictions.
Alpha 21264: local predictor with 1K entries in level-1 and 1K entries in
level-2; global predictor with 4K entries and a 12-bit global history;
tournament predictor with 4K entries. Total capacity?
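
  A sketch of the tournament (choice) mechanism, assuming the local and
  global predictions are computed as in the earlier sketches; the 4K-entry
  chooser matches the Alpha 21264 numbers, but indexing the chooser by
  branch PC and the counter polarity (high values meaning "trust global")
  are simplifications of this example:

    #include <stdbool.h>
    #include <stdint.h>

    #define CHOOSER_ENTRIES 4096          /* 4K entries, per the 21264 figure */

    /* 2-bit chooser counters: >= 2 selects the global prediction. */
    static uint8_t chooser[CHOOSER_ENTRIES];

    bool predict_tournament(uint32_t pc, bool local_pred, bool global_pred) {
        return (chooser[pc % CHOOSER_ENTRIES] >= 2) ? global_pred : local_pred;
    }

    /* Train the chooser only when the two component predictors disagree. */
    void update_tournament(uint32_t pc, bool local_pred, bool global_pred, bool taken) {
        if (local_pred == global_pred) return;
        uint8_t *c = &chooser[pc % CHOOSER_ENTRIES];
        if (global_pred == taken) { if (*c < 3) (*c)++; }   /* global was right */
        else                      { if (*c > 0) (*c)--; }   /* local was right */
    }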
9
Predictor Comparison
  • Note that predictors of equal capacity must be compared
  • Sizes of each level have to be selected to optimize
    prediction accuracy
  • Influencing factors: degree of interference between branches,
    program likely to benefit from local/global history

10
Branch Target Prediction
  • In addition to predicting the branch direction, we must
    also predict the branch target address
  • Branch PC indexes into a predictor table; indirect branches
    might be problematic
  • Most common indirect branch: return from a procedure;
    can be easily handled with a stack of return addresses
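
  A sketch of such a return address stack (the 16-entry depth and the
  wrap-around overwrite policy are assumptions of this example):

    #include <stdint.h>

    #define RAS_DEPTH 16                  /* assumed depth; real designs vary */

    static uint32_t ras[RAS_DEPTH];       /* return address stack */
    static int      ras_top;              /* number of pushes minus pops */

    /* On a (predicted) call: push the address of the instruction after the call. */
    void ras_on_call(uint32_t return_addr) {
        ras[ras_top % RAS_DEPTH] = return_addr;   /* overwrite oldest on overflow */
        ras_top++;
    }

    /* On a (predicted) return: pop the top entry and use it as the target. */
    uint32_t ras_on_return(void) {
        if (ras_top > 0) ras_top--;
        return ras[ras_top % RAS_DEPTH];
    }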

11
Multiple Instruction Issue
  • The out-of-order processor implementation can be easily
    extended to have multiple instructions in each pipeline stage
  • Increased complexity (lower clock speed!):
  • more reads and writes per cycle to register map table
  • more read and write ports in issue queue
  • more tags being broadcast to issue queue every cycle
  • higher complexity for bypassing/forwarding among FUs
  • more register read and write ports
  • more ports in the LSQ
  • more ports in the data cache
  • more ports in the ROB

12
ILP Limits
  • The perfect processor:
  • Infinite registers (no WAW or WAR hazards)
  • Perfect branch direction and target prediction
  • Perfect memory disambiguation
  • Perfect instruction and data caches
  • Single-cycle latencies for all ALUs
  • Infinite ROB size (window of in-flight instructions)
  • No limit on number of instructions in each pipeline stage
  • The last instruction may be scheduled in the first cycle
  • The only constraint is a true dependence (register or memory
    RAW hazards) (with value prediction, how would the perfect
    processor behave?)

13
Infinite Window Size and Issue Rate
14
Effect of Window Size
  • Window size is affected by register file/ROB size, branch
    mispredict rate, fetch bandwidth, etc.
  • We will use a window size of 2K instrs and a max issue rate
    of 64 for subsequent experiments

15
Imperfect Branch Prediction
  • Note: no branch mispredict penalty; a branch mispredict
    restricts window size
  • Assume a large tournament predictor for subsequent experiments

16
Effect of Name Dependences
  • More registers → fewer WAR and WAW constraints (usually
    register file size goes hand in hand with in-flight window size)
  • 256 int and fp registers for subsequent experiments

17
Memory Dependences
18
Limits of ILP Summary
  • Int programs are more limited by branches, memory
    disambiguation, etc., while FP programs are limited most
    by window size
  • We have not yet examined the effect of branch mispredict
    penalty and imperfect caching
  • All of the studied factors have relatively comparable
    influence on CPI: window/register size, branch prediction,
    memory disambiguation
  • Can we do better? Yes: better compilers, value prediction,
    memory dependence prediction, multi-path execution

19
Pentium III (P6 Microarchitecture) Case Study
  • 14-stage pipeline: 8 for fetch/decode/dispatch, 3 for o-o-o,
    3 for commit → branch mispredict penalty of 10-15 cycles
  • Out-of-order execution with a 40-entry ROB (40 temporary
    or virtual registers) and 20 reservation stations
  • Each x86 instruction gets converted into RISC-like micro-ops;
    on average, one CISC instr → 1.37 micro-ops
  • Three instructions in each pipeline stage → 3 instructions
    can simultaneously leave the pipeline → ideal CPµI = 0.33
    → ideal CPI = 0.45
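
  A quick check of the arithmetic above (not on the original slide): with
  3 micro-ops retiring per cycle, ideal CPµI = 1/3 ≈ 0.33, and with 1.37
  micro-ops per x86 instruction, ideal CPI ≈ 0.33 × 1.37 ≈ 0.45.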

20
Branch Prediction
  • 512-entry global two-level branch predictor and 512-entry
    BTB → 20% combined mispredict rate
  • For every instruction committed, 0.2 instructions on the
    mispredicted path are also executed (wasted power!)
  • Mispredict penalty is 10-15 cycles

21
Where is Time Lost?
  • Branch mispredict stalls
  • Cache miss stalls (dominated by L1D misses)
  • Instruction fetch stalls (happen often because subsequent
    stages are stalled, and occasionally because of an I-cache miss)

22
CPI Performance
  • Owing to stalls, the processor can fall behind (no instructions
    are committed for 55% of all cycles), but then recover with
    multi-instruction commits (31% of all cycles) → average CPI of
    1.15 (Int) and 2.0 (FP)
  • Overlap of different stalls → CPI is not the sum of individual stalls
  • IPC is also an attractive metric
