Dynamic Branch Prediction - PowerPoint PPT Presentation

Slides: 41
Provided by: Jonath254
Learn more at: http://www.ece.uah.edu


1
Dynamic Branch Prediction During Context Switches

Jonathan Creekmore, Nicolas Spiegelberg
2
Overview
  • Branch Prediction Techniques
  • Context Switching
  • Compression of Branch Tables
  • Simulation
  • Hardware Model
  • Results
  • Analysis

3
Case for Branch Prediction
  • Multiple instructions in flight at one time
    • Between 15 and 20
  • Branches occur about every 5 instructions
    • if, while, for, function calls, etc.
  • Stalling the pipeline is unacceptable
    • Loses all the advantage of multiple instruction issue
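The two figures on this slide combine into a quick estimate of how many branches are unresolved at once, assuming the one-branch-per-five-instructions rate holds across the whole window:

```latex
\[
\frac{15\ \text{instructions}}{5\ \text{instructions/branch}} = 3
\quad\text{to}\quad
\frac{20\ \text{instructions}}{5\ \text{instructions/branch}} = 4\ \text{branches in flight}
\]
```

Keeping 15 to 20 instructions in flight therefore means speculating past three or four unresolved branches at a time.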

4
Context Switch Time
  • Causes the running program's execution to be paused
  • State of the program is saved
  • A new program is executed
  • Eventually, the original program begins executing
    again
  • Not all of the CPU state is saved
    • e.g., the branch predictor tables

5
Context Switch Time
  • Only one set of branch predictor state
  • A context switch causes the new application to use
    the previous application's branch predictor state
  • Degrades performance for all applications
  • Solution: save the state of the branch predictor
    at context switch time

6
Saving Branch State Table
  • Even simple branch predictors have a large number
    of bits
  • Storing and restoring the branch predictor must
    not take too long
  • The gain of storing/restoring is lost if it takes
    longer than the warm-up time of the branch
    predictor
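The break-even condition on this slide can be written as an inequality. This is an illustrative simplification, not the deck's own timing equations: it assumes the table moves at a fixed bandwidth and ignores all other overheads. The symbols are introduced here for illustration: $B$ is the number of predictor bits, $W$ the transfer bandwidth in bits per clock cycle, and $T_{\text{warm}}$ the warm-up time in cycles. Saving and restoring only pays off when

```latex
\[
T_{\text{save}} + T_{\text{restore}} = \frac{B}{W} + \frac{B}{W} = \frac{2B}{W} < T_{\text{warm}}
\]
```

For example, the 2048 entry bimodal predictor (4096 bits), moved at 64 bits per clock cycle, needs $2 \cdot 4096 / 64 = 128$ cycles for save plus restore.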

7
Compression
  • Compression is the key
  • Requires less storage
  • Needs to be done carefully
  • Some lossless compression schemes can inflate
    the number of bits
  • Luckily, lossy compression is acceptable

8
Semi-Lossy Compression
  • Applies to 2-bit predictors
  • Key is to store just taken/not-taken state
  • Ignores strong/weak
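The sketch below shows semi-lossy compression in Python. It assumes the usual 2-bit counter encoding, where 0 and 1 predict not-taken and 2 and 3 predict taken. The decompression figure for the next slide is not in this transcript, so restoring each entry to the weak state of its direction is an assumption. The function names are hypothetical.

```python
# Sketch of semi-lossy compression for 2-bit saturating counters.
# Counter encoding (assumed, the common convention):
#   0 = strongly not-taken, 1 = weakly not-taken,
#   2 = weakly taken,       3 = strongly taken.
# Only the taken/not-taken bit (the counter's MSB) is kept.

def semi_lossy_compress(counters: list[int]) -> list[int]:
    """Keep one bit per counter: 1 if it predicts taken, else 0."""
    return [(c >> 1) & 1 for c in counters]


def semi_lossy_decompress(bits: list[int]) -> list[int]:
    """Rebuild 2-bit counters from direction bits.

    Assumption: each entry is restored to the weak state of its
    direction (taken -> 2, not-taken -> 1).
    """
    return [2 if b else 1 for b in bits]


if __name__ == "__main__":
    table = [0, 1, 2, 3, 3, 0]
    packed = semi_lossy_compress(table)
    restored = semi_lossy_decompress(packed)
    print(packed)    # [0, 0, 1, 1, 1, 0]
    print(restored)  # [1, 1, 2, 2, 2, 1]
```

Because the direction bit survives the round trip, every restored entry predicts the same direction it did before the switch. Only the strong/weak hysteresis is lost, and the stored image is half the size of the 2-bit table.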

9
Semi-Lossy Decompression
10
Lossy Compression
  • Branch prediction is just an educated guess
  • Achieve higher compression ratio if some
    information is lost
  • Majority rules
  • Used for correlating branch predictors

11
Lossy Compression
[Figure: 4x lossy compression of taken (T) / not-taken (NT) entries]
12
Lossy Decompression
  • Reinitialize all elements for an address to the
    stored value
  • Best case: all elements are correct
  • Worst case: 50% of elements are correct
  • Remember: branch predictors are just educated
    guesses
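The sketch below shows majority-rules compression for one branch address. It assumes that each address in a correlating predictor owns a small group of 2-bit counters, and that the whole group collapses to a single stored bit. The tie-break rule, the weak-state restore, and the function names are all assumptions made for illustration.

```python
# Sketch of lossy "majority rules" compression for one branch address
# in a correlating predictor. Each address owns several 2-bit counters
# (assumed here: one per history pattern); all of them collapse to a
# single stored direction bit.

def lossy_compress(counters: list[int]) -> int:
    """Return 1 if a strict majority of counters predict taken, else 0.

    Assumption: a tie is stored as not-taken (0).
    """
    taken = sum((c >> 1) & 1 for c in counters)
    return 1 if 2 * taken > len(counters) else 0


def lossy_decompress(bit: int, n: int) -> list[int]:
    """Reinitialize all n counters for the address to the stored value.

    Assumption: restored as weak states (taken -> 2, not-taken -> 1).
    """
    return [2 if bit else 1] * n


def fraction_correct(original: list[int], restored: list[int]) -> float:
    """Fraction of entries whose predicted direction survived."""
    same = sum((a >> 1) == (b >> 1) for a, b in zip(original, restored))
    return same / len(original)


if __name__ == "__main__":
    best = [3, 2, 3, 2]    # all taken: every entry restored correctly
    worst = [3, 2, 0, 1]   # 2 taken, 2 not-taken: a tie
    for group in (best, worst):
        restored = lossy_decompress(lossy_compress(group), len(group))
        print(group, "->", restored, fraction_correct(group, restored))
```

Whatever the tie-break rule, storing the majority direction means at least half of the entries come back with their original direction. This matches the best and worst cases listed on the slide.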

13
Simulation
  • Modified SimpleScalar's sim-bpred to support
    context switching
  • Not necessary to actually switch between programs
  • On context switch, corrupt branch predictor table
    according to a dirty percentage to simulate
    another program running
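The sketch below shows one possible corruption step. It assumes that dirty entries are picked uniformly at random and then overwritten with random 2-bit values, because the slide does not say how entries are chosen or what replaces them. `corrupt_table` is a hypothetical helper and is not code from sim-bpred.

```python
import random

# Sketch of the "dirty percentage" corruption step used to imitate another
# program running between context switches. Assumptions (not from the deck):
# dirty entries are chosen uniformly at random and overwritten with a
# uniformly random 2-bit counter value.

def corrupt_table(table: list[int], dirty_percent: float,
                  rng: random.Random) -> list[int]:
    """Return a copy of table with dirty_percent% of entries overwritten."""
    if not 0 <= dirty_percent <= 100:
        raise ValueError("dirty_percent must be in [0, 100]")
    corrupted = list(table)
    n_dirty = round(len(table) * dirty_percent / 100)
    for i in rng.sample(range(len(table)), n_dirty):
        corrupted[i] = rng.randrange(4)  # random 2-bit counter state
    return corrupted


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed for repeatable runs
    table = [3] * 16                 # every entry strongly taken
    print(corrupt_table(table, 25, rng))
```

A random overwrite can land on the entry's original value, so the number of entries that actually change may be smaller than the dirty percentage.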

14
Simulation
  • Testing compression/decompression becomes simple
  • Instead of corrupting branch predictor table,
    replace entries with the value after
    compression/decompression
  • Configurations tested:
    • 2-bit semi-lossy compression
    • 4-bit lossy compression
    • 8-bit lossy compression

15
Hardware Model
  • Compression and decompression blocks are fully
    pipelined
  • Compression and decompression blocks can handle n
    bits of compressed data at a time
  • Compression and decompression occur simultaneously

16
Hardware Model
  • Utilize data independence
  • Compress 128 bits into 64 bits at one time
  • Pipeline overhead should be minimal compared to
    clock cycle savings
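The sketch below works through the 128-bit-to-64-bit step bit by bit. It assumes a hypothetical layout in which a 128-bit word holds 64 two-bit counters and each counter's MSB is its direction bit. Every output bit depends on exactly one counter. That independence is what would let hardware compress a whole word in one step. The function names are illustrative, and the weak-state restore in `decompress_word` is an assumption.

```python
# Sketch of the 128-bit -> 64-bit semi-lossy step from the hardware model.
# Assumed layout (hypothetical): counter i of a 128-bit word occupies bits
# 2*i (LSB) and 2*i + 1 (MSB, the taken/not-taken bit).

COUNTERS_PER_WORD = 64


def compress_word(word128: int) -> int:
    """Gather the MSB of each of the 64 counters into a 64-bit word."""
    out = 0
    for i in range(COUNTERS_PER_WORD):
        # Each output bit depends on exactly one input counter, so in
        # hardware all 64 bits could be produced in parallel.
        out |= ((word128 >> (2 * i + 1)) & 1) << i
    return out


def decompress_word(word64: int) -> int:
    """Expand 64 direction bits back to 128 bits of 2-bit counters.

    Assumption: taken -> weakly taken (0b10), not-taken -> weakly
    not-taken (0b01).
    """
    out = 0
    for i in range(COUNTERS_PER_WORD):
        state = 0b10 if (word64 >> i) & 1 else 0b01
        out |= state << (2 * i)
    return out


if __name__ == "__main__":
    # Counters 0..3 set to strongly taken, weakly not-taken,
    # weakly taken, strongly not-taken; all others zero.
    word = 0b11 | (0b01 << 2) | (0b10 << 4) | (0b00 << 6)
    print(bin(compress_word(word)))   # 0b101
```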

17
Programs Simulated
  • Several SPEC2000 CINT2000 programs simulated
    • 164.gzip: Compression
    • 175.vpr: FPGA place and route
    • 181.mcf: Combinatorial optimization
    • 197.parser: Word processing
    • 256.bzip2: Compression

18
Predictor Types
  • 2048 entry bimodal predictor (4096 bits)
  • 4096 entry bimodal predictor (8192 bits)
  • 1024 entry two-level predictor with 4-bit history
    size (16384 bits)
  • 4096 entry two-level predictor with 8-bit history
    size (1048576 bits)
  • 8192 entry two-level predictor with 8-bit history
    size (2097152 bits)
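The sketch below turns the bit counts above into raw transfer times. It uses the sizes exactly as listed and the 64 bits per clock cycle bandwidth from the timing comparisons. It assumes that restore costs the same as save and ignores pipeline overhead. The 2:1 figure is a hypothetical ratio chosen for illustration, and `transfer_cycles` is an illustrative helper.

```python
import math

# Raw save time for each predictor configuration listed above, using the
# 64 bits per clock cycle bandwidth from the timing comparisons.
# Assumptions: restore costs the same as save, and no pipeline overhead.

PREDICTOR_BITS = {
    "2048 entry bimodal": 4096,
    "4096 entry bimodal": 8192,
    "1024 entry two-level, 4-bit history": 16384,
    "4096 entry two-level, 8-bit history": 1048576,
    "8192 entry two-level, 8-bit history": 2097152,
}


def transfer_cycles(bits: int, bits_per_cycle: int = 64,
                    compression_ratio: float = 1.0) -> int:
    """Cycles to move `bits` of predictor state in one direction.

    compression_ratio is original size / stored size, so 2.0 means the
    stored image is half as large (a hypothetical 2:1 scheme).
    """
    if bits_per_cycle <= 0 or compression_ratio <= 0:
        raise ValueError("bandwidth and compression ratio must be positive")
    stored_bits = bits / compression_ratio
    return math.ceil(stored_bits / bits_per_cycle)


if __name__ == "__main__":
    for name, bits in PREDICTOR_BITS.items():
        raw = transfer_cycles(bits)
        half = transfer_cycles(bits, compression_ratio=2.0)
        print(f"{name}: {raw} cycles raw, {half} cycles at 2:1")
```

For the largest configuration (2097152 bits), saving the table uncompressed takes 32768 cycles. This illustrates the summary's point that naively saving the predictor state is costly.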

19
2048 Entry Bimodal Predictor
20
2048 Entry Bimodal Predictor
21
2048 Entry Bimodal Predictor
22
4096 Entry Bimodal Predictor
23
4096 Entry Bimodal Predictor
24
4096 Entry Bimodal Predictor
25
1024 entry two-level predictor with 4-bit history size
26
1024 entry two-level predictor with 4-bit history size
27
1024 entry two-level predictor with 4-bit history size
28
4096 entry two-level predictor with 8-bit history size
29
4096 entry two-level predictor with 8-bit history size
30
4096 entry two-level predictor with 8-bit history size
31
8192 entry two-level predictor with 8-bit history size
32
8192 entry two-level predictor with 8-bit history size
33
8192 entry two-level predictor with 8-bit history size
34
Timing Comparison
Miss penalty: 10 clock cycles
Bandwidth: 64 bits per clock cycle
35
Timing Equations
General Timing Equation
Special Case for ratio of 0
36
Timing Comparison
Miss penalty: 15 clock cycles
Bandwidth: 64 bits per clock cycle
37
Timing Comparison
Miss penalty: 10 clock cycles
Bandwidth: 128 bits per clock cycle
38
Summary
  • Dynamic Branch Prediction is necessary for modern
    high-performance processors
  • Context switches reduce the effectiveness of
    dynamic branch prediction
  • Naïvely saving the branch predictor state is
    costly

39
Summary
  • Compression can be used to reduce the cost of
    saving branch predictor state
  • Higher compression ratios reduce the fixed
    save/restore time at the cost of increasing the
    number of mispredictions
  • For low-frequency context switches, compression
    yields a performance improvement

40
Questions