Title: Dynamic Branch Prediction
1. Dynamic Branch Prediction
Jonathan Creekmore, Nicolas Spiegelberg
2. Overview
- Branch Prediction Techniques
- Context Switching
- Compression of Branch Tables
- Simulation
- Hardware Model
- Results
- Analysis
3. The Case for Branch Prediction
- Multiple instructions are issued at one time
- Between 15 and 20
- Branches occur roughly every 5 instructions
- if, while, for, function calls, etc.
- Stalling the pipeline is unacceptable
- It loses all the advantage of multiple instruction issue
4. Context Switch Time
- Causes program execution to be paused
- The state of the program is saved
- A new program is executed
- Eventually, the original program begins executing again
- Not all of the CPU state is saved
- For example, the branch predictor tables
5. Context Switch Time
- There is only 1 set of branch predictor state
- A context switch causes a new application to use the previous application's branch predictor state
- This degrades performance for all applications
- Solution: save the state of the branch predictor at context switch time
6. Saving the Branch State Table
- Even simple branch predictors have a large number of bits
- Storing and restoring the branch predictor should not take too long
- The gain of storing/restoring is lost if it takes longer than the warm-up time of the branch predictor
7. Compression
- Compression is the key
- It requires less storage
- It needs to be done carefully
- Some lossless compression schemes can inflate the number of bits
- Luckily, lossy compression is acceptable
8. Semi-Lossy Compression
- Applies to 2-bit predictors
- The key is to store just the taken/not-taken state
- Ignores strong/weak
9. Semi-Lossy Decompression
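The semi-lossy round trip for 2-bit counters can be sketched as follows. This is a minimal sketch: restoring each entry to the *weak* state of its stored direction is an assumption, since the slides do not say which strength the decompressor restores.

```python
# Sketch of semi-lossy compression for 2-bit saturating counters.
# Counter states: 0 = strong not-taken, 1 = weak not-taken,
#                 2 = weak taken,       3 = strong taken.
# Compression keeps only the taken/not-taken direction (the high bit),
# halving the storage; the strong/weak distinction is discarded.

def compress_2bit(counters):
    """Reduce each 2-bit counter to its 1-bit direction."""
    return [c >> 1 for c in counters]

def decompress_2bit(directions):
    """Expand each direction bit back to a 2-bit counter.
    Restoring to the weak state of the stored direction is an
    assumption; the slides do not specify this choice."""
    return [2 if d else 1 for d in directions]

table = [3, 2, 0, 1]                 # strong-T, weak-T, strong-NT, weak-NT
packed = compress_2bit(table)        # -> [1, 1, 0, 0], half the bits
restored = decompress_2bit(packed)   # -> [2, 2, 1, 1], direction preserved
```

Every restored entry predicts the same direction as the original, which is why this scheme is only "semi" lossy.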
10. Lossy Compression
- Branch prediction is just an educated guess
- A higher compression ratio can be achieved if some information is lost
- Majority rules
- Used by the correlating branch predictor
11. Lossy Compression
(Diagram: 4x majority-vote compression of a column of taken (T) / not-taken (NT) entries: T, T, NT, NT, NT, T, NT, NT, T, NT, NT)
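The majority-rule scheme might look like this in outline. This is a sketch: the group size of 4 is taken from the 4x figure, and resolving ties toward taken is an assumption.

```python
# Sketch of "majority rules" lossy compression: each group of
# direction bits collapses to a single bit recording whether the
# majority of the group was taken.

def majority_compress(directions, group=4):
    """Collapse each group of 1-bit directions to its majority vote."""
    votes = []
    for i in range(0, len(directions), group):
        chunk = directions[i:i + group]
        # Ties go to taken here -- an arbitrary choice, since the
        # slides do not specify tie handling.
        votes.append(1 if 2 * sum(chunk) >= len(chunk) else 0)
    return votes

bits = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]   # 1 = taken, 0 = not taken
majority_compress(bits)                        # -> [1, 0, 0], a 4x reduction
```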
12. Lossy Decompression
- Reinitialize all elements for an address to the stored value
- Best case: all elements are correct
- Worst case: 50% of elements are correct
- Remember: branch predictors are just educated guesses
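Decompression, sketched below, simply broadcasts each stored majority bit over its whole group; the group size of 4 is assumed from the 4x figure, and the names are illustrative.

```python
# Sketch of lossy decompression: every element of a group is
# reinitialized to the single stored majority-vote bit.

def majority_decompress(votes, group=4):
    """Broadcast each stored bit back over its whole group."""
    restored = []
    for v in votes:
        restored.extend([v] * group)
    return restored

# Best case: the group was unanimous, so all 4 entries come back correct.
assert majority_decompress([1]) == [1, 1, 1, 1]

# Worst case: a 2-2 split group -- only half the entries come back correct.
original = [1, 1, 0, 0]
restored = majority_decompress([1])            # the stored vote was "taken"
correct = sum(o == r for o, r in zip(original, restored))   # -> 2 of 4
```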
13. Simulation
- Modified SimpleScalar's sim-bpred to support context switching
- It is not necessary to actually switch between programs
- On a context switch, corrupt the branch predictor table according to a dirty percentage to simulate another program running
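The dirty-percentage corruption described above could be sketched like this; the function and variable names are illustrative, not SimpleScalar's actual code.

```python
import random

# Sketch of the simulation trick above: instead of actually running a
# second program, overwrite a fraction (the "dirty percentage") of the
# branch predictor table with random 2-bit states at each simulated
# context switch.

def corrupt_table(table, dirty_pct, rng=None):
    """Randomize dirty_pct of the table's 2-bit entries in place."""
    rng = rng or random.Random()
    n_dirty = int(len(table) * dirty_pct)
    for idx in rng.sample(range(len(table)), n_dirty):
        table[idx] = rng.randrange(4)   # random 2-bit counter state
    return table

predictor = [3] * 2048                  # a warmed-up table, all strong-taken
corrupt_table(predictor, dirty_pct=0.25)
```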
14. Simulation
- Testing compression/decompression becomes simple
- Instead of corrupting the branch predictor table, replace entries with their value after compression/decompression
- Testing with:
- 2-bit semi-lossy compression
- 4-bit lossy compression
- 8-bit lossy compression
15. Hardware Model
- Compression and decompression blocks are fully pipelined
- Compression and decompression blocks can handle n bits of compressed data at a time
- Compression and decompression occur simultaneously
16. Hardware Model
- Utilize data independence
- Compress 128 bits into 64 bits at one time
- Pipeline overhead should be minimal compared to the clock cycle savings
17. Programs Simulated
- Several SPEC2000 integer (CINT2000) programs were simulated
- 164.gzip: Compression
- 175.vpr: FPGA place and route
- 181.mcf: Combinatorial optimization
- 197.parser: Word processing
- 256.bzip2: Compression
18. Predictor Types
- 2048 entry bimodal predictor (4096 bits)
- 4096 entry bimodal predictor (8192 bits)
- 1024 entry two-level predictor with 4-bit history size (16384 bits)
- 4096 entry two-level predictor with 8-bit history size (1048576 bits)
- 8192 entry two-level predictor with 8-bit history size (2097152 bits)
19-33. Results (charts not reproduced): three slides each for the 2048 entry bimodal predictor, the 4096 entry bimodal predictor, the 1024 entry two-level predictor with 4-bit history size, the 4096 entry two-level predictor with 8-bit history size, and the 8192 entry two-level predictor with 8-bit history size.
34. Timing Comparison
Miss penalty: 10 clock cycles
Bandwidth: 64 bits per clock cycle
35. Timing Equations
General timing equation (equation not reproduced)
Special case for a ratio of 0 (equation not reproduced)
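A plausible reconstruction of the general equation, under the assumption that the cost of a save/restore cycle is the state transfer time plus the penalty of the extra mispredictions introduced by lossy restoration, would be:

$$T_{\text{total}} = 2\,\frac{B \cdot r}{W} + M \cdot P$$

where $B$ is the predictor size in bits, $r$ the compression ratio, $W$ the bandwidth in bits per clock cycle, $P$ the miss penalty in clock cycles, and $M$ the number of extra mispredictions after decompression; the factor of 2 covers one save plus one restore. For a ratio of 0 no state is transferred, so only the misprediction cost of a cold predictor remains. All symbols here are assumptions, not taken from the original equations.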
36. Timing Comparison
Miss penalty: 15 clock cycles
Bandwidth: 64 bits per clock cycle
37. Timing Comparison
Miss penalty: 10 clock cycles
Bandwidth: 128 bits per clock cycle
38. Summary
- Dynamic branch prediction is necessary for modern high-performance processors
- Context switches reduce the effectiveness of dynamic branch prediction
- Naïvely saving the branch predictor state is costly
39. Summary
- Compression can be used to reduce the cost of saving branch predictor state
- Higher compression ratios improve the fixed save/restore time at the cost of increasing the number of mispredictions
- For low-frequency context switches, this yields an improvement in performance
40. Questions