Hardware-based Devirtualization (VPC Prediction) - PowerPoint PPT Presentation

Slides: 53
Provided by: hyes
Learn more at: http://users.ece.cmu.edu

Transcript and Presenter's Notes

1
Hardware-based Devirtualization (VPC Prediction)
  • Hyesoon Kim, Jose A. Joao, Onur Mutlu, Chang
    Joo Lee, Yale N. Patt, Robert Cohn



2
Outline
  • Background and Motivation
  • VPC (Virtual Program Counter) Prediction
  • Results
  • Conclusion

3
Direct vs. Indirect Branch
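[Figure: a conditional (direct) branch, br.cond TARGET at address A, has two possible next addresses: not taken (N) or taken (T, to TARGET). An indirect branch, R1 = MEM[R2]; branch R1 at address A, has an unknown (?) next address among several possible targets.]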
  • Indirect branches are costly for processor
    performance
  • Much more difficult to predict than conditional
    (direct) branches: multiple target addresses
  • An indirect branch predictor requires a large
    structure

4
Source Code Examples
  • Switch structures
  • Virtual function calls

Source code:
    Shape *s = ...;
    a = s->area();    // virtual function call
Static assembly code:
    R1 = MEM[R2]      // function address lookup
    call R1           // a register-indirect call
5
Indirect Branch Mispredictions
Data from Intel Core Duo processor
6
Branch Predictor
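[Figure: baseline branch predictor. The PC address (e.g. 0x0800) and the global history register (GHR, ..1001010) are hashed to index the direction predictor. The Branch Target Buffer (BTB) holds targets for direct branches (e.g. TARG1 for PC1), and a separate indirect branch predictor supplies the predicted target (TARG2) for indirect branches.]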
7
Outline
  • Background and Motivation
  • VPC (Virtual Program Counter) Prediction
  • Results
  • Conclusion

8
VPC Prediction Basic Idea
  • Key idea: treat an indirect branch as
    multiple virtual conditional branches
  • Only for prediction purposes
  • Use the conditional branch predictor

9
VPC Branch Predictor
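[Figure: VPC branch predictor. The virtual PCs (VPC1, VPC2), starting from the PC address (e.g. 0x0800), and the global history register (GHR, ..1001010) are hashed to index the direction predictor. The Branch Target Buffer supplies the corresponding targets (TARG1, TARG2), one of which becomes the predicted target.]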
10
VPC Prediction Basic Idea
  • Key idea: treat an indirect branch as
    multiple virtual conditional branches
  • Only for prediction purposes
  • Use the conditional branch predictor
  • Benefits
    • No separate complex structure
    • Can be applied to any conditional branch
      prediction algorithm
    • Improving the conditional branch prediction
      algorithm will also improve indirect branch
      prediction accuracy

11
Inspiration: Static Devirtualization
  • Source code:
      Shape *s = ...;
      a = s->area();    // an indirect call

Optimized source code:
    Shape *s = ...;
    if (s->type == Rectangle)      // a conditional branch at PC X
        a = Rectangle::area();
    else if (s->type == Circle)    // a conditional branch at PC Y
        a = Circle::area();
    else
        a = s->area();             // an indirect call at PC Z

Smalltalk (84), Calder and Grunwald (94),
Garret et al. (94), Ishizaki et al. (00)
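The transformation on this slide can be sketched as compilable C++. This is only an illustration under stated assumptions: the Shape, Rectangle, Circle, and Square classes, the Kind type tag, and the function names area_virtual and area_devirtualized are hypothetical and do not come from the presentation. A real compiler would perform the rewrite itself, typically guided by profiling or class analysis.

```cpp
#include <cassert>

// Hypothetical class hierarchy, used only for illustration.
enum class Kind { Rectangle, Circle, Other };

struct Shape {
    explicit Shape(Kind k) : type(k) {}
    virtual ~Shape() = default;
    virtual double area() const = 0;
    Kind type;  // explicit tag standing in for the compiler's class test
};

struct Rectangle : Shape {
    Rectangle(double width, double height)
        : Shape(Kind::Rectangle), w(width), h(height) {}
    double area() const override { return w * h; }
    double w, h;
};

struct Circle : Shape {
    explicit Circle(double radius) : Shape(Kind::Circle), r(radius) {}
    double area() const override { return 3.141592653589793 * r * r; }
    double r;
};

// A type the devirtualized code does not anticipate.
struct Square : Shape {
    explicit Square(double side_len) : Shape(Kind::Other), side(side_len) {}
    double area() const override { return side * side; }
    double side;
};

// Original form: a single indirect call through the vtable.
double area_virtual(const Shape* s) { return s->area(); }

// Devirtualized form: the type tests are conditional branches and the
// qualified calls are direct calls. The final arm keeps the indirect
// call for every type that was not anticipated.
double area_devirtualized(const Shape* s) {
    if (s->type == Kind::Rectangle)      // conditional branch at PC X
        return static_cast<const Rectangle*>(s)->Rectangle::area();
    else if (s->type == Kind::Circle)    // conditional branch at PC Y
        return static_cast<const Circle*>(s)->Circle::area();
    else
        return s->area();                // indirect call at PC Z
}
```

Both functions return the same value for every shape. Only the kind of branch the hardware has to predict differs.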
12
VPC Prediction
  • Source code:
      Shape *s = ...;
      a = s->area();    // an indirect call
  • Static assembly code:
      R1 = MEM[R2]
      call R1           // PC: L
  • Dynamic virtual branches (for prediction
    purposes):
      conditional jump TARGET1    // virtual PC: L
      conditional jump TARGET2    // virtual PC: L XOR HASHVAL1
      conditional jump TARGET3    // virtual PC: L XOR HASHVAL2
      conditional jump TARGET4    // virtual PC: L XOR HASHVAL3

13
Virtual PC Address Generation
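  • Use the original PC address and the iteration
    counter value

[Figure: the iteration counter value indexes a hash value table, and the selected hash value is combined with the original PC address to form the virtual PC address.]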
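A minimal sketch of virtual PC address generation, assuming the XOR form shown elsewhere in the deck (virtual PC = PC XOR HASHVAL). The table size kMaxIter, the constants in kHashVal, and the function name virtual_pc are illustrative assumptions. The slides do not give the actual hash values, so arbitrary distinct constants are used here. Entry 0 is zero, so the first virtual branch uses the original PC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed maximum number of virtual branches per indirect branch.
constexpr std::size_t kMaxIter = 4;

// Hypothetical hash value table (arbitrary constants, not from the slides).
// Entry 0 is zero so that the first virtual branch keeps the original PC.
constexpr std::uint64_t kHashVal[kMaxIter] = {
    0x0000000000000000ULL,
    0x000000009E3779B9ULL,
    0x000000007F4A7C15ULL,
    0x0000000085EBCA6BULL,
};

// Virtual PC address for a given iteration: original PC XOR table entry.
inline std::uint64_t virtual_pc(std::uint64_t pc, std::size_t iter) {
    assert(iter < kMaxIter);  // the iteration counter indexes the table
    return pc ^ kHashVal[iter];
}
```

Because XOR is its own inverse, applying the same hash value again recovers the original PC. Distinct table entries give each virtual branch of one indirect branch its own predictor and BTB index.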
14
VPC Prediction Process-I
Real instruction:
    call R1 // PC: L
Virtual instructions:
  • cond. jump TARG1 // VPC: L
  • cond. jump TARG2 // VPC: VL2
  • cond. jump TARG3 // VPC: VL3
  • cond. jump TARG4 // VPC: VL4

Iteration 1: the direction predictor is accessed with PC = L and GHR = 1111 and predicts not taken. The BTB returns TARG1. Move to the next iteration.
15
VPC Prediction Process-II
Real instruction:
    call R1 // PC: L
Virtual instructions:
  • cond. jump TARG1 // VPC: L
  • cond. jump TARG2 // VPC: VL2
  • cond. jump TARG3 // VPC: VL3
  • cond. jump TARG4 // VPC: VL4

Iteration 2: the direction predictor is accessed with VPC = VL2 and VGHR = 1110 and predicts not taken. The BTB returns TARG2. Move to the next iteration.
16
VPC Prediction Process-III
Real instruction:
    call R1 // PC: L
Virtual instructions:
  • cond. jump TARG1 // VPC: L
  • cond. jump TARG2 // VPC: VL2
  • cond. jump TARG3 // VPC: VL3
  • cond. jump TARG4 // VPC: VL4

Iteration 3: the direction predictor is accessed with VPC = VL3 and VGHR = 1100 and predicts taken. The BTB returns TARG3, so the predicted target is TARG3.
17
VPC Prediction Algorithm
  • Access the conditional branch predictor and the
    BTB with VPCA and VGHR
  • Compute VPCA and VGHR for the next iteration
    • VPCA = PC XOR HASHVAL[iter]
    • VGHR = VGHR << 1
  • Predicted not taken: move to the next iteration
  • Predicted taken: use the target in the BTB as the
    target of the indirect branch
  • Give up and stall if the iteration count reaches
    MAX_ITER or the BTB misses
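The prediction loop above can be sketched as a small software model. Everything here is an assumption made for illustration: the Btb and DirectionPredictor structs are toy lookup tables standing in for real hardware, and the names vpc_predict, kMaxIter, kHistoryMask, and kHashVal, the 4-bit history width, and the hash constants are hypothetical. The sketch computes the virtual PC at the top of each iteration, which is equivalent to computing it at the end of the previous one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <utility>

using Addr = std::uint64_t;

constexpr int  kMaxIter     = 4;    // assumed MAX_ITER
constexpr Addr kHistoryMask = 0xF;  // assumed 4-bit global history
// Hypothetical hash values; entry 0 is zero so iteration 1 uses the real PC.
constexpr Addr kHashVal[kMaxIter] = {0x0, 0x9E3779B9, 0x7F4A7C15, 0x85EBCA6B};

// Toy BTB: maps a (virtual) PC address to a stored target.
struct Btb {
    std::unordered_map<Addr, Addr> target;
    bool lookup(Addr vpca, Addr& out) const {
        auto it = target.find(vpca);
        if (it == target.end()) return false;
        out = it->second;
        return true;
    }
};

// Toy direction predictor: an exact table keyed by (VPCA, VGHR).
// Unknown entries predict not taken.
struct DirectionPredictor {
    std::map<std::pair<Addr, Addr>, bool> taken;
    void set(Addr vpca, Addr vghr, bool t) {
        taken[std::make_pair(vpca, vghr)] = t;
    }
    bool predict_taken(Addr vpca, Addr vghr) const {
        auto it = taken.find(std::make_pair(vpca, vghr));
        return it != taken.end() && it->second;
    }
};

struct Prediction {
    bool valid;   // false means "give up and stall"
    Addr target;  // meaningful only when valid is true
};

Prediction vpc_predict(const Btb& btb, const DirectionPredictor& bp,
                       Addr pc, Addr ghr) {
    Addr vghr = ghr & kHistoryMask;
    for (int iter = 0; iter < kMaxIter; ++iter) {
        const Addr vpca = pc ^ kHashVal[iter];  // VPCA = PC XOR HASHVAL[iter]
        Addr target = 0;
        if (!btb.lookup(vpca, target))
            return Prediction{false, 0};        // BTB miss: stall
        if (bp.predict_taken(vpca, vghr))
            return Prediction{true, target};    // taken: use the BTB target
        vghr = (vghr << 1) & kHistoryMask;      // not taken: shift in a 0
    }
    return Prediction{false, 0};                // iteration limit reached: stall
}
```

With a starting history of 1111, the virtual history seen by successive iterations is 1111, 1110, 1100, which matches the three-step walkthrough on the preceding slides.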

18
VPC Training Algorithm
  • An iterative process when an indirect branch is
    retired (not on the critical path)
  • Update the conditional branch predictor
    • Virtual branch has the correct target: taken
    • Virtual branch has a wrong target: not taken
  • Update the replacement policy bits of the correct
    target in the BTB
  • Insert the correct target into the BTB if it is
    not there
    • Train the conditional branch predictor: taken
    • Replace the least frequently used (LFU) target
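The training steps above can be sketched in the same toy style. This is an illustration under assumptions, not the authors' implementation: the BtbEntry layout, the use counter as the replacement policy bits, starting a newly inserted entry at a count of 0, preferring an empty slot over the LFU entry, and the names vpc_train, kMaxIter, kHistoryMask, and kHashVal are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <utility>

using Addr = std::uint64_t;

constexpr int  kMaxIter     = 4;    // assumed MAX_ITER
constexpr Addr kHistoryMask = 0xF;  // assumed 4-bit global history
// Hypothetical hash values; entry 0 is zero so iteration 1 uses the real PC.
constexpr Addr kHashVal[kMaxIter] = {0x0, 0x9E3779B9, 0x7F4A7C15, 0x85EBCA6B};

struct BtbEntry {
    Addr target;
    unsigned uses;  // assumed form of the replacement (LFU) counter
};
using Btb = std::unordered_map<Addr, BtbEntry>;               // VPCA -> entry
using DirectionPredictor = std::map<std::pair<Addr, Addr>, bool>;

// Called when an indirect branch retires and its correct target is known.
void vpc_train(Btb& btb, DirectionPredictor& bp,
               Addr pc, Addr ghr, Addr correct) {
    Addr vghr = ghr & kHistoryMask;
    Addr victim_vpca = 0, victim_vghr = 0;
    unsigned victim_uses = 0;
    bool have_victim = false, victim_is_miss = false;

    for (int iter = 0; iter < kMaxIter; ++iter) {
        const Addr vpca = pc ^ kHashVal[iter];
        auto it = btb.find(vpca);
        const bool miss = (it == btb.end());

        if (!miss && it->second.target == correct) {
            bp[std::make_pair(vpca, vghr)] = true;   // correct target: taken
            ++it->second.uses;                       // update replacement bits
            return;
        }
        if (!miss)
            bp[std::make_pair(vpca, vghr)] = false;  // wrong target: not taken

        // Track where to insert if the correct target is never found:
        // the first empty slot if there is one, otherwise the LFU entry.
        const unsigned uses = miss ? 0 : it->second.uses;
        if (!have_victim ||
            (!victim_is_miss && (miss || uses < victim_uses))) {
            have_victim = true;
            victim_is_miss = miss;
            victim_uses = uses;
            victim_vpca = vpca;
            victim_vghr = vghr;
        }
        vghr = (vghr << 1) & kHistoryMask;
    }

    // Correct target is not in the BTB: insert it and train as taken.
    btb[victim_vpca] = BtbEntry{correct, 0};
    bp[std::make_pair(victim_vpca, victim_vghr)] = true;
}
```

Training runs at retirement, so the loop can take several cycles without delaying the prediction itself.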

19
Hardware Cost and Complexity
[Figure: hardware block diagram with the signals Taken/Not Taken, Predict?, Direct/Indirect, and Target Address.]
20
Outline
  • Background and Motivation
  • VPC Prediction
  • Results
  • Conclusion

21
Simulation Methodology
  • Pin-based x86 simulator
  • Processor configuration
    • 4K-entry BTB
    • 64KB perceptron conditional branch predictor
    • Minimum 30-cycle branch misprediction penalty
    • 8-wide, 512-entry instruction window
    • Less aggressive processor (in the paper)
    • Gshare, O-GEHL conditional branch predictors
  • Indirect branch intensive benchmarks
    • 5 SPEC CPU2000, 5 SPEC CPU2006, 2 other C++
    • IBM server benchmarks (OLTP) (in the paper)

22
VPC MPKI
23
VPC Performance
24
Different Direction Predictors
[Chart: results for direction predictors with conditional branch accuracy of 98%, 98.3%, and 99%.]
Improving conditional branch prediction accuracy
also improves indirect branch prediction accuracy!
25
VPC vs. Static Devirtualization
  • Advantages of static devirtualization
    • Enables other compiler optimizations (function
      inlining)
    • Can reduce the number of mispredictions
  • Disadvantages/limitations of static devirtualization
    • Not all indirect branches can be statically
      devirtualized
    • Extensive static analysis/profiling
    • Lack of adaptivity to run-time input set and
      phase behavior
  • VPC prediction can be used with statically
    devirtualized binaries
    • 10% improvement on top of static devirtualization

26
Outline
  • Background and Motivation
  • VPC Prediction
  • Results
  • Conclusion

27
Conclusion
  • VPC dynamically converts indirect branches into
    multiple conditional branches and uses the existing
    conditional branch prediction hardware
  • VPC prediction reduces the branch misprediction
    penalty without significant extra hardware
    storage
    • Baseline: 26% IPC improvement
    • O-GEHL: 31% IPC improvement
  • VPC can be an enabler, encouraging programmers to
    use object-oriented programming styles

28
Thank you!
  • Questions?

29
VPC vs. Cascaded IBP
30
VPC vs. Other Indirect BP
TTC: Chang et al. (96); Cascaded: Driesen and Hölzle (98)
31
Iterative prediction
  • It doesn't hurt performance significantly
  • Results
  • Why?
  • Most predictions are made within a few iterations.
  • Results

32
VPC Hit Iteration Counter
33
Can the BTB be pipelined?
  • Yes
  • The next iteration of VPC prediction can be started
    without knowing the outcome of the previous
    iteration in the pipeline.
  • Consecutive VPC prediction iterations can simply
    be pipelined.
  • If an iteration turns out not to be needed, simply
    discard its prediction.

34
Is 4K-entry BTB too large?
  • Pentium 4 has a 4K-entry BTB
  • IBM Z series (z990) has an 8K-entry BTB
  • AMD Athlon and Hammer have 2K-entry BTBs

35
BTB Size Effects
36
VPC Prediction Accuracy
37
Target Distribution
38
VPC vs. Tagged Target Cache
39
VPC Prediction Delay Effects
40
VPC with O-GEHL BP
41
VPC with a Less Aggressive Processor
42
Server Benchmarks
43
Server Benchmarks (VPC vs. TTC)
44
VPC Prediction vs. Compiler-Based
Devirtualization (With TTC)
45
Conditional Br. Prediction Effects
VPC prediction reduces the accuracy of conditional
branch direction prediction, but not by much!
46
Indirect Branch Mispredictions
47
VPC Prediction with Static Devirtualization
  • VPC prediction can be used with statically
    devirtualized binaries.
  • Not all indirect branches could be devirtualized

48
VPC Training Correct Prediction
Retirement of the real instruction:
    call R1 // PC: L
Known: correctly predicted, predicted iteration: 3
Update the BTB replacement counter
49
VPC Training Misprediction
Retirement of the real instruction:
    call R1 // PC: L
Known: mispredicted, correct target address
Update the BTB replacement counter
50
VPC Training Misprediction
Retirement of the real instruction:
    call R1 // PC: L
Known: mispredicted, correct target address
No target
51
VPC Training Misprediction
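Retirement of the real instruction:
    call R1 // PC: L
Known: mispredicted, correct target address
[Figure: a BTB entry is chosen for replacement, the correct target is inserted, and the conditional branch predictor is trained as taken.]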
52
Does VPC need an extra BTB port?
  • No
  • A read from the BTB is only needed when a branch
    is mispredicted.
  • 95% of branches are correctly predicted with VPC.
  • The read is performed only when there is an
    available BTB port.