
Learn more at: http://osq.cs.berkeley.edu

1
Fully Dynamic Specialization

AJ Shankar
OSQ Lunch
9 December 2003

2
That's Why They Play the Game

Programs are executed because we can't determine
their behavior statically!
Idea: Optimize programs dynamically to take
advantage of runtime information we can't get
statically
Look at portions of the program for predictable
inputs that we can optimize for

3
Specialization

Recompile portions of the program, using known
runtime values as constants
Possibly many variants of the same code
Allow for fallback to original code when
assumptions are not met
Predictable = recurrent

4
How It Works

Choose a good region of code to specialize after
a good predictable instruction
Insert dispatch that checks the result of the
chosen instruction
Recompile code for different results of the
instruction
During execution, jump to appropriate specialized
code

[Diagram: LOAD pc yields X; Dispatch(X) jumps to Spec1, Spec2, or Default; all paths continue to the Rest of Code]
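
The dispatch step can be sketched in a few lines of Python. This is only an illustration under assumptions: the `scale` routine, its two hand-written variants, and `make_dispatch` are hypothetical names, and a real specializer would generate the variants at runtime rather than have them written by hand.

```python
def general_scale(x, data):
    """Default (unspecialized) code: works for any x."""
    return [x * d for d in data]

def scale_by_0(x, data):
    """Variant compiled under the assumption x == 0."""
    return [0] * len(data)

def scale_by_1(x, data):
    """Variant compiled under the assumption x == 1."""
    return list(data)

def make_dispatch(specialized, default):
    """Build a dispatch that checks x and jumps to the matching variant.

    `specialized` maps a hot value of x to code compiled for that value;
    any other value falls back to the original code.
    """
    def dispatch(x, data):
        return specialized.get(x, default)(x, data)
    return dispatch

scale = make_dispatch({0: scale_by_0, 1: scale_by_1}, general_scale)
```

Calling `scale(5, data)` takes the default path, so the assumptions baked into the variants are never relied on when they do not hold.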

5
Tying Things Together

If Foo is specialized on X
And because of X, Y is constant
And Foo calls Bar with param Y
And Bar is specialized on Y
Foo can jump straight to that specialized version
of Bar

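[Diagram: Method Foo and Method Bar, each with a Dispatch and specialized variants (Spec_X, Spec_Y, Spec_Z); the call Bar(Y) in Foo's specialized code jumps straight to Spec_Y in Bar]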

6
When Is This a Good Idea?

Any app whose execution is heavily dependent on
input
For instance
Interpreters
Raytracers
Dynamic content producers (CGI scripts, etc.)

7
Specialization Is Hard!

Specializing code at runtime is costly
Can even slow the program down
Existing specializers rely on static annotations
to clue them in about profitable areas
Difficult to get right
Limits specialization potential

8
Existing: DyC, Cyclone, etc.

Explicitly annotate static data
No support for automatic specialization of
frequently-executed code
Could compile lots of useless stuff
No concrete store information
Doesn't take advantage of the fact that memory
location X is constant for the lifetime of the
program

9
Existing: Calpa

Mock et al., 2000. Extension to DyC.
Profile execution on sample input to derive
annotations
But converting a concrete profile to an abstract
annotation means:
Still unable to detect concrete memory constants
Frequently executed code for arbitrary input?
Still needs source, is offline!

10
Motivating Example: Interpreter

while (1) {
    i = instrs[pc];
    switch (i.opcode) {
    case ADD:
        env[i.res] = env[i.op1] + env[i.op2];
        pc++;
        break;
    case BNEQ:
        if (env[i.op1] != 0)
            pc = env[i.op2];
        else pc++;
        break;
    ...
    }
}

Sample interpreted program: X = 10; WHILE (Z != 0) Y = X + Z

X is constant after initialization
(a concrete memory location)
Y = X + Z is executed frequently
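
For readers who want to run the loop, here is a minimal Python sketch of the same interpreter. Several details are assumptions added for illustration and are not on the slide: the `Instr` tuple, the `HALT` opcode, the slot numbering, the extra decrement of Z (so that the sample loop terminates), and the `TOP` slot that holds the branch target (the slide reads the target from `env[i.op2]`).

```python
from collections import namedtuple

Instr = namedtuple("Instr", "opcode res op1 op2")

def run(instrs, env):
    """Interpret `instrs` over the mutable slot list `env`."""
    pc = 0
    while True:
        i = instrs[pc]
        if i.opcode == "ADD":
            env[i.res] = env[i.op1] + env[i.op2]
            pc += 1
        elif i.opcode == "BNEQ":
            if env[i.op1] != 0:
                pc = env[i.op2]      # branch target is read from env, as on the slide
            else:
                pc += 1
        elif i.opcode == "HALT":     # assumed opcode, not on the slide
            return env
        else:
            raise ValueError(f"unknown opcode {i.opcode!r}")

# Hypothetical slot layout: X is never written after initialization.
X, Y, Z, NEG1, TOP = range(5)
program = [
    Instr("ADD", Y, X, Z),           # Y = X + Z
    Instr("ADD", Z, Z, NEG1),        # Z = Z - 1 (assumed, so the loop ends)
    Instr("BNEQ", None, Z, TOP),     # if Z != 0 goto env[TOP]
    Instr("HALT", None, None, None),
]
final_env = run(program, [10, 0, 3, -1, 0])
```

Every iteration reloads `instrs[pc]` and re-decodes the opcode, even though the program text and X never change; that repeated work is what specialization aims to remove.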

11
Motivating Example: Interpreter

(Interpreter loop as on the previous slide.)

Sample interpreted program: X = 10; WHILE (Z != 0) Y = X + Z

Specialized interpreter loop:

while (1) {
    while (pc == 15) {           // Y = X + Z
        env[3] = 10 + env[2];
        // Z != 0 ?
        if (env[2] == 0) pc = 19;
    }
    // else: normal loop
}

12
A More Concrete Approach

Do everything at runtime!
Specialize on execution-time hot values
Know which concrete memory locations are constant
Other benefits of this approach
Specialize temporally, as execution progresses
Specialize dynamically loaded libraries as well
No annotations or source code necessary

13
A Quick Recap

Choose a good region of code to specialize
Insert dispatch that checks the result of the
chosen instruction (the trigger)
Recompile code for different values of a hot
instruction
During execution, jump to appropriate specialized
code

[Diagram: Dispatch(X) jumps to Spec1, Spec2, or Default; for the interpreter, the trigger is LOAD pc, and Dispatch(pc) jumps to the code specialized for pc = 15 or pc = 27, or to the default while(1) loop, before the Rest of Code]

14
The Details

Need to identify the best predictable instruction
Specializing on its result should provide the
greatest benefit
To find it, gather profile information about all
instructions
Need to actually do the specializing

15
Instrumentation: Hot Values

What's a hot value? One that occurs frequently as
the result of an instruction
x % 2 has two very hot values, 0 and 1
Good candidate instructions are predictable:
result in (only) a few hot values
For instance, small_constant_table[x], but not
rand(x)
Case study: Interpreter
Predictable instructions: LOAD pc, instr.opcode
instr = instrs[pc];
switch (instr.opcode)
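
One way to picture the hot-value instrumentation is a per-instruction counter of observed results. The sketch below is an assumption-laden illustration: `ValueProfile`, the instruction ids, and the "top-k coverage" measure of predictability are all hypothetical choices, not the talk's actual implementation.

```python
from collections import Counter, defaultdict

class ValueProfile:
    """Counts the result values seen at each instrumented instruction."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def record(self, instr_id, value):
        self.counts[instr_id][value] += 1
        return value                      # pass the result through unchanged

    def hot_values(self, instr_id, k=2):
        return [v for v, _ in self.counts[instr_id].most_common(k)]

    def predictability(self, instr_id, k=2):
        """Fraction of executions covered by the k hottest values."""
        counter = self.counts[instr_id]
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return sum(n for _, n in counter.most_common(k)) / total

profile = ValueProfile()
for x in range(1000):
    profile.record("x % 2", x % 2)                 # only two results ever occur
    profile.record("scatter", (x * 7919) % 1000)   # stand-in for rand(x): all results distinct
```

Here `x % 2` is fully covered by its two hot values, while the scattered instruction has no value that recurs, so it would be a poor trigger.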

16
Instrumentation: Store Profile

Keep track of memory locations that have been
written to
Idea: if a location hasn't been written to yet,
it probably won't be later, either
Case study: Interpreter
Store profile says env[Y] written to a lot, but
env[X], instrs never written to
regs[instr.res] = regs[instr.op1] + regs[instr.op2]
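
A store profile can be sketched as a wrapper that counts writes per location. The class names, the `(name, index)` location key, and the decision to start profiling only after initialization are assumptions for illustration.

```python
class StoreProfile:
    """Records how often each memory location has been written."""

    def __init__(self):
        self.write_counts = {}

    def record_write(self, location):
        self.write_counts[location] = self.write_counts.get(location, 0) + 1

    def likely_constant(self, location):
        # Heuristic from the slide: never written so far, so probably never written.
        return self.write_counts.get(location, 0) == 0

class ProfiledMemory:
    """List-like memory whose stores are reported to a StoreProfile."""

    def __init__(self, name, cells, profile):
        self._name = name
        self._cells = list(cells)     # initial values are not counted as writes
        self._profile = profile

    def __getitem__(self, index):
        return self._cells[index]

    def __setitem__(self, index, value):
        self._profile.record_write((self._name, index))
        self._cells[index] = value

X, Y, Z = 0, 1, 2
store_profile = StoreProfile()
env = ProfiledMemory("env", [10, 0, 3], store_profile)
for _ in range(5):
    env[Y] = env[X] + env[Z]          # Y is written repeatedly, X never
```

The guess "never written so far" can be wrong later, which is why specializations that rely on it need an invalidation mechanism.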

17
Invalidating Specialized Code

Memory locations may not really be constant
When constant memory is overwritten, must
invalidate or modify specializations that
depended on it
How does Calpa handle invalidation?
Computes points-to set
Inserts invalidation calls at all appropriate
points (offline)
Too costly an approach, without modification

18
Invalidation Options
class Interpreter {
    private Instruction[] instrs;
    void SetInstrs(Instruction[] is) {
        instrs = is;
    }
}

Write barrier
Still feasible if field is private
On-entry checks
Feasible if specialization depends on a small
number of memory locations
e.g. Factor(BigInt x)
Hardware support
e.g. Mondrian
Ideal solution
Possible to simulate?
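
The first two options can be sketched in Python. Both pieces are illustrative assumptions: the `Interpreter` class mirrors the slide's example with a hypothetical per-pc cache, and `with_entry_check` is a made-up helper name.

```python
class Interpreter:
    """Write barrier: the private field is only written through set_instrs."""

    def __init__(self, instrs):
        self._instrs = list(instrs)
        self._specialized = {}            # pc -> code specialized for self._instrs

    def set_instrs(self, instrs):
        self._instrs = list(instrs)
        self._specialized.clear()         # barrier: drop code that assumed the old value

    def specialized_for(self, pc):
        if pc not in self._specialized:
            instr = self._instrs[pc]      # baked into the specialized code as a constant
            self._specialized[pc] = lambda: instr
        return self._specialized[pc]

def with_entry_check(read_deps, expected, specialized, default):
    """On-entry check: re-read the few locations the specialization depends on."""
    def entry(*args):
        if read_deps() == expected:
            return specialized(*args)
        return default(*args)             # assumption violated: use the original code
    return entry

mem = {"x": 10}
add_x = with_entry_check(
    read_deps=lambda: mem["x"],
    expected=10,
    specialized=lambda z: 10 + z,         # compiled with x == 10 folded in
    default=lambda z: mem["x"] + z,
)
```

The write barrier pays on every store to the guarded field, while the on-entry check pays on every call, which is why the latter only suits specializations that depend on a small number of locations.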

[Diagram: flow from Hot Instruction through CheckMem to either Invalidate or Dispatch, which selects Spec1 or Default]

19
Specialization Procedure

Recap: We know which instructions are good
candidates, what their hot values are, and what
parts of memory are likely to be invariant
Want to compile different versions of the same
block of code relative to a chosen trigger
instruction
Each version is keyed on a hot value of that
instruction
What instruction, if any, should be a basis for
specialization?

20
Specialization Algorithm

Find good candidate instructions
Predictable
Frequently executed
For each candidate instruction
Simultaneously evaluate method using constant
propagation for some of its hot values
Compute overall cost/benefit
Choose the best instruction
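
The final selection step can be sketched as a simple scoring function. The cost model below is an assumption, not the talk's: it multiplies execution count, hot-value coverage, and per-execution benefit, then subtracts a one-time compile cost. All numbers are invented, except that the per-hit benefits of 3 and 10 echo the benefit counts reached in the talk's opcode and pc traces.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    exec_count: int        # how often the instruction executed
    hot_coverage: float    # fraction of executions that produced a hot value
    benefit_per_hit: int   # instructions saved per specialized execution
    compile_cost: int      # one-time cost of generating the variants

def net_benefit(c):
    return c.exec_count * c.hot_coverage * c.benefit_per_hit - c.compile_cost

def choose_best(candidates):
    """Return the most profitable candidate, or None if nothing pays off."""
    best = max(candidates, key=net_benefit, default=None)
    if best is None or net_benefit(best) <= 0:
        return None
    return best

candidates = [
    Candidate("instr.opcode", 100_000, 0.95, 3, 5_000),
    Candidate("pc", 100_000, 0.80, 10, 20_000),
]
best = choose_best(candidates)
```

Returning `None` when no candidate has positive net benefit reflects the earlier point that specialization can slow a program down.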

21
Algorithm Pseudo-code

foreach (value v in hot values)
    worklist.push(<start node, v>)
previously_emitted = <unspecialized nodes, default state>
while (<n, s> = pop worklist)
    <n', s'> = evaluate(<n, s>)  // uses store information, fixes jumps
    foreach (n'' in succ(n'))
        // have we already seen this node/state pair before?
        prev_instr = previously_emitted[<n'', s'>]
        if (prev_instr)  // if so, link to it
            n'.modify_jump_to(n'' -> prev_instr)
        else  // otherwise, keep evaluating
            worklist.push(<n'', s'>)
    instr = emit_instruction(n')
    // remember this pair in case we see it again
    previously_emitted[<n', s'>] = instr
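
A much-simplified, runnable sketch of this worklist loop, restricted to the interpreter case where the only state is the known value of pc plus the slots the store profile considers constant. The tuple encodings, the `exit` operation, and the helper names are assumptions; in particular, this version emits each pc once rather than tracking general node/state pairs, and it assumes the constant slots are never overwritten.

```python
def specialize_on_pc(instrs, const_slots, start_pc):
    """Emit residual code for every pc reachable from start_pc."""
    def operand(slot):
        # Fold slots believed constant; everything else is read at runtime.
        return ("const", const_slots[slot]) if slot in const_slots else ("slot", slot)

    residual = {}                 # plays the role of previously_emitted
    worklist = [start_pc]
    while worklist:
        pc = worklist.pop()
        if pc in residual:        # already emitted: jumps simply link to it
            continue
        opcode, res, op1, op2 = instrs[pc]
        if opcode == "ADD":
            residual[pc] = ("add", res, operand(op1), operand(op2), pc + 1)
            successors = [pc + 1]
        elif opcode == "BNEQ" and op2 in const_slots:
            target = const_slots[op2]
            residual[pc] = ("bneq", operand(op1), target, pc + 1)
            successors = [target, pc + 1]
        else:
            # HALT or unknown branch target: pc is no longer known, stop here
            residual[pc] = ("exit", pc)
            successors = []
        worklist.extend(successors)
    return residual

def run_residual(residual, env, pc):
    """Execute the residual code; returns the pc where generic code would resume."""
    def value(opnd):
        kind, x = opnd
        return x if kind == "const" else env[x]
    while True:
        op = residual[pc]
        if op[0] == "add":
            _, res, a, b, nxt = op
            env[res] = value(a) + value(b)
            pc = nxt
        elif op[0] == "bneq":
            _, cond, taken, fallthrough = op
            pc = taken if value(cond) != 0 else fallthrough
        else:
            return op[1]

X, Y, Z, NEG1, TOP = range(5)     # hypothetical slot layout
instrs = [("ADD", Y, X, Z), ("ADD", Z, Z, NEG1),
          ("BNEQ", None, Z, TOP), ("HALT", None, None, None)]
residual = specialize_on_pc(instrs, {X: 10, NEG1: -1, TOP: 0}, 0)
env = [10, 0, 3, -1, 0]
resume_pc = run_residual(residual, env, 0)
```

The residual operation for pc 0 adds the constant 10 to slot Z, so the instruction fetch, the opcode switch, and the load of X have all been folded away.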

22
Specializing the Interpreter

while (1) {
    i = instrs[pc];
    switch (i.opcode) {
    case ADD:
        env[i.res] = env[i.op1] + env[i.op2];
        pc++;
        break;
    case BNEQ:
        if (env[i.op1] != 0)
            pc = env[i.op2];
        else pc++;
        break;
    ...
    }
}

Candidates:
instr.opcode: executed very frequently; a small
handful of values
pc: executed very frequently; more values, but
still reasonable

23
Specializing on instr.opcode

Dispatch(opcode)

Known state | Original code | Specialized code | Benefit
i.opcode = ADD | switch (i.opcode) | switch (ADD) | 1
i.opcode = ADD | case ADD: | case ADD: | 2
i.opcode = ADD | env[i.res] = env[i.op1] + env[i.op2] | env[i.res] = env[i.op1] + env[i.op2] |
i.opcode = ADD | pc = pc + 1 | pc = pc + 1 |
i.opcode = ADD | goto LOOP | goto LOOP | 3
i.opcode = ADD | LOOP: i = instrs[pc] | |

Other values of opcode have similar results

24
Specializing on pc

Interpreted statement: Y = X + Z

Dispatch(pc)

Known state | Original code | Specialized code | Benefit
pc = 15 | LOOP: i = instrs[pc] | LOOP: i = instrs[15] | 1
pc = 15, i = ADD Y, X, Z | switch (i.opcode) | switch (ADD) | 2
pc = 15, i = ADD Y, X, Z | case ADD: | case ADD: | 3
pc = 15, i = ADD Y, X, Z | env[i.res] = env[i.op1] + env[i.op2] | env[Y] = 10 + env[Z] | 6
pc = 15, i = ADD Y, X, Z | pc = pc + 1 | pc = 15 + 1 | 7
pc = 16, i = ADD Y, X, Z | goto LOOP | LOOP: i = instrs[16] | 8
pc = 16, i = BNEQ Z, 15 | | switch (BNEQ) | 9
pc = 16, i = BNEQ Z, 15 | | if (env[Z] != 0) | 10
pc = 16, i = BNEQ Z, 15 | | pc | 

25
Final Result

Choose to specialize on pc because benefit is far
greater than for instr.opcode
Generate different versions for each of the
hottest values of pc
Terminate loop unrolling either naturally (when
we don't know what pc is anymore) or with a
simple heuristic

26
Heuristics

Algorithm may not terminate when unrolling loops
Simple heuristic: widen variables when we've seen
the same node, say, 10 times (or use frequency
statistics)
Algorithm may generate lots of code
Need to only look at parts of state that matter
Widen somewhere
Other issues: Algorithm may be slow
Need better way to prune off bad candidates
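
The widening heuristic can be sketched for a single loop node. This is an illustration under assumptions: the state is a tuple of `(name, value)` pairs, `UNKNOWN` is a hypothetical marker for "no longer treated as constant", and all variables are widened at once rather than selectively.

```python
UNKNOWN = object()    # marker: value is no longer known at specialization time

def unroll_with_widening(step, start_state, limit=10):
    """Repeatedly evaluate one loop node, emitting one variant per distinct state.

    Once the node has been visited `limit` times, every variable is widened
    to UNKNOWN, so the state repeats and the unrolling stops.
    """
    seen = set()
    emitted = []
    visits = 0
    state = start_state
    while state not in seen:
        seen.add(state)
        emitted.append(state)          # one specialized copy per distinct state
        visits += 1
        state = step(state)
        if visits >= limit:
            state = tuple((name, UNKNOWN) for name, _ in state)
    return emitted

def count_up(state):
    """Loop body `i = i + 1` under constant propagation."""
    return tuple((n, v if v is UNKNOWN else v + 1) for n, v in state)

def cycle_of_three(state):
    """Loop body `i = (i + 1) % 3`: states repeat without any widening."""
    return tuple((n, (v + 1) % 3) for n, v in state)

variants = unroll_with_widening(count_up, (("i", 0),))
natural = unroll_with_widening(cycle_of_three, (("i", 0),))
```

Without the widening step, `count_up` would produce a new state on every visit and the unrolling would never reach a previously seen node/state pair.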

27
Implementation Ideas

Use Dynamo
Hot trace as basis for specialization
Intuitively, follow the lifetime of an object as
it travels through the program across function
boundaries
Unfortunately, closed-source, and API isn't
expressive enough

28
Implementation Ideas