Title: Fully Dynamic Specialization
1 Fully Dynamic Specialization
- AJ Shankar
- OSQ Lunch
- 9 December 2003
2 That's Why They Play the Game
- Programs are executed because we can't determine their behavior statically!
- Idea: Optimize programs dynamically to take advantage of runtime information we can't get statically
- Look at portions of the program for predictable inputs that we can optimize for
3 Specialization
- Recompile portions of the program, using known runtime values as constants
- Possibly many variants of the same code
- Allow for fallback to original code when assumptions are not met
- Predictable recurrent
4 How It Works
- Choose a good region of code to specialize, after a good predictable instruction
- Insert a dispatch that checks the result of the chosen instruction
- Recompile code for different results of the instruction
- During execution, jump to the appropriate specialized code
[Diagram: an instruction (LOAD pc) produces X; Dispatch(X) selects Spec1, Spec2, or Default, followed by Rest of Code]
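The dispatch step above can be sketched in a few lines of Python. This is only an illustration under assumptions: `generic`, `specialize`, and `make_dispatch` are hypothetical names, and "recompiling" is stood in for by returning a simpler closure for each hot value.

```python
def generic(x, y):
    # Default (unspecialized) code: x is an ordinary runtime value.
    return x * y + x

def specialize(x):
    # Hypothetical stand-in for "recompile with x as a constant":
    # fold away whatever depends only on x.
    if x == 0:
        return lambda y: 0          # x * y + x folds to 0
    if x == 1:
        return lambda y: y + 1      # x * y + x folds to y + 1
    return lambda y: x * y + x      # nothing to fold

def make_dispatch(hot_values):
    # One specialized variant per hot value of x.
    variants = {v: specialize(v) for v in hot_values}

    def dispatch(x, y):
        variant = variants.get(x)
        if variant is not None:
            return variant(y)       # Spec1, Spec2, ...
        return generic(x, y)        # Default: assumption not met

    return dispatch

dispatch = make_dispatch([0, 1])
```

Any value of `x` that was not specialized still produces the right answer, because the dispatch falls back to the original code.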
5 Tying Things Together
- If Foo is specialized on X
- And because of X, Y is constant
- And Foo calls Bar with param Y
- And Bar is specialized on Y
- Foo can jump straight to that specialized version of Bar
[Diagram labels: Method Foo, Method Bar, Dispatch, Dispatch, Spec_X, Spec_Y, Spec_Z, Bar(Y)]
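A minimal sketch of this linking, assuming a table of Bar's specialized variants that Foo's specializer can consult. All names here (`BAR_VARIANTS`, `foo_spec`, and so on) are hypothetical, and closures stand in for generated code.

```python
def bar_generic(y, z):
    return y * z

# Bar's specialized variants, keyed on the hot value of y (assumed: only y == 2).
BAR_VARIANTS = {2: lambda z: z + z}

def bar_dispatch(y, z):
    variant = BAR_VARIANTS.get(y)
    if variant is not None:
        return variant(z)
    return bar_generic(y, z)

def foo_generic(x, z):
    y = x + 1
    return bar_dispatch(y, z)       # normal path: goes through Bar's dispatch

def foo_spec(x):
    # Foo specialized on x: y is now a constant, so Bar's variant can be
    # looked up once, at specialization time.
    y = x + 1
    target = BAR_VARIANTS.get(y)
    if target is not None:
        return target               # jump straight to the specialized Bar
    return lambda z: bar_dispatch(y, z)
```

With `x == 1`, the specialized Foo is literally Bar's variant for `y == 2`, so no dispatch runs on that path.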
6 When Is This a Good Idea?
- Any app whose execution is heavily dependent on input
- For instance:
- Interpreters
- Raytracers
- Dynamic content producers (CGI scripts, etc.)
7 Specialization Is Hard!
- Specializing code at runtime is costly
- Can even slow the program down
- Existing specializers rely on static annotations to clue them in about profitable areas
- Difficult to get right
- Limits specialization potential
8 Existing: DyC, Cyclone, etc.
- Explicitly annotate static data
- No support for automatic specialization of frequently-executed code
- Could compile lots of useless stuff
- No concrete store information
- Doesn't take advantage of the fact that memory location X is constant for the lifetime of the program
9 Existing: Calpa
- Mock et al., 2000. Extension to DyC.
- Profile execution on sample input to derive annotations
- But converting a concrete profile to an abstract annotation means:
- Still unable to detect concrete memory constants
- Frequently executed code for arbitrary input?
- Still needs source, is offline!
10 Motivating Example: Interpreter
    while (1) {
      i = instrs[pc];
      switch (i.opcode) {
        case ADD:
          env[i.res] = env[i.op1] + env[i.op2];
          pc++;
          break;
        case BNEQ:
          if (env[i.op1] != 0)
            pc = env[i.op2];
          else pc++;
          break;
        ...
      }
    }
Sample interpreted program: X = 10; WHILE (Z != 0) Y = X + Z
- X is constant after initialization
- concrete memory location
- Y = X + Z executed frequently
11 Motivating Example: Interpreter
    while (1) {
      i = instrs[pc];
      switch (i.opcode) {
        case ADD:
          env[i.res] = env[i.op1] + env[i.op2];
          pc++;
          break;
        case BNEQ:
          if (env[i.op1] != 0)
            pc = env[i.op2];
          else pc++;
          break;
        ...
      }
    }
Sample interpreted program: X = 10; WHILE (Z != 0) Y = X + Z
Specialized interpreter:
    while (1) {
      while (pc == 15) {
        // Y = X + Z
        env[3] = 10 + env[2];
        // Z != 0 ?
        if (env[2] == 0) pc = 19;
      }
      // else: normal loop
    }
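The same idea as a runnable Python sketch: a small interpreter next to a hand-specialized version of it. Several details are assumptions made so the example terminates and stays short: the program layout, the extra instruction that decrements Z, the `HALT` opcode, and a branch target stored directly in `op2` rather than in `env`.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "opcode res op1 op2")
ADD, BNEQ, HALT = "ADD", "BNEQ", "HALT"
X, Y, Z, M1 = 0, 1, 2, 3           # env slots; M1 holds -1 (assumed)

PROGRAM = [
    Instr(ADD, Y, X, Z),            # pc 0: Y = X + Z
    Instr(ADD, Z, Z, M1),           # pc 1: Z = Z - 1 (assumed, so the loop ends)
    Instr(BNEQ, None, Z, 0),        # pc 2: if Z != 0 goto 0
    Instr(HALT, None, None, None),  # pc 3
]

def step(env, pc):
    # One iteration of the generic interpreter loop; returns the next pc.
    i = PROGRAM[pc]
    if i.opcode == ADD:
        env[i.res] = env[i.op1] + env[i.op2]
        return pc + 1
    if i.opcode == BNEQ:
        return i.op2 if env[i.op1] != 0 else pc + 1
    return None                     # HALT

def interpret(env, pc=0):
    while pc is not None:
        pc = step(env, pc)
    return env

def interpret_specialized(env, pc=0):
    # Specialized on pc == 0, assuming env[X] == 10 and env[M1] == -1
    # are never overwritten.
    while pc is not None:
        while pc == 0:
            env[Y] = 10 + env[Z]    # Y = X + Z with X folded to 10
            env[Z] -= 1             # Z = Z + M1 with M1 folded to -1
            if env[Z] == 0:
                pc = 3              # leave the specialized region
        pc = step(env, pc)          # normal loop
    return env
```

Both versions compute the same final environment; the specialized one never fetches or decodes an instruction while it stays in the hot loop.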
12 A More Concrete Approach
- Do everything at runtime!
- Specialize on execution-time hot values
- Know which concrete memory locations are constant
- Other benefits of this approach
- Specialize temporally, as execution progresses
- Specialize dynamically loaded libraries as well
- No annotations or source code necessary
13 A Quick Recap
- Choose a good region of code to specialize
- Insert a dispatch that checks the result of the chosen instruction (the trigger)
- Recompile code for different values of a hot instruction
- During execution, jump to the appropriate specialized code
[Diagram: an instruction produces X; Dispatch(X) selects Spec1, Spec2, or Default, followed by Rest of Code. For the interpreter: LOAD pc, then Dispatch(pc) selects pc = 15, pc = 27, or the default while(1) loop]
14 The Details
- Need to identify the best predictable instruction
- Specializing on its result should provide the greatest benefit
- To find it, gather profile information about all instructions
- Need to actually do the specializing
15 Instrumentation: Hot Values
- What's a hot value? One that occurs frequently as the result of an instruction
- x % 2 has two very hot values, 0 and 1
- Good candidate instructions are predictable: they result in (only) a few hot values
- For instance, small_constant_table[x], but not rand(x)
- Case study: Interpreter
- Predictable instructions: LOAD pc, instr.opcode
    instr = instrs[pc];
    switch (instr.opcode)
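One way to collect hot values is a per-instruction histogram of results. The sketch below is an assumption about how such a profile could look, not the talk's instrumentation; `ValueProfile`, the `coverage` threshold, and the `max_values` cap are all hypothetical.

```python
import random
from collections import Counter, defaultdict

class ValueProfile:
    def __init__(self):
        # instruction id -> histogram of the values it produced
        self.counts = defaultdict(Counter)

    def record(self, instr_id, value):
        self.counts[instr_id][value] += 1
        return value

    def hot_values(self, instr_id, coverage=0.9, max_values=4):
        # Predictable means: a few values account for most executions.
        # Returns those values, or [] if the instruction is not predictable.
        counter = self.counts[instr_id]
        total = sum(counter.values())
        if total == 0:
            return []
        hot, covered = [], 0
        for value, n in counter.most_common(max_values):
            hot.append(value)
            covered += n
            if covered >= coverage * total:
                return hot
        return []

profile = ValueProfile()
rng = random.Random(0)
for x in range(1000):
    profile.record("x % 2", x % 2)                   # two very hot values
    profile.record("rand(x)", rng.randrange(10**6))  # no hot values
```

Here `profile.hot_values("x % 2")` yields the two values 0 and 1, while the random instruction is rejected as a candidate.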
16 Instrumentation: Store Profile
- Keep track of memory locations that have been written to
- Idea: if a location hasn't been written to yet, it probably won't be later, either
- Case study: Interpreter
- Store profile says env[Y] is written to a lot, but env[X] and instrs are never written to
    regs[instr.res] = regs[instr.op1] + regs[instr.op2]
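A store profile can be sketched as a wrapper that logs every write. This is a simplified model under assumptions: `StoreProfile` and `ProfiledMemory` are hypothetical names, and the initial contents are installed without being counted as writes, which matches "constant after initialization".

```python
class StoreProfile:
    def __init__(self):
        self.write_counts = {}      # (region name, index) -> number of writes

    def record_write(self, location):
        self.write_counts[location] = self.write_counts.get(location, 0) + 1

    def probably_constant(self, location):
        # Never written so far, so we guess it will not be written later.
        return location not in self.write_counts

class ProfiledMemory:
    """List-like memory region whose writes are logged in a StoreProfile."""

    def __init__(self, name, cells, profile):
        self.name = name
        self.cells = list(cells)    # initialization is not counted as a write
        self.profile = profile

    def __getitem__(self, index):
        return self.cells[index]

    def __setitem__(self, index, value):
        self.profile.record_write((self.name, index))
        self.cells[index] = value

X, Y, Z = 0, 1, 2
profile = StoreProfile()
env = ProfiledMemory("env", [10, 0, 3], profile)
for _ in range(5):
    env[Y] = env[X] + env[Z]        # Y = X + Z, executed frequently
```

After the loop, `env[Y]` has five recorded writes and `env[X]` has none, so X is a candidate for folding into specialized code.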
17 Invalidating Specialized Code
- Memory locations may not really be constant
- When constant memory is overwritten, must invalidate or modify specializations that depended on it
- How does Calpa handle invalidation?
- Computes points-to set
- Inserts invalidation calls at all appropriate points (offline)
- Too costly an approach, without modification
18 Invalidation Options
    class Interpreter {
      private Instruction[] instrs;
      void SetInstrs(Instruction[] is) { instrs = is; }
    }
- Write barrier
- Still feasible if field is private
- On-entry checks
- Feasible if specialization depends on a small number of memory locations
- e.g. Factor(BigInt x)
- Hardware support
- e.g. Mondrian
- Ideal solution
- Possible to simulate?
[Diagram labels: Hot Instruction, CheckMem, Dispatch, Invalidate, Spec1, Default]
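The first two options can be sketched in Python. Both pieces are illustrative assumptions rather than the talk's design: the `Interpreter` class mirrors the slide's example with a cache of specialized code added, and `guard_on_entry` is a hypothetical helper that plays the role of CheckMem.

```python
class Interpreter:
    """Write barrier: the private field has one writer, so invalidate there."""

    def __init__(self, instrs):
        self._instrs = list(instrs)
        self._spec_cache = {}          # pc -> code specialized on _instrs[pc]

    def set_instrs(self, instrs):
        self._instrs = list(instrs)
        self._spec_cache.clear()       # Invalidate: old assumptions are gone

    def specialized(self, pc):
        if pc not in self._spec_cache:
            opcode = self._instrs[pc]  # baked in as a constant
            self._spec_cache[pc] = lambda: opcode
        return self._spec_cache[pc]


def guard_on_entry(memory, locations, specialized, default):
    """On-entry check: re-validate the assumed values before every use."""
    assumed = {loc: memory[loc] for loc in locations}

    def entry(*args):
        if all(memory[loc] == assumed[loc] for loc in locations):  # CheckMem
            return specialized(*args)
        return default(*args)          # assumption broken: fall back

    return entry
```

The write barrier costs nothing on the fast path but needs every writer to be known; the on-entry check works for any writer but pays one comparison per assumed location on each entry, which is why it suits specializations that depend on only a few locations.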
19 Specialization Procedure
- Recap: We know which instructions are good candidates, what their hot values are, and what parts of memory are likely to be invariant
- Want to compile different versions of the same block of code relative to a chosen trigger instruction
- Each version is keyed on a hot value of that instruction
- What instruction, if any, should be a basis for specialization?
20 Specialization Algorithm
- Find good candidate instructions
- Predictable
- Frequently executed
- For each candidate instruction
- Simultaneously evaluate method using constant propagation for some of its hot values
- Compute overall cost/benefit
- Choose the best instruction
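The selection step might look like the sketch below. The slides do not give a cost model, so the scoring formula, the thresholds, and every number in the example are assumptions chosen only to make the shape of the decision concrete.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    executions: int       # how often the instruction ran in the profile
    hot_coverage: float   # fraction of executions that produced a hot value
    benefit: int          # instructions simplified per specialized pass
    code_cost: int        # assumed charge for generating the variants

def score(c):
    # Assumed model: expected savings minus the cost of specializing.
    return c.executions * c.hot_coverage * c.benefit - c.code_cost

def choose_trigger(candidates, min_executions=1000, min_coverage=0.8):
    # Keep only predictable, frequently executed instructions.
    viable = [c for c in candidates
              if c.executions >= min_executions
              and c.hot_coverage >= min_coverage]
    if not viable:
        return None
    best = max(viable, key=score)
    return best if score(best) > 0 else None

candidates = [
    Candidate("instr.opcode", 100_000, 0.95, 3, 500),
    Candidate("pc", 100_000, 0.90, 10, 5_000),
    Candidate("rand(x)", 100_000, 0.01, 50, 100),
]
best = choose_trigger(candidates)
```

With these made-up figures the chooser picks `pc`, and an unpredictable instruction is filtered out before its benefit is even considered.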
21 Algorithm Pseudo-code
    foreach (value v in hot values) {
      worklist.push(<start node, v>)
      previously_emitted = {<unspecialized nodes, default state>}
      while (<n, s> = pop worklist) {
        <n', s'> = evaluate(<n, s>)  // uses store information, fixes jumps
        foreach (n'' in succ(n')) {
          // have we already seen this node/state pair before?
          prev_instr = previously_emitted[<n'', s'>]
          if (prev_instr)  // if so, link to it
            n'.modify_jump_to(n'' -> prev_instr)
          else  // otherwise, keep evaluating
            worklist.push(<n'', s'>)
        }
        instr = emit_instruction(n')
        // remember this pair in case we see it again
        previously_emitted[<n', s'>] = instr
      }
    }
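A runnable sketch of the same worklist idea, on a toy instruction set. It is a simplification under assumptions: the tuple-based IR (`add`, `bnz`, `halt`), the label scheme, and emitting text instead of machine code are all invented for illustration, and pairs are remembered when first scheduled rather than when emitted.

```python
def value(operand, state):
    # Integer literals are always known; variables only if the state has them.
    return operand if isinstance(operand, int) else state.get(operand)

def evaluate(instr, state):
    """Constant-propagate one instruction: (text, new state, successors)."""
    if instr[0] == "add":
        _, dst, a, b, nxt = instr
        va, vb = value(a, state), value(b, state)
        if va is not None and vb is not None:
            state[dst] = va + vb                  # fully folded
            return f"{dst} = {va + vb}", state, [nxt]
        state.pop(dst, None)                      # result no longer known
        lhs = a if va is None else va
        rhs = b if vb is None else vb
        return f"{dst} = {lhs} + {rhs}", state, [nxt]
    if instr[0] == "bnz":
        _, var, taken, fallthrough = instr
        v = state.get(var)
        if v is None:
            return f"if {var} != 0", state, [taken, fallthrough]
        return "goto", state, [taken if v != 0 else fallthrough]
    return "halt", state, []

def specialize(program, start, known):
    labels, worklist, emitted = {}, [], {}

    def label_of(node, state):
        key = (node, state)
        if key not in labels:                     # unseen pair: schedule it
            labels[key] = f"{node}#{len(labels)}"
            worklist.append(key)
        return labels[key]                        # seen pair: link to it

    entry = label_of(start, frozenset(known.items()))
    while worklist:
        node, state = worklist.pop()
        text, new_state, succs = evaluate(program[node], dict(state))
        frozen = frozenset(new_state.items())
        emitted[labels[(node, state)]] = (
            text, [label_of(s, frozen) for s in succs])
    return entry, emitted

PROGRAM = {
    "body": ("add", "y", "x", "z", "dec"),    # y = x + z
    "dec":  ("add", "n", "n", -1, "test"),    # n = n - 1
    "test": ("bnz", "n", "body", "exit"),     # if n != 0 goto body
    "exit": ("halt",),
}
entry, code = specialize(PROGRAM, "body", {"x": 10, "n": 2})
```

When `n` is known, the loop is unrolled once per value of `n` and the branch disappears; when only `x` is known, the back edge finds the already-labeled `body` node with the same state and links to it instead of emitting new code.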
22 Specializing the Interpreter
    while (1) {
      i = instrs[pc];
      switch (i.opcode) {
        case ADD:
          env[i.res] = env[i.op1] + env[i.op2];
          pc++;
          break;
        case BNEQ:
          if (env[i.op1] != 0)
            pc = env[i.op2];
          else pc++;
          break;
        ...
      }
    }
Candidates:
- instr.opcode: executed very frequently; a small handful of values
- pc: executed very frequently; more values, but still reasonable
23 Specializing on instr.opcode
Dispatch(opcode), with known state i.opcode = ADD:
Original code | Specialized code | Benefit
LOOP: i = instrs[pc] | |
switch (i.opcode) | switch (ADD) | 1
case ADD: | case ADD: | 2
env[i.res] = env[i.op1] + env[i.op2] | env[i.res] = env[i.op1] + env[i.op2] |
pc = pc + 1 | pc = pc + 1 |
goto LOOP | goto LOOP | 3
LOOP: i = instrs[pc] | |
Other values of opcode have similar results
24 Specializing on pc
Dispatch(pc), for the interpreted statement Y = X + Z:
Original code | Known state | Specialized code | Benefit
LOOP: i = instrs[pc] | pc = 15 | LOOP: i = instrs[15] | 1
switch (i.opcode) | pc = 15, i = ADD Y, X, Z | switch (ADD) | 2
case ADD: | pc = 15, i = ADD Y, X, Z | case ADD: | 3
env[i.res] = env[i.op1] + env[i.op2] | pc = 15, i = ADD Y, X, Z | env[Y] = 10 + env[Z] | 6
pc = pc + 1 | pc = 15, i = ADD Y, X, Z | pc = 15 + 1 | 7
goto LOOP | pc = 16, i = ADD Y, X, Z | LOOP: i = instrs[16] | 8
 | pc = 16, i = BNEQ Z, 15 | switch (BNEQ) | 9
 | pc = 16, i = BNEQ Z, 15 | if (env[Z] != 0) | 10
 | pc = 16, i = BNEQ Z, 15 | pc |
25 Final Result
- Choose to specialize on pc because benefit is far greater than for instr.opcode
- Generate different versions for each of the hottest values of pc
- Terminate loop unrolling either naturally (when we don't know what pc is anymore) or with a simple heuristic
26 Heuristics
- Algorithm may not terminate when unrolling loops
- Simple heuristic: widen variables when we've seen the same node, say, 10 times (or use frequency statistics)
- Algorithm may generate lots of code
- Need to only look at parts of state that matter
- Widen somewhere
- Other issues: Algorithm may be slow
- Need better way to prune off bad candidates
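The widening heuristic can be sketched as follows. This is one possible reading, stated as an assumption: "widening" here means dropping every fact that differs from any earlier visit to the same node, and `WideningTracker` and its `limit` parameter are hypothetical names.

```python
def widen(state, previous_states):
    """Keep only the facts on which every earlier state for this node agrees."""
    widened = dict(state)
    for old in previous_states:
        widened = {k: v for k, v in widened.items() if old.get(k) == v}
    return widened

class WideningTracker:
    def __init__(self, limit=10):
        self.limit = limit
        self.seen = {}                 # node -> states it was visited with

    def visit(self, node, state):
        history = self.seen.setdefault(node, [])
        if len(history) >= self.limit:
            state = widen(state, history)   # forget the varying variables
        history.append(dict(state))
        return state

# A loop counter n that changes on every visit, with a limit of 3.
tracker = WideningTracker(limit=3)
states = [tracker.visit("loop", {"x": 10, "n": n}) for n in (5, 4, 3, 2, 1)]
```

After the limit is reached, the counter `n` is dropped and every further visit produces the same state, so a worklist keyed on node/state pairs sees a repeat and stops unrolling.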
27 Implementation Ideas
- Use Dynamo
- Hot trace as basis for specialization
- Intuitively, follow the lifetime of an object as it travels through the program across function boundaries
- Unfortunately, closed-source, and API isn't expressive enough
28 Implementation Ideas
- JikesRVM
- Java VM written in Java
- Has a primitive framework for sampling
- Has a fairly sophisticated framework for dynamic recompilation
- Does aggressive inlining
- Only instrument hot traces (but compiler is slow)