Title: A brief overview of A Region-Based Compilation Technique for a Java Just-In-Time Compiler
1A brief overview of A Region-Based Compilation
Technique for a Java Just-In-Time Compiler
Presented by George Toderici
2Java Overview
- Java is a language that the compiler translates into byte-code, which is the machine language of a virtual machine
- The byte-code can either be interpreted or compiled
- A Just-In-Time (JIT) compiler usually compiles the byte-code on the fly
3Java Overview (2)
- JIT compilers use a lot of memory, and the compilation step itself is generally very slow. Once compiled, programs can be quite fast. However, slowdowns occur when new (rarely used) classes are loaded, because the JIT has to compile them
- Interpreters are even slower, but they do not suffer from the pauses associated with on-the-fly compilation
4Hybrid Approach
- It would make sense to combine on-the-fly compilation with interpretation in order to speed up applications
- Use the JIT technique for the portions of the code that are executed very frequently, and interpret the rest
- This has been done before
- The Java HotSpot compiler [5] uses dynamic profile information to decide which functions to compile
5Motivation
- Most compilers use the method as the unit of compilation and compile all of its code, regardless of whether it is rare or not
- Most of the processing time is usually spent in relatively small regions of code that are executed many times, not necessarily in entire functions that are executed numerous times
- Instead of compiling whole functions, as HotSpot [5] does, it may be faster to compile just the regions that are hot
6Classical On Stack Replacement
- Java code:
    public class Benchmark {
        public static void main(String[] arg) {
            int sum = 0;
            for (int i = 0; i < ...; i++) {
                sum += i;
            }
        }
    }
- For the code above, the Java HotSpot compiler [2] interprets the function main until it detects that more than 10000 iterations have been completed in main
- The function is then declared hot and is compiled
- However, the compiled code will not be executed until the next call to main(), and there is only one call to main in a program
7Classical On Stack Replacement (2)
- Java code:
    public class Benchmark {
        public static void main(String[] arg) {
            int sum = 0;
            for (int i = 0; i < ...; i++) {
                sum += i;
            }
        }
    }
- In order to begin executing the compiled version of the code, there must be a linkage between the interpreter's basic block state and the compiled basic block state
- Consequently, information (e.g., the stack frame and the live variables) must be saved and passed between the interpreted and the compiled versions of the code
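The hand-over described above can be mimicked in plain Java. This is only a rough sketch under stated assumptions: a real VM rewrites stack frames internally, whereas here `interpret` and `compiledLoopFrom` are ordinary methods standing in for the interpreter and the compiled code, and `OsrSketch`, `LoopState`, and `OSR_THRESHOLD` are hypothetical names chosen for illustration.

```java
// Illustrative only: real OSR happens inside the VM, not in application code.
class OsrSketch {
    // Hypothetical threshold, echoing the "more than 10000 iterations" on the slide.
    static final int OSR_THRESHOLD = 10_000;

    /** Live variables at the loop header, as captured from the interpreter frame. */
    static final class LoopState {
        final int i;
        final int sum;
        LoopState(int i, int sum) { this.i = i; this.sum = sum; }
    }

    /** Stand-in for compiled code with an extra entry point at the loop header. */
    static int compiledLoopFrom(LoopState state, int limit) {
        int sum = state.sum;
        for (int i = state.i; i < limit; i++) {
            sum += i;
        }
        return sum;
    }

    /** Stand-in for the interpreter executing the Benchmark loop. */
    static int interpret(int limit) {
        int sum = 0;
        int iterations = 0;
        for (int i = 0; i < limit; i++) {
            if (iterations > OSR_THRESHOLD) {
                // Hand the live state over and finish in the "compiled" version.
                return compiledLoopFrom(new LoopState(i, sum), limit);
            }
            sum += i;
            iterations++;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(interpret(20_000));
    }
}
```

The point of the sketch is that the transfer happens in the middle of the loop, so the result is the same whether or not the switch takes place.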
8System Overview
- The paper uses a mixed-mode interpreter as the testbed platform
- Mixed-mode interpreters execute both JIT-generated code and interpreted code in the same program
- In order to generate compiled code, the JIT needs to identify which methods to compile
- A counter that counts both method invocations and loop iterations is allocated to each method
- When the counter exceeds a threshold, the method is considered frequently invoked (hot) and the first level of compilation is triggered
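The counter scheme above might look roughly like the following. It is a minimal sketch rather than the paper's implementation: `MethodCounter`, its method names, and the threshold passed in are hypothetical, and setting a flag stands in for actually triggering level-0 compilation.

```java
// Hypothetical sketch of a per-method hotness counter in a mixed-mode interpreter.
class HotnessCounterSketch {
    /** One counter per method, bumped on every invocation and every loop iteration. */
    static final class MethodCounter {
        private final int threshold; // assumed value; real thresholds are tuned per VM
        private int count;
        private boolean compiled;

        MethodCounter(int threshold) { this.threshold = threshold; }

        void onInvocation() { bump(); }

        void onLoopIteration() { bump(); }

        private void bump() {
            if (compiled) {
                return;
            }
            count++;
            if (count > threshold) {
                compiled = true; // stands in for triggering level-0 compilation
            }
        }

        boolean isCompiled() { return compiled; }
    }

    public static void main(String[] args) {
        MethodCounter counter = new MethodCounter(1000);
        counter.onInvocation();
        for (int i = 0; i < 5000 && !counter.isCompiled(); i++) {
            counter.onLoopIteration();
        }
        System.out.println("compiled = " + counter.isCompiled());
    }
}
```

Sharing one counter between invocations and loop iterations means a method with a single long-running loop can become hot without ever being called twice.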
9System Overview (2)
- The compiler has three optimization levels: 0, 1, and 2
- L0: basic method inlining and devirtualization of method calls based on class hierarchy analysis and type flow analysis, producing either guarded or unguarded code. Preexistence analysis is performed to safely remove guard code and backup code without requiring OSR
- L1: fully fledged method inlining and other data flow optimizations
- L2: escape analysis, stack object allocation, code scheduling, and DAG-based optimizations
10System Overview (3)
- The L0 optimizing compiler runs as an application thread. The L1 and L2 compilations are performed by a separate thread in the background
- L1 or L2 recompilation is triggered by the sampling profiler, depending on the hotness level of the executed code
- The hotness level is decided by a profiler
11The Profiler
- Two types of profilers are used: a sampling profiler and an instrumentation profiler
- The sampling profiler periodically monitors the program counters of the application threads
- The instrumentation profiler is used when detailed information needs to be collected about a specific method. The instrumentation code is inserted in the L0 compiled code and is initially disabled
12The Profiler (2)
- When the sampler decides a method is hot enough to be promoted, the instrumentation code is enabled and detailed information is collected. The instrumentation code is disabled again after a certain amount of information has been collected
- The instrumentation code collects information about virtual/interface call receiver type distributions and basic block execution frequencies
- The receiver type profile drives inlining if the call site is dynamically monomorphic (has only one target type)
- Basic block frequencies determine rare code. RBC optimizations are based on this information (L1/L2 only)
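As a rough illustration of the receiver type profile, the sketch below keeps a histogram of receiver classes for one call site and reports whether the site is dynamically monomorphic. `CallSiteProfile` and its methods are hypothetical names, and real instrumentation code would record types inside the compiled code rather than through a helper object.

```java
import java.util.HashMap;
import java.util.Map;

class ReceiverProfileSketch {
    /** Per-call-site histogram of the receiver classes observed at run time. */
    static final class CallSiteProfile {
        private final Map<String, Integer> counts = new HashMap<>();

        void record(Object receiver) {
            counts.merge(receiver.getClass().getName(), 1, Integer::sum);
        }

        /** Dynamically monomorphic: exactly one receiver type has been observed. */
        boolean isMonomorphic() {
            return counts.size() == 1;
        }
    }

    public static void main(String[] args) {
        CallSiteProfile site = new CallSiteProfile();
        site.record("a");
        site.record("b");
        System.out.println("monomorphic: " + site.isMonomorphic());
        site.record(Integer.valueOf(1));
        System.out.println("monomorphic: " + site.isMonomorphic());
    }
}
```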
13Region Exit Handling
- The idea is to handle region exits (execution leaving the compiled region for rare code) in a way that exploits RBC for function inlining
- There are several options for handling this problem (all assume OSR):
- Option 1: fall back on the interpreter (HotSpot compiler)
- Option 2: drive recompilation with deoptimization. If the rare cases occur frequently, the optimized code is replaced with the deoptimized code
- Option 3: drive recompilation at the same optimization level. The recompiled version has several entry points corresponding to the region boundaries in the original, and is used both for the transitions and for future method invocations
14Region Exit Handling (2)
- Each of the three ways of handling region exits has its share of strengths and weaknesses, but they all depend on what we mean by rare code
- The first method works best in the limit case in which rare code is extremely sparse. If this does not hold, many optimizations that could potentially be performed are missed
- Ideally, the method to use would be chosen at runtime, based on profile information
15Region Exit Handling (3)
- The authors implement the third variant because it suits the RBC case better: the regions in question are already known to be non-rare, since they have been marked as hot, and we want neither to decompile nor to deoptimize them
16On Stack Replacement
- The optimized code has multiple entry points, which are added artificially so that execution can jump between the normal and the recompiled versions
- If the two regions are of similar shape, the jump between them can happen without OSR
17Region Based Compilation
- RBC is performed only in level-1/2 compilation, using profile information from the instrumentation code in the level-0 code
- However, profile information is sometimes not available for all methods
- Consequently, a heuristic is combined with the actual profile information to drive region selection
18Region Based Compilation: Intra-method region selection
19Intra-Procedural Region Selection
- The method identifies code to be removed, rather than choosing code to be optimized
- This is because a dynamic compiler needs to be more conservative, due to the high cost of OSR
20Intra-Procedural Region Selection: Algorithm Overview
- Initialize each basic block (BB) with a guess as to whether it is rare or not. If profile information is available, use that to decide
- Propagate the information along backward data flow until it converges for all BBs
- Traverse the basic blocks to find the transitions from non-rare BBs to rare BBs, and generate code to fix the potential problems
21Intra-Procedural Region Selection: Heuristic Function
- The basic heuristic for identifying region types is the following:
- Compiler-generated backup blocks (from devirtualization of method invocations) are rare
- Blocks that end with an exception throw are rare
- Exception handler blocks are rare
- Blocks that end with a normal return are NOT rare
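The three steps of the algorithm and the seeding heuristic can be sketched on a toy control flow graph. This is a simplified illustration, not the authors' code. It assumes one particular propagation rule (an undecided block becomes rare when all of its successors are rare, and non-rare as soon as one successor is non-rare) and it treats blocks that never get decided as non-rare. `Block`, `Kind`, and the example graph are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RegionSelectionSketch {
    enum Kind { RARE, NON_RARE, UNKNOWN }

    static final class Block {
        final String name;
        Kind kind; // step 1: seeded from the heuristic (or from profile data)
        final List<Block> successors = new ArrayList<>();
        Block(String name, Kind seed) { this.name = name; this.kind = seed; }
        void to(Block... next) { successors.addAll(Arrays.asList(next)); }
    }

    /** Step 2: propagate rareness backward over the CFG until nothing changes. */
    static void propagate(List<Block> blocks) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Block b : blocks) {
                if (b.kind != Kind.UNKNOWN || b.successors.isEmpty()) {
                    continue;
                }
                boolean allRare = true;
                boolean anyNonRare = false;
                for (Block s : b.successors) {
                    if (s.kind != Kind.RARE) allRare = false;
                    if (s.kind == Kind.NON_RARE) anyNonRare = true;
                }
                if (allRare) {
                    b.kind = Kind.RARE;
                    changed = true;
                } else if (anyNonRare) {
                    b.kind = Kind.NON_RARE;
                    changed = true;
                }
            }
        }
        // Assumption: anything still undecided is conservatively kept as non-rare.
        for (Block b : blocks) {
            if (b.kind == Kind.UNKNOWN) b.kind = Kind.NON_RARE;
        }
    }

    /** Step 3: collect the non-rare to rare edges (rare block entry points). */
    static List<String> regionExits(List<Block> blocks) {
        List<String> exits = new ArrayList<>();
        for (Block b : blocks) {
            if (b.kind != Kind.NON_RARE) continue;
            for (Block s : b.successors) {
                if (s.kind == Kind.RARE) exits.add(b.name + "->" + s.name);
            }
        }
        return exits;
    }

    /** Toy CFG: entry -> check -> {body -> return, prep -> throw}. */
    static List<Block> example() {
        Block entry = new Block("entry", Kind.UNKNOWN);
        Block check = new Block("check", Kind.UNKNOWN);
        Block body = new Block("body", Kind.UNKNOWN);
        Block prep = new Block("prep", Kind.UNKNOWN);
        Block thrower = new Block("throw", Kind.RARE);   // ends with an exception throw
        Block ret = new Block("return", Kind.NON_RARE);  // ends with a normal return
        entry.to(check);
        check.to(body, prep);
        body.to(ret);
        prep.to(thrower);
        return Arrays.asList(entry, check, body, prep, thrower, ret);
    }

    public static void main(String[] args) {
        List<Block> cfg = example();
        propagate(cfg);
        System.out.println(regionExits(cfg));
    }
}
```

In the toy graph, `prep` is not seeded by the heuristic but becomes rare through propagation, so the region boundary ends up on the edge from `check` to `prep` rather than directly in front of the throwing block.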
22Intra-Procedural Region Selection: Non-rare to Rare transitions
- Transitions from non-rare to rare BBs are identified and marked as rare block entry points
- For each rare block entry point, liveness analysis is performed to find the set of live variables
- A new region exit BB (RE-BB) is generated for each entry point and replaces the original rare block, by redirecting the control flow to the newly generated block
- Each RE-BB contains a single instruction (recompile) that takes all variables live at the entry point as its operands, so that all the information needed for OSR is available for recompilation
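The redirection step can be pictured with a small sketch. It assumes the live-in set of the rare block has already been computed by a separate liveness analysis, and it models the `recompile` instruction as a plain string. `Block`, `insertRegionExit`, and the `RE_` naming are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RegionExitSketch {
    static final class Block {
        final String name;
        final List<Block> successors = new ArrayList<>();
        final Set<String> liveIn = new TreeSet<>(); // assumed result of a prior liveness analysis
        String instruction = "";                    // single pseudo-instruction, for illustration
        Block(String name) { this.name = name; }
    }

    /** Replaces the edge from -> rareEntry with from -> RE-BB and returns the RE-BB. */
    static Block insertRegionExit(Block from, Block rareEntry) {
        int edge = from.successors.indexOf(rareEntry);
        if (edge < 0) {
            throw new IllegalArgumentException("no edge " + from.name + " -> " + rareEntry.name);
        }
        Block exit = new Block("RE_" + rareEntry.name);
        exit.liveIn.addAll(rareEntry.liveIn);
        // The recompile pseudo-instruction carries every live variable as an operand.
        exit.instruction = "recompile(" + String.join(", ", exit.liveIn) + ")";
        from.successors.set(edge, exit);
        return exit;
    }

    public static void main(String[] args) {
        Block check = new Block("check");
        Block body = new Block("body");
        Block rare = new Block("rare");
        rare.liveIn.add("sum");
        rare.liveIn.add("i");
        check.successors.add(body);
        check.successors.add(rare);
        Block exit = insertRegionExit(check, rare);
        System.out.println(exit.name + ": " + exit.instruction);
    }
}
```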
23Region Based Compilation: Partial Inlining
24Partial Inlining
- Main idea:
- Inline all small methods
- The other methods are first processed by region selection and then re-assessed based on the reduced code size. If a method is inlinable, the inlining is performed only on the non-rare portions of its code
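A possible shape for this decision is sketched below. The size limit for "small" methods, the notion of a remaining inlining budget, and all names (`Callee`, `Decision`, `decide`) are assumptions made for illustration; the slides do not give the actual thresholds or cost model.

```java
class PartialInliningSketch {
    static final int SMALL_METHOD_LIMIT = 20; // hypothetical size limit for "small" methods

    enum Decision { INLINE_WHOLE, INLINE_NON_RARE_ONLY, DO_NOT_INLINE }

    /** Callee sizes in arbitrary code-size units. */
    static final class Callee {
        final int totalSize;
        final int rareSize; // size of the code that region selection marked as rare
        Callee(int totalSize, int rareSize) {
            this.totalSize = totalSize;
            this.rareSize = rareSize;
        }
        int nonRareSize() { return totalSize - rareSize; }
    }

    static Decision decide(Callee callee, int remainingBudget) {
        if (callee.totalSize <= SMALL_METHOD_LIMIT) {
            return Decision.INLINE_WHOLE; // small methods are always inlined
        }
        // Larger methods are re-assessed using only their non-rare code size.
        if (callee.nonRareSize() <= remainingBudget) {
            return Decision.INLINE_NON_RARE_ONLY;
        }
        return Decision.DO_NOT_INLINE;
    }

    public static void main(String[] args) {
        System.out.println(decide(new Callee(10, 0), 50));
        System.out.println(decide(new Callee(200, 170), 50));
        System.out.println(decide(new Callee(200, 10), 50));
    }
}
```

A callee that would be far too large as a whole can still qualify once its rare code is discounted, which is the case the second call in `main` illustrates.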
25Partial Inlining (2)
- During inlining, devirtualization of dynamically dispatched call sites is also performed, based on class hierarchy analysis and receiver type distribution profile information. This determines whether the generated backup paths can be removed or not
- In addition, the live variable information in the RE-BBs from the previous step is updated
26Partial Inlining (3)
- Advantages of coupling inlining with region selection:
- Since inlining is applied only to the non-rare regions, no code is wasted on rare paths, which conserves the inlining budget
- Inlining is done after the methods have been processed by intra-method region selection, so a reduced amount of code is considered for inlining (again conserving the inlining budget)
27Other Optimizations
28Optimizations
- Partial Dead Code Elimination
- Partial Escape Analysis
29Partial Dead Code Elimination
- Since a list of all live variables (both stack and heap) has to be kept for each RE-BB, the average lifespan of some variables is considerably longer than under function-based compilation
- Partial dead code elimination can reduce the magnitude of the problem by pushing computations that are live only on region exit paths into the RE-BBs
30Partial Dead Code Elimination (2)
- Algorithm:
- Maintain two sets of live variables: one from the RE-BBs and one from the non-rare paths
- Using a code motion algorithm, move the computations of variables that appear in the RE-BB set but not in the other set. The computations are copied both to the non-rare region and to the RE-BB, but dead code elimination will eventually remove the copy from the non-rare region
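The effect of partial dead code elimination can be shown at the source level. In this hand-written sketch an ordinary `if` branch stands in for the region exit path, and `before` and `after` show the same method prior to and following the transformation. The compiler performs this on its intermediate representation, not on Java source, and the method names are hypothetical.

```java
class PdceSketch {
    /** Before: scaled is computed on every call, but only the rare path reads it. */
    static int before(int x, boolean rare) {
        int scaled = x * 31 + 7; // live only on the region exit path
        if (rare) {
            return scaled;       // stands in for the RE-BB
        }
        return x;
    }

    /** After: the computation has been sunk into the rare path. */
    static int after(int x, boolean rare) {
        if (rare) {
            int scaled = x * 31 + 7;
            return scaled;
        }
        return x;                // the non-rare path no longer pays for scaled
    }

    public static void main(String[] args) {
        System.out.println(before(5, false) == after(5, false));
        System.out.println(before(5, true) == after(5, true));
    }
}
```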
31Partial Escape Analysis
- Escape analysis determines whether an object may escape the method or thread. It can speed up a program considerably, as it allows objects to be allocated on the stack rather than on the heap (heap allocation is much slower)
- Objects usually escape from rare code or from the backup paths of devirtualized methods, hence conventional escape analysis has limited applicability
32Partial Escape Analysis (2)
- The analysis is modified to consider only non-rare paths (an optimistic assumption)
- Because rare paths are not analyzed, the analysis of a region may conclude that a certain (parameter) object is non-escaping. Hence, when execution exits through a region boundary, all (calling) methods that use this summary information would need to be recompiled
- To avoid this, we check whether the arguments are included in the list of live variables at any region exit point within the method, and suppress generation of the summary information if this is the case
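The kind of method that benefits from the optimistic assumption looks roughly like the following. The example is hypothetical: `Point`, `lastRejected`, and `lengthSquared` are made-up names, and the branch on a negative argument is simply assumed to be rare.

```java
class PartialEscapeSketch {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Storing into a static field makes the object reachable outside the method.
    static Point lastRejected;

    static int lengthSquared(int x, int y) {
        Point p = new Point(x, y);
        if (x < 0) {           // assumed rare
            lastRejected = p;  // the only place where p escapes
            throw new IllegalArgumentException("negative x: " + x);
        }
        // On the non-rare path p never leaves this method, so an analysis that
        // ignores the rare branch may treat p as a candidate for stack allocation.
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3, 4));
    }
}
```

A whole-method escape analysis has to treat `p` as escaping because of the single store in the rare branch, which is the limitation the first slide on this topic points out.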
33Benchmarking
- (Or the end of text-only slides)
- Except the last slide and the references, of
course
34Performance improvement over FBC in several benchmarks
- (Or decrease, as in the case of mtrt / noopt)
- RBC-noopt: no optimization
- RBC-nopi: PEA and PDCE (no partial inlining)
- RBC-full: all optimizations used
- RBC-offline: all optimizations used, combined with a priori profile information
35Compilation Overhead Time Ratio
36Code Size Ratio
37Peak Memory Usage
38Contributions of this paper
- The design and implementation of a region-based compilation technique in a dynamic (Java) compiler
- Performance analysis of the proposed RBC method, using a production-level JIT compiler as a testbed platform (the paper does not disclose the name of the compiler)
39The End
40References
- 1. T. Suganuma, T. Yasue, T. Nakatani. A Region-Based Compilation Technique for a Java Just-In-Time Compiler
- 2. The HotSpot Server Compiler. http://java.sun.com/developer/technicalArticles/Networking/HotSpot/onstack.html
- 3. The FLEX Compiler Mailing List. http://www.flex-compiler.lcs.mit.edu/Harpoon/hypermail/java-dev/0152.html
- 4. Escape Analysis for Java. http://citeseer.ist.psu.edu/choi99escape.html
- 5. Sun's Java HotSpot Server Compiler. http://java.sun.com/products/hotspot/
41Basic Block Background
- Each basic block has two sets associated with it, whose meaning depends on the type of analysis being performed [3]
- The In set is the set of properties derived from the other basic blocks that are linked to this one
- The Out set is derived from the In set and the transfer function defined by the analysis, applied to the instructions held within this basic block
- The transfer function defines which new properties are generated (the Gen set) and which are killed (the Kill set) by the instructions passed to it, through an implementation-specific mapping function
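In its usual textbook form, the transfer function computes Out as Gen combined with whatever part of In is not killed. The sketch below assumes that form and represents each property as a bit index in a `java.util.BitSet`. The class and method names are hypothetical.

```java
import java.util.BitSet;

class TransferFunctionSketch {
    /** Out = Gen union (In minus Kill), with each property represented as a bit index. */
    static BitSet out(BitSet in, BitSet gen, BitSet kill) {
        BitSet result = (BitSet) in.clone(); // leave the In set untouched
        result.andNot(kill);                 // drop the properties killed by the block
        result.or(gen);                      // add the properties generated by the block
        return result;
    }

    public static void main(String[] args) {
        BitSet in = new BitSet();
        in.set(0);
        in.set(1);
        BitSet gen = new BitSet();
        gen.set(2);
        BitSet kill = new BitSet();
        kill.set(1);
        System.out.println(out(in, gen, kill));
    }
}
```

For a backward analysis such as liveness, the roles of In and Out are swapped, but the same Gen and Kill structure applies.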