1
A brief overview of "A Region-Based Compilation
Technique for a Java Just-In-Time Compiler" [1]
Presented by George Toderici
2
Java Overview

Java source code is translated by the compiler
into bytecode, the machine language of a virtual
machine
The bytecode can either be interpreted or
compiled
A Just-In-Time (JIT) compiler compiles the
bytecode to native code on the fly, while the
program runs

3
Java Overview (2)

JIT compilers use a lot of memory, and
compilation itself is generally slow.
Programs, once compiled, can be quite fast.
However, slowdowns occur when new (rare) classes
are loaded, because the JIT has to compile them
Interpreters are even slower, but do not suffer
from the pauses associated with on-the-fly
compilation

4
Hybrid Approach

It would make sense to combine compiling code on
the fly with interpreting code in order to speed
up applications
Use the JIT technique for the portions of the
code that are executed very frequently, and
interpret the rest
This has been done before:
The Java HotSpot compiler [5] uses dynamic
profile information to decide which functions to
compile

5
Motivation

Most compilers use the method as the unit of
compilation and compile all of its code,
regardless of whether it is rare or not
Most of the execution time is usually spent in
relatively small regions of code that are
executed many times, not necessarily in entire
functions that are executed many times
Instead of compiling whole functions, as HotSpot
[5] does, it may be faster to recompile just
the regions that are hot

6
Classical On Stack Replacement

Java code:
public class Benchmark {
    public static void main(String[] arg) {
        int sum = 0;
        for (int i = 0; i < ...; i++) {
            sum += i;
        }
    }
}

For the code above, the Java HotSpot compiler [2]
interprets the function main until it detects
that more than 10000 iterations have been
completed in main.
The function is then declared hot and compiled.
However, the compiled code will not be executed
until the next call to main(), and a program
calls main only once.

7
Classical On Stack Replacement (2)

Java code:
public class Benchmark {
    public static void main(String[] arg) {
        int sum = 0;
        for (int i = 0; i < ...; i++) {
            sum += i;
        }
    }
}

In order to begin executing the compiled version
of the code, there must be a link between the
interpreter's basic block state and the compiled
basic block state.
Consequently, state (e.g., the stack frame and
live variables) must be saved and transferred
between the interpreted and the compiled versions
of the code

8
System Overview

The paper uses a Mixed Mode Interpreter as the
testbed platform
Mixed Mode Interpreters execute both JIT
generated code and interpreted code in the same
program.
In order for the JIT to generate compiled code,
it needs to identify which methods to compile.
A counter for counting both method invocation
frequencies and loop iterations is allocated to
each method
When the counter exceeds a threshold, the method
is considered as frequently invoked (or hot) and
the first level of compilation is triggered
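
The counter scheme described on this slide can be sketched as follows. This is only an illustration, not the paper's implementation: the class name HotnessCounter, its method names, and the threshold passed in are all hypothetical, since the slides give no concrete values.

```java
// Hypothetical sketch: one counter per method, bumped on both method
// invocations and loop iterations. Crossing the (assumed) threshold
// triggers the first level of compilation exactly once.
final class HotnessCounter {
    private final int threshold;   // assumed value, not taken from the paper
    private int count;
    private boolean compilationTriggered;

    HotnessCounter(int threshold) {
        this.threshold = threshold;
    }

    /** Called on method entry; returns true if compilation is triggered now. */
    boolean onInvocation() {
        return bump();
    }

    /** Called on each loop iteration; returns true if compilation is triggered now. */
    boolean onLoopIteration() {
        return bump();
    }

    private boolean bump() {
        count++;
        if (!compilationTriggered && count > threshold) {
            compilationTriggered = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        HotnessCounter counter = new HotnessCounter(3);
        counter.onInvocation();
        for (int i = 0; i < 5; i++) {
            if (counter.onLoopIteration()) {
                System.out.println("method became hot in loop iteration " + i);
            }
        }
    }
}
```

Sharing one counter between invocations and loop iterations lets a method that is called once but loops for a long time (such as main in the OSR example) still become hot.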

9
System Overview (2)

The compiler has three optimization levels: 0, 1,
and 2
L0: basic method inlining and devirtualization of
method calls based on class hierarchy analysis
and type flow analysis, producing either guarded
or unguarded code. Preexistence analysis is
performed to safely remove guard code and backup
code without requiring on-stack replacement (OSR)
L1: fully fledged method inlining, other data
flow optimizations
L2: escape analysis, stack object allocation,
code scheduling, DAG-based optimizations

10
System Overview (3)

The L0 optimizing compiler is run as an
application thread. The L1 and L2 compilations
are performed by a separate thread in the
background
The L1 or L2 recompilation is triggered by the
sampling profiler depending on the hotness level
of the executed code
The hotness level is decided by a profiler

11
The Profiler

Two types of profilers are used:
The sampling profiler, which periodically
monitors the program counters of the application
threads
The instrumentation profiler, which is used when
detailed information needs to be collected about
a specific method. The instrumentation code is
inserted in the L0 compiled code and is initially
disabled.

12
The Profiler (2)

When the sampler decides a method is hot enough
to be promoted, the instrumentation code is
enabled and detailed information is thus
collected. The code is disabled after a certain
amount of information is collected.
The instrumentation code collects information
about virtual/interface call receiver type
distributions and basic block execution
frequencies
The receiver type profile drives inlining if the
call site is dynamically monomorphic (has only
one target type)
Basic block frequencies determine rare code.
Region-based compilation (RBC) optimizations are
based on this information (L1/L2 only)
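
A receiver type profile for a single call site might look like the sketch below. The class ReceiverTypeProfile and its methods are hypothetical names chosen for illustration; the slides do not describe the data structure the instrumentation code actually uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a receiver type distribution for one
// virtual/interface call site.
final class ReceiverTypeProfile {
    private final Map<String, Integer> countsByType = new HashMap<>();

    /** Instrumentation hook: record the dynamic class of the receiver. */
    void record(Object receiver) {
        countsByType.merge(receiver.getClass().getName(), 1, Integer::sum);
    }

    /** Dynamically monomorphic: exactly one receiver type has been observed. */
    boolean isMonomorphic() {
        return countsByType.size() == 1;
    }

    public static void main(String[] args) {
        ReceiverTypeProfile profile = new ReceiverTypeProfile();
        profile.record("a");
        profile.record("b");
        System.out.println("inlining candidate: " + profile.isMonomorphic());
        profile.record(Integer.valueOf(1));
        System.out.println("inlining candidate: " + profile.isMonomorphic());
    }
}
```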

13
Region Exit Handling

Execution may leave a compiled region through a
rare path; the idea is to handle such region
exits in a way that exploits RBC for function
inlining
There are several options (all assume OSR):
1. Fall back on the interpreter (HotSpot
Compiler)
2. Drive recompilation with deoptimization. If
the rare cases occur frequently, the optimized
code is replaced with the deoptimized code.
3. Drive recompilation with the same optimization
level. The recompiled version has several entry
points corresponding to region boundaries in the
original and is used for both transitions and
future method invocations.

14
Region Exit Handling (2)

Each of the three ways to handle region exits
has its strengths and weaknesses, but they all
depend on what we mean by rare code
The first method works best in the limit case in
which rare code is extremely sparse; if this does
not hold, many optimizations that could
potentially be performed are missed.
Ideally, the method to use would be chosen at
runtime, based on profile information

15
Region Exit Handling (3)

The authors implement the third variant because
it suits RBC best: the regions in question are
already known to be non-rare, since they have
been marked as hot, and we want to neither
discard their compiled code nor deoptimize them

16
On Stack Replacement

The optimized code has multiple entry points
which are added artificially so that there can be
a jump between the normal and the recompiled
versions
If the two regions are of similar shape, the jump
between them can happen without OSR

17
Region Based Compilation

RBC is performed only in level-1/2 compilation,
using profile information from the
instrumentation code inserted at level 0
However, profile information is sometimes not
available for all methods.
Consequently, a heuristic is combined with the
actual profile information to select the regions

18
Region Based Compilation: Intra-method region
selection
19
Intra-Procedural Region Selection

The method identifies code to be removed, rather
than choosing code to be optimized.
This is because a dynamic compiler needs to be
more conservative due to the high cost of OSR.

20
Intra-Procedural Region Selection: Algorithm
Overview

Initialize each basic block (BB) with a guess as
to whether it is rare or not. If profile
information is available, use that to decide.
Propagate the information with a backward
data-flow analysis until it converges for all BBs
Traverse the basic blocks to find the transitions
from non-rare BBs to rare BBs and generate code
to handle these region exits
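
The propagation step can be sketched as a simple fixed-point iteration. The slides do not state the merge rule, so the one used here is an assumption: a block with no fixed classification becomes rare when all of its successors are rare. The class RarePropagation, the array-based CFG encoding, and the pinned flag are likewise hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of backward propagation of the "rare" flag.
final class RarePropagation {
    /**
     * succ[b] lists the successors of block b. seedRare[b] is the initial
     * guess. pinned[b] marks blocks whose guess comes from the heuristic or
     * from profile data and is therefore never changed.
     */
    static boolean[] propagate(int[][] succ, boolean[] seedRare, boolean[] pinned) {
        boolean[] rare = seedRare.clone();
        boolean changed = true;
        while (changed) {                      // iterate until convergence
            changed = false;
            for (int b = succ.length - 1; b >= 0; b--) {
                if (pinned[b] || rare[b] || succ[b].length == 0) {
                    continue;
                }
                boolean allSuccessorsRare = true;
                for (int s : succ[b]) {
                    allSuccessorsRare &= rare[s];
                }
                if (allSuccessorsRare) {       // assumed merge rule
                    rare[b] = true;
                    changed = true;
                }
            }
        }
        return rare;
    }

    public static void main(String[] args) {
        // B0 -> B1, B2;  B1 -> B3 (normal return);  B2 -> B4 (throws)
        int[][] succ = { {1, 2}, {3}, {4}, {}, {} };
        boolean[] seed = { false, false, false, false, true };
        boolean[] pinned = { false, false, false, true, true };
        System.out.println(Arrays.toString(propagate(succ, seed, pinned)));
    }
}
```

Flags only ever flip from non-rare to rare, so the loop is guaranteed to terminate.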

21
Intra-Procedural Region Selection: Heuristic
Function

The basic heuristic for classifying blocks as
rare or non-rare is the following:
Compiler-generated backup blocks are rare
(devirtualization of method invocation)
Blocks that end by throwing an exception are
rare
Exception handler blocks are rare
Blocks that end with a normal return are NOT rare
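
The four rules above map directly onto a small classification function. The enum names below are hypothetical, and the UNKNOWN result for all other blocks is an assumption: the slide only says what happens in the four listed cases.

```java
// Hypothetical sketch of the initial-guess heuristic.
final class RareHeuristic {
    /** Role of a basic block, as far as the heuristic cares. */
    enum BlockKind { BACKUP, THROWS_EXCEPTION, EXCEPTION_HANDLER, NORMAL_RETURN, OTHER }

    enum Guess { RARE, NOT_RARE, UNKNOWN }

    static Guess initialGuess(BlockKind kind) {
        switch (kind) {
            case BACKUP:              // backup path of a devirtualized call
            case THROWS_EXCEPTION:    // block ends by throwing
            case EXCEPTION_HANDLER:
                return Guess.RARE;
            case NORMAL_RETURN:
                return Guess.NOT_RARE;
            default:
                return Guess.UNKNOWN; // assumed: left to data-flow propagation
        }
    }

    public static void main(String[] args) {
        for (BlockKind kind : BlockKind.values()) {
            System.out.println(kind + " -> " + initialGuess(kind));
        }
    }
}
```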

22
Intra-Procedural Region Selection: Non-rare to
Rare Transitions

Transitions from non-rare to rare BBs are
identified and marked as rare block entry points
For each rare block entry point, liveness
analysis is performed to find the set of live
variables
A new region exit BB (RE-BB) is generated for
each entry point and replaces the original target
by redirecting the control flow to the newly
generated block
Each RE-BB contains a single instruction
(recompile) that holds all live variables at the
entry point as its operands, so that all
information needed for OSR is available for
recompilation
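
A minimal sketch of RE-BB insertion, assuming the rare flags and the live-in sets have already been computed (liveness analysis itself is not shown). All names here (RegionExits, RegionExitBlock, insertRegionExits) and the array-based CFG encoding are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: redirect every non-rare -> rare edge to a new RE-BB.
final class RegionExits {
    /** An RE-BB holds one "recompile" pseudo-instruction and its operands. */
    static final class RegionExitBlock {
        final int id;                         // id of the new block
        final int rareTarget;                 // rare block the edge used to reach
        final Set<String> recompileOperands;  // variables live at the entry point

        RegionExitBlock(int id, int rareTarget, Set<String> recompileOperands) {
            this.id = id;
            this.rareTarget = rareTarget;
            this.recompileOperands = recompileOperands;
        }
    }

    /** Mutates succ so that region-exit edges point at the new RE-BB ids. */
    static List<RegionExitBlock> insertRegionExits(
            int[][] succ, boolean[] rare, List<Set<String>> liveIn) {
        List<RegionExitBlock> exits = new ArrayList<>();
        int nextId = succ.length;             // new blocks are numbered after the old ones
        for (int b = 0; b < succ.length; b++) {
            if (rare[b]) {
                continue;                     // only edges leaving non-rare blocks matter
            }
            for (int i = 0; i < succ[b].length; i++) {
                int target = succ[b][i];
                if (rare[target]) {
                    exits.add(new RegionExitBlock(
                            nextId, target, new TreeSet<>(liveIn.get(target))));
                    succ[b][i] = nextId++;    // redirect control flow to the RE-BB
                }
            }
        }
        return exits;
    }

    public static void main(String[] args) {
        int[][] succ = { {1, 2}, {}, {} };    // B0 -> B1 (non-rare), B2 (rare)
        boolean[] rare = { false, false, true };
        List<Set<String>> liveIn = new ArrayList<>();
        liveIn.add(new TreeSet<>());
        liveIn.add(new TreeSet<>());
        liveIn.add(new TreeSet<>(Arrays.asList("sum", "i")));
        for (RegionExitBlock re : insertRegionExits(succ, rare, liveIn)) {
            System.out.println("RE-BB " + re.id + ": recompile" + re.recompileOperands);
        }
    }
}
```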

23
Region Based Compilation: Partial Inlining
24
Partial Inlining

Main idea:
Inline all small methods
The other methods are first processed by region
selection and then re-assessed based on their
reduced code size. If a method is then inlinable,
only the non-rare portions of its code are
inlined
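
The two-stage decision can be sketched as below. The size measure (one number per basic block), the two limits, and all names are assumptions made for illustration; the slides do not say how the compiler measures code size or what its inlining budget is.

```java
// Hypothetical sketch of the partial inlining decision for one callee.
final class PartialInlining {
    enum Decision { INLINE_WHOLE, INLINE_NON_RARE_ONLY, DO_NOT_INLINE }

    /**
     * blockSizes[b] is an assumed size estimate for block b and rare[b] is
     * the result of region selection for that block.
     */
    static Decision decide(int[] blockSizes, boolean[] rare,
                           int smallMethodLimit, int inlineBudget) {
        int totalSize = 0;
        int nonRareSize = 0;
        for (int b = 0; b < blockSizes.length; b++) {
            totalSize += blockSizes[b];
            if (!rare[b]) {
                nonRareSize += blockSizes[b];
            }
        }
        if (totalSize <= smallMethodLimit) {
            return Decision.INLINE_WHOLE;          // small methods are always inlined
        }
        if (nonRareSize <= inlineBudget) {
            return Decision.INLINE_NON_RARE_ONLY;  // re-assessed on the reduced size
        }
        return Decision.DO_NOT_INLINE;
    }

    public static void main(String[] args) {
        int[] sizes = { 10, 40, 15 };
        boolean[] rare = { false, true, false };
        System.out.println(decide(sizes, rare, 20, 30));
    }
}
```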

25
Partial Inlining (2)

During inlining, devirtualization of dynamically
dispatched call sites is also performed. This is
based on class hierarchy analysis and receiver
type distribution profile information, which
determine whether the generated backup paths can
be removed or not.
Extra work: update the live variable information
in the RE-BBs from the previous step

26
Partial Inlining (3)

Advantages of coupling inlining with region
selection:
Since inlining is applied only to the non-rare
regions, no inlined code is wasted on rare paths,
which conserves the inlining budget
Inlining is done after the methods have been
processed by the intra-method region selection.
This means that less code has to be inlined
(again, conserving the inlining budget)

27
Other Optimizations
28
Optimizations

Partial Dead Code Elimination
Partial Escape Analysis

29
Partial Dead Code Elimination

Since a list of all live variables (both stack
and heap) has to be kept for each RE-BB, the
average lifespan of some variables is
considerably longer than in function-based
compilation
Partial dead code elimination can reduce the
magnitude of the problem by pushing computations
that are only live in region exit paths into the
RE-BBs

30
Partial Dead Code Elimination (2)

Algorithm:
Maintain two sets of live variables: one from the
RE-BBs and one from the non-rare paths
Using a code motion algorithm, move the
computations that define variables which are in
the RE-BB set but not in the other. The
computations are copied both to the non-rare
region and to the RE-BB, but dead code
elimination eventually removes the copy from the
non-rare region
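
The selection of candidates for code motion boils down to a set difference. The sketch below shows only that selection step, under the assumption that variables are identified by name; the code motion itself is not shown, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: which definitions are live only on region exit paths?
final class PartialDeadCode {
    /**
     * Returns the variables defined in the region whose values are needed by
     * some RE-BB but by no non-rare path. Their defining computations are the
     * candidates to be pushed into the RE-BBs.
     */
    static Set<String> sinkable(Set<String> definedInRegion,
                                Set<String> liveAtRegionExits,
                                Set<String> liveOnNonRarePaths) {
        Set<String> result = new TreeSet<>(definedInRegion);
        result.retainAll(liveAtRegionExits);   // needed by an RE-BB ...
        result.removeAll(liveOnNonRarePaths);  // ... and by nothing else
        return result;
    }

    public static void main(String[] args) {
        Set<String> defined = new TreeSet<>(Arrays.asList("a", "b", "c"));
        Set<String> liveAtExits = new TreeSet<>(Arrays.asList("a", "b"));
        Set<String> liveNonRare = new TreeSet<>(Arrays.asList("b", "c"));
        System.out.println(sinkable(defined, liveAtExits, liveNonRare));
    }
}
```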

31
Partial Escape Analysis

Escape analysis identifies whether an object may
escape the method or thread. It can speed up
programs considerably, as it allows objects to be
allocated on the stack rather than on the heap
(heap allocation is much slower)
Objects usually escape from rare code or from the
backup paths of devirtualized methods, hence
conventional escape analysis has limited
applicability

32
Partial Escape Analysis (2)

The analysis is modified to work only on
non-rare paths (optimistic assumption)
Because rare paths are not analyzed, the summary
information for a method may state that a certain
(parameter) object is non-escaping. When
execution exits through a region boundary, all
(calling) methods that use this summary
information would then need to be recompiled.
To avoid this, we check whether the arguments are
included in the list of live variables at any
region exit point within the method and suppress
generating the summary information if this is the
case
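
The suppression check might be sketched as follows. It assumes parameters and live variables are identified by name and that suppression is applied per parameter; the slide does not say whether the real compiler suppresses the summary per argument or for the whole method. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: decide which "non-escaping" facts may go into the
// summary information that callers rely on.
final class EscapeSummary {
    /**
     * A parameter found non-escaping on non-rare paths is published only if
     * it is not live at any region exit point of the method.
     */
    static Set<String> publishable(Set<String> nonEscapingParams,
                                   List<Set<String>> liveAtRegionExits) {
        Set<String> result = new TreeSet<>(nonEscapingParams);
        for (Set<String> live : liveAtRegionExits) {
            result.removeAll(live);   // may still escape on the rare path
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> nonEscaping = new TreeSet<>(Arrays.asList("p0", "p1"));
        List<Set<String>> exits = new ArrayList<>();
        exits.add(new TreeSet<>(Arrays.asList("p1", "tmp")));
        System.out.println(publishable(nonEscaping, exits));
    }
}
```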

33
Benchmarking

(Or the end of text-only slides)
Except the last slide and the references, of
course

34
Performance improvement over function-based
compilation (FBC) in several benchmarks
RBC-noopt: no optimization
RBC-nopi: PEA, PDCE
RBC-full: all optimizations used
RBC-offline: all optimizations used, combined
with a priori profile information
Or decrease, as in the case of mtrt / noopt
35
Compilation Overhead Time Ratio
36
Code Size Ratio
37
Peak Memory Usage
38
Contributions of this paper

The design and implementation of a region-based
compilation technique in a dynamic (Java)
compiler
Performance analysis of the proposed RBC method
using a production-level JIT compiler as a test
bed platform (the paper does not disclose the
name of the compiler)

39
The End

Any questions?

40
References

1. T. Suganuma, T. Yasue, T. Nakatani. A
Region-Based Compilation Technique for a Java
Just-In-Time Compiler
2. The HotSpot Server Compiler.
http://java.sun.com/developer/technicalArticles/Networking/HotSpot/onstack.html
3. The FLEX Compiler Mailing List.
http://www.flex-compiler.lcs.mit.edu/Harpoon/hypermail/java-dev/0152.html
4. Escape Analysis for Java.
http://citeseer.ist.psu.edu/choi99escape.html
5. Sun's Java HotSpot Server Compiler.
http://java.sun.com/products/hotspot/

41
Basic Block Background

Each basic block has two sets associated with it
(whose meaning depends on the type of analysis
performed at that time) [3]
The In set is the set of properties derived from
the other basic blocks that are linked to this
one
The Out set is derived from the In set and the
transfer function defined by the analysis
performed on the instructions held within this
basic block
The transfer function defines which new
properties are generated (the Gen set) and which
ones are killed (the Kill set) by the
instructions passed to it, through a mapping
function that is implementation specific.
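
In the usual gen/kill formulation, the Out set is the Gen set together with whatever is in In but not in Kill. The sketch below shows that form with properties represented as strings; the representation and the class name are assumptions, and a particular analysis may define its transfer function differently.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a gen/kill transfer function for one basic block.
final class TransferFunction {
    /** Out = Gen union (In minus Kill). */
    static Set<String> apply(Set<String> in, Set<String> gen, Set<String> kill) {
        Set<String> out = new TreeSet<>(in);
        out.removeAll(kill);   // properties invalidated by the block
        out.addAll(gen);       // properties created by the block
        return out;
    }

    public static void main(String[] args) {
        Set<String> in = new TreeSet<>(Arrays.asList("x", "y"));
        Set<String> gen = new TreeSet<>(Arrays.asList("z"));
        Set<String> kill = new TreeSet<>(Arrays.asList("y"));
        System.out.println(apply(in, gen, kill));
    }
}
```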