A brief overview of A Region-Based Compilation Technique for a Java Just-In-Time Compiler [1]

1
A brief overview of A Region-Based Compilation
Technique for a Java Just-In-Time Compiler [1]
Presented by George Toderici
2
Java Overview
  • Java source code is translated by the compiler
    into bytecode, which is the machine language of
    a virtual machine (the JVM)
  • The bytecode can be either interpreted or
    compiled
  • A Just-In-Time (JIT) compiler usually compiles
    the bytecode into native code on the fly, at
    run time

3
Java Overview (2)
  • JIT compilers use a lot of memory, and the
    compilation itself is generally very slow,
    although programs can be quite fast once they
    are compiled. However, slowdowns occur when new
    (rare) classes are loaded, because the JIT has
    to compile them
  • Interpreters are even slower, but they do not
    suffer from the overhead of on-the-fly
    compilation

4
Hybrid Approach
  • It would make sense to combine on-the-fly
    compilation with interpretation in order to
    speed up applications
  • Use the JIT for those portions of the code that
    are executed very frequently, and interpret the
    rest
  • This approach has been used before
  • The Java HotSpot compiler [5] uses dynamic
    profile information to decide which methods to
    compile

5
Motivation
  • Most compilers are method-based: they compile
    all of a method's code, regardless of whether it
    is rare or not
  • Most of the execution time is usually spent in
    relatively small regions of code that are
    executed many times, not necessarily in entire
    methods that are invoked many times
  • Instead of compiling whole methods, as HotSpot
    [5] does, it may be faster to compile only the
    regions that are hot

6
Classical On Stack Replacement
  • Java code:

        public class Benchmark {
            public static void main(String[] arg) {
                int sum = 0;
                // N stands for the loop bound, which was
                // lost from the transcript
                for (int i = 0; i < N; i++)
                    sum += i;
            }
        }

  • For the code above, the Java HotSpot compiler [2]
    interprets main until it detects that more than
    10,000 loop iterations have been completed in
    main
  • main is then declared hot and is compiled
  • However, the compiled code would normally not
    run until the next call to main(), and a program
    calls main only once. This is the problem that
    on-stack replacement (OSR) solves

7
Classical On Stack Replacement (2)
  • To begin executing the compiled version of the
    code in the middle of the running loop, there
    must be a link between the interpreter's basic
    block state and the compiled code's basic block
    state
  • Consequently, state (e.g., the stack frame and
    live variables) must be saved and transferred
    between the interpreted and compiled versions of
    the code
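The state hand-over can be simulated in plain Java. This is a toy sketch under stated assumptions, not how a real VM implements OSR: the "interpreter" and the "compiled code" are both ordinary Java methods, the loop bound n is a hypothetical parameter (the original bound is lost), and the threshold simply reuses the 10,000 iterations quoted for HotSpot. The live variables i and sum are packed into a frame and passed to a compiled continuation that resumes in the middle of the loop.

```java
import java.util.HashMap;
import java.util.Map;

// Toy simulation of on-stack replacement (OSR) for the Benchmark loop.
// Both the "interpreter" and the "compiled code" are plain Java methods;
// what matters is the explicit hand-over of the live variables i and sum.
class OsrDemo {
    // Reuses the 10,000-iteration figure quoted for HotSpot; illustrative only.
    static final int OSR_THRESHOLD = 10_000;

    // "Compiled" continuation: entered in the middle of the loop with the
    // interpreter's live variables supplied through the frame map.
    static long compiledLoopFrom(Map<String, Long> frame, long n) {
        long i = frame.get("i");
        long sum = frame.get("sum");
        for (; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // "Interpreted" loop: counts back-edges and, once the threshold is
    // exceeded, saves the live state and transfers control to compiled code.
    static long run(long n) {
        long sum = 0;
        long backEdges = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
            if (++backEdges > OSR_THRESHOLD) {
                Map<String, Long> frame = new HashMap<>();
                frame.put("i", i + 1); // next iteration still to execute
                frame.put("sum", sum);
                return compiledLoopFrom(frame, n);
            }
        }
        return sum; // loop finished before the threshold was reached
    }

    public static void main(String[] args) {
        // n is a hypothetical loop bound; the original one was lost.
        System.out.println(run(100_000L)); // 4999950000, same as a plain loop
    }
}
```

Whether or not the switch happens, the result equals that of the uninterrupted loop; the only requirement is that every live variable crosses the boundary.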

8
System Overview
  • The paper uses a mixed-mode interpreter as the
    testbed platform
  • A mixed-mode interpreter executes both
    JIT-generated code and interpreted code in the
    same program
  • To generate compiled code, the JIT needs to
    identify which methods to compile
  • Each method is allocated a counter that counts
    both method invocations and loop iterations
  • When the counter exceeds a threshold, the method
    is considered frequently invoked (hot) and the
    first level of compilation is triggered
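A minimal sketch of the per-method counter, assuming one shared counter per method that both invocations and loop back-edges increment. The class name, the hook names, and the threshold value are hypothetical; "compiling" is reduced to recording the method in a set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-method counters for a mixed-mode interpreter.
class MethodCounters {
    private final int threshold;
    private final Map<String, Integer> counters = new HashMap<>();
    private final Set<String> compiled = new HashSet<>();

    MethodCounters(int threshold) {
        this.threshold = threshold;
    }

    // Called by the interpreter on every invocation of 'method'.
    void onInvocation(String method) {
        bump(method);
    }

    // Called by the interpreter on every loop back-edge inside 'method'.
    void onLoopBackEdge(String method) {
        bump(method);
    }

    boolean isCompiled(String method) {
        return compiled.contains(method);
    }

    // One shared counter per method: invocations and iterations both count.
    private void bump(String method) {
        if (compiled.contains(method)) {
            return; // already handed to the L0 compiler
        }
        int count = counters.merge(method, 1, Integer::sum);
        if (count > threshold) {
            compiled.add(method); // stand-in for triggering L0 compilation
        }
    }

    public static void main(String[] args) {
        MethodCounters mc = new MethodCounters(3);
        mc.onInvocation("main");
        for (int i = 0; i < 5; i++) {
            mc.onLoopBackEdge("main");
        }
        System.out.println(mc.isCompiled("main")); // true
    }
}
```

Counting loop iterations as well as calls is what lets a method like main, which is invoked only once, still become hot.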

9
System Overview (2)
  • The compiler has three optimization levels: L0,
    L1, and L2
  • L0: basic method inlining and devirtualization
    of method calls based on class hierarchy
    analysis and type flow analysis, producing either
    guarded or unguarded code. Preexistence analysis
    is performed to safely remove guard code and
    backup code without requiring OSR
  • L1: full-fledged method inlining and other data
    flow optimizations
  • L2: escape analysis, stack object allocation,
    code scheduling, and DAG-based optimizations
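To make "guarded code" and "backup code" concrete, here is a hand-written illustration (not actual compiler output) of guarded devirtualization. The classes Shape, Square, and Circle are hypothetical. The virtual call s.area() becomes a class test plus the inlined body of the expected target, and the original virtual call survives as the backup path. If class hierarchy analysis and preexistence analysis showed that Square were the only possible receiver, the guard and the backup call could be dropped, giving unguarded code.

```java
// Hypothetical class hierarchy used only for this illustration.
abstract class Shape {
    abstract double area();
}

final class Square extends Shape {
    final double side;
    Square(double side) { this.side = side; }
    double area() { return side * side; }
}

final class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}

class Devirt {
    // Original code: a dynamically dispatched call.
    static double areaVirtual(Shape s) {
        return s.area();
    }

    // Hand-written equivalent of guarded devirtualization, assuming the
    // profile says Square is the expected receiver type.
    static double areaGuarded(Shape s) {
        if (s.getClass() == Square.class) {   // guard
            Square sq = (Square) s;
            return sq.side * sq.side;          // inlined Square.area()
        }
        return s.area();                       // backup path (rare if the guess holds)
    }

    public static void main(String[] args) {
        System.out.println(areaGuarded(new Square(3)));  // 9.0
        System.out.println(areaGuarded(new Circle(1)));  // falls back to the virtual call
    }
}
```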

10
System Overview (3)
  • The L0 optimizing compiler runs in the
    application thread, while L1 and L2 compilations
    are performed by a separate thread in the
    background
  • L1 or L2 recompilation is triggered by the
    sampling profiler, depending on the hotness level
    of the executed code as measured by the profiler

11
The Profiler
  • Two types of profiling are used:
  • The sampling profiler periodically monitors the
    program counters of the application threads
  • The instrumentation profiler is used when
    detailed information needs to be collected about
    a specific method. The instrumentation code is
    inserted into the L0-compiled code and is
    initially disabled
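A minimal sketch of how sampling can yield a hotness level, assuming that each timer tick delivers the method each application thread is currently executing, and that sample counts map to L1/L2 requests through fixed thresholds. The class name and both thresholds are hypothetical. In the described system, the resulting recompilation request would be handed to the background compilation thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sampling profiler: sample counts act as hotness levels.
class SamplingProfiler {
    static final int L1_SAMPLES = 5;   // hypothetical promotion thresholds
    static final int L2_SAMPLES = 20;

    private final Map<String, Integer> samples = new HashMap<>();

    // One timer tick: 'currentMethods' stands in for the methods that the
    // application threads' program counters point into at this moment.
    void tick(List<String> currentMethods) {
        for (String m : currentMethods) {
            samples.merge(m, 1, Integer::sum);
        }
    }

    // Level at which the method should be (re)compiled; 0 means stay at L0.
    int requestedLevel(String method) {
        int n = samples.getOrDefault(method, 0);
        if (n >= L2_SAMPLES) return 2;
        if (n >= L1_SAMPLES) return 1;
        return 0;
    }

    public static void main(String[] args) {
        SamplingProfiler p = new SamplingProfiler();
        for (int t = 0; t < 6; t++) {
            p.tick(List.of("loopBody", "helper"));
        }
        System.out.println(p.requestedLevel("loopBody")); // 1
    }
}
```

Sampling is cheap because it does not touch the compiled code; its imprecision is why the detailed instrumentation profiler is enabled only for promoted methods.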

12
The Profiler (2)
  • When the sampler decides that a method is hot
    enough to be promoted, the instrumentation code
    is enabled and detailed information is
    collected. The code is disabled again once a
    certain amount of information has been collected
  • The instrumentation code collects receiver type
    distributions at virtual/interface call sites
    and basic block execution frequencies
  • The receiver type profile drives inlining if the
    call site is dynamically monomorphic (only one
    receiver type is observed)
  • Basic block frequencies determine which code is
    rare. Region-based compilation (RBC)
    optimizations, performed at L1/L2 only, are based
    on this information
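The enable/collect/disable cycle for one call site can be sketched as follows, assuming a fixed event budget after which recording stops. The class name, method names, and the budget are hypothetical. A call site counts as dynamically monomorphic here when exactly one receiver class was observed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical instrumentation profile for one virtual/interface call site.
class CallSiteProfile {
    private final int budget;             // events to record before switching off
    private boolean enabled = false;      // instrumentation starts disabled
    private int events = 0;
    private final Map<Class<?>, Integer> receiverTypes = new HashMap<>();

    CallSiteProfile(int budget) {
        this.budget = budget;
    }

    // Called when the sampler promotes the enclosing method.
    void enable() {
        enabled = true;
    }

    // Executed at the call site with the actual receiver object.
    void record(Object receiver) {
        if (!enabled) {
            return;
        }
        receiverTypes.merge(receiver.getClass(), 1, Integer::sum);
        if (++events >= budget) {
            enabled = false;              // enough information collected
        }
    }

    boolean isEnabled() {
        return enabled;
    }

    // Dynamically monomorphic: exactly one receiver type was observed.
    boolean isMonomorphic() {
        return receiverTypes.size() == 1;
    }

    public static void main(String[] args) {
        CallSiteProfile p = new CallSiteProfile(3);
        p.enable();
        p.record("a");
        p.record("b");
        System.out.println(p.isMonomorphic()); // true: only String seen
    }
}
```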

13
Region Exit Handling
  • The idea is to handle exits from a compiled
    region (into rare code that was not compiled) in
    a way that exploits RBC for method inlining
  • There are several options for handling this
    problem (all assume OSR):
  • Fall back to the interpreter (HotSpot compiler)
  • Drive recompilation with deoptimization: if the
    rare cases occur frequently, the optimized code
    is replaced with deoptimized code
  • Drive recompilation at the same optimization
    level: the recompiled version has several entry
    points corresponding to region boundaries in the
    original code and is used both for transitions
    and for future method invocations

14
Region Exit Handling (2)
  • Each of the three ways of handling region exits
    has its own strengths and weaknesses, but they
    all depend on what we consider to be rare code
  • The first method works best in the limiting case
    in which rare code is extremely sparse; if this
    does not hold, many optimizations that could
    potentially be performed are missed
  • Ideally, the method would be chosen at run time,
    based on profile information

15
Region Exit Handling (3)
  • The authors implement the third variant because
    it suits RBC better: the regions in question are
    already known to be non-rare, since they have
    been marked as hot, and we want neither to
    decompile nor to deoptimize them

16
On Stack Replacement
  • The optimized code has multiple entry points,
    added artificially so that control can jump
    between the normal and the recompiled versions
  • If the two regions have a similar shape, the
    jump between them can happen without OSR
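A hand-written sketch of the idea, in which both the region-optimized code and its recompiled version are ordinary Java methods. The method (summing absolute values), the rare condition (a negative element), and the entry constants are all hypothetical. The optimized version compiles only the non-rare path; on reaching the rare block, it passes its live variables to the recompiled version through an extra entry point, while the NORMAL entry serves future invocations.

```java
// Hypothetical example of a recompiled method with two entry points.
class MultiEntry {
    static final int NORMAL = 0;          // ordinary method entry
    static final int AT_RARE_BLOCK = 1;   // entry at the former region boundary

    // Region-optimized version: only the non-rare path (a[i] >= 0) is
    // compiled; reaching the rare block triggers a region exit.
    static int optimized(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] < 0) {
                // Region exit: hand the live variables a, i, sum over.
                return recompiled(AT_RARE_BLOCK, a, i, sum);
            }
            sum += a[i];
        }
        return sum;
    }

    // Recompiled version: contains the full logic, plus an entry point
    // that resumes inside the loop with externally supplied state.
    static int recompiled(int entry, int[] a, int i, int sum) {
        switch (entry) {
            case NORMAL:
                i = 0;                    // normal entry ignores incoming state
                sum = 0;
                break;
            case AT_RARE_BLOCK:
                sum += -a[i];             // the rare block's work
                i++;
                break;
            default:
                throw new IllegalArgumentException("unknown entry " + entry);
        }
        for (; i < a.length; i++) {
            sum += (a[i] >= 0) ? a[i] : -a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(optimized(new int[] {1, -2, 3})); // 6
    }
}
```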

17
Region Based Compilation
  • RBC is performed only in L1/L2 compilation,
    using profile information collected by the
    instrumentation code in L0-compiled code
  • However, profile information is sometimes not
    available for all methods
  • Consequently, a heuristic is combined with the
    actual profile information to guide region
    selection

18
Region-Based Compilation: Intra-Method Region Selection
19
Intra-Procedural Region Selection
  • The method identifies code to be removed, rather
    than choosing code to be optimized
  • This is because a dynamic compiler needs to be
    more conservative, given the high cost of OSR

20
Intra-Procedural Region Selection: Algorithm
Overview
  • Initialize each basic block (BB) with a guess of
    whether it is rare or not. If profile
    information is available, use it to decide
  • Propagate this information with a backward data
    flow analysis until it converges for all BBs
  • Traverse the BBs to find the transitions from
    non-rare BBs to rare BBs, and generate code to
    handle these transitions
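The three steps can be sketched over a control flow graph given as successor lists. The propagation rule is an assumption, not taken from the slides: an undecided block becomes rare when all of its successors are rare. Starting undecided blocks as non-rare and only ever flipping them to rare keeps the result conservative and guarantees termination. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of intra-method region selection.
class RegionSelection {
    enum Mark { RARE, NON_RARE, UNKNOWN }

    // Step 2: backward propagation until nothing changes.
    static boolean[] propagate(int[][] succ, Mark[] init) {
        int n = succ.length;
        boolean[] rare = new boolean[n];
        for (int b = 0; b < n; b++) {
            rare[b] = init[b] == Mark.RARE;   // step 1: initial guesses
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            // Reverse index order; any visiting order reaches the same fixed point.
            for (int b = n - 1; b >= 0; b--) {
                if (init[b] != Mark.UNKNOWN || rare[b] || succ[b].length == 0) {
                    continue;                 // fixed by heuristic/profile, or already rare
                }
                boolean allRare = true;
                for (int s : succ[b]) {
                    allRare &= rare[s];
                }
                if (allRare) {
                    rare[b] = true;
                    changed = true;
                }
            }
        }
        return rare;
    }

    // Step 3: edges from a non-rare block into a rare block (rare entry points).
    static List<int[]> rareEntries(int[][] succ, boolean[] rare) {
        List<int[]> edges = new ArrayList<>();
        for (int b = 0; b < succ.length; b++) {
            for (int s : succ[b]) {
                if (!rare[b] && rare[s]) {
                    edges.add(new int[] {b, s});
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // 0 branches to 1 and 2; 1 -> 3 (return); 2 -> 4 (throw).
        int[][] succ = { {1, 2}, {3}, {4}, {}, {} };
        Mark[] init = { Mark.UNKNOWN, Mark.UNKNOWN, Mark.UNKNOWN, Mark.NON_RARE, Mark.RARE };
        for (int[] e : rareEntries(succ, propagate(succ, init))) {
            System.out.println(e[0] + " -> " + e[1]); // 0 -> 2
        }
    }
}
```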

21
Intra-Procedural Region Selection: Heuristic
Function
  • The basic heuristic for classifying blocks is
    the following:
  • Compiler-generated backup blocks (from
    devirtualization of method invocations) are rare
  • Blocks that end by throwing an exception are rare
  • Exception handler blocks are rare
  • Blocks that end with a normal return are NOT rare
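The heuristic can be written as a small classification function. The BlockKind enum is a hypothetical summary of how a block was created and how it ends. The profile override (a block with an observed execution count of zero is rare, any other count makes it non-rare) is an assumption for illustration; the slides only say that profile information is used when available.

```java
// Hypothetical initial-guess function for intra-method region selection.
class RareHeuristic {
    enum BlockKind {
        BACKUP_FOR_DEVIRTUALIZATION,
        ENDS_WITH_THROW,
        EXCEPTION_HANDLER,
        ENDS_WITH_RETURN,
        OTHER
    }

    enum Guess { RARE, NON_RARE, UNKNOWN }

    // Heuristic guess when no profile data exists for the block.
    static Guess initialGuess(BlockKind kind) {
        switch (kind) {
            case BACKUP_FOR_DEVIRTUALIZATION:
            case ENDS_WITH_THROW:
            case EXCEPTION_HANDLER:
                return Guess.RARE;
            case ENDS_WITH_RETURN:
                return Guess.NON_RARE;
            default:
                return Guess.UNKNOWN;     // left to the backward propagation
        }
    }

    // Assumed rule: a profiled count, when present, overrides the heuristic.
    static Guess initialGuess(BlockKind kind, Long executionCount) {
        if (executionCount == null) {
            return initialGuess(kind);
        }
        return executionCount == 0 ? Guess.RARE : Guess.NON_RARE;
    }

    public static void main(String[] args) {
        System.out.println(initialGuess(BlockKind.EXCEPTION_HANDLER)); // RARE
    }
}
```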

22
Intra-Procedural Region Selection: Non-Rare to
Rare Transitions
  • Transitions from non-rare to rare BBs are
    identified and marked as rare block entry points
  • For each rare block entry point, liveness
    analysis is performed to find the set of live
    variables
  • A new region exit BB (RE-BB) is generated for
    each entry point, and the original transition is
    replaced by redirecting control flow to the newly
    generated block
  • The RE-BB contains a single instruction
    (recompile) that takes all variables live at the
    entry point as its operands, so that all the
    information needed for OSR is available for
    recompilation
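A sketch of the liveness step and the resulting RE-BB, assuming a standard backward liveness analysis where use[b] holds the variables read in b before any write and def[b] the variables written in b. The class and method names are hypothetical, and the RE-BB is represented only by the text of its single recompile instruction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical liveness analysis feeding region-exit block generation.
class RegionExit {
    // in[b] = use[b] ∪ (out[b] − def[b]),  out[b] = ∪ in[s] over successors s.
    static List<Set<String>> liveIn(int[][] succ, List<Set<String>> use, List<Set<String>> def) {
        int n = succ.length;
        List<Set<String>> in = new ArrayList<>();
        for (int b = 0; b < n; b++) {
            in.add(new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {                      // iterate to a fixed point
            changed = false;
            for (int b = n - 1; b >= 0; b--) {
                Set<String> out = new TreeSet<>();
                for (int s : succ[b]) {
                    out.addAll(in.get(s));
                }
                Set<String> newIn = new TreeSet<>(out);
                newIn.removeAll(def.get(b));
                newIn.addAll(use.get(b));
                if (!newIn.equals(in.get(b))) {
                    in.set(b, newIn);
                    changed = true;
                }
            }
        }
        return in;
    }

    // The RE-BB's single instruction: recompile, with every live variable as operand.
    static String regionExitBlock(Set<String> liveAtEntry) {
        return "recompile(" + String.join(", ", liveAtEntry) + ")";
    }

    public static void main(String[] args) {
        // Block 0 defines x and y, then branches to 1 (non-rare) or 2 (rare).
        int[][] succ = { {1, 2}, {}, {} };
        List<Set<String>> use = List.of(Set.of(), Set.of("x"), Set.of("x", "y"));
        List<Set<String>> def = List.of(Set.of("x", "y"), Set.of(), Set.of());
        System.out.println(regionExitBlock(liveIn(succ, use, def).get(2)));
    }
}
```

In this example the rare block 2 needs both x and y, so the RE-BB that replaces the edge 0 → 2 becomes recompile(x, y).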

23
Region-Based Compilation: Partial Inlining
24
Partial Inlining
  • Main idea:
  • Inline all small methods
  • The other methods are first processed by region
    selection and then re-assessed based on their
    reduced code size. If they are inlinable, only
    the non-rare portions of their code are inlined
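The decision rule can be sketched as below. Block sizes, rare flags, and both size limits are hypothetical inputs; the slides only state that small callees are always inlined and that the others are re-assessed on their size after region selection, with only the non-rare part inlined.

```java
// Hypothetical partial-inlining decision for one callee.
class PartialInliner {
    static final int SMALL_METHOD = 20;   // hypothetical "always inline" size
    static final int INLINE_LIMIT = 100;  // hypothetical limit after region selection

    enum Decision { INLINE_WHOLE, INLINE_NON_RARE_PART, DO_NOT_INLINE }

    // blockSizes[b]: size of callee block b; rare[b]: result of region selection.
    static Decision decide(int[] blockSizes, boolean[] rare) {
        int total = 0;
        int nonRare = 0;
        for (int b = 0; b < blockSizes.length; b++) {
            total += blockSizes[b];
            if (!rare[b]) {
                nonRare += blockSizes[b];
            }
        }
        if (total <= SMALL_METHOD) {
            return Decision.INLINE_WHOLE;            // small: inline everything
        }
        if (nonRare <= INLINE_LIMIT) {
            return Decision.INLINE_NON_RARE_PART;    // re-assessed on reduced size
        }
        return Decision.DO_NOT_INLINE;
    }

    public static void main(String[] args) {
        // A large callee whose bulk lies on a rare path.
        System.out.println(decide(new int[] {60, 200}, new boolean[] {false, true}));
    }
}
```

The example callee (260 units in total) would be rejected by a plain size check, but its non-rare part fits the limit, which is exactly the case partial inlining targets.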

25
Partial Inlining (2)
  • During inlining, dynamically dispatched call
    sites are also devirtualized, based on class
    hierarchy analysis and receiver type
    distribution profile information. This
    determines whether the generated backup paths
    can be removed
  • In addition, the live variable information in
    the RE-BBs from the previous step is updated

26
Partial Inlining (3)
  • Advantages of coupling inlining with region
    selection:
  • Since inlining is applied only to non-rare
    regions, no code is wasted on rare paths, which
    conserves the inlining budget
  • Inlining is performed after the methods have
    been processed by intra-method region selection,
    so less code has to be inlined (again conserving
    the inlining budget)

27
Other Optimizations
28
Optimizations
  • Partial Dead Code Elimination
  • Partial Escape Analysis

29
Partial Dead Code Elimination
  • Since a list of all live variables (both stack
    and heap) must be kept for each RE-BB, the
    lifetimes of some variables become considerably
    longer than in function-based compilation
  • Partial dead code elimination reduces the
    magnitude of the problem by pushing computations
    that are live only on region exit paths into the
    RE-BBs

30
Partial Dead Code Elimination (2)
  • Algorithm
  • Maintain two sets of live variables one from the
    RE-BBs and one from the non-rare paths
  • Using a code motion algorithm, move computations
    that use variables defined in the RE-BB set, but
    not into the other. The computations are copied
    both to the non-rare region and the RE-BB but
    dead code elimination eventually will remove the
    copy from the non-rare region
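A hand-written before/after illustration of the effect. The names are hypothetical, rareExit stands in for recompilation plus execution of the rare code, and the moved computation is assumed to be side-effect free, which is what makes the motion legal. The value scale is used only on the region exit path (x < 0), so after the transformation it is computed only there and no longer stays live across the non-rare path.

```java
// Hypothetical before/after illustration of partial dead code elimination.
class PdceDemo {
    // Side-effect-free computation whose result only the exit path needs.
    static int expensive(int x) {
        return x * 31 + 7;
    }

    // Stand-in for the RE-BB: recompile, then run the rare code.
    static int rareExit(int x, int scale) {
        return scale - x;
    }

    // Before: scale is computed on every execution and is live up to the exit.
    static int before(int x) {
        int scale = expensive(x);
        if (x < 0) {
            return rareExit(x, scale);    // region exit needs scale
        }
        return x + 1;                     // non-rare path never uses scale
    }

    // After: the computation has been sunk into the region exit path.
    static int after(int x) {
        if (x < 0) {
            int scale = expensive(x);
            return rareExit(x, scale);
        }
        return x + 1;
    }

    public static void main(String[] args) {
        for (int x = -2; x <= 2; x++) {
            System.out.println(before(x) == after(x)); // always true
        }
    }
}
```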

31
Partial Escape Analysis
  • Escape analysis determines whether an object may
    escape its method or thread. It can speed up
    programs considerably because it allows objects
    to be allocated on the stack rather than on the
    heap (heap allocation is much slower)
  • Objects usually escape through rare code or
    through the backup paths of devirtualized
    methods, so conventional escape analysis has
    limited applicability

32
Partial Escape Analysis (2)
  • The analysis is modified to consider only
    non-rare paths (an optimistic assumption)
  • Some regions may then conclude that a certain
    (parameter) object is non-escaping only because
    rare paths were optimistically not analyzed.
    Hence, when execution exits through a region
    boundary, all (calling) methods that use the
    summary information would need to be recompiled
  • To avoid this, we check whether the arguments
    are included in the list of live variables at
    any region exit point within the method, and
    suppress generating the summary information if
    this is the case
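The suppression check can be sketched as a filter over the method's parameters. The inputs are hypothetical: which parameters escape on the analyzed non-rare paths, and the live-variable lists of the method's region exit points. A parameter is reported as non-escaping in the summary only if it does not escape on non-rare paths and is not live at any region exit.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical summary-suppression check for partial escape analysis.
class EscapeSummary {
    static Set<String> nonEscapingParams(List<String> params,
                                         Set<String> escapeOnNonRarePaths,
                                         List<Set<String>> liveAtRegionExits) {
        Set<String> result = new TreeSet<>();
        for (String p : params) {
            if (escapeOnNonRarePaths.contains(p)) {
                continue;                    // escapes even on hot paths
            }
            boolean liveAtExit = false;
            for (Set<String> live : liveAtRegionExits) {
                liveAtExit |= live.contains(p);
            }
            if (liveAtExit) {
                continue;                    // could escape in rare code: suppress
            }
            result.add(p);                   // safe to publish as non-escaping
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(nonEscapingParams(
                List.of("a", "b", "c"), Set.of("b"), List.of(Set.of("c", "i"))));
    }
}
```

Callers may then stack-allocate only objects passed as a; a region exit in this method can never invalidate that decision, so no caller needs recompiling.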

33
Benchmarking
  • (Or: the end of the text-only slides)
  • Except for the last slide and the references, of
    course

34
Performance improvement over function-based
compilation (FBC) in several benchmarks (or
decrease, as in the case of mtrt with RBC-noopt)
[Chart not preserved in this transcript]
  • RBC-noopt: no optimization
  • RBC-nopi: PEA and PDCE, without partial inlining
  • RBC-full: all optimizations used
  • RBC-offline: all optimizations used, combined
    with a priori profile information
35
Compilation Overhead Time Ratio
36
Code Size Ratio
37
Peak Memory Usage
38
Contributions of this paper
  • The design and implementation of a region-based
    compilation technique in a dynamic (Java)
    compiler
  • A performance analysis of the proposed RBC
    method, using a production-level JIT compiler as
    the testbed platform (the paper does not name the
    compiler used)

39
The End
  • Any questions?

40
References
  • [1] T. Suganuma, T. Yasue, T. Nakatani, "A
    Region-Based Compilation Technique for a Java
    Just-In-Time Compiler"
  • [2] The HotSpot Server Compiler,
    http://java.sun.com/developer/technicalArticles/Networking/HotSpot/onstack.html
  • [3] The FLEX Compiler Mailing List,
    http://www.flex-compiler.lcs.mit.edu/Harpoon/hypermail/java-dev/0152.html
  • [4] Escape Analysis for Java,
    http://citeseer.ist.psu.edu/choi99escape.html
  • [5] Sun's Java HotSpot Server Compiler,
    http://java.sun.com/products/hotspot/

41
Basic Block Background
  • Each basic block has two sets associated with it
    (whose meaning depends on the type of analysis
    being performed) [3]
  • The In set is the set of properties derived from
    the other basic blocks that are linked to this
    one
  • The Out set is derived from the In set and the
    transfer function defined by the analysis,
    applied to the instructions held in this basic
    block
  • The transfer function defines which new
    properties are generated (the Gen set) and which
    ones are killed (the Kill set) by the block's
    instructions; how instructions map to Gen and
    Kill sets is implementation specific
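For a forward analysis that combines predecessors with union (reaching definitions, for example), these relations can be written as below. A backward analysis such as the liveness analysis used for RE-BBs uses the same form with the roles of In and Out exchanged and successors in place of predecessors; some analyses combine with intersection instead of union.

```latex
\begin{aligned}
\mathit{In}(B)  &= \bigcup_{P \in \mathit{pred}(B)} \mathit{Out}(P) \\
\mathit{Out}(B) &= \mathit{Gen}(B) \cup \bigl(\mathit{In}(B) \setminus \mathit{Kill}(B)\bigr)
\end{aligned}
```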