HotSpot™: A Huge Step Beyond JITs

1
HotSpot™: A Huge Step Beyond JITs
  • Zhanyong Wan
  • May 1st, 2000

2
Sources of Information
  • From Sun's web site
    • HotSpot white paper
      • http://java.sun.com/products/hotspot/whitepaper.html
    • Various articles on Sun's web site
      • http://java.sun.com/products/hotspot/
  • From other web sites
    • Java on Steroids: Sun's High-Performance Java Implementation, U. Hölzle et al. (slides from Hot Chips IX, August 1997)
      • http://www.cs.ucsb.edu/oocsb/papers/HotChips.pdf
    • The HotSpot Virtual Machine, Bill Venners
      • http://www.artima.com/designtechniques/hotspot.html
    • HotSpot: A new breed of virtual machine, Eric Armstrong
      • http://www.javaworld.com/jw-03-1998/f_jw-03-hotspot.html

3
Overview
  • Why Java is different
  • Why JIT is not good enough
  • What HotSpot does
  • The HotSpot architecture
  • Memory model
  • Thread model
  • Adaptive optimization
  • Conclusions

4
History
  • 1st generation JVM
    • Purely interpreting
    • 30-50 times slower than C
  • 2nd generation JVM
    • JIT compilers
    • 3-10 times slower than C
  • Static compilers
    • Better performance than JITs

5
The Future?
  • HotSpot
    • Dynamic, fully optimizing compiler
    • Close-to-C performance
    • May even exceed the speed of C in the future

6
Questions of Interest
  • How is it possible that HotSpot runs programs faster than the native code generated by a static optimizing Java compiler?
  • How does HotSpot score? (The collection of technologies used by HotSpot.)
  • Where did they get the ideas?
  • Which of these technologies also apply in other systems (e.g. JIT, static source code/bytecode compiler, C)?
  • Can Java be made to surpass the performance of C, or is this just hype?

7
Why Java Is Different (from C)
  • Granularity of factoring
    • Smaller classes
    • Smaller methods
    • More frequent calls
    • Standard compiler analysis fails
  • Dynamic dispatch
    • Slower calls for virtual functions
    • Much more frequent than in C
  • Sophisticated run-time system
    • Allocation, garbage collection
    • Threads, synchronization
  • Dynamically changing program
    • Classes loaded/discarded on the fly

8
Why Java Is Different (cont'd)
  • Distributed in a portable form
    • A compiler can generate optimal machine code for a particular processor version
      • e.g. Pentium vs. Pentium II
  • Welcomes dynamic compilation (developed over the last decade)!

9
Find the Java Bottleneck
  • Time used in a typical Java program executed w/ the JDK interpreter
    • Allocation/GC: 1/6
    • Synchronization: 1/6
    • Bytecode: 2/3
    • Native methods: negligible
  • Performance-critical code: the hot spots
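A rough Amdahl's-law estimate shows why speeding up bytecode execution alone is not enough. Assuming the breakdown above and a hypothetical compiler that makes bytecode execution $k$ times faster while leaving allocation/GC and synchronization untouched, the overall speedup $S$ is:

$$
S(k) = \frac{1}{\frac{1}{6} + \frac{1}{6} + \frac{2/3}{k}} = \frac{3k}{k+2}, \qquad S(10) = 2.5, \qquad \lim_{k \to \infty} S(k) = 3
$$

So even infinitely fast compiled code would make such a program at most 3 times faster; the remaining third of the time (GC and synchronization) has to be attacked as well.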

10
Why JIT Is Not Good Enough
  • Compiles on a method-by-method basis when a method is first invoked
  • Compilation consumes user time
    • Startup latency
  • Dilemma: either good code or a fast compiler
    • Gains of better optimization may not justify extra compile time
    • More concerned w/ generating code quickly than w/ generating the quickest code
  • Root of the problem: compilation is too eager
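To make the dilemma concrete, here is a back-of-the-envelope cost model in Java. It assumes, purely for illustration, that compiling a method costs a fixed amount of time and that each later call saves a fixed amount; `CompileCostModel`, its methods, and all the numbers are hypothetical, not measurements from any JVM.

```java
// Hypothetical cost model for the JIT dilemma; times are abstract units,
// not measurements from any real VM.
class CompileCostModel {
    /** Compiling pays off only if calls * (interpCost - nativeCost) > compileCost. */
    static boolean worthCompiling(long calls, double interpCost, double nativeCost, double compileCost) {
        return calls * (interpCost - nativeCost) > compileCost;
    }

    /** Smallest number of calls for which compiling is a net win. */
    static long breakEvenCalls(double interpCost, double nativeCost, double compileCost) {
        double savingPerCall = interpCost - nativeCost;
        if (savingPerCall <= 0) return Long.MAX_VALUE; // compiled code is no faster
        return (long) Math.floor(compileCost / savingPerCall) + 1;
    }

    public static void main(String[] args) {
        // Fast compiler with mediocre code vs. slow compiler with fast code (made-up numbers).
        System.out.println(breakEvenCalls(10.0, 3.0, 700.0));  // 101
        System.out.println(breakEvenCalls(10.0, 1.0, 9000.0)); // 1001
    }
}
```

Under these made-up numbers, a method called only a few dozen times loses time under either compiler, and the slow, thorough compiler breaks even only after about a thousand calls. Deferring compilation until a method is known to be hot is what lets a VM afford the thorough one.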

11
The Baaad Way to Optimize
  • People try to help, following optimization lore:
    • Make methods final or static
    • Large classes/methods
    • Avoid interfaces (interface method invocation much slower than regular dynamic method dispatch)
    • Avoid creating lots of short-lived objects
    • Avoid synchronization (very expensive)
  • Against good OO design!
  • "Premature optimization is the root of all evil." (Donald Knuth)

12
The HotSpot Way to Optimize
  • Optimize only when you know you have a problem
  • A program starts off being interpreted
    • A profiler collects run-time info in the background
  • After a while, a set of hot spots is identified
  • A thread is launched to compile the methods in the hot spots
    • Execution of the program is not blocked
    • Take your time! Fully optimizing
    • Takes advantage of late compilation: run-time info is used
  • Once a method is compiled, it doesn't need to be interpreted
  • Native code can be discarded when the hot spots change
    • Keeps the footprint small
    • Bytecode is always kept around
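The steps above can be sketched as a counter-based profiler that hands hot methods to a separate compiler thread. This is a minimal illustration under stated assumptions, not HotSpot's actual mechanism: methods are identified by name, "hot" means a hypothetical fixed invocation threshold, and `compile` merely records that a method is now native. The class name `HotSpotProfiler` is invented.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: counter-based hot-spot detection with a background compiler thread.
class HotSpotProfiler {
    private final int threshold; // hypothetical: calls before a method counts as "hot"
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();
    private final Set<String> queued = ConcurrentHashMap.newKeySet();
    private final Set<String> compiled = ConcurrentHashMap.newKeySet();
    private final ExecutorService compilerThread = Executors.newSingleThreadExecutor();

    HotSpotProfiler(int threshold) { this.threshold = threshold; }

    /** Called by the (simulated) interpreter on every invocation of a method. */
    void onInvoke(String method) {
        if (compiled.contains(method)) return; // runs as native code: no more profiling
        int count = counters.computeIfAbsent(method, m -> new AtomicInteger()).incrementAndGet();
        // Set.add returns true only the first time, so each method is queued at most once.
        if (count >= threshold && queued.add(method)) {
            compilerThread.submit(() -> compile(method)); // the caller is not blocked
        }
    }

    private void compile(String method) {
        // A real VM would run a slow, fully optimizing compiler here, using the
        // run-time profile gathered while the method was being interpreted.
        compiled.add(method);
    }

    boolean isCompiled(String method) { return compiled.contains(method); }

    /** Lets queued compilations finish, then stops the compiler thread. */
    void shutdown() {
        compilerThread.shutdown();
        try {
            compilerThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        HotSpotProfiler profiler = new HotSpotProfiler(1000);
        for (int i = 0; i < 5000; i++) profiler.onInvoke("Vector.add"); // hot loop
        profiler.onInvoke("Main.init");                                  // called once
        profiler.shutdown();
        System.out.println(profiler.isCompiled("Vector.add")); // true
        System.out.println(profiler.isCompiled("Main.init"));  // false
    }
}
```

Because compilation runs on its own thread, the loop that triggers it keeps executing in the (simulated) interpreter while the compiler works, which is the "execution is not blocked" point above.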

13
The HotSpot Way (cont'd)
  • Tackles each of the bottlenecks
    • Adaptive optimization
    • Fast, accurate garbage collection
    • Fast thread synchronization
  • Performance
    • 2-3 times faster than JITs
    • Comparable to C
    • Most importantly, eliminates the performance excuse for poor designs/code

14
The HotSpot Architecture
  • Memory model
  • Thread model
  • Adaptive compiler

15
The HotSpot Memory Model
  • Object references
    • Java 2 SDK: as indirect handles
      • Relocating objects made easy
      • A significant performance bottleneck
    • HotSpot: as direct pointers
      • A performance boost
      • GC must adjust all references to an object when it is relocated
  • Object headers
    • Java 2 SDK: 3 words
    • HotSpot: 2 words
      • 2 bits for GC mark (reference count removed?)
      • An 8% savings in heap size
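The handle-versus-pointer trade-off can be seen in a toy model where a heap is an array and an address is an index into it. Both classes are hypothetical: `HandleHeap` mimics indirect handles (relocation touches one table entry, every access pays an extra lookup), while `DirectHeap` mimics direct pointers (one lookup per access, but moving an object means patching every reference). A real collector finds those references by tracing; here they are simply kept in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a "heap" is an array and an "address" is an index into it.
// These classes are hypothetical illustrations, not a real JVM memory layout.

/** Handle style: clients hold handles; a handle table maps handle -> address. */
class HandleHeap {
    final Object[] heap = new Object[16];
    final int[] handleTable = new int[16];
    private int nextHandle = 0;

    int allocate(int addr, Object value) {
        heap[addr] = value;
        handleTable[nextHandle] = addr;
        return nextHandle++;
    }

    // Every access costs an extra indirection through the table.
    Object deref(int handle) { return heap[handleTable[handle]]; }

    // Relocation is easy: only the single table entry changes.
    void move(int handle, int newAddr) {
        int oldAddr = handleTable[handle];
        heap[newAddr] = heap[oldAddr];
        heap[oldAddr] = null;
        handleTable[handle] = newAddr;
    }
}

/** Direct style: each reference holds the address itself (boxed in an int[1] here). */
class DirectHeap {
    final Object[] heap = new Object[16];
    private final List<int[]> references = new ArrayList<>();

    int[] newReference(int addr) {
        int[] ref = {addr};
        references.add(ref);
        return ref;
    }

    // One load, no indirection.
    Object deref(int[] ref) { return heap[ref[0]]; }

    // Relocation is harder: every reference to the object must be patched.
    int move(int oldAddr, int newAddr) {
        heap[newAddr] = heap[oldAddr];
        heap[oldAddr] = null;
        int patched = 0;
        for (int[] ref : references) {
            if (ref[0] == oldAddr) { ref[0] = newAddr; patched++; }
        }
        return patched;
    }
}

class HeapDemo {
    public static void main(String[] args) {
        HandleHeap hh = new HandleHeap();
        int h = hh.allocate(0, "obj");
        hh.move(h, 5);                     // one table update
        System.out.println(hh.deref(h));   // obj

        DirectHeap dh = new DirectHeap();
        dh.heap[0] = "obj";
        int[] r1 = dh.newReference(0), r2 = dh.newReference(0);
        System.out.println(dh.move(0, 5)); // 2 (both references patched)
        System.out.println(dh.deref(r1));  // obj
    }
}
```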

16
Garbage Collection Background
  • GC traditionally considered inefficient
    • Takes 1/6 of the time in an interpreting JVM
    • Even worse in a JIT VM
  • Modern GC technology
    • Performs substantially better than explicit freeing
    • How can this be true?
      • Unnecessary copies avoided
      • Memory segmentation, spatial locality

17
The HotSpot Garbage Collector
  • A high-level GC framework
    • New collection algorithms can be plugged in
    • Currently has 3 cooperating GC algorithms
  • Major features
    • Fast allocation and reclamation
    • Fully accurate: guarantees full memory reclamation
    • Completely eliminates memory fragmentation
    • Incremental, no perceivable pauses (usually < 10 ms)
    • Small memory overhead
      • 2-bit GC mark per object
      • 2-word object header (instead of 3 words in Java 2 SDK)

18
The HotSpot GC Accuracy
  • A partially accurate (conservative) collector must
    • Either avoid relocating objects
    • Or use handles to refer indirectly to objects (slow)
  • The HotSpot collector
    • Fully accurate
    • All inaccessible objects can be reclaimed
    • All objects can be relocated
      • Eliminates memory fragmentation
      • Increases memory locality

19
The HotSpot GC: the Structure
  • Three cooperating collectors
    • A generational copying collector
      • For short-lived objects
    • A mark-compact old-object collector
      • For longer-lived objects when the live object set is small
    • An incremental pauseless collector
      • For longer-lived objects when the live object set is big

20
Generational Copying Collector
  • Observation: the vast majority (often > 95%) of objects are very short-lived
  • The way it works
    • A memory area is reserved as an object nursery
    • Allocation is just updating a pointer and checking for overflow: extremely fast
    • By the time the nursery overflows, most objects in it are dead; the collector just moves the few survivors to the old-object memory area
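A toy version of the nursery described above, assuming simulated memory and objects that are just (id, size) pairs. In a real VM the survivors are found by tracing from the roots; here the caller passes them in. The names `Nursery` and `Obj` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy nursery: memory is simulated, objects are just (id, size) pairs.
class Nursery {
    static final class Obj {
        final int id;
        final int size;
        Obj(int id, int size) { this.id = id; this.size = size; }
    }

    private final int capacity; // nursery size in simulated bytes
    private int top = 0;        // bump pointer: offset of the next free byte
    final List<Obj> oldGeneration = new ArrayList<>();

    Nursery(int capacity) { this.capacity = capacity; }

    /** Fast path: check for overflow, then bump the pointer. Returns the address, or -1 on overflow. */
    int allocate(Obj o) {
        if (top + o.size > capacity) return -1; // overflow: time for a minor collection
        int addr = top;
        top += o.size;
        return addr;
    }

    /**
     * Minor collection: copy only the survivors to the old generation. Dead objects
     * are never touched; the whole nursery is reclaimed by resetting the bump pointer.
     */
    void collect(List<Obj> survivors) {
        oldGeneration.addAll(survivors);
        top = 0;
    }

    int used() { return top; }

    public static void main(String[] args) {
        Nursery n = new Nursery(100);
        Obj a = new Obj(1, 40), b = new Obj(2, 40), c = new Obj(3, 40);
        System.out.println(n.allocate(a)); // 0
        System.out.println(n.allocate(b)); // 40
        System.out.println(n.allocate(c)); // -1 (80 + 40 > 100)
        n.collect(List.of(b));             // only b is still live
        System.out.println(n.allocate(c)); // 0
    }
}
```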

21
Mark-Compact Collector
  • Rare case
    • Triggered by low-memory conditions or programmatic requests
  • Time proportional to the size of the set of live objects
    • Calls for an incremental collector when the size is large

22
Incremental Pauseless Collector
  • An alternative to the mark-compact collector
    • Relatively constant pause times even w/ extremely large data sets
    • Suitable for server applications and soft real-time applications (games, animations)
  • The way it works
    • The train algorithm
    • Breaks up GC pauses into tiny pauses
    • Not a hard real-time algorithm: no guaranteed upper limit on pause times
  • Side benefit: better memory locality
    • Tends to relocate tightly coupled objects together
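The train algorithm itself is too involved for a short example, so the sketch below shows only the general idea it shares with other incremental collectors: bound the work done in each pause and resume where the previous pause stopped. It is plain incremental marking, not the train algorithm, and it assumes the program does not change any references between steps (a real incremental collector needs a write barrier for that). `IncrementalMarker` and `Node` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of incremental marking (NOT the train algorithm): one long
// marking pause is split into many short steps with a bounded amount of work each.
// Assumes no references change between steps; real collectors need a write barrier.
class IncrementalMarker {
    static final class Node {
        final List<Node> refs = new ArrayList<>();
    }

    private final Deque<Node> gray = new ArrayDeque<>(); // reached but not yet scanned
    private final Set<Node> marked = new HashSet<>();    // identity-based (no equals override)

    IncrementalMarker(List<Node> roots) {
        for (Node r : roots) if (marked.add(r)) gray.push(r);
    }

    /** One short pause: scan at most `budget` objects, then return to the program. */
    boolean step(int budget) {
        for (int i = 0; i < budget && !gray.isEmpty(); i++) {
            Node n = gray.pop();
            for (Node child : n.refs) {
                if (marked.add(child)) gray.push(child);
            }
        }
        return gray.isEmpty(); // true once marking is complete
    }

    boolean isMarked(Node n) { return marked.contains(n); }

    public static void main(String[] args) {
        List<Node> chain = new ArrayList<>();
        for (int i = 0; i < 10; i++) chain.add(new Node());
        for (int i = 0; i < 9; i++) chain.get(i).refs.add(chain.get(i + 1));
        IncrementalMarker marker = new IncrementalMarker(List.of(chain.get(0)));
        int pauses = 1;
        while (!marker.step(3)) pauses++; // at most 3 objects scanned per pause
        System.out.println(pauses);       // 4
    }
}
```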

23
The HotSpot Thread Model
  • Native thread support
    • Currently supports Solaris and 32-bit Windows
    • Preemption
    • Multiprocessing
  • Per-thread activation stack is shared w/ native methods
    • Fast calls between C and Java

24
Thread Synchronization
  • Takes 1/6 of the time in an interpreting JVM
    • (I think) the proportion can be even higher for a JIT
  • HotSpot's thread synchronization
    • Ultra-fast (a breakthrough)
    • Constant time for all uncontended (no rival) synchronization
    • Fully scalable to multiprocessors
    • Makes fine-grained synchronization practical, encouraging good OO design
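Why uncontended synchronization can take constant time is easiest to see with a lock whose fast path is a single atomic compare-and-swap. The `ThinLock` class below is a hypothetical simplification, not HotSpot's monitor implementation: unlike Java's `synchronized`, it is not reentrant, and under contention it simply spins and yields.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical simplified lock: NOT HotSpot's monitor implementation.
// Not reentrant; the contended path just spins and yields.
class ThinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    void lock() {
        Thread me = Thread.currentThread();
        // Fast path: one CAS, constant time, when no other thread holds the lock.
        if (owner.compareAndSet(null, me)) return;
        // Slow path (contention): retry until the owner releases the lock.
        while (!owner.compareAndSet(null, me)) {
            Thread.yield();
        }
    }

    void unlock() {
        // Releasing is also a single CAS; only the owner may release.
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("not the owner");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThinLock lock = new ThinLock();
        int[] counter = {0};
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.lock();
                try { counter[0]++; } finally { lock.unlock(); }
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter[0]); // 200000
    }
}
```

The CAS operations on the `AtomicReference` have volatile memory semantics, so each unlock happens-before the next successful lock, which is what makes the counter updates visible across threads.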

25
Adaptive Inlining
  • Method invocations reduce the effectiveness of optimizers
    • Standard optimizers don't perform well across method boundaries (they need bigger blocks of code)
  • Inlining is the solution
  • Inlining has problems
    • Increased memory footprint
    • Inlining is harder w/ OO languages because of dynamic dispatching (worse in Java than in C)
  • HotSpot uses run-time information to
    • Inline only the critical methods
    • Limit the set of methods that might be invoked at a certain point
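The kind of run-time information meant here can be sketched as a per-call-site profile of receiver classes. Assuming a call site has only ever seen one class, compiled code can inline that class's method behind a cheap class check and fall back to a normal virtual call otherwise. `Shape`, `Circle`, `Square`, `CallSiteProfile`, and `areaInlined` are hypothetical names, and a real compiler emits machine code, not Java.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of profile-guided inlining of a virtual call.
interface Shape { double area(); }

final class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

final class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

/** Counts the receiver classes seen at one call site while the code is interpreted. */
class CallSiteProfile {
    private final Map<Class<?>, Integer> receiverCounts = new HashMap<>();

    void record(Object receiver) { receiverCounts.merge(receiver.getClass(), 1, Integer::sum); }

    /** Exactly one receiver class seen so far: a good candidate for inlining. */
    boolean isMonomorphic() { return receiverCounts.size() == 1; }

    /**
     * Hand-written picture of compiled code for `shape.area()` once the profile has
     * only ever seen Circle: a cheap class check guards the inlined body.
     */
    static double areaInlined(Shape shape) {
        if (shape.getClass() == Circle.class) {
            Circle c = (Circle) shape;
            return Math.PI * c.r * c.r; // inlined body of Circle.area(), no dispatch
        }
        return shape.area();            // guard failed: ordinary virtual call
    }

    public static void main(String[] args) {
        CallSiteProfile site = new CallSiteProfile();
        for (int i = 0; i < 1000; i++) site.record(new Circle(1.0));
        System.out.println(site.isMonomorphic());         // true
        System.out.println(areaInlined(new Square(2.0))); // 4.0 (falls back)
        site.record(new Square(2.0));
        System.out.println(site.isMonomorphic());         // false
    }
}
```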

26
Dynamic Deoptimization
  • Simple inlining may violate Java semantics
    • A program can change its patterns of method invocation
    • A Java program can change on the fly via dynamic class loading/discarding
    • Optimizations may become invalid
  • Must be able to deoptimize dynamically!
    • HotSpot can deoptimize (revert to bytecode?) a hot spot even while the code for it is executing
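A small simulation of the idea, under heavy simplifying assumptions: "compiled" code inlines `Dog.sound()` with no type check because `Dog` is the only `Animal` implementation loaded so far, and loading a second implementation invalidates that code so execution falls back to an ordinary virtual call, standing in for the interpreter. It does not model deoptimizing a method that is currently running, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of dynamic deoptimization triggered by class loading.
class DeoptDemo {
    interface Animal { String sound(); }
    static final class Dog implements Animal { public String sound() { return "woof"; } }
    static final class Cat implements Animal { public String sound() { return "meow"; } }

    private final List<Class<? extends Animal>> loaded = new ArrayList<>();
    private boolean compiled = false; // is the optimized version of speak() installed?

    /** Simulated class loading; a class must be loaded before any instance is created. */
    void loadClass(Class<? extends Animal> cls) {
        loaded.add(cls);
        if (compiled && loaded.size() > 1) {
            compiled = false; // deoptimize: "Dog is the only Animal" no longer holds
        }
    }

    /** "Compile" only while Dog is the one and only Animal implementation loaded. */
    void compile() {
        compiled = loaded.size() == 1 && loaded.get(0) == Dog.class;
    }

    boolean isCompiled() { return compiled; }

    String speak(Animal a) {
        if (compiled) return "woof"; // Dog.sound() inlined with no type check at all
        return a.sound();            // general path, standing in for the interpreter
    }

    public static void main(String[] args) {
        DeoptDemo vm = new DeoptDemo();
        vm.loadClass(Dog.class);
        vm.compile();
        System.out.println(vm.speak(new Dog()) + " " + vm.isCompiled()); // woof true
        vm.loadClass(Cat.class);                                         // triggers deopt
        System.out.println(vm.speak(new Cat()) + " " + vm.isCompiled()); // meow false
    }
}
```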

27
Fully Optimizing Compiler
  • Performs all the classic optimizations
    • Dead code elimination
    • Loop-invariant hoisting
    • Common subexpression elimination
    • Constant propagation
    • And more
  • Java-specific optimizations
    • Null-check elimination
    • Range-check elimination
  • Global graph-coloring register allocator
  • Highly portable
    • Relies on a small machine description file
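The effect of several of these optimizations can be shown by hand on a small loop. This is only an illustration: a compiler applies these transformations to its internal representation, and the Java methods `before` and `after` (hypothetical names) simply compute the same result with and without them.

```java
// Hand-written before/after illustration of classic optimizations (hypothetical example).
class OptimizationDemo {
    // Before: `scale * offset` is recomputed every iteration (loop-invariant, and a
    // common subexpression used twice), `debug` is a constant false, and every
    // a[i] access is range-checked.
    static long before(int[] a, int scale, int offset) {
        final boolean debug = false;
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * (scale * offset) + (scale * offset);
            if (debug) System.out.println(i);
        }
        return sum;
    }

    // After: the product is computed once before the loop (loop-invariant hoisting plus
    // common subexpression elimination) and the dead `debug` branch is gone (constant
    // propagation plus dead code elimination). Since 0 <= i < a.length holds on every
    // iteration, a compiler can also drop the range checks; Java source cannot show that.
    static long after(int[] a, int scale, int offset) {
        int k = scale * offset;
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * k + k;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(before(a, 3, 5) + " " + after(a, 3, 5)); // 210 210
    }
}
```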

28
Transparent Debugging and Profiling Semantics
  • Native code generation and optimization are fully transparent to the programmer
    • Uses two stacks
      • One real, one simulating
      • Overhead of two stacks?
  • Pure bytecode semantics: easy debugging and profiling
  • Question: what's the point of transparent profiling semantics?

29
Performance Evaluation
  • Micro-benchmarks: not the way
    • No or few method calls/synchronizations
    • Small live data set
    • No correlation w/ real programs
    • Give unrealistic results for HotSpot
  • SPEC JVM98 benchmark
    • The only industry-standard benchmark for Java
    • Predictive of performance across a number of real applications

30
Where are the ideas from?
  • Mostly from the last decade's academic work
    • Dynamic compilation
    • Modern GC
  • HotSpot puts them together
  • Academic research is relevant!

31
(My) Conclusions
  • HotSpot is great
    • Many new technologies previously only seen in academia
  • Java performance may come close to or exceed the current implementation of C
  • However, Sun's argument that Java can be faster than C is not convincing yet
    • C has better control over machine resources
  • Many technologies used in HotSpot can be exploited for C as well, especially
    • Fast synchronization
    • Dynamic compilation
    • Maybe GC (for some dialects of C)
  • Whether Java can exceed C remains to be tested