Title: HotSpot™: A Huge Step Beyond JITs
1 HotSpot™: A Huge Step Beyond JITs
- Zhanyong Wan
- May 1st, 2000
2 Sources of Information
- From Sun's web-site
- HotSpot white paper
- http://java.sun.com/products/hotspot/whitepaper.html
- Various articles on Sun's web-site
- http://java.sun.com/products/hotspot/
- From other web-sites
- Java on Steroids: Sun's High-Performance Java Implementation, U. Hölzle et al. (slides from HotChips IX, August 1997)
- http://www.cs.ucsb.edu/oocsb/papers/HotChips.pdf
- The HotSpot Virtual Machine, Bill Venners
- http://www.artima.com/designtechniques/hotspot.html
- HotSpot: A new breed of virtual machine, Eric Armstrong
- http://www.javaworld.com/jw-03-1998/f_jw-03-hotspot.html
3 Overview
- Why Java is different
- Why JIT is not good enough
- What HotSpot does
- The HotSpot architecture
- Memory model
- Thread model
- Adaptive optimization
- Conclusions
4 History
- 1st generation JVM
- Purely interpreting
- 30-50 times slower than C
- 2nd generation JVM
- JIT compilers
- 3-10 times slower than C
- Static compilers
- Better performance than JITs
5 The Future?
- HotSpot
- Dynamic, fully optimizing compiler
- Close-to-C performance
- May even exceed the speed of C in the future
6 Questions of Interest
- How is it possible that HotSpot runs programs faster than the native code generated by a static optimizing Java compiler?
- How does HotSpot score? (The collection of technologies used by HotSpot.)
- Where did they get the ideas?
- Which of these technologies also apply in other systems (e.g. JIT, static source code/bytecode compiler, C)?
- Can Java be made to surpass the performance of C, or is this hype?
7 Why Java Is Different (to C)
- Granularity of factoring
- Smaller classes
- Smaller methods
- More frequent calls
- Standard compiler analysis fails
- Dynamic dispatch
- Slower calls for virtual functions
- Much more frequent than in C
- Sophisticated run-time system
- Allocation, garbage collection
- Threads, synchronization
- Dynamically changing program
- Classes loaded/discarded on the fly
8 Why Java Is Different (cont'd)
- Distributed in a portable form
- A compiler can generate optimal machine code for a particular processor version
- e.g. Pentium vs. Pentium II
- Welcomes dynamic compilation (developed in the last decade)!
9 Find the Java Bottleneck
- Time used in a typical Java program executed w/ JDK interpreter
- Allocation/GC: 1/6
- Synchronization: 1/6
- Byte code: 2/3
- Native methods: negligible
- Performance-critical code: the hot spots
10 Why JIT Is Not Good Enough
- Compiles on a method-by-method basis when a method is first invoked
- Compilation consumes user time
- Startup latency
- Dilemma: either good code or fast compiler
- Gains of better optimization may not justify extra compile time
- More concerned w/ generating code quickly than w/ generating the quickest code
- Root of problem: compilation is too eager
11 The Baaad Way to Optimize
- People try to help: the optimization lore
- Make methods final or static
- Large classes/methods
- Avoid interfaces (interface method invocation much slower than regular dynamic method dispatch)
- Avoid creating lots of short-lived objects
- Avoid synchronization (very expensive)
- Against good OO design!
- "Premature optimization is the root of all evil." (Donald Knuth)
12 The HotSpot Way to Optimize
- Optimize only when you know you have a problem
- A program starts off being interpreted
- A profiler collects run-time info in the background
- After a while, a set of hot spots is identified
- A thread is launched to compile the methods in the hot spots
- Execution of the program is not blocked
- Take your time! Fully optimizing
- Take advantage of the late compilation: run-time info used
- Once a method is compiled, it doesn't need to be interpreted
- Native code can be discarded when the hot spots change
- Keeping the footprint small
- Bytecode is always kept around
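The life cycle above (interpret, count, compile once hot, discard when the hot spots change) can be pictured with a toy model. This is only a sketch under stated assumptions: the class `HotSpotSketch`, the threshold `HOT_THRESHOLD`, and all method names are hypothetical, "compilation" is simulated by copying a lambda, and it happens synchronously here rather than in a background thread as the slides describe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Toy model: methods start out interpreted and are "compiled" once they turn hot. */
class HotSpotSketch {
    // Hypothetical threshold; a real VM uses richer profiles than a single counter.
    static final int HOT_THRESHOLD = 3;

    private final Map<String, IntUnaryOperator> bytecode = new HashMap<>();
    private final Map<String, IntUnaryOperator> compiledCode = new HashMap<>();
    private final Map<String, Integer> invocationCounts = new HashMap<>();

    void define(String name, IntUnaryOperator body) {
        bytecode.put(name, body);
    }

    boolean isCompiled(String name) {
        return compiledCode.containsKey(name);
    }

    int invoke(String name, int arg) {
        IntUnaryOperator fast = compiledCode.get(name);
        if (fast != null) {
            return fast.applyAsInt(arg);            // run the "native" version
        }
        int count = invocationCounts.merge(name, 1, Integer::sum);
        if (count >= HOT_THRESHOLD) {
            // Stand-in for handing the method to the optimizing compiler.
            compiledCode.put(name, bytecode.get(name));
        }
        return bytecode.get(name).applyAsInt(arg);  // "interpret" this call
    }

    /** The hot spots changed: drop the native code, keep the bytecode. */
    void discardCompiled(String name) {
        compiledCode.remove(name);
        invocationCounts.remove(name);
    }

    public static void main(String[] args) {
        HotSpotSketch vm = new HotSpotSketch();
        vm.define("square", x -> x * x);
        for (int i = 0; i < 5; i++) {
            vm.invoke("square", i);
        }
        System.out.println("square compiled: " + vm.isCompiled("square"));
    }
}
```

Because the bytecode map is never modified, discarding compiled code is always safe in this model: the next call simply falls back to the interpreted path.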
13 The HotSpot Way (cont'd)
- Tackles each of the bottlenecks
- Adaptive optimization
- Fast, accurate garbage collection
- Fast thread synchronization
- Performance
- 2-3 times faster than JITs
- Comparable to C
- Most importantly, eliminates the performance
excuse for poor designs/code
14 The HotSpot Architecture
- Memory model
- Thread model
- Adaptive compiler
15 The HotSpot Memory Model
- Object references
- Java 2 SDK: as indirect handles
- Relocating objects made easy
- A significant performance bottleneck
- HotSpot: as direct pointers
- A performance boost
- GC must adjust all references to an object when it is relocated
- Object headers
- Java 2 SDK: 3-word
- HotSpot: 2-word
- 2 bits for GC mark (reference count removed?)
- An 8% savings in heap size
16 Garbage Collection: Background
- GC traditionally considered inefficient
- Takes 1/6 of the time in an interpreting JVM
- Even worse in a JIT VM
- Modern GC technology
- Performs substantially better than explicit freeing
- How can this be true?
- Unnecessary copies avoided
- Memory segmentation, space locality
17 The HotSpot Garbage Collector
- A high-level GC framework
- New collection algorithms can be plugged in
- Currently has 3 cooperating GC algorithms
- Major features
- Fast allocation and reclamation
- Fully accurate: guarantees full memory reclamation
- Completely eliminates memory fragmentation
- Incremental, no perceivable pauses (usually < 10 ms)
- Small memory overhead
- 2-bit GC mark per object
- 2-word object header (instead of 3-word in Java 2 SDK)
18 The HotSpot GC: Accuracy
- A partially accurate (conservative) collector must
- Either avoid relocating objects
- Or use handles to refer indirectly to objects (slow)
- The HotSpot collector
- Fully accurate
- All inaccessible objects can be reclaimed
- All objects can be relocated
- Eliminates memory fragmentation
- Increases memory locality
19 The HotSpot GC: the Structure
- Three cooperating collectors
- A generational copying collector
- For short-lived objects
- A mark-compact old object collector
- For longer-lived objects when the live object set is small
- An incremental pauseless collector
- For longer-lived objects when the live object set is big
20 Generational Copying Collector
- Observation: the vast majority (often > 95%) of the objects are very short-lived
- The way it works
- A memory area is reserved as an object nursery
- Allocation is just updating a pointer and checking for overflow: extremely fast
- By the time the nursery overflows, most objects in it are dead; the collector just moves the few survivors to the old object memory area
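To make "allocation is just updating a pointer and checking for overflow" concrete, here is a minimal sketch of a bump-pointer nursery. The class `NurserySketch` and its methods are hypothetical names for illustration; a byte array stands in for the reserved memory area, and survivor copying is not modeled.

```java
/** Toy nursery: allocation is a pointer bump plus an overflow check. */
class NurserySketch {
    private final byte[] nursery;  // stands in for the reserved memory area
    private int top = 0;           // offset of the next free byte

    NurserySketch(int sizeInBytes) {
        nursery = new byte[sizeInBytes];
    }

    /** Returns the offset of the new object, or -1 if the nursery would overflow. */
    int allocate(int sizeInBytes) {
        if (sizeInBytes > nursery.length - top) {
            return -1;             // overflow: a real collector would run here
        }
        int address = top;
        top += sizeInBytes;        // the entire cost of a successful allocation
        return address;
    }

    /** Once the few survivors have been moved out, the whole nursery is free again. */
    void reset() {
        top = 0;
    }

    public static void main(String[] args) {
        NurserySketch n = new NurserySketch(64);
        System.out.println(n.allocate(16)); // prints 0
        System.out.println(n.allocate(16)); // prints 16
        System.out.println(n.allocate(40)); // prints -1 (only 32 bytes left)
    }
}
```

Reclamation is equally cheap in this picture: dead objects are never visited, and `reset` frees the whole area in one step.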
21 Mark-Compact Collector
- Rare case
- Triggered by low-memory conditions or programmatic requests
- Time proportional to the size of the set of live objects
- Calls for an incremental collector when the size is large
22 Incremental Pauseless Collector
- An alternative to the mark-compact collector
- Relatively constant pause time even w/ extremely large data set
- Suitable for server applications and soft real-time applications (games, animations)
- The way it works
- The train algorithm
- Breaks up GC pauses into tiny pauses
- Not a hard real-time algorithm: no guarantee for upper limit on pause times
- Side benefit: better memory locality
- Tends to relocate tightly-coupled objects together
23 The HotSpot Thread Model
- Native thread support
- Currently supports Solaris and 32-bit Windows
- Preemption
- Multiprocessing
- Per-thread activation stack is shared w/ native methods
- Fast calls between C and Java
24 Thread Synchronization
- Takes 1/6 of the time in an interpreting JVM
- (I think) the proportion can be even higher for a JIT
- HotSpot's thread synchronization
- Ultra-fast (a breakthrough)
- Constant time for all uncontended (no rival) synch
- Fully scalable to multiprocessor
- Makes fine-grain synch practical, encouraging good OO design
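The sources do not say how HotSpot achieves constant-time uncontended synchronization, so the following is only an illustration of why an uncontended lock can be cheap in principle: when nobody else holds the lock, acquiring it can be a single atomic compare-and-set. The class `ThinLockSketch` is hypothetical, non-reentrant, and has no slow path for the contended case.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy lock whose uncontended acquire is one compare-and-set. Not HotSpot's actual scheme. */
class ThinLockSketch {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    /** Succeeds in constant time when no thread holds the lock; never blocks. */
    boolean tryLockFast() {
        return owner.compareAndSet(null, Thread.currentThread());
    }

    /** Releases the lock; only the owning thread may do so. */
    void unlock() {
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("current thread is not the owner");
        }
    }

    boolean isLocked() {
        return owner.get() != null;
    }

    public static void main(String[] args) {
        ThinLockSketch lock = new ThinLockSketch();
        if (lock.tryLockFast()) {
            try {
                System.out.println("in critical section: " + lock.isLocked());
            } finally {
                lock.unlock();
            }
        }
    }
}
```

A real VM would need a fallback for contended locks (queueing and blocking); the point here is only that the common no-rival case need not involve the operating system at all.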
25 Adaptive Inlining
- Method invocations reduce the effectiveness of optimizers
- Standard optimizers don't perform well across method boundaries (need bigger block of code)
- Inlining is the solution
- Inlining has problems
- Increased memory footprint
- Inlining is harder w/ OO languages because of dynamic dispatching (worse in Java than in C)
- HotSpot uses run-time information to
- Inline only the critical methods
- Limit the set of methods that might be invoked at a certain point
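One way to picture "limit the set of methods that might be invoked at a certain point" is guarded inlining, written out by hand below. This is a sketch, not HotSpot's generated code: the types `Shape`, `Square`, and `Circle` are hypothetical, and the assumption is that a run-time profile has shown the call site to see mostly `Square` receivers.

```java
interface Shape {
    double area();
}

final class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

final class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

class InliningSketch {
    /** What the programmer wrote: one dynamically dispatched call per element. */
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    /** Hand-written picture of the same loop after guarded inlining of Square.area(). */
    static double totalAreaInlined(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            if (s.getClass() == Square.class) {  // cheap type guard from the profile
                Square q = (Square) s;
                sum += q.side * q.side;          // inlined body, no call
            } else {
                sum += s.area();                 // fallback: ordinary dispatch
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Square(3), new Circle(1) };
        System.out.println(totalArea(shapes) == totalAreaInlined(shapes)); // prints true
    }
}
```

The guard keeps the result identical for any receiver type, while the common case runs without a call and exposes a bigger block of code to the optimizer.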
26 Dynamic Deoptimization
- Simple inlining may violate the Java semantics
- A program can change the patterns of method invocation
- Java program can change on the fly via dynamic class loading/discarding
- Optimizations may become invalid
- Must be able to deoptimize dynamically!
- HotSpot can deoptimize (revert back to bytecode?) a hot spot even during the execution of the code for it.
27 Fully Optimizing Compiler
- Performs all the classic optimizations
- Dead code elimination
- Loop invariant hoisting
- Common sub-expression elimination
- Constant propagation
- And more
- Java-specific optimizations
- Null-check elimination
- Range-check elimination
- Global graph coloring register allocator
- Highly portable
- Relying on a small machine description file
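As a small illustration of two items from this list, the sketch below shows loop-invariant hoisting done by hand, with a comment on why the array range check becomes redundant. The class `OptimizationSketch` and its methods are hypothetical; an optimizing compiler would perform such rewrites on its internal representation, not on source code.

```java
class OptimizationSketch {
    /** As written: scale * scale is recomputed on every iteration. */
    static long sumScaledNaive(int[] data, int scale) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += (long) data[i] * (scale * scale);
        }
        return sum;
    }

    /** After loop-invariant hoisting, done by hand for illustration. */
    static long sumScaledHoisted(int[] data, int scale) {
        long sum = 0;
        int factor = scale * scale;  // invariant computed once, outside the loop
        int n = data.length;
        // The loop condition guarantees 0 <= i < n, so a compiler can prove that
        // data[i] is always in range and omit the per-access range check.
        for (int i = 0; i < n; i++) {
            sum += (long) data[i] * factor;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumScaledNaive(data, 3));   // prints 54
        System.out.println(sumScaledHoisted(data, 3)); // prints 54
    }
}
```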
28 Transparent Debugging and Profiling Semantics
- Native code generation and optimization fully transparent to the programmer
- Uses two stacks
- One real, one simulating
- Overhead of two stacks?
- Pure bytecode semantics: easy debugging and profiling
- Question: what's the point of a transparent profiling semantics?
29 Performance Evaluation
- Micro-benchmarks: not the way
- No or few method calls/synchronizations
- Small live data set
- No correlation w/ real programs
- Give unrealistic results for HotSpot
- SPEC JVM98 benchmark
- The only industry-standard benchmark for Java
- Predictive of the performance across a number of
real applications
30 Where are the ideas from?
- Mostly from the last decade's academic work
- Dynamic compilation
- Modern GC
- HotSpot puts them together
- Academic research is relevant!
31 (My) Conclusions
- HotSpot is great
- Many new technologies previously only seen in academia
- Java performance may come close to or exceed the current implementation of C
- However, Sun's argument that Java can be faster than C is not convincing yet
- C has better control on machine resources
- Many technologies used in HotSpot can be exploited for C as well. Especially:
- Fast synchronization
- Dynamic compilation
- Maybe GC (for some dialects of C)
- Whether Java can exceed C remains to be tested