Title: JIT-Compiler-Assisted Distributed Java Virtual Machine
1. JIT-Compiler-Assisted Distributed Java Virtual Machine
- Wenzhang Zhu, Cho-Li Wang, Weijian Fang and Francis C. M. Lau
- Department of Computer Science and Information Systems
- The University of Hong Kong
- Presented by Cho-Li Wang
2. Outline
- Distributed Java Virtual Machine
- Design Tradeoffs
- Related work
- JESSICA2 features
- Experimental results
- Conclusion and future work
- A raytracing demo
3. Distributed Java Virtual Machine (DJVM)
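Example: a multithreaded Java program (each worker is a Java thread)

    import java.util.Random;

    // Each worker thread sums the integers 0 .. n-1 and prints the result.
    class Worker extends Thread {
        private final long n;
        public Worker(long n) { this.n = n; }
        public void run() {
            long sum = 0;
            for (long i = 0; i < n; i++) sum += i;
            System.out.println("N = " + n + " Sum = " + sum);
        }
    }

    public class Test {
        static final int N = 100;
        public static void main(String[] args) throws InterruptedException {
            Worker[] w = new Worker[N];
            Random r = new Random();
            // Bounded, non-negative loop counts (a raw nextLong() may be huge or negative).
            for (int i = 0; i < N; i++) w[i] = new Worker(r.nextInt(1_000_000));
            for (int i = 0; i < N; i++) w[i].start();  // on a DJVM, threads may run on different nodes
            for (int i = 0; i < N; i++) w[i].join();   // wait for all workers to finish
        }
    }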
- A distributed Java Virtual Machine (DJVM) consists of a group of extended JVMs running in a distributed environment to support truly parallel execution of a multithreaded Java application.
- A DJVM provides all JVM services in compliance with the Java language specification, as if the application were running on a single machine: a Single System Image (SSI).
[Figure: a DJVM presenting a Single System Image over four JVMs; components shown: bytecode execution engine, heap, class, thread]
4. Design Tradeoffs of a DJVM
- How to manage the threads?
  - Distributed thread scheduling
  - Initial placement vs. thread migration
- How to store the data?
  - Distributed heap (object store)
  - Java memory model (memory consistency)
  - Can an off-the-shelf DSM be used as the heap?
- How to process the bytecode?
  - Execution engine: interpretation, Just-in-Time (JIT) compilation, or static compilation
5. Related work
- cJVM (IBM Haifa Research)
  - Threads: remote creation
  - Execution: interpreter mode
  - Heap: embedded OO-based DSM (proxy) with built-in object caching
- JAVA/DSM (Rice University)
  - Threads: manual distribution
  - Execution: interpreter mode
  - Heap: built on top of a page-based DSM
- JESSICA (HKU)
  - Threads: transparent migration
  - Execution: interpreter mode
  - Heap: built on top of a page-based DSM
- Jackal, Hyperion
  - Threads: remote creation
  - Execution: static compilation
  - Heap: linked to an object-based DSM
6. JESSICA2 (Java-Enabled Single-System-Image Computing Architecture)
[Figure: a multithreaded Java program runs on six JESSICA2 JVMs (one master, five workers) in JIT compiler mode, sharing a Global Object Space; threads move between JVMs by thread migration using portable Java frames]
7. JESSICA2 Main Features
- Transparent Java thread migration
  - Runtime capturing and restoring of thread execution context
  - No source code modification, no bytecode instrumentation (preprocessing), no new API introduced
  - Enables dynamic load balancing on clusters
- JIT compiler-based execution engine (JITEE)
  - Operates in Just-In-Time (JIT) compilation mode
  - Cluster-aware
- Global Object Space
  - A shared global heap spanning all cluster nodes
  - Provides location-transparent object access
  - Adaptive migrating-home protocol for memory consistency, plus various optimization schemes
- I/O redirection
8. JESSICA2 thread migration (in a JIT-enabled JVM)
- RTC: Raw Thread Context
- BTC: Bytecode-oriented Thread Context (thread id, frames, class names, method signatures, PC, operand stack pointer, local variables)
- Migration steps:
  - (1) Alert: the load monitor triggers a migration
  - (2) Source node: stack analysis and stack capturing; the RTC is transformed into the BTC directly inside the JIT compiler
  - (3) Destination node: frame parsing and execution restoration
[Figure: migration from a source node to a destination node, involving the load monitor, thread scheduler, migration manager, method area, thread frames, and PC]
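The BTC fields listed above map naturally onto a small, machine-independent data structure. The following is a minimal sketch, assuming hypothetical names (`BtcSketch`, `JavaFrame`, `BytecodeThreadContext`) and a made-up text encoding; the slides do not specify JESSICA2's actual BTC layout or wire format. `encode()` stands in for what the source node ships after stack capturing, and `decode()` for frame parsing on the destination node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of a Bytecode-oriented Thread Context (BTC).
// Class names, fields, and the text encoding are hypothetical, chosen for illustration.
public class BtcSketch {

    // One Java frame described at the bytecode level, so any node can rebuild it.
    static final class JavaFrame {
        final String className;       // e.g. "CPI"
        final String methodSignature; // e.g. "run()V"
        final int pc;                 // bytecode index at which execution resumes
        final int operandStackPtr;    // operand stack depth at the migration point
        final List<String> localVars; // typed local values, e.g. "int:2"

        JavaFrame(String className, String methodSignature, int pc,
                  int operandStackPtr, List<String> localVars) {
            this.className = className;
            this.methodSignature = methodSignature;
            this.pc = pc;
            this.operandStackPtr = operandStackPtr;
            this.localVars = localVars;
        }

        String encode() {
            return className + "|" + methodSignature + "|" + pc + "|"
                    + operandStackPtr + "|" + String.join(",", localVars);
        }

        static JavaFrame decode(String line) {
            String[] f = line.split("\\|", -1);
            List<String> locals = f[4].isEmpty()
                    ? new ArrayList<>() : new ArrayList<>(Arrays.asList(f[4].split(",")));
            return new JavaFrame(f[0], f[1], Integer.parseInt(f[2]),
                    Integer.parseInt(f[3]), locals);
        }
    }

    static final class BytecodeThreadContext {
        final long threadId;
        final List<JavaFrame> frames = new ArrayList<>(); // oldest frame first

        BytecodeThreadContext(long threadId) { this.threadId = threadId; }

        // Source node: serialize the captured context.
        String encode() {
            StringBuilder sb = new StringBuilder("thread=" + threadId);
            for (JavaFrame f : frames) sb.append('\n').append(f.encode());
            return sb.toString();
        }

        // Destination node: parse frames before execution is restored.
        static BytecodeThreadContext decode(String text) {
            String[] lines = text.split("\n");
            BytecodeThreadContext ctx = new BytecodeThreadContext(
                    Long.parseLong(lines[0].substring("thread=".length())));
            for (int i = 1; i < lines.length; i++) ctx.frames.add(JavaFrame.decode(lines[i]));
            return ctx;
        }
    }

    public static void main(String[] args) {
        BytecodeThreadContext ctx = new BytecodeThreadContext(7);
        ctx.frames.add(new JavaFrame("CPI", "run()V", 111, 0, List.of("int:2")));
        String wire = ctx.encode();
        BytecodeThreadContext copy = BytecodeThreadContext.decode(wire);
        System.out.println(copy.encode().equals(wire)); // true
    }
}
```

Keeping the BTC free of machine addresses and register names is what lets it be rebuilt on a node whose native stack layout differs from the source's.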
9. Thread Stack Transformation
- Stack capturing: Raw Thread Context (RTC) → Bytecode-oriented Thread Context (BTC)
- Stack restoration: BTC → RTC

Raw Thread Context (RTC), captured:
    esp: 0x00000000   esp+4: 0x082ca809   esp+8: 0x08225400   esp+12: 0x08266bc0   ...
    eax: 0x08623200   ebx: 0x08293100

Bytecode-oriented Thread Context (BTC):
    Frames:
    method CPI::run()V@111   local=13   stack=0
        var arg0: CPI, 33, 0x8225400
        local1: D, 33, 0x8266bc0@2
        local2: int, 2
        ...

Raw Thread Context (RTC), restored:
    esp: 0x00000000   esp+4: 0x082ca809   esp+8: 0x08225400   esp+12: 0x08266bc0
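Stack capturing turns raw machine words into typed, bytecode-level values. A minimal sketch follows, assuming the JIT compiler has recorded, for the current migration point, which esp offset holds which variable and of what type (the `SlotInfo` record and `toBtc` method are hypothetical). Only 32-bit `int`, `float`, and reference slots are handled; `long` and `double` values would span two words and are omitted. The raw words are the RTC values shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the RTC -> BTC step: raw 32-bit stack words are given
// bytecode-level types using a slot map that, by assumption, the JIT compiler
// recorded for the current migration point.
public class StackTransformSketch {

    // Hypothetical per-slot description: variable name, offset from esp, Java type.
    record SlotInfo(String name, int espOffset, String type) {}

    static List<String> toBtc(int[] rawWords, List<SlotInfo> slots) {
        List<String> vars = new ArrayList<>();
        for (SlotInfo s : slots) {
            int word = rawWords[s.espOffset() / 4]; // each RTC slot is one 32-bit word
            String value = switch (s.type()) {
                case "int" -> Integer.toString(word);
                case "float" -> Float.toString(Float.intBitsToFloat(word));
                // References are kept as addresses here; a real DJVM would
                // translate them into node-independent object identifiers.
                default -> String.format("0x%08x", word);
            };
            vars.add(s.name() + ": " + s.type() + ", " + value);
        }
        return vars;
    }

    public static void main(String[] args) {
        // RTC words at esp, esp+4, esp+8, esp+12 (values from the slide).
        int[] rtc = {0x00000000, 0x082ca809, 0x08225400, 0x08266bc0};
        List<SlotInfo> slots = List.of(
                new SlotInfo("arg0", 8, "ref"),
                new SlotInfo("local1", 12, "ref"));
        toBtc(rtc, slots).forEach(System.out::println);
    }
}
```

Running `main` prints `arg0: ref, 0x08225400` and `local1: ref, 0x08266bc0`, matching the addresses that appear in the BTC entries for `arg0` and `local1`.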
10. Details
[Figure: includes the bytecode verifier and the linking and constant resolution stages]
11. Example of native code instrumentation
12. Optimization on migration points: pseudo-inlining
- Purpose: eliminate the cost of unnecessary inserted migration points (M-points)
- General idea: delete M-points placed before a small method invocation
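The pseudo-inlining rule can be sketched as a pass over a method's instruction list. This is a minimal illustration, assuming a hypothetical string-based instruction form, a callee-size table, and a size threshold; the slides do not say how JESSICA2 decides that a method is "small".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of pseudo-inlining: drop a migration point (M-point) that sits right
// before a call to a small method. The instruction strings, size table, and
// threshold are hypothetical.
public class PseudoInliningSketch {

    static final String MPOINT = "MPOINT";

    static List<String> removeMPoints(List<String> code, Map<String, Integer> calleeSize,
                                      int smallThreshold) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < code.size(); i++) {
            String insn = code.get(i);
            if (insn.equals(MPOINT) && i + 1 < code.size()
                    && code.get(i + 1).startsWith("invoke ")) {
                String callee = code.get(i + 1).substring("invoke ".length());
                Integer size = calleeSize.get(callee);
                // Skip (delete) the M-point only for callees known to be small;
                // unknown callees keep their M-point, which is the conservative choice.
                if (size != null && size <= smallThreshold) continue;
            }
            out.add(insn);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> code = List.of("iload 1", MPOINT, "invoke Point.getX",
                MPOINT, "invoke Solver.step");
        Map<String, Integer> sizes = Map.of("Point.getX", 5, "Solver.step", 400);
        System.out.println(removeMPoints(code, sizes, 32));
        // [iload 1, invoke Point.getX, MPOINT, invoke Solver.step]
    }
}
```

The M-point before the size-5 `Point.getX` call is dropped, while the one before the size-400 `Solver.step` call is kept: a short callee returns quickly, so the thread soon reaches another migration point anyway.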
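13. Dynamic Register Patching
[Figure: the restored stack of compiled methods. From the bottom: a bootstrap frame, a trampoline frame, frame 0 for Method0() and frame 1 for Method1(); each method frame holds a return address and a saved ebp. Functions shown: bootstrap(), trampoline(), closing handler(). The patching code for each frame loads the saved register values and jumps to that method's restore point:
    Method0: reg1 <- value1; reg2 <- value2; jmp restore_point0
    Method1: reg1 <- value1; jmp restore_point1]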
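The patching code in the figure can be generated mechanically from each restored frame's saved registers and restore point. The sketch below models only that stub-generation step, with hypothetical names (`RestoredFrame`, `stubFor`) and pseudo-assembly strings; it does not model how the bootstrap and trampoline frames or the return addresses are set up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of stub generation for dynamic register patching: for each restored
// frame, load the saved register values, then jump to the frame's restore point.
// Names and the textual pseudo-assembly are illustrative only.
public class RegisterPatchSketch {

    record RestoredFrame(String method, Map<String, String> savedRegs, String restorePoint) {}

    static String stubFor(RestoredFrame f) {
        List<String> insns = new ArrayList<>();
        for (Map.Entry<String, String> e : f.savedRegs().entrySet()) {
            insns.add(e.getKey() + " <- " + e.getValue()); // restore one register
        }
        insns.add("jmp " + f.restorePoint());            // resume compiled code
        return String.join("; ", insns);
    }

    public static void main(String[] args) {
        Map<String, String> regs0 = new LinkedHashMap<>(); // keeps register order stable
        regs0.put("reg1", "value1");
        regs0.put("reg2", "value2");
        Map<String, String> regs1 = new LinkedHashMap<>();
        regs1.put("reg1", "value1");
        // Frames listed from the bottom of the restored method frames upward.
        List<RestoredFrame> frames = List.of(
                new RestoredFrame("Method0", regs0, "restore_point0"),
                new RestoredFrame("Method1", regs1, "restore_point1"));
        for (RestoredFrame f : frames) System.out.println(f.method() + ": " + stubFor(f));
    }
}
```

For the two frames in the figure this prints `Method0: reg1 <- value1; reg2 <- value2; jmp restore_point0` and `Method1: reg1 <- value1; jmp restore_point1`.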
14. Advantages of native code instrumentation
- Lightweight
  - Re-uses the JIT compiler's internal data structures and control-flow analysis functions
  - No need to include debugging information in Java class files
  - Instrumented native code is more efficient than instrumented bytecode
- Transparent
  - No source code modification
  - No new API introduced
  - No preprocessing
15. Global Object Space (GOS)
- Provides a global heap abstraction for the DJVM
- Home-based object coherence protocol, compliant with the Java Memory Model
- OO-based (object-based) to reduce false sharing
- Non-blocking communication
  - Uses a threaded I/O interface inside the JVM for communication to hide latency
- Adaptive object home migration mechanism
- Takes advantage of JVM runtime information for optimization
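A home-based protocol keeps the master copy of each object at its home node: other nodes typically work on cached copies, discard them when acquiring a lock, and flush their writes back to the home when releasing it. The sketch below is a single-process simulation of that idea plus one possible home migration rule (move the home to a node after that node has flushed the same object three times in a row). The threshold, the rule, and all names are assumptions for illustration; the slides do not give JESSICA2's actual adaptive policy.

```java
import java.util.HashMap;
import java.util.Map;

// Single-process simulation of a home-based coherence protocol with a
// hypothetical adaptive home migration rule.
public class GosHomeSketch {

    static final class SharedObject {
        int home;                       // node holding the master copy
        int masterValue;                // master copy (one int field for simplicity)
        int lastRemoteWriter = -1;      // last non-home node that flushed this object
        int consecutiveRemoteWrites = 0;
        SharedObject(int home, int value) { this.home = home; this.masterValue = value; }
    }

    static final int MIGRATION_THRESHOLD = 3; // hypothetical

    // Per-node cached copies and per-node dirty (written, not yet flushed) copies.
    final Map<Integer, Map<SharedObject, Integer>> caches = new HashMap<>();
    final Map<Integer, Map<SharedObject, Integer>> dirty = new HashMap<>();

    int read(int node, SharedObject o) {
        if (node == o.home) return o.masterValue;
        // Fault in a copy from the home if this node has none cached.
        return caches.computeIfAbsent(node, n -> new HashMap<>())
                .computeIfAbsent(o, obj -> obj.masterValue);
    }

    void write(int node, SharedObject o, int value) {
        if (node == o.home) { o.masterValue = value; return; }
        caches.computeIfAbsent(node, n -> new HashMap<>()).put(o, value);
        dirty.computeIfAbsent(node, n -> new HashMap<>()).put(o, value);
    }

    // Lock acquire: drop cached copies so later reads fetch fresh data from the home.
    void acquire(int node) { caches.remove(node); }

    // Lock release: flush dirty copies to their homes, then apply the migration rule.
    void release(int node) {
        Map<SharedObject, Integer> d = dirty.remove(node);
        if (d == null) return;
        for (Map.Entry<SharedObject, Integer> e : d.entrySet()) {
            SharedObject o = e.getKey();
            o.masterValue = e.getValue();
            o.consecutiveRemoteWrites =
                    (o.lastRemoteWriter == node) ? o.consecutiveRemoteWrites + 1 : 1;
            o.lastRemoteWriter = node;
            if (o.consecutiveRemoteWrites >= MIGRATION_THRESHOLD) {
                o.home = node;              // the frequent writer becomes the new home
                o.consecutiveRemoteWrites = 0;
                o.lastRemoteWriter = -1;
            }
        }
    }

    public static void main(String[] args) {
        GosHomeSketch gos = new GosHomeSketch();
        SharedObject counter = new SharedObject(0, 0); // home is node 0
        for (int round = 0; round < 3; round++) {
            gos.acquire(2);
            gos.write(2, counter, gos.read(2, counter) + 1);
            gos.release(2);
        }
        System.out.println(counter.home + " " + counter.masterValue); // 2 3
    }
}
```

After three acquire/increment/release rounds on node 2, the master copy holds 3 and the home has moved to node 2, so node 2's later accesses no longer need to go through a remote home.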
16. GOS runtime data structure
- Master object: object header, cache pointer, object data
- Cache object: object header, cache pointer
- Cache header: master host id, master address, class, cache obj list
- Cache data (linked list entries): thread id, status, cache data, next
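The structures in the figure can be transcribed into Java classes to make their relationships explicit. This is a sketch under assumptions: field names follow the slide, but the field types, the link from a cache object's cache pointer to its cache header, and the `addCopy`/`findFor` helpers are hypothetical; inside a JVM such structures would normally be native (C-level) data rather than Java objects.

```java
// Sketch of the GOS runtime structures listed on the slide, written as Java classes.
// Field names follow the slide; types, links, and helper methods are assumptions.
public class GosStructuresSketch {

    // One cache data entry: thread id, status, cache data, next.
    static final class CacheData {
        final long threadId;
        String status;     // hypothetical values such as "VALID" or "INVALID"
        final int[] data;  // local copy of the object's fields
        CacheData next;    // next entry in the cache obj list

        CacheData(long threadId, String status, int[] data, CacheData next) {
            this.threadId = threadId;
            this.status = status;
            this.data = data;
            this.next = next;
        }
    }

    // Cache header: master host id, master address, class, cache obj list.
    static final class CacheHeader {
        final int masterHostId;
        final long masterAddress;
        final String className;
        CacheData cacheObjList; // head of the linked list of cached copies

        CacheHeader(int masterHostId, long masterAddress, String className) {
            this.masterHostId = masterHostId;
            this.masterAddress = masterAddress;
            this.className = className;
        }

        // Prepend a cached copy for a thread.
        void addCopy(long threadId, int[] data) {
            cacheObjList = new CacheData(threadId, "VALID", data, cacheObjList);
        }

        // Walk the list to find the copy belonging to a thread, or return null.
        CacheData findFor(long threadId) {
            for (CacheData c = cacheObjList; c != null; c = c.next) {
                if (c.threadId == threadId) return c;
            }
            return null;
        }
    }

    // Cache object: object header plus a cache pointer (assumed to reference the cache header).
    static final class CacheObject {
        final long objectHeader;        // placeholder for the JVM's header word
        final CacheHeader cachePointer;

        CacheObject(long objectHeader, CacheHeader cachePointer) {
            this.objectHeader = objectHeader;
            this.cachePointer = cachePointer;
        }
    }

    public static void main(String[] args) {
        CacheHeader h = new CacheHeader(0, 0x08225400L, "CPI");
        h.addCopy(1, new int[] {42});
        h.addCopy(2, new int[] {43});
        CacheObject obj = new CacheObject(0L, h);
        System.out.println(obj.cachePointer.findFor(2).data[0]); // 43
    }
}
```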
17. Experimental environment
- HKU Gideon 300 Linux cluster: 300 P4 PCs (2 GHz, 512 MB RAM, 40 GB disk)
- Network: 312-port Foundry FastIron 1500 non-blocking switch (100 Mbit/s)
18. Migration overhead during normal execution (SPECjvm98 benchmark)
19. Migration overhead analysis
[Figure panels: overall migration latency; migration time breakdown (LT program)]
20. GOS Optimizations (using 4 PCs)
- NO: no optimizations
- H: home migration
- HS: home migration + synchronized method shipping
- HSP: HS + object pushing
21. JESSICA2 vs. JESSICA (CPI)
22. Application benchmark
23. Parallel Ray Tracing (using 64 nodes of the Gideon 300 cluster)
- Linux 2.4.18-3 kernel (Red Hat 7.3)
- 64 nodes: 108 seconds
- 1 node: 4402 seconds (about 1.2 hours)
- Speedup: 4402/108 ≈ 40.76
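The speedup follows directly from the two run times; the parallel efficiency on the right is derived here from the same numbers and is not stated on the slide.

$$
S_{64} = \frac{T_1}{T_{64}} = \frac{4402\,\text{s}}{108\,\text{s}} \approx 40.76,
\qquad
E_{64} = \frac{S_{64}}{64} \approx \frac{40.76}{64} \approx 0.64
$$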
24. Conclusions
- Transparent Java thread migration in the JIT compiler enables high-performance execution of multithreaded Java applications on clusters
- An embedded GOS layer can take advantage of JVM runtime information to reduce communication overhead
25. Future work
- Advanced thread migration mechanism without overhead during normal execution (finished)
- Incremental distributed GC
- Enhanced single I/O space to benefit more real-life applications
- Parallel I/O support
26. Thanks
- JESSICA2 webpage
- http://www.csis.hku.hk/clwang/projects/JESSICA2.html