JIT-Compiler-Assisted Distributed Java Virtual Machine

1
JIT-Compiler-Assisted Distributed Java Virtual Machine

Wenzhang Zhu, Cho-Li Wang, Weijian Fang and Francis C. M. Lau
Department of Computer Science and Information Systems
The University of Hong Kong
Presented by Cho-Li Wang

2
Outline

Distributed Java Virtual Machine
Design Tradeoffs
Related work
JESSICA2 features
Experimental results
Conclusion and future work
A ray-tracing demo

3
Distributed Java Virtual Machine (DJVM)
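Example: a multithreaded Java program in which each worker is a Java thread

```java
import java.util.Random;

// Each worker is a Java thread that sums the integers 0..n-1.
class Worker extends Thread {
    private final long n;
    public Worker(long n) { this.n = n; }
    @Override public void run() {
        long sum = 0;
        for (long i = 0; i < n; i++) sum += i;
        System.out.println("N=" + n + " Sum=" + sum);
    }
}

public class Test {
    static final int N = 100;  // number of worker threads
    public static void main(String[] args) throws InterruptedException {
        Worker[] w = new Worker[N];
        Random r = new Random();
        // Bounded, non-negative workloads; an unbounded r.nextLong() may never finish
        for (int i = 0; i < N; i++) w[i] = new Worker(r.nextInt(1_000_000));
        for (int i = 0; i < N; i++) w[i].start();
        for (int i = 0; i < N; i++) w[i].join();  // wait for all workers
    }
}
```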

A distributed Java Virtual Machine (DJVM) consists of a group of extended JVMs running in a distributed environment to support truly parallel execution of a multithreaded Java application.
A DJVM provides all the JVM services, compliant with the Java Language Specification, as if running on a single machine: a Single System Image (SSI).

[Figure: a DJVM built from four JVMs presents a Single System Image, with their bytecode execution engines, heaps, classes and threads appearing as one.]
4
Design Tradeoffs of a DJVM

How to manage the threads?
Distributed thread scheduling
Initial placement vs. thread migration
How to store the data?
Distributed heap (object store)
Java memory model (memory consistency)
Can an off-the-shelf DSM be used as the heap?
How to process the bytecode?
Execution engine: interpretation, Just-in-Time (JIT) compilation, or static compilation

5
Related work
cJVM (IBM Haifa Research)
Thread distribution: remote creation
Execution: interpreter mode
Heap: embedded object-based DSM (proxies) with built-in object caching
JAVA/DSM (Rice University)
Thread distribution: manual distribution
Execution: interpreter mode
Heap: built on top of a page-based DSM
JESSICA (HKU)
Thread distribution: transparent thread migration
Execution: interpreter mode
Heap: built on top of a page-based DSM
Jackal, Hyperion
Thread distribution: remote creation
Execution: static compilation
Heap: linked to an object-based DSM
6
JESSICA2 (Java-Enabled Single-System-Image Computing Architecture)
[Figure: a multithreaded Java program runs on six JESSICA2 JVMs, one master and five workers, executing in JIT compiler mode. Threads move between nodes by thread migration using a portable Java frame format, and all nodes share a Global Object Space.]
7
JESSICA2 Main Features

Transparent Java thread migration
Runtime capturing and restoring of thread execution context
No source code modification, no bytecode instrumentation (preprocessing), no new API introduced
Enables dynamic load balancing on clusters
JIT compiler-based execution engine (JITEE)
Operates in Just-In-Time (JIT) compilation mode
Cluster-aware
Global Object Space
A shared global heap spanning all cluster nodes
Provides location-transparent object access
Adaptive migrating-home protocol for memory consistency, plus various optimization schemes
I/O redirection
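The slides do not say how the load monitor decides when a thread should move. As a hedged illustration of dynamic load balancing only, the sketch below uses an assumed policy: compare runnable-thread counts per node and suggest one migration from the busiest to the idlest node when the gap exceeds a threshold. LoadMonitor, MigrationDecision and the threshold are hypothetical names, not JESSICA2's algorithm; records need Java 16+.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical suggestion produced by the monitor: move one thread between nodes.
record MigrationDecision(int fromNode, int toNode) {}

// Hypothetical load-monitor policy, NOT JESSICA2's published algorithm:
// suggest one migration when the busiest and idlest nodes differ by more
// than `threshold` runnable threads.
final class LoadMonitor {
    static Optional<MigrationDecision> check(Map<Integer, Integer> runnableThreads, int threshold) {
        Integer busiest = null, idlest = null;
        for (Map.Entry<Integer, Integer> e : runnableThreads.entrySet()) {
            if (busiest == null || e.getValue() > runnableThreads.get(busiest)) busiest = e.getKey();
            if (idlest == null || e.getValue() < runnableThreads.get(idlest)) idlest = e.getKey();
        }
        if (busiest == null || busiest.equals(idlest)) return Optional.empty();
        int gap = runnableThreads.get(busiest) - runnableThreads.get(idlest);
        return gap > threshold
                ? Optional.of(new MigrationDecision(busiest, idlest))
                : Optional.empty();
    }

    public static void main(String[] args) {
        // Node 0 runs 6 threads, node 1 runs 1, node 2 runs 3.
        System.out.println(check(Map.of(0, 6, 1, 1, 2, 3), 2));
    }
}
```

Running main prints Optional[MigrationDecision[fromNode=0, toNode=1]]. A real monitor would also weigh migration cost and thread behavior; this only shows where such a decision would plug in.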

8
JESSICA2 thread migration (in a JIT-enabled JVM)
RTC: Raw Thread Context
BTC: Bytecode-oriented Thread Context (thread id, frames, class names, method signatures, PC, operand stack pointer, local variables)

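[Figure: thread migration between a source node and a destination node. (1) An alert from the load monitor reaches the thread scheduler. (2) On the source node, the migration manager performs stack analysis and stack capturing, turning the thread's RTC into a BTC; this transformation is done directly inside the JIT compiler. (3) The BTC frames are sent to the destination node, where frame parsing and execution restoration rebuild the thread against the local method area.]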
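To make the BTC concrete, here is a minimal sketch of a machine-independent thread context that a source node could ship and a destination node could parse. The record names (TypedValue, BtcFrame, Btc) and the line-based text encoding are assumptions for illustration, not JESSICA2's wire format; only the fields mirror the BTC contents listed above. Records need Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified BTC: field names and encoding are illustrative assumptions.
record TypedValue(String type, String value) {}

record BtcFrame(String className, String methodSignature, int pc,
                int operandStackPtr, List<TypedValue> locals) {}

record Btc(long threadId, List<BtcFrame> frames) {

    // Machine-independent encoding: no raw registers or native addresses.
    // Assumes no field contains a space or a newline.
    String encode() {
        StringBuilder sb = new StringBuilder("thread " + threadId + "\n");
        for (BtcFrame f : frames) {
            sb.append("frame ").append(f.className()).append(' ')
              .append(f.methodSignature()).append(' ').append(f.pc()).append(' ')
              .append(f.operandStackPtr()).append('\n');
            for (TypedValue v : f.locals()) {
                sb.append("local ").append(v.type()).append(' ').append(v.value()).append('\n');
            }
        }
        return sb.toString();
    }

    // Frame parsing on the destination node: rebuild the BTC from its encoding.
    static Btc decode(String text) {
        String[] lines = text.split("\n");
        long tid = Long.parseLong(lines[0].split(" ")[1]);
        List<BtcFrame> frames = new ArrayList<>();
        List<TypedValue> locals = null;
        for (int i = 1; i < lines.length; i++) {
            String[] p = lines[i].split(" ");
            if (p[0].equals("frame")) {
                locals = new ArrayList<>();   // locals that follow belong to this frame
                frames.add(new BtcFrame(p[1], p[2], Integer.parseInt(p[3]),
                                        Integer.parseInt(p[4]), locals));
            } else {
                locals.add(new TypedValue(p[1], p[2]));
            }
        }
        return new Btc(tid, frames);
    }
}

class BtcDemo {
    public static void main(String[] args) {
        Btc btc = new Btc(1, List.of(new BtcFrame("CPI", "run()V", 111, 0,
                List.of(new TypedValue("int", "2")))));
        String wire = btc.encode();
        System.out.print(wire);
        System.out.println(Btc.decode(wire).equals(btc));  // round-trip check
    }
}
```

Because the encoding refers only to class names, method signatures, bytecode PCs and typed values, it can be parsed on a node whose native code and addresses differ from the source node's.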
9
Thread Stack Transformation
Raw Thread Context (RTC) on the source node:
esp: 0x00000000  esp+4: 0x082ca809  esp+8: 0x08225400  esp+12: 0x08266bc0  ...
eax: 0x08623200  ebx: 0x08293100
Stack capturing turns the RTC into a Bytecode-oriented Thread Context (BTC):
Frames:
method CPI::run()V@111  local=13  stack=0
var arg0: CPI, 33, 0x8225400
local1: D, 33, 0x8266bc0@2
local2: int, 2
...
Stack restoration turns the BTC back into an RTC on the destination node:
esp: 0x00000000  esp+4: 0x082ca809  esp+8: 0x08225400  esp+12: 0x08266bc0
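The RTC-to-BTC mapping relies on the JIT compiler knowing, at each migration point, which native stack slot holds which Java variable and of what type. The sketch below assumes such a slot map (the hypothetical SlotInfo record) and shows both directions: capturing turns raw words into typed values, replacing native addresses with location-independent object ids, and restoring turns them back into raw words using addresses valid on the destination. Only int and object-reference slots are handled, and the offsets and ids are made up; in the slide's own example, the word at esp+8 (0x08225400) is the address recorded for arg0.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-migration-point slot map that a JIT compiler could record:
// which Java variable lives at which esp offset, and its Java type.
record SlotInfo(String name, String javaType, int espOffset) {}

final class StackTransform {
    // Stack capturing (RTC -> BTC): raw words become typed values; object
    // references are replaced by location-independent object ids.
    static Map<String, String> capture(Map<Integer, Long> rawSlots, List<SlotInfo> slots,
                                       Map<Long, String> addressToId) {
        Map<String, String> btc = new LinkedHashMap<>();
        for (SlotInfo s : slots) {
            long word = rawSlots.get(s.espOffset());
            String value = s.javaType().equals("int") ? Long.toString(word) : addressToId.get(word);
            btc.put(s.name(), s.javaType() + ":" + value);
        }
        return btc;
    }

    // Stack restoration (BTC -> RTC): typed values become raw words again,
    // using the object addresses valid on the destination node.
    static Map<Integer, Long> restore(Map<String, String> btc, List<SlotInfo> slots,
                                      Map<String, Long> idToAddress) {
        Map<Integer, Long> raw = new LinkedHashMap<>();
        for (SlotInfo s : slots) {
            String value = btc.get(s.name()).split(":", 2)[1];
            long word = s.javaType().equals("int") ? Long.parseLong(value) : idToAddress.get(value);
            raw.put(s.espOffset(), word);
        }
        return raw;
    }

    public static void main(String[] args) {
        List<SlotInfo> slots = List.of(new SlotInfo("arg0", "CPI", 8), new SlotInfo("local2", "int", 16));
        Map<String, String> btc = capture(Map.of(8, 0x08225400L, 16, 2L), slots,
                                          Map.of(0x08225400L, "obj1"));
        System.out.println(btc);
        System.out.println(restore(btc, slots, Map.of("obj1", 0x09000000L)));
    }
}
```

The key design point this illustrates is that the BTC never carries a raw address: an object reference travels as an id and is re-bound to whatever address the object has on the destination.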
10
Details
Bytecode verifier
Linking and constant resolution
11
Example of native code instrumentation
12
Optimization of migration points: pseudo-inlining

Purpose: eliminate the cost of unnecessarily inserted migration points
General idea: delete M-points (migration points) placed before the invocation of a small method
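Pseudo-inlining can be pictured as a peephole pass over the JIT compiler's code. This sketch assumes a toy instruction list (the hypothetical Insn record) and a size threshold in bytecode bytes; the real pass works on JESSICA2's internal representation, and its criterion for a "small" method is not given in the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal JIT intermediate representation: an instruction is a
// migration point ("MPOINT"), a call ("CALL" + target method), or anything else.
record Insn(String op, String target) {
    static Insn mpoint() { return new Insn("MPOINT", null); }
    static Insn call(String method) { return new Insn("CALL", method); }
    static Insn other(String text) { return new Insn("OTHER", text); }
}

final class PseudoInlining {
    // Drop every M-point that immediately precedes a call to a "small" method,
    // i.e. one whose bytecode length is known and at most `threshold` bytes.
    static List<Insn> apply(List<Insn> code, Map<String, Integer> bytecodeLength, int threshold) {
        List<Insn> out = new ArrayList<>();
        for (int i = 0; i < code.size(); i++) {
            Insn cur = code.get(i);
            Insn next = i + 1 < code.size() ? code.get(i + 1) : null;
            boolean beforeSmallCall = cur.op().equals("MPOINT")
                    && next != null
                    && next.op().equals("CALL")
                    && bytecodeLength.getOrDefault(next.target(), Integer.MAX_VALUE) <= threshold;
            if (!beforeSmallCall) out.add(cur);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Insn> code = List.of(Insn.other("iload_1"), Insn.mpoint(), Insn.call("getX"),
                                  Insn.mpoint(), Insn.call("solve"));
        // getX is 5 bytes of bytecode, solve is 400; the threshold is 32 bytes.
        System.out.println(apply(code, Map.of("getX", 5, "solve", 400), 32));
    }
}
```

Methods with unknown size are treated as large, so their M-points are kept; that errs on the side of preserving migration opportunities.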

13
Dynamic Register Patching
[Figure: the restored stack, from oldest to newest: a bootstrap frame (bootstrap(), trampoline(), closing handler()), a trampoline frame, frame 0 for Method0() and frame 1 for Method1(); each compiled frame keeps its saved ebp and return address. Patch code restores each frame's registers and jumps into the compiled method: for frame 0, "reg1 <- value1; reg2 <- value2; jmp restore_point0"; for frame 1, "reg1 <- value1; jmp restore_point1".]
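The patch code in the figure can be described as data: for each restored frame, load the saved register values, then jump to that frame's restore point. The sketch below only generates those stub descriptions as text, assuming a hypothetical FrameState record; it does not emit machine code, and the order in which the trampoline actually runs the stubs is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical description of one restored compiled frame: the register values
// it expects and the restore point inside its compiled method.
record FrameState(String method, String restorePoint, Map<String, String> registers) {}

final class RegisterPatcher {
    // Emit one patch stub per frame, in the order given: set each saved register
    // (sorted by name for deterministic output), then jump to the restore point.
    static List<String> stubs(List<FrameState> frames) {
        List<String> out = new ArrayList<>();
        for (FrameState f : frames) {
            StringBuilder sb = new StringBuilder(f.method()).append(": ");
            new TreeMap<>(f.registers()).forEach(
                    (reg, val) -> sb.append(reg).append(" <- ").append(val).append("; "));
            sb.append("jmp ").append(f.restorePoint());
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<FrameState> frames = List.of(
                new FrameState("Method0", "restore_point0", Map.of("reg1", "value1", "reg2", "value2")),
                new FrameState("Method1", "restore_point1", Map.of("reg1", "value1")));
        stubs(frames).forEach(System.out::println);
    }
}
```

Running main prints "Method0: reg1 <- value1; reg2 <- value2; jmp restore_point0" and "Method1: reg1 <- value1; jmp restore_point1", matching the two stubs drawn in the figure.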
14
Advantages of native code instrumentation

Lightweight
Reuses JIT compiler internal data structures and control flow analysis functions
No need to include debugging information in Java class files
Instrumented native code is more efficient than instrumented bytecode
Transparent
No source code modification
No new API introduced
No preprocessing

15
Global Object Space (GOS)

Provides a global heap abstraction for the DJVM
Home-based object coherence protocol, compliant with the JVM memory model
Object-based, to reduce false sharing
Non-blocking communication
Uses a threaded I/O interface inside the JVM for communication, to hide latency
Adaptive object home migration mechanism
Takes advantage of JVM runtime information for optimization
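The home-based protocol and adaptive home migration can be simulated in a single process. In this hedged sketch, a lock acquire discards a node's cached copy, a read faults the copy in from the home, and a lock release flushes modifications to the home, in the spirit of the acquire/release rules of the Java memory model. The home-migration rule (move the home after a threshold of consecutive flushes from the same remote node) is an assumed policy, and HomeBasedObject is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical single-process simulation of one shared object under a
// home-based protocol with an assumed adaptive home-migration rule.
final class HomeBasedObject {
    private int home;
    private int master;                                       // master copy, kept at the home
    private final Map<Integer, Integer> cached = new HashMap<>();  // remote node -> copy
    private final Set<Integer> dirty = new HashSet<>();
    private final int threshold;
    private int lastWriter = -1;
    private int streak = 0;

    HomeBasedObject(int home, int initial, int threshold) {
        this.home = home;
        this.master = initial;
        this.threshold = threshold;
    }

    int home() { return home; }

    void acquire(int node) {          // invalidate: the next read refetches from home
        cached.remove(node);
        dirty.remove(node);
    }

    int read(int node) {              // remote reads fault in a copy of the master
        return node == home ? master : cached.computeIfAbsent(node, n -> master);
    }

    void write(int node, int value) {
        if (node == home) {
            master = value;
            lastWriter = -1;          // a home write breaks any remote write streak
            streak = 0;
        } else {
            cached.put(node, value);
            dirty.add(node);
        }
    }

    void release(int node) {          // flush this node's modifications to the home
        if (!dirty.remove(node)) return;
        master = cached.get(node);
        streak = (node == lastWriter) ? streak + 1 : 1;
        lastWriter = node;
        if (streak >= threshold) {    // adaptive home migration (assumed policy)
            home = node;
            cached.remove(node);      // the new home uses the master copy directly
            lastWriter = -1;
            streak = 0;
        }
    }

    public static void main(String[] args) {
        HomeBasedObject o = new HomeBasedObject(0, 10, 2);
        for (int v = 11; v <= 12; v++) { o.acquire(1); o.write(1, v); o.release(1); }
        System.out.println("home=" + o.home() + " value=" + o.read(0));
    }
}
```

Running main prints "home=1 value=12": after two consecutive lock-protected updates from node 1, the home moves there, so node 1's later accesses no longer need remote communication.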

16
GOS runtime data structure
Master object: object header | cache pointer | object data
Cache object: object header | cache pointer
Cache header: master host id | master address | class | cache obj list
Cache data entries, linked through next: thread id | status | cache data | next
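A Java rendering of these structures may help. Field names follow the slide; the CacheStatus values, the lookupOrFetch helper (with a byte array standing in for a network fetch from the master host), and the convention that a master object's cache pointer is null are all assumptions.

```java
// Hypothetical Java rendering of the GOS runtime structures on the slide.
enum CacheStatus { VALID, INVALID }

final class CacheData {                 // one entry per accessing thread
    final long threadId;
    CacheStatus status = CacheStatus.VALID;
    byte[] data;
    CacheData next;                     // next entry in the cache obj list
    CacheData(long threadId, byte[] data) { this.threadId = threadId; this.data = data; }
}

final class CacheHeader {
    final int masterHostId;             // node that holds the master object
    final long masterAddress;           // master object's address on that node
    final String className;
    CacheData cacheList;                // head of the per-thread cache obj list

    CacheHeader(int masterHostId, long masterAddress, String className) {
        this.masterHostId = masterHostId;
        this.masterAddress = masterAddress;
        this.className = className;
    }

    // Return this thread's entry, (re)loading it from `fetchedFromHome` when it
    // is missing or invalid. The byte array stands in for a network fetch.
    CacheData lookupOrFetch(long threadId, byte[] fetchedFromHome) {
        for (CacheData c = cacheList; c != null; c = c.next) {
            if (c.threadId == threadId) {
                if (c.status == CacheStatus.INVALID) {
                    c.data = fetchedFromHome.clone();
                    c.status = CacheStatus.VALID;
                }
                return c;
            }
        }
        CacheData fresh = new CacheData(threadId, fetchedFromHome.clone());
        fresh.next = cacheList;         // prepend to the list
        cacheList = fresh;
        return fresh;
    }

    int size() {
        int n = 0;
        for (CacheData c = cacheList; c != null; c = c.next) n++;
        return n;
    }
}

final class GosObject {
    final String objectHeader;
    final CacheHeader cachePointer;     // null for a master object (assumed)
    final byte[] objectData;            // null for a cache object

    GosObject(String objectHeader, CacheHeader cachePointer, byte[] objectData) {
        this.objectHeader = objectHeader;
        this.cachePointer = cachePointer;
        this.objectData = objectData;
    }

    boolean isMaster() { return cachePointer == null; }

    public static void main(String[] args) {
        CacheHeader h = new CacheHeader(0, 0x8225400L, "CPI");
        GosObject cache = new GosObject("CPI", h, null);
        h.lookupOrFetch(10, new byte[] {1, 2});
        h.lookupOrFetch(11, new byte[] {1, 2});
        System.out.println("master? " + cache.isMaster() + ", cached copies: " + h.size());
    }
}
```

Keeping the home location (master host id, master address) in the cache header means a cache object can locate its master without a global directory lookup.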
17
Experimental environment

HKU Gideon 300 Linux cluster: 300 P4 PCs (2 GHz, 512 MB RAM, 40 GB disk)
Network: 312-port Foundry FastIron 1500 non-blocking switch (100 Mbit/s)

18
Migration overhead during normal execution (SPECjvm98 benchmark)
19
Migration overhead analysis
Overall migration latency
Migration time breakdown (LT program)
20
GOS Optimizations (using 4 PCs)
NO: no optimizations
H: home migration
HS: home migration + synchronized method shipping
HSP: HS + object pushing
21
JESSICA2 vs JESSICA (CPI)
22
Application benchmark
23
Parallel Ray Tracing (using 64 nodes of the Gideon 300 cluster)
Linux 2.4.18-3 kernel (Red Hat 7.3)
64 nodes: 108 seconds
1 node: 4402 seconds (about 1.2 hours)
Speedup = 4402/108 ≈ 40.76
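With T_1 = 4402 s on one node and T_64 = 108 s on 64 nodes, the speedup and the corresponding parallel efficiency work out as:

```latex
S_{64} = \frac{T_{1}}{T_{64}} = \frac{4402\ \mathrm{s}}{108\ \mathrm{s}} \approx 40.76,
\qquad
E_{64} = \frac{S_{64}}{64} \approx \frac{40.76}{64} \approx 0.64
```

That is, each of the 64 nodes does useful work roughly 64% of the time relative to ideal linear scaling.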
24
Conclusions

Transparent Java thread migration in the JIT compiler enables high-performance execution of multithreaded Java applications on clusters
An embedded GOS layer can take advantage of the
JVM runtime information to reduce communication
overhead

25
Future work

Advanced thread migration mechanism without
overhead during normal execution (finished)
Incremental Distributed GC
Enhanced Single I/O Space to benefit more
real-life applications
Parallel I/O Support
