1
JIT-Compiler-Assisted Distributed Java Virtual
Machine
  • Wenzhang Zhu, Cho-Li Wang, Weijian Fang and
    Francis C. M. Lau
  • Department of Computer Science and Information
    Systems
  • The University of Hong Kong
  • Presented by Cho-Li Wang

2
Outline
  • Distributed Java Virtual Machine
  • Design Tradeoffs
  • Related work
  • JESSICA2 features
  • Experimental results
  • Conclusion and future work
  • A raytracing demo

3
Distributed Java Virtual Machine (DJVM)
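    import java.util.*;

    // Each worker sums the integers below n on its own Java thread.
    class worker extends Thread {
        private long n;
        public worker(long N) { n = N; }
        public void run() {
            long sum = 0;
            for (long i = 0; i < n; i++) sum += i;
            System.out.println("N=" + n + " Sum=" + sum);
        }
    }

    public class test {
        static final int N = 100;
        public static void main(String[] args) {
            worker[] w = new worker[N];
            Random r = new Random();
            // As on the slide; nextLong() may return negative or very large bounds.
            for (int i = 0; i < N; i++) w[i] = new worker(r.nextLong());
            for (int i = 0; i < N; i++) w[i].start();
            try { for (int i = 0; i < N; i++) w[i].join(); }
            catch (Exception e) { }
        }
    }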
Java thread
  • A distributed Java Virtual Machine (DJVM)
    consists of a group of extended JVMs running on a
    distributed environment to support true parallel
    execution of a multithreaded Java application.
  • A DJVM provides all the JVM services, in
    compliance with the Java language specification,
    as if running on a single machine: a Single System
    Image (SSI).

(Diagram: the DJVM presents a Single System Image, with a bytecode execution engine, heap, classes, and threads, on top of four JVMs.)
4
Design Tradeoffs of a DJVM
  • How to manage the threads?
  • Distributed thread scheduling
  • Initial placement vs thread migration
  • How to store the data?
  • Distributed heap (object store)
  • Java memory model (memory consistency)
  • Can an off-the-shelf DSM be used as the heap?
  • How to process the bytecode?
  • Execution engine: interpretation, Just-in-Time
    (JIT) compilation, or static compilation

(Diagram: the three design dimensions are thread scheduling, execution engine, and heap.)
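
The tradeoff between initial placement and thread migration can be illustrated with a toy load model. This is a minimal sketch, not JESSICA2 code: the class `PlacementSketch`, the round-robin rule, and the "gap of at least 2" migration rule are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration; none of these names come from the slides.
final class PlacementSketch {
    // Initial placement: thread i is pinned to node (i mod nodes) when it is created.
    static int[] initialPlacement(int threads, int nodes) {
        int[] load = new int[nodes];
        for (int i = 0; i < threads; i++) {
            load[i % nodes]++;
        }
        return load;
    }

    // Thread migration: move one thread from the busiest node to the idlest node,
    // but only when their load differs by at least 2 (assumed rule).
    static boolean migrateOne(int[] load) {
        int max = 0;
        int min = 0;
        for (int i = 1; i < load.length; i++) {
            if (load[i] > load[max]) max = i;
            if (load[i] < load[min]) min = i;
        }
        if (load[max] - load[min] < 2) {
            return false;
        }
        load[max]--;
        load[min]++;
        return true;
    }

    public static void main(String[] args) {
        int[] load = initialPlacement(10, 4);
        load[0] += 4; // pretend node 0 spawned extra threads after placement
        while (migrateOne(load)) {
            // keep migrating until the loads are within 1 of each other
        }
        System.out.println(Arrays.toString(load));
    }
}
```

Initial placement alone cannot react to the extra threads that appear on node 0 later; the migration loop can, at the price of moving thread state between nodes.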
5
Related work
Each system is listed with its thread distribution scheme, execution engine, and heap design:
  • cJVM (IBM Haifa Research): remote creation; interpreter mode execution; embedded object-based DSM (proxy) with built-in object caching
  • JAVA/DSM (Rice University): manual distribution; interpreter mode execution; heap built on top of a page-based DSM
  • JESSICA (HKU): transparent thread migration; interpreter mode execution; heap built on top of a page-based DSM
  • Jackal, Hyperion: remote creation; static compilation; linked to an object-based DSM

6
JESSICA2 (Java-Enabled Single-System-Image
Computing Architecture)
(Diagram: a multithreaded Java program runs on six JESSICA2 JVMs, one master and five workers, which share a Global Object Space. Threads migrate between nodes as portable Java frames, and each JVM executes in JIT compiler mode.)
7
JESSICA2 Main Features
  • Transparent Java thread migration
  • Runtime capturing and restoring of thread
    execution context.
  • No source code modification, no bytecode
    instrumentation (preprocessing), no new API
    introduced
  • Enable dynamic load balancing on clusters
  • JIT compiler-based execution engine (JITEE)
  • Operated in Just-In-Time (JIT) compilation mode
  • cluster-aware
  • Global Object Space
  • A shared global heap spanning all cluster nodes
  • Provide location-transparent object access
  • Adaptive migrating home protocol for memory
    consistency, plus various optimizing schemes.
  • I/O redirection

8
JESSICA2 thread migration (In a JIT-enabled JVM)
  • RTC: Raw Thread Context
  • BTC: Bytecode-oriented Thread Context (thread id, frames, class names,
    method signature, PC, operand stack pointer, local vars)

Migration steps shown in the diagram:
  • (1) Alert (Load Monitor, Thread Scheduler, Migration Manager)
  • (2) Source node: stack analysis and stack capturing. The RTC is
    transformed into the BTC directly inside the JIT compiler.
  • (3) Frames are passed to the destination node: frame parsing and
    restoring execution (BTC back to RTC, using the JVM method area and PC)
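
The BTC fields listed above suggest a portable, machine-independent record per thread. The following is a minimal sketch of such a record and of shipping it between nodes. The class names, field types, and the use of Java serialization are assumptions for illustration; JESSICA2 performs the capture inside the JIT compiler in native code, not with these classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// One captured Java frame (hypothetical layout based on the BTC fields on the slide).
final class FrameContext implements Serializable {
    final String className;
    final String methodSignature;
    final int pc;                 // bytecode program counter
    final int operandStackPtr;
    final List<String> localVars; // locals kept as "type,value" strings for simplicity

    FrameContext(String className, String methodSignature, int pc,
                 int operandStackPtr, List<String> localVars) {
        this.className = className;
        this.methodSignature = methodSignature;
        this.pc = pc;
        this.operandStackPtr = operandStackPtr;
        this.localVars = new ArrayList<>(localVars);
    }
}

// The whole bytecode-oriented thread context: a thread id plus its frames.
final class BytecodeThreadContext implements Serializable {
    final long threadId;
    final List<FrameContext> frames;

    BytecodeThreadContext(long threadId, List<FrameContext> frames) {
        this.threadId = threadId;
        this.frames = new ArrayList<>(frames);
    }
}

final class BtcTransfer {
    // Source node: turn the captured context into bytes that can be sent over the network.
    static byte[] capture(BytecodeThreadContext btc) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(btc);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Destination node: parse the frames back into an in-memory context.
    static BytecodeThreadContext restore(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (BytecodeThreadContext) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the record holds bytecode-level information only (no raw addresses or register contents), it does not depend on the native stack layout of the source node.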
9
Thread Stack Transformation
Raw Thread Context (RTC), present on both the source and the destination node:
    %esp     0x00000000
    %esp+4   0x082ca809
    %esp+8   0x08225400
    %esp+12  0x08266bc0
    ...
    %eax     0x08623200
    %ebx     0x08293100

Stack capturing turns the RTC into the BTC; stack restoration turns the BTC back into an RTC.

Bytecode-oriented Thread Context (BTC):
    Frames
      method CPI::run()V@111
      local=13; stack=0
      var:
        arg0:   CPI, 33, 0x8225400
        local1: D 33, 0x8266bc0@2
        local2: int, 2
        ...
10
Details
Bytecode verifier
Linking and constant resolution
11
Example of native code instrumentation
12
Optimization on migration points: pseudo-inlining
  • Purpose: eliminate the cost of unnecessarily
    inserted migration points
  • General idea: delete migration points (M-points)
    before a small method invocation
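
The rule "delete M-points before a small method invocation" can be sketched as a filter over call sites. This is an illustrative sketch only: the class `MigrationPointPlanner`, the 32-byte threshold, and the use of bytecode size as the definition of "small" are assumptions, since the slide does not state the criterion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical planner that decides which call sites keep their migration point.
final class MigrationPointPlanner {
    // Assumed threshold: callees at or below this bytecode size count as "small".
    static final int SMALL_METHOD_BYTES = 32;

    // calleeSizes maps a callee name to its bytecode size.
    // Returns the callees whose invocation still gets an M-point in front of it.
    static List<String> planMigrationPoints(Map<String, Integer> calleeSizes) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : calleeSizes.entrySet()) {
            if (entry.getValue() > SMALL_METHOD_BYTES) {
                kept.add(entry.getKey()); // large callee: keep the M-point
            }
            // small callee: treat it as if it were inlined and drop the M-point
        }
        return kept;
    }
}
```

For example, with callees `getX` (5 bytes), `render` (400 bytes), and `add` (12 bytes), only the call to `render` keeps its migration point under this assumed threshold.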

13
Dynamic Register Patching
Compiled methods:
  • Method1() ... restore_point1
  • Method0() ... restore_point0

Stack frames (the slide marks the direction of stack growth):
  • frame 1: ebp, Ret addr; patch code: reg1 <- value1; jmp restore_point1
  • frame 0: ebp, Ret addr; patch code: reg1 <- value1; reg2 <- value2; jmp restore_point0
  • trampoline frame: trampoline
  • bootstrap frame: ebp; bootstrap(), trampoline(), closing handler()
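
The per-frame patch code in the diagram follows one pattern: load each register with its recovered value, then jump to the frame's restore point. A minimal sketch that generates this stub as text is given below. The class `RegisterPatchStub` and the textual output format are assumptions; the real system emits native instructions, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical generator for the textual form of one frame's register patch stub.
final class RegisterPatchStub {
    // registerValues maps a register name to the value to reload into it.
    // The stub ends with a jump to the restore point inside the compiled method.
    static List<String> build(Map<String, String> registerValues, String restorePoint) {
        List<String> stub = new ArrayList<>();
        for (Map.Entry<String, String> entry : registerValues.entrySet()) {
            stub.add(entry.getKey() + " <- " + entry.getValue());
        }
        stub.add("jmp " + restorePoint);
        return stub;
    }
}
```

Passing an insertion-ordered map with `reg1` and `reg2` and the label `restore_point0` reproduces the three-step stub shown for frame 0.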
14
Advantages of native code instrumentation
  • Lightweight
  • Re-use JIT compiler internal data structures and
    control flow analysis functions
  • No need to include debugging information in Java
    class files
  • Instrumented native code is more efficient than
    instrumented bytecode.
  • Transparent
  • No source code modification.
  • No new API introduced.
  • No preprocessing

15
Global Object Space (GOS)
  • Provide global heap abstraction for DJVM
  • Home-based object coherence protocol, compliant
    with JVM Memory Model
  • OO-based to reduce false sharing
  • Non-blocking communication
  • Use threaded I/O interface inside JVM for
    communication to hide the latency
  • Adaptive object home migration mechanism
  • Take advantage of JVM runtime information for
    optimization
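
To make the idea of adaptive object home migration concrete, here is a minimal single-process simulation. It is a sketch under stated assumptions: the class `HomeBasedObject`, the threshold of three consecutive remote writes, and the message counter are all hypothetical, and the slides do not describe the actual JESSICA2 migration policy.

```java
// Hypothetical model of one shared object under a home-based coherence protocol.
final class HomeBasedObject {
    // Assumed policy: after this many consecutive writes from the same remote node,
    // the home moves to that node.
    static final int MIGRATION_THRESHOLD = 3;

    private int homeNode;
    private int value;                  // master copy, kept at the home node
    private int lastRemoteWriter = -1;
    private int consecutiveRemoteWrites = 0;
    private int remoteMessages = 0;     // simulated network messages to the home

    HomeBasedObject(int homeNode) {
        this.homeNode = homeNode;
    }

    void write(int node, int newValue) {
        value = newValue;
        if (node == homeNode) {
            // Writes at the home are local and reset the migration heuristic.
            lastRemoteWriter = -1;
            consecutiveRemoteWrites = 0;
            return;
        }
        remoteMessages++; // a remote write must be propagated to the home
        if (node == lastRemoteWriter) {
            consecutiveRemoteWrites++;
        } else {
            lastRemoteWriter = node;
            consecutiveRemoteWrites = 1;
        }
        if (consecutiveRemoteWrites >= MIGRATION_THRESHOLD) {
            homeNode = node; // migrate the home to the dominant writer
            lastRemoteWriter = -1;
            consecutiveRemoteWrites = 0;
        }
    }

    int homeNode() { return homeNode; }
    int value() { return value; }
    int remoteMessages() { return remoteMessages; }
}
```

If node 2 writes five times to an object homed at node 0, the first three writes are remote, the home then moves to node 2, and the remaining two writes are local. Without migration all five writes would cost a message.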

16
GOS runtime data structure
  • Master object: object header, cache pointer, object data
  • Cache object: object header, cache pointer, cache data
  • Cache header: master host id, master address, class, cache obj list
  • Cache data (linked list entries): thread id, status, cache data, next
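
The fields named in the diagram can be written down as plain Java classes. This is a sketch for orientation only: the field types, the string-valued `status`, and the reading of the cache data entries as a per-thread linked list are assumptions, since the slide gives field names but not their types or semantics.

```java
// One cache data entry; entries are chained through "next" (assumed: one per thread).
final class CacheData {
    final long threadId;
    String status;   // the slide does not list the possible states
    byte[] data;
    CacheData next;

    CacheData(long threadId, String status, byte[] data) {
        this.threadId = threadId;
        this.status = status;
        this.data = data;
    }
}

// Header of a cache object: where the master lives, plus the list of cached copies.
final class CacheHeader {
    final int masterHostId;
    final long masterAddress;
    final String className;
    CacheData cacheObjList; // head of the linked list

    CacheHeader(int masterHostId, long masterAddress, String className) {
        this.masterHostId = masterHostId;
        this.masterAddress = masterAddress;
        this.className = className;
    }

    // Prepends a new entry to the list and returns it.
    CacheData add(long threadId, String status, byte[] data) {
        CacheData entry = new CacheData(threadId, status, data);
        entry.next = cacheObjList;
        cacheObjList = entry;
        return entry;
    }

    // Walks the list and returns the entry for the given thread, or null if absent.
    CacheData find(long threadId) {
        for (CacheData e = cacheObjList; e != null; e = e.next) {
            if (e.threadId == threadId) {
                return e;
            }
        }
        return null;
    }
}
```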
17
Experimental environment
  • HKU Gideon 300 Linux cluster: 300 P4 PCs (2 GHz,
    512 MB RAM, 40 GB disk)
  • Network: 312-port Foundry FastIron 1500
    non-blocking switch (100 Mbit/s)

18
Migration overhead during normal execution
(SPECJVM98 benchmark)
19
Migration overhead analysis
Overall migration latency
Migration time breakdown (LT program)
20
GOS Optimizations (using 4 PCs)
  • NO: no optimizations
  • H: home migration
  • HS: home migration + synchronized method shipping
  • HSP: HS + object pushing
21
JESSICA2 vs JESSICA (CPI)
22
Application benchmark
23
Parallel Ray Tracing (using 64 nodes of Gideon
300 cluster)
  • Platform: Linux 2.4.18-3 kernel (Red Hat 7.3)
  • 64 nodes: 108 seconds
  • 1 node: 4402 seconds (about 1.2 hours)
  • Speedup: 4402/108 ≈ 40.75
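
The speedup figure is the ratio of single-node time to 64-node time; dividing it by the node count gives the parallel efficiency, which the slide does not state. A minimal sketch of both calculations follows (the class name `Speedup` is hypothetical):

```java
// Speedup and parallel efficiency from measured run times.
final class Speedup {
    // speedup = T(1 node) / T(n nodes)
    static double speedup(double singleNodeSeconds, double parallelSeconds) {
        return singleNodeSeconds / parallelSeconds;
    }

    // efficiency = speedup / number of nodes
    static double efficiency(double singleNodeSeconds, double parallelSeconds, int nodes) {
        return speedup(singleNodeSeconds, parallelSeconds) / nodes;
    }

    public static void main(String[] args) {
        System.out.printf("speedup = %.2f%n", speedup(4402, 108));
        System.out.printf("efficiency = %.1f%%%n", 100 * efficiency(4402, 108, 64));
    }
}
```

With the times from the slide, the ratio comes to roughly 40.8, in line with the reported figure, and the efficiency on 64 nodes is roughly 64%.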
24
Conclusions
  • Transparent Java thread migration in a JIT compiler
    enables high-performance execution of
    multithreaded Java applications on clusters
  • An embedded GOS layer can take advantage of the
    JVM runtime information to reduce communication
    overhead

25
Future work
  • Advanced thread migration mechanism without
    overhead during normal execution (finished)
  • Incremental Distributed GC
  • Enhanced Single I/O Space to benefit more
    real-life applications
  • Parallel I/O Support

26
Thanks
  • JESSICA2 Webpage
  • http://www.csis.hku.hk/clwang/projects/JESSICA2.html