Title: The Java Virtual Machine Internal Architecture and Function
1. The Java Virtual Machine: Internal Architecture and Function
2. Contents
- Overview
- The Architecture
- Class Loader Subsystem
- Method Area
- Method Tables
- The Heap
- Object and Class Data Representation
- Object Representation
- Local Variables Representation
- Resolution, Exceptions, and Abrupt Method Completion
- Execution Engine and Execution Techniques
- The Instruction Set
- Native Methods Interaction
- Execution Order and Optimizations
- Summary
3. Overview
- The Java Virtual Machine (JVM) is the environment for running Java programs. It is called a virtual machine because it is an abstract computer defined by a specification.
- The term can be understood in three ways:
  - Abstract specification
  - Concrete implementation
  - Runtime instance
- Each Java application runs in its own virtual machine instance and has exclusive access to the structures that instance creates to accommodate its runtime.
4. The Architecture
- The JVM architecture is based on subsystems, memory areas, data types, and instructions, organized as:
  - Class Loader Subsystem
    - The mechanism for loading types (classes and interfaces) given their fully qualified names.
  - Execution Engine
    - The mechanism responsible for executing the instructions contained in the methods of loaded classes.
  - Runtime Data Areas
    - Organized memory used to store bytecodes loaded from class files, object instances, method parameters, return values, local variables, and intermediate results.
  - Native Method Interface
    - Not always publicly available to the programmer. Some implementations hide this part, while others emphasize it to make code optimization more feasible.
- Note: There are no general-purpose registers; instead, the JVM uses a stack to perform register-like operations.
6. Class Loader Subsystem
- Responsible for:
  - Loading: finding and importing the binary data for each type
  - Linking: verification, preparation, and resolution
  - Initializing: invoking the Java code that performs initialization
- Two kinds of loaders:
  - Bootstrap class loader: part of the virtual machine implementation
  - User-defined class loader: part of the running application
- Classes loaded by each class loader are placed in separate namespaces
- Loaders must be able to recognize and load classes stored in files that conform to the Java compiled class format
7. Class Loader Subsystem (continued)
- Bootstrap class loader
  - Loads trusted classes, including the Java API
  - Is unique and has its own namespace
- User-defined class loader
  - Is not necessarily unique
  - Inherits four gateway methods into the JVM from java.lang.ClassLoader. Among them is resolveClass(), which accepts a reference to a Class object on the heap and causes the type it represents to be linked
- By using namespaces, the JVM can load multiple types with the same fully qualified name through different loaders
- When the JVM resolves a symbolic reference from one class to another, it requests the referenced class from the same loader that loaded the referencing class
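The relationship between the bootstrap loader and a user-defined loader can be observed from Java code. The following is a minimal sketch, not taken from the slides: `LoaderDemo` and `PassThroughLoader` are hypothetical names, and the check that core classes report a `null` loader assumes a HotSpot-style JDK, where the bootstrap loader is represented as `null`.

```java
class LoaderDemo {

    // A minimal user-defined loader: it adds no behavior of its own and
    // simply inherits parent delegation from java.lang.ClassLoader.
    static class PassThroughLoader extends ClassLoader {
        PassThroughLoader(ClassLoader parent) {
            super(parent);
        }
    }

    // Core API classes are loaded by the bootstrap loader, which the
    // reflection API reports as null on HotSpot-style JDKs (assumption).
    static boolean stringComesFromBootstrap() {
        return String.class.getClassLoader() == null;
    }

    // Asking the user-defined loader for java.lang.String delegates up the
    // parent chain, so the very same Class object comes back.
    static boolean delegationReturnsSameClass() {
        try {
            ClassLoader loader =
                    new PassThroughLoader(LoaderDemo.class.getClassLoader());
            return loader.loadClass("java.lang.String") == String.class;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("String loaded by bootstrap: " + stringComesFromBootstrap());
        System.out.println("Same Class via delegation:  " + delegationReturnsSameClass());
        System.out.println("Loader of this class:       " + LoaderDemo.class.getClassLoader());
    }
}
```

A loader only creates a new entry in its own namespace for classes it defines itself. Classes it obtains by delegation, as here, stay in the namespace of the loader that defined them.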
8. Method Area
- Properties
  - Complex data area
  - Stores information about each loaded type
  - The methods of instantiated objects are kept in the method area, not in the heap along with the rest of the objects' contents
  - Shared among all running threads, so access must be thread safe
    - When two threads request the same type, only one of the requests actually loads the type while the other waits
  - Not fixed in size
  - Garbage collected
    - The idea here is slightly different from collecting unreferenced objects: what can be reclaimed is a loaded type that is no longer referenced
9. Method Area (continued)
- Contents
  - Basic information
    - Fully qualified name
    - Relationship to the superclass
    - Type modifiers (public, abstract, final, etc.)
  - Advanced information
    - Constant pool
      - An ordered set of constants used by the type: literals and symbolic references to types, fields, and methods. It plays a major role in dynamic linking. Entries are referenced by index, much like elements of an array
    - Field information
    - Method information
    - Static variables
      - Static variables of a loaded class must retain changes across multiple calls. Because two related classes share the same namespace, subsequent accesses to a static variable see the changes made earlier
    - References to ClassLoader and Class
10. Method Tables
- A method table is an array of direct references to all the instance methods that may be invoked on a class instance, including the inherited methods.
- Properties
  - Allows the virtual machine quick access to instance methods
  - Each instantiated object will have a reference to the method table associated with its class
  - In conjunction with information stored in the heap, plays an important role in dynamic linking and polymorphism
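The way a method table supports polymorphism can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `ToyClass`, `ToyObject`, and `MethodTableDemo` are hypothetical names, methods take no arguments, and a real JVM is free to organize its tables differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model of per-class method tables; all names here are hypothetical.
class ToyClass {
    final String name;
    final List<String> slotNames = new ArrayList<>();
    final List<Function<ToyObject, String>> methodTable = new ArrayList<>();

    ToyClass(String name, ToyClass superclass) {
        this.name = name;
        if (superclass != null) {
            // Inherited methods keep the same slot index in the subclass table.
            slotNames.addAll(superclass.slotNames);
            methodTable.addAll(superclass.methodTable);
        }
    }

    // Overriding replaces the inherited slot; a new method appends a slot.
    ToyClass define(String method, Function<ToyObject, String> body) {
        int slot = slotNames.indexOf(method);
        if (slot >= 0) {
            methodTable.set(slot, body);
        } else {
            slotNames.add(method);
            methodTable.add(body);
        }
        return this;
    }

    int slotOf(String method) {
        return slotNames.indexOf(method);
    }
}

class ToyObject {
    final ToyClass type;

    ToyObject(ToyClass type) {
        this.type = type;
    }

    // Dispatch: follow the object's class pointer, index the table, call.
    String invoke(int slot) {
        return type.methodTable.get(slot).apply(this);
    }
}

class MethodTableDemo {
    static final ToyClass ANIMAL = new ToyClass("Animal", null)
            .define("speak", self -> "...")
            .define("kind", self -> "animal");
    static final ToyClass DOG = new ToyClass("Dog", ANIMAL)
            .define("speak", self -> "woof");

    // The slot comes from the declared type, the body from the actual type.
    static String call(ToyClass actualType, String method) {
        return new ToyObject(actualType).invoke(ANIMAL.slotOf(method));
    }

    public static void main(String[] args) {
        System.out.println(call(DOG, "speak"));    // overridden in Dog
        System.out.println(call(DOG, "kind"));     // inherited from Animal
        System.out.println(call(ANIMAL, "speak"));
    }
}
```

Because an overriding method reuses the slot of the method it overrides, the caller can use one slot index for every subclass and still reach the right body.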
11. The Heap
- The location where all class instances and arrays (which are also viewed as objects) are instantiated.
- Properties
  - One common heap for each running instance of the JVM
  - The JVM has an instruction for allocating space on the heap, but no explicit instruction for de-allocating space. Just as an object cannot be freed in Java code, it cannot be freed explicitly in virtual machine code either
  - The garbage collector is solely responsible for eliminating unreferenced objects from the heap
12. Object and Class Data Representation
- Object representation
  - The JVM specification is not strict about object representation
  - Given an object instance, the JVM must be able to quickly locate the instance data and the class data
  - Memory allocated for an object in the heap must contain a pointer into the method area, where the class data is stored
  - The two most important models are presented next
  - Arrays are represented as objects
- Local variables representation
  - Local variables are stored in the Java stack frame associated with each method invocation
  - Each running thread gets its own Java stack, and each method invocation pushes a frame onto the stack of the thread in which it is called
  - Parameters are passed through the Java stack
  - The storage size is one entry for int, float, reference, and returnAddress
  - The storage size is two entries for long and double, which are addressed by the index of the first entry
  - Note: Variables of type byte, short, and char are stored as int on the Java stack. The boolean type is not directly supported by the JVM; it is translated into int
13. Object Representation (model 1)
- Divide the heap into two parts: the handle pool and the object pool
- An object reference is a native pointer to a handle pool entry
- Each handle pool entry has two components
  - A pointer to the instance data (in the heap)
  - A pointer to the class data (in the method area)
- Advantage
  - Makes it easier to combat fragmentation. When an object is moved, only one pointer needs to be changed
- Disadvantage
  - Every access to instance data requires the virtual machine to dereference two pointers: one to the handle and another to the data
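The trade-off of the handle-pool model can be sketched with two arrays standing in for the two parts of the heap. This is a toy illustration only: `HandleHeap` and its sizes are hypothetical, each object is reduced to a single int, and the pointer to class data is omitted.

```java
// Toy model of the handle-pool layout; sizes and names are hypothetical.
class HandleHeap {
    private final int[] objectPool = new int[16];  // instance data
    private final int[] handlePool = new int[8];   // handle -> offset in objectPool
    private int nextHandle = 0;
    private int nextOffset = 0;

    // Returns a "reference": the index of the newly created handle.
    int allocate(int value) {
        objectPool[nextOffset] = value;
        handlePool[nextHandle] = nextOffset++;
        return nextHandle++;
    }

    // Two dereferences: reference -> handle -> instance data.
    int read(int reference) {
        return objectPool[handlePool[reference]];
    }

    // Relocating an object updates one handle; every reference stays valid.
    void move(int reference, int newOffset) {
        objectPool[newOffset] = objectPool[handlePool[reference]];
        handlePool[reference] = newOffset;
    }
}

class HandleHeapDemo {
    static boolean referencesSurviveMove() {
        HandleHeap heap = new HandleHeap();
        int first = heap.allocate(42);
        int alias = first;       // a second reference to the same object
        heap.move(first, 10);    // the collector relocates the object
        return heap.read(first) == 42 && heap.read(alias) == 42;
    }

    public static void main(String[] args) {
        System.out.println("References valid after move: " + referencesSurviveMove());
    }
}
```

With direct pointers instead of handles, `read` would need only one array access, but `move` would have to find and patch both `first` and `alias`.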
15. Object Representation (model 2)
- An object reference is a native pointer to a bundle of data that contains the object's instance data and a pointer to the class data
- Advantage
  - Only one dereference is needed to reach the instance data from the object reference
- Disadvantage
  - Moving objects to prevent fragmentation becomes more complicated. When the Java virtual machine moves an object within the heap, it must update every reference to that object anywhere in the runtime data areas
17. Object Representation and Casts
- The main reason the JVM needs to get from an object reference to the class data is to resolve attempts to perform casts
- It must check whether the type being cast to is
  - Either the actual type of the object, in which case the cast is allowed instantly
  - Or one of its ancestor types, in which case the procedure involves checking the superclasses up the class inheritance tree
- Note
  - Earlier we noted that the class data records the relationship to the superclass. Imagine a cast this way. The Java virtual machine attempts a cast. If the object's real type is the same as the type being cast to, the cast is allowed instantly. If the two do not match, the Java virtual machine follows the reference to the superclass and checks again, and so on up to the Object class. A successful cast looks like a direct path up the class tree from the actual type to the cast type.
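The walk up the class tree described in the note can be written out with the reflection API. This is a simplified sketch: `CastCheck` is a hypothetical name, and the loop follows superclasses only, so casts to interfaces or array types are not covered.

```java
class CastCheck {

    // Walks the superclass chain starting at the object's actual class.
    // Interfaces and array types are deliberately ignored in this sketch.
    static boolean isCastAllowed(Class<?> actualType, Class<?> targetType) {
        for (Class<?> c = actualType; c != null; c = c.getSuperclass()) {
            if (c == targetType) {
                return true;   // found a direct path up the class tree
            }
        }
        return false;          // walked past Object without a match
    }

    public static void main(String[] args) {
        System.out.println(isCastAllowed(Integer.class, Integer.class));
        System.out.println(isCastAllowed(Integer.class, Number.class));
        System.out.println(isCastAllowed(Integer.class, Object.class));
        System.out.println(isCastAllowed(Integer.class, String.class));
    }
}
```

The first check succeeds on the first loop iteration, which corresponds to the "allowed instantly" case. The last one fails only after the loop has visited Integer, Number, and Object.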
18. Most Common Model (2)
19. Similarities and Differences: Java vs. C++
- Java object representation is somewhat similar to the VTBL structure in C++.
- In Java, objects are represented by instance data and a pointer to class data (and, implicitly, the method table)
- In C++, objects are represented by instance data and an array of pointers to any virtual functions that can be invoked on the object
- The main difference between Java objects and C++ objects is that, while in C++ functions are not predominantly virtual, in Java instance methods act like virtual functions by default.
- If Java adopted the same layout as the VTBL of C++, it would need to store (redundantly) pointers to all instance methods
- Java can accomplish the same result by storing only one pointer to the class data
- This hurts performance (by dereferencing twice), but it saves memory.
20. Array Representation
- Java arrays are objects: they are stored in the heap, and they are associated with a class type
- Example: a one-dimensional array of int elements and a two-dimensional array of int elements have different class types. Symbolically they are represented as [I and [[I. A two-dimensional array of objects would be symbolically represented as [[Ljava.lang.Object;
- Multidimensional arrays are represented as arrays of arrays, so the elements of such an array are themselves arrays and are compatible with the corresponding array type in assignments or casts
- The length of an array, or of any of its dimensions, does not determine the type of the array; it is only instance data (a field)
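The symbolic class names of arrays can be checked directly with `getClass().getName()`. This is a small sketch, with `ArrayTypes` as a hypothetical class name.

```java
class ArrayTypes {

    // Returns the JVM-level class name of any array (or other object).
    static String nameOf(Object array) {
        return array.getClass().getName();
    }

    // Arrays of different lengths still share one class.
    static boolean lengthDoesNotAffectType() {
        return new int[3].getClass() == new int[5].getClass();
    }

    // Each row of a two-dimensional int array is itself an int[].
    static boolean rowsAreArrays() {
        int[][] grid = new int[2][2];
        return grid[0].getClass() == int[].class;
    }

    public static void main(String[] args) {
        System.out.println(nameOf(new int[3]));        // [I
        System.out.println(nameOf(new int[2][2]));     // [[I
        System.out.println(nameOf(new Object[1][1]));  // [[Ljava.lang.Object;
        System.out.println(lengthDoesNotAffectType());
        System.out.println(rowsAreArrays());
    }
}
```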
21. Array Representation
22. Local Variables Representation
- Compilers can place local variables in any order inside the Java stack frame associated with a method
- Some locations on the stack can be reused for local variables that go out of scope.
- Parameters are also passed using the Java stack, and they are placed on it in the order in which they are declared, from left to right
- There is one important difference between the Java stack frames of class (static) methods and instance methods
  - The frame of an instance method holds in its first entry a reference corresponding to the hidden this, used to access the instance data in the heap associated with the invoking object
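23. Local Variables Representation (example)
class Example {

    // Class method: the first entry holds the first parameter.
    public static int runClassMethod(int i, long l, float f, double d, Object o, byte b) {
        return 0;
    }

    // Instance method: the first entry holds the hidden this reference.
    public int runInstanceMethod(char c, double d, short s, boolean b) {
        return 0;
    }
}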
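The entry sizes given earlier (one entry for most types, two for long and double, plus the hidden this for instance methods) determine where each parameter of the two methods lands. The following sketch computes that layout. `SlotLayout` and its helpers are hypothetical names, and the special key `<total>` is only a convenience of this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SlotLayout {

    // Maps each parameter name to the index of its first local variable entry.
    // "types" and "names" are parallel arrays of Java source type names and
    // parameter names. The extra key "<total>" holds the number of entries used.
    static Map<String, Integer> layout(boolean isStatic, String[] types, String[] names) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        int next = 0;
        if (!isStatic) {
            slots.put("this", next++);   // hidden reference in entry 0
        }
        for (int i = 0; i < types.length; i++) {
            slots.put(names[i], next);
            boolean twoEntries = types[i].equals("long") || types[i].equals("double");
            next += twoEntries ? 2 : 1;
        }
        slots.put("<total>", next);
        return slots;
    }

    // Layout for runClassMethod(int i, long l, float f, double d, Object o, byte b).
    static Map<String, Integer> classMethod() {
        return layout(true,
                new String[] {"int", "long", "float", "double", "Object", "byte"},
                new String[] {"i", "l", "f", "d", "o", "b"});
    }

    // Layout for runInstanceMethod(char c, double d, short s, boolean b).
    static Map<String, Integer> instanceMethod() {
        return layout(false,
                new String[] {"char", "double", "short", "boolean"},
                new String[] {"c", "d", "s", "b"});
    }

    public static void main(String[] args) {
        System.out.println("runClassMethod:    " + classMethod());
        System.out.println("runInstanceMethod: " + instanceMethod());
    }
}
```

For the class method the parameters start at entries 0, 1, 3, 4, 6, and 7, for eight entries in total. For the instance method, this takes entry 0 and the parameters start at 1, 2, 4, and 5, for six entries in total.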
24. Resolution, Exceptions, and Abrupt Method Completion
- Resolution
  - The references to types, fields, and methods in the constant pool are initially symbolic
  - When the JVM first needs to use one of them, it is still in symbolic form, and the virtual machine must perform a resolution
  - Resolution is performed using data from the method area together with information obtained from the class loaders
- Exceptions
  - The JVM uses exception tables to handle exceptions
  - Each exception table entry describes a range within the bytecode of a method that is protected by a handler for a certain exception
  - An entry contains the starting and ending points of the range and a pointer to the exception handler
- Abrupt method completion
  - Every exception that a method does not catch causes that method to complete abruptly
  - The JVM uses the frame data in processing an abrupt method completion: it discards the current frame, restores the stack of the calling method, and rethrows the exception there. If no method on the stack handles the exception, the running thread terminates
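A short example makes the two cases concrete: one method with no handler, which completes abruptly, and a caller whose try block corresponds to a protected range with a handler. `ExceptionDemo` and its method names are hypothetical. The exception table itself can be inspected by running javap -c on the compiled class.

```java
class ExceptionDemo {

    // No handler here: an ArithmeticException makes this method complete
    // abruptly, and its frame is discarded.
    static int divide(int a, int b) {
        return a / b;
    }

    // The try block is a protected bytecode range; the catch clause is the
    // handler recorded for that range in this method's exception table.
    static int safeDivide(int a, int b) {
        try {
            return divide(a, b);
        } catch (ArithmeticException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeDivide(6, 3));   // normal completion
        System.out.println(safeDivide(1, 0));   // handled in the caller
    }
}
```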
25. Execution Engines and Execution Techniques
- The execution engine is part of the core of any JVM. Its specification is made up of the instruction set and states what an implementation should do, not how it should do it.
- Possible implementations can interpret, just-in-time compile, natively execute, or use a combination of these
- Each thread of a running Java application is a distinct instance of the execution engine in action
- Important aspects
  - The instruction set
  - Native method interaction
  - Execution order and optimization
26. The Instruction Set
- Each instruction is a one-byte opcode followed by zero or more operands
- The opcode indicates the operation, and the operands supply the data needed by the JVM to complete it. How many operands are needed is determined by the opcode itself
- The execution engine processes one opcode at a time
- When running, the execution engine has direct access to the current constant pool, the current frame, and the current operand stack
- The operand stack is part of the Java stack. It is organized as an array of words, accessed solely by push and pop operations, and used as a workspace for stack-based, register-like operations.
- All instructions in the JVM are associated with mnemonics, so a class file can be listed as an assembly-like language file
- To understand how the JVM works, we can look inside a class file using the javap program distributed with any Java 2 SDK
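The fetch-and-execute cycle over an operand stack can be sketched with a tiny interpreter for four instructions. This is an illustration, not how any real JVM is implemented. `MiniInterpreter` is a hypothetical name; the opcode values are the ones the JVM specification assigns to bipush, iadd, imul, and ireturn.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MiniInterpreter {
    // Opcode values as assigned by the JVM specification.
    static final int BIPUSH = 0x10;
    static final int IADD = 0x60;
    static final int IMUL = 0x68;
    static final int IRETURN = 0xAC;

    static int run(byte[] code) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xFF;              // one-byte opcode
            switch (opcode) {
                case BIPUSH:
                    operandStack.push((int) code[pc++]); // one signed-byte operand
                    break;
                case IADD:
                    operandStack.push(operandStack.pop() + operandStack.pop());
                    break;
                case IMUL:
                    operandStack.push(operandStack.pop() * operandStack.pop());
                    break;
                case IRETURN:
                    return operandStack.pop();
                default:
                    throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }

    // Bytecode for (2 + 3) * 4.
    static int example() {
        byte[] code = {
            BIPUSH, 2, BIPUSH, 3, IADD, BIPUSH, 4, IMUL, (byte) IRETURN
        };
        return run(code);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

Only bipush carries an operand here, and the interpreter knows to read it because of the opcode alone. The arithmetic instructions take their inputs from the operand stack and leave their result on it.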
27. Example A: class method, primitive types
28. Example B: class method, object types
29. Example C: instance method, primitive types
30. Native Methods Interaction
- The execution engine may be asked to invoke a native method
- Depending on the implementation, the virtual machine may or may not be able to invoke native methods
- Implementations that allow it provide an interface, the Java Native Interface (JNI). The execution engine must be able to invoke a native method, wait idle until the native method returns, and then continue executing bytecodes. It must also be able to deal with exceptions that come from the native method
- This running scheme adds a layer of complexity, because the native methods themselves need to be able to access information in the JVM while running native code
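From the Java side, a native method is just a declaration marked with the native modifier. The sketch below shows such a declaration and what happens when no native implementation is linked. `NativeDemo` and `addNatively` are hypothetical names, and no native library is assumed to exist.

```java
import java.lang.reflect.Modifier;

class NativeDemo {

    // Declared native: the body is expected to come from a native library.
    // No library is loaded in this sketch, so there is nothing to link to.
    static native int addNatively(int a, int b);

    // The native modifier is visible through reflection.
    static boolean isDeclaredNative() {
        try {
            int modifiers = NativeDemo.class
                    .getDeclaredMethod("addNatively", int.class, int.class)
                    .getModifiers();
            return Modifier.isNative(modifiers);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Invoking a native method that has no linked implementation fails
    // with UnsatisfiedLinkError.
    static boolean callFailsWithoutLibrary() {
        try {
            addNatively(1, 2);
            return false;
        } catch (UnsatisfiedLinkError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Declared native:            " + isDeclaredNative());
        System.out.println("Call fails without library: " + callFailsWithoutLibrary());
    }
}
```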
31. Execution Order and Optimizations
- Execution order
  - The execution engine is responsible for determining the next instruction to be executed
  - Generally the flow is straightforward: most instructions are executed in order
  - Instructions like goto and return use data to specify the next instruction
  - The only abnormal paths of execution arise from exception handling
- Optimizations
  - Interpretation: first-generation JVMs
  - Just-in-time compilation: second-generation JVMs
  - Adaptive optimization: the contemporary trend
  - Native execution: a form of JIT and adaptive optimization
32. Adaptive Optimization
- Implemented by most modern versions of the JVM, such as Sun's HotSpot virtual machine
- The trade-offs of pure interpretation and of pure just-in-time compilation are too extreme when either is applied exclusively
  - A purely interpreted program will be slow at runtime, but it takes no extra time to get started
  - JIT compilation allows fast execution, but delays the beginning of execution by the time needed to compile the bytecode to native code
- In adaptive optimization, the JVM takes advantage of information available at runtime and attempts to combine bytecode interpretation with compilation to native code.
33. Adaptive Optimization (continued)
- Based on a clever observation
  - Most programs spend 80 to 90 percent of their time executing 10 to 20 percent of the code
- The JVM
  - Begins by interpreting the bytecodes
  - Monitors the execution of that code
  - Identifies the hot spots of the code and starts a background thread to compile them to native code
  - Avoids premature optimization, which is typical of static compilers
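The interpret, monitor, then compile sequence can be mimicked with an invocation counter. This is a deliberately crude sketch and not how HotSpot works internally. `HotMethod` and its threshold are hypothetical, the "compiled" version is just a second function object, and the switch happens synchronously rather than on a background thread.

```java
import java.util.function.IntUnaryOperator;

class HotMethod {
    static final int THRESHOLD = 1000;            // hypothetical hotness threshold

    private final IntUnaryOperator interpreted;   // stands in for bytecode interpretation
    private final IntUnaryOperator compiled;      // stands in for native code
    private int invocations = 0;
    private boolean usingCompiled = false;

    HotMethod(IntUnaryOperator interpreted, IntUnaryOperator compiled) {
        this.interpreted = interpreted;
        this.compiled = compiled;
    }

    int invoke(int arg) {
        // Monitor execution: count calls until the method is considered hot.
        if (!usingCompiled && ++invocations >= THRESHOLD) {
            usingCompiled = true;                 // "compile" the hot spot
        }
        return (usingCompiled ? compiled : interpreted).applyAsInt(arg);
    }

    boolean isCompiled() {
        return usingCompiled;
    }

    // Returns true if the switch happens exactly at the threshold.
    static boolean demo() {
        HotMethod doubler = new HotMethod(x -> x * 2, x -> x << 1);
        for (int i = 0; i < THRESHOLD - 1; i++) {
            doubler.invoke(i);
        }
        boolean coldBefore = !doubler.isCompiled();
        int result = doubler.invoke(21);
        return coldBefore && doubler.isCompiled() && result == 42;
    }

    public static void main(String[] args) {
        System.out.println("Switched at threshold: " + demo());
    }
}
```

Code that never reaches the threshold is never compiled, which is how this scheme avoids spending compilation time on the 80 to 90 percent of the code that rarely runs.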
34. Adaptive Optimization (continued)
- Too good to be true? Correct!
  - There are some issues with this approach, and depending on how well they are dealt with, one implementation can differ greatly in performance from another
- Known issues
  - Adaptive optimization does not work well across method invocations.
  - Inlining? This has issues of its own in the presence of polymorphism
  - Going in and out of the hot spot
35. Summary
36. Practice
- Play with javap to determine:
  - A) The assembly-like listing of a compiled class, using javap -c
  - B) The method signatures (public and protected) of a compiled class, using javap -s
  - C) The complete profile (including the constant pool) of a compiled class, using javap -verbose
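A small class is enough to try all three options. The class below is a hypothetical example, not one from the slides. Save it as Adder.java, compile it with javac Adder.java, and then run javap -c Adder, javap -s Adder, and javap -verbose Adder.

```java
class Adder {

    // A class method on primitive types: small enough that its bytecode
    // listing fits on a few lines.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

With typical compilers, the listing of add consists of two loads from the local variables, an iadd, and an ireturn, although the exact output format depends on the JDK version.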