Title: Java for High Performance and Distributed Computing
1Java for High Performance and Distributed
Computing
- Prof Mark Baker
- School of Systems Engineering
University of Reading - Tel: +44 118 378 8615
E-mail: Mark.Baker_at_computer.org - Web: http://acet.rdg.ac.uk/mab
2Outline
- Introduction.
- The Pros and Cons of Java.
- Java Applications
- Message Passing in Java (MPJ Express),
- Application Services (Tycho).
- Some lessons learnt!
- Conclusions.
3Introduction
- Java is a modern, object-oriented language based on open public standards.
- Object orientation allows a degree of modularity, which makes tasks easier to understand and the code more maintainable.
- The paradigm makes it possible to distribute code with a consistent, public API, while keeping the implementation details private.
- The core Java API is extensive and includes standard packages for threads, sockets, Internet access, security, graphics, sound, and other useful areas.
- This means that Java programs that use these standard packages can execute unchanged on heterogeneous platforms.
4The Pros and Cons of Java - Advantages
- Simplicity
- Java is considered a simple and easy-to-use object-oriented language when compared to other popular languages, such as C or C++.
- Partially modelled on C++.
- Multiple inheritance is replaced with a structure called an interface.
- The use of pointers is eliminated, which removes the possibility of a multitude of errors.
- In Java, memory management is automatic, and many errors, such as buffer overflows and stray pointers, are impossible.
- Another reason why Java is considered simpler than C++ is that it uses automatic memory allocation and garbage collection, whereas C++ requires the programmer to allocate and reclaim memory.
5The Pros and Cons of Java - Advantages
- Reliability
- Java is considered more reliable because it has integrated exception handling, which deals with error conditions and systematically forces the programmer to take the necessary action to handle errors.
- Exception handlers can be written to catch a specific exception, such as a number format exception, or an entire group of exceptions by using a generic exception handler.
- Any exception not specifically handled within a program is caught by the Java run-time environment itself.
- C has essentially no run-time error checking, and memory allocation and reclamation are manual.
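The two handler styles mentioned above can be sketched as follows. This is a minimal illustration only; the class and method names (ExceptionDemo, parseOrDefault) are hypothetical and not taken from the slides.

```java
class ExceptionDemo {
    /** Parses text as an int; returns fallback when parsing is not possible. */
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            // Specific handler: only malformed numbers end up here.
            return fallback;
        } catch (RuntimeException e) {
            // Generic handler: catches the rest of the group, for example
            // the NullPointerException raised when text is null.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", -1));        // prints 42
        System.out.println(parseOrDefault("forty-two", -1)); // prints -1
        System.out.println(parseOrDefault(null, -1));        // prints -1
    }
}
```

The specific handler is listed first because Java matches catch clauses in order and rejects a clause that an earlier, more general one would already cover.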
6The Pros and Cons of Java - Advantages
- Security
- A mature security model, which has been extensively tested by the community at large.
- At its core, the language itself is type-safe and provides automatic garbage collection, enhancing the robustness of application code.
- A secure class loading and verification mechanism ensures that only legitimate code is executed.
- There is a large set of APIs, tools, and implementations of commonly used security algorithms, mechanisms, and protocols.
- This provides the developer with a comprehensive security framework for writing applications, and also provides the user or administrator with a set of tools to securely manage applications.
7The Pros and Cons of Java - Advantages
- Threads
- Built-in support for threading.
- Threads were designed into the language from the start; they are simple to use, and increasingly needed with the rapid take-up of multi-core processors.
- With threading comes the need to provide concurrency control, which should prevent race conditions, interference and deadlock.
- Java has comprehensive support for general-purpose concurrent programming, such as task scheduling, concurrent collections, atomic variables, synchronizers, locks, and nanosecond-granularity timing.
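A few of the facilities listed above (task scheduling on a thread pool, an atomic variable, and nanosecond timing) can be combined in a short sketch. It assumes a Java 8 or later JVM for the lambda syntax; the class and method names are illustrative, not from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

class ConcurrencyDemo {
    /** Sums 1..n by splitting the range across a fixed pool of worker threads. */
    static long parallelSum(int n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicLong total = new AtomicLong(); // atomic variable: no explicit lock needed
        try {
            List<Future<?>> pending = new ArrayList<>();
            int chunk = (n + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int lo = w * chunk + 1;
                final int hi = Math.min(n, (w + 1) * chunk);
                pending.add(pool.submit(() -> {
                    long local = 0;
                    for (int i = lo; i <= hi; i++) local += i;
                    total.addAndGet(local); // one atomic update per task
                }));
            }
            for (Future<?> f : pending) f.get(); // wait for every task to finish
            return total.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime(); // nanosecond-granularity timer
        long sum = parallelSum(1_000_000, 4);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("sum=" + sum + " in " + elapsedMicros + " us");
    }
}
```

Each task accumulates into a local variable and touches the shared counter once, which keeps contention on the atomic variable low.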
8The Pros and Cons of Java - Advantages
- Memory Allocation
- When Java applications create objects, the JVM allocates memory for their storage; when an object is no longer needed, its memory is reclaimed for later use.
- Garbage collection in Java operates incrementally on separate generations of objects rather than on all objects every time.
- The latest version of Java adds the ability to customise the way memory is recovered, and this is helping dispel the idea that interpreted languages are slow.
9The Pros and Cons of Java - Advantages
- Internet Aware
- Designed to be Internet aware, and to support network programming with built-in support for sockets, IP addresses, URLs and HTTP.
- Java natively includes support for other protocols, including Remote Method Invocation (RMI) and those found in CORBA and Jini.
- Documentation
- Java has built-in support for comment-based documentation.
- The source code file can also contain its own documentation, which is stripped out and reformatted into HTML via a separate program, javadoc.
- This way API documents can be created, and this is a boon for documentation maintenance and use.
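As an illustration of comment-based documentation, the sketch below shows a small class with javadoc comments. The class itself (VectorMath) is a made-up example, not code from the slides.

```java
/**
 * Utility methods for simple vector arithmetic.
 */
class VectorMath {
    /**
     * Computes the dot product of two vectors.
     *
     * @param a first vector
     * @param b second vector, the same length as {@code a}
     * @return the sum of the element-wise products
     * @throws IllegalArgumentException if the lengths differ
     */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(dot(new double[] {1, 2, 3}, new double[] {4, 5, 6}));
    }
}
```

Running `javadoc -package VectorMath.java` should generate HTML pages from these comments; the `-package` option is needed here only because the example class is not public.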
10The Pros and Cons of Java - Advantages
- Code Execution
- Java compilers, interpreters, and runtime systems have come a long way.
- Today, the execution of well-written Java code can be on a par with well-written C or C++ code.
- Most Java code is executed by a JVM, which can be an interpreter, a JIT (Just-In-Time) compiler, or an adaptive optimising system such as HotSpot.
- Java applications can execute with little or no change on multiple hardware platforms where a compliant JVM exists.
- This is a compelling argument for using Java, as it hides the heterogeneous nature of distributed systems and promotes the ideal of "write once, run anywhere".
11The Pros and Cons of Java - Advantages
- Performance
- Depends on a number of factors, including coding style and efficiency, the version of the JVM, the underlying operating system, and the memory available.
- Java is now nearly equal to (or faster than) C on low-level and numeric benchmarks; this is no shock, as Java is a compiled language, via the JIT compiler.
- The authors conclude, "On Intel Pentium hardware, especially with Linux, the performance gap is small enough to be of little or no concern to programmers."
12The Pros and Cons of Java - Advantages
- Development Tools
- There are a huge number of Java development tools, for example the Eclipse platform, as well as a large amount of open source and free software that has been made available by the community of programmers.
- If nothing else, this software can be a starting place to develop new ideas and more sophisticated applications.
- Programmer Productivity
- Programmer productivity is believed to be at least two times greater with Java.
- A lot can be done in a short amount of time with Java, because there is an extensive library of functions already built into the language, integrated development environments, and a wide selection of supporting tools.
13The Pros and Cons of Java - Disadvantages
- Memory Management
- There are no destructors in Java.
- There is no "scope" of a variable per se to indicate when the object's lifetime has ended; the lifetime of an object is determined instead by the garbage collector.
- All objects in C++ will be (or rather, should be) destroyed, but not all objects in Java are garbage collected.
- The Java garbage collector can be changed, but there is no explicit control over object collection.
14The Pros and Cons of Java - Disadvantages
- Arrays
- Although arrays in Java look similar, they have a very different structure and behaviour than they do in C/C++.
- There is a read-only length member (the size of the array), and run-time checking throws an exception if you go out of bounds.
- In Java, a two-dimensional array is an array of one-dimensional arrays.
- Although one may expect the elements of a row to be stored contiguously, one cannot depend upon the rows themselves being stored contiguously.
- In fact, there is no way to check whether rows have been stored contiguously after they have been allocated.
- The possible non-contiguity of rows implies that the effectiveness of block-oriented algorithms may depend on the particular implementation of the JVM as well as the current state of the memory manager.
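The array behaviour described above can be seen in a short sketch. Storing a matrix row-major in a single one-dimensional array is a common workaround when contiguity matters; it is shown here as one option, not as something the slides prescribe. All names are illustrative.

```java
class ArrayLayoutDemo {
    /** Copies a rectangular double[][] into one contiguous row-major 1D array. */
    static double[] flatten(double[][] m) {
        int rows = m.length;
        int cols = m[0].length;
        double[] flat = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(m[r], 0, flat, r * cols, cols);
        }
        return flat;
    }

    /** Reads element (row, col) from a row-major 1D array with cols columns. */
    static double get(double[] flat, int cols, int row, int col) {
        return flat[row * cols + col];
    }

    /** Returns true if reading index i of a triggers the run-time bounds check. */
    static boolean outOfBounds(double[] a, int i) {
        try {
            double unused = a[i];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Each row is a separate object, so rows may even differ in length.
        double[][] jagged = new double[2][];
        jagged[0] = new double[3];
        jagged[1] = new double[5];
        System.out.println(jagged[0].length + " " + jagged[1].length); // prints 3 5

        double[] flat = flatten(new double[][] {{1, 2}, {3, 4}});
        System.out.println(get(flat, 2, 1, 0));   // prints 3.0
        System.out.println(outOfBounds(flat, 4)); // prints true
    }
}
```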
15The Pros and Cons of Java - Disadvantages
- Floating Point arithmetic
- There is a floating-point issue because Java programs are required to produce bitwise identical floating-point results in every JVM.
- This ideal inhibits efficient floating-point processing on some platforms.
- For example, it prevents the efficient use of floating-point hardware on processors that utilise extended precision in registers.
16The Pros and Cons of Java - Disadvantages
- Accessing non-Java resources
- Java has a problem with accessing resources outside the JVM, such as directly accessing hardware.
- This is overcome using the Java Native Interface (JNI), which allows calls to functions written in another language (C and C++ are supported).
- Thus, you can always solve a platform-specific problem (in a relatively non-portable fashion, but then that code is isolated).
- This approach does not comply with the "write once, run anywhere" philosophy and breaks the programming model because there is no way to ensure code type safety.
- There are performance overheads too, especially for large messages, due to copying of the data from the JVM's heap onto the system buffer.
- JNI can lead to memory leaks because the programmer is responsible for allocating and freeing the memory.
- Finally, accessing languages other than C/C++, such as Fortran or Delphi, requires a C/C++ wrapper.
17Java-based Application Support
18MPJ Express - http://acet.rdg.ac.uk/projects/mpj
- A Java Message Passing System
19Introduction
- Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing HPC applications on clusters and Massively Parallel Processors (MPPs).
- The emergence of SMP clusters and multi-core processors presents a new challenge for established parallel programming paradigms, including those based on MPI.
- The adoption of traditional languages for HPC is largely a matter of economics.
- Creating entirely novel development environments for new languages, matching the standards programmers expect today, is expensive.
- Also, contemporary parallel architectures predominantly use off-the-shelf microprocessors that are best exploited using off-the-shelf compilers.
20Introduction
- As mentioned in earlier slides:
- Besides lacking native support for threads, C and Fortran lag behind more modern languages in concerns like object-orientation, portability, modularity, and maintainability.
- Applications written in C and Fortran can be error-prone, due to limited compile-time and run-time error checking.
- Compared to more modern languages, productivity may be significantly reduced.
21Java Messaging
- Introduction
- As a response to the appearance of several prototype MPI-like libraries, the Message-Passing Working Group of the Java Grande Forum was formed in late 1998.
- This working group came up with an initial draft of an API, which was distributed at Supercomputing in 1998.
- Two APIs, mpiJava and MPJ, have been proposed.
- The main difference lies in the naming conventions of variables and functions.
- There have been various efforts over the last decade to develop a Java messaging system; they follow one of three approaches:
- JNI to interact with an underlying native MPI runtime,
- Java messaging from scratch using the likes of RMI,
- Communications using Java sockets.
- Experience gained with these implementations suggests that there is no universal approach that satisfies the often conflicting requirements of end users.
22Java Messaging - Paradox
- The fastest implementation is via JNI using the likes of mpiJava.
- This introduces various issues:
- Compatibility with underlying MPI,
- Breaks programming model,
- Compromises portability!
- Using 100% pure Java ensures portability.
- But it might not provide the most efficient solution, especially in the presence of commodity high-performance hardware.
- It is important to address these contradictory requirements of portability and high performance in the design of Java messaging systems.
23MPJ Express
- To address this, MPJE (MPJ Express) has been developed; the project started in 2004 and was released in September 2006.
- It is an implementation of an MPI-like API for Java.
- The higher-level concepts, such as communicators (inter and intra), virtual topologies, and derived datatypes, have been implemented in pure Java.
- MPJE uses a unique buffering API (mpjbuf) that allows explicit memory management at the application level.
- Both Java and JNI communication devices can use this buffering layer, by making use of direct byte buffers for the actual storage.
- Currently, two communication devices have been implemented:
- niodev, based on Java NIO (New I/O),
- mxdev, based on Myrinet eXpress (MX).
- MPJE also includes a runtime system that permits bootstrapping of processes across remote nodes.
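To give a feel for what "direct byte buffers" means, the sketch below packs doubles into a direct buffer from the standard java.nio package and reads them back. It is not the mpjbuf API; the class and method names are hypothetical and only the underlying storage mechanism is illustrated.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class DirectBufferDemo {
    /** Packs values into a newly allocated direct buffer, flipped ready for reading. */
    static ByteBuffer pack(double[] values) {
        // A direct buffer lives outside the garbage-collected heap, so native
        // I/O code can use it without an extra copy.
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Double.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (double v : values) {
            buf.putDouble(v);
        }
        buf.flip(); // switch from writing to reading
        return buf;
    }

    /** Reads every remaining double out of the buffer. */
    static double[] unpack(ByteBuffer buf) {
        double[] out = new double[buf.remaining() / Double.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getDouble();
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = pack(new double[] {1.5, -2.0, 3.25});
        System.out.println("direct=" + buf.isDirect() + " bytes=" + buf.remaining());
        for (double d : unpack(buf)) {
            System.out.println(d);
        }
    }
}
```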
24MPJ Express - Architecture
- The high and base levels rely on the mpjdev and xdev levels for actual communications and interaction with the underlying networking hardware.
- Two implementations of mpjdev are envisaged:
- JNI wrappers to native MPI implementations,
- Use of a lower-level device, called xdev, to provide access to Java sockets or specialised communication libraries.
25MPJE - Performance
[Figure panels: Latency, Bandwidth]
26MPJ Express Meets Gadget
- To help establish the practicality of real scientific computing using MPJE, the parallel cosmological simulation code, Gadget-2, was ported from C to Java.
- Gadget-2 is a massively parallel structure formation code developed by Volker Springel at the Max Planck Institute for Astrophysics.
- Versions of Gadget-2 have been used in various research papers in the astrophysics literature, including the noteworthy Millennium Simulation, the largest ever model of the Universe.
27J-Gadget
- Gadget-2 was manually translated to Java.
- The data structures were deliberately kept similar so that a cross-reference to the original source code could be made for debugging purposes.
- Gadget-2 uses the GNU Scientific Library (GSL), a parallel version of the Fastest Fourier Transform in the West (FFTW), and an MPI library.
- We used the Barnes-Hut tree algorithm for calculating gravitational forces, and for communication we used MPJE.
- The main simulation loop involves calculating gravitational forces for each particle in the simulation and updating their accelerations.
- This is the most compute-intensive task in the simulation.
28Performance
- Java Gadget can achieve around 70% of the performance of the C version.
- This is acceptable performance given that Java has many extra safety features, including array bounds checking.
- The comparison is between a production-quality C code and a Java code that could be optimized further.
- The performance of Java Gadget-2 reinforces our belief that Java is a viable option for HPC.
29Multi-threaded Performance
- The emergence of HPC systems based on multi-core processors presents a new challenge for existing parallel programming paradigms, including MPI.
- An efficient way to program such hardware is to use MPJ Express to implement nested parallelism, by which we mean the mixed use of multi-threading and messaging.
- Messaging libraries, including MPICH2 and OpenMPI, can handle this using hybrid application code or by providing shared memory devices, but the inherent multi-threaded nature of Java opens up new opportunities.
- Use of threads is not restricted to performing computation concurrently on multiple cores in a system.
- Thread-safe messaging also provides an opportunity to perform concurrent communication in multiple threads.
30Integrating MPJE and Java OpenMP
- The approach we took was to write a hybrid application using the MPJ Express and JOMP libraries.
- The application code is saved with a .jomp extension.
- This source file was translated to a .java file and compiled to produce class files.
- To execute JOMP-based applications with our runtime, we passed the command-line switch jomp.threads and specified JOMP's JAR file to mpjrun.
31Performance of Multi-threaded Gadget
- The Java version of Gadget with 4 JOMP threads shows the advantages of our approach of using thread-level parallelism.
- Using JOMP threads has increased the performance of the Java code by a factor of 2 to 3, depending on the total number of processors used.
32MPJE - Summary
- MPJ Express is an MPI-like Java messaging system.
- Thread safe, with a unique buffer management system.
- Pure Java system that can use fast proprietary networks too.
- The latency and bandwidth of MPJE are not far off those of native C.
- The performance of Java Gadget-2 reinforces our belief that Java is a viable option for HPC.
- With careful programming, it is possible to achieve performance in the same general ballpark as C code.
- The system allows nested parallelism, and this provides significant performance improvements.
33MPJE - Summary
- MPJE is also being used in an increasing number of projects, including:
- MoSeS (Modelling and Simulation for e-Social Science) at Leeds,
- CartaBlanca (Sandia), which simulates non-linear physics on unstructured grids,
- Evolving genetic programming agents (Santa Fe Institute),
- Parallelising dynamic programming algorithms in computational biology using block-cyclic based wave-front patterns (University of Northern British Columbia),
- P2PMPI (University of Strasbourg, France).
34James Gosling Says
35MPJE - Links
http://mpj-express.org/
36Tycho - http://acet.rdg.ac.uk/projects/tycho
- An Asynchronous Messaging System with an
Integrated Virtual Registry
37GridRM
- Original Motivation
- We were developing a Grid-based monitoring system (see gridrm.org).
- A global layer of peer-related gateways,
- Which in turn have a local layer that interacts with the local data sources and/or a hierarchy of child gateways.
- We wanted software to bind together the global gateways!
38Tycho
- Needed a lightweight implementation of the OGF's Grid Monitoring Architecture in Java for GridRM.
- There were other systems around:
- R-GMA,
- pyGMA,
- Autopilot, MDS, NWS, CODE.
- Found that existing systems were heavyweight, complex and/or not standalone, and hard to use.
- Decided to produce our own version.
- Aims:
- So-called GMA compliant!
- Easy to install and use,
- Easy to program and extend,
- Java-based,
- Small memory footprint.
39Tycho
- Tycho is based on a publish, subscribe and bind paradigm (Service Oriented Architecture).
- Design Philosophy
- We believed that the system should have an architecture similar to the Internet, where every node provides reliable core services, and the complexity is kept, as far as possible, to the edges.
- The core services can be kept to the minimum, and endpoints can provide higher-level and more sophisticated services that may fail but will not cause the overall system to crash.
- We have kept Tycho's core small, simple and efficient, so that it has a minimal memory footprint, is easy to install, and is capable of providing robust and reliable services.
- More sophisticated services can then be built on this core and are provided via libraries and tools to applications.
- Tycho is flexible and extensible, so that it is possible to incorporate additional features and functionality easily.
40Tycho Architecture
- Tycho consists of the following components:
- Mediators that allow producers and consumers to discover each other and establish remote communications,
- Consumers that typically subscribe to receive information or events from producers,
- Producers that gather and publish information for consumers.
- There is an asynchronous messaging API.
- In Tycho, producers and/or consumers (clients) can publish their existence in a Virtual Registry (VR).
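The roles above (mediator, producer, consumer, registry, asynchronous delivery) can be mimicked in a single process with standard Java classes. This is a toy sketch under stated assumptions: MiniMediator and its methods are hypothetical names and are NOT the Tycho API, and there is no networking involved.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical in-process mediator; NOT the Tycho API. */
class MiniMediator {
    // Registry: topic name -> consumers that have registered interest in it.
    private final Map<String, List<Consumer<String>>> registry = new ConcurrentHashMap<>();
    // Deliveries run on a separate thread, so publish() returns immediately.
    private final ExecutorService delivery = Executors.newSingleThreadExecutor();

    /** A consumer registers (subscribes) under a topic. */
    void subscribe(String topic, Consumer<String> consumer) {
        registry.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(consumer);
    }

    /** A producer publishes a message; it is handed to each subscriber asynchronously. */
    void publish(String topic, String message) {
        List<Consumer<String>> subscribers =
                registry.getOrDefault(topic, Collections.emptyList());
        for (Consumer<String> c : subscribers) {
            delivery.submit(() -> c.accept(message));
        }
    }

    /** Stops accepting messages and waits for queued deliveries; true on clean exit. */
    boolean shutdown() {
        delivery.shutdown();
        try {
            return delivery.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        MiniMediator mediator = new MiniMediator();
        mediator.subscribe("cpu-load", msg -> System.out.println("consumer got " + msg));
        mediator.publish("cpu-load", "0.42");
        mediator.shutdown();
    }
}
```

A real system would place the registry and the delivery path behind network endpoints; the sketch only shows how the three roles relate to one another.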
41Tycho Messaging Tests
- Performance Tests of Tycho against NaradaBrokering
42Performance Tests (Messaging)
- NaradaBrokering
- The NaradaBrokering framework is a distributed messaging infrastructure, developed by the Community Grids Lab at Indiana University.
- NaradaBrokering is an asynchronous messaging infrastructure with a publish and subscribe based architecture.
- NaradaBrokering is Sun JMS compliant. This messaging standard allows application components to exchange unified messages in an asynchronous system.
43Summary - Tycho vs NB
- Latency and throughput
- When looking at end-to-end performance on a LAN for messages of less than 2 Kbytes, Tycho and NaradaBrokering have comparable performance,
- Tycho achieves 95% bandwidth, whereas NB achieves only 65.3%,
- Tycho's performance is inhibited by the fact that it creates a new socket for each message sent,
- NB reuses socket instances once they have been created.
- Scalability Summary
- The scalability tests have shown Tycho and NaradaBrokering producers and consumers to be stable under heavy load,
- Performance is weaker when there is a large ratio of consumers to producers.
- Heap Size
- The heap size for NB becomes a limiting factor when a broker is receiving messages faster than it can send them, as the internal message buffer fills until the heap is consumed and the system fails,
- Tycho tests were performed without modifying the heap size; throttling is used, as there is limited buffering.
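The heap-size point can be illustrated with a bounded buffer: once it is full, the sender is told to back off instead of the buffer growing until the heap is exhausted. This is a generic sketch of that idea, assuming nothing about how Tycho or NaradaBrokering actually implement buffering; ThrottledBuffer is a hypothetical name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ThrottledBuffer {
    // Fixed capacity: memory use is bounded however fast messages arrive.
    private final BlockingQueue<byte[]> queue;

    ThrottledBuffer(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking enqueue; returns false when the buffer is full and the sender must back off. */
    boolean tryEnqueue(byte[] message) {
        return queue.offer(message);
    }

    /** Removes and returns the next message, or null if none is waiting. */
    byte[] dequeue() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        ThrottledBuffer buffer = new ThrottledBuffer(2);
        System.out.println(buffer.tryEnqueue(new byte[1024])); // prints true
        System.out.println(buffer.tryEnqueue(new byte[1024])); // prints true
        System.out.println(buffer.tryEnqueue(new byte[1024])); // prints false (full)
        buffer.dequeue();                                      // receiver drains one
        System.out.println(buffer.tryEnqueue(new byte[1024])); // prints true
    }
}
```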
44Tycho Registry Tests
- Performance Tests against Globus's MDS4 and gLite's R-GMA
45Registries
- MDS4
- The Globus Toolkit's Monitoring and Discovery Service (MDS version 4) is a Web Services Resource Framework (WSRF) based implementation of a wide-area information and registry service.
- MDS4 provides a framework that can be used to collect, index and expose data about the state of grid resources and services.
- Typical uses for MDS4 include making resource data available for decision making in job submission services or notifying an administrator when storage space is running low on a cluster.
46Registries
- R-GMA
- A Java-based implementation of the Grid Monitoring Architecture (GMA) for publishing network monitoring information over the wide area and as an information service.
- R-GMA uses a relational model, with an SQL-like API, to search and describe the monitoring information it collects.
- It is based on a consumer/producer paradigm, with client data being stored in a directory service, which presents it as a virtual database.
- R-GMA uses the term tuples for sets of data being published or consumed.
47Tycho compared to R-GMA and MDS4
- For the tests we created a set of randomly generated strings to act as attributes for records to be inserted into the registries.
- A single record, with no mark-up, had an average size of 114 bytes.
- Two tests were used to assess the performance of the registries:
- S1 simulates a client searching the registry for records matching some known attributes.
- Systematic queries are generated using a function to select a record name at random from the test data, to guarantee the query will match only one record.
- S2 measures the worst-case scenario of the client requesting all of the records from within the registry.
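The shape of the two tests can be sketched against a plain in-memory map standing in for a registry. Everything here is an assumption made for illustration: the record naming scheme, the 32-character attribute strings, and the map itself. None of it reproduces the actual test harness or any registry's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class RegistryQuerySketch {
    /** Builds n records keyed by a unique name, each holding a random attribute string. */
    static Map<String, String> buildRegistry(int n, long seed) {
        Random rnd = new Random(seed);
        Map<String, String> registry = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            registry.put("record-" + i, randomString(rnd, 32));
        }
        return registry;
    }

    /** Generates a random lower-case string of the given length. */
    static String randomString(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }

    /** S1: select one known record name at random, so exactly one record matches. */
    static List<String> queryS1(Map<String, String> registry, Random rnd) {
        String name = "record-" + rnd.nextInt(registry.size());
        List<String> matches = new ArrayList<>();
        String value = registry.get(name);
        if (value != null) {
            matches.add(value);
        }
        return matches;
    }

    /** S2: worst case, request every record in the registry. */
    static List<String> queryS2(Map<String, String> registry) {
        return new ArrayList<>(registry.values());
    }

    public static void main(String[] args) {
        Map<String, String> registry = buildRegistry(100_000, 42L);
        long t0 = System.nanoTime();
        int s1 = queryS1(registry, new Random(7L)).size();
        long t1 = System.nanoTime();
        int s2 = queryS2(registry).size();
        long t2 = System.nanoTime();
        System.out.println("S1 matched " + s1 + " in " + (t1 - t0) / 1_000 + " us");
        System.out.println("S2 matched " + s2 + " in " + (t2 - t1) / 1_000 + " us");
    }
}
```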
48Tycho compared to R-GMA and MDS4
[Figure: Query response time versus the number of records for a query selecting a single random record from the registry (S1)]
[Figure: Query response time versus the number of records for a query that selects all the records from the registry (S2)]
49Tycho compared to R-GMA and MDS4
- When testing the effect of the number of records on response time, we see that when selecting a single record from 100,000:
- Tycho is 32 seconds faster than R-GMA,
- MDS4 runs out of heap space for larger record sizes.
- We also ran tests where there were multiple clients accessing the registries.
- Again, Tycho's VR had a lower response latency than R-GMA and MDS4.
- With 100 clients, Tycho was 94 seconds faster than R-GMA and 65 seconds faster than MDS4.
- The results highlight that one of the strengths of our implementation is its performance under load.
- Tycho's performance is linear with regard to both increasing numbers of clients and response sizes.
50Tycho Applications
- Tycho's functionality has all been incorporated within a single Java JAR and requires just the Java 1.5 JDK for building and running applications:
- Uses the embedded Jetty application server,
- Hypersonic DB.
- Tycho is now used in:
- Swarm system, a BitTorrent-like system (more in coming slides),
- GridRM,
- Slogger (Semantic Log Analyser),
- VOTechBroker,
- FP6 SORMA project (GridRM and messaging layer).
- Others to come:
- Events/Transactions, Computational Steering.
51Content Distribution With Tycho
- We wanted to develop a Tycho utility that would demonstrate and validate the utility concept.
- We created a content distribution system called the Tycho swarm utility.
- The swarm utility provides content distribution similar to BitTorrent and overcomes the common 2 Gigabyte file size problem.
- Content is split into chunks, and the virtual registry is used to store chunk availability.
- Peers use the virtual registry to locate each other and decide which chunks to download.
- Tycho messages are used to transfer the chunks between peers, and peers cooperate to distribute the content throughout the swarm.
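The chunking and availability bookkeeping described above can be sketched with standard collections. The map below merely stands in for the virtual registry, and all names (SwarmSketch, announce, wanted) are hypothetical; the real swarm utility's data model is not shown in the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SwarmSketch {
    /** Splits content into chunks of at most chunkSize bytes. */
    static List<byte[]> split(byte[] content, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < content.length; off += chunkSize) {
            int end = Math.min(content.length, off + chunkSize);
            chunks.add(Arrays.copyOfRange(content, off, end));
        }
        return chunks;
    }

    // Stand-in for the registry: chunk index -> peers known to hold that chunk.
    private final Map<Integer, Set<String>> availability = new HashMap<>();

    /** A peer announces that it holds the given chunk. */
    void announce(String peer, int chunkIndex) {
        availability.computeIfAbsent(chunkIndex, k -> new HashSet<>()).add(peer);
    }

    /** Chunks the peer still lacks that at least one other peer can supply. */
    Set<Integer> wanted(String peer) {
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> e : availability.entrySet()) {
            if (!e.getValue().contains(peer)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> chunks = split(new byte[10], 4);
        System.out.println(chunks.size() + " chunks"); // prints 3 chunks

        SwarmSketch swarm = new SwarmSketch();
        for (int i = 0; i < chunks.size(); i++) {
            swarm.announce("seed", i);
        }
        swarm.announce("peerB", 0);
        System.out.println(swarm.wanted("peerB")); // prints [1, 2]
    }
}
```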
52Swarm Utility Architecture
53Swarm Utility - Summary.
- The utility was developed to test the potential of Tycho utilities and also to further stress test the overall infrastructure:
- By simultaneously utilising the VR and messaging functionality,
- Storing and updating thousands of entry records in the VR,
- Sending thousands of multi-megabyte messages between clients.
- Its potential uses include:
- Distributing files for collaboration purposes,
- Staging data for computation,
- Mirroring and managing large data sets.
54Some Conclusions
- We designed Tycho to have a relatively small, simple and efficient core, so that it has a minimal memory footprint, is easy to install, and is capable of providing robust and reliable services.
- More sophisticated services can then be built on this core and be provided via libraries and tools to applications.
- This provides us with a flexible and extensible framework where it is possible to incorporate additional features and functionality, which are created as producers or consumers and do not affect the core.
- Performance
- Tycho's performance is comparable to that of NaradaBrokering, a more mature system.
- Certain features of NaradaBrokering are superior to those of Tycho, but its memory utilisation and indirect communications are limiting features.
- Compared to MDS4 and R-GMA, Tycho shows superior performance and scalability.
- In addition, we would argue that both MDS4 and R-GMA have problems with memory utilisation and, without significant extra effort, limited scalability.
55Some Lessons Learnt!
- In general
- Packages should be simple to install, configure and use:
- Limit external dependencies - attempt to embed extras,
- Possible to install in user space - avoid need to be root!
- Use a sensible extensible architecture:
- We believe in the internet-type architecture,
- Stable core, with complexity at the edges.
- Security by default.
- Keep communication simple:
- HTTP(S) and XML work fine!
56Some Lessons Learnt!
- Java Development
- Necessary to manage one's own memory:
- Cannot guarantee behaviour of GC!
- An embedded application server, such as Jetty, has far superior performance to Tomcat,
- In-memory storage of data structures is not always sensible - Java databases, such as Hypersonic, are overall as fast to use for storing this data (persistently),
- Thread safety is very important,
- Efficient thread use and throttling algorithms for messaging.
57Overall Conclusions
- Discussed the pros and cons of Java - not all cream, but more advantages than disadvantages.
- Even though sceptics may still feel that Java is less than ideal, the community of middleware and application developers is increasingly using Java and its related technologies to produce quality software.
- Java has an extensive number of features, libraries and tools that provide a rich development environment.
- This uptake is no doubt assisted by the fact that Java is often the first language learnt by students in educational and other institutions.
- Looked at some Java-based middleware:
- MPJ Express, a thread-safe MPI-like binding for Java.
- Tycho, a combined wide-area messaging framework with a built-in distributed registry (VR).
- All free to download, supported, and under GP License.
58Project Links
- Downloads
- Tycho - http://acet.rdg.ac.uk/projects/tycho/
- MPJE - http://acet.rdg.ac.uk/projects/mpj/
- GridRM - http://gridrm.org/
59Shameless Plug
http://www.amazon.co.uk/exec/obidos/ASIN/0470094176/qid3D1113207878/202-7878523-7639008
60Thank you for listening