Title: Parallel and Distributed Computing with Java
1Parallel and Distributed Computing with Java
- Prof Mark Baker
- ACET, University of Reading
- Tel: +44 118 378 8615
- E-mail: Mark.Baker_at_computer.org
- Web: http://acet.rdg.ac.uk/mab
2Outline
- The Advantages of Java.
- Popular Java Applications.
- Java Messaging - MPJ Express:
  - Introduction,
  - Architecture,
  - Performance,
  - Gadget,
  - Conclusions.
- Wide-area messaging and registry - Tycho:
  - Motivation,
  - Architecture,
  - Comparative tests,
  - Conclusions.
- Summary and conclusions.
3Introduction
- Java first came to public attention in 1995; within a year, it was being speculated that it would be a good language for parallel and distributed computing.
- Its core features, including being object-oriented and platform independent, as well as having built-in network support and threads, have encouraged this view.
- Today, Java is being used in almost every type of computer-based system:
  - From sensor networks to high performance computing platforms, and
  - From enterprise applications through to complex research-based simulations.
4Some Advantages of Java
- More reliable:
  - C has essentially no runtime error checking, and memory allocation and release are manual.
  - In Java, memory management is automatic, and many errors, such as buffer overflows and stray pointers, are impossible.
  - Built-in exception handling, which C lacks completely.
- More secure:
  - A robust and mature security model, which has been extensively tested by the community at large.
- Java compilers, interpreters, and runtime systems have come a long way in the last five years too:
  - The execution of well-written Java code can now be on a par with well-written C or C++ code.
  - Java code is executed by a Java Virtual Machine (JVM), which can be an interpreter, a JIT (Just-In-Time) compiler, or an adaptive optimising engine such as HotSpot.
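The reliability point above can be illustrated with a minimal sketch. The class and method names (BoundsCheckDemo, writePastEnd) are hypothetical and are not taken from the talk; the sketch only shows that an out-of-bounds write is detected by the JVM and reported as an exception, where the equivalent C code would have undefined behaviour.

```java
// Hypothetical demo class; the names are illustrative only.
class BoundsCheckDemo {
    // Attempts to write one element past the end of an array.
    // The JVM checks the index at run time and throws an exception.
    static String writePastEnd(int length) {
        int[] data = new int[length];
        try {
            data[length] = 42;   // index == length is out of bounds
            return "no error";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "caught";     // built-in exception handling reports the fault
        }
    }

    public static void main(String[] args) {
        System.out.println(writePastEnd(8));
    }
}
```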
5Some Advantages of Java
- Java development tools:
  - e.g. the Eclipse platform, and a large amount of free, open source software that has been made available by the community of programmers.
  - If nothing else, this software can be a starting place to develop new ideas and more sophisticated applications.
- Portability:
  - Java applications can execute with little or no change on multiple hardware platforms where a compliant JVM exists.
  - A compelling benefit and argument for using Java.
- Programmer productivity:
  - At least two times greater with Java.
  - A lot can be done in a short amount of time with Java because it has such an extensive library of functions already built in, integrated development environments, and a wide selection of supporting tools.
6Popular Java Applications
- Apache Tomcat:
  - The Tomcat project started at Sun Microsystems as the reference implementation of a server that supports the Servlet and JSP specifications.
  - The Tomcat code base was donated by Sun to the Apache Software Foundation in 1999. Since then, volunteers from numerous organisations have contributed to the server.
  - There have been multiple major releases, and the product has enjoyed considerable take-up from academia and industry.
9Popular Java Applications
- Web Portals:
  - Modern software can be complex, which often makes it difficult to install, configure and use.
  - This has motivated many groups to develop a variety of portal-based frameworks as the mechanism to manage information in a cohesive and structured fashion.
  - Portals have many advantages, which is why they have become the de facto standard for web-based application delivery.
12Popular Java Applications
- Berkeley DB Java Edition:
  - A high performance, transactional storage engine, written entirely in Java, which supports full ACID transactions and recovery.
  - It stores data in the application's native format; consequently, no runtime data translation is required.
  - Berkeley DB Java Edition was designed from the ground up in Java and takes advantage of the Java environment.
  - Its architecture supports high performance and concurrency for both read-intensive and write-intensive workloads.
14Popular Java Applications
- Jabber:
  - Jabber is an open source, secure, ad-free alternative to consumer-based instant messaging services, such as MSN or Yahoo! Messenger.
  - Jabber is based on a set of streaming XML protocols and technologies that enable any two entities on the Internet to exchange messages, presence, and other structured information.
16Java Messaging
- Introduction:
  - As a response to the appearance of several prototype MPI-like libraries, the Message-Passing Working Group of the Java Grande Forum was formed in late 1998.
  - This working group came up with an initial draft of an API, which was distributed at Supercomputing in 1998.
  - Two APIs, mpiJava 1.2 and MPJ, have been proposed:
    - The main difference lies in the naming conventions of variables and functions.
- The various efforts over the last decade to develop a Java messaging system follow one of three approaches:
  - JNI to interact with an underlying native MPI runtime,
  - Java messaging from scratch using the likes of RMI,
  - Communications using Java sockets.
- Experience gained with these implementations suggests that there is no universal approach that satisfies the often conflicting requirements of end users.
17Java Messaging
- The Paradox:
  - The fastest implementation is via JNI, using the likes of mpiJava.
  - This introduces various issues:
    - Compatibility with the underlying MPI,
    - Breaks the programming model,
    - Compromises portability!
  - Using 100% pure Java ensures portability:
    - But it might not provide the most efficient solution, especially in the presence of commodity high-performance hardware.
- It is important to address these contradictory requirements of portability and high performance in the design of Java messaging systems.
18MPJ Express
- To address this, MPJE (MPJ Express) has been developed; the project started in early 2004.
- MPJE is an implementation of an MPI-like API for Java:
  - The higher-level concepts, such as communicators (inter and intra), virtual topologies, and derived datatypes, have been implemented in pure Java.
- MPJE uses a unique buffering API (mpjbuf) that allows explicit memory management at the application level:
  - Both Java and JNI communication devices can use this buffering layer, by making use of direct byte buffers for the actual storage.
- Currently, two communication devices have been implemented:
  - niodev, based on Java NIO (New I/O),
  - mxdev, based on Myrinet eXpress (MX).
- MPJE includes a runtime system that permits bootstrapping of processes across remote nodes.
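As a rough illustration of the direct byte buffers mentioned above, the following sketch packs an array of doubles into a direct buffer and unpacks it again using only the standard java.nio classes. It is not the mpjbuf API; the class and method names (DirectBufferSketch, pack, unpack) are assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; this is plain java.nio, not MPJE's mpjbuf layer.
class DirectBufferSketch {
    // Packs the values into a direct buffer, which is allocated outside the Java heap.
    static ByteBuffer pack(double[] values) {
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Double.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (double v : values) {
            buf.putDouble(v);
        }
        buf.flip();              // switch the buffer from writing to reading
        return buf;
    }

    // Unpacks every remaining double from the buffer into a new array.
    static double[] unpack(ByteBuffer buf) {
        double[] out = new double[buf.remaining() / Double.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getDouble();
        }
        return out;
    }
}
```

The pack and unpack loops are the kind of extra copying step that an intermediate buffering layer implies.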
19MPJ Express - Architecture
- The high and base levels rely on the mpjdev and xdev levels for actual communications and interaction with the underlying networking hardware.
- Two implementations of mpjdev are envisaged:
  - JNI wrappers to native MPI implementations,
  - Use of a lower-level device, called xdev, to provide access to Java sockets or specialised communication libraries.
20MPJE - Performance
Latency
21MPJE - Performance
Bandwidth
22MPJE - Performance
- Latency
- The one-byte latency of MPJE is 164 microseconds, approximately 20% slower than MPICH.
- Difference due to:
  - Complexity introduced by the thread-safety algorithm, which requires the synchronized keyword.
  - MPJE uses an intermediate buffering layer:
    - Implies additional copying,
    - A faster alternative on proprietary networks like Myrinet.
  - Direct byte buffers reside outside the JVM's heap, so it is possible to get pointers to these buffers from C and avoid copying between the JVM and JNI.
- Throughput:
  - mpiJava achieves 84%:
    - Difference due to additional data copying in JNI.
  - MPICH and MPJE achieve 89% and 87% respectively.
  - mpjdev achieves 90% bandwidth, which shows that the main overhead for MPJE is packing/unpacking data onto buffers.
  - The drop at 128 Kbytes is due to the change of communication protocol from eager send to rendezvous.
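The protocol switch behind the drop at 128 Kbytes can be sketched as a simple size test. This is an assumption-laden illustration, not MPJE code: the class name, the enum, and the choice that a message of exactly 128 Kbytes uses rendezvous are all hypothetical.

```java
// Hypothetical sketch of choosing a send protocol by message size.
class ProtocolChoice {
    // Threshold taken from the 128 Kbytes figure above; the exact boundary is an assumption.
    static final int EAGER_LIMIT_BYTES = 128 * 1024;

    enum Protocol { EAGER, RENDEZVOUS }

    // Small messages are sent immediately (eager); large ones wait for the
    // receiver to be ready first (rendezvous), which costs an extra handshake.
    static Protocol choose(int messageBytes) {
        return messageBytes < EAGER_LIMIT_BYTES ? Protocol.EAGER : Protocol.RENDEZVOUS;
    }
}
```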
23MPJ Express Meets Gadget
- To help establish the practicality of real scientific computing using MPJE, the parallel cosmological simulation code, Gadget-2, was ported from C to Java.
- Gadget-2 is a massively parallel structure formation code developed by Volker Springel at the Max Planck Institute for Astrophysics.
- Versions of Gadget-2 have been used in various research papers in the astrophysics literature, including the noteworthy Millennium Simulation, the largest ever model of the Universe.
24Gadget
- A 3-dimensional visualization of the Millennium Simulation.
- The movie shows a journey through the simulated universe.
- On the way, we visit a rich cluster of galaxies and fly around it.
- During the two minutes of the movie, we travel a distance for which light would need more than 2.4 billion years.
25J-Gadget
- Gadget-2 was manually translated to Java.
- The data structures were deliberately kept similar so that a cross-reference to the original source code could be made for debugging purposes.
- Gadget-2 uses the GNU Scientific Library (GSL), a parallel version of the Fastest Fourier Transform in the West (FFTW), and an MPI library.
- We used the Barnes-Hut tree algorithm for calculating gravitational forces, and MPJE for communication.
- The main simulation loop involves calculating gravitational forces for each particle in the simulation and updating their accelerations:
  - This is the most compute-intensive task in the simulation.
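To give a feel for the force calculation in the main loop, here is a minimal sketch that computes gravitational accelerations by direct summation over all particle pairs. Note that this is a deliberately simpler O(N^2) technique, not the Barnes-Hut tree algorithm used by Gadget-2, and it is not taken from the Java Gadget source. The class name, the parameter names, and the softening term are assumptions for illustration.

```java
// Hypothetical sketch: direct-sum gravity, NOT the Barnes-Hut tree code used in Gadget-2.
class DirectSumGravity {
    // pos[i] holds the (x, y, z) position of particle i; mass[i] is its mass.
    // g is the gravitational constant in the caller's units; softening avoids
    // a division by zero when two particles are very close together.
    static double[][] accelerations(double[][] pos, double[] mass, double g, double softening) {
        int n = pos.length;
        double[][] acc = new double[n][3];
        double eps2 = softening * softening;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                double dx = pos[j][0] - pos[i][0];
                double dy = pos[j][1] - pos[i][1];
                double dz = pos[j][2] - pos[i][2];
                double r2 = dx * dx + dy * dy + dz * dz + eps2;
                // a_i += G * m_j * (r_j - r_i) / |r_j - r_i|^3
                double s = g * mass[j] / (r2 * Math.sqrt(r2));
                acc[i][0] += s * dx;
                acc[i][1] += s * dy;
                acc[i][2] += s * dz;
            }
        }
        return acc;
    }
}
```

A tree code such as Barnes-Hut replaces the inner loop over all particles with a walk over a spatial tree, approximating distant groups of particles by a single node.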
26Performance
- Java Gadget can achieve around 70% of the performance of the C version.
- This is acceptable performance given that Java has many extra safety features, including array bounds checking.
- The comparison is between a production-quality C code and a Java code that could be optimized further.
- The performance of Java Gadget-2 reinforces our belief that Java is a viable option for HPC.
27Summary
- MPJ Express is an MPI-like Java messaging system.
- MPJE is a pure Java system, with an architecture that can also use fast proprietary networks.
- The latency and bandwidth of MPJE are not far off those of native C.
- The performance of Java Gadget-2 reinforces our belief that Java is a viable option for HPC. With careful programming, it is possible to achieve performance in the same general ballpark as C code.
- MPJE is also being used in other projects, including:
  - MoSeS (Modelling and Simulation for e-Social Science) at Leeds,
  - CartaBlanca (Sandia), which simulates non-linear physics on unstructured grids.
28James Gosling Says
29Distributed Applications
- In a distributed environment, remote entities (producers or consumers of services) need a means to publish their existence, so that clients needing their services can search for and find the appropriate ones and then interact with them directly.
- The publication of information is via a registry service, and the interaction is via a high-level messaging service:
  - Typically, separate libraries provide these two services.
- Tycho is an implementation of a wide-area asynchronous messaging framework with an integrated distributed registry:
  - Frees developers from the need to assemble their applications from a range of middleware offerings,
  - Simplifies and speeds up application development, and allows developers to concentrate on their own domain of expertise.
30Original Motivation
- The resource-monitoring framework, known as GridRM, has a distributed architecture where information needs to flow between remote gateways.
- Rather than reinvent a means of discovering and asynchronously transferring data between these end-points, a mature package was sought.
31Original Motivation
- Even though the original motivation behind this project was to find and integrate an existing solution into GridRM, it became evident that most generic distributed applications have similar requirements.
- This has led us to believe that there is a general need for a system that provides a scalable registry and integrated wide-area messaging support.
- In addition, as the Open Grid Services Architecture (OGSA) gains increasing acceptance in the e-Science community, a system that combines Message Orientated Middleware (MOM) with a generic registry will be a key aspect of these Service-Oriented Architectures (SOA).
- Tycho is an implementation of a wide-area asynchronous messaging framework with an integrated distributed registry.
32Tycho
- Tycho is a Java-based framework built on a publish, subscribe and bind paradigm.
- Design philosophy:
  - We believe that the system should have an architecture similar to the Internet, where every node provides reliable core services, and the complexity is kept, as far as possible, to the edges.
  - The core services can be kept to the minimum, and endpoints can provide higher-level and more sophisticated services that may fail, but will not cause the overall system to crash.
  - We have kept Tycho's core small, simple and efficient, so that it has a minimal memory footprint, is easy to install, and is capable of providing robust and reliable services.
  - More sophisticated services can then be built on this core and are provided via libraries and tools to applications.
  - This allows Tycho to be flexible and extensible, so that it will be possible to incorporate additional features and functionality.
33Tycho
- Tycho consists of the following components:
  - Mediators that allow producers and consumers to discover each other and establish remote communications,
  - Consumers that typically subscribe to receive information or events from producers,
  - Producers that gather and publish information for consumers.
- There is an asynchronous messaging API.
- In Tycho, producers and/or consumers (clients) can publish their existence in a Virtual Registry (VR):
  - A client uses the VR to locate other clients, which act as a source or sink for the data they are interested in.
  - The VR is a distributed service provided by a network of mediators.
- When possible, clients communicate directly:
  - For clients that do not have direct access to the Internet, the mediator provides wide-area connectivity by acting as a gateway or proxy into a localised Tycho installation.
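The publish, locate and communicate-directly pattern described above can be sketched with a toy in-memory registry. This is not Tycho's API: the class ToyVirtualRegistry, the Endpoint interface, and the method names are hypothetical, and a real VR is distributed across a network of mediators rather than held in one map.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-process stand-in for a registry; not the Tycho implementation.
class ToyVirtualRegistry {
    // A client endpoint that can receive a message directly.
    interface Endpoint {
        void deliver(String message);
    }

    private final Map<String, Endpoint> entries = new ConcurrentHashMap<>();

    // A client publishes its existence under a name.
    void publish(String name, Endpoint endpoint) {
        entries.put(name, endpoint);
    }

    // Another client locates it; the registry is only used for discovery.
    Optional<Endpoint> lookup(String name) {
        return Optional.ofNullable(entries.get(name));
    }
}
```

Once a lookup succeeds, the caller holds the endpoint and calls deliver on it without going back through the registry, which mirrors the direct client-to-client communication described above.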
34The Architecture of Tycho
35Layered View of Tycho's Architecture
36Tycho Messaging Tests
- Performance Tests of Tycho against NaradaBrokering
37Performance Tests (Messaging)
- NaradaBrokering:
  - The NaradaBrokering framework is a distributed messaging infrastructure, developed by the Community Grids Lab at Indiana University.
  - NaradaBrokering is an asynchronous messaging infrastructure with a publish and subscribe based architecture.
  - NaradaBrokering is Sun JMS compliant. This messaging standard allows application components to exchange unified messages in an asynchronous system.
  - The JMS specification is used to develop Message Orientated Middleware (MOM) and defines how messages are to be communicated via queues or topics.
  - Networks of collaborating brokers are arranged in a cluster topology, with a hierarchy of clusters, super-clusters, and super-super-clusters.
  - It aims to provide a unified messaging environment that incorporates the capability to support Grid and Web Services, Peer-to-Peer and video conferencing, within a SOA.
38NaradaBrokering
39Tycho vs NaradaBrokering
[Two plots, each with LAN, WAN and Baseline Java series]
- Ping-pong tests:
  - In the test, the consumer is run on one host and the producer on another; they communicate over Fast Ethernet via sockets.
  - The Tycho consumer and producer communicate directly using sockets, only using the mediator to bootstrap the test.
  - The NaradaBrokering consumer and producer also use sockets to send the messages, but these messages are routed via the broker.
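A ping-pong test of this kind can be sketched with plain Java sockets. The version below is a rough illustration only: it runs both ends in one process over the loopback interface rather than on two hosts over Fast Ethernet, and the class and method names (PingPongSketch, startEcho, meanRoundTripNanos) are assumptions, not code from the benchmarks.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical loopback ping-pong; not the benchmark code used in the talk.
class PingPongSketch {
    // Accepts one connection and echoes 'rounds' messages of 'size' bytes.
    static Thread startEcho(ServerSocket server, int size, int rounds) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept()) {
                s.setTcpNoDelay(true);
                DataInputStream in = new DataInputStream(s.getInputStream());
                DataOutputStream out = new DataOutputStream(s.getOutputStream());
                byte[] buf = new byte[size];
                for (int i = 0; i < rounds; i++) {
                    in.readFully(buf);   // ping arrives
                    out.write(buf);      // pong goes back
                    out.flush();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.start();
        return t;
    }

    // Returns the mean round-trip time in nanoseconds over 'rounds' exchanges.
    static long meanRoundTripNanos(int size, int rounds) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread echo = startEcho(server, size, rounds);
            try (Socket s = new Socket(loopback, server.getLocalPort())) {
                s.setTcpNoDelay(true);   // send small messages immediately
                DataInputStream in = new DataInputStream(s.getInputStream());
                DataOutputStream out = new DataOutputStream(s.getOutputStream());
                byte[] buf = new byte[size];
                long start = System.nanoTime();
                for (int i = 0; i < rounds; i++) {
                    out.write(buf);
                    out.flush();
                    in.readFully(buf);
                }
                long elapsed = System.nanoTime() - start;
                echo.join();
                return elapsed / rounds;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Half of the mean round-trip time for a one-byte message gives a rough one-way latency; repeating the run with larger messages gives a bandwidth estimate.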
40Tycho vs NaradaBrokering
- The scalability tests are designed to measure the performance of Tycho and NaradaBrokering as the number of producers or consumers is increased.
- In both tests, the single consumer/producer and mediator/broker are started on separate nodes within the test cluster, with the remaining nodes used to run clients.
- Initial experiments showed us that fourteen clients are sufficient to saturate the Fast Ethernet network.
41Summary
- Latency and throughput:
  - When looking at end-to-end performance on a LAN, for messages of less than 2 Kbytes, Tycho and NaradaBrokering have comparable performance.
  - Tycho achieves 95% bandwidth, whereas NB only achieves 65.3%.
  - Tycho's performance is inhibited by the fact that it creates a new socket for each message sent:
    - NB reuses socket instances once they have been created.
- Scalability summary:
  - The scalability tests have shown Tycho and NaradaBrokering producers and consumers to be stable under heavy load:
    - Performance is weaker when there is a large ratio of consumers to producers.
  - The heap size for NB becomes a limiting factor when a broker is receiving messages faster than it can send them, as the internal message buffer fills until the heap is consumed.
  - Tycho tests were performed without modifying the heap size; throttling is used as there is limited buffering.
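The contrast between a buffer that grows until the heap is consumed and limited buffering with throttling can be sketched with a bounded queue from the standard library. This is only an illustration of the general idea; it does not reflect how either Tycho or NaradaBrokering is implemented, and the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of limited buffering; not code from Tycho or NaradaBrokering.
class BoundedBufferSketch {
    // Offers 'count' messages to a buffer of fixed capacity while nothing
    // drains it, and returns how many were accepted.
    static int acceptedWithoutConsumer(int capacity, int count) {
        BlockingQueue<byte[]> buffer = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < count; i++) {
            // offer() refuses the message when the buffer is full instead of
            // growing, so the sender must slow down or retry (throttling).
            if (buffer.offer(new byte[16])) {
                accepted++;
            }
        }
        return accepted;
    }
}
```

With an unbounded queue the same loop would accept every message, and memory use would grow with the backlog.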
42Tycho Registry Tests
- Performance Tests against Globus's MDS4 and gLite's R-GMA
43Registries
- MDS4:
  - The Globus Toolkit's Monitoring and Discovery Service (MDS version 4) is a Web Services Resource Framework (WSRF) based implementation of a wide-area information and registry service.
  - MDS4 provides a framework that can be used to collect, index and expose data about the state of grid resources and services.
  - Typical uses for MDS4 include making resource data available for decision making in job submission services, or notifying an administrator when storage space is running low on a cluster.
44Registries
- R-GMA:
  - A Java-based implementation of the Grid Monitoring Architecture (GMA) for publishing network monitoring information over the wide area, and for use as an information service.
  - R-GMA uses a relational model to describe the monitoring information it collects and to search it using an SQL-like API.
  - It is based on a consumer/producer paradigm, with client data being stored in a directory service, which presents it as a virtual database.
  - R-GMA uses the term tuples for sets of data being published or consumed.
45Tycho compared to R-GMA and MDS4
- For the tests, we created a set of randomly generated strings to act as attributes for records to be inserted into the registries:
  - A single record, with no mark-up, had an average size of 114 bytes.
- Two tests were used to assess the performance of the registries:
  - S1 simulates a client searching the registry for records matching some known attributes:
    - Systematic queries are generated using a function to select a record name at random from the test data, to guarantee the query will only match one record.
  - S2 measures the worst-case scenario of the client requesting all of the records from within the registry.
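The two query patterns can be sketched against an in-memory map standing in for a registry. This is a hypothetical illustration, not the test harness used for Tycho, R-GMA or MDS4: the class and method names, the record naming scheme, and the 32-character attribute length are all assumptions (the records in the talk averaged 114 bytes).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of the S1 and S2 query patterns over an in-memory map.
class RegistryQuerySketch {
    // Builds n records, each keyed by a unique name and holding a random attribute string.
    static Map<String, String> buildRecords(int n, long seed) {
        Random rnd = new Random(seed);
        Map<String, String> records = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            StringBuilder attr = new StringBuilder();
            for (int c = 0; c < 32; c++) {
                attr.append((char) ('a' + rnd.nextInt(26)));
            }
            records.put("record-" + i, attr.toString());
        }
        return records;
    }

    // S1: select a record name at random, so the query matches exactly one record.
    static List<String> queryOne(Map<String, String> records, Random rnd) {
        String name = "record-" + rnd.nextInt(records.size());
        List<String> hits = new ArrayList<>();
        String value = records.get(name);
        if (value != null) {
            hits.add(value);
        }
        return hits;
    }

    // S2: worst case, request every record in the registry.
    static List<String> queryAll(Map<String, String> records) {
        return new ArrayList<>(records.values());
    }
}
```

Timing queryOne and queryAll with System.nanoTime() for increasing values of n would give response-time curves of the same shape as those measured for the real registries, though the absolute numbers would not be comparable.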
46Tycho compared to R-GMA and MDS4
[Figure: Query response time versus the number of records, for a query selecting a single random record from the registry (S1)]
[Figure: Query response time versus the number of records, for a query that selects all the records from the registry (S2)]
47Tycho compared to R-GMA and MDS4
- When testing the effect of the number of records on response time, we see that when selecting a single record from 100,000:
  - Tycho is 32 seconds faster than R-GMA,
  - MDS4 runs out of heap space for larger record-set sizes.
- We also ran tests where there were multiple clients accessing the registries:
  - Again, Tycho's VR had a lower response latency than R-GMA and MDS4,
  - With 100 clients, Tycho was 94 seconds faster than R-GMA and 65 seconds faster than MDS4.
- The results highlight that one of the strengths of our implementation is its performance under load:
  - Tycho's performance is linear with regard to both increasing numbers of clients and response sizes.
48Conclusions
- We designed Tycho to have a relatively small, simple and efficient core, so that it has a minimal memory footprint, is easy to install, and is capable of providing robust and reliable services.
- More sophisticated services can then be built on this core and be provided via libraries and tools to applications:
  - This provides us with a flexible and extensible framework, where it is possible to incorporate additional features and functionality that are created as producers or consumers and do not affect the core.
- Tycho's functionality has all been incorporated within a single Java JAR and requires only the Java 1.5 JDK for building and running applications.
49Conclusions
- Tycho's performance is comparable to that of NaradaBrokering, a more mature system:
  - Certain features of NaradaBrokering are superior to those of Tycho, but its memory utilisation and indirect communications are limiting features.
- Compared to MDS4 and R-GMA, Tycho shows superior performance and scalability:
  - In addition, we would argue that both MDS4 and R-GMA have problems with memory utilisation and, without significant extra effort, limited scalability.
50Overall Conclusions
- We have discussed two Java-based middleware systems.
- MPJ Express, a thread-safe implementation of MPI-like bindings for Java:
  - MPJE has all the high-level MPI features,
  - MPJE's performance, though not as fast as native C using MPICH, has been shown to be comparable and is improving,
  - Its other features, such as portability and thread-safety, make up for the performance gap and should encourage its take-up.
- Tycho is a combined wide-area messaging framework with a built-in distributed registry (VR):
  - The results of communication benchmarks comparing Tycho to NB show that these systems have comparable performance,
  - The performance of Tycho's VR, compared to R-GMA and MDS4, demonstrates that it is significantly faster and more scalable,
  - Tycho currently provides unique services that should free up application developers to concentrate on their own domain of expertise.
51Overall Conclusions
- In this talk I have discussed the use of Java for parallel and distributed computing.
- Even though sceptics may still feel that Java is less than ideal, the community of middleware and application developers is increasingly using Java and its related technologies to produce quality software.
- Java has an extensive number of features, libraries and tools that provide a rich development environment.
- This uptake is no doubt assisted by the fact that Java is often the first language learnt by students in educational and other institutions.
- Downloads:
  - MPJE - http://dsg.port.ac.uk/projects/mpj/
  - Tycho - http://dsg.port.ac.uk/projects/tycho/
52Thank you for listening