Parallel and Distributed Computing with Java
  • Prof Mark Baker
  • ACET, University of Reading Tel 44 118 378
    8615 E-mail
  • Web http//

  • The Advantages of Java.
  • Popular Java Applications.
  • Java Messaging - MPJ Express
  • Introduction,
  • Architecture,
  • Performance,
  • Gadget,
  • Conclusions.
  • Wide area messaging and registry - Tycho
  • Motivation,
  • Architecture.
  • Comparative tests,
  • Conclusions.
  • Summary and conclusions.

  • Java first came to public attention in 1995;
    within a year, it was being speculated that it
    would be a good language for parallel and
    distributed computing.
  • Its core features, including object orientation
    and platform independence, as well as built-in
    network support and threads, have encouraged
    this view.
  • Today, Java is being used in almost every type of
    computer-based system:
  • From sensor networks to high performance
    computing platforms, and
  • From enterprise applications through to complex
    research-based simulations.

Some Advantages of Java
  • More reliable
  • C has essentially no runtime error checking, and
    memory allocation and deallocation are manual.
  • In Java, memory management is automatic, and many
    errors, such as buffer overflows and stray
    pointers, are impossible.
  • Built-in exception handling, which C lacks
  • More Secure
  • A robust and mature security model, which has
    been extensively tested by the community at
    large.
  • Java compilers, interpreters, and runtime systems
    have come a long way in the last five years too
  • The execution of well-written Java code can now
    be on a par with well-written C or C++ code.
  • Java code is executed by a Java Virtual Machine
    (JVM), which can be an interpreter, a JIT
    (Just-In-Time) compiler, or an adaptive
    optimising engine such as HotSpot.
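
The reliability points on this slide (runtime checking plus built-in exception handling) can be illustrated with a minimal sketch. The class and method names are assumptions made for this example and do not come from the talk: an out-of-range array write, which in C could silently overrun the buffer, is rejected by the JVM and surfaces as an exception that the program can handle.

```java
// Illustrative only: class and method names are assumptions made for this sketch.
class BoundsCheckDemo {
    /** Attempts to write buffer[index] = value; returns true if the index was in bounds. */
    static boolean safeWrite(int[] buffer, int index, int value) {
        try {
            buffer[index] = value;   // the JVM checks the index on every array access
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // In C this write could corrupt adjacent memory; in Java it is rejected.
            return false;
        }
    }

    public static void main(String[] args) {
        int[] buffer = new int[4];
        System.out.println(safeWrite(buffer, 2, 42));  // index inside the array
        System.out.println(safeWrite(buffer, 4, 42));  // index one past the end
    }
}
```

The second call fails cleanly instead of overwriting whatever happens to follow the array in memory.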

Some Advantages of Java
  • Java development tools
  • e.g. the Eclipse platform, and a large body of
    open-source, free software that has been made
    available by the community of programmers.
  • If nothing else, this software can be a starting
    place to develop new ideas and more sophisticated
  • Portability
  • Java applications can execute with little or no
    change on multiple hardware platforms where a
    compliant JVM exists.
  • Compelling benefit and argument for using Java.
  • Programmer productivity
  • At least two times greater with Java.
  • A lot can be done in a short amount of time with
    Java because it has such an extensive library of
    functions already built into the language,
    integrated development environments, and a wide
    selection of supporting tools.

Popular Java Applications
  • Apache Tomcat
  • The Tomcat project started at Sun Microsystems as
    the reference implementation of a server that
    supports Servlet and JSP specifications.
  • The Tomcat code base was donated by Sun to the
    Apache Software Foundation in 1999. Since then,
    volunteers from numerous organisations have
    contributed to the server.
  • There have been multiple major releases and the
    product has enjoyed considerable take-up from
    academia and industry.

Popular Java Applications
  • Web Portals
  • Modern software can be complex, which often makes
    it difficult to install, configure and use.
  • This has motivated many groups to develop a
    variety of portal-based frameworks as the
    mechanism to manage information in a cohesive and
    structured fashion.
  • Portals have many advantages, which is why they
    have become the de facto standard for web-based
    application delivery.

Popular Java Applications
  • Berkeley DB Java Edition
  • A high performance, transactional storage engine,
    written entirely in Java, which supports full
    ACID transactions and recovery.
  • It stores data in the application's native
    format; consequently, no runtime data translation
    is required.
  • Berkeley DB Java Edition was designed from the
    ground up in Java and takes advantage of the Java
  • The architecture of Berkeley DB Java Edition
    supports high performance and concurrency for
    both read-intensive and write-intensive workloads.

Popular Java Applications
  • Jabber
  • Jabber is an open source, secure, ad-free
    alternative to consumer-based instant messaging
    services, such as MSN or Yahoo! Messenger.
  • Jabber is based on a set of streaming XML
    protocols and technologies that enable any two
    entities on the Internet to exchange messages,
    presence, and other structured information.

Java Messaging
  • Introduction
  • As a response to the appearance of several
    prototype MPI-like libraries, the Message-Passing
    Working Group of the Java Grande Forum was formed
    in late 1998.
  • This working group came up with an initial draft
    of an API, which was distributed at
    Supercomputing in 1998.
  • Two APIs, mpiJava 1.2 and MPJ, have been
  • Main difference lies in the naming conventions of
    variables and functions.
  • There have been various efforts over the last
    decade to develop a Java messaging system; these
    follow one of three approaches:
  • JNI to interact with an underlying native MPI
  • Java messaging from scratch using the likes of
  • Communications using Java Sockets.
  • Experience gained with these implementations
    suggests that there is no universal approach that
    satisfies the often conflicting requirements of
    end users.

Java Messaging
  • The Paradox
  • The fastest implementation is via JNI using the
    likes of mpiJava
  • This introduces various issues
  • Compatibility with underlying MPI,
  • Breaks programming model,
  • Compromises portability!
  • Using 100% pure Java ensures portability
  • But it might not provide the most efficient
    solution, especially in the presence of commodity
    high-performance hardware.
  • It is important to address these contradictory
    requirements of portability and high performance,
    in the design of Java messaging systems.

MPJ Express
  • To address this, MPJE (MPJ Express) has been
    developed; the project started in early 2004.
  • MPJE is an implementation of an MPI-like API for
    Java.
  • The higher-level concepts, such as communicators
    (inter and intra), virtual topologies, and
    derived datatypes have been implemented in pure
    Java.
  • MPJE uses a unique buffering API (mpjbuf) that
    allows explicit memory management at the
    application level.
  • Both Java and JNI communication devices can use
    this buffering layer, by making use of direct
    byte buffers for the actual storage.
  • Currently, two communication devices have been
    implemented:
  • niodev based on Java NIO (New I/O),
  • mxdev based on Myrinet eXpress (MX).
  • MPJE includes a runtime system that permits
    boot-strapping of processes across remote nodes.

MPJ Express - Architecture
  • The high and base levels rely on the mpjdev and
    the xdev level for actual communications and
    interaction with the underlying networking
  • Two implementations of the mpjdev are envisaged
  • JNI wrappers to native MPI implementations,
  • Use of a lower level device, called xdev, to
    provide access to Java sockets or specialised
    communication libraries.

MPJE - Performance
  • Bandwidth
  • The one-byte latency of MPJE is 164 microseconds,
    approximately 20% slower than MPICH
  • Difference due to
  • Complexity introduced by the thread-safety
    algorithm, which requires synchronized.
  • MPJE uses an intermediate buffering layer
  • Implies additional copying,
  • A faster alternative on proprietary networks like
  • Direct byte buffers reside outside the JVM's
    heap, so it is possible to get pointers to these
    buffers from C and avoid copying between the JVM
    and JNI.
  • Throughput
  • mpiJava achieves 84%.
  • Difference due to additional data copying in JNI.
  • MPICH and MPJE achieve 89% and 87% respectively.
  • mpjdev achieves 90% bandwidth, which shows that
    the main overhead for MPJE is packing/unpacking
    data onto buffers
  • The drop at 128 Kbytes is due to the change of
    communication protocol from eager send to
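
The packing overhead and the role of direct byte buffers mentioned above can be sketched with the standard java.nio API. This is not the mpjbuf API; the class and method names are hypothetical. It only shows the extra copy into (and out of) an off-heap buffer that a buffering layer of this kind implies.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Sketch only: this is NOT the mpjbuf API; all names here are hypothetical.
class DirectBufferSketch {
    /** Packs an int array into a direct buffer, which is allocated outside the Java heap. */
    static ByteBuffer pack(int[] data) {
        ByteBuffer buf = ByteBuffer.allocateDirect(data.length * Integer.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int v : data) {
            buf.putInt(v);           // this element-by-element copy is the packing cost
        }
        buf.flip();                  // switch the buffer from writing to reading
        return buf;
    }

    /** Unpacks all remaining ints from the buffer into a new array. */
    static int[] unpack(ByteBuffer buf) {
        int[] out = new int[buf.remaining() / Integer.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getInt();   // the matching copy on the receiving side
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = pack(new int[] {1, 2, 3});
        System.out.println("direct=" + buf.isDirect() + " bytes=" + buf.remaining());
        System.out.println(Arrays.toString(unpack(buf)));
    }
}
```

Because the storage is a direct buffer, native code reached through JNI could read it in place rather than copying it out of the JVM heap first.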

MPJ Express Meets Gadget
  • To help establish the practicality of real
    scientific computing using MPJE, the parallel
    cosmological simulation code, Gadget-2, was
    ported from C to Java.
  • Gadget-2 is a massively parallel structure
    formation code developed by Volker Springel at
    the Max Planck Institute for Astrophysics.
  • Versions of Gadget-2 have been used in various
    research papers in the astrophysics literature,
    including the noteworthy Millennium Simulation,
    the largest ever model of the Universe.

  • A 3-dimensional visualization of the Millennium
  • The movie shows a journey through the simulated
  • On the way, we visit a rich cluster of galaxies
    and fly around it.
  • During the two minutes of the movie, we travel a
    distance for which light would need more than 2.4
    billion years.

J-Gadget
  • Gadget-2 was manually translated to Java.
  • The data structures were deliberately kept
    similar so that a cross-reference to the original
    source code could be made for debugging purposes.
  • Gadget-2 uses the GNU Scientific Library (GSL), a
    parallel version of the Fastest Fourier Transform
    in the West (FFTW), and an MPI library.
  • We used the Barnes-Hut tree algorithm for
    calculating gravitational forces, and for
    communication we used MPJE.
  • The main simulation loop involves calculating
    gravitational forces for each particle in the
    simulation and updating their accelerations
  • This is the most compute intensive task in the
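
To make the force calculation in the main loop concrete, here is a minimal sketch using direct O(N^2) summation. This is deliberately not the Barnes-Hut tree algorithm used by Gadget-2 (the tree approximates the same sum more cheaply), and the class name, the gravitational constant parameter g and the softening length eps are assumptions made for illustration.

```java
// Sketch only: direct O(N^2) summation, NOT the Barnes-Hut tree used by Gadget-2.
class GravitySketch {
    /**
     * Returns acc[i] = sum over j != i of g * mass[j] * (pos[j] - pos[i]) / (r^2 + eps^2)^(3/2).
     * pos is an N x 3 array; g and the softening length eps are assumed parameters.
     */
    static double[][] accelerations(double[][] pos, double[] mass, double g, double eps) {
        int n = pos.length;
        double[][] acc = new double[n][3];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    continue;        // a particle exerts no force on itself
                }
                double dx = pos[j][0] - pos[i][0];
                double dy = pos[j][1] - pos[i][1];
                double dz = pos[j][2] - pos[i][2];
                double r2 = dx * dx + dy * dy + dz * dz + eps * eps;
                double invR3 = 1.0 / (r2 * Math.sqrt(r2));
                acc[i][0] += g * mass[j] * dx * invR3;
                acc[i][1] += g * mass[j] * dy * invR3;
                acc[i][2] += g * mass[j] * dz * invR3;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        double[][] pos = { {0, 0, 0}, {2, 0, 0} };
        double[] mass = { 1.0, 3.0 };
        double[][] acc = accelerations(pos, mass, 1.0, 0.0);
        System.out.println(acc[0][0] + " " + acc[1][0]);
    }
}
```

The nested loop shows why this step dominates the run time: every particle interacts with every other, which is the cost a tree algorithm is designed to reduce.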

  • Java Gadget can achieve around 70% of the
    performance of the C version.
  • This is acceptable performance given that Java
    has many extra safety features including array
    bounds checking.
  • The comparison is between a production-quality C
    code and a Java code that could be optimized
  • The performance of Java Gadget-2 reinforces our
    belief that Java is a viable option for HPC.

  • MPJ Express is an MPI-like Java messaging system.
  • MPJE is a pure Java system, with an architecture
    that can also use fast proprietary networks.
  • The latency and bandwidth of MPJE are not far off
    those of native C.
  • The performance of Java Gadget-2 reinforces our
    belief that Java is a viable option for HPC. With
    careful programming, it is possible to achieve
    performance in the same general ballpark as C
  • MPJE is also being used in other projects,
  • MoSeS (Modelling and Simulation for e-Social
    Science) at Leeds.
  • CartaBlanca (Sandia) that is simulating
    non-linear physics on unstructured grids.

James Gosling Says
Distributed Applications
  • In a distributed environment remote entities
    (producers or consumers of services) need a means
    to publish their existence so that clients,
    needing their services, can search and find the
    appropriate ones that they can then interact with
  • The publication of information is via a registry
    service, and the interaction is via a high-level
    messaging service
  • Typically, separate libraries provide these two
  • Tycho is an implementation of a wide-area
    asynchronous messaging framework with an
    integrated distributed registry.
  • Frees developers from the need to assemble their
    applications from a range of middleware offerings
  • Simplifies and speeds up application development
    and allows developers to concentrate on their own
    domain of expertise.

Original Motivation
  • The resource-monitoring framework, known as
    GridRM, has a distributed architecture where
    information needs to flow between remote
  • Rather than reinvent a means of discovering and
    asynchronously transferring data between these
    end-points, a mature package was sought.

Original Motivation
  • Even though the original motivation behind this
    project was to find and integrate an existing
    solution into GridRM, it became evident that most
    generic distributed applications have similar
  • This has led us to believe that there is a general
    need for a system that provides a scalable
    registry and integrated wide-area messaging
  • In addition, as the Open Grid Services
    Architecture (OGSA) gains increasing acceptance
    in the e-Science community, a system that
    combines MOM with a generic registry will be a
    key aspect of these Service-Oriented
    Architectures (SOA).
  • Tycho is an implementation of a wide-area
    asynchronous messaging framework with an
    integrated distributed registry.

  • Tycho is a Java-based framework based on a
    publish, subscribe and bind paradigm.
  • Design Philosophy
  • We believe that the system should have an
    architecture similar to the Internet, where every
    node provides reliable core services, and the
    complexity is kept, as far as possible, to the
  • The core services can be kept to the minimum, and
    endpoints can provide higher-level and more
    sophisticated services, that may fail, but will
    not cause the overall system to crash.
  • We have kept Tycho's core small, simple and
    efficient, so that it has a minimal memory
    footprint, is easy to install, and is capable of
    providing robust and reliable services.
  • More sophisticated services can then be built on
    this core and are provided via libraries and
    tools to applications.
  • Allows Tycho to be flexible and extensible so
    that it will be possible to incorporate
    additional features and functionality.

  • Tycho consists of the following components
  • Mediators that allow producers and consumers to
    discover each other and establish remote
  • Consumers that typically subscribe to receive
    information or events from producers,
  • Producers that gather and publish information for
  • There is an asynchronous messaging API.
  • In Tycho, producers and/or consumers (clients)
    can publish their existence in a Virtual Registry
    (VR).
  • A client uses the VR to locate other clients,
    which act as a source or sink for the data they
    are interested in.
  • The VR is a distributed service provided by a
    network of mediators.
  • When possible, clients communicate directly
  • For clients that do not have direct access to
    the Internet, the mediator provides wide-area
    connectivity by acting as a gateway or proxy into
    a localised Tycho installation.
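
The publish, locate and bind pattern described above can be sketched in a few lines. This is an in-memory stand-in, not Tycho's API: the class names, method names and the "cpu-load" entry are all hypothetical, and the network of mediators is reduced to a single map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Sketch only: an in-memory stand-in, NOT Tycho's API. All names are hypothetical.
class RegistrySketch {
    /** A producer endpoint that pushes messages directly to its subscribers. */
    static class Producer {
        final String name;
        private final List<Consumer<String>> subscribers = new ArrayList<>();

        Producer(String name) {
            this.name = name;
        }

        void subscribe(Consumer<String> consumer) {
            subscribers.add(consumer);
        }

        void publish(String message) {
            for (Consumer<String> s : subscribers) {
                s.accept(message);   // direct delivery, no intermediary in the data path
            }
        }
    }

    /** Plays the role of the registry: maps advertised names to endpoints. */
    private final Map<String, Producer> entries = new ConcurrentHashMap<>();

    /** Publishes the producer's existence under its name. */
    void register(Producer p) {
        entries.put(p.name, p);
    }

    /** Returns the producer registered under this name, or null if there is none. */
    Producer lookup(String name) {
        return entries.get(name);
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.register(new Producer("cpu-load"));

        List<String> received = new ArrayList<>();
        Producer found = registry.lookup("cpu-load");   // locate via the registry
        found.subscribe(received::add);                 // bind consumer to producer
        found.publish("0.42");
        System.out.println(received);
    }
}
```

The registry is only consulted to find the endpoint; once bound, messages flow from producer to consumer without passing through it.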

The Architecture of Tycho
Layered View of Tycho's Architecture
Tycho Messaging Tests
  • Performance Tests of Tycho against NaradaBrokering

Performance Tests (Messaging)
  • NaradaBrokering
  • The NaradaBrokering framework is a distributed
    messaging infrastructure, developed by the
    Community Grids Lab at Indiana University.
  • NaradaBrokering is an asynchronous messaging
    infrastructure with a publish and subscribe based
  • NaradaBrokering is Sun JMS compliant. This
    messaging standard allows application components
    to exchange unified messages in an asynchronous
  • The JMS specification is used to develop Message
    Orientated Middleware (MOM) and defines how
    messages are to be communicated via queues or
  • Networks of collaborating brokers are arranged in
    a cluster topology, with a hierarchy of clusters,
    super-clusters, and super-super-clusters.
  • It aims to provide a unified messaging
    environment that incorporates the capability to
    support Grid and Web Services, Peer-to-Peer and
    video conferencing, within a SOA.

Tycho vs NaradaBrokering
Baseline Java
  • Ping-Pong tests.
  • In the test, the consumer is run on one host and
    the producer on another; they communicate over
    Fast Ethernet via sockets.
  • The Tycho consumer and producer communicate
    directly using sockets, only using the mediator
    to bootstrap the test.
  • The NaradaBrokering consumer and producer also
    use sockets to send the messages, but these
    messages are routed via the broker.
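
A ping-pong test of this kind can be sketched with plain Java sockets. This is not the benchmark code used for the results in this talk; the class and method names are assumptions, and both ends run on the loopback interface of one machine rather than on two hosts over Fast Ethernet.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch only: a loopback ping-pong, not the benchmark code used in the talk.
class PingPongSketch {
    /** Starts a thread that echoes `rounds` messages of `size` bytes, then exits. */
    static Thread startEcho(ServerSocket server, int size, int rounds) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept();
                 DataInputStream in = new DataInputStream(s.getInputStream());
                 DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
                byte[] buf = new byte[size];
                for (int i = 0; i < rounds; i++) {
                    in.readFully(buf);   // wait for the ping
                    out.write(buf);      // send the pong
                    out.flush();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.start();
        return t;
    }

    /** Returns the mean round-trip time in nanoseconds over `rounds` exchanges. */
    static double meanRoundTripNanos(int size, int rounds) {
        try (ServerSocket server = new ServerSocket(0)) {   // port 0: any free port
            Thread echo = startEcho(server, size, rounds);
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 DataInputStream in = new DataInputStream(s.getInputStream());
                 DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
                s.setTcpNoDelay(true);   // do not batch small messages
                byte[] buf = new byte[size];
                long start = System.nanoTime();
                for (int i = 0; i < rounds; i++) {
                    out.write(buf);
                    out.flush();
                    in.readFully(buf);
                }
                long elapsed = System.nanoTime() - start;
                echo.join();
                return (double) elapsed / rounds;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 1024, 65536}) {
            System.out.printf("%d bytes: %.1f us round trip%n",
                    size, meanRoundTripNanos(size, 1000) / 1000.0);
        }
    }
}
```

Halving the round-trip time gives the usual one-way latency estimate; routing each message through a broker, as NaradaBrokering does, would add an extra hop to every exchange in this loop.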

Tycho vs NaradaBrokering
  • The scalability tests were designed to measure
    the performance of Tycho and NaradaBrokering as
    the number of producers or consumers is increased.
  • In both tests, the single consumer/producer and
    are started on separate nodes within the test
    cluster, with the remaining nodes used to run
    clients.
  • Initial experiments showed us that fourteen
    clients are sufficient to saturate the Fast
    Ethernet network.

  • Latency and Throughput
  • When looking at end-to-end performance, on a LAN
    for messages less than 2 Kbytes, Tycho and
    NaradaBrokering have comparable performance.
  • Tycho achieves 95% bandwidth, whereas NB only
  • Tycho's performance is inhibited by the fact that
    it creates a new socket for each message sent
  • NB reuses socket instances once they have been
  • Scalability Summary
  • The scalability tests have shown Tycho and
    NaradaBrokering producers and consumers to be
    stable under heavy load
  • Performance is weaker when there is a large ratio
    of consumers to producers.
  • The heap size for NB becomes a limiting factor
    when a broker is receiving messages faster than
    it can send them, as the internal message buffer
    fills until the heap is consumed.
  • Tycho tests were performed without modifying the
    heap size, throttling used as there is limited
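
The difference between an unbounded message buffer and throttling can be sketched with a bounded queue from java.util.concurrent. This is not code from Tycho or NaradaBrokering; the class and method names are hypothetical. The point is only that a fixed-capacity buffer pushes back on the sender instead of growing until the heap is exhausted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: illustrates throttling with a bounded buffer; not Tycho or NB code.
class ThrottleSketch {
    private final BlockingQueue<byte[]> pending;

    ThrottleSketch(int capacity) {
        pending = new ArrayBlockingQueue<>(capacity);   // memory use is capped here
    }

    /** Tries to enqueue a message; returns false when full, so the sender must back off. */
    boolean trySend(byte[] message) {
        return pending.offer(message);
    }

    /** Removes the next message for forwarding, or returns null if none is queued. */
    byte[] forwardOne() {
        return pending.poll();
    }

    /** Number of messages currently waiting to be forwarded. */
    int queued() {
        return pending.size();
    }

    public static void main(String[] args) {
        ThrottleSketch relay = new ThrottleSketch(2);
        for (int i = 0; i < 3; i++) {
            System.out.println("send " + i + " accepted: " + relay.trySend(new byte[1024]));
        }
        relay.forwardOne();
        System.out.println("after forwarding one, queued: " + relay.queued());
    }
}
```

With an unbounded queue in the same position, a receiver that outpaces the forwarder would keep allocating until memory ran out.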

Tycho Registry Tests
  • Performance Tests against Globus's MDS4 and
    gLite's R-GMA

  • MDS4
  • The Globus Toolkit's Monitoring and Discovery
    Service (MDS version 4) is a Web Services
    Resource Framework (WSRF) based implementation of
    a wide-area information and registry service.
  • MDS4 provides a framework that can be used to
    collect, index and expose data about the state of
    grid resources and services.
  • Typical uses for MDS4 include making resource
    data available for decision making in job
    submission services or notifying an administrator
    when storage space is running low on a cluster.

  • R-GMA
  • A Java-based implementation of the Grid
    Monitoring Architecture (GMA) for publishing
    network monitoring information over the wide-area
    and as an information service.
  • R-GMA uses a relational model to search, using an
    SQL-like API, and describe the monitoring
    information it collects.
  • It is based on a consumer/producer paradigm, with
    client data being stored in a directory service,
    which presents it as a virtual database.
  • R-GMA uses the term tuples for sets of data being
    published or consumed.

Tycho compared to R-GMA and MDS4
  • For the tests we created a set of randomly
    generated strings to act as attributes for
    records to be inserted into the registries.
  • A single record, with no mark up, had an average
    size of 114 bytes.
  • Two tests were used to assess the performance of
    the registries
  • S1 simulates a client searching the registry
    for records matching some known attributes
  • Systematic queries are generated using a function
    to select a record name at random from the test
    data to guarantee the query will only match one
  • S2 measures the worst-case scenario of the
    client requesting all of the records from within
    the registry.
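
The two query patterns, S1 and S2, can be sketched against an in-memory map that stands in for a registry. None of this is the actual test harness: the class and method names, the record naming scheme and the 16-character attribute length are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch only: an in-memory map stands in for the registry under test.
class RegistryQuerySketch {
    /** Builds `count` records with unique names, each holding a random attribute string. */
    static Map<String, String> buildRecords(int count, long seed) {
        Random rnd = new Random(seed);
        Map<String, String> records = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            StringBuilder attr = new StringBuilder();
            for (int c = 0; c < 16; c++) {
                attr.append((char) ('a' + rnd.nextInt(26)));
            }
            records.put("record-" + i, attr.toString());
        }
        return records;
    }

    /** Picks an existing record name at random, so a query on it matches exactly one record. */
    static String pickName(List<String> names, Random rnd) {
        return names.get(rnd.nextInt(names.size()));
    }

    /** S1: look up a single record by a known name. */
    static String s1(Map<String, String> records, String name) {
        return records.get(name);
    }

    /** S2: worst case, return every record held in the registry. */
    static List<String> s2(Map<String, String> records) {
        return new ArrayList<>(records.values());
    }

    public static void main(String[] args) {
        Map<String, String> records = buildRecords(100_000, 42L);
        List<String> names = new ArrayList<>(records.keySet());

        long t0 = System.nanoTime();
        s1(records, pickName(names, new Random()));
        long t1 = System.nanoTime();
        s2(records);
        long t2 = System.nanoTime();
        System.out.println("S1: " + (t1 - t0) + " ns, S2: " + (t2 - t1) + " ns");
    }
}
```

Timing the two calls separately mirrors the structure of the tests: S1 response time should depend mainly on lookup cost, while S2 grows with the number of records returned.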

Tycho compared to R-GMA and MDS4
[Figure: Query response time versus the number of records for a query selecting a single random record from the registry (S1)]
[Figure: Query response time versus the number of records for a query that selects all the records from the registry (S2)]
Tycho compared to R-GMA and MDS4
  • When testing the effect of the number of records
    on response time, we see that when selecting a
    single record from 100,000:
  • Tycho is 32 seconds faster than R-GMA,
  • MDS4 runs out of heap space for larger records
  • We also ran tests where there were multiple
    clients accessing the registries
  • Again Tycho's VR had a lower response latency
    than R-GMA and MDS4
  • With 100 clients Tycho was 94 seconds faster than
    R-GMA and 65 seconds faster than MDS4.
  • The results highlight that one of the strengths
    of our implementation is its performance under
  • Tycho's performance is linear with regard to both
    increasing numbers of clients and response sizes.

  • We designed Tycho to have a relatively small,
    simple and efficient core, so that it has a
    minimal memory footprint, is easy to install, and
    is capable of providing robust and reliable
  • More sophisticated services can then be built on
    this core and be provided via libraries and tools
    to applications
  • This provides us with a flexible and extensible
    framework where it is possible to incorporate
    additional features and functionality, created
    as producers or consumers that do not affect the
    core.
  • Tycho's functionality has all been incorporated
    within a single Java JAR and requires only the
    Java 1.5 JDK for building and running
    applications.

  • Tycho performance is comparable to that of
    NaradaBrokering, a more mature system
  • Certain features of NaradaBrokering are superior
    to those of Tycho, but its memory utilisation and
    indirect communications are limiting features.
  • Whereas, compared to MDS4 and R-GMA, Tycho shows
    superior performance and scalability to both
    these systems
  • In addition, we would argue that both MDS4 and
    R-GMA have problems with memory utilisation and
    without significant extra effort limited

Overall Conclusions
  • We have discussed two Java-based middleware
  • MPJ Express, a thread-safe implementation of
    MPI-like bindings for Java
  • MPJE has all the high level MPI features,
  • MPJE's performance, though not as fast as native
    C using MPICH, has been shown to be comparable
    and is improving,
  • Its other features, such as portability and
    thread-safety, make up for the performance gap
    and should encourage its take-up.
  • Tycho is a combined wide-area messaging framework
    with a built-in distributed registry (VR)
  • The results of communication benchmarks,
    comparing Tycho to NB, show that these systems
    have comparable performance,
  • Whereas the performance of Tycho's VR, compared
    to R-GMA and MDS4, demonstrates that it is
    significantly faster and more scalable,
  • Tycho currently provides unique services that
    should free up application developers to
    concentrate on their own domain of expertise.

Overall Conclusions
  • In this talk I have discussed the use of Java for
    parallel and distributed computing.
  • Even though sceptics may still feel that Java is
    less than ideal, the community of middleware and
    application developers is increasingly using Java
    and its related technologies to produce quality
  • Java has an extensive number of features,
    libraries and tools that provide a rich
    development environment.
  • This uptake is no doubt assisted by the fact that
    Java is often the first language learnt by
    students in educational and other institutions.
  • Downloads
  • MPJE - http//
  • Tycho - http//

Thank you for listening
  • Questions!
