Parallel and Distributed Computing with Java - PowerPoint PPT Presentation

About This Presentation
Title:

Parallel and Distributed Computing with Java

Description:

Java first came to public attention in 1995 within a year, it was being ... James Gosling Says... 2 July, 2006. mark.baker_at_computer.org. Distributed Applications ... – PowerPoint PPT presentation

1
Parallel and Distributed Computing with Java
  • Prof Mark Baker
  • ACET, University of Reading. Tel: +44 118 378
    8615. E-mail: Mark.Baker_at_computer.org
  • Web: http://acet.rdg.ac.uk/mab

2
Outline
  • The Advantages of Java.
  • Popular Java Applications.
  • Java Messaging - MPJ Express:
      • Introduction,
      • Architecture,
      • Performance,
      • Gadget,
      • Conclusions.
  • Wide-area messaging and registry - Tycho:
      • Motivation,
      • Architecture,
      • Comparative tests,
      • Conclusions.
  • Summary and conclusions.

3
Introduction
  • Java first came to public attention in 1995;
    within a year, it was being speculated that it
    would be a good language for parallel and
    distributed computing.
  • Its core features, including object orientation
    and platform independence, as well as built-in
    network support and threads, have encouraged
    this view.
  • Today, Java is being used in almost every type of
    computer-based system:
  • From sensor networks to high performance
    computing platforms, and
  • From enterprise applications through to complex
    research-based simulations.

4
Some Advantages of Java
  • More reliable
  • C has essentially no runtime error checking and
    memory allocation/retrieval is manual
  • In Java, memory management is automatic, and many
    errors, such as buffer overflows and stray
    pointers, are impossible.
  • Built-in exception handling, which C lacks
    completely.
  • More secure
  • A robust and mature security model, which has
    been extensively tested by the community at
    large.
  • Java compilers, interpreters, and runtime systems
    have come a long way in the last five years too:
  • The execution of well-written Java code can now
    be on a par with well-written C or C++ code.
  • Java code is executed by a Java Virtual Machine
    (JVM), which can be an interpreter, a JIT
    (Just-In-Time) compiler, or an adaptive
    optimising engine such as HotSpot.
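
The reliability point about buffer overflows can be illustrated with a minimal sketch. The class and method names here (BoundsCheckSketch, tryWrite) are made up for illustration; the only behaviour relied on is that the JVM checks every array index and raises ArrayIndexOutOfBoundsException instead of writing past the end of the array.

```java
final class BoundsCheckSketch {
    /**
     * Attempts to store value at buffer[index].
     * Returns false when the index is out of range; the array is left untouched.
     */
    static boolean tryWrite(int[] buffer, int index, int value) {
        try {
            buffer[index] = value; // the JVM checks index against buffer.length here
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // In C the equivalent write could silently overwrite adjacent memory.
            return false;
        }
    }

    public static void main(String[] args) {
        int[] buffer = new int[4];
        System.out.println("write at index 2 accepted: " + tryWrite(buffer, 2, 7));
        System.out.println("write at index 4 accepted: " + tryWrite(buffer, 4, 7));
    }
}
```

The same run-time check is what the later performance slides refer to as one of Java's "extra safety features".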

5
Some Advantages of Java
  • Java development tools
  • e.g. the Eclipse platform, and a large amount of
    free, open-source software that has been made
    available by the community of programmers.
  • If nothing else, this software can be a starting
    place to develop new ideas and more sophisticated
    applications.
  • Portability
  • Java applications can execute with little or no
    change on multiple hardware platforms where a
    compliant JVM exists.
  • Compelling benefit and argument for using Java.
  • Programmer productivity
  • At least two times greater with Java.
  • A lot can be done in a short amount of time with
    Java because it has such an extensive library of
    functions already built into the language,
    integrated development environments, and a wide
    selection of supporting tools.

6
Popular Java Applications
  • Apache Tomcat
  • The Tomcat project started at Sun Microsystems as
    the reference implementation of a server that
    supports Servlet and JSP specifications.
  • The Tomcat code base was donated by Sun to the
    Apache Software Foundation in 1999. Since then,
    volunteers from numerous organisations have
    contributed to the server.
  • There have been multiple major releases and the
    product has enjoyed considerable take-up from
    academia and industry.

7
Popular Java Applications
8
Popular Java Applications
9
Popular Java Applications
  • Web Portals
  • Modern software can be complex, which often makes
    it difficult to install, configure and use.
  • This has motivated many groups to develop a
    variety of portal-based frameworks as the
    mechanism to manage information in a cohesive and
    structured fashion.
  • Portals have many advantages, which is why they
    have become the de facto standard for web-based
    application delivery.

10
Popular Java Applications
11
Popular Java Applications
12
Popular Java Applications
  • Berkeley DB Java Edition
  • A high-performance transactional storage engine,
    written entirely in Java, which supports full
    ACID transactions and recovery.
  • It stores data in the application's native
    format; consequently, no runtime data translation
    is required.
  • Berkeley DB Java Edition was designed from the
    ground up in Java and takes advantage of the Java
    environment.
  • The architecture of Berkeley DB Java Edition
    supports high performance and concurrency for
    both read-intensive and write-intensive workloads.

13
Popular Java Applications
14
Popular Java Applications
  • Jabber
  • Jabber is an open source, secure, ad-free
    alternative to consumer-based instant messaging
    services, such as MSN or Yahoo! Messenger.
  • Jabber is based on a set of streaming XML
    protocols and technologies that enable any two
    entities on the Internet to exchange messages,
    presence, and other structured information.

15
Popular Java Applications
16
Java Messaging
  • Introduction
  • As a response to the appearance of several
    prototype MPI-like libraries, the Message-Passing
    Working Group of the Java Grande Forum was formed
    in late 1998.
  • This working group came up with an initial draft
    of an API, which was distributed at
    Supercomputing in 1998.
  • Two APIs, mpiJava 1.2 and MPJ, have been
    proposed:
  • The main difference lies in the naming
    conventions of variables and functions.
  • Various efforts over the last decade to develop
    a Java messaging system have followed one of
    three approaches:
  • JNI to interact with an underlying native MPI
    runtime,
  • Java messaging from scratch using the likes of
    RMI,
  • Communications using Java sockets.
  • Experience gained with these implementations
    suggests that there is no universal approach that
    satisfies the often conflicting requirements of
    end users.

17
Java Messaging
  • The Paradox
  • The fastest implementation is via JNI using the
    likes of mpiJava
  • This introduces various issues:
  • Compatibility with the underlying MPI,
  • Breaks the programming model,
  • Compromises portability!
  • Using 100% pure Java ensures portability,
  • But it might not provide the most efficient
    solution, especially in the presence of commodity
    high-performance hardware.
  • It is important to address these contradictory
    requirements of portability and high performance,
    in the design of Java messaging systems.

18
MPJ Express
  • To address this, MPJE (MPJ Express) has been
    developed; the project started in early 2004.
  • MPJE is an implementation of an MPI-like API for
    Java:
  • The higher-level concepts, such as communicators
    (inter and intra), virtual topologies, and
    derived datatypes have been implemented in pure
    Java.
  • MPJE uses a unique buffering API (mpjbuf) that
    allows explicit memory management at the
    application level.
  • Both Java and JNI communication devices can use
    this buffering layer, by making use of direct
    byte buffers for the actual storage.
  • Currently, two communication devices have been
    implemented:
  • niodev, based on Java NIO (New I/O),
  • mxdev, based on Myrinet eXpress (MX).
  • MPJE includes a runtime system that permits
    boot-strapping of processes across remote nodes.

19
MPJ Express - Architecture
  • The high and base levels rely on the mpjdev and
    the xdev level for actual communications and
    interaction with the underlying networking
    hardware.
  • Two implementations of the mpjdev are envisaged:
  • JNI wrappers to native MPI implementations,
  • Use of a lower level device, called xdev, to
    provide access to Java sockets or specialised
    communication libraries.

20
MPJE - Performance
Latency
21
MPJE - Performance
Bandwidth
22
MPJE - Performance
  • Latency
  • The one-byte latency of MPJE is 164 microseconds,
    approximately 20% slower than MPICH.
  • The difference is due to:
  • Complexity introduced by the thread-safety
    algorithm, which requires synchronization.
  • MPJE uses an intermediate buffering layer:
  • This implies additional copying,
  • It is a faster alternative on proprietary
    networks like Myrinet.
  • Direct byte buffers reside outside the JVM's
    heap, so it is possible to get pointers to these
    buffers from C and avoid copying between the JVM
    and JNI.
  • Throughput
  • mpiJava achieves 84%.
  • The difference is due to additional data copying
    in JNI.
  • MPICH and MPJE achieve 89% and 87% respectively.
  • mpjdev achieves 90% bandwidth, which shows that
    the main overhead for MPJE is packing/unpacking
    data onto buffers.
  • The drop at 128 Kbytes is due to the change of
    communication protocol from eager send to
    rendezvous.
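
The packing and unpacking overhead mentioned above can be made concrete with a minimal sketch that uses only the standard java.nio API. It is not the mpjbuf API: the class DirectBufferSketch and its pack/unpack methods are hypothetical names chosen for illustration. It shows user data being copied into a direct (off-heap) byte buffer, the kind of storage a Java or JNI device could then hand to the network without a further copy.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DirectBufferSketch {
    /** Packs the doubles into a newly allocated direct buffer, ready to be read. */
    static ByteBuffer pack(double[] data) {
        ByteBuffer buf = ByteBuffer.allocateDirect(data.length * Double.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (double d : data) {
            buf.putDouble(d); // this copy is the "packing" cost
        }
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    /** Unpacks all remaining doubles from the buffer into a heap array. */
    static double[] unpack(ByteBuffer buf) {
        double[] out = new double[buf.remaining() / Double.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getDouble(); // this copy is the "unpacking" cost
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = pack(new double[] {1.0, 2.5, -3.0});
        System.out.println("direct buffer: " + buf.isDirect()
                + ", bytes: " + buf.remaining());
        System.out.println("values recovered: " + unpack(buf).length);
    }
}
```

Native code reached through JNI can obtain the address of such a buffer with the JNI function GetDirectBufferAddress, which is what makes the copy between the JVM and C avoidable.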

23
MPJ Express Meets Gadget
  • To help establish the practicality of real
    scientific computing using MPJE, the parallel
    cosmological simulation code, Gadget-2, was
    ported from C to Java.
  • Gadget-2 is a massively parallel structure
    formation code developed by Volker Springel at
    the Max Planck Institute for Astrophysics.
  • Versions of Gadget-2 have been used in various
    research papers in the astrophysics literature,
    including the noteworthy Millennium Simulation,
    the largest ever model of the Universe.

24
Gadget
  • A 3-dimensional visualization of the Millennium
    Simulation.
  • The movie shows a journey through the simulated
    universe.
  • On the way, we visit a rich cluster of galaxies
    and fly around it.
  • During the two minutes of the movie, we travel a
    distance for which light would need more than 2.4
    billion years.

25
J-Gadget
  • Gadget-2 was manually translated to Java.
  • The data structures were deliberately kept
    similar so that a cross-reference to the original
    source code could be made for debugging purposes.
  • Gadget-2 uses the GNU Scientific Library (GSL), a
    parallel version of the Fastest Fourier Transform
    in the West (FFTW), and an MPI library.
  • We used the Barnes-Hut tree algorithm for
    calculating gravitational forces and, for
    communication, we used MPJE.
  • The main simulation loop involves calculating
    gravitational forces for each particle in the
    simulation and updating their accelerations.
  • This is the most compute-intensive task in the
    simulation.
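
To give a feel for the force calculation at the heart of the main loop, here is a minimal sketch. It deliberately uses direct O(N^2) summation rather than the Barnes-Hut tree algorithm, and it is not taken from Gadget-2 or its Java port: the class name GravitySketch, the array layout, the softening parameter and the choice of units with G = 1 are all assumptions made for illustration.

```java
final class GravitySketch {
    /**
     * Computes the gravitational acceleration on every particle by summing the
     * contribution of every other particle (direct summation, units with G = 1).
     *
     * @param pos       pos[i] = {x, y, z} of particle i
     * @param mass      mass[i] of particle i
     * @param softening length added in quadrature to avoid division by zero
     * @return acc[i] = {ax, ay, az} of particle i
     */
    static double[][] accelerations(double[][] pos, double[] mass, double softening) {
        int n = pos.length;
        double[][] acc = new double[n][3];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    continue; // a particle exerts no force on itself
                }
                double dx = pos[j][0] - pos[i][0];
                double dy = pos[j][1] - pos[i][1];
                double dz = pos[j][2] - pos[i][2];
                double r2 = dx * dx + dy * dy + dz * dz + softening * softening;
                double invR3 = 1.0 / (r2 * Math.sqrt(r2));
                // a_i += m_j * (r_j - r_i) / |r_j - r_i|^3
                acc[i][0] += mass[j] * dx * invR3;
                acc[i][1] += mass[j] * dy * invR3;
                acc[i][2] += mass[j] * dz * invR3;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        double[][] pos = {{0, 0, 0}, {1, 0, 0}};
        double[] mass = {1.0, 2.0};
        double[][] acc = accelerations(pos, mass, 0.0);
        System.out.println("ax of particle 0: " + acc[0][0]);
        System.out.println("ax of particle 1: " + acc[1][0]);
    }
}
```

A tree code such as Barnes-Hut replaces the inner loop with a walk over a hierarchy of cells, approximating distant groups of particles by a single mass, which reduces the cost from O(N^2) to roughly O(N log N).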

26
Performance
  • Java Gadget can achieve around 70% of the
    performance of the C version.
  • This is acceptable performance given that Java
    has many extra safety features including array
    bounds checking.
  • The comparison is between a production-quality C
    code and a Java code that could be optimized
    further.
  • The performance of Java Gadget-2 reinforces our
    belief that Java is a viable option for HPC.

27
Summary
  • MPJ Express is an MPI-like Java messaging system.
  • MPJE is a pure Java system, which has an
    architecture that can also use fast proprietary
    networks.
  • The latency and bandwidth of MPJE are not far off
    those of native C.
  • The performance of Java Gadget-2 reinforces our
    belief that Java is a viable option for HPC. With
    careful programming, it is possible to achieve
    performance in the same general ballpark as C
    code.
  • MPJE is also being used in other projects,
    including:
  • MoSeS (Modelling and Simulation for e-Social
    Science) at Leeds,
  • CartaBlanca (Sandia), which is simulating
    non-linear physics on unstructured grids.

28
James Gosling Says
29
Distributed Applications
  • In a distributed environment remote entities
    (producers or consumers of services) need a means
    to publish their existence so that clients,
    needing their services, can search and find the
    appropriate ones that they can then interact with
    directly.
  • The publication of information is via a registry
    service, and the interaction is via a high-level
    messaging service
  • Typically, separate libraries provide these two
    services.
  • Tycho is an implementation of a wide-area
    asynchronous messaging framework with an
    integrated distributed registry.
  • Frees developers from the need to assemble their
    applications from a range of middleware offerings.
  • Simplifies and speeds up application development,
    allowing developers to concentrate on their own
    domain of expertise.

30
Original Motivation
  • The resource-monitoring framework, known as
    GridRM, has a distributed architecture where
    information needs to flow between remote
    gateways.
  • Rather than reinvent a means of discovering and
    asynchronously transferring data between these
    end-points, a mature package was sought.

31
Original Motivation
  • Even though the original motivation behind this
    project was to find and integrate an existing
    solution into GridRM, it became evident that most
    generic distributed applications have similar
    requirements.
  • This has led us to believe that there is a
    general need for a system that provides a
    scalable registry and integrated wide-area
    messaging support.
  • In addition, as the Open Grid Services
    Architecture (OGSA) gains increasing acceptance
    in the e-Science community, a system that
    combines Message Orientated Middleware (MOM) with
    a generic registry will be a key aspect of these
    Service-Oriented Architectures (SOA).
  • Tycho is an implementation of a wide-area
    asynchronous messaging framework with an
    integrated distributed registry.

32
Tycho
  • Tycho is a Java-based framework based on a
    publish, subscribe and bind paradigm.
  • Design Philosophy
  • We believe that the system should have an
    architecture similar to the Internet, where every
    node provides reliable core services, and the
    complexity is kept, as far as possible, to the
    edges.
  • The core services can be kept to the minimum, and
    endpoints can provide higher-level and more
    sophisticated services, that may fail, but will
    not cause the overall system to crash.
  • We have kept Tycho's core small, simple and
    efficient, so that it has a minimal memory
    footprint, is easy to install, and is capable of
    providing robust and reliable services.
  • More sophisticated services can then be built on
    this core and are provided via libraries and
    tools to applications.
  • This allows Tycho to be flexible and extensible
    so that it will be possible to incorporate
    additional features and functionality.

33
Tycho
  • Tycho consists of the following components:
  • Mediators that allow producers and consumers to
    discover each other and establish remote
    communications,
  • Consumers that typically subscribe to receive
    information or events from producers,
  • Producers that gather and publish information for
    consumers.
  • There is an asynchronous messaging API.
  • In Tycho, producers and/or consumers (clients)
    can publish their existence in a Virtual Registry
    (VR).
  • A client uses the VR to locate other clients,
    which act as a source or sink for the data they
    are interested in.
  • The VR is a distributed service provided by a
    network of mediators.
  • When possible, clients communicate directly.
  • For clients that do not have direct access to
    the Internet, the mediator provides wide-area
    connectivity by acting as a gateway or proxy into
    a localised Tycho installation.
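
The publish, discover and bind steps described above can be sketched in a few lines of plain Java. This is an in-memory illustration only and not Tycho's API: the names RegistrySketch, Registry, Producer, publish, lookup, subscribe and emit are hypothetical, and a single map stands in for the Virtual Registry that a network of mediators would provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

final class RegistrySketch {
    /** A producer that delivers events directly to the consumers bound to it. */
    static final class Producer {
        final String name;
        private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

        Producer(String name) {
            this.name = name;
        }

        void subscribe(Consumer<String> consumer) {
            subscribers.add(consumer);
        }

        void emit(String event) {
            for (Consumer<String> c : subscribers) {
                c.accept(event); // direct delivery, the registry is not involved
            }
        }
    }

    /** Stand-in for the registry: maps a published name to its producer. */
    static final class Registry {
        private final Map<String, Producer> entries = new ConcurrentHashMap<>();

        void publish(Producer producer) {
            entries.put(producer.name, producer);
        }

        Optional<Producer> lookup(String name) {
            return Optional.ofNullable(entries.get(name));
        }
    }

    /** Runs one publish / lookup / bind / emit cycle and returns what the consumer saw. */
    static List<String> demo() {
        Registry registry = new Registry();
        Producer cpuLoad = new Producer("cpu-load");
        registry.publish(cpuLoad); // the producer publishes its existence

        List<String> received = new ArrayList<>();
        // The consumer discovers the producer through the registry, then binds to it.
        registry.lookup("cpu-load").ifPresent(p -> p.subscribe(received::add));

        cpuLoad.emit("load=0.42");
        return received;
    }

    public static void main(String[] args) {
        System.out.println("consumer received: " + demo());
    }
}
```

The point of the split is visible in emit: once the consumer has found the producer, messages flow between the two without passing through the registry.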

34
The Architecture of Tycho
35
Layered View of Tycho's Architecture
36
Tycho Messaging Tests
  • Performance Tests of Tycho against NaradaBrokering

37
Performance Tests (Messaging)
  • NaradaBrokering
  • The NaradaBrokering framework is a distributed
    messaging infrastructure, developed by the
    Community Grids Lab at Indiana University.
  • NaradaBrokering is an asynchronous messaging
    infrastructure with a publish and subscribe based
    architecture.
  • NaradaBrokering is Sun JMS compliant. This
    messaging standard allows application components
    to exchange unified messages in an asynchronous
    system.
  • The JMS specification is used to develop Message
    Orientated Middleware (MOM) and defines how
    messages are to be communicated via queues or
    topics.
  • Networks of collaborating brokers are arranged in
    a cluster topology, with a hierarchy of clusters,
    super-clusters, and super-super-clusters.
  • It aims to provide a unified messaging
    environment that incorporates the capability to
    support Grid and Web Services, Peer-to-Peer and
    video conferencing, within a SOA.

38
NaradaBrokering
39
Tycho vs NaradaBrokering
[Two charts; labels on each: WAN, Baseline Java, LAN]
  • Ping-pong tests.
  • In the test, the consumer is run on one host and
    the producer on another; they communicate over
    Fast Ethernet via sockets.
  • The Tycho consumer and producer communicate
    directly using sockets, only using the mediator
    to bootstrap the test.
  • The NaradaBrokering consumer and producer also
    use sockets to send the messages, but these
    messages are routed via the broker.
40
Tycho vs NaradaBrokering
  • The scalability tests are designed to measure
    the performance of Tycho and NaradaBrokering as
    the number of producers or consumers is
    increased.
  • In both tests, the single consumer/producer and
    mediator/broker are started on separate nodes
    within the test cluster, with the remaining nodes
    used to run clients.
  • Initial experiments showed us that fourteen
    clients are sufficient to saturate the Fast
    Ethernet network.
41
Summary
  • Latency and Throughput
  • When looking at end-to-end performance on a LAN,
    for messages of less than 2 Kbytes, Tycho and
    NaradaBrokering have comparable performance.
  • Tycho achieves 95% bandwidth, whereas NB achieves
    only 65.3%.
  • Tycho's performance is inhibited by the fact that
    it creates a new socket for each message sent,
  • whereas NB reuses socket instances once they have
    been created.
  • Scalability Summary
  • The scalability tests have shown Tycho and
    NaradaBrokering producers and consumers to be
    stable under heavy load.
  • Performance is weaker when there is a large ratio
    of consumers to producers.
  • The heap size for NB becomes a limiting factor
    when a broker is receiving messages faster than
    it can send them, as the internal message buffer
    fills until the heap is consumed.
  • Tycho tests were performed without modifying the
    heap size; throttling is used as there is limited
    buffering.
42
Tycho Registry Tests
  • Performance Tests against Globus's MDS4 and
    gLite's R-GMA

43
Registries
  • MDS4
  • The Globus Toolkit's Monitoring and Discovery
    Service (MDS version 4) is a Web Services
    Resource Framework (WSRF) based implementation of
    a wide-area information and registry service.
  • MDS4 provides a framework that can be used to
    collect, index and expose data about the state of
    grid resources and services.
  • Typical uses for MDS4 include making resource
    data available for decision making in job
    submission services or notifying an administrator
    when storage space is running low on a cluster.

44
Registries
  • R-GMA
  • A Java-based implementation of the Grid
    Monitoring Architecture (GMA) for publishing
    network monitoring information over the wide-area
    and as an information service.
  • R-GMA uses a relational model to describe the
    monitoring information it collects and to search
    it through an SQL-like API.
  • It is based on a consumer/producer paradigm with
    client data being stored in a directory service,
    which presents it as a virtual database.
  • R-GMA uses the term tuples for sets of data being
    published or consumed.

45
Tycho compared to R-GMA and MDS4
  • For the tests we created a set of randomly
    generated strings to act as attributes for
    records to be inserted into the registries.
  • A single record, with no mark up, had an average
    size of 114 bytes.
  • Two tests were used to assess the performance of
    the registries:
  • S1 simulates a client searching the registry
    for records matching some known attributes.
  • Systematic queries are generated using a function
    to select a record name at random from the test
    data to guarantee the query will only match one
    record.
  • S2 measures the worst-case scenario of the
    client requesting all of the records from within
    the registry.
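
The shape of the two tests can be sketched against an in-memory map. This stands in for a real registry and measures nothing comparable to the reported results: the class RegistryQuerySketch, the "record-N" naming scheme, the 16-character attribute strings and the fixed random seed are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class RegistryQuerySketch {
    /** Builds count records, each a unique name mapped to a random attribute string. */
    static Map<String, String> buildRegistry(int count, long seed) {
        Random rng = new Random(seed);
        Map<String, String> registry = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            registry.put("record-" + i, randomString(rng, 16));
        }
        return registry;
    }

    static String randomString(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rng.nextInt(26)));
        }
        return sb.toString();
    }

    /** S1: query by a known record name; names are unique, so at most one match. */
    static List<String> selectOne(Map<String, String> registry, String name) {
        List<String> matches = new ArrayList<>();
        String value = registry.get(name);
        if (value != null) {
            matches.add(value);
        }
        return matches;
    }

    /** S2: worst case, the client requests every record in the registry. */
    static List<String> selectAll(Map<String, String> registry) {
        return new ArrayList<>(registry.values());
    }

    public static void main(String[] args) {
        Map<String, String> registry = buildRegistry(10_000, 42L);
        // Pick the record name at random from the test data, as in S1.
        String name = "record-" + new Random().nextInt(registry.size());

        long t0 = System.nanoTime();
        int s1 = selectOne(registry, name).size();
        long t1 = System.nanoTime();
        int s2 = selectAll(registry).size();
        long t2 = System.nanoTime();

        System.out.println("S1 matched " + s1 + " record(s) in " + (t1 - t0) + " ns");
        System.out.println("S2 returned " + s2 + " record(s) in " + (t2 - t1) + " ns");
    }
}
```

Repeating both queries while increasing the record count gives the kind of response-time-versus-records curve that the tests report.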

46
Tycho compared to R-GMA and MDS4
Query Response Time Versus the Number of Records
for a Query When Selecting a Single Random Record
From the Registry (S1)
Query Response Time Versus the Number of Records
for a Query that Selects All the Records From the
Registry (S2)
47
Tycho compared to R-GMA and MDS4
  • When testing the effect of the number of records
    on response time, we see that when selecting a
    single record from 100,000:
  • Tycho is 32 seconds faster than R-GMA,
  • MDS4 runs out of heap space for larger record
    sizes.
  • We also ran tests where there were multiple
    clients accessing the registries:
  • Again, Tycho's VR had a lower response latency
    than R-GMA and MDS4,
  • With 100 clients, Tycho was 94 seconds faster
    than R-GMA and 65 seconds faster than MDS4.
  • The results highlight that one of the strengths
    of our implementation is its performance under
    load:
  • Tycho's performance is linear with regard to both
    increasing numbers of clients and response sizes.

48
Conclusions
  • We designed Tycho to have a relatively small,
    simple and efficient core, so that it has a
    minimal memory footprint, is easy to install, and
    is capable of providing robust and reliable
    services.
  • More sophisticated services can then be built on
    this core and be provided via libraries and tools
    to applications
  • This provides us with a flexible and extensible
    framework in which additional features and
    functionality can be incorporated as producers or
    consumers, without affecting the core.
  • Tycho's functionality has all been incorporated
    within a single Java JAR and requires only the
    Java 1.5 JDK for building and running
    applications.

49
Conclusions
  • Tycho's performance is comparable to that of
    NaradaBrokering, a more mature system.
  • Certain features of NaradaBrokering are superior
    to those of Tycho, but its memory utilisation and
    indirect communications are limiting features.
  • Compared to MDS4 and R-GMA, Tycho shows
    superior performance and scalability to both
    these systems.
  • In addition, we would argue that both MDS4 and
    R-GMA have problems with memory utilisation and,
    without significant extra effort, limited
    scalability.

50
Overall Conclusions
  • We have discussed two Java-based middleware
    systems:
  • MPJ Express, a thread-safe implementation of
    MPI-like bindings for Java:
  • MPJE has all the high-level MPI features,
  • MPJE's performance, though not as fast as native
    C using MPICH, has been shown to be comparable
    and is improving,
  • Its other features, such as portability and
    thread-safety, make up for the performance gap
    and should encourage its take-up.
  • Tycho is a combined wide-area messaging framework
    with a built-in distributed registry (VR):
  • The results of communication benchmarks
    comparing Tycho to NB show that these systems
    have comparable performance,
  • The performance of Tycho's VR, compared
    to R-GMA and MDS4, demonstrates that it is
    significantly faster and more scalable,
  • Tycho currently provides unique services that
    should free up application developers to
    concentrate on their own domain of expertise.

51
Overall Conclusions
  • In this talk I have discussed the use of Java for
    parallel and distributed computing.
  • Even though sceptics may still feel that Java is
    less than ideal, the community of middleware and
    application developers is increasingly using Java
    and its related technologies to produce quality
    software.
  • Java has an extensive number of features,
    libraries and tools that provide a rich
    development environment.
  • This uptake is no doubt assisted by the fact that
    Java is often the first language learnt by
    students in educational and other institutions.
  • Downloads
  • MPJE - http://dsg.port.ac.uk/projects/mpj/
  • Tycho - http://dsg.port.ac.uk/projects/tycho/

52
Thank you for listening
  • Questions!