MPJ Express: An Implementation of Message Passing Interface (MPI) in Java

1
MPJ Express An Implementation of Message Passing
Interface (MPI) in Java
  • Aamir Shafi
  • http://mpj-express.org
  • http://acet.rdg.ac.uk/projects/mpj

2
Writing Parallel Software
  • Parallel software is software that executes on
    parallel hardware to exploit its computational
    and memory resources
  • There are two main approaches to writing such
    software
  • The first approach is to use messaging libraries
    (packages) written in already existing languages
    like C, Fortran, and Java
  • Message Passing Interface (MPI)
  • Parallel Virtual Machine (PVM)
  • The second and more radical approach is to
    provide new languages
  • HPC has a history of novel parallel languages
  • High Performance Fortran (HPF)
  • Unified Parallel C (UPC)
  • This talk presents an implementation of MPI in
    Java called MPJ Express

3
Introduction to Java for HPC
  • Java was released by Sun in 1996
  • A mainstream language in the software industry,
  • Attractive features include
  • Portability,
  • Automatic garbage collection,
  • Type-safety at compile time and runtime,
  • Built-in support for multi-threading
  • A possible option to provide nested parallelism
    on multi-core systems,
  • Performance
  • The javac compiler translates source code to
    byte code,
  • Modern JVMs use Just-In-Time (JIT) compilation
    to turn byte code into native machine code on
    the fly
  • But Java has safety features that may limit
    performance.

4
Introduction to Java for HPC
  • Three existing approaches to Java messaging
  • Pure Java (Sockets based),
  • Java Native Interface (JNI), and
  • Remote Method Invocation (RMI),
  • mpiJava has been perhaps the most popular Java
    messaging system
  • mpiJava (http://www.hpjava.org/mpiJava.html)
  • MPJ/Ibis (http://www.cs.vu.nl/ibis/mpj.html)
  • Motivation for a new Java messaging system
  • Maintain compatibility with Java threads by
    providing thread-safety,
  • Handle the conflicting requirements of high
    performance and portability.

5
Distributed Memory Cluster
(Figure eight processes, Proc 0 to Proc 7, on a
distributed memory cluster, exchanging messages over
an interconnect such as LAN/Ethernet, Myrinet, or
Infiniband)
6
(No Transcript)
7
Write machines files
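
A machines file is a plain-text list, one hostname (or IP address) per line, naming the nodes on which the runtime starts processes; the hostnames below are hypothetical:

```
node01.cluster.example
node02.cluster.example
node03.cluster.example
node04.cluster.example
```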
8
Bootstrap MPJ Express runtime
9
Write Parallel Program
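
The screenshots that follow show this step; a minimal sketch of such a program, written against the mpiJava-style API that MPJ Express provides (MPI.Init, MPI.COMM_WORLD.Rank/Size, MPI.Finalize), looks like this:

```java
import mpi.MPI;

public class HelloWorld {
  public static void main(String[] args) throws Exception {
    MPI.Init(args);                    // start the MPJ Express runtime
    int rank = MPI.COMM_WORLD.Rank();  // this process's id in the world group
    int size = MPI.COMM_WORLD.Size();  // total number of processes
    System.out.println("Hello from process " + rank + " of " + size);
    MPI.Finalize();
  }
}
```

Each process prints its own rank, so with four processes the greeting appears four times, in no particular order.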
10
Compile and Execute
11
Introduction to MPJ Express
  • MPJ Express is an implementation of a Java
    messaging system, based on Java bindings
  • Will eventually supersede mpiJava.
  • Aamir Shafi, Bryan Carpenter, and Mark Baker
  • Thread-safe communication devices using Java NIO
    and Myrinet
  • Maintain compatibility with Java threads,
  • The buffering layer provides explicit memory
    management instead of relying on the garbage
    collector,
  • Runtime system for portable bootstrapping

12
James Gosling Says
13
Who is using MPJ Express?
  • First released in September 2005 under LGPL (an
    open-source licence)
  • Approximately 1000 users all around the world
  • Some projects using this software
  • CartaBlanca is a simulation package that uses
    Jacobian-Free-Newton-Krylov (JFNK) methods to
    solve non-linear problems
  • The project is done at Los Alamos National Lab
    (LANL) in the US
  • Researchers at University of Leeds, UK have used
    this software in Modelling and Simulation in
    e-Social Science (MoSeS) project
  • Teaching Purposes
  • Parallel Programming using Java (PPJ)
  • http://www.sc.rwth-aachen.de/Teaching/Labs/PPJ05/
  • Parallel Processing SS 2006
  • http://tramberend.inform.fh-hannover.de/

14
MPJ Express Design
15
Presentation Outline
  • Implementation Details
  • Point-to-point communication
  • Communicators, groups, and contexts
  • Process topologies
  • Derived datatypes
  • Collective communications
  • MPJ Express Buffering Layer
  • Runtime System
  • Performance Evaluation

16
Java NIO Device
  • Uses non-blocking I/O functionality,
  • Implements two communication protocols
  • Eager-send protocol for small messages,
  • Rendezvous protocol for large messages,
  • Naively placing locks around communication
    methods can result in deadlocks
  • In Java, the keyword synchronized ensures that
    only one thread at a time can call a
    synchronized method of an object,
  • A process sending a message to itself using
    synchronous send,
  • Locks for thread-safety
  • Writing messages
  • A lock for send-communication-sets,
  • Locks for destination channels
  • One for every destination process,
  • Obtained one after the other,
  • Reading messages
  • A lock for receive-communication-sets.
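
The selection between the two protocols amounts to a message-size threshold; a toy sketch, with a hypothetical cutoff value (the real limit is an internal implementation detail):

```java
public class ProtocolChoice {
  // Hypothetical threshold; the real MPJ Express cutoff is an
  // implementation detail of the NIO device.
  static final int EAGER_LIMIT = 128 * 1024; // bytes

  // Small messages are pushed eagerly to the receiver; large
  // messages first perform a rendezvous handshake.
  static String protocolFor(int messageBytes) {
    return messageBytes <= EAGER_LIMIT ? "eager-send" : "rendezvous";
  }

  public static void main(String[] args) {
    System.out.println(protocolFor(1024));    // small message
    System.out.println(protocolFor(1 << 20)); // large message
  }
}
```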

17
Standard mode with eager send protocol (small
messages)
18
Standard mode with rendezvous protocol (large
messages)
19
MPJ Express Buffering Layer
  • MPJ Express requires a buffering layer
  • To use Java NIO
  • SocketChannels use byte buffers for data
    transfer,
  • To use proprietary networks like Myrinet
    efficiently,
  • Implement derived datatypes,
  • Various implementations are possible based on
    actual storage medium,
  • Direct or indirect ByteBuffers,
  • An mpjbuf buffer object consists of
  • A static buffer to store primitive datatypes,
  • A dynamic buffer to store serialized Java
    objects,
  • Creating ByteBuffers on the fly is costly
  • Memory management is based on Knuth's buddy
    algorithm,
  • Two implementations of memory management.
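
The buddy algorithm hands out blocks whose sizes are powers of two, so a request is first rounded up; a minimal sketch of that rounding step (blockSize is a hypothetical helper name):

```java
public class BuddySize {
  // In a buddy allocator every block size is a power of two;
  // a request is rounded up to the nearest one, and larger blocks
  // are split into equal "buddies" until that size is reached.
  static int blockSize(int request) {
    int size = 1;
    while (size < request) size <<= 1;
    return size;
  }

  public static void main(String[] args) {
    System.out.println(blockSize(3000)); // rounded up to 4096
  }
}
```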

20
MPJ Express Buffering Layer
  • Frequent creation and destruction of
    communication buffers hurts performance.
  • To tackle this, MPJ Express requires a buffering
    layer
  • Provides two implementations of Knuth's buddy
    algorithm,
  • To use Java NIO and proprietary networks
  • Direct ByteBuffers,
  • Implement derived datatypes

21
Presentation Outline
  • Implementation Details
  • Point-to-point communication
  • Communicators, groups, and contexts
  • Process topologies
  • Derived datatypes
  • Collective communications
  • MPJ Express Buffering Layer
  • Runtime System
  • Performance Evaluation

22
Communicators, groups, and contexts
  • MPI provides a higher level abstraction to create
    parallel libraries
  • Safe communication space
  • Group scope for collective operations
  • Process Naming
  • Communicators and groups provide
  • Process naming (instead of IP addresses and
    ports)
  • Group scope for collective operations
  • Contexts
  • Safe communication

23
What is a group?
  • A data-structure that contains processes
  • Main functionality
  • Keep track of ranks of processes
  • Explanation of figure
  • Group A contains eight processes
  • Group B and C are created from Group A
  • All group operations are local (no communication
    with remote processes)

24
Example of a group operation (Union)
  • Explanation of union operation
  • Two processes a and d are in both groups
  • Thus, six processes are executing this operation
  • Each group has its own view of this group
    operation
  • Apply the theory of relativity!
  • Re-assigning ranks in new groups
  • Process 0 in group A is re-assigned rank 0 in
    Group C
  • Process 0 in group B is re-assigned rank 4 in
    Group C
  • If any existing process does not make it into the
    new group, it returns MPI.GROUP_EMPTY
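
The union and re-ranking rules above can be sketched with plain lists; the process names are hypothetical, arranged so that a and d sit in both groups as on the slide:

```java
import java.util.ArrayList;
import java.util.List;

public class GroupUnion {
  // Union keeps all of group A in its existing rank order, then
  // appends the members of B that are not already present; the
  // new ranks are simply positions in the resulting list.
  static List<String> union(List<String> a, List<String> b) {
    List<String> c = new ArrayList<>(a);
    for (String p : b)
      if (!c.contains(p)) c.add(p);
    return c;
  }

  public static void main(String[] args) {
    List<String> groupA = List.of("a", "b", "c", "d");
    List<String> groupB = List.of("e", "a", "d", "f"); // a and d in both groups
    List<String> groupC = union(groupA, groupB);
    System.out.println(groupC.size());       // six distinct processes
    System.out.println(groupC.indexOf("e")); // B's process 0 gets rank 4 in C
  }
}
```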

25
What are communicators?
  • A data-structure that contains groups (and thus
    processes)
  • Why is it useful
  • Process naming, ranks are names for application
    programmers
  • Easier than IP addresses and ports
  • Group communications as well as point to point
    communication
  • There are two types of communicators,
  • Intracommunicators
  • Communication within a group
  • Intercommunicators
  • Communication between two groups (must be
    disjoint)

26
What are contexts?
  • A unique integer
  • An additional tag on the messages
  • Each communicator has a distinct context that
    provides a safe communication universe
  • A context is agreed upon by all processes when a
    communicator is built
  • Intracommunicators have two contexts
  • One for point-to-point communications
  • One for collective communications,
  • Intercommunicators also have two contexts
  • Explained in the coming slides

27
Process topologies
  • Used to specify processes in a geometric shape
  • Virtual topologies have no connection with the
    physical layout of machines
  • It is possible, though, to make use of the
    underlying machine architecture
  • These virtual topologies can be assigned to
    processes in an Intracommunicator
  • MPI provides
  • Cartesian topology
  • Graph topology

28
Cartesian topology Mapping four processes onto
2x2 topology
  • Each process is assigned a coordinate
  • Rank 0 (0,0)
  • Rank 1 (1,0)
  • Rank 2 (0,1)
  • Rank 3 (1,1)
  • Uses
  • Calculate a rank from its grid position (not the
    Globus one!)
  • Calculate grid positions from ranks
  • Easier to locate rank of neighbours
  • Applications may have communication patterns
  • Lots of messaging with immediate neighbours
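
The coordinate arithmetic behind this mapping can be sketched directly; coordinates are written (x, y) as on the slide, and the method names are hypothetical:

```java
public class CartGrid {
  static final int NX = 2; // grid width, for the 2x2 topology above

  // Coordinates written (x, y) as on the slide: rank = y * NX + x.
  static int rankOf(int x, int y) {
    return y * NX + x;
  }

  // Inverse mapping: recover (x, y) from a rank.
  static int[] coordsOf(int rank) {
    return new int[]{ rank % NX, rank / NX };
  }

  public static void main(String[] args) {
    for (int rank = 0; rank < 4; rank++) {
      int[] c = coordsOf(rank);
      System.out.println("Rank " + rank + " (" + c[0] + "," + c[1] + ")");
    }
  }
}
```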

29
Periods in cartesian topology
  • Axis 1 (y-axis is periodic)
  • Processes in top and bottom rows have valid
    neighbours towards top and bottom respectively
  • Axis 0 (x-axis is non-periodic)
  • Processes in the right and left columns have
    undefined neighbours towards the right and left
    respectively

30
Derived datatypes
  • Besides basic datatypes, it is possible to
    communicate heterogeneous, non-contiguous data.
  • Contiguous
  • Indexed
  • Vector
  • Struct

31
Indexed datatype
  • The elements that may form this datatype should
    be
  • Of the same type
  • At non-contiguous locations
  • Add flexibility by specifying displacements
  • int SIZE = 4;
  • int[] blklen = new int[SIZE], displ = new int[SIZE];
  • for (int i = 0; i < SIZE; i++) {
  •   blklen[i] = SIZE - i; displ[i] = (i * SIZE) + i;
  • }
  • double[] params = new double[SIZE * SIZE];
  • double[] rparams = new double[SIZE * SIZE];
  • Datatype ind = Datatype.Indexed(blklen, displ,
    MPI.DOUBLE);
  • // array_of_block_lengths, array_of_displacements
  • MPI.COMM_WORLD.Send(params, 0, 1, ind, dst, tag);
    // 0 is offset, 1 is count
  • MPI.COMM_WORLD.Recv(rparams, 0, 1, ind, src, tag);

32
(No Transcript)
33
Presentation Outline
  • Implementation Details
  • Point-to-point communication
  • Communicators, groups, and contexts
  • Process topologies
  • Derived datatypes
  • Collective communications
  • Runtime System
  • Thread-safety in MPJ Express
  • Performance Evaluation

34
Collective communications
  • Provided as a convenience for application
    developers
  • Save significant development time
  • Efficient algorithms may be used
  • Stable (tested)
  • Built on top of point-to-point communications,
  • These operations include
  • Broadcast, Barrier, Reduce, Allreduce, Alltoall,
    Scatter, Scan, Allgather
  • Versions that allow displacements between the
    data

35
Broadcast, scatter, gather, allgather, alltoall
Image from MPI standard doc
36
Reduce collective operations
  • MPI.PROD
  • MPI.SUM
  • MPI.MIN
  • MPI.MAX
  • MPI.LAND
  • MPI.BAND
  • MPI.LOR
  • MPI.BOR
  • MPI.LXOR
  • MPI.BXOR
  • MPI.MINLOC
  • MPI.MAXLOC
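
Most of these operations combine bare values, but MPI.MAXLOC and MPI.MINLOC also report which rank holds the extreme value; the effect of a MAXLOC reduction can be sketched as (array and method names hypothetical):

```java
public class MaxLoc {
  // A MAXLOC-style reduction returns both the maximum value and
  // the rank that contributed it.
  static int[] maxloc(int[] contributions) {
    int bestRank = 0;
    for (int r = 1; r < contributions.length; r++)
      if (contributions[r] > contributions[bestRank]) bestRank = r;
    return new int[]{ contributions[bestRank], bestRank };
  }

  public static void main(String[] args) {
    int[] perRankValues = { 3, 9, 4, 1 }; // hypothetical value on each rank
    int[] result = maxloc(perRankValues);
    System.out.println("max " + result[0] + " at rank " + result[1]);
  }
}
```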

37
Barrier with Tree Algorithm
38
Execution of barrier with eight processes
  • Eight processes thus form only one group
  • Each process exchanges an integer 4 times
  • Overlaps communications well

39
Intracomm.Bcast( )
  • Sends data from a process to all the other
    processes
  • Code from adlib
  • A communication library for HPJava
  • The current implementation is based on an n-ary
    tree
  • Limitation: broadcasts only from rank 0
  • Generated dynamically
  • Cost O(log2(N))
  • MPICH 1.2.5 uses a linear algorithm
  • Cost O(N)
  • MPICH2 has much improved algorithms
  • LAM/MPI uses n-ary trees
  • Limitation: broadcast only from rank 0

40
Broadcasting algorithm, total processes 8, root 0
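
The tree schedule on this slide can be simulated round by round; this sketch uses a binomial tree (the set of reached processes doubles each round) rather than the general n-ary tree mentioned earlier:

```java
public class TreeBcast {
  // Simulate a binomial-tree broadcast from rank 0: in round k every
  // process that already holds the data forwards it to the process
  // 2^k ranks ahead, so the reached set doubles each round.
  static int rounds(int n) {
    boolean[] has = new boolean[n];
    has[0] = true;                  // root starts with the data
    int rounds = 0;
    for (int step = 1; step < n; step <<= 1, rounds++) {
      boolean[] had = has.clone();  // senders for this round only
      for (int r = 0; r < n; r++)
        if (had[r] && r + step < n) has[r + step] = true;
    }
    return rounds;                  // ceil(log2(n)) rounds
  }

  public static void main(String[] args) {
    System.out.println(rounds(8)); // 3 rounds for 8 processes
  }
}
```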
41
Presentation Outline
  • Implementation Details
  • Point-to-point communication
  • Communicators, groups, and contexts
  • Process topologies
  • Derived datatypes
  • Collective communications
  • Runtime System
  • Thread-safety in MPJ Express
  • Performance Evaluation

42
The Runtime System
43
Thread-safety in MPI
  • The MPI 2.0 specification introduced the notion
    of a thread-compliant MPI implementation,
  • Four levels of thread-safety
  • MPI_THREAD_SINGLE,
  • MPI_THREAD_FUNNELED,
  • MPI_THREAD_SERIALIZED,
  • MPI_THREAD_MULTIPLE,
  • A blocked thread should not halt the execution of
    other threads,
  • Issues in Developing Thread-Safe MPI
    Implementation by Gropp et al.

44
Presentation Outline
  • Implementation Details
  • Point-to-point communication
  • Communicators, groups, and contexts
  • Process topologies
  • Derived datatypes
  • Collective communications
  • Runtime System
  • Thread-safety in MPJ Express
  • Performance Evaluation

45
Latency on Fast Ethernet
46
Throughput on Fast Ethernet
47
Latency on Gigabit Ethernet
48
Throughput on GigE
49
Choking experience 1
50
Latency on Myrinet
51
Throughput on Myrinet
52
Questions
?