Title: Distributed Shared Memory
1 Distributed Shared Memory
- Distributed Shared Memory (DSM) systems build the shared memory abstraction on top of distributed memory machines
- The users see a virtual global address space; the message passing underneath is handled by the DSM transparently to the users
- We can then use shared memory programming techniques
- Software implementations of DSM: http://www.ics.uci.edu/javid/dsm/page.html
2 Three types of DSM implementations
- Page-based technique
- The virtual global address space is divided into equal-sized chunks (pages), which are spread over the machines
- A page is the minimal sharing unit
- A request by a process to access a non-local piece of memory results in a page fault
- a trap occurs and the DSM software fetches the required page of memory and restarts the instruction (see the sketch below)
- a decision has to be made whether to replicate pages or maintain only one copy of any page and move it around the network
- The granularity of the pages has to be decided before implementation
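- A minimal, Linux-specific sketch (not part of the original notes) of the trap mechanism a page-based DSM relies on: a page is mapped with no access rights, the first touch raises SIGSEGV, and the handler stands in for the DSM software that would fetch the page before the faulting instruction restarts.

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char *page;        /* stands in for a non-resident DSM page */
    static long  page_size;

    static void fault_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)ctx;
        if (info->si_addr >= (void *)page &&
            info->si_addr <  (void *)(page + page_size)) {
            /* A real DSM would request the page from its current owner here. */
            mprotect(page, page_size, PROT_READ | PROT_WRITE);
            memset(page, 42, page_size);   /* pretend this is the fetched data */
        }
    }

    int main(void)
    {
        struct sigaction sa;
        page_size = sysconf(_SC_PAGESIZE);

        /* Map one page with no access rights: any touch traps. */
        page = mmap(NULL, page_size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        /* This access faults, the handler installs the page, and the
           faulting instruction is restarted transparently. */
        printf("page[0] = %d\n", page[0]);
        return 0;
    }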
3 Three types of DSM implementations
- Shared-variable based technique
- Only the variables and data structures required by more than one process are shared
- A variable is the minimal sharing unit
- Trade-off between consistency and network traffic
4 Three types of DSM implementations
- Object-based technique
- Memory can be conceptualised as an abstract space filled with objects (comprising data and methods)
- An object is the minimal sharing unit
- Trade-off between consistency and network traffic
5 OpenMP
- OpenMP stands for Open specification for Multi-Processing
- Used to assist compilers to understand and parallelise serial code better
- Can be used to specify shared memory parallelism in Fortran, C and C++ programs
- OpenMP is a specification for
- a set of compiler directives,
- run-time library routines, and
- environment variables
- Started mid-to-late 80s with the emergence of shared memory parallel computers with proprietary directive-driven programming environments
- OpenMP is an industry standard
6 OpenMP
- OpenMP specifications include
- OpenMP 1.0 for Fortran, 1997
- OpenMP 1.0 for C/C++, 1998
- OpenMP 2.0 for Fortran, 2000
- OpenMP 2.0 for C/C++, 2002
- OpenMP 2.5 for C/C++ and Fortran, 2005
- OpenMP Architecture Review Board: Compaq, HP, IBM, Intel, SGI, Sun
7 OpenMP programming model
- Shared Memory, thread-based parallelism
- Explicit parallelism
- Fork-join model
8 OpenMP code structure in C
    #include <omp.h>

    main ()
    {
        int var1, var2, var3;

        /* Serial code */
        ...

        /* Beginning of parallel section. Fork a team of threads. Specify variable scoping */
        #pragma omp parallel private(var1, var2) shared(var3)
        {
            /* Parallel section executed by all threads */
            ...
        }
        /* All threads join master thread and disband */

        /* Resume serial code */
        ...
    }
9 OpenMP code structure in Fortran
    PROGRAM HELLO
        INTEGER VAR1, VAR2, VAR3
    !   Serial code
        ...
    !   Beginning of parallel section. Fork a team of threads. Specify variable scoping
    !$OMP PARALLEL PRIVATE(VAR1, VAR2) SHARED(VAR3)
    !   Parallel section executed by all threads
        ...
    !$OMP END PARALLEL
    !   All threads join master thread and disband
    !   Resume serial code
        ...
    END
10 OpenMP Directives Format
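- The figure for this slide is not reproduced in the text dump; as a summary (not the original figure), the general directive format is:

    C/C++:    #pragma omp directive-name [clause, ...]
    Fortran:  !$OMP directive-name [clause, ...]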
11 OpenMP features
- OpenMP directives are ignored by compilers that don't support OpenMP, so the code can also be run on sequential machines (see the sketch below)
- Compiler directives are used to specify
- sections of code that can be executed in parallel
- critical sections
- scope of variables (private or shared)
- Mainly used to parallelise loops, e.g. separate threads handle separate iterations of the loop
- There is also a run-time library with several useful routines for checking the number of threads and number of processors, changing the number of threads, etc.
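- A small illustration (not from the notes): the program below compiles and runs with or without OpenMP support. Without it, the pragma is ignored and the standard _OPENMP macro is undefined, so the loop simply runs serially.

    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>          /* only available when compiling with OpenMP */
    #endif

    int main(void)
    {
        int i, sum = 0;

        /* Ignored by compilers without OpenMP support; the loop is then serial. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < 100; i++)
            sum += i;

    #ifdef _OPENMP
        printf("sum = %d (compiled with OpenMP)\n", sum);
    #else
        printf("sum = %d (compiled without OpenMP)\n", sum);
    #endif
        return 0;
    }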
12 Fork-Join Model
- Multiple threads are created using the parallel construct
- For C and C++
    #pragma omp parallel
    {
        ... do stuff
    }
- For Fortran
    !$OMP PARALLEL
        ... do stuff
    !$OMP END PARALLEL
13 How many threads are generated?
- The number of threads in a parallel region is determined by the following factors, in order of precedence (see the sketch below)
- use of the omp_set_num_threads() library function
- setting of the OMP_NUM_THREADS environment variable
- implementation default, e.g. the number of CPUs on a node
- Threads are numbered from 0 (master thread) to N-1
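- A minimal sketch (not from the notes) showing the highest-precedence mechanism: the library call overrides whatever OMP_NUM_THREADS was set to.

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        /* Overrides the OMP_NUM_THREADS environment variable (higher precedence). */
        omp_set_num_threads(4);

        #pragma omp parallel
        {
            /* Threads are numbered 0 (master) .. N-1 */
            printf("thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }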
14 Parallelising loops in OpenMP: the work-sharing construct
- A compiler directive specifies that the loop iterations can be done in parallel
- For C and C++
    #pragma omp parallel for
    for (i = 0; i < N; i++)
    {
        value[i] = compute(i);
    }
- For Fortran
    !$OMP PARALLEL DO
    DO i = 1, N
        value(i) = compute(i)
    END DO
    !$OMP END PARALLEL DO
- Thread scheduling can be used to specify the partitioning and allocation of iterations to threads, e.g.
    #pragma omp parallel for schedule(static,4)
- schedule(static, chunk): deal out blocks of iterations of size chunk to each thread
- schedule(dynamic, chunk): each thread takes a chunk of iterations and requests another when it finishes (see the sketch below)
15 Synchronisation in OpenMP
- Critical construct
- Barrier construct
16 Example of a Critical Section in OpenMP
    #include <omp.h>

    main()
    {
        int x;
        x = 0;
        #pragma omp parallel shared(x)
        {
            #pragma omp critical
            x = x + 1;
        }  /* end of parallel section */
    }
17 Example of a Barrier in OpenMP
    #include <omp.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int th_id, nthreads;
        #pragma omp parallel private(th_id)
        {
            th_id = omp_get_thread_num();
            printf("Hello World from thread %d\n", th_id);
            #pragma omp barrier
            if ( th_id == 0 ) {
                nthreads = omp_get_num_threads();
                printf("There are %d threads\n", nthreads);
            }
        }
        return 0;
    }
18 Data Scope Attributes in OpenMP
- OpenMP data scope attribute clauses are used to explicitly define how variables should be scoped
- These clauses are used in conjunction with several directives (e.g. PARALLEL, DO/for) to control the scoping of enclosed variables
- Three often encountered clauses
- shared
- private
- reduction
19 Shared and private data in OpenMP
- private(var) creates a local copy of var for each thread
- shared(var) states that var is a global variable to be shared among threads
- The default data storage attribute is shared
    !$OMP PARALLEL DO PRIVATE(xx,yy) SHARED(u,f)
    DO j = 1, m
        DO i = 1, n
            xx = -1.0 + dx * (i-1)
            yy = -1.0 + dy * (j-1)
            u(i,j) = 0.0
            f(i,j) = -alpha * (1.0 - xx*xx) * (1.0 - yy*yy)
        END DO
    END DO
    !$OMP END PARALLEL DO
20 Reduction Clause
- reduction (op : var)
- op is e.g. addition or logical OR. A local copy of the variable is made for each thread, the reduction operation is applied to each thread's copy, and the local values are then combined into a global value
    double ZZ, res = 0.0;
    #pragma omp parallel for reduction(+:res) private(ZZ)
    for (i = 1; i < N; i++) {
        ZZ  = i;
        res = res + ZZ;
    }
21 Run-Time Library Routines
- Can perform a variety of functions, including
- querying the number of threads / the thread number
- setting the number of threads
22 Run-Time Library Routines
- Query routines allow you to get the number of threads and the ID of a specific thread (a combined example is given below)
- id = omp_get_thread_num();          // thread no.
- Nthreads = omp_get_num_threads();   // number of threads
- The number of threads can also be specified at runtime
- omp_set_num_threads(Nthreads);
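- A short, self-contained sketch (not from the notes) putting these routines together; omp_get_num_procs(), used here to report the processor count, is an addition beyond the slide.

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        int Nthreads = 4;

        omp_set_num_threads(Nthreads);      /* request 4 threads */

        #pragma omp parallel
        {
            int id = omp_get_thread_num();  /* this thread's number, 0..N-1 */
            if (id == 0)
                printf("team size: %d, processors available: %d\n",
                       omp_get_num_threads(), omp_get_num_procs());
            printf("hello from thread %d\n", id);
        }
        return 0;
    }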
23 Environment Variables
- Control the execution of parallel code (see the sketch below)
- Four environment variables
- OMP_SCHEDULE: how iterations of a loop are scheduled
- OMP_NUM_THREADS: maximum number of threads
- OMP_DYNAMIC: enable or disable dynamic adjustment of the number of threads
- OMP_NESTED: enable or disable nested parallelism
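- As a small illustration (not in the notes), the variables are set in the shell before launching the program (e.g. OMP_NUM_THREADS=4) and the OpenMP runtime picks them up at start-up; the sketch below only reports what was set and how many threads a parallel region actually gets.

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* The OpenMP runtime reads these before main() runs; here we just report them. */
        const char *nt = getenv("OMP_NUM_THREADS");
        const char *sc = getenv("OMP_SCHEDULE");
        printf("OMP_NUM_THREADS = %s\n", nt ? nt : "(not set)");
        printf("OMP_SCHEDULE    = %s\n", sc ? sc : "(not set)");

        #pragma omp parallel
        {
            if (omp_get_thread_num() == 0)
                printf("parallel region running with %d threads\n",
                       omp_get_num_threads());
        }
        return 0;
    }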
24 OpenMP compilers
- Since parallelism is mostly achieved by parallelising loops over shared memory, OpenMP compilers work well for multiprocessor SMPs and vector machines
- OpenMP could work for distributed memory machines, but would need a good distributed shared memory (DSM) implementation underneath
- For more information on OpenMP, see
- www.openmp.org
25 High Performance Computing Course Notes 2007-2008: Message Passing Programming I
26 Message Passing Programming
- Message passing is the most widely used parallel programming model
- Message passing works by creating a number of uniquely named tasks that interact by sending and receiving messages to and from one another (hence "message passing")
- Generally, processes communicate by sending data from the address space of one process to that of another
- Communication between processes (via files, pipes, sockets)
- Communication between threads within a process (via a global data area)
- Programs based on message passing can be standard sequential language programs (C/C++, Fortran), augmented with calls to library functions for sending and receiving messages
27 Message Passing Interface (MPI)
- MPI is a specification, not a particular implementation
- It does not specify process startup, error codes, the amount of system buffering, etc.
- MPI is a library, not a language
- The goals of MPI: functionality, portability and efficiency
- Message passing model -> MPI specification -> MPI implementation
28 OpenMP vs MPI
- In a nutshell
- MPI is used on distributed-memory systems
- OpenMP is used for code parallelisation on shared-memory systems
- Both are explicit parallelism
- Higher-level control (OpenMP), lower-level control (MPI)
29 A little history
- Message-passing libraries were developed for a number of early distributed memory computers
- By 1993 there were many vendor-specific implementations
- By 1994 MPI-1 came into being
- By 1996 MPI-2 was finalised
30 The MPI programming model
- MPI standards
- MPI-1 (1.1, 1.2), MPI-2 (2.0)
- Forwards compatibility is preserved between versions
- Standard bindings for C, C++ and Fortran; MPI bindings also exist for Python, Java etc. (all non-standard)
- We will stick to the C binding for the lectures and coursework. More info on MPI: www.mpi-forum.org
- Implementations: for your laptop, pick up MPICH, a free portable implementation of MPI (http://www-unix.mcs.anl.gov/mpi/mpich/index.htm)
- The coursework will use MPICH
31 MPI
- MPI is a complex system comprising 129 functions with numerous parameters and variants
- Six of them are indispensable, and a large number of useful programs can already be written with just these six
- The other functions add flexibility (datatypes), robustness (non-blocking send/receive), efficiency (ready-mode communication), modularity (communicators, groups) or convenience (collective operations, topologies)
- In the lectures we are going to cover the most commonly encountered functions
32 The MPI programming model
- A computation comprises one or more processes that communicate by calling library routines to send and receive messages to and from other processes
- (Generally) a fixed set of processes is created at the outset, one process per processor
- Different from PVM
33 Intuitive interfaces for sending and receiving messages
- Send(data, destination), Receive(data, source): the minimal interface
- Not enough in some situations; we also need
- Message matching: add a message_id at both the send and receive interfaces
- they become Send(data, destination, msg_id), Receive(data, source, msg_id)
- Message_id
- is expressed as an integer, termed the message tag
- allows the programmer to deal with the arrival of messages in an orderly fashion (queue them and then deal with them later)
34 How to express the data in the send/receive interfaces
- Early stages
- (address, length) for the send interface
- (address, max_length) for the receive interface
- These are not always good enough
- the data to be sent may not be in contiguous memory locations
- the storage format of the data may not be the same, or known in advance, on a heterogeneous platform
- Eventually, a triple (address, count, datatype) is used to express the data to be sent, and (address, max_count, datatype) for the data to be received
- This reflects the fact that a message contains much more structure than just a string of bits, for example (vector_A, 300, MPI_REAL)
- Programmers can also construct their own datatypes (see the sketch below)
- Now the interfaces become send(address, count, datatype, destination, msg_id) and receive(address, max_count, datatype, source, msg_id)
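- A minimal sketch (not from the notes) of a programmer-constructed datatype: MPI_Type_vector describes every other element of an array, so a strided, non-contiguous piece of memory can be sent with a single (address, count, datatype) triple. The name every_other is just illustrative.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, i;
        double a[10];
        MPI_Datatype every_other;   /* the new, user-constructed type */
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* 5 blocks of 1 double each, stride 2: elements 0,2,4,6,8 of a[] */
        MPI_Type_vector(5, 1, 2, MPI_DOUBLE, &every_other);
        MPI_Type_commit(&every_other);

        if (rank == 0) {
            for (i = 0; i < 10; i++) a[i] = i;
            MPI_Send(a, 1, every_other, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            double recv[5];
            /* receive the 5 strided doubles into a contiguous buffer */
            MPI_Recv(recv, 5, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
            for (i = 0; i < 5; i++) printf("%g ", recv[i]);
            printf("\n");
        }

        MPI_Type_free(&every_other);
        MPI_Finalize();
        return 0;
    }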
35 How to distinguish messages
- A message tag is necessary, but not sufficient
- So the communicator is introduced
36 Communicators
- Messages are put into contexts
- Contexts are allocated at run time by the system in response to programmer requests (see the sketch below)
- The system can guarantee that each generated context is unique
- Processes belong to groups
- The notions of context and group are combined in a single object called a communicator
- A communicator identifies a group of processes and a communication context
- The MPI library defines an initial communicator, MPI_COMM_WORLD, which contains all the processes running in the system
- Messages from different process groups can have the same tag
- So the send interface becomes send(address, count, datatype, destination, tag, comm)
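- Beyond MPI_COMM_WORLD, a programmer can request new groups/contexts. The sketch below (not part of the notes, and using MPI_Comm_split, which is not one of the six basic functions) divides the processes into two communicators by the parity of their world rank; each new communicator has its own context and its own ranks.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int world_rank, sub_rank;
        MPI_Comm sub_comm;   /* illustrative name for the new communicator */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* Processes with the same "color" (rank parity) end up in the same
           new communicator. */
        MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &sub_comm);
        MPI_Comm_rank(sub_comm, &sub_rank);

        printf("world rank %d -> rank %d in the %s communicator\n",
               world_rank, sub_rank, (world_rank % 2) ? "odd" : "even");

        MPI_Comm_free(&sub_comm);
        MPI_Finalize();
        return 0;
    }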
37 Status of received messages
- A message status structure is added to the receive interface
- The status holds information about the source, tag and actual message size
- In the C language, the source can be retrieved by accessing status.MPI_SOURCE,
- the tag can be retrieved by status.MPI_TAG, and
- the actual message size can be retrieved by calling the function MPI_Get_count(&status, datatype, &count)
- The receive interface becomes receive(address, max_count, datatype, source, tag, communicator, status) (see the sketch below)
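- A small sketch (not from the notes): the receiver allocates room for up to 100 integers, and after MPI_Recv the status tells it who sent the message, with which tag, and how many elements actually arrived.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, count;
        int buf[100] = {0};
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* send only 7 of the possible 100 elements */
            MPI_Send(buf, 7, MPI_INT, 1, 55, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, 100, MPI_INT, 0, 55, MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_INT, &count);   /* actual message size */
            printf("source %d, tag %d, %d elements received\n",
                   status.MPI_SOURCE, status.MPI_TAG, count);
        }

        MPI_Finalize();
        return 0;
    }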
38 How to express source and destination
- The processes in a communicator (group) are identified by ranks
- If a communicator contains n processes, process ranks are integers from 0 to n-1
- The source and destination processes in the send/receive interface are given by their ranks
39 Some other issues
- In the receive interface, the tag can be a wildcard (MPI_ANY_TAG), which means a message with any tag will be received
- In the receive interface, the source can also be a wildcard (MPI_ANY_SOURCE), which matches any source (see the sketch below)
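- A brief sketch (not from the notes): process 0 receives one message from every other process using wildcards for both source and tag, then uses the status fields to find out where each message actually came from.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, i, value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* accept messages from any source, with any tag, in arrival order */
            for (i = 1; i < size; i++) {
                MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                printf("got %d from rank %d (tag %d)\n",
                       value, status.MPI_SOURCE, status.MPI_TAG);
            }
        } else {
            value = rank * 10;
            MPI_Send(&value, 1, MPI_INT, 0, rank, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }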
40 MPI basics
- First six functions (C bindings)
- MPI_Send (buf, count, datatype, dest, tag, comm)
- Send a message
- buf: address of send buffer
- count: no. of elements to send (> 0)
- datatype: type of the elements
- dest: process id (rank) of the destination
- tag: message tag
- comm: communicator (handle)
43 MPI basics
- First six functions (C bindings)
- MPI_Send (buf, count, datatype, dest, tag, comm)
- Calculating the size of the data to be sent
- buf: address of send buffer
- count * sizeof(datatype): bytes of data to be sent
46 MPI basics
- First six functions (C bindings)
- MPI_Recv (buf, count, datatype, source, tag, comm, status)
- Receive a message
- buf: address of receive buffer (output parameter)
- count: max no. of elements in the receive buffer (> 0)
- datatype: type of the receive buffer elements
- source: process id (rank) of the source process, or MPI_ANY_SOURCE
- tag: message tag, or MPI_ANY_TAG
- comm: communicator
- status: status object
47 MPI basics
- First six functions (C bindings)
- MPI_Init (int *argc, char ***argv)
- Initiate a computation
- argc (number of arguments) and argv (argument vector) are the main program's arguments
- Must be called first, and once per process
- MPI_Finalize ( )
- Shut down a computation
- The last MPI call in a program
48 MPI basics
- First six functions (C bindings)
- MPI_Comm_size (MPI_Comm comm, int *size)
- Determine the number of processes in comm
- comm is the communicator handle; MPI_COMM_WORLD is the default (including all MPI processes)
- size holds the number of processes in the group
- MPI_Comm_rank (MPI_Comm comm, int *pid)
- Determine the id (rank) of the current (calling) process
- pid holds the id of the current process
49 MPI basics: a basic example
    #include "mpi.h"
    #include <stdio.h>
    int main(int argc, char *argv[])
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("Hello, world. I am %d of %d\n", rank, nprocs);
        MPI_Finalize();
        return 0;
    }

    mpirun -np 4 myprog
    Hello, world. I am 1 of 4
    Hello, world. I am 3 of 4
    Hello, world. I am 0 of 4
    Hello, world. I am 2 of 4
50 MPI basics: send and recv example (1)
    #include "mpi.h"
    #include <stdio.h>
    int main(int argc, char *argv[])
    {
        int rank, size, i;
        int buffer[10];
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (size < 2) {
            printf("Please run with two processes.\n");
            MPI_Finalize();
            return 0;
        }
        if (rank == 0) {
            for (i = 0; i < 10; i++)
                buffer[i] = i;
            MPI_Send(buffer, 10, MPI_INT, 1, 123, MPI_COMM_WORLD);
        }
51 MPI basics: send and recv example (2)
        if (rank == 1) {
            for (i = 0; i < 10; i++)
                buffer[i] = -1;
            MPI_Recv(buffer, 10, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
            for (i = 0; i < 10; i++)
                if (buffer[i] != i)
                    printf("Error: buffer[%d] = %d but is expected to be %d\n",
                           i, buffer[i], i);
        }
        MPI_Finalize();
        return 0;
    }
52 MPI language bindings
- Standard (accepted) bindings for Fortran, C and C++
- Java bindings are work in progress
- JavaMPI: Java wrapper to native calls
- mpiJava: JNI wrappers
- jmpi: pure Java implementation of the MPI library
- MPIJ: same idea
- The Java Grande Forum is trying to sort it all out
- We will use the C bindings