Distributed Shared Memory - PowerPoint PPT Presentation
Transcript and Presenter's Notes

Title: Distributed Shared Memory


1
Distributed Shared Memory
  • Distributed Shared Memory (DSM) systems build a
    shared memory abstraction on top of distributed
    memory machines
  • The users see a virtual global address space; the
    message passing underneath is handled by the DSM
    transparently to the users
  • We can then use shared memory programming
    techniques
  • Software implementing DSM:
    http://www.ics.uci.edu/javid/dsm/page.html

2
Three types of DSM implementations
  • Page-based technique
  • The virtual global address space is divided into
    equal sized chunks (pages) which are spread over
    the machines
  • Page is the minimal sharing unit
  • The request by a process to access a non-local
    piece of memory results in a page fault
  • a trap occurs and the DSM software fetches the
    required page of memory and restarts the
    instruction
  • a decision has to be made whether to replicate
    pages or maintain only one copy of any page and
    move it around the network
  • The granularity of the pages has to be decided
    before implementation

3
Three types of DSM implementations
  • Shared-variable based technique
  • only the variables and data structures required
    by more than one process are shared.
  • Variable is minimal sharing unit
  • Trade-off between consistency and network traffic

4
Three types of DSM implementations
  • Object-based technique
  • memory can be conceptualized as an abstract space
    filled with objects (including data and methods)
  • Object is minimal sharing unit
  • Trade-off between consistency and network traffic

5
OpenMP
  • OpenMP stands for Open specification for
    Multi-processing
  • used to assist compilers to understand and
    parallelise the serial code better
  • Can be used to specify shared memory parallelism
    in Fortran, C and C++ programs
  • OpenMP is a specification for
  • a set of compiler directives,
  • RUN TIME library routines, and
  • environment variables
  • Started in the mid-to-late 1980s with the
    emergence of shared memory parallel computers with
    proprietary directive-driven programming
    environments
  • OpenMP is an industry standard

6
OpenMP
  • OpenMP specifications include
  • OpenMP 1.0 for Fortran, 1997
  • OpenMP 1.0 for C/C++, 1998
  • OpenMP 2.0 for Fortran, 2000
  • OpenMP 2.0 for C/C++, 2002
  • OpenMP 2.5 for C/C++ and Fortran, 2005
  • OpenMP Architecture Review Board: Compaq, HP,
    IBM, Intel, SGI, Sun

7
OpenMP programming model
  • Shared Memory, thread-based parallelism
  • Explicit parallelism
  • Fork-join model

8
OpenMP code structure in C
  #include <omp.h>

  main () {

    int var1, var2, var3;
    /* Serial code */
    ...

    /* Beginning of parallel section. Fork a team
       of threads. Specify variable scoping */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
      /* Parallel section executed by all threads */
      ...
      /* All threads join master thread and disband */
    }

    /* Resume serial code */
    ...
  }

9
OpenMP code structure in Fortran
  PROGRAM HELLO
    INTEGER VAR1, VAR2, VAR3
    ! Serial code
    . . .
    ! Beginning of parallel section. Fork a team of
    ! threads. Specify variable scoping
!$OMP PARALLEL PRIVATE(VAR1, VAR2) SHARED(VAR3)
    ! Parallel section executed by all threads
    . . .
    ! All threads join master thread and disband
!$OMP END PARALLEL
    ! Resume serial code
    . . .
  END

10
OpenMP Directives Format
  • C/C++
  • Fortran
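
  The general directive format (the original slide's figure is not preserved
  in this transcript; the forms below follow the OpenMP specification):

    C/C++:   #pragma omp directive-name [clause[, clause]...]
    Fortran: !$OMP DIRECTIVE-NAME [clause[, clause]...]
             (fixed-form sources may also use the C$OMP or *$OMP sentinels)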

11
OpenMP features
  • OpenMP directives are ignored by compilers that
    don't support OpenMP, so the code can also be run
    on sequential machines
  • Compiler directives used to specify
  • sections of code that can be executed in parallel
  • critical sections
  • Scope of variables (private or shared)
  • Mainly used to parallelize loops, e.g. separate
    threads to handle separate iterations of the loop
  • There is also a run-time library that has several
    useful routines for checking the number of
    threads and number of processors, changing the
    number of threads, etc

12
Fork-Join Model
  • Multiple threads are created using the parallel
    construct
  • For C and C++:
    #pragma omp parallel
    {
      ... do stuff
    }
  • For Fortran:
    !$OMP PARALLEL
      ... do stuff
    !$OMP END PARALLEL

13
How many threads are generated?
  • The number of threads in a parallel region is
    determined by the following factors, in order of
    precedence
  • Use of the omp_set_num_threads() library function
  • Setting of the OMP_NUM_THREADS environment
    variable
  • Implementation default - the number of CPUs on a
    node
  • Threads are numbered from 0 (master thread) to
    N-1
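
  A minimal C sketch of the library-function mechanism (the thread count of 4
  is illustrative; the OMP_NUM_THREADS environment variable would instead be
  set in the shell before running the program):

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        /* Overrides OMP_NUM_THREADS and the implementation default
           for subsequent parallel regions. */
        omp_set_num_threads(4);

        #pragma omp parallel
        {
            /* Threads are numbered 0 (master) to N-1. */
            printf("Thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }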

14
Parallelizing loops in OpenMP: Work Sharing
construct
  • Compiler directive specifies that loop can be
    done in parallel
  • For C and C
  • pragma omp parallel for
  • for (i0iiltN)
  • valuei compute(i)
  • For Fortran
  • !OMP PARALLEL DO
  • DO (i1N)
  • value(i) compute(i)
  • END DO
  • !OMP END PARALLEL DO
  • Can use thread scheduling to specify partition
    and allocation of iterations to threads
  • pragma omp parallel for schedule(static,4)
  • schedule(static ,chunk)
  • Deal out blocks of iterations of size chunk to
    each thread
  • schedule(dynamic ,chunk)
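
  A brief sketch contrasting the two schedule kinds (the loop body, array
  name and chunk size are illustrative):

    #include <omp.h>

    #define N 1000

    /* static: iterations are dealt out in fixed blocks of 4, round-robin.
       dynamic: each thread takes a new block of 4 from a queue whenever it
       finishes its previous one, which helps when iteration costs vary. */
    void scale(double *value)
    {
        int i;

        #pragma omp parallel for schedule(static, 4)
        for (i = 0; i < N; i++)
            value[i] = 2.0 * value[i];

        #pragma omp parallel for schedule(dynamic, 4)
        for (i = 0; i < N; i++)
            value[i] = 2.0 * value[i];
    }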

15
Synchronisation in OpenMP
  • Critical construct
  • Barrier construct

16
Example of Critical Section in OpenMP
  #include <omp.h>

  main() {
    int x;
    x = 0;

    #pragma omp parallel shared(x)
    {
      #pragma omp critical
      x = x + 1;
    }  /* end of parallel section */
  }

17
Example of Barrier in OpenMP
  #include <omp.h>
  #include <stdio.h>

  int main (int argc, char *argv[])
  {
    int th_id, nthreads;

    #pragma omp parallel private(th_id)
    {
      th_id = omp_get_thread_num();
      printf("Hello World from thread %d\n", th_id);
      #pragma omp barrier
      if ( th_id == 0 ) {
        nthreads = omp_get_num_threads();
        printf("There are %d threads\n", nthreads);
      }
    }
    return 0;
  }

18
Data Scope Attributes in OpenMP
  • OpenMP Data Scope Attribute Clauses are used to
    explicitly define how variables should be scoped
  • These clauses are used in conjunction with
    several directives (e.g. PARALLEL, DO/for) to
    control the scoping of enclosed variables
  • Three often encountered clauses
  • Shared
  • Private
  • Reduction

19
Shared and private data in OpenMP
  • private(var) creates a local copy of var for each
    thread
  • shared(var) states that var is a global variable
    to be shared among threads
  • Default data storage attribute is shared

!$OMP PARALLEL DO &
!$OMP PRIVATE(xx,yy) SHARED(u,f)
DO j = 1, m
  DO i = 1, n
    xx = -1.0 + dx * (i-1)
    yy = -1.0 + dy * (j-1)
    u(i,j) = 0.0
    f(i,j) = -alpha * (1.0 - xx*xx) * (1.0 - yy*yy)
  END DO
END DO
!$OMP END PARALLEL DO
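
For comparison, a minimal C sketch of the same clauses (hypothetical code,
not from the original slides; the grid initialisation mirrors the Fortran
fragment above and the array shapes are assumed):

    #include <omp.h>

    void init_grid(int n, int m, double dx, double dy, double alpha,
                   double u[n][m], double f[n][m])
    {
        int i, j;
        double xx, yy;

        /* j (the parallel loop index) is private automatically;
           i, xx and yy get a local copy per thread; u and f are shared. */
        #pragma omp parallel for private(i, xx, yy) shared(u, f)
        for (j = 0; j < m; j++) {
            for (i = 0; i < n; i++) {
                xx = -1.0 + dx * i;
                yy = -1.0 + dy * j;
                u[i][j] = 0.0;
                f[i][j] = -alpha * (1.0 - xx*xx) * (1.0 - yy*yy);
            }
        }
    }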
20
Reduction Clause
  • Reduction -
  • reduction (op : var)
  • e.g. add, logical OR. A local copy of the
    variable is made for each thread. Reduction
    operation done for each thread, then local values
    combined to create global value

double ZZ, res = 0.0;
#pragma omp parallel for reduction(+:res) private(ZZ)
for (i = 1; i <= N; i++) {
  ZZ = i;
  res = res + ZZ;
}
21
Run-Time Library Routines
  • Can perform a variety of functions, including
  • Query the number of threads/thread no.
  • Set number of threads

22
Run-Time Library Routines
  • Query routines allow you to get the number of
    threads and the ID of a specific thread
  • id = omp_get_thread_num();        // thread no.
  • Nthreads = omp_get_num_threads(); // number of threads
  • Can specify the number of threads at runtime
  • omp_set_num_threads(Nthreads);
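
  A short C sketch putting these routines together (omp_get_num_procs() is a
  standard OpenMP query routine not listed on the slide; using it to choose
  the thread count is just an illustration):

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        /* Query routines may also be called outside a parallel region;
           there omp_get_num_threads() returns 1. */
        printf("Processors available: %d\n", omp_get_num_procs());

        omp_set_num_threads(omp_get_num_procs());  /* one thread per processor */

        #pragma omp parallel
        {
            int id = omp_get_thread_num();         /* thread no. */
            if (id == 0)
                printf("Running with %d threads\n", omp_get_num_threads());
        }
        return 0;
    }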

23
Environment Variable
  • Controlling the execution of parallel code
  • Four environment variables
  • OMP_SCHEDULE: how iterations of a loop are
    scheduled
  • OMP_NUM_THREADS: maximum number of threads
  • OMP_DYNAMIC: enable or disable dynamic adjustment
    of the number of threads
  • OMP_NESTED: enable or disable nested parallelism

24
OpenMP compilers
  • Since parallelism is mostly achieved by
    parallelising loops using shared memory, OpenMP
    compilers work well for multiprocessor SMPs and
    vector machines
  • OpenMP could work for distributed memory
    machines, but would need to use a good
    distributed shared memory (DSM) implementation
  • For more information on OpenMP, see
  • www.openmp.org

25
High Performance Computing Course Notes 2007-2008
Message Passing Programming I
26
Message Passing Programming
  • Message Passing is the most widely used parallel
    programming model
  • Message passing works by creating a number of
    tasks, uniquely named, that interact by sending
    and receiving messages to and from one another
    (hence the message passing)
  • Generally, processes communicate through sending
    the data from the address space of one process to
    that of another
  • Communication of processes (via files, pipe,
    socket)
  • Communication of threads within a process (via
    global data area)
  • Message passing programs can be written in
    standard sequential languages (C/C++, Fortran),
    augmented with calls to library functions for
    sending and receiving messages

27
Message Passing Interface (MPI)
  • MPI is a specification, not a particular
    implementation
  • Does not specify process startup, error codes,
    amount of system buffer, etc
  • MPI is a library, not a language
  • The goals of MPI: functionality, portability and
    efficiency
  • Message passing model => MPI specification => MPI
    implementation

28
OpenMP vs MPI
  • In a nutshell
  • MPI is used on distributed-memory systems
  • OpenMP is used for code parallelisation on
    shared-memory systems
  • Both provide explicit parallelism
  • High-level control (OpenMP) vs lower-level control
    (MPI)

29
A little history
  • Message-passing libraries developed for a number
    of early distributed memory computers
  • By 1993 there were many vendor-specific
    implementations
  • By 1994 MPI-1 came into being
  • By 1996 MPI-2 was finalized

30
The MPI programming model
  • MPI standards -
  • MPI-1 (1.1, 1.2), MPI-2 (2.0)
  • Forwards compatibility preserved between versions
  • Standard bindings - for C, C++ and Fortran. There
    are also MPI bindings for Python, Java etc (all
    non-standard)
  • We will stick to the C binding for the lectures
    and coursework. More info on MPI:
    www.mpi-forum.org
  • Implementations - for your laptop, pick up MPICH
    (a free, portable implementation of MPI:
    http://www-unix.mcs.anl.gov/mpi/mpich/index.htm)
  • Coursework will use MPICH

31
MPI
  • MPI is a complex system comprising 129 functions
    with numerous parameters and variants
  • Six of them are indispensable; with just these six
    you can already write a large number of useful
    programs
  • The other functions add flexibility (datatypes),
    robustness (non-blocking send/receive),
    efficiency (ready-mode communication), modularity
    (communicators, groups) or convenience
    (collective operations, topology)
  • In the lectures, we are going to cover the most
    commonly encountered functions

32
The MPI programming model
  • A computation comprises one or more processes that
    communicate by calling library routines to send
    and receive messages to and from other processes
  • (Generally) a fixed set of processes created at
    outset, one process per processor
  • Different from PVM

33
Intuitive Interfaces for sending and receiving
messages
  • Send(data, destination), Receive(data, source)
  • minimal interface
  • Not enough in some situations; we also need
  • Message matching: add a message_id to both the
    send and receive interfaces
  • They become Send(data, destination, msg_id),
    Receive(data, source, msg_id)
  • Message_id
  • is expressed as an integer, termed the message
    tag
  • allows the programmer to deal with the arrival of
    messages in an orderly fashion (messages can be
    queued and then dealt with in turn)
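
  As a sketch only (these prototypes are hypothetical and simplified, not
  real library calls), the interfaces at this stage of the discussion would
  look like:

    /* Hypothetical prototypes for illustration -- not actual MPI functions. */
    void send(void *data, int destination, int msg_id);
    void receive(void *data, int source, int msg_id);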

34
How to express the data in the send/receive
interfaces
  • Early stages
  • (address, length) for the send interface
  • (address, max_length) for the receive interface
  • These are not always adequate
  • The data to be sent may not be in contiguous
    memory locations
  • The storage format of the data may not be the
    same, or known in advance, on a heterogeneous
    platform
  • Eventually, a triple (address, count, datatype)
    is used to express the data to be sent and
    (address, max_count, datatype) for the data to be
    received
  • This reflects the fact that a message has much
    more structure than just a string of bits. For
    example, (vector_A, 300, MPI_REAL)
  • Programmers can construct their own datatypes
  • Now the interfaces become send(address, count,
    datatype, destination, msg_id) and
    receive(address, max_count, datatype, source,
    msg_id)
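
  Jumping ahead to the MPI syntax covered later, a brief sketch of how the
  (address, count, datatype) triple appears in an actual call (the array
  name, destination rank and tag are illustrative; MPI_DOUBLE is the C
  analogue of the slide's MPI_REAL):

    #include "mpi.h"

    /* Sketch: send 300 doubles starting at vector_A to rank 1 with tag 99. */
    void send_vector(double vector_A[300])
    {
        MPI_Send(vector_A, 300, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
    }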

35
How to distinguish messages
  • A message tag is necessary, but not sufficient
  • So the notion of a communicator is introduced

36
Communicators
  • Messages are put into contexts
  • Contexts are allocated at run time by the system
    in response to programmer requests
  • The system can guarantee that each generated
    context is unique
  • The processes belong to groups
  • The notions of context and group are combined in
    a single object, which is called a communicator
  • A communicator identifies a group of processes
    and a communication context
  • The MPI library defines an initial communicator,
    MPI_COMM_WORLD, which contains all the processes
    running in the system
  • The messages from different process groups can
    have the same tag
  • So the send interface becomes send(address,
    count, datatype, destination, tag, comm)
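
  As an aside (not part of the six basic functions covered in these notes),
  one way a program requests a new context/group at run time is
  MPI_Comm_split; a brief sketch:

    #include "mpi.h"

    /* Sketch: split MPI_COMM_WORLD into two communicators, one for the
       even-ranked and one for the odd-ranked processes. Each new
       communicator carries its own context, so a tag used in one cannot
       match a message sent in the other. */
    void split_even_odd(void)
    {
        int rank;
        MPI_Comm newcomm;

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_split(MPI_COMM_WORLD, rank % 2 /* colour */,
                       rank /* key */, &newcomm);
        /* ... communicate within newcomm ... */
        MPI_Comm_free(&newcomm);
    }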

37
Status of the received messages
  • The structure of the message status is added to
    the receive interface
  • Status holds the information about source, tag
    and actual message size
  • In the C language, the source can be retrieved by
    accessing status.MPI_SOURCE,
  • the tag can be retrieved from status.MPI_TAG, and
  • the actual message size can be retrieved by
    calling the function MPI_Get_count(&status,
    datatype, &count)
  • The receive interface becomes receive(address,
    maxcount, datatype, source, tag, communicator,
    status)
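
  A short C sketch of receiving a message and inspecting its status (the
  buffer size, source wildcard and tag wildcard are illustrative):

    #include "mpi.h"
    #include <stdio.h>

    /* Sketch: receive up to 100 ints from any source with any tag, then
       query the status object for the sender, the tag and the actual
       number of elements received. */
    void recv_and_inspect(void)
    {
        int buffer[100], count;
        MPI_Status status;

        MPI_Recv(buffer, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);

        MPI_Get_count(&status, MPI_INT, &count);
        printf("Received %d ints from rank %d with tag %d\n",
               count, status.MPI_SOURCE, status.MPI_TAG);
    }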

38
How to express source and destination
  • The processes in a communicator (group) are
    identified by ranks
  • If a communicator contains n processes, process
    ranks are integers from 0 to n-1
  • Source and destination processes in the
    send/receive interface are the ranks

39
Some other issues
  • In the receive interface, the tag can be a
    wildcard, which means a message with any tag will
    be received
  • In the receive interface, the source can also be a
    wildcard, which matches any source

40
MPI basics
  • First six functions (C bindings)
  • MPI_Send (buf, count, datatype, dest, tag, comm)
  • Send a message
  • buf: address of send buffer
  • count: no. of elements to send (>0)
  • datatype: datatype of the elements
  • dest: process id (rank) of the destination
  • tag: message tag
  • comm: communicator (handle)

43
MPI basics
  • First six functions (C bindings)
  • MPI_Send (buf, count, datatype, dest, tag, comm)
  • Calculating the size of the data to be sent
  • buf: address of send buffer
  • count * sizeof(datatype) bytes of data
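
  A brief sketch of the same calculation at run time (MPI_Type_size is a
  standard MPI call, though not one of the six basic functions; the helper
  name is illustrative):

    #include "mpi.h"

    /* Sketch: total message size in bytes = count * size of one element.
       MPI_Type_size returns the size in bytes of an MPI datatype. */
    int message_bytes(int count, MPI_Datatype datatype)
    {
        int type_size;
        MPI_Type_size(datatype, &type_size);
        return count * type_size;
    }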

46
MPI basics
  • First six functions (C bindings)
  • MPI_Recv (buf, count, datatype, source, tag,
    comm, status)
  • Receive a message
  • buf: address of receive buffer (var param)
  • count: max no. of elements in receive buffer
    (>0)
  • datatype: datatype of receive buffer elements
  • source: process id of source process, or
    MPI_ANY_SOURCE
  • tag: message tag, or MPI_ANY_TAG
  • comm: communicator
  • status: status object

47
MPI basics
  • First six functions (C bindings)
  • MPI_Init (int *argc, char ***argv)
  • Initiate a computation
  • argc (number of arguments) and argv (argument
    vector) are the main program's arguments
  • Must be called first, and only once per process
  • MPI_Finalize ( )
  • Shut down a computation
  • The last MPI call in the program

48
MPI basics
  • First six functions (C bindings)
  • MPI_Comm_size (MPI_Comm comm, int *size)
  • Determine the number of processes in comm
  • comm is the communicator handle; MPI_COMM_WORLD is
    the default (including all MPI processes)
  • size holds the number of processes in the group
  • MPI_Comm_rank (MPI_Comm comm, int *pid)
  • Determine the id (rank) of the current (calling)
    process
  • pid holds the id of the current process

49
MPI basics: a basic example
  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      int rank, nprocs;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      printf("Hello, world.  I am %d of %d\n", rank, nprocs);
      MPI_Finalize();
      return 0;
  }

mpirun -np 4 myprog
Hello, world. I am 1 of 4
Hello, world. I am 3 of 4
Hello, world. I am 0 of 4
Hello, world. I am 2 of 4
50
MPI basics: send and recv example (1)
  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      int rank, size, i;
      int buffer[10];
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (size < 2)
      {
          printf("Please run with two processes.\n");
          MPI_Finalize();
          return 0;
      }
      if (rank == 0)
      {
          for (i = 0; i < 10; i++)
              buffer[i] = i;
          MPI_Send(buffer, 10, MPI_INT, 1, 123, MPI_COMM_WORLD);
      }

51
MPI basics: send and recv example (2)
      if (rank == 1)
      {
          for (i = 0; i < 10; i++)
              buffer[i] = -1;
          MPI_Recv(buffer, 10, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
          for (i = 0; i < 10; i++)
          {
              if (buffer[i] != i)
                  printf("Error: buffer[%d] = %d but is expected to be %d\n",
                         i, buffer[i], i);
          }
      }
      MPI_Finalize();
      return 0;
  }

52
MPI language bindings
  • Standard (accepted) bindings for Fortran, C and
    C++
  • Java bindings are work in progress
  • JavaMPI: Java wrapper to native calls
  • mpiJava: JNI wrappers
  • jmpi: pure Java implementation of the MPI library
  • MPIJ: same idea
  • Java Grande Forum is trying to sort it all out
  • We will use the C bindings