Porting NANOS on SDSM (presentation transcript)
1
Porting NANOS on SDSM
GOAL: Porting a shared memory environment to distributed memory. What is missing from current SDSMs?
Christian Perez
2
Who am I?
  • December 1999: PhD at LIP, ENS Lyon, France
  • Data parallel languages, distributed memory, load balancing, preemptive thread migration
  • Winter 1999/2000: TMR at UPC
  • OpenMP, Nanos, SDSM
  • October 2000: INRIA researcher
  • Distributed programs, code coupling

3
Contents
  • Motivation
  • Related work
  • Nanos execution model (NthLib)
  • Nanos on top of 2 SDSMs (JIAJIA, DSM-PM2)
  • Missing SDSM functionalities
  • Conclusion

4
Motivation
  • OpenMP: emerging standard
  • simplicity (no data distribution)
  • Clusters of machines (mono- or multiprocessor)
  • excellent performance/price ratio
  • → OpenMP on top of a cluster!

5
OpenMP / Cluster: HOW?
  • OpenMP paradigm: shared memory
  • Cluster paradigm: message passing
  • → Use of a software DSM system!
  • Hardware DSM systems: SCI (write 2 µs)
  • specific hardware
  • not yet stable

6
Related work
  • Several OpenMP/DSM implementations
  • OpenMP NOW!, Omni
  • But,
  • Modification of OpenMP semantics
  • One level of parallelism
  • Do not exploit high performance networks

7
OpenMP on classical DSM
  • Compiler extracts shared data from the stack
  • Expensive local variable creation
  • shared memory allocation (sketch below)
  • Modification of the OpenMP standard
  • default should be private instead of shared
  • New synchronization primitives
  • condition variables, semaphores
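A minimal C sketch of that hoisting step, assuming a generic dsm_malloc() allocator (the name is illustrative, not the API of a particular SDSM):

  #include <stddef.h>

  extern void *dsm_malloc(size_t size);   /* assumed SDSM allocator */

  void parallel_region(void)
  {
      /* was:  int counter = 0;   a shared local living on the private stack */
      int *counter = dsm_malloc(sizeof *counter);   /* hoisted into DSM */
      *counter = 0;
      /* ... spawn the SPMD workers; they all see *counter ... */
  }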

8
OpenMP on classical DSM
  • One level of parallelism (SPMD)

!$omp parallel do
do i = 1, 4
  x(i) = x(i) + x(i+1)
end do

is compiled into the SPMD form:

call schedule(lb, ub, ...)
do i = lb, ub
  x(i) = x(i) + x(i+1)
end do
call dsm_barrier()   ! barrier
9
Omni compilation approach
Taken from pdplab.trc.rwcp.or.jp/pdperf/Omni/wgcc2k/
10
Our goals
  • Support OpenMP standard
  • High performance
  • Allow exploitation of
  • multithreading (SMP)
  • high performance networks

11
Nanos OpenMP compiler
  • Convert an OpenMP program to a task graph
  • Communications via shared memory

!$omp parallel do
do i = 1, 4
  x(i) = x(i) + x(i+1)
end do

[Task graph: one node for i = 1,2 and one node for i = 3,4]
12
NthLib runtime support
  • Nanos compiler generates intermediate code
  • Communications still via shared memory

call nthf_depadd(...)
do nth_p = 1, proc
  nth = nthf_create_1s(..., f, ...)
end do
call nth_block()

subroutine f()
  x(i) = x(i) + x(i+1)
13
NthLib details
  • Assumes it runs on top of kernel threads
  • Provides user-level threads (QT)
  • Stack management (allocation)
  • Stack initialization (arguments)
  • Explicit context switch (sketch below)
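A minimal sketch of those three operations using POSIX ucontext as a stand-in for the QT package actually used by NthLib:

  #include <stdio.h>
  #include <stdlib.h>
  #include <ucontext.h>

  #define STACK_SIZE (64 * 1024)

  static ucontext_t main_ctx, nano_ctx;

  static void nano_body(void)
  {
      printf("running the nano-thread body\n");
      swapcontext(&nano_ctx, &main_ctx);                /* explicit switch back    */
  }

  int main(void)
  {
      getcontext(&nano_ctx);
      nano_ctx.uc_stack.ss_sp   = malloc(STACK_SIZE);   /* stack management        */
      nano_ctx.uc_stack.ss_size = STACK_SIZE;
      nano_ctx.uc_link          = &main_ctx;
      makecontext(&nano_ctx, nano_body, 0);             /* stack initialization    */
      swapcontext(&main_ctx, &nano_ctx);                /* explicit context switch */
      return 0;
  }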

14
NthLib queues
  • Global/local queues
  • Thread descriptor queues: rich functionality
  • Work descriptor queues: high performance

15
NthLib memory management
[Diagram: a stack slot containing the nano-thread descriptor, its successors, the stack, and a guard zone]
  • Mutual exclusion around mmap allocation (sketch below)
  • SLOT_SIZE stack alignment
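A minimal Linux sketch of such a slot allocator, assuming a power-of-two SLOT_SIZE; the mutual exclusion around it is only indicated by a comment:

  #include <stddef.h>
  #include <stdint.h>
  #include <sys/mman.h>

  #define SLOT_SIZE (64 * 1024UL)   /* assumed power of two */
  #define PAGE      4096UL

  /* Must be called under a lock: mmap-based slot allocation needs mutual exclusion. */
  void *alloc_slot(void)
  {
      size_t len = 2 * SLOT_SIZE;                  /* over-allocate, then trim */
      char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (raw == MAP_FAILED)
          return NULL;

      uintptr_t base = ((uintptr_t)raw + SLOT_SIZE - 1) & ~(SLOT_SIZE - 1);
      size_t lead = (char *)base - raw;
      if (lead)
          munmap(raw, lead);                       /* drop the unaligned head */
      munmap((char *)base + SLOT_SIZE, SLOT_SIZE - lead);   /* and the tail   */

      mprotect((void *)base, PAGE, PROT_NONE);     /* guard zone at the slot bottom */
      return (void *)base;                         /* SLOT_SIZE-aligned slot        */
  }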

16
Porting Nthlib to SDSM
  • Data consistency
  • Shared memory management
  • Nanos threads
  • JIAJIA implementation
  • DSM-PM2 implementation
  • Summary of DSM requirements

17
Data consistency
  • Mutual exclusion for defined data structures
  • → Acquire/Release
  • User-level shared memory data
  • → Barrier (sketch below)
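A sketch of both rules with JIAJIA-style primitives; jia_lock/jia_unlock/jia_barrier follow the names of the JIAJIA distribution, but the exact prototypes used here are assumptions:

  /* Prototypes assumed for the sketch. */
  extern void jia_lock(int lockid);
  extern void jia_unlock(int lockid);
  extern void jia_barrier(void);

  void worker(int *ready_count, double *x, int lb, int ub)
  {
      jia_lock(0);                     /* acquire: protects a defined data structure */
      (*ready_count)++;
      jia_unlock(0);                   /* release                                    */

      for (int i = lb; i < ub; i++)    /* user-level shared memory data              */
          x[i] = x[i] + x[i + 1];
      jia_barrier();                   /* barrier makes the writes visible           */
  }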

18
Data consistency
  • Mutual exclusion for defined data structures
  • ? Acquire/Release
  • User level shared memory data
  • ? Barrier

barrier
barrier
barrier
19
Shared memory management
  • Asynchronous shared memory allocation
  • Alignment parameter (> PAGE_SIZE)
  • Global variables / common declarations
  • → Not yet supported

20
Nano-threads
  • Run-to-block execution model
  • Shared stacks (father/sons relationship)
  • Implicit thread migration (scheduler)

21
JIAJIA
  • Developed in China by W. Hu, W. Shi and Z. Tang
  • Public domain DSM
  • User-level DSM
  • DSM: lock/unlock, barrier, condition variables
  • MP: send/receive, broadcast, reduce
  • Solaris, AIX, Irix, Linux, NT (not distributed)

22
JIAJIA Memory Allocation
  • No control of memory alignment (×2)
  • Synchronous memory allocation primitive
  • → Development of an RPC version (sketch below)
  • Based on the send/receive primitives
  • Addition of a user-level message handler
  • → Problems
  • Global lock
  • Interference with JIAJIA blocking functions
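The RPC-style allocator can be sketched as below; every identifier (msg_send, msg_register_handler, msg_wait_reply, alloc_on_home) is a hypothetical placeholder, not the JIAJIA message-passing API:

  /* Hypothetical message layer, for illustration only. */
  extern void  msg_register_handler(int tag, void (*h)(int from, void *buf));
  extern void  msg_send(int node, int tag, void *buf, int len);
  extern void  msg_wait_reply(int tag, void *buf, int len);
  extern void *alloc_on_home(long size);        /* the synchronous allocator */

  #define TAG_ALLOC_REQ 42
  #define TAG_ALLOC_REP 43

  /* Runs on node 0 inside the user-level message handler.  The problems listed
     above appear here: the handler must not run while node 0 holds the global
     lock or is blocked inside a JIAJIA call. */
  static void alloc_handler(int from, void *buf)
  {
      long size = *(long *)buf;
      void *p = alloc_on_home(size);
      msg_send(from, TAG_ALLOC_REP, &p, sizeof p);
  }

  /* Called on any other node: ship the request, wait for the address. */
  void *remote_alloc(long size)
  {
      void *p;
      msg_send(0, TAG_ALLOC_REQ, &size, sizeof size);
      msg_wait_reply(TAG_ALLOC_REP, &p, sizeof p);
      return p;
  }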

23
JIAJIA Discussion
  • Global barrier for data synchronization
  • → No multiple levels of parallelism
  • Not thread-aware
  • → No efficient use of SMP nodes

24
DSM/PM2
  • Developed at LIP by G. Antoniu (PhD student)
  • Public domain
  • User level, module of PM2
  • Generic and multi-protocol DSM
  • DSM: lock/unlock
  • MP: LRPC
  • Linux, Solaris, Irix (32 bits)

25
PM2 organization
[Diagram: PM2 software stack. PM2 is built on the DSM module, the Madeleine communication library (MAD1: TCP, PVM, MPI, SCI, VIA, SBP; MAD2: TCP, MPI, SCI, VIA, BIP), the Marcel thread library (mono, SMP, activation), and the TBX/NTBX toolboxes.]
http://www.pm2.org
26
DSM/PM2 Memory Allocation
  • Only static memory allocation
  • → Build a dynamic memory allocation primitive (sketch below)
  • Centralized memory allocation
  • LRPC to node 0
  • → Integration of an alignment parameter
  • Summer 2000: dynamic memory allocation ready!
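A sketch of the node-0 side of such a centralized allocator; the shared region, its size and the bump-pointer policy are assumptions, and the LRPC transport used by the other nodes is omitted:

  #include <stddef.h>

  static char  *dsm_region;        /* statically allocated shared segment */
  static size_t dsm_region_size;
  static size_t dsm_cursor;        /* only node 0 updates this            */

  /* Centralized bump allocation with an alignment parameter (>= PAGE_SIZE).
     Remote nodes obtain the result through an LRPC to node 0. */
  void *dsm_malloc_aligned(size_t size, size_t align)
  {
      size_t off = (dsm_cursor + align - 1) & ~(align - 1);
      if (off + size > dsm_region_size)
          return NULL;
      dsm_cursor = off + size;
      return dsm_region + off;
  }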

27
DSM/PM2 marcel descriptor
[Diagram: the marcel_t descriptor sits at a page boundary inside a SLOT_SIZE-aligned slot and is found from the stack pointer as (sp & MASK) + SLOT_SIZE]
NthLib requirement: one kernel thread → many nano-threads (sketch below)
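A sketch of the lookup suggested by the figure; SLOT_SIZE, MASK and the exact position of the descriptor pointer inside the slot are assumptions, not the actual Marcel layout:

  #include <stdint.h>

  #define SLOT_SIZE 0x10000UL               /* assumed */
  #define MASK      (~(SLOT_SIZE - 1))

  typedef struct marcel_task *marcel_t;

  /* The current descriptor is recovered from the stack pointer alone, so every
     stack a thread runs on must live inside such a slot; this is exactly the
     constraint that nano-thread stacks have to respect. */
  static inline marcel_t current_marcel(void)
  {
      char probe;
      uintptr_t sp = (uintptr_t)&probe;
      return *(marcel_t *)((sp & MASK) + SLOT_SIZE - sizeof(marcel_t));
  }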
28
DSM/PM2 marcel descriptor
[Diagram: two marcel_t descriptors at page boundaries, each located via (sp & MASK) + SLOT_SIZE]
29
DSM/PM2 Discussion
  • Uses page-level sequential consistency
  • no need for barriers (multiple levels of parallelism possible)
  • False sharing
  • → Dedicated stack layout (sketch below)

[Diagram: stack layout padded up to the next page boundary around the marcel_t descriptor]
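The dedicated layout boils down to a rounding helper (a sketch; the DSM page size is assumed):

  #include <stdint.h>

  #define DSM_PAGE 4096UL   /* assumed DSM page size */

  /* Round a stack address up to the next page boundary so that frames written
     by child nano-threads never share a DSM page with the parent's frame. */
  static inline uintptr_t pad_to_page(uintptr_t sp)
  {
      return (sp + DSM_PAGE - 1) & ~(DSM_PAGE - 1);
  }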
30
DSM/PM2 Discussion (cont)
  • No alternate stack for the signal handler
  • → Prefetch pages before the context switch: O(n) (sketch below)
  • → Pad to the next page before opening parallelism

[Diagram: shared data padded up to the next page boundary]
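A sketch of the O(n) prefetch: every page of the stack that will be used after the context switch is touched first, so the SIGSEGV-driven consistency protocol never has to fire while that stack is active (page size assumed):

  #define DSM_PAGE 4096UL   /* assumed DSM page size */

  static void prefetch_stack(char *bottom, char *top)
  {
      for (volatile char *p = bottom; p < top; p += DSM_PAGE)
          (void)*p;         /* the read faults the page in through the DSM */
  }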
31
DSM/PM2 improvements
  • Availability of an asynchronous DSM malloc
  • Lazy data consistency protocol under evaluation
  • eager consistency, multiple writers
  • scope consistency
  • Support for stacks in shared memory (Linux)

32
DSM/PM2 shared stack support
[Animation over slides 32-37: a SLOT_SIZE-aligned slot, addressed from the stack pointer as (sp & MASK) + SLOT_SIZE, holds the marcel_t descriptor and one or two dedicated SEGV stacks supporting stacks placed in shared memory]
38
DSM requirement
  • Support for static global shared variables
  • Efficient code
  • removes one level of indirection
  • Enables use of a classical compiler
  • Support for common blocks
  • → "Sharedization" of already allocated memory (sketch below)
  • dsm_to_shared(void *p, size_t size)
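A usage sketch of the proposed primitive; the prototype is the one given on this slide (return type assumed), and work_array is a hypothetical compiler-emitted common block:

  #include <stddef.h>

  extern void dsm_to_shared(void *p, size_t size);   /* proposed primitive */

  extern double work_array[1024];    /* hypothetical common block */

  void share_globals(void)
  {
      /* After the call, every node's accesses to work_array go through the
         DSM consistency protocol instead of node-private memory. */
      dsm_to_shared(work_array, sizeof work_array);
  }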

39
DSM requirement
  • Support for multiple levels of parallelism
  • Partial barrier
  • group management
  • Dependency support
  • like acquire/release
  • but without a lock (interface sketch below)
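One possible shape for these two missing pieces; all names below are invented for illustration (mirroring the start/stop/update labels of slide 42) and do not belong to an existing SDSM interface:

  /* Partial barrier with explicit group management, so nested parallel
     regions do not need a global barrier. */
  int  dsm_group_create(int nmembers);     /* returns a group id           */
  void dsm_group_join(int group);
  void dsm_group_barrier(int group);       /* waits for group members only */

  /* Dependency support: acquire/release-like visibility without a lock. */
  void dsm_dep_start(int region);          /* begin recording writes       */
  void dsm_dep_stop(int region);           /* close the write set          */
  void dsm_dep_update(int from, int to);   /* propagate one region's writes
                                              to its consumer              */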

40
DSM requirement
[Diagram: the same requirements, illustrated with global barriers]
41
DSM requirement
[Diagram: the same requirements, illustrated with partial barriers]
42
DSM requirement
[Diagram: the same requirements, illustrated with the dependency primitives start(1), start(2), stop(1), stop(2), update(1,2)]
43
Summary of DSM requirements
  • Support for static global shared variables
  • → "Sharedization" of already allocated memory
  • Acquire/release primitives
  • Partial barrier
  • → group management
  • Asynchronous shared memory allocation
  • Alignment parameter for memory allocation
  • Threads (SMP nodes)
  • Optimized stack management

44
Conclusion
  • Successfully ported Nanos to 2 SDSMs
  • → JIAJIA and DSM-PM2
  • DSM requirements to obtain performance
  • → Support the MIMD model
  • → Automatic thread migration
  • Performance?

45
Optimized stack management
  • Virtual address range memory reservation
  • Page creation (mmap) on demand (sketch below)
  • Alternate stack for the handler
  • → Minimize the number of created pages
  • → Reduce message size on thread migration
  • → Allow potentially huge stacks
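A minimal Linux sketch of that scheme; the handler is simplified (a real one would check that the faulting address belongs to a reserved stack) and the sizes are assumptions:

  #include <signal.h>
  #include <stdint.h>
  #include <sys/mman.h>

  #define STACK_RESERVE (8UL * 1024 * 1024)   /* reserved address range */
  #define DSM_PAGE      4096UL

  /* Reserve the virtual address range without creating any page. */
  static char *reserve_stack(void)
  {
      return mmap(NULL, STACK_RESERVE, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  }

  /* Create the missing page on demand from the SIGSEGV handler. */
  static void on_fault(int sig, siginfo_t *si, void *ctx)
  {
      (void)sig; (void)ctx;
      uintptr_t page = (uintptr_t)si->si_addr & ~(DSM_PAGE - 1);
      mprotect((void *)page, DSM_PAGE, PROT_READ | PROT_WRITE);
  }

  static char altstack[64 * 1024];

  /* The handler runs on an alternate stack (sigaltstack), so it works even
     when the faulting stack itself has no page yet. */
  static void install_stack_handler(void)
  {
      stack_t ss = { .ss_sp = altstack, .ss_size = sizeof altstack, .ss_flags = 0 };
      sigaltstack(&ss, NULL);

      struct sigaction sa;
      sa.sa_sigaction = on_fault;
      sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
      sigemptyset(&sa.sa_mask);
      sigaction(SIGSEGV, &sa, NULL);
  }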