Title: GAMMA: An Efficient Distributed Shared Memory Toolbox for MATLAB
1. GAMMA: An Efficient Distributed Shared Memory Toolbox for MATLAB
- Rajkiran Panuganti (1), Muthu Baskaran (1), Jarek Nieplocha (2), Ashok Krishnamurthy (3), Atanas Rountev (1), P. Sadayappan (1)
- (1) The Ohio State University
- (2) PNNL
- (3) Ohio Supercomputer Center
2. Overview
- Motivation
- GAMMA Programming Model
- Implementation Overview
- Experimental Evaluation
- Conclusions
3. High Productivity Computing
- Programmer productivity is extremely important
- C/Fortran: Good performance but poor productivity
  - Parallel programming in C/Fortran is even harder
- MATLAB, Python, etc.: Good programmer productivity
  - Poor performance and inability to run large-scale problems (memory limitations)
4. MATLAB and High Productivity
- Numerous features resulting in High Programmer Productivity
  - Array-Based Semantics
  - Copy/Value-Based Semantics
  - Debugging and Profiling Support
  - Integrated Development Environment
  - Numerous Domain-Specific Libraries (Toolboxes)
  - Visualization
  - And a lot more
- Need to retain the above features while addressing performance issues
5. Problem
[Figure: two "Out-Of-Memory!" callouts and one "Performance!" callout, with timings of 199 sec and 10.19 s]
6. ParaM - Parallel MATLAB
[Diagram: ParaM software stack. Labels: User, Library Writers, DParaM, GAMMA, Specialized Libraries, mexMPI, Compiler, MATLAB, GA, MVAPICH]
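8. Programming Model
- Global Shared View of the distributed Array
[Figure: Physical View and Logical View of a 1024 x 1024 array distributed over processes P0, P1, P2, P3; corners labeled (1,1) and (1024,1024), with a section marked from (250,75) to (700,610)]
- A = GA(1024, 1024, distr) (Block distribution)
- A(250:700, 75:610)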
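To make the logical and physical views concrete, the following Python sketch works out which processes hold parts of a section such as A(250:700, 75:610). It is only an illustration under stated assumptions: the slide does not give the actual layout, so the 2 x 2 process grid, the row-major rank numbering, and the function names `block_bounds` and `owners_of_section` are all hypothetical and are not part of GAMMA's API.

```python
def block_bounds(n, nblocks, b):
    """1-based inclusive index range of block b (0-based) when n indices
    are split into nblocks nearly equal contiguous blocks."""
    base, extra = divmod(n, nblocks)
    lo = b * base + min(b, extra) + 1
    hi = lo + base - 1 + (1 if b < extra else 0)
    return lo, hi


def owners_of_section(shape, grid, rows, cols):
    """Map each owning rank to the part of the section it holds.

    shape: (nrows, ncols) of the global array
    grid:  (prow, pcol) process grid; rank = i * pcol + j (assumed layout)
    rows, cols: 1-based inclusive (lo, hi) ranges, MATLAB style
    """
    pieces = {}
    for i in range(grid[0]):
        r_lo, r_hi = block_bounds(shape[0], grid[0], i)
        for j in range(grid[1]):
            c_lo, c_hi = block_bounds(shape[1], grid[1], j)
            # Intersect the requested section with this rank's block.
            lo_r, hi_r = max(rows[0], r_lo), min(rows[1], r_hi)
            lo_c, hi_c = max(cols[0], c_lo), min(cols[1], c_hi)
            if lo_r <= hi_r and lo_c <= hi_c:
                pieces[i * grid[1] + j] = ((lo_r, hi_r), (lo_c, hi_c))
    return pieces


# Section A(250:700, 75:610) of a 1024 x 1024 array on an assumed 2 x 2 grid.
section = owners_of_section((1024, 1024), (2, 2), (250, 700), (75, 610))
for rank, piece in sorted(section.items()):
    print(rank, piece)
```

Under these assumptions the section touches all four blocks, which is the situation a global shared view hides from the programmer: one indexing expression, several remote transfers.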
9. Programming Model (Contd.)
- Get-Compute-Put Computation Model
[Diagram: Process 0 and Process 1 each perform Get(), Compute, Put()]
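The Get-Compute-Put pattern can be sketched in a few lines of Python. This is a single-process simulation, not GAMMA code: the `GlobalArray` class, its `get`/`put` methods, and the block partition are hypothetical stand-ins chosen for illustration, and the "processes" simply run one after another.

```python
class GlobalArray:
    """Toy stand-in for a distributed array: one flat list plus a
    block partition of its indices among nprocs simulated processes."""

    def __init__(self, data, nprocs):
        self.data = list(data)
        self.nprocs = nprocs

    def owned_range(self, rank):
        """Half-open index range [lo, hi) assigned to the given rank."""
        n = len(self.data)
        return rank * n // self.nprocs, (rank + 1) * n // self.nprocs

    def get(self, lo, hi):
        """Copy a section of the global array into a private buffer."""
        return self.data[lo:hi]

    def put(self, lo, values):
        """Write a private buffer back into the global array."""
        self.data[lo:lo + len(values)] = values


def get_compute_put(ga, rank, compute):
    lo, hi = ga.owned_range(rank)
    local = ga.get(lo, hi)                # Get
    local = [compute(x) for x in local]   # Compute on the private copy
    ga.put(lo, local)                     # Put


ga = GlobalArray(range(8), nprocs=2)
for rank in range(ga.nprocs):             # simulated ranks, run sequentially
    get_compute_put(ga, rank, lambda x: x * x)
print(ga.data)  # prints [0, 1, 4, 9, 16, 25, 36, 49]
```

The point of the pattern is that the compute step only ever sees ordinary local data, so existing serial numerical code can be reused between the Get and the Put.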
10. Other features in the Programming Model enabling Efficiency
- Pass-by-reference semantics for distributed arrays
  - Intended for Library writers
- Management of Data Locality (NUMA)
  - Distribution information can be retrieved by the programmer
  - Reference-based access to the local data
- Data replication
  - Support for replicating near-neighbor data
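Replicating near-neighbor data usually means keeping a copy of the edge elements of adjacent blocks (often called ghost cells) next to the locally owned block. The Python sketch below illustrates the idea for a 1D block distribution and a 3-point average. It is an assumption-laden illustration: the helper names `owned_range`, `local_with_ghosts`, and `smooth_owned` are hypothetical, and the slides do not describe how GAMMA exposes this feature.

```python
def owned_range(n, nprocs, rank):
    """Half-open index range [lo, hi) owned by rank in a block distribution."""
    return rank * n // nprocs, (rank + 1) * n // nprocs


def local_with_ghosts(data, nprocs, rank, width=1):
    """Return rank's block plus up to `width` replicated neighbor elements
    on each side, and the offset of the first owned element in the buffer."""
    n = len(data)
    lo, hi = owned_range(n, nprocs, rank)
    g_lo, g_hi = max(lo - width, 0), min(hi + width, n)
    return list(data[g_lo:g_hi]), lo - g_lo


def smooth_owned(data, nprocs, rank):
    """3-point average of the owned elements using only the local buffer."""
    n = len(data)
    lo, hi = owned_range(n, nprocs, rank)
    buf, off = local_with_ghosts(data, nprocs, rank)
    out = []
    for i in range(lo, hi):
        j = i - lo + off
        if 0 < i < n - 1:
            out.append((buf[j - 1] + buf[j] + buf[j + 1]) / 3)
        else:
            out.append(buf[j])            # global boundary: left unchanged
    return out


data = [0, 1, 4, 9, 16, 25, 36, 49]
smoothed = []
for rank in range(4):                      # simulated ranks, run sequentially
    smoothed.extend(smooth_owned(data, 4, rank))
print(smoothed)
```

With the ghost copies in place, each rank computes its part of the stencil without any further remote access, which is why replication helps locality.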
11. Other features in the Programming Model enabling Efficiency (Contd.)
- Asynchronous operations
- Support for Library Writers
- Interoperable with Message Passing
  - Message Passing support using mexMPI
- Interoperable with some other Parallel MATLAB projects
  - Interoperable with pMATLAB, MathWorks DCT
12. Illustration by Example (FFT2): 2D FFT
- [rank, nprocs] = Begin();
- dims = [N N]; distr = [N N/nprocs];
- A = GA(dims, distr);
- tmp = local(A);          % GET()
- tmp = fft(tmp);          % Compute()
- Put(A, tmp);             % PUT()
- Sync();
- ATmp = GA(A);
- Transpose(A, ATmp);      % Collective Ops
- Tmp = local(ATmp);
- Put(ATmp, fft(Tmp));
- Sync();
- Transpose(ATmp, A);
- GA_End();
[Figure: Transpose]
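The structure of this example (column FFTs on local panels, a transpose, column FFTs again, a transpose back) can be checked with a small Python sketch. This is not GAMMA code and makes several assumptions: the ranks are simulated by splitting the matrix into column panels, a naive DFT stands in for MATLAB's `fft`, and every function name below is hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def fft_columns(panel):
    """Apply the 1D DFT to every column of a row-major panel."""
    return transpose([dft(col) for col in transpose(panel)])


def column_panels(a, nprocs):
    """Split a row-major matrix into nprocs panels of whole columns."""
    w = len(a[0]) // nprocs
    return [[row[p * w:(p + 1) * w] for row in a] for p in range(nprocs)]


def join_panels(panels):
    """Concatenate column panels back into one row-major matrix."""
    return [sum((panel[i] for panel in panels), [])
            for i in range(len(panels[0]))]


def fft2_by_transpose(a, nprocs):
    def local_pass(m):
        # Each simulated rank transforms only the columns in its own panel.
        return join_panels([fft_columns(p) for p in column_panels(m, nprocs)])

    step1 = local_pass(a)                  # first column-wise pass
    step2 = local_pass(transpose(step1))   # transpose, then second pass
    return transpose(step2)                # transpose back to original layout


def dft2_direct(a):
    """2D DFT straight from the definition, used only as a reference."""
    n, m = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / n + v * c / m))
                 for r in range(n) for c in range(m))
             for v in range(m)] for u in range(n)]


a = [[float((3 * r + 5 * c) % 7) for c in range(4)] for r in range(4)]
result = fft2_by_transpose(a, nprocs=2)
reference = dft2_direct(a)
err = max(abs(result[u][v] - reference[u][v])
          for u in range(4) for v in range(4))
print(err < 1e-9)
```

The sketch requires the matrix dimensions to be divisible by the number of simulated ranks, mirroring the `N/nprocs` panel width in the slide.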
13. Software Architecture
[Diagram: software architecture with components User, MATLAB Front-End, GAMMA, mexMPI, GA, MATLAB Computation Engine, MPI, SCALAPACK]
15. Evaluation
- OSC Pentium 4 Cluster
  - Two 2.4 GHz Intel P4 processors per node, Linux kernel 2.6.6, 4 GB RAM
  - MVAPICH 0.9.4
  - InfiniBand
- MATLAB Version 7.01
- Fully distributed environment
- Evaluation using NAS Benchmarks
16. Programmability
[Figure: source lines of code (SLOC) comparison; callouts: "Moderate Increase in SLOC" (three times) and "Slight Increase in SLOC"]
17. Performance Analysis
18. Performance Analysis
19. Use of reference-based semantics
20. Speedup on Large Problem Sizes
21. Related Work
- Early 90s: MPI Cluster Programming
- 1995: "Why there isn't a Parallel MATLAB", Cleve Moler
- Embarrassingly Parallel
  - Paralize (98), Multi (00), PLab (00), Parmatlab (01)
- Message Passing
  - MultiMatlab (96), PT (96), DPToolbox (99), MATmarks (99), PMI (99), MPITB/PVMTB (00), CMTM (01)
- Compilation Based
  - Conlab (93), Falcon (95), ParAL (95), Otter (98), Menhir (98), MaJIC (98), MATCH (00), RTExpress (00)
- Backend Support
  - Matpar (98), DLab (99), Netsolve (01), Paramat (01)
22. Related Work (Currently Active)
- Star-P (97): MIT
- MatlabMPI (98), pMATLAB (02): MIT-LL
  - File-based Message Passing Communication
- MATLAB_D (00): Rice
  - Telescoping Compilation, HPF, JIT Compilation
- ParaM (04): OSU, OSC
- MathWorks (04): MDCE/MDCT
23. Conclusions
- Discussed an efficient Distributed Shared Memory Toolbox for MATLAB
  - Programming Model and Efficiency features of the toolbox
- Demonstrated efficiency using NAS Benchmarks
- Download available upon request
24. Questions?
- Contact
  - panugant_at_cse.ohio-state.edu
25. Backup
- NAS FT A
- NAS EP A
- Implementation Issues
26. Performance Analysis (Contd.)
27. Implementation Issues
- Different Memory managers
- Automated Bookkeeping
- Data layout inconsistencies
- In-Place Operations
- Data movement between different workspaces
- Out-of-order and irregular accesses