1
MPI in uClinux on Microblaze
  • Neelima Balakrishnan
  • Khang Tran
  • 05/01/2006

2
Project Proposal
  • Port uClinux to work on Microblaze
  • Add MPI implementation on top of uClinux
  • Configure NAS parallel benchmarks and port them
    to work on RAMP

3
What is Microblaze?
  • Soft core processor, implemented using general
    logic primitives
  • 32-bit Harvard RISC architecture
  • Supported in the Xilinx Spartan and Virtex series
    of FPGAs
  • Customizability of the core makes porting more
    challenging, but opens up many options for kernel
    configuration

4
Components
  • uClinux - kernel v2.4
  • MPICH2 - portable, high-performance
    implementation of the entire MPI-2 standard
  • Communication via different channels - sockets,
    shared memory, etc.
  • In the Microblaze port, MPI communication is over
    FSL (Fast Simplex Link) - see the sketch below
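
A minimal sketch of word-level FSL I/O from C, assuming Microblaze GCC
inline assembly and the ISA's blocking put/get instructions on channel 0
(rfsl0). The helper names fsl0_put/fsl0_get are made up here, and whether
a uClinux user process may issue FSL instructions directly rather than
going through a driver is an assumption.

  /* Hypothetical helpers: move one 32-bit word over FSL channel 0
     using the blocking Microblaze put/get instructions. */
  static inline void fsl0_put(unsigned int word)
  {
      __asm__ __volatile__("put %0, rfsl0" : : "d"(word));
  }

  static inline unsigned int fsl0_get(void)
  {
      unsigned int word;
      __asm__ __volatile__("get %0, rfsl0" : "=d"(word));
      return word;
  }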

5
Components (contd.)
  • NASPB v2.4 - MPI-based source code
    implementations written and distributed by NAS
  • 5 kernels
  • 3 pseudo-applications

6
Porting uClinux to Microblaze
  • Done by Dr. John Williams - Embedded Systems
    group, University of Queensland, Brisbane,
    Australia
  • Part of their reconfigurable computing research
    program; work on the port is still ongoing
  • http://www.itee.uq.edu.au/jwilliams/mblaze-uclinux

7
Challenge in porting uClinux to Microblaze
  • uClinux is a Linux derivative for microprocessors
    that lack a memory management unit (MMU)
  • No memory protection
  • No virtual memory
  • For most user applications, the fork() system
    call is unavailable; vfork() is used instead
    (see the sketch below)
  • The malloc() implementation needs to be modified
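
A minimal sketch of the pattern uClinux applications use instead of
fork(): vfork() followed immediately by exec or _exit, since without an
MMU the child borrows the parent's address space. The program path
/bin/hello is a placeholder.

  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void)
  {
      pid_t pid = vfork();      /* parent is suspended until the child execs or exits */
      if (pid == 0) {
          /* child: after vfork() it may only call exec or _exit */
          execl("/bin/hello", "hello", (char *)0);
          _exit(1);             /* exec failed */
      }
      if (pid > 0)
          waitpid(pid, NULL, 0);
      return 0;
  }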

8
MPI implementation
  • MPI - Message Passing Interface
  • Standard API used to create parallel applications
  • Designed primarily to support the SPMD (single
    program, multiple data) model - see the example
    below
  • Advantages over older message-passing libraries
  • Portability
  • Speed, as each implementation is optimized for
    the hardware it runs on
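
A minimal SPMD example against the standard MPI C API: every process
runs the same binary and learns its role from its rank. It would be
compiled with mpicc and launched through the process manager, e.g.
mpiexec -n 4 ./hello.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);                  /* start the MPI runtime */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id */
      MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */
      printf("Hello from rank %d of %d\n", rank, size);
      MPI_Finalize();
      return 0;
  }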

9
Interactions between Application and MPI
[Diagram: the initiating application calls into its MPI interface and
MPI process manager, which talk over a communication channel to the MPI
process managers and interfaces of the applications on other processors.]
10
NAS parallel benchmarks
  • Set of 8 programs intended to aid in evaluating
    the performance of parallel supercomputers
  • Derived from computational fluid dynamics (CFD)
    applications
  • 5 kernels
  • 3 pseudo-applications
  • Used the NPB 2.4 MPI-based source code
    implementation

11
Phases
  • Studied uClinux and found the initial port
    already done for Microblaze
  • Obtained the latest kernel (2.4) and distribution
    from uClinux.org
  • Compiled it successfully for the Microblaze
    architecture
  • Chose MPICH2 from among the many MPI
    implementations
  • Investigated the MPICH2 implementation available
    from Argonne National Laboratory
  • Encountered challenges in porting MPI onto
    uClinux

12
Challenges in porting MPI to uClinux
  • Use of fork and a complex state machine
  • The default process manager for Unix platforms is
    MPD, written in Python, which uses a wrapper to
    call fork
  • A simple fork -> vfork substitution is not
    possible, as fork is called deep inside other
    functions and would require a lot of stack
    unwinding
  • Alternate approaches
  • Port SMPD, written in C
  • This would still involve a complex state machine
    and stack unwinding after the fork
  • Use pthreads (see the sketch below)
  • Might involve a lot of reworking of the code, as
    the current implementation does not use pthreads
  • Need to ensure thread safety
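
A minimal sketch of the pthreads alternative: the work that would
otherwise run in a forked child runs in a thread instead. The names
spawn_worker and worker_main are hypothetical; any state shared with the
rest of the process manager would then have to be made thread-safe,
which is the concern noted above.

  #include <pthread.h>
  #include <stdio.h>

  /* Hypothetical worker body standing in for the code the process
     manager would otherwise run in a forked child. */
  static void *worker_main(void *arg)
  {
      int id = *(int *)arg;
      printf("worker %d running\n", id);
      return NULL;
  }

  static int spawn_worker(pthread_t *t, int *id)
  {
      return pthread_create(t, NULL, worker_main, id);
  }

  int main(void)
  {
      pthread_t t;
      int id = 0;
      if (spawn_worker(&t, &id) != 0)
          return 1;
      pthread_join(t, NULL);    /* wait for the worker to finish */
      return 0;
  }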

13
NAS Parallel Benchmark
  • Used NAS PB v2.4
  • Compiled and executed it on a desktop and
    Millennium Cluster
  • Obtained information about
  • MOPS (millions of operations per second)
  • Type of operation
  • Execution time
  • Number of nodes involved
  • Number of processes and iterations

14
NAS PB simulation result (Millennium cluster, Class A)
15
Simulation result (cont.)
16
Estimated statistics for the floating point group
  • The 4 benchmarks that use floating point ops
    heavily are BT, CG, MG, and SP
  • Very few fp comparison ops in any of them
  • BT (Block Tridiagonal) - nearly all fp ops are
    add, subtract, and multiply; about 5% of all ops
    are division
  • CG (Conjugate Gradient) has the highest share of
    sqrt ops, about 30%; add/multiply is about 60%
    and divide about 10%
  • MG (Multigrid) - about 5% is sqrt and 20% is
    division; the rest is add, subtract, and multiply
  • SP (Scalar Pentadiagonal) - almost all ops are
    add; about 10% is division

17
Floating Point Operation Frequency
18
Most frequently used MPI functions in NASPB v2.4
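
The call-count table itself is not reproduced in this transcript. Purely
as an illustration of the kind of MPI calls the NPB codes rely on (not
the measured ranking), a hedged sketch of a ring exchange followed by a
global reduction; the buffer sizes, tag, and function name are
placeholders.

  #include <mpi.h>

  /* Illustrative halo-style ring exchange plus a global sum. */
  void exchange_and_reduce(int rank, int size, double *send, double *recv,
                           int n, double *local_sum, double *global_sum)
  {
      int next = (rank + 1) % size;
      int prev = (rank - 1 + size) % size;
      MPI_Request reqs[2];

      MPI_Irecv(recv, n, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Isend(send, n, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);
      MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

      MPI_Allreduce(local_sum, global_sum, 1, MPI_DOUBLE, MPI_SUM,
                    MPI_COMM_WORLD);
  }
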
19
Observations about NASPB
  • In the NASPB suite, 6 out of 8 benchmarks are
    predictive of parallel performance
  • EP - little/negligible communication between
    processors
  • IS - high communication overhead

20
Project status
  • Compiled uClinux and put it on Microblaze
  • Worked on porting MPI, but the port is not yet
    complete
  • Compiled and executed NASPB on a desktop and on
    Millennium (which currently uses 8 computing
    nodes)