1
MPI in uClinux on Microblaze
  • Neelima Balakrishnan
  • Khang Tran
  • 05/01/2006

2
Project Proposal
  • Port uClinux to work on Microblaze
  • Add MPI implementation on top of uClinux
  • Configure NAS parallel benchmarks and port them
    to work on RAMP

3
What is Microblaze?
  • Soft core processor, implemented using general
    logic primitives
  • 32-bit Harvard RISC architecture
  • Supported in the Xilinx Spartan and Virtex series
    of FPGAs
  • Customizability of the core makes porting more
    challenging, but opens up many options for kernel
    configuration

4
Components
  • uClinux - kernel v2.4
  • MPICH2 - portable, high-performance
    implementation of the entire MPI-2 standard
  • Communication via different channels - sockets,
    shared memory, etc.
  • In the Microblaze port, MPI communication is over
    FSL (Fast Simplex Link) - see the sketch below
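
A minimal sketch of word-level FSL I/O from C, assuming Microblaze GCC
inline assembly and the ISA's blocking put/get instructions on channel 0
(rfsl0). The helper names fsl0_put/fsl0_get are made up here, and whether
a uClinux user process may issue FSL instructions directly rather than
going through a driver is an assumption.

  /* Hypothetical helpers: move one 32-bit word over FSL channel 0
     using the blocking Microblaze put/get instructions. */
  static inline void fsl0_put(unsigned int word)
  {
      __asm__ __volatile__("put %0, rfsl0" : : "d"(word));
  }

  static inline unsigned int fsl0_get(void)
  {
      unsigned int word;
      __asm__ __volatile__("get %0, rfsl0" : "=d"(word));
      return word;
  }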

5
Components (contd.)
  • NASPB v2.4 - MPI-based source code
    implementations written and distributed by NAS
  • 5 kernels
  • 3 pseudo-applications

6
Porting uClinux to Microblaze
  • Done by Dr. John Williams - Embedded Systems
    group, University of Queensland, Brisbane,
    Australia
  • Part of their reconfigurable computing research
    program; work on the port is still ongoing
  • http://www.itee.uq.edu.au/jwilliams/mblaze-uclinux

7
Challenge in porting uClinux to Microblaze
  • uClinux is a Linux derivative for microprocessors
    that lack a memory management unit (MMU)
  • No memory protection
  • No virtual memory
  • For most user applications, the fork() system
    call is unavailable; vfork() is used instead
    (see the sketch below)
  • The malloc() implementation needs to be modified
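
A minimal sketch of the pattern uClinux applications use instead of
fork(): vfork() followed immediately by exec or _exit, since without an
MMU the child borrows the parent's address space. The program path
/bin/hello is a placeholder.

  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void)
  {
      pid_t pid = vfork();      /* parent is suspended until the child execs or exits */
      if (pid == 0) {
          /* child: after vfork() it may only call exec or _exit */
          execl("/bin/hello", "hello", (char *)0);
          _exit(1);             /* exec failed */
      }
      if (pid > 0)
          waitpid(pid, NULL, 0);
      return 0;
  }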

8
MPI implementation
  • MPI - Message Passing Interface
  • Standard API used to create parallel applications
  • Designed primarily to support the SPMD (single
    program, multiple data) model - see the example
    below
  • Advantages over older message-passing libraries
  • Portability
  • Speed, as each implementation is optimized for
    the hardware it runs on
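
A minimal SPMD example against the standard MPI C API: every process
runs the same binary and learns its role from its rank. It would be
compiled with mpicc and launched through the process manager, e.g.
mpiexec -n 4 ./hello.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);                  /* start the MPI runtime */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id */
      MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */
      printf("Hello from rank %d of %d\n", rank, size);
      MPI_Finalize();
      return 0;
  }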

9
Interactions between Application and MPI
[Diagram: the initiating application calls into its MPI interface and
MPI process manager, which talk over a communication channel to the MPI
process managers and interfaces of the applications on other processors.]
10
NAS parallel benchmarks
  • Set of 8 programs intended to aid in evaluating
    the performance of parallel supercomputers
  • Derived from computational fluid dynamics (CFD)
    applications
  • 5 kernels
  • 3 pseudo-applications
  • Used the NPB 2.4 MPI-based source code
    implementation

11
Phases
  • Studied uClinux and found the initial port
    already done for Microblaze
  • Obtained the latest kernel (2.4) and distribution
    from uClinux.org
  • Compiled it successfully for the Microblaze
    architecture
  • Chose MPICH2 from among the many MPI
    implementations
  • Investigated the MPICH2 implementation available
    from Argonne National Laboratory
  • Encountered challenges in porting MPI onto
    uClinux

12
Challenges in porting MPI to uClinux
  • Use of fork and a complex state machine
  • The default process manager for Unix platforms is
    MPD, written in Python, which uses a wrapper to
    call fork
  • A simple fork -> vfork substitution is not
    possible, as fork is called deep inside other
    functions and would require a lot of stack
    unwinding
  • Alternate approaches
  • Port SMPD, written in C
  • This would still involve a complex state machine
    and stack unwinding after the fork
  • Use pthreads (see the sketch below)
  • Might involve a lot of reworking of the code, as
    the current implementation does not use pthreads
  • Need to ensure thread safety
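
A minimal sketch of the pthreads alternative: the work that would
otherwise run in a forked child runs in a thread instead. The names
spawn_worker and worker_main are hypothetical; any state shared with the
rest of the process manager would then have to be made thread-safe,
which is the concern noted above.

  #include <pthread.h>
  #include <stdio.h>

  /* Hypothetical worker body standing in for the code the process
     manager would otherwise run in a forked child. */
  static void *worker_main(void *arg)
  {
      int id = *(int *)arg;
      printf("worker %d running\n", id);
      return NULL;
  }

  static int spawn_worker(pthread_t *t, int *id)
  {
      return pthread_create(t, NULL, worker_main, id);
  }

  int main(void)
  {
      pthread_t t;
      int id = 0;
      if (spawn_worker(&t, &id) != 0)
          return 1;
      pthread_join(t, NULL);    /* wait for the worker to finish */
      return 0;
  }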

13
NAS Parallel Benchmark
  • Used NAS PB v2.4
  • Compiled and executed it on a desktop and
    Millennium Cluster
  • Obtained information about
  • MOPS (millions of operations per second)
  • Type of operation
  • Execution time
  • Number of nodes involved
  • Number of processes and iterations

14
NAS PB simulation result (Millennium cluster, Class A)
15
Simulation result (cont.)
16
Estimated statistics for the floating point group
  • The 4 benchmarks that use floating point ops
    heavily are BT, CG, MG, and SP
  • Very few fp comparison ops in any of them
  • BT (Block Tridiagonal) - nearly all fp ops are
    add, subtract, and multiply; about 5% of all ops
    are division
  • CG (Conjugate Gradient) has the highest share of
    sqrt ops, about 30%; add/multiply is about 60%
    and divide about 10%
  • MG (Multigrid) - about 5% is sqrt and 20% is
    division; the rest is add, subtract, and multiply
  • SP (Scalar Pentadiagonal) - almost all ops are
    add; about 10% is division

17
Floating Point Operation Frequency
18
Most frequently used MPI functions in NASPB v2.4
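
The call-count table itself is not reproduced in this transcript. Purely
as an illustration of the kind of MPI calls the NPB codes rely on (not
the measured ranking), a hedged sketch of a ring exchange followed by a
global reduction; the buffer sizes, tag, and function name are
placeholders.

  #include <mpi.h>

  /* Illustrative halo-style ring exchange plus a global sum. */
  void exchange_and_reduce(int rank, int size, double *send, double *recv,
                           int n, double *local_sum, double *global_sum)
  {
      int next = (rank + 1) % size;
      int prev = (rank - 1 + size) % size;
      MPI_Request reqs[2];

      MPI_Irecv(recv, n, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Isend(send, n, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);
      MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

      MPI_Allreduce(local_sum, global_sum, 1, MPI_DOUBLE, MPI_SUM,
                    MPI_COMM_WORLD);
  }
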
19
Observations about NASPB
  • In the NASPB suite, 6 out of 8 benchmarks are
    predictive of parallel performance
  • EP - little/negligible communication between
    processors
  • IS - high communication overhead

20
Project status
  • Compiled uClinux and put it on Microblaze
  • Worked on porting MPI, but the port is not yet
    complete
  • Compiled and executed NASPB on a desktop and on
    Millennium (which currently uses 8 computing
    nodes)