Title: Advanced MPI
1. Advanced MPI
2. Audience Background
- This class assumes you have some background in parallel programming
- You are familiar with basic MPI
- You are familiar with basic OpenMP
- And, of course, you are familiar with a programming language and the UNIX/Linux operating environment
- You have run parallel jobs through a queueing system in a shared environment such as Ranger before
- You can use SSH to log into remote systems and transfer files
- If this is not the case, please see the TACC introductory courses
- Slides from the Lisboa class this week: http://taccspringschool.ist.utl.pt/SpringSchool/Agenda.html
- Ranger Virtual Workshop: https://www.cac.cornell.edu/ranger/
3. Outline
- Review of MPI Advanced Topics
- Derived Datatypes
- Communicator manipulations
- One-sided communication
- I/O
- What is Parallel I/O? Do I need it?
- Cluster Filesystem Options
- MPI I/O and ROMIO
- Example striping schemes
4. User Defined Datatypes
- Methods for creating data types
- MPI_Type_contiguous()
- MPI_Type_vector()
- MPI_Type_indexed()
- MPI_Type_struct()
- MPI_Pack()
- MPI_Unpack()
- MPI allows datatypes to be defined in much the same way as in modern programming languages (C, C++, F90)
- This allows your communication and I/O operations to operate on the same datatypes as the rest of your program
- Makes expressing the partitioning of datasets easier
5. Contiguous Array
- Creates a type describing count contiguous elements
- MPI_Type_contiguous(int count,
                      MPI_Datatype oldtype,
                      MPI_Datatype *newtype)
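- A minimal sketch of how this might be used (the name rowtype, the count of 100, and the two-rank exchange are illustrative, not from the slides): 100 consecutive doubles are committed as a single type and moved as one element.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      double row[100] = {0};
      MPI_Datatype rowtype;
      int rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* 100 consecutive MPI_DOUBLEs become one element of rowtype */
      MPI_Type_contiguous(100, MPI_DOUBLE, &rowtype);
      MPI_Type_commit(&rowtype);          /* commit before use (slide 11) */

      if (rank == 0)
          MPI_Send(row, 1, rowtype, 1, 0, MPI_COMM_WORLD);
      else if (rank == 1)
          MPI_Recv(row, 1, rowtype, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      MPI_Type_free(&rowtype);
      MPI_Finalize();
      return 0;
  }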
6. Strided Vector
- Constructs a regularly strided set of element blocks
- MPI_Type_vector(int count,
                  int blocklength,
                  int stride,
                  MPI_Datatype oldtype,
                  MPI_Datatype *newtype)
- Stride is specified in number of elements
- Stride can instead be specified in bytes with MPI_Type_hvector()
- Stride counts from the start of each block
7. Subarrays
- Perhaps the most useful MPI datatype, the subarray type lets you divide a multi-dimensional array into smaller blocks
- int MPI_Type_create_subarray(ndims, array_of_sizes, array_of_subsizes, array_of_starts, order, oldtype, newtype)
- int ndims
- int array_of_sizes[]
- int array_of_subsizes[]
- int array_of_starts[]
- int order
- MPI_Datatype oldtype
- MPI_Datatype *newtype
- A subarray example is coming after we add some communicator and I/O operations
8. SubArray Illustration
(Figure: a size[0] x size[1] array containing a subsize[0] x subsize[1] subarray that begins at offset (start[0], start[1]).)
9. Indexed Vector
- Allows an irregular pattern of elements
- MPI_Type_indexed(int count,
                   int array_of_blocklengths[],
                   int array_of_displacements[],
                   MPI_Datatype oldtype,
                   MPI_Datatype *newtype)
- Displacements are specified in number of elements
- Displacements can instead be specified in bytes with MPI_Type_hindexed()
- MPI_Type_create_indexed_block() is a shortcut if all blocks are the same length
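- A minimal sketch (the block lengths, displacements, and two-rank exchange are illustrative): three irregular blocks of a 16-element array are described once and moved with a single send.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      double a[16] = {0};
      int blocklens[3] = {1, 2, 3};       /* irregular block lengths       */
      int displs[3]    = {0, 4, 9};       /* displacements in elements     */
      MPI_Datatype itype;
      int rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Type_indexed(3, blocklens, displs, MPI_DOUBLE, &itype);
      MPI_Type_commit(&itype);

      /* rank 0 sends only the 6 selected elements; rank 1 receives them
         into the same scattered positions of its own array               */
      if (rank == 0)
          MPI_Send(a, 1, itype, 1, 0, MPI_COMM_WORLD);
      else if (rank == 1)
          MPI_Recv(a, 1, itype, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      MPI_Type_free(&itype);
      MPI_Finalize();
      return 0;
  }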
10. Structured Records
- Allows different types to be combined
- MPI_Type_struct(int count,
                  int array_of_blocklengths[],
                  MPI_Aint array_of_displacements[],
                  MPI_Datatype array_of_types[],
                  MPI_Datatype *newtype)
- Blocklengths are specified in number of elements
- Displacements are specified in bytes
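- A minimal sketch using MPI_Type_create_struct, the MPI-2 spelling of MPI_Type_struct (the particle_t layout and two-rank exchange are assumed for illustration); offsetof() supplies the byte displacements.

  #include <stddef.h>
  #include <mpi.h>

  typedef struct { int id; double x[3]; } particle_t;

  int main(int argc, char **argv)
  {
      particle_t   p = {0, {0.0, 0.0, 0.0}};
      int          blocklens[2] = {1, 3};
      MPI_Aint     displs[2]    = {offsetof(particle_t, id),
                                   offsetof(particle_t, x)};
      MPI_Datatype types[2]     = {MPI_INT, MPI_DOUBLE};
      MPI_Datatype ptype;
      int rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Type_create_struct(2, blocklens, displs, types, &ptype);
      MPI_Type_commit(&ptype);

      if (rank == 0)
          MPI_Send(&p, 1, ptype, 1, 0, MPI_COMM_WORLD);
      else if (rank == 1)
          MPI_Recv(&p, 1, ptype, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      MPI_Type_free(&ptype);
      MPI_Finalize();
      return 0;
  }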
11. Committing Types
- In order for a user-defined derived datatype to be used as an argument to other MPI calls, the type must be committed
- MPI_Type_commit(type)
- MPI_Type_free(type)
- Call commit after calling the type constructor, but before using the type anywhere else
- Call free after the type is no longer in use (no one actually does this, but it makes computer scientists happy...)
12. Vector Example
- The MPI_TYPE_VECTOR function allows creating non-contiguous vectors with constant stride

  MPI_TYPE_VECTOR(count, blocklen, stride, oldtype, vtype, ierr)
  MPI_TYPE_COMMIT(vtype, ierr)

      1    6   11   16
      2    7   12   17
      3    8   13   18
      4    9   14   19
      5   10   15   20
  (A 5 x 4 Fortran array A, stored column-major: nrows = 5, ncols = 4)

  call MPI_Type_vector(ncols, 1, nrows, MPI_DOUBLE_PRECISION, vtype, ierr)
  call MPI_Type_commit(vtype, ierr)
  call MPI_Send(A(nrows,1), 1, vtype, ...)
13. Dealing with Communicators
- Many MPI operations involve all the processes in a communicator
- MPI_COMM_WORLD by default contains every task in your MPI job
- Other communicators can be defined for more complex operations: for different parts of the task, to add topology, or to segregate different kinds of messaging
14. Communicators and Groups I
- All MPI communication is relative to a communicator, which contains a context and a group. The group is just a set of processes.
(Figure: five processes with ranks 0-4 in MPI_COMM_WORLD; the same processes are also grouped into two smaller communicators, COMM1 with local ranks 0-2 and COMM2 with local ranks 0-1.)
15. Communicators and Groups II
- To subdivide a communicator into multiple non-overlapping communicators, Approach I:
- e.g. to form groups of rows of PEs

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  myrow = (int)(rank / ncol);
16. MPI_Comm_split
- Argument 1: the communicator to split
- Argument 2: color; all processes passing the same color go into the same new communicator
- Argument 3: key; determines the rank ordering within the result communicator
- Argument 4: the result communicator

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  myrow = (int)(rank / ncol);
  MPI_Comm_split(MPI_COMM_WORLD, myrow, rank, &row_comm);
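- Putting the fragment above into a complete sketch (ncol = 4 and the row-wise sum are illustrative assumptions): split MPI_COMM_WORLD into row communicators, then reduce within each row.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      const int ncol = 4;                 /* assumed: 4 PEs per row         */
      int rank, myrow, rowrank;
      double val, rowsum;
      MPI_Comm row_comm;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      myrow = rank / ncol;                /* color: same value -> same comm */
      MPI_Comm_split(MPI_COMM_WORLD, myrow, rank, &row_comm);
      MPI_Comm_rank(row_comm, &rowrank);  /* rank within this row           */

      val = (double)rank;
      MPI_Allreduce(&val, &rowsum, 1, MPI_DOUBLE, MPI_SUM, row_comm);

      MPI_Comm_free(&row_comm);
      MPI_Finalize();
      return 0;
  }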
17. Topologies and Communicators
- MPI allows processes to be grouped into logical topologies
- Topologies can aid the programmer
- Convenient naming methods for processes in a group
- Naming can match communication patterns
- A standard mechanism for representing common algorithmic concepts (i.e. 2D grids)
- Topologies can aid the runtime environment
- Better mappings of MPI tasks to hardware nodes
- (Not really widely used in most implementations yet)
18. Topology Mechanics
- Topologies have the scope of a single (intra)communicator
- Topologies are an optional attribute given to a communicator
- Two topologies are supported
- Cartesian coordinates (grid)
- Graph
- nodes are tasks
- edges are named communication pathways
19. Cartesian Topologies
- int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart)
- MPI::Cartcomm MPI::Intracomm::Create_cart(...)
- MPI_CART_CREATE(...)
- comm_old - input communicator
- ndims - number of dimensions in the cartesian grid
- dims - integer array of size ndims specifying the number of processes in each dimension
- periods - true/false specifying whether each dimension is periodic
- reorder - whether ranks may be reordered or not
- comm_cart - new communicator containing the new topology
20. MPI_DIMS_CREATE
- A helper function for suggesting a likely dimension decomposition
- int MPI_Dims_create(int nnodes, int ndims, int *dims)
- MPI_DIMS_CREATE(NNODES, NDIMS, DIMS, IERROR)
- void MPI::Compute_dims(int nnodes, int ndims, int dims[])
- nnodes - total nodes in the grid
- ndims - number of dimensions
- dims - array returned with the dimensions
- Examples
- MPI_Dims_create(6,2,dims) will return (3,2) in dims
- MPI_Dims_create(6,3,dims) will return (3,2,1) in dims
- No rounding or ceiling function is provided
21. Cartesian Inquiry Functions
- MPI_Cartdim_get returns the number of dimensions in a cartesian structure
- int MPI_Cartdim_get(MPI_Comm comm, int *ndims)
- MPI_Cart_get provides information on an existing topology
- Arguments roughly mirror the create call
- int MPI_Cart_get(MPI_Comm comm, int maxdims, int *dims, int *periods, int *coords)
- maxdims keeps a larger-than-expected communicator from overflowing your argument arrays
22. Cartesian Translator Functions
- Task IDs in a cartesian coordinate system correspond to ranks in a "normal" communicator
- Point-to-point communication routines (send/receive) rely on ranks
- int MPI_Cart_rank(MPI_Comm comm, int *coords, int *rank)
- int MPI_Cart_coords(MPI_Comm comm, int rank, int maxdims, int *coords)
- coords - cartesian coordinates
- rank - rank in the communicator
23. Cartesian Shift Function
- int MPI_Cart_shift(MPI_Comm comm, int direction, int disp, int *rank_source, int *rank_dest)
- direction - coordinate dimension of the shift
- disp - displacement (can be positive or negative)
- rank_source and rank_dest are return values
- Use that source and dest to call MPI_Sendrecv, as in the sketch below
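- A minimal sketch tying slides 19-23 together (the 2D periodic grid and the exchanged double are illustrative): each task exchanges a value with its neighbours along dimension 0.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int nprocs, rank, src, dest;
      int dims[2] = {0, 0}, periods[2] = {1, 1};
      double sendval, recvval;
      MPI_Comm cart;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Dims_create(nprocs, 2, dims);             /* pick a 2D shape      */
      MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);

      /* neighbours one step away along dimension 0 */
      MPI_Cart_shift(cart, 0, 1, &src, &dest);

      sendval = (double)rank;
      MPI_Sendrecv(&sendval, 1, MPI_DOUBLE, dest, 0,
                   &recvval, 1, MPI_DOUBLE, src,  0,
                   cart, MPI_STATUS_IGNORE);

      MPI_Comm_free(&cart);
      MPI_Finalize();
      return 0;
  }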
24. Remote Memory Access Windows and Window Objects
- MPI-2 provides a facility for remote memory access in specified regions
- "One-sided communication" - no need for matching send and receive
- From what I hear, this may not work well yet, but...
- MPI_Win_create(base, size, disp_unit, info, comm, win)
- Exposes the size bytes of memory starting at base to remote memory access, addressed in units of disp_unit
25. One-Sided Communication Calls
- MPI_Put() - stores into remote memory
- MPI_Get() - reads from remote memory
- MPI_Accumulate() - updates remote memory; has an op argument like MPI_Reduce
- All are non-blocking: the data transfer is described, and maybe even initiated, but may continue after the call returns
- Subsequent synchronization, e.g. MPI_Win_fence(), is needed to make sure operations on the window object are complete
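- A minimal sketch of the window/fence pattern (the one-double window and the value 42.0 are illustrative): rank 0 puts a value into every other rank's exposed memory between two fences.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, nprocs, i;
      double local = 0.0, value;
      MPI_Win win;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      /* every rank exposes one double to remote access */
      MPI_Win_create(&local, sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &win);

      MPI_Win_fence(0, win);                 /* open an access epoch        */
      if (rank == 0) {
          value = 42.0;
          /* store into the window of every other rank at displacement 0   */
          for (i = 1; i < nprocs; i++)
              MPI_Put(&value, 1, MPI_DOUBLE, i, 0, 1, MPI_DOUBLE, win);
      }
      MPI_Win_fence(0, win);                 /* puts are complete here      */

      MPI_Win_free(&win);
      MPI_Finalize();
      return 0;
  }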
26. I/O (Parallel and Otherwise) on Large-Scale Systems
Dan Stanzione, Arizona State University
27. Parallel I/O in Data Parallel Programs
- Each task reads a distinct partition of the input data and writes a distinct partition of the output data
- Each task reads its partition in parallel
- Data is distributed to the slave nodes
- Each task computes output data from input data
- Each task writes its partition in parallel
28. What Are All These Names?
- MPI - Message Passing Interface standard
- Also known as MPI-1
- MPI-2 - extensions to the MPI standard
- I/O, RDMA, dynamic processes
- MPI-IO - the I/O part of the MPI-2 extensions
- ROMIO - an implementation of MPI-IO
- Handles mapping MPI-IO calls into communication (MPI) and file I/O
29. Filesystems
- Since each node in a cluster has its own disk, making the same files available on each node can be problematic
- Three filesystem options
- Local
- Remote (e.g. NFS)
- Parallel (e.g. PVFS, LUSTRE)
30. Filesystems (cont.)
- Local - use storage on each node's disk
- Relatively high performance
- Each node has a different filesystem
- Shared datafiles must be copied to each node
- No synchronization
- Most useful for temporary/scratch files accessed only by the copy of the program running on a single node
- RANGER DOESN'T HAVE LOCAL DISKS
- This trend may continue with other large-scale systems, for reliability reasons
- Very, very small RAMdisk (300MB)
31. Accessing Local File Systems
- I/O system calls on compute nodes are executed on the compute node
- File systems on the slave can be made available to tasks running there and accessed as on any Linux system
- The recommended programming model does not assume that a task will run on a specific node
- Best used for temporary storage
- Access permissions may be a problem
- Very small on newer systems like Ranger
32. Filesystems (cont.)
- Remote - share a single disk among all nodes
- Every node sees the same filesystem
- Synchronization mechanisms manage changes
- The "traditional" UNIX approach
- Relatively low performance
- Doesn't scale well: the server becomes a bottleneck in large systems
- Simplest solution for small clusters and for reading/writing small files
33. Accessing Network File Systems
- Network file systems such as NFS and AFS can be mounted by slave nodes
- Provide a shared storage space for home directories, parameter files, and smaller data files
- Performance problems can be severe for a very large number of nodes (>100)
- Otherwise, they work like local file systems
34. Filesystems (cont.)
- Parallel - stripe files across multiple disks on multiple nodes
- Relatively high performance
- Each node sees the same filesystem
- Works best for I/O-intensive applications
- Not a good solution for small files
- Certain slave nodes are designated I/O nodes; their local disks are used to store pieces of the filesystem
35. Accessing Parallel File Systems
- Distribute file data among many I/O nodes (servers), potentially every node in the system
- Typically not so good for small files, but very good for large data files
- Should provide good performance even for a very large degree of sharing
- Critical for scalability in applications with large I/O demands
- Particularly good for the data parallel model
36. Using File Systems
- Local File Systems
- EXT3, /tmp
- Network File Systems
- NFS, AFS
- Parallel File Systems
- PVFS, LUSTRE, IBRIX, Panasas
- I/O Libraries
- HDF, NetCDF, Panda
37. Example Application for Parallel I/O
(Figure: data flow Input -> Read -> Process -> Write -> Output.)
38. Issues in Parallel I/O
- The physical distribution of data to I/O nodes interacts with the logical distribution of the I/O requests to affect performance
- Logical record sizes should be considered in the physical distribution
- I/O buffer sizes depend on the physical distribution and the number of tasks
- Performance is best with rather large requests
- Buffering should be used to get requests of 1MB or more, depending on the size of the system
39. I/O Libraries
- May make I/O simpler for certain applications
- Multidimensional data sets
- Special data formats
- Consistent access to shared data
- "Out-of-core" computation
- May hide some details of parallel file systems
- Partitioning
- May provide access to special features
- Caching, buffering, asynchronous I/O, performance
40. MPI-IO
- Common file operations
- MPI_File_open()
- MPI_File_close()
- MPI_File_read()
- MPI_File_write()
- MPI_File_read_at()
- MPI_File_write_at()
- MPI_File_read_shared()
- MPI_File_write_shared()
- Open and close are collective. The rest have collective counterparts; add _all to the name
41. MPI_File_open
- MPI_File_open(MPI_Comm comm,
                char *filename,
                int amode,
                MPI_Info info,
                MPI_File *fh)
- Collective operation on comm
- amode is similar to a UNIX file mode, with a few extra MPI possibilities
42. MPI_File_close
- MPI_File_close(MPI_File *fh)
43. File Views
- File views are supported
- MPI_File_set_view()
- Essentially, a file view changes your program's treatment of a file from a simple stream of bytes to a set of MPI datatypes and displacements
- The arguments to set_view are similar to the arguments for creating derived datatypes
44. MPI_File_read
- MPI_File_read(MPI_File fh,
               void *buf,
               int count,
               MPI_Datatype datatype,
               MPI_Status *status)
45. MPI_File_read_at
- MPI_File_read_at(MPI_File fh,
                   MPI_Offset offset,
                   void *buf,
                   int count,
                   MPI_Datatype datatype,
                   MPI_Status *status)
- MPI_File_read_at_all() is the collective version
46. Non-Blocking I/O
- MPI_File_iread()
- MPI_File_iwrite()
- MPI_File_iread_at()
- MPI_File_iwrite_at()
- MPI_File_iread_shared()
- MPI_File_iwrite_shared()
47. MPI_File_iread
- MPI_File_iread(MPI_File fh,
                 void *buf,
                 int count,
                 MPI_Datatype datatype,
                 MPI_Request *request)
- The request can be queried (e.g. with MPI_Test) or waited on (MPI_Wait) to determine when the operation is complete
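- A minimal sketch (the file name "datafile" and the 1024-double buffer are placeholders): start the read, overlap other work, then wait on the request.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      double buf[1024];
      MPI_File fh;
      MPI_Request req;
      MPI_Status status;

      MPI_Init(&argc, &argv);

      MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                    MPI_INFO_NULL, &fh);

      MPI_File_iread(fh, buf, 1024, MPI_DOUBLE, &req);

      /* ... overlap computation with the read here ... */

      MPI_Wait(&req, &status);   /* or poll with MPI_Test(&req, &flag, &status) */

      MPI_File_close(&fh);
      MPI_Finalize();
      return 0;
  }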
48. Collective Access
- The _shared routines use a shared file pointer
- Collective routines are also provided to allow each task to read/write a specific chunk of the file
- MPI_File_read_ordered(MPI_File fh, void *buf, int count, MPI_Datatype type, MPI_Status *st)
- MPI_File_write_ordered()
- MPI_File_seek_shared()
- MPI_File_read_all()
- MPI_File_write_all()
49. File Functions
- MPI_File_delete()
- MPI_File_set_size()
- MPI_File_preallocate()
- MPI_File_get_size()
- MPI_File_get_group()
- MPI_File_get_amode()
- MPI_File_set_info()
- MPI_File_get_info()
50. ROMIO MPI-IO Implementation
- Implementation of the MPI-2 I/O specification
- Operates on a wide variety of platforms
- The Abstract Device Interface for I/O (ADIO) aids in porting to new file systems
- Fortran and C bindings
- Successes
- Adopted by industry (e.g. Compaq, HP, SGI)
- Used at ASCI sites (e.g. LANL Blue Mountain)
51. Data Staging for Tiled Display
- Commodity components
- Projectors, PCs
- Provide very high resolution visualization
- A staging application splits frames into a tile stream for each visualization node
- Uses MPI-IO to access data from the PVFS file system
- Streams of tiles are merged into movie files on the visualization node
52. Splitting Movie Frames into Tiles
- Hundreds of frames make up a single movie
- Each frame is stored in its own file in PVFS
- Frame size is 2532x1408 pixels
- 3x2 display
- Tile size is 1024x768 pixels (overlapped)
53. Obtaining the Highest Performance
- To make the best use of PVFS
- Use MPI-IO (ROMIO) for data access
- Use file views and datatypes
- Take advantage of collectives
- Use hints to optimize for your platform
- Simple, right? :-)
54. Trivial MPI-IO Example
- Reading contiguous pieces with MPI-IO calls
- Simplest, least powerful way to use MPI-IO
- Easy to port from POSIX calls
- Lots of I/O operations to get the desired data

  MPI_File_open(comm, fname, MPI_MODE_RDONLY, MPI_INFO_NULL, &handle);

  /* read tile data from one frame */
  for (row = 0; row < 768; row++) {
      offset = row * row_size + tile_offset + header_size;
      MPI_File_read_at(handle, offset, buffer, 1024*3, MPI_BYTE, &status);
  }

  MPI_File_close(&handle);
55. Avoiding the VFS Layer
- UNIX calls go through the VFS layer
- MPI-IO calls use the file system library directly
- Significant performance gain
56. Why Use File Views?
- Concisely describe noncontiguous regions in a file
- Create a datatype describing the region
- Assign the view to an open file handle
- Separate the description of the region from the I/O operation
- The datatype can be reused on subsequent calls
- Access these regions with a single operation
- A single MPI read call requests all the data
- Provides an opportunity for optimization of access in the MPI-IO implementation
57. Setting a File View
- Use MPI_Type_create_subarray() to define a datatype describing the data in the file
- Example for tile access (24-bit data)

  MPI_Type_contiguous(3, MPI_BYTE, &rgbtype);

  frame_size[1] = 2532;   /* frame width  */
  frame_size[0] = 1408;   /* frame height */
  tile_size[1]  = 1024;   /* tile width   */
  tile_size[0]  = 768;    /* tile height  */

  /* create datatype describing tile */
  MPI_Type_create_subarray(2, frame_size, tile_size, tile_offset,
                           MPI_ORDER_C, rgbtype, &tiletype);
  MPI_Type_commit(&tiletype);

  MPI_File_set_view(handle, header_size, rgbtype, tiletype,
                    "native", MPI_INFO_NULL);
  MPI_File_read(handle, buffer, buffer_size, rgbtype, &status);
58. Noncontiguous Access in ROMIO
- ROMIO performs data sieving to cut down the number of I/O operations
- Uses large reads which grab multiple noncontiguous pieces
- Example: reading tile 1
59. Data Sieving Performance
- Reduces I/O operations from 4600 to 6
- 87% effective throughput improvement
- Reads 3 times as much data as necessary
60. Collective I/O
- MPI-IO supports collective I/O calls (_all suffix)
- All processes call the same function at once
- May vary parameters (to access different regions)
- More fully describe the access pattern as a whole
- Explicitly define the relationship between accesses
- Allow use of ROMIO aggregation optimizations
- Flexibility in which processes interact with the I/O servers
- Fewer, larger I/O requests
61. Collective I/O Example

  /* create datatype describing tile */
  MPI_Type_create_subarray(2, frame_size, tile_size, tile_offset,
                           MPI_ORDER_C, rgbtype, &tiletype);
  MPI_Type_commit(&tiletype);
  MPI_File_set_view(handle, header_size, rgbtype, tiletype,
                    "native", MPI_INFO_NULL);

  #if 0
  MPI_File_read(handle, buffer, buffer_size, rgbtype, &status);
  #endif

  /* collective read */
  MPI_File_read_all(handle, buffer, buffer_size, rgbtype, &status);
62. Two-Phase Access
- ROMIO implements two-phase collective I/O
- Data is read by clients in contiguous pieces (phase 1)
- Data is redistributed to the correct client (phase 2)
- ROMIO applies two-phase when collective accesses overlap between processes
- More efficient I/O access than data sieving alone
63. Two-Phase Performance
64. Hints
- Controlling PVFS
- striping_factor - number of I/O servers to stripe across
- striping_unit - size of the stripes on the I/O servers
- start_iodevice - which I/O server to start with
- Controlling aggregation
- cb_config_list - list of aggregators
- cb_nodes - number of aggregators (upper bound)
- Tuning ROMIO optimizations
- romio_cb_read, romio_cb_write - aggregation on/off
- romio_ds_read, romio_ds_write - data sieving on/off
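- Hints are passed as string key/value pairs in an MPI_Info object at open time (or later with MPI_File_set_info()); a minimal sketch with illustrative values:

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_File fh;
      MPI_Info info;

      MPI_Init(&argc, &argv);

      MPI_Info_create(&info);
      /* values are strings; these particular numbers are only illustrative */
      MPI_Info_set(info, "striping_factor", "8");       /* 8 I/O servers     */
      MPI_Info_set(info, "striping_unit", "1048576");   /* 1 MB stripes      */
      MPI_Info_set(info, "romio_cb_write", "enable");   /* force aggregation */

      MPI_File_open(MPI_COMM_WORLD, "datafile",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

      MPI_Info_free(&info);
      MPI_File_close(&fh);
      MPI_Finalize();
      return 0;
  }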
65. The Proof Is in the Performance
- Final performance is almost 3 times that of VFS access!
- Hints allowed us to turn off two-phase and modify the striping of the data
66. A More Sophisticated I/O Example
- Dividing a 2D matrix with ghost rows
67. File on Disk vs. in Memory
(Figure: the full dataset as laid out on disk, versus one processor's partition in memory with ghost rows on its borders.)
68.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[])
  {
      int gsizes[2], psizes[2], lsizes[2], memsizes[2];
      int dims[2], periods[2], coords[2], start_indices[2];
      MPI_Comm comm;
      MPI_Datatype filetype, memtype;
      MPI_File fh;
      MPI_Status status;
      float local_array[12][12];
      int rank, m, n;

      m = 20;  n = 30;
      gsizes[0] = m;  gsizes[1] = n;
      psizes[0] = 2;  psizes[1] = 3;
      lsizes[0] = m / psizes[0];   /* Rows in local array */
      lsizes[1] = n / psizes[1];   /* Cols in local array */
      dims[0] = 2;  dims[1] = 3;
      periods[0] = periods[1] = 1;

      MPI_Init(&argc, &argv);
      MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &comm);
      MPI_Comm_rank(comm, &rank);

69. (continued)

      MPI_Cart_coords(comm, rank, 2, coords);
      start_indices[0] = coords[0] * lsizes[0];
      start_indices[1] = coords[1] * lsizes[1];

      /* filetype: this process's block within the global array on disk */
      MPI_Type_create_subarray(2, gsizes, lsizes, start_indices,
                               MPI_ORDER_C, MPI_FLOAT, &filetype);
      MPI_Type_commit(&filetype);

      MPI_File_open(MPI_COMM_WORLD, "datafile",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY,
                    MPI_INFO_NULL, &fh);
      MPI_File_set_view(fh, 0, MPI_FLOAT, filetype, "native", MPI_INFO_NULL);

      /* memtype: the interior of the local array, skipping ghost cells */
      memsizes[0] = lsizes[0] + 2;
      memsizes[1] = lsizes[1] + 2;
      start_indices[0] = start_indices[1] = 1;

      MPI_Type_create_subarray(2, memsizes, lsizes, start_indices,
                               MPI_ORDER_C, MPI_FLOAT, &memtype);
      MPI_Type_commit(&memtype);

      MPI_File_write_all(fh, local_array, 1, memtype, &status);
      MPI_File_close(&fh);

      MPI_Finalize();
      return 0;
  }
70. Summary: Why Use MPI-IO?
- Better concurrent access model than the POSIX one
- Explicit list of processes accessing concurrently
- More lax (but still very usable) consistency model
- More descriptive power in the interface
- Derived datatypes for concise, noncontiguous file and/or memory regions
- Collective I/O functions
- Optimizations built into the MPI-IO implementation
- Noncontiguous access
- Collective I/O (aggregation)
- Performance portability
71. Optional, Really Advanced Stuff
- Dynamic Process Management
- Intercommunicator communication
- MPI external connections
72. Creating Communicators
- int MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm)
- MPI::Intracomm MPI::Intracomm::Dup() const
- MPI::Intercomm MPI::Intercomm::Dup() const
- MPI::Cartcomm MPI::Cartcomm::Dup() const
- MPI::Graphcomm MPI::Graphcomm::Dup() const
- Creates an exact copy of the communicator
73. Creating Communicators
- int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm)
- MPI::Intracomm MPI::Intracomm::Create(...) const
- MPI::Intercomm MPI::Intercomm::Create(...) const
- Creates a new communicator containing the processes in group
- group must be a subset of the group of comm
- int MPI_Comm_split(comm, color, key, newcomm)
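- A minimal sketch of the group-based path (the choice of ranks 0 and 1 is illustrative, and at least two ranks are assumed): extract the group of MPI_COMM_WORLD, keep a subset, and build a communicator from it.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int ranks[2] = {0, 1};            /* subset: the first two ranks      */
      MPI_Group world_group, sub_group;
      MPI_Comm sub_comm;

      MPI_Init(&argc, &argv);

      MPI_Comm_group(MPI_COMM_WORLD, &world_group);
      MPI_Group_incl(world_group, 2, ranks, &sub_group);

      /* collective over MPI_COMM_WORLD; ranks outside the group get
         MPI_COMM_NULL back                                                 */
      MPI_Comm_create(MPI_COMM_WORLD, sub_group, &sub_comm);

      if (sub_comm != MPI_COMM_NULL)
          MPI_Comm_free(&sub_comm);
      MPI_Group_free(&sub_group);
      MPI_Group_free(&world_group);
      MPI_Finalize();
      return 0;
  }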
74. Destroying Communicators
- int MPI_Comm_free(MPI_Comm *comm)
- void MPI::Comm::Free()
- Destroys the named communicator
75. Dynamic Process Management
- Create new processes from running programs (as opposed to with mpirun)
- MPI_Comm_spawn
- (for SPMD-style programs)
- MPI_Comm_spawn_multiple
- (for MPMD-style programs)
- Connecting two (or more) applications together
- MPI_Comm_accept and MPI_Comm_connect
- Useful in assembling complex distributed applications
76. Dynamic Process Management
- Issues
- Maintaining simplicity, flexibility, and correctness
- Interaction with the OS, resource manager, and process manager
- Connecting independently started processes
- Spawning new processes is collective, returning an intercommunicator
- The local group is the group of spawning processes
- The remote group is the group of new processes
- New processes have their own MPI_COMM_WORLD
- MPI_Comm_get_parent lets new processes find the parent communicator
77. Intercommunicators
- Contain a local group and a remote group
- Point-to-point communication is between a process in one group and a process in the other
- Can be merged into a normal communicator
- Created by MPI_Intercomm_create()
78. Spawning Processes
- int MPI_Comm_spawn(command, argv, numprocs, info, root, comm, intercomm, errcodes)
- Tries to start numprocs processes running command, passing them command-line arguments argv
- The operation is collective over comm
- Spawnees are in the remote group of intercomm
- Errors are reported on a per-process basis in errcodes
- info can optionally specify hostname, archname, etc.
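- A minimal sketch (the executable name "./worker" and the count of 4 are placeholders): the parent spawns four workers and gets back an intercommunicator whose remote group is the workers.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Comm children;
      int errcodes[4];

      MPI_Init(&argc, &argv);

      /* "./worker" is a placeholder; it must be an MPI program */
      MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                     0, MPI_COMM_WORLD, &children, errcodes);

      /* inside the worker, MPI_Comm_get_parent() returns the matching
         intercommunicator                                               */

      MPI_Comm_disconnect(&children);
      MPI_Finalize();
      return 0;
  }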
79. Spawning Multiple Executables
- MPI_Comm_spawn_multiple(...)
- The command, argv, numprocs, and info arguments all become arrays
- Still collective
80. Establishing Connections
- MPI-2 makes it possible for two MPI jobs started separately to establish communication
- e.g. a visualizer connecting to a simulation
- The connection results in an intercommunicator
- Client/server architecture
- Similar to TCP sockets programming
81. Establishing Connections
- Server
- MPI_Open_port(info, port_name)
- MPI_Comm_accept(port_name, info, root, comm, intercomm)
- Client
- MPI_Comm_connect(port_name, info, root, comm, intercomm)
- Optional name service (like normal UNIX)
- MPI_Publish_name( ... )
- MPI_Lookup_name( ... )
- (not sure if name service is implemented)
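- A minimal sketch of the pattern (role selection via argv and the out-of-band hand-off of the port string are assumptions for illustration; two separately started MPI jobs run this same program):

  #include <string.h>
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      char port_name[MPI_MAX_PORT_NAME];
      MPI_Comm inter = MPI_COMM_NULL;

      MPI_Init(&argc, &argv);

      if (argc > 1 && strcmp(argv[1], "server") == 0) {
          MPI_Open_port(MPI_INFO_NULL, port_name);
          printf("port: %s\n", port_name);   /* hand this string to the client */
          MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
          MPI_Close_port(port_name);
      } else if (argc > 2) {                 /* run as: ./prog client <port>   */
          strncpy(port_name, argv[2], MPI_MAX_PORT_NAME - 1);
          port_name[MPI_MAX_PORT_NAME - 1] = '\0';
          MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
      }

      /* "inter" is an intercommunicator linking the two jobs */
      if (inter != MPI_COMM_NULL)
          MPI_Comm_disconnect(&inter);
      MPI_Finalize();
      return 0;
  }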