PVFS: A Parallel File System for Linux Clusters (presentation transcript)

1
PVFS: A Parallel File System for Linux Clusters
  • Philip H. Carns, Walter B. Ligon III
  • Robert B. Ross, Rajeev Thakur
  • Proceedings of the 4th Annual Linux Showcase and
    Conference, 2000

2
Abstract
  • Linux clusters have matured as a platform, but one
    area still devoid of support has been parallel file
    systems
  • PVFS: the Parallel Virtual File System
  • A high-performance parallel file system
  • A tool for pursuing further research in parallel
    I/O and parallel file systems

3
Introduction (1/2)
  • Design goals
  • Provide high bandwidth for concurrent read/write
    operations
  • Support multiple APIs
  • Native PVFS API
  • UNIX/POSIX I/O API
  • Other APIs (such as MPI-IO)
  • Common UNIX commands must work with PVFS files
    (ls, cp, rm)

4
Introduction (2/2)
  • Design goals (cont'd)
  • Applications developed with the UNIX I/O API must
    be able to access PVFS files without recompiling
  • Robust and scalable
  • Easy to install and use

5
Related Work: Various File Systems
  • Commercial parallel FSs
  • PFS (Intel Paragon), PIOFS/GPFS (IBM SP), HFS (HP
    Exemplar), XFS (SGI)
  • Distributed FSs
  • NFS, AFS/Coda, InterMezzo, xFS, GFS
  • FSs from research projects on parallel I/O and
    parallel file systems
  • PIOUS, PPFS, Galley

6
PVFS Design and Implementation
  • Client-server system
  • Client: user process
  • Server: multiple I/O daemons (iod) and a
    manager daemon (mgr)

7
Sample Cluster
Figure: a sample PVFS cluster. The head node (metadata
server) holds /pvfs-meta; I/O nodes n1 and n2 each hold
/pvfs-data; the client node c1 runs pvfsd, uses libpvfs
and the pvfs.o kernel module, and mounts the file system
at /mnt with "mount.pvfs head:/pvfs-meta /mnt".
8
PVFS Design and Implementation
  • PVFS manager and metadata
  • Metadata includes
  • Permissions, owner/group membership
  • Physical distribution of file data
  • File locations on disk and disk locations in the
    cluster
  • For simplicity, both file data and metadata are
    stored on existing local file systems rather than
    directly on raw devices

9
PVFS Design and Implementation
  • I/O daemons and data storage
  • When a client opens a PVFS file
  • The PVFS manager returns the locations of the iods
  • The client then establishes connections with the
    iods directly
  • When a client wishes to access PVFS file data
  • The client library sends a descriptor of the file
    region being accessed to the I/O daemons holding
    data in that region

10
PVFS Design and Implementation
  • File striping example
  • Example metadata: inode 1092157504, base 2,
    pcount 3, ssize 65536 (the file starts at I/O node
    2 and is striped across 3 I/O nodes in 64 Kbyte
    stripes)
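
A minimal sketch (not code from the paper) of how these striping
parameters determine which I/O daemon holds a given byte, and at
what offset in that daemon's local data file, as described two
slides back; the function and variable names are invented for
illustration:

#include <stdint.h>
#include <stdio.h>

/* Striping parameters as in the example metadata above. */
struct stripe_params {
    int      base;    /* index of the first I/O node used   */
    int      pcount;  /* number of I/O nodes the file spans */
    uint64_t ssize;   /* stripe size in bytes               */
};

/* Return the I/O node index holding byte `offset`, and store the
 * offset within that node's local data file in *local_offset.
 * (Simplified: a real system would also wrap the node index
 * around the total number of I/O daemons.) */
static int locate(const struct stripe_params *p, uint64_t offset,
                  uint64_t *local_offset)
{
    uint64_t stripe = offset / p->ssize;             /* which stripe */
    int node = p->base + (int)(stripe % p->pcount);  /* round-robin  */
    *local_offset = (stripe / p->pcount) * p->ssize + offset % p->ssize;
    return node;
}

int main(void)
{
    struct stripe_params p = { 2, 3, 65536 };  /* base, pcount, ssize */
    uint64_t local;
    int node = locate(&p, 200000, &local);
    printf("byte 200000 lives on I/O node %d at local offset %llu\n",
           node, (unsigned long long)local);
    return 0;
}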
11
PVFS Design and Implementation
  • File striping example

int pvfs_open(char *pathname, int flag, mode_t mode);
int pvfs_open(char *pathname, int flag, mode_t mode,
              struct pvfs_filestat *dist);

struct pvfs_filestat {
    int base;    /* first I/O node      */
    int pcount;  /* number of I/O nodes */
    int ssize;   /* stripe size         */
    int soff;    /* not used */
    int bsize;   /* not used */
};
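
A hedged usage sketch, assuming the prototypes above plus a
hypothetical client header name and mount path (neither is named
on the slides); it requests the same distribution as the example
metadata:

#include <fcntl.h>
#include <sys/types.h>
#include <pvfs.h>   /* assumed client header name */

int create_striped_file(void)
{
    /* Same distribution as the example metadata: start at I/O
     * node 2, spread over 3 nodes, 64 Kbyte stripes. */
    struct pvfs_filestat dist = { 2, 3, 65536, 0, 0 };

    /* The 4-argument form lets the caller control striping; the
     * 3-argument form accepts the default distribution instead. */
    return pvfs_open("/mnt/pvfs/data.out",
                     O_CREAT | O_WRONLY, 0644, &dist);
}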
12
PVFS Design and Implementation
  • PVFS can be used with multiple APIs
  • Native API
  • pvfs_open(), pvfs_read(), pvfs_write()
  • MPI-IO interface (sketch below)
  • MPI_File_open(), MPI_File_read(),
    MPI_File_write()
  • UNIX/POSIX API
  • By trapping UNIX I/O calls
  • open(), read(), write()
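
The MPI-IO path listed above can be made concrete with a minimal
sketch (not from the paper); the file path is hypothetical, and
the "pvfs:" prefix is the conventional way to ask ROMIO for its
PVFS driver rather than something the slides state:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File   fh;
    MPI_Status status;
    char       buf[65536];   /* one 64 Kbyte stripe worth of data */

    MPI_Init(&argc, &argv);

    /* Collective open of a (hypothetical) PVFS file. */
    MPI_File_open(MPI_COMM_WORLD, "pvfs:/mnt/pvfs/data.out",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    /* Each process reads from its current (initially zero) offset. */
    MPI_File_read(fh, buf, (int)sizeof(buf), MPI_BYTE, &status);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}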

13
PVFS Design and Implementation
  • Trapping UNIX I/O calls

Figure: (a) standard operation: application -> C library ->
libc system call wrappers -> kernel; (b) with the PVFS library
loaded, PVFS system call wrappers take the place of the libc
wrappers and route each call either to the kernel or to the
PVFS library.
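
As a rough illustration of this trapping idea, here is a generic
user-space interception sketch using the common LD_PRELOAD/dlsym
technique; it is not PVFS's actual wrapper code, and the mount
point check and preload mechanism are assumptions made for the
example:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <sys/types.h>

/* PVFS entry point from the earlier slide; only declared here,
 * the real client library would provide it. */
extern int pvfs_open(char *pathname, int flag, mode_t mode);

/* Wrapper that shadows libc's open(). PVFS paths are handed to
 * the PVFS library; everything else falls through to libc. */
int open(const char *pathname, int flags, ...)
{
    mode_t mode = 0;
    if (flags & O_CREAT) {        /* mode is only passed with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    /* Assumed mount point; PVFS instead consults its own mount
     * table to decide whether a path names a PVFS file. */
    if (strncmp(pathname, "/mnt/pvfs/", 10) == 0)
        return pvfs_open((char *)pathname, flags, mode);

    /* Look up and call the real libc open() for non-PVFS files. */
    int (*real_open)(const char *, int, ...) =
        (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
    return real_open(pathname, flags, mode);
}

PVFS ships its own wrapper library rather than this exact code,
but whether the wrappers are linked in or preloaded, the effect
is the interception shown in part (b) of the figure.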
14
Performance Results
  • Environment (Chiba City cluster)
  • 256 nodes
  • 500 MHz Pentium III, 512 MB RAM
  • 9 GB Quantum SCSI disk
  • 100 Mbit EtherExpress Pro fast ethernet and a
    64-bit Myrinet card
  • Linux 2.2.15pre4
  • 60 nodes were used in the experiments
  • Some as compute nodes, some as I/O nodes

15
Performance Results
  • Disk bandwidth (bonnie benchmark)
  • Write: 22 Mbytes/sec
  • Read: 15 Mbytes/sec
  • Network bandwidth (ttcp test)
  • Fast ethernet: 10.2 Mbytes/sec
  • Myrinet: 37.7 Mbytes/sec

16
Performance Results
  • Read performance (fast ethernet)

Figure: aggregate read bandwidth (Mbytes/sec) vs. number
of compute nodes, with curves for 4, 8, 16, 24, and 32
I/O nodes; the peak is 224 Mbytes/sec at 24 I/O nodes and
24 compute nodes.
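
For reference against the ttcp numbers above: 24 I/O nodes times
10.2 Mbytes/sec per fast-ethernet link is roughly 245 Mbytes/sec
of aggregate network capacity, so the 224 Mbytes/sec peak sits at
about 90 percent of that ceiling (a back-of-the-envelope reading,
not a figure from the paper).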
17
Performance Results
  • Write performance (fast ethernet)

18
Performance Results
  • Read performance (Myrinet)

Figure: aggregate read bandwidth over Myrinet, peaking at
687 Mbytes/sec with 32 I/O nodes and 28 compute nodes.
19
Performance Results
  • Write performance (Myrinet)

20
Performance Results
  • Read/write performance (Myrinet), 32 I/O nodes
  • Native PVFS API vs. MPI-IO

Figure: aggregate bandwidth curves for native read,
MPI-IO read, native write, and MPI-IO write.