1
A Look at PVFS, a Parallel File System for Linux
  • Talk originally given by
  • Will Arensman and Anila Pillai

2
Overview
  • Network File Systems (NFS)
  • Drawbacks of NFS
  • Parallel Virtual File Systems (PVFS)
  • Using PVFS
  • Conclusion
  • References

3
1. Network File System (NFS)
  • NFS is a client/server application developed by
    Sun Microsystems
  • It lets a user view, store and update files on a
    remote computer as though the files were on the
    user's local machine.
  • The basic function of the NFS server is to allow
    its file systems to be accessed by any computer
    on an IP network.
  • NFS clients access the server files by mounting
    the server's exported file systems.
  • For example:

    Your view:                   /home/ann
    Where the files really are:  server1:/export/home/ann
4
2. Drawbacks of NFS
  • Having all your data stored in a central location
    presents a number of problems.
  • Scalability: problems arise when the number of
    computing nodes exceeds the performance capacity of
    the machine exporting the file system. You could add
    more memory, processing power and network interfaces
    at the NFS server, but you will soon run out of CPU,
    memory and PCI slots; the higher the node count, the
    less bandwidth (file I/O) individual node processes
    end up with.
  • Availability: if the NFS server goes down, all the
    processing nodes have to wait until the server comes
    back to life.
  • Solution: the Parallel Virtual File System (PVFS).

5
3. Parallel Virtual File System(PVFS)
  • Parallel Virtual File System (PVFS) is an open
    source implementation of a parallel file system
    developed specifically for Beowulf-class parallel
    computers and the Linux operating system
  • It is a joint project between Clemson University
    and Argonne National Laboratory
  • PVFS has been released and supported under a GPL
    license since 1998

6
3. Parallel Virtual File System(PVFS)
  • File System: allows users to store and retrieve
    data using common file access methods (open,
    close, read, write)
  • Parallel: stores data on multiple independent
    machines with separate network connections
  • Virtual: exists as a set of user-space daemons
    storing data on local file systems

7
PVFS
  • Instead of having one server exporting a file via
    NFS, you have N servers exporting portions of a
    file to parallel application tasks running on
    multiple processing nodes over an existing
    network
  • The aggregate bandwidth exceeds that of a single
    machine exporting the same file to all processing
    nodes.
  • This works much the same way as RAID 0: file
    data is striped across all I/O nodes (see the
    sketch after the figure below).

(Figure: data blocks 1-10 striped across the I/O nodes, as in RAID 0
striping)
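To make the RAID 0 analogy concrete, here is a minimal sketch (an
illustration only, not PVFS source code) of how round-robin striping
maps a byte offset in a file to an I/O node and to an offset within
that node's local file; the parameters base, pcount and ssize are the
metadata fields introduced later in the talk.

    /* Illustration of round-robin striping, not PVFS source code. */
    #include <stdio.h>

    struct stripe_params {
        int  base;    /* first I/O node used by the file    */
        int  pcount;  /* number of I/O nodes the file spans */
        long ssize;   /* stripe size in bytes               */
    };

    /* Return the I/O node holding the byte at 'offset' and store the
       offset of that byte within the node's local file in '*local'. */
    static int locate(const struct stripe_params *p, long offset, long *local)
    {
        long stripe = offset / p->ssize;           /* global stripe index */
        int  node   = p->base + (int)(stripe % p->pcount);
        *local = (stripe / p->pcount) * p->ssize + offset % p->ssize;
        return node;
    }

    int main(void)
    {
        struct stripe_params p = { 0, 4, 64 * 1024 };   /* example values */
        long local;
        int node = locate(&p, 300000, &local);
        printf("byte 300000 -> I/O node %d, local offset %ld\n", node, local);
        return 0;
    }
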
8
PVFS
  • PVFS provides the following features in one
    package
  • allows existing binaries to operate on PVFS files
    without the need for recompiling
  • enables user-controlled striping of data across
    disks on the I/O nodes
  • robust and scalable
  • provides high bandwidth for concurrent read/write
    operations from multiple processes to a common
    file

9
PVFS
  • PVFS provides the following features in one
    package
  • ease of installation
  • easy to use - provides a cluster-wide consistent
    name space
  • PVFS file systems may be mounted on all nodes in
    the same directory simultaneously, allowing all
    nodes to see and access all files on the PVFS
    file system through the same directory scheme.
  • Once mounted, PVFS files and directories can be
    operated on with all the familiar tools, such as
    ls, cp, and rm.

10
PVFS Design and Implementation
  • Roles of nodes in PVFS
  • COMPUTE NODES - on which applications are run,
  • MANAGEMENT NODE - which handles metadata
    operations
  • I/O NODES - which store file data for PVFS file
    systems.
  • Note: nodes may perform more than one role

11
PVFS I/O Nodes
  • In order to provide high-performance access to
    data stored on the file system by many clients,
    PVFS spreads data out across multiple cluster
    nodes (the I/O nodes)
  • By spreading data across multiple I/O nodes,
    applications have multiple paths to data through
    the network and multiple disks on which data is
    stored.
  • This eliminates single bottlenecks in the I/O
    path and thus increases the total potential
    bandwidth for multiple clients, or aggregate
    bandwidth.
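  • As an illustration (the numbers are assumed, not from the
    original talk): if each of 8 I/O nodes can serve roughly
    30 MB/s from its own disk and network link, the aggregate
    bandwidth available to a set of clients approaches
    8 x 30 = 240 MB/s, compared with the single link of one
    NFS server.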

12
PVFS System Architecture
13
PVFS Software Components
  • There are four major components to the PVFS
    software system
  • Metadata server (mgr)
  • I/O server (iod)
  • PVFS native API (libpvfs)
  • PVFS Linux kernel support
  • The first two components are daemons (server
    types) which run on nodes in the cluster

14
PVFS Components
  • The metadata server (or mgr)
  • File manager: it manages metadata for PVFS files.
  • A single manager daemon is responsible for the
    storage of and access to all the metadata in the
    PVFS file system
  • Metadata - information describing the
    characteristics of a file, such as permissions,
    the owner and group, and, more important, the
    physical distribution of the file data

15
PVFS Version 1 I/O Node Configuration
  • Data in a file is striped across a set of I/O
    nodes in order to facilitate parallel access.
  • The specifics of a given file distribution are
    described with three metadata parameters
  • base I/O node number
  • number of I/O nodes
  • stripe size
  • These parameters, together with an ordering of
    the I/O nodes for the file system,
    allow the file distribution to be completely
    specified

16
PVFS Components
  • Example
  • pcount - specifies the number of I/O nodes used
    for storing the file data
  • base - specifies the first (or base) I/O node
    (node 2 in this example)
  • ssize - specifies the stripe size, the unit by
    which the file is divided among the I/O nodes
    (here it is 64 Kbytes)
  • The user can set these parameters when the file
    is created, or PVFS will use a default set of
    values

(Figure: metadata example for a PVFS file)
17
PVFS Components
PVFS file striping done in a round-robin fashion
  • Though there are six I/O nodes in this example,
    the file is striped across only three I/O
    nodes, starting from node 2, because the
    metadata file specifies such a striping.
  • Each I/O daemon stores its portion of the PVFS
    file in a file on the local file
    system on the I/O node.
  • The name of this file is based on the inode
    number that the manager assigned to
    the PVFS file (in our example, 1092157504).
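  • As a worked illustration (not from the original slides): with
    base 2, pcount 3 and ssize 64 KB, byte offset 200,000 lies in
    stripe 200000 / 65536 = 3; round-robin places stripe 3 on I/O
    node 2 + (3 mod 3) = 2, at local offset
    (3 / 3) * 65536 + (200000 mod 65536) = 68,928 within that
    node's local file.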

18
PVFS Components
  • The I/O server (or iod)
  • It handles storing and retrieving file data
    stored on local disks connected to the node.
  • When application processes (clients) open a PVFS
    file, the PVFS manager informs them of the
    locations of the I/O daemons
  • The clients then establish connections with the
    I/O daemons directly

19
PVFS Components
  • When a client wishes to access file data, the
    client library sends a descriptor of the file
    region being accessed to the I/O daemons holding
    data in that region
  • The daemons determine what portions of the
    requested region they have locally and perform
    the necessary I/O and data transfers.
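A minimal sketch of that determination (illustrative only; the flat
byte-range descriptor and the function names are assumptions, not the
PVFS request format):

    /* Illustrative only, not the PVFS request format: given a requested
       byte range and the file's striping parameters, list the stripes of
       that range stored on this I/O node. */
    #include <stdio.h>

    struct region { long offset, length; };        /* assumed descriptor */

    static void stripes_held_locally(struct region r, int my_node,
                                     int base, int pcount, long ssize)
    {
        for (long s = r.offset / ssize; s * ssize < r.offset + r.length; s++)
            if (base + (int)(s % pcount) == my_node)   /* round-robin owner */
                printf("node %d serves stripe %ld\n", my_node, s);
    }

    int main(void)
    {
        struct region r = { 100000, 300000 };      /* bytes 100000-399999 */
        stripes_held_locally(r, 3, 2, 3, 64 * 1024);  /* this daemon = node 3 */
        return 0;
    }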

20
PVFS Components
  • PVFS native API (libpvfs)
  • It provides user-space access to the PVFS servers
  • This library handles the scatter/gather
    operations necessary to move data between user
    buffers and PVFS servers, keeping these
    operations transparent to the user
  • For metadata operations, applications communicate
    through the library with the metadata server

21
PVFS Components
  • PVFS native API (libpvfs)
  • For data access the metadata server is eliminated
    from the access path and instead I/O servers are
    contacted directly
  • This is key to providing scalable aggregate
    performance
  • The figure shows data flow in the PVFS system for
    metadata operations and data access

22
PVFS Components
Metadata access

For metadata operations, applications communicate
through the library with the metadata server
23
PVFS Components
Data access
The metadata server is eliminated from the access
path; instead, I/O servers are contacted directly
libpvfs reconstructs file data from pieces
received from iods
24
PVFS Components
  • PVFS Linux kernel support
  • The PVFS Linux kernel support provides the
    functionality necessary to mount PVFS file
    systems on Linux nodes
  • This allows existing programs to access PVFS
    files without any modification
  • This support is not necessary for PVFS use by
    applications, but it provides an extremely
    convenient means for interacting with the system

25
PVFS Components
  • PVFS Linux kernel support
  • The PVFS Linux kernel support includes
  • a loadable module
  • an optional kernel patch to eliminate a memory
    copy
  • a daemon (pvfsd) that accesses the PVFS file
    system on behalf of applications
  • It uses functions from libpvfs to perform these
    operations.

26
PVFS Components
(Figure: data flow from an application through the Linux VFS layer and
the /dev/pvfsd device in kernel space to the user-space pvfsd daemon,
and on to the PVFS servers)
  • The figure shows data flow through the kernel
    when the Linux kernel support is used
  • Operations are passed through system calls to the
    Linux VFS layer. Here they are queued for
    service by the pvfsd, which receives
    operations from the kernel through a device file
  • It then communicates with the PVFS servers and
    returns data through the kernel to the
    application

27
PVFS Application Interfaces
  • Applications on client nodes can access PVFS data
    on I/O nodes using one of the three methods
  • Linux kernel interface-
  • The Linux kernel VFS Module provides the
    functionality for adding new file-system support
    via loadable modules without recompiling the
    kernel. These modules allow PVFS file systems to
    be mounted in a manner similar to NFS. Once
    mounted, the PVFS file system can be traversed
    and accessed with existing binaries just as any
    other file system.
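For instance, an unmodified program can use ordinary POSIX calls once
a PVFS file system is mounted; the mount point and file name below are
assumed examples.

    /* Ordinary POSIX I/O on a mounted PVFS file system; no PVFS-specific
       calls are needed.  /mnt/pvfs is an assumed example mount point. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        int fd = open("/mnt/pvfs/results.dat", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        ssize_t n = read(fd, buf, sizeof buf);     /* served via pvfsd */
        printf("read %zd bytes\n", n);
        close(fd);
        return 0;
    }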

28
PVFS Application Interfaces
  • Applications on client nodes can access PVFS data
    on I/O nodes using one of the three methods
  • PVFS native API-
  • The PVFS native API provides a UNIX-like
    interface for accessing PVFS files. It also
    allows users to specify how files will be striped
    across the I/O nodes in the PVFS system.

29
PVFS Application Interfaces
  • ROMIO MPI-IO interface-
  • ROMIO implements the MPI-2 I/O calls in a
    portable library. This allows parallel
    programmers using MPI to access PVFS files
    through the MPI-IO interface.
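A minimal MPI-IO sketch (generic MPI-2 calls only; the mounted path is
an assumed example, and no ROMIO-specific hints are shown):

    /* Each rank writes its own 64-byte block of a shared file. */
    #include <mpi.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char buf[64];
        memset(buf, 'a' + rank, sizeof buf);       /* rank-specific data */

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "/mnt/pvfs/out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at(fh, (MPI_Offset)rank * sizeof buf, buf,
                          (int)sizeof buf, MPI_CHAR, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }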

30
5. Using PVFS
  • 1. Download, untar pvfs, pvfs-kernel files.
  • Available at
  • http://parlweb.parl.clemson.edu/pvfs/
  • 2. Go to PVFS directory
  • ./configure, make, make install
  • make install on each node
  • 3. Go to PVFS kernel directory
  • ./configure --with-libpvfs-dir=../pvfs/lib
  • make, make install
  • cp pvfs.o /lib/modules/<kernel-version>/misc/
  • make install, cp pvfs.o on each node

31
PVFS Installation
  • Metadata server needs
  • mgr executable
  • .iodtab file - contains IP addresses and ports of
    the I/O daemons
  • .pvfsdir file - permissions of the directory
    where metadata is stored
  • Run mkmgrconf to create .iodtab and .pvfsdir
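  • As an illustration only (the addresses and port are made up,
    and the real file is generated for you by mkmgrconf), an
    .iodtab file is simply an ordered list of I/O daemon
    locations, one per line:

    192.168.1.11:7000
    192.168.1.12:7000
    192.168.1.13:7000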

32
PVFS Installation
  • I/O server needs
  • iod executable
  • iod.conf file - describes the location of the
    pvfs data directory on the machine
  • Each client needs
  • pvfsd executable
  • pvfs.o kernel module
  • /dev/pvfsd device file
  • mount.pvfs executable
  • mount point

33
PVFS Installation
  • After installation
  • Start iods
  • Start mgr
  • Run pvfs-mount

34
PVFS API
  • 1. Initialize a pvfs_filestat struct

    struct pvfs_filestat {
        int base;    /* First node.  -1 = default (0)   */
        int pcount;  /* I/O nodes.   -1 = default (all) */
        int ssize;   /* Stripe size. -1 = default (64K) */
        int soff;    /* Not used. */
        int bsize;   /* Not used. */
    };

35
PVFS API
  • 2. Open the file

    pvfs_open(char *pathname, int flag,
              mode_t mode, struct pvfs_filestat *dist)

  • 3. Have a look at your metadata

    pvfs_ioctl(int fd, GETMETA,
               struct pvfs_filestat *dist)
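Putting the two steps together, a usage sketch (error handling kept
minimal; the header name and file path are assumptions, while the call
signatures and the GETMETA request follow the slides above):

    /* Create a file striped over 4 I/O nodes with a 64 KB stripe,
       then read its striping metadata back. */
    #include <pvfs.h>       /* assumed header for the PVFS native API */
    #include <fcntl.h>
    #include <stdio.h>

    int main(void)
    {
        /* base, pcount, ssize, soff, bsize */
        struct pvfs_filestat dist = { 0, 4, 64 * 1024, 0, 0 };

        int fd = pvfs_open("/pvfs/data.out", O_CREAT | O_WRONLY, 0644, &dist);
        if (fd < 0) { perror("pvfs_open"); return 1; }

        struct pvfs_filestat meta;
        pvfs_ioctl(fd, GETMETA, &meta);            /* fetch striping metadata */
        printf("base %d, pcount %d, ssize %d\n",
               meta.base, meta.pcount, meta.ssize);
        return 0;
    }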

36
PVFS Utilities
  • Copy files to PVFS
  • u2p -s <ssize> -b <base> -n <nodes> <src> <dest>
  • Examine file distribution
  • pvstat <file>
  • <file>: base 0, pcount 8, ssize 65536

37
PVFS at U of A
  • Hawk
  • Gigabit Ethernet (100Mbps)
  • 4 nodes, each with local drives
  • Raven
  • Myrinet (160 MB per second)
  • 7 x 40 GB drives (60 MB per second)

38
(No Transcript)
39
(No Transcript)
40
7. Conclusions
  • Pros
  • Higher cluster performance than NFS.
  • Many hard drives act as one large hard drive.
  • Works with current software.
  • Best when reading/writing large amounts of data.
  • Cons
  • Multiple points of failure.
  • Poor performance when using kernel module.
  • Not as good for interactive work.

41
8. References
  • The Parallel Virtual File System. Available at
    http://www.parl.clemson.edu/pvfs/
  • P. H. Carns, W. B. Ligon III, R. B. Ross, and R.
    Thakur, "PVFS: A Parallel File System for Linux
    Clusters", Proceedings of the 4th Annual Linux
    Showcase and Conference, Atlanta, GA, October
    2000, pp. 317-327.
  • Thomas Sterling, Beowulf Cluster Computing with
    Linux, The MIT Press, 2002.
  • W. B. Ligon III and R. B. Ross, "An Overview of
    the Parallel Virtual File System", Proceedings
    of the 1999 Extreme Linux Workshop, June 1999.
  • Network File System. Available at
    http://www.redhat.com/docs/manuals/linux/RHL-7.2-Manual/ref-guide/ch-nfs.html
  • http://www.linuxuser.co.uk/articles/issue14/lu14-All_you_need_to_know_about-PVFS.pdf