Distributed File Systems - PowerPoint PPT Presentation

Provided by: Gulu
Transcript and Presenter's Notes

Title: Distributed File Systems


1
Distributed File Systems
  • Yih-Kuen Tsay
  • Dept. of Information Management
  • National Taiwan University

2
Purposes of a Distributed File System
  • Sharing of storage and information across a
    network
  • Convenience (and efficiency) of a conventional
    file system
  • Persistent storage that most other services
    (e.g., Web servers) need

3
Properties of Storage Systems
                              Sharing  Persistence  Distributed     Consistency  Example
                                                    cache/replicas  maintenance
Main memory                   ✗        ✗            ✗               1            RAM
File system                   ✗        ✓            ✗               1            UNIX file system
Distributed file system       ✓        ✓            ✓               ✓            Sun NFS
Web                           ✓        ✓            ✓               ✗            Web server
Distributed shared memory     ✓        ✗            ✓               ✓            Ivy (DSM, Ch. 18)
Remote objects (RMI/ORB)      ✓        ✗            ✗               1            CORBA
Persistent object store       ✓        ✓            ✗               1            CORBA Persistent Object Service
Peer-to-peer storage system   ✓        ✓            ✓               2            OceanStore (Ch. 10)
Types of consistency: 1: strict one-copy. ✓: slightly weaker guarantees. 2: considerably weaker guarantees.
Other properties include availability, timing guarantees, etc.
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
4
Files
  • Files are an abstraction of permanent storage.
  • A file is typically defined as a sequence of
    similar-sized data items along with a set of
    attributes.
  • A directory is a file that provides a mapping
    from text names to internal file identifiers.

5
File Attributes
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
6
File Systems
  • Responsible for the (a) organization, (b)
    storage, (c) retrieval, (d) naming, (e) sharing,
    and (f) protection of files.
  • Provide a set of programming operations that
    characterize the file abstraction, particularly
    operations to read and write subsequences of data
    items beginning at any point of a file.

7
File System Modules
A basic distributed file system implements all of
the above plus modules for client-server
communication and distributed naming and location
of files.
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
8
UNIX File Operations
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
9
Distributed File System Requirements
  • Transparency: access, location, mobility,
    performance, and scaling transparency
  • Concurrency (and Consistency)
  • Replication/Caching (and Consistency)
  • Hardware/operating system heterogeneity
  • Fault-Tolerance
  • Security (Access Control, Authentication)
  • Efficiency

10
A File Service Architecture
Note: The modules communicate with one another by
remote procedure calls.
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
11
File Service Components
  • Flat file service: implements operations on the
    contents of files, which are referred to by
    unique file identifiers (UFIDs)
  • Directory service: maps text names of files
    (including directories) to their UFIDs
  • Client module: integrates and extends the
    previous two services under a single application
    programming interface
  • Why is this structure more open and
    configurable?
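
The three components can be sketched as follows. This is a minimal in-memory illustration, not the interface from the textbook: the class and method names (FlatFileService, DirectoryService, ClientModule, write_file, read_file) are assumptions made for the example, and UFIDs are plain integers.

```python
import itertools


class FlatFileService:
    """Operations on file contents, addressed only by UFID (hypothetical API)."""

    def __init__(self):
        self._files = {}
        self._next_ufid = itertools.count(1)

    def create(self):
        ufid = next(self._next_ufid)
        self._files[ufid] = bytearray()
        return ufid

    def read(self, ufid, pos, n):
        return bytes(self._files[ufid][pos:pos + n])

    def write(self, ufid, pos, data):
        buf = self._files[ufid]
        if len(buf) < pos:                      # pad a gap with zero bytes
            buf.extend(b"\0" * (pos - len(buf)))
        buf[pos:pos + len(data)] = data


class DirectoryService:
    """Maps text names to UFIDs; knows nothing about file contents."""

    def __init__(self):
        self._names = {}

    def add_name(self, name, ufid):
        self._names[name] = ufid

    def lookup(self, name):
        return self._names[name]                # KeyError if the name is unknown


class ClientModule:
    """Combines both services behind one name-based interface."""

    def __init__(self, files, directory):
        self.files = files
        self.directory = directory

    def write_file(self, name, data):
        try:
            ufid = self.directory.lookup(name)
        except KeyError:
            ufid = self.files.create()
            self.directory.add_name(name, ufid)
        self.files.write(ufid, 0, data)

    def read_file(self, name, n):
        return self.files.read(self.directory.lookup(name), 0, n)


client = ClientModule(FlatFileService(), DirectoryService())
client.write_file("notes.txt", b"hello")
print(client.read_file("notes.txt", 5))
```

Because the two services share only the UFID, either one could be replaced (for example, by a different naming scheme) without touching the other, which is one possible answer to the question on the slide.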

12
Flat File Service Operations
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
13
Difference from UNIX
  • Immediate access to files using UFIDs (without
    open or close)
  • Read or write starts at the position indicated by
    a parameter
  • All operations, except create, are repeatable (idempotent)
  • Allows a stateless implementation
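
A small sketch of why the position parameter matters. It assumes a client that retransmits a request after a lost reply; write_at and CursorFile are hypothetical names used only for this comparison.

```python
def write_at(buf, pos, data):
    """Positional write: the request carries its own offset, so no server state is needed."""
    if len(buf) < pos:
        buf.extend(b"\0" * (pos - len(buf)))
    buf[pos:pos + len(data)] = data


class CursorFile:
    """UNIX-style file: a read-write pointer is remembered between calls."""

    def __init__(self):
        self.buf = bytearray()
        self.pos = 0

    def write(self, data):
        write_at(self.buf, self.pos, data)
        self.pos += len(data)


flat = bytearray()
for _ in range(2):            # same request sent twice (retransmission)
    write_at(flat, 0, b"abc")

unix = CursorFile()
for _ in range(2):            # same request sent twice (retransmission)
    unix.write(b"abc")

print(bytes(flat))            # b'abc'
print(bytes(unix.buf))        # b'abcabc'
```

Repeating the positional write leaves the file unchanged, while repeating the pointer-based write duplicates the data.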

14
Access Control
  • Conventional access rights checks (at open calls)
    are not feasible
  • Two stateless approaches:
      • Capability (by manipulating the UFID)
      • User identity sent with every request
        (adopted in NFS and AFS)
  • Main problem: forged requests; some
    authentication mechanism is needed

15
Capabilities and UFIDs
  • A capability is a binary value that acts as an
    access key; it can be encoded in the UFID.
  • Basic construction of a UFID:
      • file group id | file number | random number
  • Additional field: permissions
  • Additional field: encryption of the permission
    field
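
The UFID layout above might be sketched as follows. The slide does not say how the permission field is protected, so this example assumes an HMAC over all fields as a stand-in for the encrypted permission field; SERVER_KEY, make_ufid, and check are hypothetical names, and the field widths are arbitrary.

```python
import hashlib
import hmac
import secrets
import struct

READ, WRITE = 1, 2                        # permission bits (assumed encoding)
SERVER_KEY = secrets.token_bytes(32)      # hypothetical secret held only by the file server


def _seal(group_id, file_no, rand, perms):
    """Keyed digest over all fields; stands in for the encrypted permission field."""
    msg = struct.pack(">IIQB", group_id, file_no, rand, perms)
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).digest()[:8]


def make_ufid(group_id, file_no, perms):
    rand = secrets.randbits(64)           # makes UFIDs hard to guess
    return (group_id, file_no, rand, perms, _seal(group_id, file_no, rand, perms))


def check(ufid, wanted):
    """Server-side check performed on every request; no per-client state is kept."""
    group_id, file_no, rand, perms, seal = ufid
    genuine = hmac.compare_digest(seal, _seal(group_id, file_no, rand, perms))
    return genuine and (perms & wanted) == wanted


ufid = make_ufid(7, 42, READ)
forged = ufid[:3] + (READ | WRITE,) + ufid[4:]   # client flips the write bit
print(check(ufid, READ), check(ufid, WRITE), check(forged, WRITE))
```

A client that edits the permission field cannot produce a matching seal, so the forged UFID is rejected without the server remembering which capabilities it issued.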

16
Directory Service Operations
Note: Each directory is stored as an ordinary
file with a UFID.
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
17
The Network File System (NFS)
  • Introduced by Sun Microsystems in 1985, now an
    Internet standard
  • Runs on top of RPC (RFC 1831)
  • Implemented on most operating systems
  • Version described here: UNIX implementation of
    NFS Version 3 (RFC 1813, June 1995)
  • Most recent version: NFS Version 4 (RFC 3010,
    December 2000)

18
NFS Architecture
Note: Each computer can act as both a client and
a server.
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
19
The Virtual File System Module
  • Access transparency
  • File handles (file identifiers):
      • filesystem identifier | i-node number |
        i-node generation number
  • One VFS structure for each mounted filesystem:
      • relates a remote filesystem (identified by its
        file handle obtained at mount time) to a local
        directory on which it is mounted
  • One v-node per open file:
      • indicates whether a file is local (i-node) or
        remote (file handle)
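
The file handle and v-node can be modelled roughly as below. This is a sketch under stated assumptions, not the kernel data structures: FileHandle, VNode, and Server are hypothetical names, and the server is assumed to bump the generation number each time an i-node number is reused, so that handles to a deleted file can be recognised as stale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileHandle:
    fsid: int          # filesystem identifier
    inode: int         # i-node number
    generation: int    # i-node generation number


@dataclass
class VNode:
    """One per open file: holds either a local i-node or a remote file handle."""
    local_inode: Optional[int] = None
    remote: Optional[FileHandle] = None

    @property
    def is_remote(self) -> bool:
        return self.remote is not None


class Server:
    def __init__(self, fsid):
        self.fsid = fsid
        self.generations = {}              # i-node number -> current generation

    def allocate(self, inode):
        """(Re)use an i-node number for a new file and issue a handle for it."""
        gen = self.generations.get(inode, 0) + 1
        self.generations[inode] = gen
        return FileHandle(self.fsid, inode, gen)

    def is_valid(self, fh):
        return fh.fsid == self.fsid and self.generations.get(fh.inode) == fh.generation


server = Server(fsid=1)
old = server.allocate(12)
new = server.allocate(12)                  # i-node 12 reused after the old file was removed
print(server.is_valid(old), server.is_valid(new))
print(VNode(remote=new).is_remote, VNode(local_inode=12).is_remote)
```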

20
The NFS Client Module in UNIX
  • Integrated with the kernel
  • Emulates the UNIX file system primitives
  • A single client module serves all user-level
    processes
  • The encryption key for authentication is stored in
    the kernel
  • Caches file blocks

21
NFS Server Operations
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
22
NFS Server Operations (cont'd)
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
23
Remote File Accesses
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
24
File System Information in UNIX
    saturn:~ 35 % df -k
    Filesystem                      kbytes  capacity  Mounted on
    /dev/dsk/c0t3d0s0               143903  91        /
    /dev/dsk/c0t3d0s6               267943  99        /usr
    /dev/dsk/c0t3d0s3                15383   3        /tmp
    galaxy:/usr/local.real         4030440  53        /usr/local
    lucky:/var/mail.real            564648  86        /var/mail
    cosmos:/home.real/student/xxx  3941760  60        /home/xxx
    galaxy:/home.real/faculty/yyy  2964512  51        /home/yyy
  • Note: The output of df -k has been edited.

25
Caching
  • Server caching
      • read-ahead
      • write-through
      • delayed-write with the commit operation
  • Client caching
      • cache validation (freshness interval and
        validation timestamp, modification timestamp and
        getattr, ...)
      • bio-daemon (for read-ahead and delayed-write
        caching at the client side)
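
One way the client-side validation terms fit together is sketched below. The rule used here is an assumption about how they combine: an entry is trusted without contacting the server while it is within its freshness interval, and otherwise a getattr call compares modification timestamps. CacheEntry, is_fresh, and the getattr_rpc callback are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: bytes
    tc: float    # validation timestamp: when the entry was last validated
    tm: float    # modification timestamp of the file as recorded when cached


def is_fresh(entry, now, freshness, getattr_rpc):
    """Return True if the cached block may be used."""
    if now - entry.tc < freshness:
        return True                       # within the freshness interval: no server contact
    tm_server = getattr_rpc()             # otherwise ask the server for the file attributes
    if tm_server == entry.tm:
        entry.tc = now                    # revalidated; restart the freshness interval
        return True
    return False                          # file changed at the server: refetch


calls = []


def fake_getattr():
    """Stand-in for the getattr RPC; records each call and reports an unchanged file."""
    calls.append("getattr")
    return 100.0


entry = CacheEntry(b"block", tc=0.0, tm=100.0)
print(is_fresh(entry, now=2.0, freshness=3.0, getattr_rpc=fake_getattr), len(calls))
print(is_fresh(entry, now=5.0, freshness=3.0, getattr_rpc=fake_getattr), len(calls))
```

A shorter freshness interval tightens consistency but increases the number of getattr calls, which is the trade-off behind the efficiency concern usually raised about NFS.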

26
Achievements of NFS
  • Access and location transparency
  • Mobility transparency (partially)
  • Read-only file replication: the automounter
  • Fault tolerance: stateless servers, the
    automounter
  • Efficiency: caching of disk blocks (main problem:
    frequent use of getattr)
  • Non-achievements: scalability, concurrency and
    consistency, security (Kerberos), ...

27
The Andrew File System (AFS)
  • Developed at CMU
  • Current versions: AFS-2, AFS-3
  • Compatible with NFS
  • Main achievement over (older) NFS: better
    scalability by minimizing client-server
    communication
  • Key characteristics: whole-file serving and
    caching (partial file caching allowed in AFS-3)

28
Observations on UNIX File Usage
  • Files are mostly small
  • Read operations are more common
  • Sequential accesses are more common
  • Most files are written by one user
  • Files are referenced in bursts

29
AFS Architecture
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
30
AFS File Name Space
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
31
System Call Interception in AFS
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
32
AFS System Calls Implementation
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
33
Cache Consistency
  • A callback promise is provided when Vice supplies
    a copy of a file to a Venus process
  • The callback promise stored with the cached copy
    is in either the valid or the cancelled state
  • When Venus handles an open, it checks the cache.
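
The callback mechanism can be sketched as below. The slide covers only the promise and the check at open; the rest is an assumption drawn from the general AFS design, namely that Vice cancels the promises held by other clients when an updated copy is stored at close. The class and method names are illustrative and are not the Vice service interface.

```python
class Vice:
    """Server side: remembers which clients hold a callback promise per file."""

    def __init__(self):
        self.files = {}        # name -> bytes
        self.promises = {}     # name -> set of Venus processes holding a promise

    def fetch(self, name, client):
        self.promises.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, name, data, writer):
        self.files[name] = data
        for client in self.promises.get(name, set()) - {writer}:
            client.callback(name)          # cancel the promise held by other clients
        self.promises[name] = {writer}


class Venus:
    """Client side: whole-file cache with a promise state per entry."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}        # name -> [data, promise_is_valid]
        self.fetches = 0

    def open(self, name):
        entry = self.cache.get(name)
        if entry is None or not entry[1]:  # no copy, or promise cancelled
            self.fetches += 1
            entry = [self.vice.fetch(name, self), True]
            self.cache[name] = entry
        return entry[0]

    def close(self, name, data):
        self.cache[name] = [data, True]
        self.vice.store(name, data, self)

    def callback(self, name):
        if name in self.cache:
            self.cache[name][1] = False    # mark the promise as cancelled


vice = Vice()
vice.files["report"] = b"v1"
a, b = Venus(vice), Venus(vice)
a.open("report")
a.open("report")                           # served from the cache, no second fetch
b.open("report")
b.close("report", b"v2")                   # cancels a's promise
print(a.open("report"), a.fetches)
```

While the promise is valid, an open costs no server communication, which is how AFS reduces client-server traffic compared with timestamp polling.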

34
The Vice Service Interface
Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
35
Enhancements to NFS and AFS
  • Spritely NFS
      • adds open and close, uses callbacks
  • NQNFS (Not Quite NFS)
      • uses callbacks and leases
  • WebNFS
      • allows browsers and other applications to
        interact with an NFS server directly
  • NFS Version 4 (RFC 3010, December 2000)
      • incorporates all of the above and more
  • DCE/DFS (based on AFS)
      • uses callbacks and write tokens (with a lifetime)

36
New Features of NFS Version 4
  • Adoption of the RPCSEC_GSS (RFC 2203) security
    protocol
  • Multiple operations in one request
  • Better migration and replication abilities
  • A client may query the location(s) of a file
    system.
  • Introduction of open and close operations
  • Lease-based file locking
  • Callback-based delegation of files
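
The idea behind lease-based locking can be illustrated with a simplified lock table. This is not the NFS Version 4 protocol: LeaseLockTable, its methods, and the explicit now argument are assumptions made for the example. The point shown is that a lock held by a crashed client expires on its own once the lease is no longer renewed.

```python
class LeaseLockTable:
    """Simplified server-side lock table in which every lock carries an expiry time."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.locks = {}    # file id -> (owner, expiry time)

    def acquire(self, file_id, owner, now):
        holder = self.locks.get(file_id)
        if holder and holder[0] != owner and holder[1] > now:
            return False                   # someone else holds an unexpired lease
        self.locks[file_id] = (owner, now + self.lease_seconds)
        return True

    def renew(self, file_id, owner, now):
        holder = self.locks.get(file_id)
        if holder and holder[0] == owner and holder[1] > now:
            self.locks[file_id] = (owner, now + self.lease_seconds)
            return True
        return False                       # lease lost or already expired


table = LeaseLockTable(lease_seconds=10)
print(table.acquire("f", "A", now=0))      # True
print(table.acquire("f", "B", now=5))      # False
print(table.renew("f", "A", now=8))        # True
print(table.acquire("f", "B", now=20))     # True
```

The last call succeeds because A stopped renewing: its lease, extended to time 18 by the renewal, has run out by time 20.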

37
New Design Approaches
  • Background
      • high-performance storage technology (e.g., RAID)
      • log-structured file systems (e.g., Sprite, BSD
        LFS)
      • high-performance switched networks (e.g., ATM,
        high-speed Ethernet)
  • Goals: high scalability and fault tolerance
  • Main ideas: distribute file data among many
    nodes, separate responsibilities, ...
  • Constraints: high level of trust

38
More Recent File System Designs
  • xFS
      • Serverless: all data, metadata, and control can
        be located anywhere in the system; any machine
        can take over the responsibilities of a failed
        one
  • Frangipani
      • Two-layer structure:
          • the Petal distributed virtual disk system
          • the Frangipani server module
  • Both designs utilize RAID-style striping,
    log-structured file storage, etc.
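
RAID-style striping with parity can be illustrated in a few lines. This is a generic sketch of the technique, not the xFS or Petal implementation; stripe, recover, and xor_blocks are hypothetical helpers, and a single XOR parity fragment per stripe is assumed.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(data, n_data, block_size):
    """Split one stripe of data into n_data fragments plus one parity fragment."""
    if len(data) > n_data * block_size:
        raise ValueError("data does not fit in one stripe")
    data = data.ljust(n_data * block_size, b"\0")
    frags = [data[i * block_size:(i + 1) * block_size] for i in range(n_data)]
    return frags + [xor_blocks(frags)]


def recover(frags, lost):
    """Rebuild the fragment at index `lost` from all the surviving fragments."""
    return xor_blocks([f for i, f in enumerate(frags) if i != lost])


frags = stripe(b"log segment!", n_data=3, block_size=4)
print(frags[:3])               # [b'log ', b'segm', b'ent!']
print(recover(frags, 1))       # b'segm'
```

Each fragment would be stored on a different node, so the loss of any single node leaves enough information to rebuild its fragment.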

39
Log-based Striping in xFS
Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996.
40
An xFS Configuration
Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996.
41
A Frangipani Configuration
Source: C.A. Thekkath et al., Frangipani: A Scalable Distributed File System, ACM SOSP 1997.
42
Storage Systems
Source: G.A. Gibson and R. van Meter, Network Attached Storage Architecture, CACM, November 2000.
43
NAS and SAN
Note: The difference is disappearing.
Source: G.A. Gibson and R. van Meter, Network Attached Storage Architecture, CACM, November 2000.
44
Bandwidth for Disk Access
Source: E. Riedel, Storage Systems, Queue, June 2003.
45
Increasing the Bandwidth
Source: E. Riedel, Storage Systems, Queue, June 2003.
46
Virtualization in SAN
Source: E. Riedel, Storage Systems, Queue, June 2003.
47
Requirements for Storage Systems
  • Basic requirements
      • resource consolidation, rapid deployment, central
        management, convenient backup, high availability,
        data sharing
  • Geographic separation
  • Security
      • against an increasing risk of unauthorized access
  • Performance: scalable with capacity
    (accesses per second or megabytes per second)