1
Network File System (NFS)
  • Architecture and system model
  • Communication and processes

2
Network File System (NFS)
  • Developed by Sun Microsystems Inc.
  • Used widely (version 3)
  • Interoperable with various operating systems,
    mostly used in UNIX systems
  • Version 4 specification has been announced
  • Basic idea is that file server provides a
    standardized view of its local file system
  • The communication protocol allows clients to
    access the files stored on a server
  • This makes it possible for processes on different
    operating systems and machines to share a common
    file system

3
NFS Architecture
  • The NFS model is a remote file service model
  • Clients are normally unaware of the actual
    location of files (local/remote)
  • The file system interface supports normal file
    operations (read, write, etc.); this is called the
    remote access model (in contrast to the
    upload/download model)
  • Files are accessed (typically in UNIX) via a
    single interface to the file system, the Virtual
    File System (VFS) layer, which hides the
    differences between local and remote file systems
  • Communication is done through Remote Procedure
    Calls (RPCs); the server converts these calls into
    normal local file system calls via the VFS
  • NFS is largely independent of local file systems
    and operating systems

4
NFS Architecture
  • The remote access model (for example NFS)
  • The upload/download model (for example FTP)

5
NFS Architecture
  • The basic NFS architecture for UNIX systems.

6
NFS File System Model
  • The file system model is almost the same as in
    UNIX-based systems (files are sequences of bytes;
    a hierarchical naming graph represents directories
    and files)
  • Hard and symbolic links are supported
  • Files are named but accessed via file handles
  • There are some differences in system operations
    between versions 3 and 4 of the NFS protocol
  • Files have various attributes associated with
    them

7
NFS File System Model
  • An incomplete list of file system operations
    supported by NFS.

8
NFS Communication
  • The design of NFS makes it independent of
    operating systems, network architectures and
    transport protocols. For example, a Windows system
    can communicate with a UNIX file server
  • This is possible because NFS is placed on top of
    an RPC layer that hides the differences between
    the various systems
  • Every NFS operation is implemented as a single
    RPC to the file server (in NFS v3). Version 4 of
    the NFS protocol supports compound procedures, by
    which several RPCs can be combined into a single
    request
  • Compound procedures are executed in order; if an
    operation fails, an error is returned and no
    further operations are executed on the server side
    (see the sketch below)
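
A minimal sketch of the compound-procedure idea, in Python. This is an
illustration only, not the real NFS v4 wire protocol or API: several
operations travel in one request, are executed in order, and the first
failure aborts the remaining ones. All names (run_compound, op_lookup,
the toy file store) are hypothetical.

```python
files = {"/home/steen/mbox": b"hello world"}   # toy server-side file store

def op_lookup(path):
    if path not in files:
        raise FileNotFoundError(path)
    return path

def op_open(path):
    return {"handle": hash(path)}              # stand-in for an opaque file handle

def op_read(path, count):
    return files[path][:count]

HANDLERS = {"lookup": op_lookup, "open": op_open, "read": op_read}

def run_compound(operations):
    """Execute a list of (name, args) in order; abort on the first error."""
    results = []
    for name, args in operations:
        try:
            results.append(HANDLERS[name](*args))
        except Exception as exc:
            return results, f"error in {name}: {exc}"   # later operations are skipped
    return results, "ok"

results, status = run_compound([
    ("lookup", ("/home/steen/mbox",)),
    ("open",   ("/home/steen/mbox",)),
    ("read",   ("/home/steen/mbox", 5)),
])
print(status, results)
```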

9
NFS Communication
  • Reading data from a file in NFS version 3
  • Reading data using a compound procedure in
    version 4

10
NFS Processes
  • NFS is a traditional client-server system in
    which clients ask a file server to perform
    operations on files
  • The server implementation can be stateless (in
    NFS v3) and therefore simple. In NFS v4 the server
    maintains information about its clients.
  • For example, file locking and authentication
    require the server to maintain information about
    its clients. In NFS v3 file locking is done with a
    separate NFS lock manager
  • NFS v4 maintains only very little information
    about clients and is expected to also work in Wide
    Area Networks (WANs)
  • WAN usage requires an efficient client cache and
    a cache consistency protocol, which often work
    best when the server maintains some information on
    the files used by its clients

11
Network File System (NFS)
  • Naming
  • File handles
  • Automounting
  • File attributes

12
NFS Naming
  • The idea of the NFS naming model is to provide
    clients with completely transparent access to a
    remote file system maintained by a server
  • Transparency is achieved by letting a client
    mount a remote file system into its local file
    system
  • Instead of mounting an entire server file system,
    NFS allows clients to mount only part of it. A
    server is said to export a directory when it makes
    that directory and its entries available to
    clients
  • A directory exported by a server can be mounted
    into a client's local name space (at some
    directory, called a mount point)
  • Clients can have different name spaces; for
    example, two clients can refer to the same file on
    the server by different path names because of
    their different local name spaces (see the sketch
    below)
  • An NFS server can mount a directory from another
    NFS server, but it cannot export that mounted
    directory to its own clients. If clients want to
    access files on the other server, they must mount
    the directory directly from that server
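
A small, hypothetical illustration (Python) of the point above about
different local name spaces: each client mounts the server's exported
directory at its own mount point, so the same server file is reached by
different client-local paths. The mount points, server names and the
resolve helper are all made up for the example.

```python
def resolve(local_path, mount_table):
    """Map a client-local path to (server, server_path) via the client's mount table."""
    for mount_point, (server, exported_dir) in mount_table.items():
        if local_path.startswith(mount_point + "/"):
            suffix = local_path[len(mount_point):]
            return server, exported_dir + suffix
    return None, local_path                      # a purely local path

# Client A mounted serverX:/users at /remote/vu, client B mounted it at /home.
client_a = {"/remote/vu": ("serverX", "/users")}
client_b = {"/home":      ("serverX", "/users")}

print(resolve("/remote/vu/steen/mbox", client_a))  # ('serverX', '/users/steen/mbox')
print(resolve("/home/steen/mbox",      client_b))  # same server file, different local name
```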

13
NFS Naming
  • Mounting (part of) a remote file system in NFS.

14
NFS Naming
  • Mounting nested directories from multiple servers
    in NFS.

15
NFS File Handles
  • A file handle is a reference to a file within a
    file system
  • It is independent of the name of the file it
    refers to
  • A file handle is created by the server hosting
    the file system and is unique with respect to all
    file systems exported by that server
  • The client is unaware of the actual content of a
    file handle
  • The length of a file handle is up to 64 bytes in
    NFS v3 and 128 bytes in NFS v4
  • A file handle is a true identifier for a file
    relative to a file system. It must stay the same
    for as long as the file exists and cannot be
    reused after the file has been deleted (see the
    sketch below)
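
A hedged sketch (Python) of how a server might construct opaque, stable,
never-reused file handles. The layout used here (file-system id, file id,
generation counter) is an assumed illustration, not the handle format of
any particular NFS implementation.

```python
import struct

class Server:
    def __init__(self, fsid):
        self.fsid = fsid
        self.handles = {}              # fileid -> handle, stable for the file's lifetime
        self.next_generation = 0

    def create_file(self, fileid):
        # The generation counter keeps handles unique even if a fileid is
        # later reused for a new file, so old handles never become valid again.
        self.next_generation += 1
        handle = struct.pack(">IQQ", self.fsid, fileid, self.next_generation)
        self.handles[fileid] = handle  # opaque bytes as far as the client is concerned
        return handle

    def lookup(self, fileid):
        return self.handles[fileid]    # the same handle every time for this file

    def delete_file(self, fileid):
        del self.handles[fileid]       # the old handle is never reissued

srv = Server(fsid=1)
h_old = srv.create_file(4711)
srv.delete_file(4711)
h_new = srv.create_file(4711)          # fileid reused, but the handle differs
print(h_old.hex(), h_new.hex(), h_old != h_new)
```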

16
NFS Automounting
  • Because of the problem of different local name
    spaces, NFS client name spaces can be partly
    standardized
  • A remote file system can also be mounted only
    when needed; this is called automounting
  • Automounting is done by an automounter, which
    runs as a separate process on the client's machine
  • For example, the automounter can manage /home as
    an automounting directory. When a process wants to
    access some directory under /home, the file access
    operation is forwarded from the kernel to the NFS
    client, and from the NFS client to the automounter
  • The automounter then creates the mount point,
    mounts the required remote file system, and the
    file can be accessed.

17
NFS Automounting
  • A simple automounter for NFS.

18
NFS Automounting
  • A problem with this simple automounter is that it
    has to be involved in all file operations to
    guarantee transparency, which can cause
    performance problems
  • A simple solution is to let the automounter mount
    directories in a special subdirectory and create a
    symbolic link to each mounted directory (see the
    sketch below)
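
A hedged, simplified sketch (Python) of the symbolic-link trick: the
automounter mounts remote directories under a private subdirectory and
plants a symlink at the user-visible path, so later file operations follow
the link without involving the automounter. The directory names are
assumptions, and the mount itself is only simulated inside a temporary
sandbox directory.

```python
import os, tempfile

root = tempfile.mkdtemp()                      # sandbox instead of the real filesystem root
TMP_MNT = os.path.join(root, "tmp_mnt")        # private mount area used by the automounter

def automount(server, export, visible_path):
    """Pretend-mount server:export under TMP_MNT and symlink the visible path to it."""
    real_dir = os.path.join(TMP_MNT, server, export.lstrip("/"))
    os.makedirs(real_dir, exist_ok=True)       # a real automounter would mount here
    if not os.path.islink(visible_path):
        os.makedirs(os.path.dirname(visible_path), exist_ok=True)
        os.symlink(real_dir, visible_path)     # e.g. <root>/home/alice -> <root>/tmp_mnt/...
    return os.path.realpath(visible_path)

print(automount("serverX", "/users/alice", os.path.join(root, "home", "alice")))
```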

19
NFS Automounting
  • Using symbolic links with automounting.

20
NFS File Attributes
  • An NFS file has a number of associated attributes
  • In NFS v3 the set of attributes is fixed, and
    every NFS implementation is expected to support
    those attributes (fully implementing NFS on
    non-UNIX systems was therefore sometimes difficult
    or impossible)
  • In NFS v4 the set of attributes has been split
    into a set of mandatory attributes that every
    implementation must support, a set of recommended
    attributes that should preferably be supported,
    and an additional set of named attributes
  • Named attributes are actually not part of the NFS
    protocol; they are encoded as an array of
    (attribute, value) pairs, where the attribute is a
    string and the value a sequence of bytes
  • Attributes and their values are stored with the
    file or directory, and there are NFS operations to
    read and write them. The interpretation of the
    attributes and their values is left to the
    application; it is not defined in the NFS
    specification (see the sketch below)
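
A minimal sketch (Python) of the named-attribute idea: an array of
(attribute, value) pairs attached to a file, with the attribute as a
string and the value as raw bytes. The helper functions are illustrative
stand-ins for the protocol's read and write operations, not actual NFS
calls, and the attribute names are made up.

```python
named_attrs = {}                      # path -> list of (name, value) pairs

def set_named_attr(path, name, value):
    attrs = named_attrs.setdefault(path, [])
    attrs[:] = [(n, v) for n, v in attrs if n != name] + [(name, value)]

def get_named_attr(path, name):
    for n, v in named_attrs.get(path, []):
        if n == name:
            return v
    return None

set_named_attr("/home/steen/report.pdf", "reviewer", b"alice")
set_named_attr("/home/steen/report.pdf", "status", b"draft")
print(get_named_attr("/home/steen/report.pdf", "reviewer"))   # b'alice'
# What "reviewer" or "status" mean is up to the application, as noted above.
```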

21
NFS File Attributes
  • Some general mandatory file attributes in NFS.

22
NFS File Attributes
  • Some general recommended file attributes.

23
Network File System (NFS)
  • File sharing
  • File locking
  • Caching and replication

24
NFS File Sharing
  • When two or more users share the same file, it is
    necessary to define the semantics of reading and
    writing precisely to avoid problems
  • In single-processor systems the semantics
    normally state that when a read operation follows
    a write operation, the read returns the value just
    written. Similarly, when two writes happen in
    quick succession, followed by a read, the value
    read is the value stored by the last write. The
    system enforces an absolute time ordering and
    always returns the most recent value. This model
    is called UNIX semantics.
  • In a distributed system, UNIX semantics can be
    achieved easily as long as there is only one file
    server and clients do not cache files
  • Performance in such a single-server system is
    frequently poor
  • If local client caching is allowed, other
    problems occur
  • If a client locally modifies a cached file and
    shortly afterwards another client reads the same
    file from the server, the second client will get
    an obsolete file

25
Semantics of File Sharing
  • On a single processor, when a read follows a
    write, the value returned by the read is the
    value just written.
  • In a distributed system with caching, obsolete
    values may be returned.

26
Semantics of File Sharing
  • It is difficult to propagate all changes to
    cached files back to the server immediately
  • An alternative solution is to relax the semantics
    of file sharing, for example by defining the new
    rule "Changes to an open file are initially
    visible only to the process that modified the
    file"
  • The new rule doesn't change the behavior, it only
    defines it to be correct. This is known as session
    semantics, and NFS supports it.
  • If two or more clients are caching and modifying
    the same file at the same time and session
    semantics is used, the most recently processed
    close operation wins; alternatively, the winning
    modification is one of the candidates, and which
    one wins is left unspecified (see the sketch
    below)
  • With immutable semantics, only create and read
    operations are supported. This simplifies the
    implementation of sharing and replication
  • With transactions (BEGIN_TRANSACTION, operations,
    END_TRANSACTION), the system guarantees that
    either all operations are executed correctly or an
    error is returned and none of them is executed. If
    two or more transactions run at the same time, the
    final result is as if they were executed in some
    serial order.
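
A small illustrative sketch (Python, not NFS code) of session semantics:
writes go to a private per-session copy and become visible only when the
session is closed, so when two clients close the same file the close that
the server processes last determines the final contents.

```python
class Server:
    def __init__(self):
        self.files = {"shared.txt": b"original"}

    def open(self, name):
        return {"name": name, "data": self.files[name]}    # the session works on a copy

    def close(self, session):
        self.files[session["name"]] = session["data"]      # changes become visible now

srv = Server()
a = srv.open("shared.txt")
b = srv.open("shared.txt")
a["data"] = b"written by A"
b["data"] = b"written by B"
srv.close(a)
print(srv.files["shared.txt"])   # b'written by A' so far
srv.close(b)                     # B's close is processed last, so B wins
print(srv.files["shared.txt"])   # b'written by B'
```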

27
Semantics of File Sharing
  • Four ways of dealing with the shared files in a
    distributed system.

28
File locking in NFS
  • Locking files can be problematic on stateless
    servers
  • In NFS v3, file locking is handled by a separate
    protocol and implemented by a stateful lock
    manager. Performance is not good and
    implementations can be faulty
  • In NFS v4, file locking is integrated into the
    NFS protocol, making locking simpler
  • There are only four operations related to
    locking, and NFS distinguishes read locks from
    write locks
  • Multiple clients can simultaneously access the
    same part of a file provided they only read data.
    A write lock is needed to obtain exclusive access
    for modifying part of a file
  • Locks are granted for a specified time (leases).
    The client must renew the lease if it wants to
    keep the lock; the server automatically removes
    locks that have not been renewed (see the sketch
    below)
  • In addition to these locking operations, there is
    also an implicit way to lock a file, referred to
    as share reservation. It is independent of locking
    and can be used to implement NFS for Windows-based
    systems
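
A hedged sketch (Python) of lease-based locking: a lock is valid only for
a fixed lease period, the client must renew it to keep it, and the server
silently drops locks whose lease has expired. The class, method names and
the 30-second lease duration are illustrative assumptions.

```python
LEASE_SECONDS = 30.0

class LockServer:
    def __init__(self):
        self.locks = {}                          # (path, byte_range) -> (owner, expiry)

    def _expire(self, now):
        # Drop every lock whose lease was not renewed in time.
        self.locks = {k: v for k, v in self.locks.items() if v[1] > now}

    def lock(self, owner, path, byte_range, now):
        self._expire(now)
        key = (path, byte_range)
        if key in self.locks and self.locks[key][0] != owner:
            return False                         # still leased by another client
        self.locks[key] = (owner, now + LEASE_SECONDS)
        return True

    def renew(self, owner, path, byte_range, now):
        # Renewal simply pushes the lease expiry forward.
        return self.lock(owner, path, byte_range, now)

srv = LockServer()
print(srv.lock("clientA", "/data/log", (0, 1024), now=0.0))    # True: granted
print(srv.lock("clientB", "/data/log", (0, 1024), now=10.0))   # False: lease still valid
print(srv.lock("clientB", "/data/log", (0, 1024), now=100.0))  # True: A's lease expired
```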

29
File Locking in NFS
  • NFS version 4 operations related to file locking.

30
Client Caching
  • Like many distributed file systems, NFS makes
    extensive use of client caching to improve
    performance
  • Caching in NFS v3 has mainly been left outside of
    the protocol. This has led to the implementation
    of different caching policies, most of which never
    guaranteed consistency
  • NFS v4 solves some of the consistency problems,
    but essentially still leaves cache consistency to
    be handled in an implementation-dependent way
  • The general caching model of NFS consists of a
    client having a memory cache that contains data
    previously read from the server. In addition,
    there may also be a disk cache, added as an
    extension to the memory cache and using the same
    consistency parameters.
  • Typically clients cache file data, attributes,
    file handles and directories. Different strategies
    exist to handle consistency of the cached data

31
Client Caching
  • Client-side caching in NFS.

32
Client Caching
  • NFS v4 supports two approaches for caching file
    data
  • Simplest approach: the client opens the file,
    caches the data obtained from read operations, and
    lets write operations go to the cache; when the
    client closes the file and modifications have been
    made, the cached data must be flushed back to the
    server (session semantics)
  • Second approach: once (part of) a file has been
    cached, a client can keep its data in the cache
    even after closing the file. NFS requires that
    when a client opens a previously closed file that
    has been (partly) cached, it must immediately
    revalidate the cached data. Revalidation takes
    place by checking when the file was last modified
    and invalidating the cache in case it contains
    stale data (see the sketch below).
  • The server may delegate some of its rights to a
    client when a file is opened. Open delegation
    takes place when the client machine is allowed to
    locally handle open and close operations from
    other clients on the same machine. This reduces
    the need to communicate with the server.
  • The server is able to recall the delegation, for
    example if another client needs to obtain access
    rights. Recalling is implemented as a callback
    RPC, which requires that the server keeps track of
    the clients to which it has delegated a file
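
An illustrative sketch (Python) of revalidation on open: before reusing
cached data for a previously closed file, the client asks the server for
the file's last-modification time, compares it with the time recorded
when the data was cached, and refetches if the file has changed. The
helper names and the in-memory "server" are hypothetical.

```python
cache = {}   # path -> {"mtime": float, "data": bytes}, the client-side cache

SERVER_FILES = {"/doc/a.txt": {"mtime": 100.0, "data": b"v1"}}

def server_getattr(path):
    """Stand-in for asking the server for the file's attributes."""
    return SERVER_FILES[path]["mtime"]

def server_read(path):
    return SERVER_FILES[path]["data"]

def open_cached(path):
    entry = cache.get(path)
    current_mtime = server_getattr(path)
    if entry is None or entry["mtime"] != current_mtime:
        # Cache is missing or stale: refetch and record the new modification time.
        entry = {"mtime": current_mtime, "data": server_read(path)}
        cache[path] = entry
    return entry["data"]

print(open_cached("/doc/a.txt"))                  # fetches b'v1' and caches it
SERVER_FILES["/doc/a.txt"] = {"mtime": 200.0, "data": b"v2"}
print(open_cached("/doc/a.txt"))                  # mtime changed, cache invalidated, b'v2'
```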

33
Client Caching
  • Using the NFS version 4 callback mechanism to
    recall file delegation.

34
Replication
  • NFS version 4 provides minimal support for file
    replication
  • Only whole file systems can be replicated
  • Support is provided in the form of the
    FS_LOCATIONS attribute that is recommended for
    each file
  • The attribute contains a list of locations where
    replicas of the file system containing the
    associated file may possibly be found
  • Each location is given as a DNS name or an IP
    address
  • It is up to a specific NFS implementation to
    actually provide replicated servers. NFS v4 does
    not specify how replication should take place

35
Network File System (NFS)
  • Fault tolerance
  • Security

36
Fault tolerance
  • Fault tolerance in NFS v3 has hardly been an
    issue, because the protocol did not require
    servers to be stateful, so no state could be lost.
    Of course, the separate lock managers were
    stateful, and special measures were needed for
    them
  • In NFS v4, statefulness occurs in file locking
    and delegation
  • In addition, special measures need to be taken to
    handle the unreliability of the RPC mechanism
    underlying the NFS protocol
  • RPC stubs can be configured to use reliable TCP
    or unreliable UDP
  • If an RPC reply is lost, the client can send the
    request again, and the server may then execute the
    operation more than once, whereas the intended
    semantics is that each operation is executed
    exactly once
  • This situation can be handled with a
    duplicate-request cache
  • Each RPC request from a client carries a unique
    transaction identifier (XID) in its header, which
    is cached by the server when the request comes in
  • As long as the server has not sent a reply, it
    will indicate that the RPC request is in progress.
    When the request has been handled, its associated
    reply is also cached, after which the reply is
    returned to the client (see the sketch below).
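
An illustrative sketch (Python) of a duplicate-request cache keyed by the
RPC transaction identifier (XID): a retransmitted request is never
re-executed; the server either reports that it is still in progress or
resends the cached reply. The class and its toy operation are assumptions
made for the example.

```python
IN_PROGRESS = object()                       # marker for "request being handled"

class Server:
    def __init__(self):
        self.seen = {}                       # xid -> IN_PROGRESS or the cached reply
        self.counter = 0                     # toy non-idempotent server state

    def handle(self, xid, request):
        if xid in self.seen:
            cached = self.seen[xid]
            return "in progress" if cached is IN_PROGRESS else cached
        self.seen[xid] = IN_PROGRESS         # remember the XID before executing
        self.counter += 1                    # the operation runs exactly once
        reply = f"{request} done, counter={self.counter}"
        self.seen[xid] = reply               # cache the reply for retransmissions
        return reply

srv = Server()
print(srv.handle(42, "append record"))       # executed, counter=1
print(srv.handle(42, "append record"))       # retransmission: cached reply, counter stays 1
```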

37
RPC Failures
  • Three situations for handling retransmissions.
  • The request is still in progress
  • The reply has just been returned
  • The reply was returned some time ago, but was lost.

38
Fault tolerance
  • File locking
  • The client must renew its lease, but false
    removal may happen, for example if the network is
    (temporarily) partitioned and the client's renew
    messages don't reach the server. No special
    measures are taken to handle such situations.
  • When the server crashes and subsequently
    recovers, it may have lost the information on the
    locks it granted to clients. The solution is to
    enter a grace period in which clients can reclaim
    locks that were previously granted to them. In
    this way the server rebuilds its previous state
    with respect to locks. During the grace period,
    normal lock requests are refused (see the sketch
    below).
  • Using leases also introduces numerous problems of
    its own. For example, leasing requires clocks to
    be synchronized, which may not be easily achieved
    in wide-area systems.
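
A hedged sketch (Python) of the grace period after a server restart: for
a short window only reclaim requests, for locks a client already held
before the crash, are accepted, while ordinary new lock requests are
refused. The class name and the 90-second window are assumptions.

```python
GRACE_SECONDS = 90.0

class RecoveringLockServer:
    def __init__(self, restart_time):
        self.restart_time = restart_time
        self.locks = set()                       # (client, path) pairs currently held

    def in_grace(self, now):
        return now - self.restart_time < GRACE_SECONDS

    def reclaim(self, client, path, now):
        if not self.in_grace(now):
            return "denied: grace period over"
        self.locks.add((client, path))           # rebuild the pre-crash lock state
        return "reclaimed"

    def lock(self, client, path, now):
        if self.in_grace(now):
            return "denied: server in grace period"
        self.locks.add((client, path))
        return "granted"

srv = RecoveringLockServer(restart_time=0.0)
print(srv.reclaim("A", "/db/index", now=10.0))   # reclaimed
print(srv.lock("B", "/db/index", now=10.0))      # denied: server in grace period
print(srv.lock("B", "/db/other", now=120.0))     # granted after the grace period
```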

39
Fault tolerance
  • Delegation
  • Open delegation introduces additional problems
    when a client or server crashes
  • If a client to which the opening of a file has
    been delegated crashes, it has presumably not
    propagated its updates to the server. In that
    case, unless the client's updates were saved
    locally to stable storage, full recovery of the
    file will be impossible.
  • In any case, the client is made partially
    responsible for file recovery
  • When the server crashes and subsequently
    recovers, it follows a procedure similar to lock
    recovery: a client to which the opening of a file
    has been delegated reclaims that delegation when
    the server comes up again. However, the server
    then forces the client to flush all modifications
    back to the server, effectively recalling the
    delegation.
  • As a result, the file server is up to date with
    respect to the most recent modifications of each
    file it had delegated to a client. The server is
    again in full charge of the file and may decide to
    delegate it to another client

40
Security
  • The basic idea behind NFS is that a remote file
    system is presented to clients as if it were a
    local file system. Therefore, security in NFS
    mainly focuses on the communication between a
    client and a server. Secure communication means
    that a secure channel between the two should be
    set up.
  • Because NFS is layered on top of an RPC system,
    setting up a secure channel in NFS boils down to
    establishing secure RPCs
  • In addition to secure RPCs, it is necessary to
    control access to files, which is handled by means
    of access control file attributes in NFS. The file
    server is in charge of verifying the access rights
    of its clients
  • The NFS security architecture is thus based on
    secure RPCs and access control attributes

41
Security
  • The NFS security architecture.

42
Security
  • Secure RPCs
  • In NFS v3, a secure RPC meant that only
    authentication was taken care of. Possible ways to
    do authentication were system authentication (user
    ID, group ID), secure NFS (Diffie-Hellman key
    exchange to establish a session key) and Kerberos
    (tickets).
  • In NFS v4, security is enhanced by support for
    RPCSEC_GSS
  • RPCSEC_GSS is a general security framework that
    provides the hooks for different authentication
    methods and also supports message integrity and
    confidentiality, which were not supported in NFS
    v3
  • RPCSEC_GSS is based on a standard interface for
    security services, namely the GSS-API; RPCSEC_GSS
    is layered on top of this interface
  • For NFS v4, RPCSEC_GSS should be configured with
    support for Kerberos v5 and LIPKEY (a public-key
    system that allows clients to be authenticated
    using a password, while servers can be
    authenticated using a public key)

43
Secure RPCs
  • Secure RPC in NFS version 4.

44
Security
  • Access control
  • Access control is supported by means of the ACL
    file attribute
  • This attribute is a list of access control
    entries, where each entry specifies the access
    rights for a specific user or group
  • The operations that NFS distinguishes with
    respect to access control are relatively
    straightforward
  • Compared to the simple access control mechanisms
    in, for example, UNIX systems, NFS distinguishes
    many different kinds of operations (see the sketch
    below)
  • NFS has much richer access control semantics than
    most UNIX systems
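
An illustrative sketch (Python) of ACL-style access control: the ACL
attribute is a list of entries, each granting a set of operations to a
specific user or group. The entry layout and the operation names used
here are examples, not the exact NFS attribute format or operation set.

```python
acl = [
    {"who": "user:alice",  "allow": {"read_data", "write_data", "read_acl"}},
    {"who": "group:staff", "allow": {"read_data", "read_attributes"}},
]

def check_access(acl_entries, user, groups, operation):
    """Return True if an entry for this user or one of their groups allows the operation."""
    principals = {f"user:{user}"} | {f"group:{g}" for g in groups}
    return any(e["who"] in principals and operation in e["allow"] for e in acl_entries)

print(check_access(acl, "alice", ["staff"], "write_data"))   # True, via user:alice
print(check_access(acl, "bob",   ["staff"], "write_data"))   # False, no matching entry
print(check_access(acl, "bob",   ["staff"], "read_data"))    # True, via group:staff
```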

45
Access Control
  • The classification of operations recognized by
    NFS with respect to access control.

46
Coda distributed file system
  • Overview
  • Communication and naming
  • File sharing and transactional semantics

47
Overview
  • Coda was designed to be a scalable, secure and
    highly available distributed file system
  • Coda is a descendant of version 2 of the Andrew
    File System (AFS)
  • AFS was designed to support a large community; to
    meet this requirement, nodes are partitioned into
    two groups. One group consists of a relatively
    small number of dedicated Vice file servers, which
    are centrally administered. The other group
    consists of a very much larger collection of
    Virtue workstations that give users and processes
    access to the file system
  • Coda follows the same organization as AFS. Every
    Virtue workstation hosts a user-level process
    called Venus, whose role is similar to that of an
    NFS client. A Venus process is responsible for
    providing access to the files that are maintained
    by the Vice file servers. Venus is also
    responsible for continuing operation even if
    access to the file servers is (temporarily)
    impossible.

48
Architecture
  • The overall organization of Coda.

49
Architecture
  • Internal architecture of a Virtue workstation
  • An important point is that Venus runs as a
    user-level process
  • The Virtual File System (VFS) layer intercepts
    all calls from client applications and forwards
    these calls either to the local file system or to
    Venus
  • The organization with VFS is the same as in NFS
  • Venus communicates with the Vice file servers
    using a user-level RPC system
  • The RPC system is constructed on top of UDP
    datagrams and provides at-most-once semantics
  • On the server side there are three different
    types of processes
  • Vice servers are responsible for maintaining a
    local collection of files
  • Trusted Vice machines are allowed to run an
    authentication server
  • Update processes are used to keep
    meta-information on the file system consistent at
    each Vice server
  • Coda appears to its users as a traditional
    UNIX-based file system
  • Coda provides a globally shared name space that
    is maintained by the Vice servers. Clients have
    access to this name space via the local /afs
    directory. When a client looks up a name in this
    directory, Venus ensures that the appropriate part
    of the shared name space is mounted locally

50
Architecture
  • The internal organization of a Virtue workstation.

51
Communication
  • Interprocess communication is performed using
    RPCs
  • The RPC2 system used by Coda is more
    sophisticated than traditional RPC systems such as
    the one used in NFS
  • RPC2 offers reliable RPCs on top of the
    (unreliable) UDP protocol
  • Each time a remote procedure is called, the RPC2
    client code starts a new thread that sends an
    invocation request to the server and subsequently
    blocks until it receives an answer. As request
    processing may take an arbitrarily long time to
    complete, the server regularly sends back messages
    to the client to let it know it is still working
    on the request. If the server dies, sooner or
    later the thread will notice and return an error
    to the calling application
  • An interesting aspect of RPC2 is its support for
    side effects. A side effect is a mechanism by
    which client and server can communicate using an
    application-specific protocol. For example, a file
    at a video server can be opened and transferred as
    a continuous data stream; there are routines for
    setting up such a connection and for transferring
    the data
  • RPC2 also supports multicasting, which is used
    for invalidating client caches in parallel

52
Communication
  • Side effects in Coda's RPC2 system.

53
Communication
  • Sending an invalidation message one at a time.
  • Sending invalidation messages in parallel.

54
Naming
  • Coda maintains a naming system analogous to that
    of UNIX
  • Files are grouped into units referred to as
    volumes. A volume is similar to a UNIX disk
    partition, but generally has a much smaller
    granularity. It corresponds to a partial subtree
    in the shared name space. Like disk partitions,
    volumes can be mounted.
  • Volumes form the basic unit by which the entire
    name space is constructed. This construction takes
    place by mounting volumes at mount points. Clients
    can only mount the root of a volume. Volumes also
    form the unit of server-side replication
  • The granularity of volumes means that a name
    lookup may cross several mount points. To support
    naming transparency, a Vice file server returns
    mounting information to a Venus process during
    name lookup. This allows Venus to automatically
    mount a volume into the client's name space when
    necessary, similar to crossing mount points as
    supported in NFS v4
  • The shared name space is accessible by means of
    the subdirectory /afs in the client's local name
    space

55
Naming
  • Clients in Coda have access to a single shared
    name space.

56
File identifiers
  • Because the collection of shared files may be
    replicated and distributed across multiple Vice
    servers, it becomes important to uniquely identify
    each file in such a way that it can be tracked to
    its physical location, while at the same time
    maintaining replication and location transparency
  • Each file in Coda is contained in exactly one
    volume. Because a volume may be replicated across
    several servers, Coda makes a distinction between
    logical and physical volumes.
  • A logical volume represents a possibly replicated
    physical volume and has an associated Replicated
    Volume Identifier (RVID). An RVID is a location-
    and replication-independent volume identifier.
    Multiple replicas may be associated with the same
    RVID
  • Each physical volume has its own Volume
    Identifier (VID), which identifies a specific
    replica in a location-independent way.
  • Each file has a 96-bit file identifier: the first
    part is a 32-bit RVID, the second part a 64-bit
    identifier that uniquely identifies the file
    within its volume (see the sketch below)
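
A small sketch (Python) of composing and splitting the 96-bit Coda file
identifier described above: a 32-bit RVID followed by a 64-bit
within-volume file identifier. The packing and the replication-database
comment are illustrative, not Coda's actual on-the-wire format.

```python
def make_fid(rvid, file_id):
    """Combine a 32-bit RVID and a 64-bit file identifier into one 96-bit value."""
    assert 0 <= rvid < 2**32 and 0 <= file_id < 2**64
    return (rvid << 64) | file_id

def split_fid(fid):
    """Recover (rvid, file_id) from a 96-bit file identifier."""
    return fid >> 64, fid & (2**64 - 1)

fid = make_fid(rvid=0x00010002, file_id=0x000000ABCDEF0123)
rvid, file_id = split_fid(fid)
print(hex(fid), hex(rvid), hex(file_id))

# Resolving the RVID to physical replicas would then go through a volume
# replication database, e.g. a mapping rvid -> [VID, VID, ...], one VID per server.
```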

57
File Identifiers
  • The implementation and resolution of a Coda file
    identifier.

58
Sharing files
  • When a client successfully opens a file, an
    entire copy of the file is transferred to the
    client's machine. The server records that the
    client has a copy of this file.
  • If a client has opened a file for writing and
    another client wants to open the same file, the
    second open will fail. This is because the server
    has recorded that the first client might have
    modified the file. If the first client had opened
    the file for reading and the other for writing,
    both opens would have succeeded.
  • If several copies of the same file are stored
    locally on clients and one client modifies the
    file and closes it, the file is transferred back
    to the server. The other clients may proceed to
    read their own copies despite the fact that those
    copies are now outdated.
  • The reason for this apparently inconsistent
    behavior is that a session is treated as a
    transaction in Coda.

59
Sharing Files
  • The transactional behavior in sharing files in
    Coda.

60
Transactional Semantics
  • In Coda, the notion of a network partition plays
    a crucial role in defining transactional
    semantics
  • A partition is a part of the network that is
    isolated from the rest and consists of a
    collection of clients or servers, or both. The
    basic idea is that a series of file operations
    should continue to execute in the presence of
    conflicting operations across different
    partitions. Two operations are said to conflict if
    they both operate on the same data and at least
    one of them is a write operation.
  • Coda recognizes different types of sessions. For
    example, each UNIX system call is associated with
    a different session type. More complex session
    types are the ones that start with a call to
    open.
  • As an example, the store session type starts with
    opening a file for writing as a specific user. The
    metadata entries associated with the file are read
    and modified as needed.
  • Conflicting transactions force a client to save
    its local version of the file for manual
    reconciliation.

61
Transactional Semantics
  • The metadata read and modified for a store
    session type in Coda.

62
Coda distributed file system
  • Caching and replication

63
Client Caching
  • Client-side caching is crucial to the operation
    of Coda for two reasons. First, caching is done
    to achieve scalability. Second, caching provides
    a higher degree of fault tolerance as the client
    becomes less dependent on the availability of the
    server. For these reasons, clients in Coda always
    cache entire files
  • When a file is opened for either reading or
    writing, an entire copy of the file is transferred
    to the client, where it is subsequently cached
  • Cache coherence in Coda is maintained by means of
    callbacks. For each file, the server keeps track
    of which clients have a copy of that file; the
    server is said to record a callback promise for
    each such client. When a client updates its local
    copy for the first time, it notifies the server,
    which sends an invalidation message to the other
    clients. The invalidation message is called a
    callback break, because the server then discards
    the callback promise it held for each client to
    which it sent an invalidation (see the sketch
    below)
  • As long as a client knows it has an outstanding
    callback promise at the server, it can safely
    access the file locally.
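
A hedged sketch (Python) of the callback mechanism: the server records a
callback promise for every client that fetches a file, and on the first
update by one client it breaks the promises of the others by sending them
invalidation messages. Message delivery is only simulated with print, and
all names are made up for the example.

```python
class VicelikeServer:
    def __init__(self):
        self.promises = {}                       # path -> set of clients holding a copy

    def fetch(self, client, path):
        self.promises.setdefault(path, set()).add(client)   # record a callback promise
        return f"contents of {path}"

    def update(self, client, path):
        others = self.promises.get(path, set()) - {client}
        for other in others:
            print(f"callback break: invalidate {path} at {other}")
        self.promises[path] = {client}           # promises of the other clients are discarded

srv = VicelikeServer()
srv.fetch("clientA", "/afs/proj/plan.txt")
srv.fetch("clientB", "/afs/proj/plan.txt")
srv.update("clientA", "/afs/proj/plan.txt")      # clientB's cached copy is invalidated
```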

64
Client Caching
  • The use of local copies when opening a session
    in Coda.

65
Server replication
  • Coda allows file servers to be replicated; the
    unit of replication is a volume. The collection of
    servers that have a copy of a volume is known as
    that volume's Volume Storage Group (VSG)
  • In the presence of failures, a client may not
    have access to all servers in a VSG. The client
    maintains a list of the servers that are
    accessible, called the Accessible Volume Storage
    Group (AVSG). If the AVSG is empty, the client is
    said to be disconnected.
  • Coda uses a replicated-write protocol to maintain
    consistency of a replicated volume, namely a
    variant of Read-One, Write-All
  • If a client needs to read a file, it contacts one
    of the members of its AVSG. Writing is handled
    differently: when the file is closed, the client
    transfers the modified file back in parallel to
    each member of the AVSG. The parallel transfer is
    done with multicast RPC2 (see the sketch below).
  • This scheme works fine as long as there are no
    failures, that is, as long as each client's AVSG
    for a volume is the same as the volume's VSG
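
An illustrative sketch (Python) of the Read-One, Write-All variant
described above: reads go to a single accessible server, while closing a
modified file pushes it to every server in the AVSG. The multicast is
simulated with a simple loop, and the class and server names are made up.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.files = {}

class Client:
    def __init__(self, vsg, reachable):
        self.vsg = vsg                                   # all replicas of the volume
        self.avsg = [s for s in vsg if s in reachable]   # currently accessible subset

    def read(self, path):
        return self.avsg[0].files.get(path, b"")         # read from one member

    def close_after_write(self, path, data):
        for server in self.avsg:                         # write to all accessible members
            server.files[path] = data

vsg = [Server("vice1"), Server("vice2"), Server("vice3")]
client = Client(vsg, reachable={vsg[0], vsg[1]})         # vice3 is currently unreachable
client.close_after_write("/afs/proj/plan.txt", b"v2")
print([(s.name, s.files) for s in vsg])   # vice1 and vice2 updated; vice3 holds a stale copy
```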

66
Server Replication
  • Two clients with different AVSG for the same
    replicated file.