Title: Distributed Systems
1Distributed Systems
Lecture 11: Distributed File Systems, 2 July 2002
2Schedule of Today
Distributed File Systems
- Distributed File Systems
- Implementation of Distributed File Systems
- File Usage and FS Structure
- Caching and Replication
- Web Test Case for Replication
- Example DFS
- Sun's Network File System (NFS)
- Sprite File System
- Andrew File System (AFS)
- Coda
- Leases
- Log Structured File System
3File Systems
Introduction
- Goal
- Provide a set of primitives that allow users to
- keep information on persistent media
- e.g. disks, tapes, etc.
- manage accesses to files and directories
- name files or directories
- offer abstractions that shield users from the details of storage access and management
4Distributed File Systems
Introduction
- Promote accessing and sharing of files across machine boundaries
- Offer transparency to users
- Make diskless machines viable
- Increase disk space availability by avoiding duplication
- Balance load among multiple file servers
- Offer mobility
5Transparency
Introduction
- Access transparency
- Location transparency
- Concurrency transparency
- Failure transparency
- Performance transparency
6Access Transparency
Introduction
- Users, i.e. application programmers, do not notice whether the file system is local or distributed
- Accesses to local or remote files are the same, i.e. applications written for a local file system still run under a distributed file system
7Location Transparency
Introduction
- User does not have to know the exact location of a file
- Files can be migrated without affecting users
8Concurrency Transparency
Introduction
- Concurrent accesses to a file from different users, i.e. applications, should not lead to inconsistencies of that file
- To achieve this goal you have to use the concept of transactions
9Failure Transparency
Introduction
- After a client or a server has crashed, the file system should work as before
10Performance Transparency
Introduction
- Delays due to remote access should be as small as possible
- Delays due to remote access should not depend on the current load
11Distributed File Systems
Distributed File Systems
- File and directory naming
- Semantics of file sharing
- Implementation considerations
- Caching
- Update protocols
- Replication
12Naming
Distributed File Systems
- File names for file retrieval
- Name service states the alphabet and syntax of valid file names
- Some file systems may offer names consisting of
- <file name>.extension to distinguish file types
- (Other FSs treat the file type as a file attribute)
13Directory
Distributed File Systems
- Flat directories
- Hierarchical directories
- Directories may contain files and other subdirectories
- Directory tree
- Internal node may never be a file
- Complete file name is a path name
- Relative path name
- Absolute path name
14Distributed File Systems
Distributed File Systems
Assumption: Hierarchical directory tree, consisting of local and remote directories.
What's the view of a user if he/she wants to access a non-local file?
- 3 possibilities
- additional node name, e.g. hostxyz:dir1/dir2/.../file (easy to implement, but no transparency at all)
- mounting of remote directory subtrees; there is transparency (however, each node may have a different view)
- single global name space that looks the same on all nodes (full naming transparency)
15Additional File Descriptors
Distributed File Systems
- Instead of symbolic file names the system uses internal file descriptors (unique file identifiers (UFIDs), i-nodes, etc.)
- UFIDs are short and constant in length → easing their use by system programs
- Directory has to map symbolic names to UFIDs
- A UFID may consist of (see the sketch below)
File number (32 bit)
Random number (32 bit)
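A minimal Python sketch (not part of the original slides) of this idea: a directory maps symbolic names to UFIDs built from a 32-bit file number and a 32-bit random part. All class and field names are illustrative assumptions.

import random
from dataclasses import dataclass

@dataclass(frozen=True)
class UFID:
    file_number: int    # 32-bit sequence number
    random_part: int    # 32-bit random tag, guards against reuse of file numbers

class Directory:
    def __init__(self):
        self._entries = {}            # symbolic name -> UFID
        self._next_file_number = 0

    def create(self, name: str) -> UFID:
        ufid = UFID(self._next_file_number, random.getrandbits(32))
        self._next_file_number = (self._next_file_number + 1) % 2**32
        self._entries[name] = ufid
        return ufid

    def lookup(self, name: str) -> UFID:
        return self._entries[name]    # raises KeyError if the name is unknown

d = Directory()
print(d.create("report.txt"))         # e.g. UFID(file_number=0, random_part=...)
print(d.lookup("report.txt"))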
16Possible File Sharing Semantics
Distributed File Systems
- One-copy semantics (à la Unix)
- Updates are written to the single copy and are available immediately (disregarding delays due to file caching)
- Session semantics
- Copy the file on open, work on your local copy, and copy back on close
- No updates of files
- Any update of a file causes the creation of a new file
- Simple locking
- System offers read- and write-locks; users have to deal with them
- Serializability
- Transaction semantics (locking files shared for reads and exclusive for writes, but without involving the user)
17File System Implementation
Implementation
- File Usage
- System Structure
- Caching
- Replication
18File Usage
Implementation
- Results of Satyanarayanan's analysis at CMU (1990)
- File size < 10 K
- → Feasible to transfer entire files instead of single blocks
- > 80% read operations (most reads and writes are sequential)
- Lifetime: often only one usage
- → Create the file on the client side and wait whether it really survives, reducing network traffic a lot
- Few files are shared
- → Client caching is favorable
- Average application uses only a few files
- Several file classes with different behavior
- → Provide an adequate solution for each of them
Similar results by Mullender, Tanenbaum, and the Unix team
19File System Structure
Implementation
- Clients and Servers on different machines?
- Distinct directory server and file server or not?
- Discuss pros and cons of both designs
- One directory server or several directory servers?
- Iterative lookup on several directory servers
- Automatic lookup on several directory servers
20Iterative Lookup
Implementation
(Diagram: the client looks up the path components a, b, and c one after another on Server 1, Server 2, and Server 3; Server 3 holds the file.)
Analysis: 1. Bad performance: needs a couple of messages. 2. Forces clients to be aware of which server holds which file or directory.
21Automatic Lookup
Implementation
(Diagram: the client sends a single lookup of a/b/c to Server 1, which forwards it along a and b to Server 2 and Server 3; Server 3 holds the file and replies.)
Analysis: Fewer messages, thus more efficient, but cannot use RPCs, since a different server replies to the initial call.
22State of the DFS
Implementation
In local systems an FCB (file control block) is created whenever a user opens a file. The FCB contains all relevant state information (e.g. the file pointer).
- In a DFS we can distinguish three session-like phases
- creation of a usage relation
- the use of the file itself, and
- deletion of the usage relation
- In a DFS the file server may keep the above state information or not.
- Thus we can distinguish between
- stateless servers and
- stateful servers
23Stateless versus Stateful Server
Implementation
- Advantages of stateless servers
- If the server crashes it can restart immediately, because no state information was lost.
- Also, if a client crashes there is no additional overhead on the server side; the server only knows a client during the client's file request
- No need for open and close and their related messages across the net.
- No additional space for status information on the server; this may pay off if the server has to deal with many concurrent clients
- No limit on the number of concurrently opened files.
- Advantages of stateful servers
- Shorter request messages, no need for a symbolic file name in each request
- Better performance, easier to implement read-ahead
- Easier to establish idempotency (controllable via sequence numbers)
- Possibility to set file locks in order to establish a certain consistency model
24File System Caches in a DFS
Implementation
- No caches at all, all files only on the server's disk
- Server's disk should be large enough
- Files are always accessible to all clients
- No additional memory overhead, no consistency problem
(Diagram: the client accesses the file on the server's disk(s) across the network. Performance problem!)
25File System Caches in a DFS
Implementation
- The server uses part of its main memory as a file system cache for the clients' most recently used files
- Data is still transferred via the network, but hopefully most accesses to the server's disk are avoided
(Diagram: a cache in the server's main memory sits between the network and the server disk(s). Conceptual problems?)
26Servers Main Memory as Cache
Implementation
- Cacheable units
- Complete files
- File portions, e.g. blocks or chunks
- Replacement algorithm (see the LRU sketch below)
- What to do if the cache fills up
- LRU?
- FIFO?
- No additional consistency problems from the user's point of view
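A minimal sketch of an LRU replacement policy for a block cache in the server's main memory, assuming fixed-size blocks keyed by (file id, block number); capacity and names are illustrative.

from collections import OrderedDict

class LRUBlockCache:
    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self._blocks = OrderedDict()          # (file_id, block_no) -> bytes

    def get(self, file_id: int, block_no: int):
        key = (file_id, block_no)
        if key not in self._blocks:
            return None                       # cache miss: caller reads from disk
        self._blocks.move_to_end(key)         # mark as most recently used
        return self._blocks[key]

    def put(self, file_id: int, block_no: int, data: bytes):
        key = (file_id, block_no)
        self._blocks[key] = data
        self._blocks.move_to_end(key)
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block

cache = LRUBlockCache(capacity_blocks=2)
cache.put(1, 0, b"block A")
cache.put(1, 1, b"block B")
cache.get(1, 0)                               # touch block A
cache.put(1, 2, b"block C")                   # evicts block B, the LRU entry
print(cache.get(1, 1))                        # None: evicted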
27File System Caches in a DFS
Implementation
- To minimize time-consuming data transfers via the net you can use caching on the client side (in main memory or on the local disk, a matter of performance)
- Consistency problems arise due to the cache implementation
(Diagram: Cache 1 on the server and Caches 2 and 3 on the clients, between the server disk(s) and the clients' local disks.)
28Client Cache Implementation
Implementation
- Placement of caches
- Client's main memory
- Cache within the user address space
- Cache within the kernel
- Separate user-level cache manager
- Client's disk
- Temporarily improves the availability of the file copy
29Client Cache in Main Memory (1)
Implementation
A cache inside the UAS, managed by a library, holds the most recently used files per UAS. Only if a file is reused by the same UAS may it still be in the cache. If the task exits, the contents of modified files are written back to the server and the cache is freed.
(Diagram: cache hits are served inside the user address space.)
Analysis: Little overhead, but only valuable if a file is reused by the same task (e.g. a database manager process); in most other tasks a file is opened once and closed once, so caching within the library gains nothing.
30Client Cache in Main Memory (2)
Implementation
A cache inside the kernel is used by all applications. For each file access a kernel call is necessary, but the cache may survive a task.
(Diagram: cache hits are served by the kernel cache below the client's UAS.)
Analysis: Unix pipelining of user tasks is supported very efficiently, e.g. ls | count, or a 2-phase compiler.
31Client Cache in Main Memory (3)
Implementation
A user-level cache manager frees the kernel from keeping caches for various clients. The cache manager is isolated and easier to test. However, the kernel could decide to page out some of the pages of the cache manager → a cache hit then results in 1 or more page faults.
(Diagram: the client's UAS obtains cache hits from a separate cache manager process.)
- Compare the three methods concerning
- the number of RPCs involved in cache hits and cache misses, and
- applicability to µ-kernel systems
32Cache Consistency
Implementation
Using caches in a DFS → consistency problem. To reduce the network traffic the following policies have been proposed (sketched below):
- Write-through
- The cache is used for reading only.
- Writing is immediately done to the server, i.e. into the original file.
- Delayed write
- Several write operations are collected and passed to the server in a burst.
- Write-on-close
- File updates are delayed until the file is closed.
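A minimal sketch contrasting the three policies as client-side write paths; the Server class and all method names are illustrative assumptions, not a real protocol.

class Server:
    def __init__(self):
        self.files = {}                       # path -> bytes
    def write(self, path: str, data: bytes):
        self.files[path] = data

class WriteThroughCache:
    """Reads come from the cache, every write goes straight to the server."""
    def __init__(self, server):
        self.server, self.cache = server, {}
    def write(self, path, data):
        self.cache[path] = data
        self.server.write(path, data)         # immediate, synchronous update

class DelayedWriteCache:
    """Writes are collected and pushed to the server in a burst."""
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()
    def write(self, path, data):
        self.cache[path] = data
        self.dirty.add(path)
    def flush(self):                          # called periodically or on sync
        for path in self.dirty:
            self.server.write(path, self.cache[path])
        self.dirty.clear()

class WriteOnCloseCache:
    """Updates are propagated only when the file is closed (session semantics)."""
    def __init__(self, server):
        self.server, self.cache = server, {}
    def write(self, path, data):
        self.cache[path] = data
    def close(self, path):
        self.server.write(path, self.cache[path])

srv = Server()
woc = WriteOnCloseCache(srv)
woc.write("/tmp/a", b"draft")
woc.close("/tmp/a")                           # server sees the data only at close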
33Replication
Replication
- Objective
- Store information at multiple sites in a DS
Why? To increase
- Availability (it's there when you want it)
- Reliability (it doesn't get destroyed)
- Performance (it's always nearby)
- How? Replication is
- User initiated (aware of replication process)
- Automatic (transparent replication)
34Replication
From a Centralized Server ...
(Diagram: a single centralized server attached to the network, serving clients in several subnets.)
Remark: As long as only clients or their subnets fail, the rest can still use the centralized server.
35Replication
From a Centralized Server ...
(Diagram: the same centralized server, now crashed or cut off from the network.)
Remark: But what to do when the server or its subnet crashes? Rien ne va plus (nothing works any more)!
36Replication
to Decentralized Servers
Remark: As long as only one replicated server or its entire subnet fails, all clients from other subnets still may get their services via their nearby servers.
37Explicit File Replication
Replication
(Diagram: a client explicitly copies its file to each of the servers s1, s2, and s3.)
Is this a good idea?
- Not at all
- neither user friendly
- nor efficient
38Analysis of Explicit File Replication
Replication
- Not very user friendly
- you have to find all servers (they may vary in place and number)
- if at copy time one server is down, the user may forget to copy file <prog.c> when the failed server becomes available again
- Not very efficient
- to be sure that all replicas have the same state we have to use a transaction mechanism (two- or three-phase commit protocol)
- only when the last copy has been made successfully (has committed) is the information available again
- what to do when a site fails for a longer period of time?
39Lazy Replication
Replication
(Diagram: client c1 updates server s1; s1 later propagates the update lazily to s2 and s3.)
At some time t0 + n·Δt, n > 1, the replication manager on server s1 issues the other lazy replicas → for a while the servers may have different states!
40Replication using a Group
Replication
(Diagram: client c1 sends its request to the group of servers s1, s2, s3 as a whole.)
41Replication and Update-Protocols
Replication
- Approaches
- write to all-available replicas
- primary/backup
- quorum consensus
42Write to all-available Update-Protocol
Replication
(Diagram: client c1 writes prog.c to all available replicas s1, s2, and s3.)
If the update fails for s3, the replicas s1, s2 on the one hand and s3 on the other have different states → inconsistency! Characteristic of this protocol: cheap reads, but expensive writes!
43Primary/Backup
Replication
- Possible options
- Backups are maintained for availability only
- Backups can improve performance for reads
- What is the query semantics?
- How can we achieve one-copy serializability?
- Client interacts with one copy, and if it is a backup, its updates are propagated to the primary
- What is the query semantics with regard to our own updates?
- Clients who don't need current data can read from any site
44Primary/Backup
Replication
- Any client has one primary server (hopefully a nearby and powerful one) within the total system. The other servers only serve as backups.
- Any request of the client goes to its primary server.
- If the primary fails, i.e. a failover occurs, then one of the backup servers becomes the new primary (see the sketch after this list)
- Consequences
- There is at most one primary at any time
- Every client ci has a single site si to which it sends requests
- Any client message arriving at a backup server is ignored
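A minimal sketch of the primary/backup idea from the client's point of view. For simplicity the client itself detects the failed primary (an exception stands in for a timeout) and fails over to the next backup; in the protocols on the following slides, failure detection uses "I am alive" messages and the backup takes over. All names are illustrative.

class ServerDown(Exception):
    pass

class Replica:
    def __init__(self, name):
        self.name, self.alive, self.store = name, True, {}
        self.backups = []                     # set on the current primary only
    def handle(self, op, path, data=None):
        if not self.alive:
            raise ServerDown(self.name)
        if op == "write":
            self.store[path] = data
            for b in self.backups:            # primary forwards updates to its backups
                if b.alive:
                    b.store[path] = data
        return self.store.get(path)

class Client:
    def __init__(self, replicas):
        self.replicas = list(replicas)        # replicas[0] is the initial primary
        self._assign_primary()
    def _assign_primary(self):
        self.replicas[0].backups = self.replicas[1:]
    def request(self, op, path, data=None):
        while self.replicas:
            try:
                return self.replicas[0].handle(op, path, data)
            except ServerDown:
                self.replicas.pop(0)          # failover: next backup becomes primary
                if self.replicas:
                    self._assign_primary()
        raise RuntimeError("no replica available")

s1, s2, s3 = Replica("s1"), Replica("s2"), Replica("s3")
c = Client([s1, s2, s3])
c.request("write", "/prog.c", b"int main(){}")
s1.alive = False                              # primary crashes
print(c.request("read", "/prog.c"))           # served by s2 after failover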
45Alsberg and Day Protocol
Replication
(Diagram: client c1, the primary s1, and the backup s2, each holding a copy of prog.c; the client's update goes to the primary and is propagated to the backup.)
46Alsberg and Day Protocol
Replication
(Diagram: the same configuration with primary s1 and backup s2, each holding prog.c.)
How to detect possible failures? Servers periodically send "I am alive" messages; a timeout is used to detect possible crashes. The backup then takes over control and recruits a new backup.
47Tandem Non-Stop Protocol
Replication
- Symmetric Pair Policy
- One primary process and one backup process
- joined by redundant links
- Client sends request to the primary
- Primary forwards updates to the backup
- Backup acknowledges to primary, only
- Primary acknowledges to client
- Failures detected by timeout.
- Tolerates
- node crashes
- one link failure
48Anti-Entropy Method (Golding 1992)
Replication
- State kept by replicated servers can be weakly consistent, i.e. replicas are allowed to diverge temporarily.
- They will eventually come to agreement.
- From time to time, a server picks another server and these 2 servers exchange updates and converge to the same state (see the sketch below)
- Total ordering is obtained after getting one message from every server (directly)
- Lamport timestamps are used to order messages
49Anti-Entropy Method
Replication
(Diagram: knowledge at s1 and knowledge at s2, each shown as a table of objects A, B, C with Lamport-timestamped updates, together with the summary vectors of s1 and s2.)
Remark: Numbers in the objects refer to Lamport timestamps.
50Anti-Entropy Method
Replication
(Diagram: the same knowledge tables at s1 and s2 after the anti-entropy exchange, together with the summary after the merge.)
51Eventual Path Propagation
Replication
Phase 1: Partitioning
(Diagram: nodes labeled with the updates mx and my, split across separate partitions.)
52Eventual Path Propagation
Replication
Phase 2: Partitioning
(Diagram: the updates mx and my spread within their respective partitions.)
53Eventual Path Propagation
Replication
Phase 3: Merging
(Diagram: after merging, several nodes hold both mx and my, while some still hold only mx or only my.)
54Eventual Path Propagation
Replication
Further merging
(Diagram: after further merging, almost all nodes hold both mx and my.)
55Analysis
Replication
- All primary/backup protocols have some disadvantages
- if the primary fails, no valid updates are possible any longer
- they don't tolerate network partitions
- In 1979 Gifford published another protocol based on majority voting, named the quorum algorithm.
56Quorum Algorithm
Replication
- Any client has to acquire the permission of some subset of the replicated servers before reading from or writing to a replicated file.
- Reads need a read quorum (i.e. at least Nr servers must accept)
- Writes need a write quorum (i.e. at least Nw servers must accept).
- Simplification: There are N servers; then
- Nr + Nw > N (see the sketch below).
- Any write is coupled with an update of the file's version number!
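A minimal sketch of quorum reads and writes with Nr + Nw > N and per-replica version numbers; replica selection and class names are illustrative.

import random

class Replica:
    def __init__(self):
        self.version, self.value = 0, None

class QuorumFile:
    def __init__(self, n, nr, nw):
        assert nr + nw > n, "read and write quorums must overlap"
        self.replicas = [Replica() for _ in range(n)]
        self.nr, self.nw = nr, nw

    def read(self):
        quorum = random.sample(self.replicas, self.nr)
        newest = max(quorum, key=lambda r: r.version)   # overlap guarantees a current copy
        return newest.value

    def write(self, value):
        quorum = random.sample(self.replicas, self.nw)
        new_version = max(r.version for r in quorum) + 1
        for r in quorum:
            r.version, r.value = new_version, value     # version number updated on every write

f = QuorumFile(n=12, nr=3, nw=10)             # the numbers used in the example on the next slide
f.write("new contents")
print(f.read())                               # always returns the latest write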
57Example
Replication
Nr = 3 and Nw = 10
A B C D E F G H I J K L
Suppose at time t0 the servers C, D, ..., and L (marked red) have been updated. If a client wants to read at t0 + Δt, he needs at least 3 sites, e.g. sites A, B, and C. Although sites A and B have a majority with their old version, the client can detect that site C's version number is newer, thus the client reads from C.
58Web Test Case for Replication
Web Replication
- Observation
- The explosion of the web has led to a situation where the majority of the traffic on the Internet is web related.
- Goal
- Offer a set of web servers spread all over the world, lowering the long-distance traffic
- The provider's viewpoint
- As few web servers as possible without bothering clients
- The client's viewpoint
- As many nearby web servers as possible, to get quick answers
59Web Test Case for Replication
Web Replication
- Objective
- Try to establish web replication where each of the replicas resides in a different part of the network
- Problem
- How may the client's web browser automatically and transparently contact the best replica server, taking into account
- Network topology: which replica is closest to the client
- Server availability: which web servers are currently active
- Server load: which one is able to give the most rapid response
60Provider 1 Single Web Server
Web Replication
Analysis: A popular web site being served from only one location → frequent long-distance network transfers → high response times for user requests and waste of available network bandwidth. Moreover: danger of a single point of failure.
(Diagram: a single web server answering web queries and responses from many clients.)
61Solution Caching and Replication
Web Replication
- Caching
- Server side caching (Squid, Harvest, Apache)
- Client side caching (proxy, browser)
- Replication
- Cluster replication
- Wide area replication
- Wide area, cluster replication
- Combination of Caching and Replication
62Caching
Web Replication
- Analysis
- lower latency
- better network utilization
- freshness
- some things cannot be cached (server-side programming, CGI scripts, etc.)
- some things are not meant to be cached (advertising)
63Cluster Replication
Web Replication
- Analysis
- improves performance: load is shared by several servers
- improves availability of the web server as a whole
- moderate effort is required to set up and maintain
- still a single point of failure in the network
- still high latency for clients that are distant (network-wise)
64Wide-Area (Cluster) Replication
Web Replication
- Analysis
- improves performance
- load is shared by several servers
- clients access the best server
- improves availability of the web service
- network availability
- server availability
- complex to implement, deploy and maintain
65The Technical Challenge
Web Replication
- Making wide-area cluster replication work in a Web environment
- get the nearest server
- based on network topology
- from the nearest server to the best server
- server availability
- server load
- do it automatically and seamlessly
- HTTP redirect method - application layer
- DNS round trip method - session layer
- shared IP address method - network layer
A simple and limited method works without knowledge of the network topology and the location of the client within that network, i.e. you might get the overall best server for all potential clients, but not the best server for a specific client; e.g. an overloaded server on the same LAN may be better than a very fast and unloaded server in New South Wales.
For more details on this method see http://www.cnds.jhu.edu
66DNS Round Trip Times Method
Web Replication
(Diagram: over time, the name server ns.bar.edu acting for client foo.bar.edu asks itself: Do I know www.cnds.jhu.edu? No. Do I know the DNS for cnds.jhu.edu? No. Do I know the DNS for jhu.edu? Yes: 128.220.1.5.)
67DNS Round Trip Times Method
Web Replication
- (Diagram legend: DNS plus nearby web server; local DNS serving all local clients; potential replicas; selected replica)
- No special requirements
- Convergence time is linear in the number of replicas
68Practical Implementation Walrus
Web Replication
- A Wide Area Load Re-balancing User-transparent System
- No change to the Web server
- No change to the Web client
- No change to the infrastructure (ISP, DNS, OS)
- Implemented in a Unix environment,
- but can be ported to other environments
- see http://www.cnds.jhu.edu/walrus
69Some Distributed File Systems
Example DFS
- Problem to solve
- Find out the main characteristics of at least 3 major DFSs.
- Discuss the pros and cons of each DFS.
- Explain the typical application of each DFS.
70Network File System (NFS)
Network File Systems
- De facto standard; Sun published its protocol specification to establish a platform-independent DFS
- Mid-1980s
- Widely adopted in academia and industry
- In NFS each node may act as client and/or as server
- Each server holds a file /etc/exports containing a list of the directories the server wants to export to other nodes
- NFS supports heterogeneous systems (DOS, MacOS, VMS)
- mostly in LANs, but also applicable in WANs
- Uses Sun's synchronous RPC and XDR
- Client blocks until it gets the result from the file server
71Characteristics of NFS
Network File System
- Access transparency is reached only within the Unix area, i.e. Unix applications can access local or remote files with the common Unix file operations.
- Location transparency is implemented via the import mechanism. The client specifies the mount point within its local file system where it wants to import a sub-file-system of NFS.
- Concurrency transparency is not supported. There are only some rudimentary locking mechanisms.
- Fault transparency is supported, because an NFS server is stateless.
- Performance transparency? With only a light load in a LAN, remote accesses are hardly slower than accesses to a local file.
72Sun NFS (1)
Network File System
- Architecture
- Server exports n ≥ 1 directory trees for access by remote clients
- Clients may access exported directory trees by mounting them into the client's local tree
- Diskless clients can mount exported directories to their root directory
- Auto-mount (on the first access)
- Remote access is done via Sun's RPC
73SUN NFS (2)
Network File System
- Stateless server
- RPCs are self-contained
- Servers don't need to keep state about previous requests, i.e. they flush all modified data to disk before returning from an RPC call
- Robustness
- No state to recover
- Clients initiate a retry
74NFS Protocols
Network File System
- Mount Protocol
- Hand mounting
- Boot mounting
- Auto mounting
- Directory and File Access Protocol
75 Sun NFS Protocols
Network File System
- Mounting protocol
- Client sends the pathname of the exportable directory to the server (not including the mount place)
- If that pathname is legal and the directory is exportable, then the server returns a file handle to the client
- File handle contains (uniquely identifying the directory; see the sketch below)
- file system type
- disk
- i-node number of the directory
- security information
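A minimal sketch of the mount exchange: a legal, exportable pathname is translated into a file handle carrying the fields listed above. The data layout is illustrative only; it is not the real NFS wire format.

from dataclasses import dataclass

@dataclass(frozen=True)
class FileHandle:
    fs_type: int          # identifies the exported file system
    disk: int             # disk / device on the server
    inode: int            # i-node number of the exported directory
    security: bytes       # opaque security information

EXPORTS = {"/export/home": FileHandle(fs_type=1, disk=0, inode=2, security=b"\x00" * 8)}

def mount(pathname: str) -> FileHandle:
    """Server side of the mount protocol: legal and exportable path -> file handle."""
    if pathname not in EXPORTS:
        raise PermissionError(f"{pathname} is not exportable")
    return EXPORTS[pathname]

handle = mount("/export/home")    # the client stores the handle and uses it in later requests
print(handle)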
76 Sun NFS Protocols
Network File System
- Boot mounting
- The script file /etc/rc containing all mounting commands is executed
- Auto mounting
- A set of remote exportable directories is associated with the client
- When the client opens one of these remote files for the first time, the OS sends a mount message to each file server; the first one to reply wins
- If a server is down during boot mounting, the client hangs
Auto mounting is mostly used for read-only files
77Mount a Remote File System in NFS
Network File System
Result: The client's file name space includes remote files
78Achieving NFS Transparency
Network File System
- Mount service
- Mounts remote file systems into the client's file name space
- A mount service process runs on each node to provide an RPC interface for mounting and unmounting file systems at the client
- Runs at system boot time or user login time
79Achieving NFS Transparency 2
Network File System
- Auto mounter
- Dynamically mounts file systems
- Runs as a user-level process on the client (daemon)
- Resolves references to unmounted pathnames by mounting them on demand
- Maintains a table of mount points and the corresponding server(s); sends probes to the server(s)
- Primitive form of replication
80NFS Transparency ?
Network File System
- Early binding
- Mount system call attaches remote file systems to
local mount point - Client has to deal with the host only once
- But mount needs to happen before remote files become accessible
81NFS Directory and File Access Protocol
Network File System
- Directory and file accessing protocol
- RPCs for reads and writes to files and directories
- No open/close, since the NFS server is stateless
- Each read/write message contains the full path and the file position (see the sketch below)
- The NFS protocol differs from Sun's Remote File System (RFS), where you have to open and close files explicitly
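A minimal sketch of such a stateless interface: there is no open/close, and every request names the file and the position explicitly, so the server keeps no per-client state. Purely illustrative.

class StatelessFileServer:
    def __init__(self):
        self.files = {}                           # path -> bytearray

    def read(self, path: str, offset: int, count: int) -> bytes:
        data = self.files.get(path, bytearray())
        return bytes(data[offset:offset + count])  # no file pointer kept on the server

    def write(self, path: str, offset: int, data: bytes) -> int:
        buf = self.files.setdefault(path, bytearray())
        if len(buf) < offset + len(data):
            buf.extend(b"\x00" * (offset + len(data) - len(buf)))
        buf[offset:offset + len(data)] = data
        return len(data)

srv = StatelessFileServer()
srv.write("/home/a.txt", 0, b"hello world")
print(srv.read("/home/a.txt", 6, 5))              # b'world': each request carries path + position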
82Other NFS Functions
Network File System
- NFS file and directory operations
- Read, write, create, delete, getattr, etc.
- Access control
- File and directory access permission (UNIX)
- Path name translation
- Lookup for each path component
- Caching
83NFS Semantics
Network File System
- Unix
- You cannot open a file and lock it so that no other user can use that file anymore
- In a stateless server, locks cannot be associated with opened files; the server does not know about them
- Additionally, the Network Information Service (NIS) is established to check whether client and server really are who they claim to be; however, data are still transferred without encryption
84NFS Implementation
Network File System
(Diagram: on the client, calls go through the Virtual File System either to the local OS and local disk or, as messages, to the NFS server; the server passes the request through its own Virtual File System to its local OS and local disk.)
85Virtual File System
Network File System
- VFS added to the Unix kernel
- Location-transparent file access
- Distinguishes between local and remote access
- Client
- Executes a file system call to determine whether the access is local or remote
- Server
- NFS server receives the request and passes it to the local FS via the VFS
86VFS 2
Network File System
- If local, translates the file handle to internal file ids (in Unix: i-nodes)
- V-node
- If the file is local: reference to the file's i-node
- If the file is remote: reference to the file handle
- The file handle uniquely identifies a file:
File system id
I-node number
I-node generation number
87NFS Caching
Network File System
- File contents and attributes
- Client versus server caching
88NFS Server Caching
Network File System
- Read
- Same as in the Unix FS
- Caching of file blocks and attributes
- Cache replacement using LRU
- Write
- Write-through (as opposed to delayed writes in a conventional Unix FS)
- Delayed writes: modified blocks are written to disk when buffer space is needed, by an explicit or periodic sync operation, or on every close
89NFS Client Caching 1
Network File System
- Timestamp-based cache invalidation
- Read
- Cached entries have timestamps with the last-modified time
- Blocks are assumed to be valid for a TTL (see the sketch below)
- TTL specified at mount time
- Typically 3 sec for files
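A minimal sketch of TTL-based client caching: cached blocks are trusted for ttl seconds and revalidated against the server's last-modified time afterwards. The server stub and all names are illustrative assumptions.

import time

class TtlClientCache:
    def __init__(self, server, ttl=3.0):          # typically 3 s for files
        self.server, self.ttl, self.entries = server, ttl, {}

    def read_block(self, path, block_no):
        key = (path, block_no)
        entry = self.entries.get(key)
        now = time.time()
        if entry is not None:
            data, mtime, checked_at = entry
            if now - checked_at < self.ttl:
                return data                       # within TTL: use without contacting server
            if self.server.getattr(path) == mtime:
                self.entries[key] = (data, mtime, now)   # still fresh: restart the TTL
                return data
        data, mtime = self.server.read_block(path, block_no)
        self.entries[key] = (data, mtime, now)
        return data

class ServerStub:
    """Minimal stand-in for the server side."""
    def __init__(self):
        self.blocks = {("/etc/motd", 0): b"welcome"}
        self.mtime = {"/etc/motd": 100.0}
    def getattr(self, path):
        return self.mtime[path]
    def read_block(self, path, block_no):
        return self.blocks[(path, block_no)], self.mtime[path]

cache = TtlClientCache(ServerStub())
print(cache.read_block("/etc/motd", 0))           # fetched from the server, then cached
print(cache.read_block("/etc/motd", 0))           # served from the cache within the TTL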
90NFS Client Caching 2
Network File System
- Write
- Modified pages are marked and flushed to the server at file close or at sync
- Consistency
- Not always guaranteed
- E.g. a client modifies a file; the delay for the modification to reach the server plus the 3-sec cache-validation window leaves a window of inconsistency for clients sharing the file
91NFS Cache Validation
Network File System
- Validation check performed when
- First reference to file after TTL expires
- File open or new block fetched from server
- Done for all files (even those not being shared)
- Expensive
- Potentially, file attributes are fetched every 3 sec
- If needed, all blocks are invalidated
- A fresh copy is fetched when the file is accessed again
92Satya's Design Principles (1990): Lessons learned from NFS
Lessons Learned from NFS
- WSs have enough processing power → it's wise to use it, instead of the server's processor, whenever possible
- Caching files can save network bandwidth, since they are likely to be used again.
- Exploit usage properties
- Minimize dependencies on the rest of the system, to limit the impact of changes
- Trust the fewest possible entities
- Perform work in batches whenever possible
93Sprite File System
Sprite FS
- Main memory caching on client and server side
- Write-sharing consistency guarantees.
- Variable sized caches
- VM and FS negotiate amount of memory needed
- According to caching needs, cache size may adapt
Sprite was developed at Berkeley by John Ousterhout, started in 1984 and finished in 1991, as a test bed for research in log-structured file systems, striped file systems, crash recovery, and RAID file systems.
94Sprite File System
Sprite FS
- Sprite supports concurrent writes by disabling caching of write-shared files.
- If a file is write-shared, the server notifies the client that has opened the file for writing to write modified blocks back to the server
- The server notifies all clients that have opened this file for reading that this file is no longer cacheable
- Clients then discard all cached blocks, so that subsequent accesses go through the server
95Sprite File System
Sprite FS
- Sprite servers are stateful
- Need to keep state about current accesses
- Centralized points for cache consistency
- Bottleneck?
- Single point of failure?
- Tradeoff
- consistency versus performance/robustness
96Andrew File System
Andrew File System
- Distributed computing environment
- developed at Carnegie Mellon University (CMU) (again by Satya)
- Campus-wide computing system
- Between 5 K and 10 K workstations (WSs)
- In 1991 already 800 WSs, 40 servers
97Design Goals
Andrew File System
- Information sharing
- Scalability
- Key policy: caching of whole files at the client
- Whole-file serving
- Entire file is sent to the client
- Whole-file caching
- Local copy of the file is cached on the client's local disk
- Survives client reboots and server unavailability
98Andrew File System
Andrew File System
- Supports information sharing on a large scale (> 1000 WSs)
- Uses session semantics
- Provides location transparency and location independence
- First the entire file is copied to the local machine (Venus) from the server (Vice) when it is opened. If the file has been changed, it is copied back to the server when it is closed again.
- The method works because in practice most files are changed by only one person
- Measurements show that only 0.4% of all changed files have been updated by more than one user during one week.
Remark: AFS works only on BSD 4.3 Unix platforms with TCP/IP. Each node in the entire system needs a local hard disk.
99File Cache Consistency
Andrew File System
- File caches hold recently accessed file records
- Caches are consistent when they contain exact copies of the remote data
- File locking prevents simultaneous access to a file
- Writing causes the cached copy on the server to be updated
100Whole File Caching
Andrew File System
- Local cache contains the most recently used files
(Diagram: on open, the client fetches the file from the server into its local cache.)
Subsequent operations on the file apply to the local copy. On close <file>, if the file was modified, it is sent back to the server.
101 AFS Structure
Andrew File System
Venus works as a file cache manager
Vice is a multi-threaded server providing shared
file services
(Diagram: several clients, each running Venus, connected over the network to Vice servers.)
102Implementation 1
Andrew File System
- Network of WSs running BSD 4.3 and Mach
- Implemented as 2 user-level processes
- Vice runs at each Andrew server
- Venus runs at each Andrew client
103Implementation 2
Andrew File System
- Modified BSD 4.3 Unix kernel
- At the client, file system calls (e.g. open, close, etc.) are intercepted and passed to Venus when they refer to shared, non-cached files
- Venus manages the client cache partition on the local disk
- LRU replacement policy
- Cache is large enough for hundreds of average-sized files
104File Sharing
Andrew File System
- Files are shared or local
- Shared files
- Utilities (/bin, /lib): infrequently updated files or files accessed by a single user (e.g. the user's home directory)
- Stored on servers and cached on clients
- Local copies remain valid for a long time
- Local files
- Temporary files (/tmp) and files used for start-up
- Stored on the local machine's disk
105AFS Components
Andrew File System
Namespace: Each local file system can be set up differently; however, the shared file system has a universal look.
Shared files use symbolic links.
106AFS Caching 1
Andrew File System
- AFS-1: timestamp-based cache invalidation
- AFS-2: ditto, plus the use of callbacks
- When serving a file, the Vice server promises to notify the Venus client whenever the file is modified
- Still a stateless server?
- A callback is stored with each cached file
- Valid, or
- Canceled when the client is notified by the server that the file has been modified
107AFS Caching 2
Andrew File System
- Callbacks implemented using RPC
- When accessing a file, Venus checks if the file exists and if the callback is valid; if canceled, it fetches a copy from the server
- Failure recovery
- When restarting after a failure, Venus checks each cached file by sending a validation request to the server
- Also periodic checks in case of communication failures
108AFS Caching 3
Andrew File System
- At file close time, Venus on the client that modified the file sends the update to the Vice server
- The server updates its own copy and sends callback cancellations to all clients caching the file
- Consistency?
- Concurrent updates?
109Andrew File Validation
Andrew File System
- Older AFS versions
- On open, Venus accesses Vice to see if its copy of the file is still valid.
- This causes a substantial delay, even if the copy is valid.
- Vice is stateless
- Newer AFS versions
- Vice maintains lists of valid copies.
- If a file is modified, Vice invalidates the other copies.
- On open, if Venus has a valid copy it can open it immediately.
- If Venus crashes, it has to invalidate its versions or check their validity.
110AFS Replication
Andrew File System
- Read-only replication
- Only read-only files allowed to be replicated at
several servers
111File Identifiers
Andrew File System
A volume is a collection of files managed together to allow ease of movement. A partition may consist of n ≥ 1 volumes.
A file identifier consists of: Volume number | Vnode number | Unique number
- Volume number
- to uniquely identify a single volume in the system
- Vnode number
- to identify a file within a volume (cf. Unix i-nodes)
- can be reused if the old file is deleted
- Unique number
- to cater for reused Vnode numbers in case an old Vnode number is still in use somewhere (see the sketch below)
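A minimal sketch of such file identifiers, assuming the unique number is incremented whenever a Vnode slot is reused, so that stale identifiers can be detected. Names and the reuse logic are illustrative.

from dataclasses import dataclass

@dataclass(frozen=True)
class Fid:
    volume: int       # which volume the file lives in
    vnode: int        # slot within the volume (like a Unix i-node number)
    uniquifier: int   # bumped each time the vnode slot is reused

class Volume:
    def __init__(self, volume_number: int):
        self.volume_number = volume_number
        self.next_vnode = 1
        self.generation = {}               # vnode -> current uniquifier

    def allocate(self) -> Fid:
        vnode = self.next_vnode
        self.next_vnode += 1
        self.generation[vnode] = 1
        return Fid(self.volume_number, vnode, 1)

    def reuse(self, vnode: int) -> Fid:
        """Reusing a vnode after deletion gets a new uniquifier, so stale Fids no longer match."""
        self.generation[vnode] += 1
        return Fid(self.volume_number, vnode, self.generation[vnode])

vol = Volume(7)
old = vol.allocate()
new = vol.reuse(old.vnode)
print(old == new)                          # False: the old Fid is recognizably stale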
112Example of a System Call (fopen)
Andrew File System
The application requests fopen(filename, ...). Venus parses the filename. If it is a local file, fopen() is treated in a similar way as in Unix. However, if it starts with /afs, Venus has to check several things: Is the requested file already in the local file cache (see /cache)? If so, it checks whether this file is still valid or currently invalid. If still valid, Venus returns the file descriptor to the application. If already invalid, Venus compares the timestamps of the local copy with the server file, and if the local copy is outdated, Venus sends a request to Vice to download this file to the local cache. If the file is not in the local cache, Venus sends a request to Vice to download this file to the local cache.
113Security within AFS
Andrew File System
In AFS all traffic between clients and servers is encrypted. Access to directories is controlled via ACLs. File access is controlled as in Unix (9 rwx bits for owner, group, and others), for compatibility with Unix. Newer versions of AFS use the Kerberos authentication system and also offer ACLs for file accesses.
Remark: More on AFS: see www.homepages.uel.ac.uk/5291n/afs.doc.html
114Coda
Coda File System
- Evolved from AFS
- Goal: constant data availability
- Improved replication
- Replication of read-write volumes
- Disconnected operation: mobility
- Extension of AFS's whole-file caching mechanism
- Access to the shared file repository (servers) versus relying on local resources when the servers are not available
115Replication in Coda
Coda File System
- Replication unit: file volume (set of files)
- Set of replicas of a file volume: volume storage group (VSG)
- Subset of replicas available to a client: AVSG
- Different clients, different AVSGs
- AVSG membership changes as server availability changes
- On write: when the file is closed, copies of the modified file are broadcast to the AVSG
116Optimistic Replication
Coda File System
- Primary goal: availability
- Replicated files are allowed to be modified even in the presence of partitions or during disconnected operation
117Disconnected Operation
Coda File System
- AVSG is empty
- Network/server failures, or the host is on the move
- Rely on the local cache to serve all needed files.
- Loading the cache
- User intervention: list of files to be cached
- Learning usage patterns over time
- Upon reconnection, cached copies are validated against the server's files
118Normal and Disconnected Operation
Coda File System
- During normal operation
- Coda behaves like AFS
- Cache misses are transparent to the user; only a performance penalty
- Load balancing across replicas
- Cost: replica consistency + cache consistency
- Disconnected operation
- No replicas are accessible
- A cache miss prevents further progress
- Need to load the cache before disconnection
119Replication and Caching
Coda File System
- Coda integrates server replication and client caching
- On a cache hit with valid data, Venus does not need to contact the server
- On a cache miss, Venus gets the data from an AVSG server, i.e. the preferred server (PS)
- The PS is chosen at random or based upon proximity and load
- Venus also contacts the other AVSG servers and collects their versions; if there is a conflict, the operation is aborted; if replicas are stale, they are updated off-line
120Summary Caching
Coda File System
- Improves performance in terms of
- response time,
- availability (disconnected operations), and
- fault tolerance
- Price: consistency
- Consistency mechanisms
- Timestamp-based invalidation
- Callbacks
121Leases
Leases
- Time-based cache consistency protocol
- Contract between client and server
- A lease grants its holder control over writes to the corresponding data item during the lease term
- The server must obtain approval from the holder of the lease before modifying the data
- When the holder grants approval for a write, it invalidates its local copy (see the sketch below)
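A minimal sketch of the lease contract described above, assuming a single lease holder per item and wall-clock lease terms; all names are illustrative.

import time

class LeaseServer:
    def __init__(self, term=10.0):
        self.term = term
        self.data = {}                     # item -> value
        self.leases = {}                   # item -> (client, expiry)

    def read(self, client, item):
        self.leases[item] = (client, time.time() + self.term)   # grant/renew a lease
        return self.data.get(item)

    def write(self, item, value):
        holder = self.leases.get(item)
        if holder is not None and holder[1] > time.time():
            holder[0].invalidate(item)     # obtain approval: the holder drops its copy
            del self.leases[item]
        self.data[item] = value

class Client:
    def __init__(self, name):
        self.name, self.cache = name, {}
    def invalidate(self, item):
        self.cache.pop(item, None)
        print(f"{self.name}: invalidated {item}")

srv, c1 = LeaseServer(), Client("c1")
c1.cache["x"] = srv.read(c1, "x")          # c1 now holds a lease on x
srv.write("x", 42)                         # server must contact c1 before modifying x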
122Other Distributed File Systems
Example DFS
- Plan 9 (Pike et al.)
- xFS (based on Berkeley's LFS)
- Secure File System (SFS) (Mazières et al.)
123Log-Structured File System
Log-Structured File System
- Built as extension to Sprite FS (Sprite LFS)
- New disk storage technique that tries to use
disks more efficiently - Assumes main memory cache for files
- Larger memory makes cache more efficient in
satisfying reads - Most of the working set is cached
- Thus, most disk access costs are due to writes
124Main Idea
Log-Structured File System
- Batch multiple writes in the file cache
- Transform many small writes into 1 large one
- Close to the disk's full bandwidth utilization
- Write to disk in one write to a contiguous region of the disk called the log
- Eliminates seeks (i.e. reduces access time)
- Improves crash recovery
- Sequential structure of the log
- Only the most recent portion of the log needs to be examined
125LFS Structure
Log-Structured File System
- 2 key functions
- How to retrieve information from log
- How to manage free disk space
126File Location and Retrieval 1
Log-Structured File System
- Allows random access to information in the log
- Goal is to match or increase read performance
- Keeps indexing structures within the log
- Each file has an i-node containing
- File attributes (type, owner, permissions)
- Disk addresses of the first 10 blocks
- For files > 10 blocks, the i-node contains a pointer to more data
127File Location and Retrieval 2
Log-Structured File System
- In the Unix FS
- Fixed mapping between file and i-node disk address: the i-node's disk address is a function of the file id
- In LFS
- I-nodes are written to the log
- An i-node map keeps the current location of each i-node
- I-node maps usually fit in the main memory cache (i-node → disk address; see the sketch below)
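A minimal sketch of the contrast described above: a fixed i-node address computation for a conventional FS versus an i-node map over an append-only log for an LFS. Sizes and names are illustrative.

INODE_SIZE = 128
INODE_TABLE_START = 4096

def unix_inode_address(file_id: int) -> int:
    """Conventional FS: the i-node's disk address is a fixed function of the file id."""
    return INODE_TABLE_START + file_id * INODE_SIZE

class LogStructuredFS:
    def __init__(self):
        self.log = []            # the log: a list of appended records
        self.inode_map = {}      # file_id -> index of the newest i-node in the log

    def write_inode(self, file_id: int, inode: dict):
        self.log.append(("inode", file_id, inode))     # i-nodes are written to the log
        self.inode_map[file_id] = len(self.log) - 1    # the map remembers the current location

    def read_inode(self, file_id: int) -> dict:
        return self.log[self.inode_map[file_id]][2]

lfs = LogStructuredFS()
lfs.write_inode(7, {"owner": "alice", "blocks": [10, 11]})
lfs.write_inode(7, {"owner": "alice", "blocks": [42, 43]})   # an update appends, never overwrites
print(unix_inode_address(7))        # 4992: always the same place on disk
print(lfs.read_inode(7))            # newest version, found via the i-node map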