Transcript and Presenter's Notes

Title: Distributed Systems


1
Distributed Systems
Lecture 11: Distributed File Systems, 2 July 2002
2
Today's Schedule
Distributed File Systems
  • Distributed File Systems
  • Implementation of Distributed File Systems
  • File Usage and FS Structure
  • Caching and Replication
  • Web Test Case for Replication
  • Example DFS
  • Sun's Network File System (NFS)
  • Sprite File System
  • Andrew File System (AFS)
  • Coda
  • Leases
  • Log Structured File System

3
File Systems
Introduction
  • Goal
  • Provide a set of primitives that allow users to
  • keep information on persistent media
  • (e.g. disks, tapes, etc.)
  • manage accesses to files and directories
  • name files and directories
  • offer abstractions that shield users from the
    details of storage access and management

4
Distributed File Systems
Introduction
  • Promote accessing and sharing of files across
    machine boundaries
  • Offer transparency to users
  • Make diskless machines viable
  • Increase disk space availability by avoiding
    duplication
  • Balance load among multiple file servers
  • Offer mobility

5
Transparency
Introduction
  • Access transparency
  • Location transparency
  • Concurrency transparency
  • Failure transparency
  • Performance transparency

6
Access Transparency
Introduction
  • Users, i.e. application programmers, do not
    notice whether the file system is local or
    distributed
  • Accesses to local or remote files are the same,
    i.e. applications written for a local file system
    still run under a distributed file system

7
Location Transparency
Introduction
  • User does not have to know the exact location of
    a file
  • Files can be migrated without affecting users

8
Concurrency Transparency
Introduction
  • Concurrent accesses to a file by different
    users, i.e. applications, should not lead to
    inconsistencies of that file
  • To achieve this goal, the concept of
    transactions can be used

9
Failure Transparency
Introduction
  • After a client or a server has crashed, the
    file system should work as before

10
Performance Transparency
Introduction
  • Delays due to remote access should be as small
    as possible
  • Delays due to remote access should not depend on
    the current load

11
Distributed File Systems
Distributed File Systems
  • File and directory naming
  • Semantics of file sharing
  • Implementation considerations
  • Caching
  • Update protocols
  • Replication

12
Naming
Distributed File Systems
  • File names are used for file retrieval
  • The name service defines the alphabet and syntax
    of valid file names
  • Some file systems offer names consisting of
  • <file name>.extension to distinguish file types
  • (Other file systems treat the file type as a
    file attribute)

13
Directory
Distributed File Systems
  • Flat directories
  • Hierarchical directories
  • Directories may contain files and other
    subdirectories
  • Directory tree
  • An internal node may never be a file
  • Complete file name is a path name
  • Relative path name
  • Absolute path name

14
Distributed File Systems
Distributed File Systems
Assumption: a hierarchical directory tree,
consisting of local and remote directories.
What is the view of a user who wants to
access a non-local file?
  • 3 possibilities
  • additional node name, e.g.
    hostxyz:/dir1/dir2/.../file
  • (easy to implement, but no transparency at
    all)
  • mounting of remote directory subtrees; there is
    transparency,
  • (however, each node may have a different view)
  • single global name space that looks the same on
    all nodes,
  • (full naming transparency)

15
Additional File Descriptors
Distributed File Systems
  • Instead of symbolic file names the system uses
    internal file descriptors (unique file identifiers
    (UFIDs), i-nodes, etc.)
  • UFIDs are short and constant in length → easing
    their use by system programs
  • The directory has to map symbolic names to UFIDs
  • A UFID may, for example, consist of

File number (32 bit)
Random number (32 bit)
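
A minimal sketch (illustrative only, not the layout
used by any particular system) of how such a
fixed-length UFID could be built and how a directory
then simply maps symbolic names to UFIDs:

  import os, struct

  def new_ufid(file_number: int) -> bytes:
      # 32-bit file number plus a 32-bit random part -> fixed 8-byte UFID
      return struct.pack(">II", file_number,
                         int.from_bytes(os.urandom(4), "big"))

  # a directory is then just a mapping from symbolic names to UFIDs
  directory = {"report.txt": new_ufid(17), "notes.md": new_ufid(18)}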
16
Possible File Sharing Semantics
Distributed File Systems
  • One-copy semantics (à la Unix)
  • Updates are written to the single copy and are
    available immediately (disregarding delays due
    to file caching)
  • Session semantics (see the sketch below)
  • Copy the file on open, work on your local copy,
    and copy it back on close
  • No updates of files (immutable files)
  • Any update of a file causes the creation of a
    new file
  • Using simple locking
  • The system offers read and write locks; users
    have to deal with serializability themselves
  • Transaction semantics (locking files: shared for
    reads and exclusive access for writes, but
    without burdening the user)
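
To make session semantics concrete, here is a
minimal client-side sketch (the server object with
fetch/store methods is a hypothetical interface, not
any real API): the whole file is copied on open, all
work happens on the private copy, and the result only
becomes visible to others on close:

  class SessionFile:
      """Session semantics: copy on open, write back on close."""
      def __init__(self, server, name):
          self.server, self.name = server, name
          self.data = bytearray(server.fetch(name))   # private working copy
          self.dirty = False

      def write(self, offset, payload):
          self.data[offset:offset + len(payload)] = payload
          self.dirty = True                           # other clients do not see this yet

      def close(self):
          if self.dirty:
              self.server.store(self.name, bytes(self.data))  # visible only now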

17
File System Implementation
Implementation
  • File Usage
  • System Structure
  • Caching
  • Replication

18
File Usage
Implementation
  • Results of Satyanarayanan's analysis at CMU (1990)
  • File size < 10 KB
  • → feasible to transfer entire files instead
    of individual blocks
  • > 80% read operations (most reads and writes are
    sequential)
  • Lifetime of many files ≈ 1 usage
  • → create the file on the client side and wait to
    see whether it really survives,
  • reducing network traffic a lot
  • Few files are shared
  • → client caching is favorable
  • The average application uses only a few files
  • Several file classes with different behavior
  • → provide an adequate solution for each of them

Similar results were obtained by Mullender and
Tanenbaum and by the Unix team
19
File System Structure
Implementation
  • Clients and Servers on different machines?
  • Distinct directory server and file server or not?
  • Discuss pros and cons of both designs
  • One directory server or several directory servers?
  • Iterative lookup on several directory servers
  • Automatic lookup on several directory servers

20
Iterative Lookup
Implementation
[Figure: the client resolves the path a/b/c step by
step: Server 1 holds a, Server 2 holds b, Server 3
holds c and the file]
Analysis: 1. Bad performance: needs a couple of
messages. 2. Forces clients to be aware of which
server holds which file or directory.
21
Automatic Lookup
Implementation
[Figure: the client sends a single lookup a/b/c to
Server 1; the servers forward the request among
themselves (Server 2 for b, Server 3 for c) until
the file is reached]
Analysis: fewer messages, thus more efficient, but
plain RPC cannot be used, since a different server
replies to the initial call.
22
State of the DFS
Implementation
In local systems a file control block (FCB) is
created whenever a user opens a file. The FCB
contains all relevant state information (file
pointer, etc.).
  • In a DFS we can distinguish three session-like
    phases
  • creation of a usage relation
  • the use of the file itself, and
  • deletion of the usage relation
  • In a DFS a file server may or may not maintain
    the above state information.
  • Thus we can distinguish between
  • stateless servers and
  • stateful servers

23
Stateless versus Stateful Server
Implementation
  • Advantages of stateless servers
  • If the server crashes it can restart immediately,
    because no state information was lost.
  • Also, if a client crashes there is no additional
    overhead on the server side,
  • the server only knows a client during the
    client's file request
  • No need for open and close and their related
    messages across the net.
  • No additional space for this state information
    on the server; this may pay off
  • if the server has to deal with many concurrent
    clients
  • No limit on the number of concurrently opened
    files.
  • Advantages of stateful servers
  • Shorter request messages, no need for the
    symbolic file name in each request
  • Better performance, easier to implement read
    ahead
  • Easier to establish idempotency (controllable
    via sequence numbers)
  • Possibility to set file locks in order to
    establish a certain consistency model

24
File System Caches in a DFS
Implementation
  • No caches at all, all files only on the server's
    disk
  • The server's disk should be large enough
  • Files are always accessible to all clients
  • No additional memory overhead, no consistency
    problem

[Figure: the client accesses the file directly on
the server's disk(s), without any cache →
performance problem!]
25
File System Caches in a DFS
Implementation
  • The server uses part of its main memory as a file
    system cache for the clients' most recently used
    files
  • Data is still transferred via the network, but
    hopefully most accesses to the server's disk are
    avoided

[Figure: a server-side main-memory cache (Cache 1)
sits in front of the server disk(s). Conceptual
problems?]
26
Server's Main Memory as Cache
Implementation
  • Cacheable units
  • Complete files
  • File portions, e.g. blocks or chunks
  • Replacement algorithm
  • What to do if cache fills up
  • LRU ?
  • FIFO ?
  • No additional consistency problems from the
    user's point of view

27
File System Caches in a DFS
Implementation
  • To minimize time-consuming data transfers via the
    net, caching can also be used on the client side
    (in main memory or on the local disk, a matter of
    performance)
  • Consistency problems arise due to the cache
    implementation

[Figure: caches on both sides: the server cache
(Cache 1) and client caches in main memory
(Cache 2) and on the local disk (Cache 3)]
28
Client Cache Implementation
Implementation
  • Placement of caches
  • Client's main memory
  • Cache within the user address space
  • Cache within the kernel
  • Separate user-level cache manager
  • Client's disk
  • Temporarily improves the availability of the file
    copy

29
Client Cache in Main Memory (1)
Implementation
A cache inside the user address space (UAS),
managed by a library, holds the most recently used
files per UAS. Only if a file is reused by the same
UAS may it still be in the cache. If the task
exits, the contents of modified files are written
back to the server and the cache is freed.
Analysis: little overhead, but only valuable if a
file is reused by the same task (e.g. a database
manager process); most other tasks open and close a
file only once, so caching within the library gains
nothing.
30
Client Cache in Main Memory (2)
Implementation
A cache inside the kernel is used by all
applications. For each file access a kernel call
is necessary, but the cache may survive a task.
Analysis: Unix pipelining of user tasks is
supported very efficiently, e.g. ls piped into
another command, or a 2-phase compiler.
31
Client Cache in Main Memory (3)
Implementation
A user-level cache manager frees the kernel from
keeping caches for the various clients. The cache
manager is isolated and easier to test. However,
the kernel could decide to page out some of the
pages of the cache manager → a cache hit may then
result in 1 or more page faults.
  • Compare the three methods concerning
  • the number of RPCs involved in cache hits and
    cache misses and
  • applicability to µ-kernel systems

32
Cache Consistency
Implementation
Using caches in a DFS → consistency problems. To
reduce the network traffic, the following policies
have been proposed (see the sketch below):
  • Write-Through
  • The cache is used for reading only.
  • Writing is immediately done to the server, i.e.
    into the original file.
  • Delayed Write
  • Several write operations are collected and
    passed to the server in a burst.
  • Write-on-Close
  • File updates are delayed until the file is
    closed.
33
Replication
Replication
  • Objective
  • Store information at multiple sites in a DS

Why? To increase
  • Availability (it's there when you want it)
  • Reliability (it doesn't get destroyed)
  • Performance (it's always nearby)
  • How? Replication is
  • User initiated (the user is aware of the
    replication process)
  • Automatic (transparent replication)

34
Replication
From a Centralized Server ...
[Figure: clients in several subnets access a single
centralized server via the network]
Remark: as long as only clients or their subnets
fail, the rest can still use the centralized
server.
35
Replication
From a Centralized Server ...
[Figure: the centralized server or its subnet
fails]
Remark: but what to do when the server or its
subnet crashes? Nothing works any more!
36
Replication
to Decentralized Servers
Remark: as long as only one replicated server or
its entire subnet fails, all clients
from other subnets may still get their
services via their nearby servers.
37
Explicit File Replication
Replication
[Figure: a client explicitly copies a file to each
of the servers s1, s2 and s3]
Is this a good idea?
  • Not at all
  • neither user friendly
  • nor efficient
38
Analysis of Explicit File Replication
Replication
  • Not very user friendly
  • you have to find all servers (they may vary in
    place and number)
  • if at copy time one server is down, the user may
    forget to copy
  • the file <prog.c> when the failed server becomes
    available again
  • Not very efficient
  • to be sure that all replicas have the same state
    we have to use
  • a transaction mechanism (two- or three-phase
    commit protocol)
  • only when the last copy has been made successfully
    (has committed)
  • is the information available again
  • what to do when a site fails for a longer period
    of time?

39
Lazy Replication
Replication
[Figure: client c1 updates server s1; s1 later
propagates lazy replicas to s2 and s3]
At some time t0 + n·Δt, n > 1, the replication
manager on server s1 issues the other lazy
replicas ⇒ for a while the servers may have
different states!!!
40
Replication using a Group
Replication
[Figure: client c1 sends its update to a group of
servers s1, s2 and s3]
41
Replication and Update-Protocols
Replication
  • Approaches
  • write to all-available replicas
  • primary/backup
  • quorum consensus

42
Write to all-available Update-Protocol
Replication
[Figure: client c1 writes prog.c to each of the
available servers s1, s2 and s3]
If the update fails for s3, the replicated
servers s1 and s2 respectively s3 have different
states ⇒ inconsistency! Characteristic of this
protocol: cheap reads, but expensive writes!
43
Primary/Backup
Replication
  • Possible options
  • Backups are maintained for availability only
  • Backups can also improve performance for reads,
  • What is the query semantics?
  • How can we achieve one-copy serializability?
  • The client interacts with one copy, and if it is
    a backup,
  • these updates are propagated to the primary
  • What is the query semantics with regard to our
    own updates?
  • Clients who don't need up-to-date data can read
    from any site

44
Primary/Backup
Replication
  • Any client has one primary server (hopefully a
    nearby and powerful one) within the total
    system. The other servers serve only as
    backups.
  • Any request of the client goes to its primary
    server.
  • If the primary fails, i.e. a failover occurs,
    then one of the backup servers will become the
    new primary
  • Consequences
  • There is at most one primary at any time
  • Every client ci has a single site si to which it
    sends requests
  • Any client message arriving at a backup server
    is ignored

45
Alsberg and Day Protocol
Replication
[Figure: client c1 sends its update of prog.c to
the primary s1, which propagates it to the backup
s2]
46
Alsberg and Day Protocol
Replication
[Figure: the same setup: client c1, primary s1 with
prog.c, and backup s2]
How to detect possible failures? Servers
periodically send "I am alive" messages; use
timeouts to detect possible crashes. The backup
takes over control and recruits a new backup.
47
Tandem Non-Stop Protocol
Replication
  • Symmetric pair policy
  • One primary process and one backup process,
  • joined by redundant links
  • The client sends its request to the primary
  • The primary forwards updates to the backup
  • The backup acknowledges to the primary only
  • The primary acknowledges to the client
  • Failures are detected by timeout.
  • Tolerates
  • node crashes
  • one link failure
48
Anti-Entropy Method (Golding 1992)
Replication
  • The state kept by replicated servers can be
    weakly consistent,
  • i.e. replicas are allowed to diverge
    temporarily.
  • They will eventually come to agreement.
  • From time to time, a server picks another server
    and these 2 servers exchange updates and converge
    to the same state (see the sketch below)
  • Total ordering is obtained after getting one
    message from every server (directly)
  • Lamport timestamps are used to order messages
49
Anti-Entropy Method
Replication
[Figure: the knowledge at s1 and at s2, shown as
tables of updates for the objects A, B and C,
together with the summary vectors of s1 and s2]
Remark: the numbers in the objects refer to Lamport
timestamps.
50
Anti-Entropy Method
Replication
[Figure: after the exchange, s1 and s2 hold the
union of their knowledge about A, B and C; the
summary after the merge is (12, 11, 4)]
51
Eventual Path Propagation
Replication
Phase 1: Partitioning
[Figure: the messages mx and my spread within
separate network partitions]
52
Eventual Path Propagation
Replication
Phase 2: Partitioning
[Figure: mx and my continue to propagate within
their respective partitions]
53
Eventual Path Propagation
Replication
Phase 3: Merging
[Figure: after the partitions merge, some nodes
hold both mx and my]
54
Eventual Path Propagation
Replication
Further merging
[Figure: with further merging, more and more nodes
hold both mx and my]
55
Analysis
Replication
  • All primary/backup protocols have some
  • disadvantages
  • if the primary fails, no updates are possible any
    longer
  • they don't tolerate network partitions
  • In 1979 Gifford published another protocol based
    on majority voting, called the quorum algorithm.

56
Quorum Algorithm
Replication
  • Any client has to acquire the permission of some
    subset of the replicated servers before reading
    from or writing to a replicated file.
  • Readers need a read quorum (i.e. at least Nr
    servers must accept)
  • Writers need a write quorum (i.e. at least Nw
    servers must accept).
  • Simplification: with N servers, we require
  • Nr + Nw > N.
  • Any write is coupled with an update of the file's
    version number (see the sketch below)!
57
Example
Replication
Nr = 3 and Nw = 10
A B C D E F G H I J K L
Suppose at time t0 the servers C, D, ..., and
L have been updated. If a client wants to read at
t0 + Δt, it needs at least 3 sites, e.g. sites A,
B, and C. Although sites A and B form a majority
with their old version, the client can detect
that site C's version number is newer, thus the
client reads from C.
58
Web Test Case for Replication
Web Replication
  • Observation
  • The explosion of the web has led to a situation
    where the
  • majority of the traffic on the Internet is web
    related.
  • Goal
  • Offer a set of web servers spread all over the
    world,
  • lowering the long-distance traffic
  • The provider's viewpoint
  • As few web servers as possible without bothering
    clients
  • The client's viewpoint
  • As many nearby web servers as possible to get
    quick answers

59
Web Test Case for Replication
Web Replication
  • Objective
  • Try to establish web replication
  • where each of the replicas resides in a different
    part of the network
  • Problem
  • How can the client's web browser automatically
    and transparently
  • contact the best replica server, taking into
    account
  • Network topology: which replica is closest to the
    client
  • Server availability: which web servers are
    currently active
  • Server load: which one is able to give the most
    rapid response

60
Provider 1 Single Web Server
Web Replication
Analysis: a popular web site being served from only
one location ⇒ frequent, long-distance network
transfers ⇒ high response times for user requests
and a waste of available network bandwidth.
Moreover: the danger of a single point of failure.
[Figure: a single web server handling web queries
and responses from many clients]
61
Solution Caching and Replication
Web Replication
  • Caching
  • Server side caching (Squid, Harvest, Apache)
  • Client side caching (proxy, browser)
  • Replication
  • Cluster replication
  • Wide area replication
  • Wide area, cluster replication
  • Combination of Caching and Replication

62
Caching
Web Replication
  • Analysis
  • lower latency
  • better network utilization
  • freshness of cached copies is an issue
  • some things cannot be cached (server-side
    programming, CGI scripts etc.)
  • some things are not meant to be cached
    (advertising)

63
Cluster Replication
Web Replication
  • Analysis
  • improves performance - load is shared by several
    servers
  • improves availability of the web server as a
    whole
  • moderate effort is required to set up and
    maintain
  • still a single point of failure in the network
  • still high latency for clients that are distant
    (network-wise)

64
Wide-Area (Cluster) Replication
Web Replication
  • Analysis
  • improves performance
  • load is shared by several servers
  • clients access the best server
  • improves availability of the web service
  • network availability
  • server availability
  • complex to implement, deploy and maintain

65
The Technical Challenge
Web Replication
  • Making wide-area cluster replication work in a
    Web environment
  • get the nearest server
  • based on network topology
  • from the nearest server to the best server
  • server availability
  • server load
  • do it automatically and seamlessly
  • HTTP redirect method - application layer
  • DNS round trip method - session layer
  • shared IP address method - network layer

This simple and limited method works without
knowledge of the network topology and the location
of the client within that network, i.e. you might
get the overall best server for all potential
clients, but not the best server for a specific
client, e.g. an overloaded server on the same LAN
may be better than a very fast and unloaded
server in New South Wales.
For more details on this method see
http://www.cnds.jhu.edu
66
DNS Round Trip Times Method
Web Replication
[Figure: DNS resolution over time: foo.bar.edu asks
ns.bar.edu: Do I know www.cnds.jhu.edu? No. Do I
know the DNS for cnds.jhu.edu? No. Do I know the
DNS for jhu.edu? Yes: 128.220.1.5]
67
DNS Round Trip Times Method
Web Replication
  • [Figure legend: nearby web server, local DNS
    serving all local clients, potential replicas,
    selected replica]
  • No special requirements
  • Convergence time is linear in the number of
    replicas

68
Practical Implementation Walrus
Web Replication
  • A Wide Area Load Re-balancing User-transparent
    System
  • No change to the Web server
  • No change to the Web client
  • No change to the infrastructure (ISP, DNS, OS)
  • Implemented in a Unix environment,
  • but can be ported to other environments
  • see http://www.cnds.jhu.edu/walrus

69
Some Distributed File Systems
Example DFS
  • Problem to solve
  • Find out the main characteristics of at least 3
    major DFS.
  • Discuss the pros and cons of each DFS.
  • Explain the typical application of each DFS.

70
Network File System (NFS)
Network File Systems
  • De facto standard; Sun published its protocol
    specification
  • to establish a platform-independent DFS
  • Mid 80s
  • Widely adopted in academia and industry
  • In NFS each node may act as client and/or as
    server
  • Each server holds a file /etc/exports containing
    a list of directories
  • the server wants to export to other nodes (see
    the example below)
  • NFS supports heterogeneous systems (DOS, MacOS,
    VMS)
  • mostly in LANs, but also applicable in WANs
  • Uses Sun's synchronous RPC and XDR
  • The client blocks until it gets the result from
    the file server
71
Characteristics of NFS
Network File System
  • Access transparency is reached only within the
    Unix area, i.e. Unix applications
  • can access local or remote files with the
    common Unix file operations.
  • Location transparency is implemented via the
    import mechanism.
  • The client specifies the mount point within its
    local file system
  • where it wants to import a sub-file-system of
    NFS.
  • Concurrency transparency is not supported.
  • There are only some rudimentary locking
    mechanisms.
  • Fault transparency is supported, because an
    NFS server is stateless.
  • Performance transparency? With only a slight
    load in a LAN,
  • remote accesses are hardly slower than accesses
    to a local file.

72
Sun NFS (1)
Network File System
  • Architecture
  • A server exports n ≥ 1 directory trees for access
    by remote clients
  • Clients may access exported directory trees by
    mounting them into the client's local tree
  • Diskless clients can mount an exported directory
    onto their root directory
  • Auto-mount (on the first access)
  • Remote access is done via Sun's RPC

73
SUN NFS (2)
Network File System
  • Stateless server
  • RPCs are self-contained
  • Servers don't need to keep state about previous
    requests, i.e. they flush all modified data to
    disk before returning from an RPC call
  • Robustness
  • No state to recover
  • Clients initiate a retry

74
NFS Protocols
Network File System
  • Mount Protocol
  • Hand mounting
  • Boot mounting
  • Auto mounting
  • Directory and File Access Protocol

75
Sun NFS Protocols
Network File System
  • Mounting protocol
  • The client sends the pathname of the exportable
    directory to the server (not including the mount
    point)
  • If that pathname is legal and the directory is
    exportable, then the server returns a file handle
    to the client
  • The file handle contains
  • a uniquely identifying file system type
  • the disk
  • the i-node number of the directory
  • security information

76
Sun NFS Protocols
Network File System
  • Boot mounting
  • The script file /etc/rc containing all mount
    commands is executed
  • Automounting
  • A set of remote exportable directories is
    associated with the client
  • When the client opens one of these remote files
    for the first time, the OS sends a mount message
    to each file server; the first one to reply wins
  • If a server is down during boot mounting, the
    client hangs

Mostly used for read-only files
77
Mount a Remote File System in NFS
Network File System
Result: the client's file name space includes
remote files (see the example below)
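
For illustration, mounting an exported directory
with Linux-style syntax (the host name fileserver
and the paths are made-up examples; the exact
syntax differs between Unix variants):

  # on the client, as root
  mount -t nfs fileserver:/export/home /home/users
  ls /home/users    # remote files now appear in the client's name space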
78
Achieving NFS Transparency
Network File System
  • Mount service
  • Mount remote file systems into the client's file
    name space
  • Mount service process runs on each node to
    provide RPC interface for mounting and unmounting
    file systems at client
  • Runs at system boot time or user login time

79
Achieving NFS Transparency 2
Network File System
  • Auto mounter
  • Dynamically mounts file systems
  • Runs as a user-level process on the client (daemon)
  • Resolves references to unmounted pathnames by
    mounting them on demand
  • Maintains a table of mount points and the
    corresponding server(s) sends probes to
    server(s)
  • Primitive form of replication

80
NFS Transparency ?
Network File System
  • Early binding
  • Mount system call attaches remote file systems to
    local mount point
  • Client has to deal with the host only once
  • But, mount needs to happen before remote files
    become accessible

81
NFS Directory and File Access Protocol
Network File System
  • Directory and file accessing protocol
  • RPCs for reads/writes to files and directories
  • No open/close, since the NFS server is stateless
  • Each read/write message contains the full path
    and the position within the file
  • The NFS protocol differs from Sun's Remote File
    System (RFS), where you have to open and close
    files explicitly

82
Other NFS Functions
Network File System
  • NFS file and directory operations
  • Read, write, create, delete, getattr, etc.
  • Access control
  • File and directory access permission (UNIX)
  • Path name translation
  • Lookup for each path component
  • Caching

83
NFS Semantics
Network File System
  • Unix
  • You cannot open a file and lock it so that no
    other user can use that file anymore
  • In a stateless server locks cannot be associated
    with opened files; the server does not know about
    them
  • Additionally, the Network Information System (NIS)
    is used to check whether client and server really
    are who they claim to be; however, data is still
    transferred without encryption
84
NFS Implementation
Network File System
[Figure: NFS implementation: on the client, file
system calls go through the Virtual File System
layer on top of the local OS; messages from the
client are handled by the NFS server, which
accesses its local disk]
85
Virtual File System
Network File System
  • A VFS layer is added to the Unix kernel
  • Location-transparent file access
  • Distinguishes between local and remote accesses
  • Client
  • Executes the file system call and determines
  • whether the access is local or remote
  • Server
  • The NFS server receives the request and passes it
    to the local FS via the VFS
86
VFS 2
Network File System
  • If local, the VFS translates the file handle to
    internal file ids (in Unix: i-nodes)
  • V-node
  • If the file is local, a reference to the file's
    i-node
  • If the file is remote, a reference to the file
    handle
  • The file handle uniquely identifies a file

File handle: file system id, i-node number, i-node
generation number
87
NFS Caching
Network File System
  • File contents and attributes
  • Client versus server caching

88
NFS Server Caching
Network File System
  • Read
  • Same as in Unix FS
  • Caching of file blocks and attributes
  • Cache replacement using LRU
  • Write
  • Write through (as opposed to delayed writes in
    conventional Unix FS)
  • Delayed writes: modified blocks are written to
    disk when buffer space is needed, by an explicit
    or periodic sync operation, or on every close

89
NFS Client Caching 1
Network File System
  • Timestamp-based cache invalidation (see the
    sketch below)
  • Read
  • Cached entries have timestamps with the
    last-modified time
  • Blocks are assumed to be valid for a TTL
  • The TTL is specified at mount time
  • Typically 3 sec for files
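
A minimal sketch of the TTL check a client might
perform; the server.getattr/server.read helpers are
hypothetical stand-ins (the real protocol fetches
attributes with the GETATTR RPC):

  import time

  TTL = 3.0   # seconds, typical value for files

  def read_block(cache, server, path, block_no):
      entry = cache.get((path, block_no))
      if entry and time.time() - entry["validated"] < TTL:
          return entry["data"]                      # trust the cached block within its TTL
      attrs = server.getattr(path)                  # revalidate against the server
      if entry and attrs["mtime"] == entry["mtime"]:
          entry["validated"] = time.time()          # unchanged on the server, keep the block
          return entry["data"]
      data = server.read(path, block_no)            # stale or missing: fetch a fresh copy
      cache[(path, block_no)] = {"data": data, "mtime": attrs["mtime"],
                                 "validated": time.time()}
      return data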

90
NFS Client Caching 2
Network File System
  • Write
  • Modified pages are marked and flushed to the
  • server at file close or at sync
  • Consistency
  • Not always guaranteed
  • E.g. a client modifies a file: the delay for the
    modification to reach the server plus the 3 sec
    cache validation window leaves a window of
    inconsistency for clients sharing the file
91
NFS Cache Validation
Network File System
  • Validation check performed when
  • First reference to file after TTL expires
  • File open or new block fetched from server
  • Done for all files (even those not being shared)
  • Expensive
  • Potentially, every 3 sec get file attributes
  • If needed invalidate all blocks
  • Fetch fresh copy when file is accessed again

92
Satya's Design Principles (1990): Lessons
Learned from NFS
Lessons Learned from NFS
  • Workstations have enough processing power → it's
    wise to use it,
  • instead of the server's processor, whenever
    possible
  • Caching files can save network bandwidth,
  • since they are likely to be used again.
  • Exploit usage properties
  • Minimize system-wide knowledge and change, i.e.
    depend on as little of the system as possible
  • Trust the fewest possible entities
  • Perform work in batches whenever possible

93
Sprite File System
Sprite FS
  • Main memory caching on client and server side
  • Write-sharing consistency guarantees.
  • Variable sized caches
  • VM and FS negotiate amount of memory needed
  • According to caching needs, cache size may adapt

Sprite was developed at Berkeley by John Ousterhout,
started in 1984 and finished in '91; a test bed for
research in log-structured file systems, striped
file systems, crash recovery, and RAID file systems
94
Sprite File System
Sprite FS
  • Sprite supports concurrent writes by disabling
    caching of write-shared files.
  • If a file becomes write-shared, the server
    notifies the client that has opened the file for
    writing to write its modified blocks back to the
    server
  • The server notifies all clients that have opened
    this file for reading that the file is no longer
    cacheable
  • Clients then discard all cached blocks, so that
    subsequent accesses go through the server

95
Sprite File System
Sprite FS
  • Sprite servers are stateful
  • Need to keep state about current accesses
  • Centralized points for cache consistency
  • Bottleneck?
  • Single point of failure?
  • Tradeoff
  • consistency versus performance/robustness

96
Andrew File System
Andrew File System
  • Distributed Computing environment
  • developed at Carnegie Mellon University (CMU)
    (again by Satya)
  • Campus-wide computing system
  • Between 5K and 10K workstations (WSs)
  • In 1991 already 800 WSs and 40 servers

97
Design Goals
Andrew File System
  • Information sharing
  • Scalability
  • Key policy: caching of whole files at the client
  • Whole-file serving
  • The entire file is sent to the client
  • Whole-file caching
  • A local copy of the file is cached on the client's
    local disk
  • It survives client reboots and server
    unavailability

98
Andrew File System
Andrew File System
  • Supports information sharing on a large scale
    (> 1000 WSs)
  • Uses session semantics
  • Provides location transparency and location
    independence
  • First the entire file is copied to the local
    machine (Venus)
  • from the server (Vice) when it is opened. If the
    file has been changed,
  • it is copied back to the server when it is
    closed again.
  • The method works because in practice most files
  • are changed by only one person
  • Measurements show that only 0.4% of all
    changed files
  • have been updated by more than one user during
    one week.

Remark: AFS works only on BSD 4.3 Unix platforms
with TCP/IP. Each node in the entire system
needs a local hard disk.
99
File Cache Consistency
Andrew File System
  • File caches hold recently accessed file records
  • Caches are consistent when they contain
  • exact copies for remote data
  • File-locking prevents simultaneous access to a
    file
  • writing causes the server cached to be updated

100
Whole File Caching
Andrew File System
  • The local cache contains the most recently used
    files

[Figure: on open, the client fetches the whole file
from the server]
Subsequent operations on the file apply to the
local copy. On close(file), if the file was
modified, it is sent back to the server.
101
AFS Structure
Andrew File System
Venus works as a file cache manager
Vice is a multi-threaded server providing shared
file services
[Figure: client workstations running Venus are
connected via the network to servers running Vice]
102
Implementation 1
Andrew File System
  • Network of WSs running BSD 4.3 and Mach
  • Implemented as 2 user-level processes
  • Vice runs at each Andrew server
  • Venus runs at each Andrew client

103
Implementation 2
Andrew File System
  • Modified BSD 4.3 Unix kernel
  • At the client, file system calls (e.g. open,
    close, etc.) are intercepted and passed to Venus
    when they refer to shared, non-cached files
  • Venus manages the client cache partition on the
    local disk
  • LRU replacement policy
  • The cache is large enough for 100s of
    average-sized files

104
File Sharing
Andrew File System
  • Files are shared or local
  • Shared files
  • Utilities (/bin, /lib): infrequently updated
    files or files accessed by a single user (e.g. a
    user's home directory)
  • Stored on servers and cached on clients
  • Local copies remain valid for a long time
  • Local files
  • Temporary files (/tmp) and files used for
    start-up
  • Stored on the local machine's disk

105
AFS Components
Andrew File System
Namespace: each local file system can be set up
differently. However, the shared file system has a
universal look.
Shared files are reached via symbolic links.
106
AFS Caching 1
Andrew File System
  • AFS-1: timestamp-based cache invalidation
  • AFS-2: ditto, plus the use of callbacks (see the
    sketch below)
  • When serving a file, the Vice server promises to
    notify the Venus client whenever the file is
    modified
  • Still a stateless server?
  • The callback is stored with the cached file
  • Valid
  • Canceled when the client is notified by the server
    that the file has been modified
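
A minimal sketch of the callback bookkeeping; the
class and method names are hypothetical interfaces,
not the real Vice/Venus code:

  class ViceServer:
      def __init__(self):
          self.callbacks = {}                 # filename -> clients holding a callback promise

      def fetch(self, client, name, data_store):
          self.callbacks.setdefault(name, set()).add(client)  # promise to notify this client
          return data_store[name]

      def store(self, client, name, data, data_store):
          data_store[name] = data
          for other in self.callbacks.get(name, set()) - {client}:
              other.break_callback(name)      # callback cancellation (via RPC in real AFS)
          self.callbacks[name] = {client}

  class VenusClient:
      def __init__(self):
          self.cache = {}                     # filename -> (data, callback_valid)

      def break_callback(self, name):
          if name in self.cache:
              self.cache[name] = (self.cache[name][0], False)  # copy may no longer be current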

107
AFS Caching 2
Andrew File System
  • Callbacks are implemented using RPC
  • When accessing a file, Venus checks whether the
    file exists in the cache and whether its callback
    is still valid; if it was canceled, Venus fetches
    a fresh copy from the server
  • Failure recovery
  • When restarting after a failure, Venus checks
    each cached file by sending a validation request
    to the server
  • Also periodic checks in case of communication
    failures

108
AFS Caching 3
Andrew File System
  • After the file is closed, Venus on the modifying
    client sends the update to the Vice server
  • The server updates its own copy and sends
    callback cancellations to all clients caching the
    file
  • Consistency?
  • Concurrent updates?

109
Andrew File Validation
Andrew File System
  • Older AFS versions
  • On open, Venus accesses Vice to see if its copy
    of the file is still valid.
  • This causes a substantial delay, even if the
    copy is valid.
  • Vice is stateless
  • Newer AFS versions
  • Vice maintains lists of valid copies.
  • If a file is modified, Vice invalidates the other
    copies.
  • On open, if Venus has a valid copy it can open
    it immediately.
  • If Venus crashes, it has to invalidate its cached
    copies or check their validity.

110
AFS Replication
Andrew File System
  • Read-only replication
  • Only read-only files allowed to be replicated at
    several servers

111
File Identifiers
Andrew File System
A volume is a collection of files managed
together to allow ease of movement. A partition
may consist of n ≥ 1 volumes.
A file identifier consists of:
Volume number
Vnode number
Unique number
  • Volume number
  • to uniquely identify a single volume in the
    system
  • Vnode number
  • to identify a file within a volume (cf. Unix
    i-nodes)
  • can be reused if the old file is deleted
  • Unique number
  • to cater for reused Vnode numbers, in case an old
    Vnode number is still in use somewhere

112
Example of a System Call (fopen)
Andrew File System
The application requests fopen(filename, ...).
Venus parses the filename. If it is a local file,
fopen() is treated in a similar way as in
Unix. However, if it starts with /afs, Venus has
to check several things (see the sketch below):
Is the requested file already in the local file
cache (see /cache)?
If so, Venus checks whether this file is
still valid or currently invalid. If still valid,
Venus returns the file descriptor to the
application. If already invalid, Venus compares
the timestamps of the local copy
with the server file, and if the local copy is
outdated, Venus sends a request to Vice to
download this file to the local cache. If the
file is not in the local cache, Venus sends a
request to Vice to download this file to the local
cache.
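
The same decision flow as a hedged sketch; venus is
a hypothetical object whose method names are
illustrative, not the real Venus internals:

  def afs_fopen(venus, filename):
      if not filename.startswith("/afs"):
          return venus.local_open(filename)          # local file: handled as in plain Unix
      if venus.in_cache(filename):                   # already in the local cache (/cache)?
          if venus.is_valid(filename):
              return venus.open_cached(filename)     # still valid: return descriptor at once
          if venus.server_mtime(filename) > venus.cached_mtime(filename):
              venus.fetch_from_vice(filename)        # local copy outdated: download a fresh one
      else:
          venus.fetch_from_vice(filename)            # not cached: download it to the local cache
      return venus.open_cached(filename)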
113
Security within AFS
Andrew File System
In AFS all traffic between clients and servers is
encrypted. Access to directories is controlled
via ACLs. File access is controlled as in Unix
(9 rwx bits for owner, group and others) (for
compatibility with Unix). Newer versions
of AFS use the Kerberos authentication system and
also offer ACLs for file accesses.
Remark: for more on AFS see
www.homepages.uel.ac.uk/5291n/afs.doc.html
114
Coda
Coda File System
  • Evolved from AFS
  • Goal: constant data availability
  • Improved replication
  • Replication of read-write volumes
  • Disconnected operation (mobility)
  • Extension of AFS's whole-file caching mechanism
  • Access to the shared file repository (servers)
    versus relying on local resources when the server
    is not available

115
Replication in Coda
Coda File System
  • Replication unit: file volume (set of files)
  • Set of replicas of a file volume: volume storage
    group (VSG)
  • Subset of replicas available to a client: AVSG
  • Different clients, different AVSGs
  • AVSG membership changes as server availability
    changes
  • On write: when the file is closed, copies of the
    modified file are broadcast to the AVSG

116
Optimistic Replication
Coda File System
  • Primary goal Availability
  • Replicated files are allowed to be modified even
    in the presence of partitions or during
    disconnected operation

117
Disconnected Operation
Coda File System
  • AVSG is empty
  • Network/server failures, or the host is on the
    move
  • Rely on the local cache to serve all needed files.
  • Loading the cache
  • User intervention: list of files to be cached
  • Learning usage patterns over time
  • Upon reconnection, cached copies are validated
    against the server's files

118
Normal and Disconnected Operation
Coda File System
  • During normal operation
  • Coda behaves like AFS
  • A cache miss is transparent to the user; only a
    performance penalty
  • Load balancing across replicas
  • Cost: replica consistency + cache consistency
  • Disconnected operation
  • No replicas are accessible
  • A cache miss prevents further progress
  • Need to load the cache before disconnection

119
Replication and Caching
Coda File System
  • Coda integrates server replication and client
    caching
  • On a cache hit with valid data, Venus does not
    need to contact the server
  • On a cache miss, Venus gets the data from an AVSG
    server, i.e. the preferred server (PS)
  • The PS is chosen at random or based upon proximity
    and load
  • Venus also contacts the other AVSG servers and
    collects their versions: if there is a conflict,
    the operation is aborted; if replicas are stale,
    they are updated off-line

120
Summary Caching
Coda File System
  • Improves performance in terms of
  • response time,
  • availability (disconnected operation), and
  • fault tolerance
  • Price: consistency
  • Consistency mechanisms
  • Timestamp-based invalidation
  • Callbacks

121
Leases
Leases
  • Time-based cache consistency protocol (see the
    sketch below)
  • Contract between client and server
  • A lease grants its holder control over writes to
    the corresponding data item during the lease term
  • The server must obtain approval from the holder of
    the lease before modifying the data
  • When the holder grants approval for a write, it
    invalidates its local copy
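
A minimal sketch of a lease record and the
server-side check before a write; the holder object
and its approve_write method are hypothetical
interfaces:

  import time

  class Lease:
      def __init__(self, holder, term):
          self.holder = holder
          self.expires = time.time() + term    # the lease is only binding until this point

      def active(self):
          return time.time() < self.expires

  def server_write(leases, name, data, store):
      lease = leases.get(name)
      if lease and lease.active():
          lease.holder.approve_write(name)     # holder invalidates its copy and approves
      store[name] = data                       # after the term the server may write freely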

122
Other Distributed File Systems
Example DFS
  • Plan 9 (Pike et al.)
  • xFS (based on Berkeley's LFS)
  • Secure File System (SFS) (Mazières et al.)

123
Log-Structured File System
Log-Structured File System
  • Built as extension to Sprite FS (Sprite LFS)
  • New disk storage technique that tries to use
    disks more efficiently
  • Assumes main memory cache for files
  • A larger memory makes the cache more efficient in
    satisfying reads
  • Most of the working set is cached
  • Thus, most disk access cost is due to writes

124
Main Idea
Log-Structured File System
  • Batch multiple writes in the file cache
  • Transform many small writes into 1 large one
  • Close to the disk's full bandwidth utilization
  • Write to disk in one write, into a contiguous
    region of the disk called the log
  • Eliminates seeks (i.e. reduces access time)
  • Improves crash recovery
  • Sequential structure of the log
  • Only the most recent portion of the log needs to
    be examined

125
LFS Structure
Log-Structured File System
  • 2 key functions
  • How to retrieve information from log
  • How to manage free disk space

126
File Location and Retrieval 1
Log-Structured File System
  • Allows random access to information in the log
  • The goal is to match or exceed read performance
  • Keeps indexing structures with the log
  • Each file has an i-node containing
  • File attributes (type, owner, permissions)
  • Disk addresses of the first 10 blocks
  • For files > 10 blocks, the i-node contains
    pointers to more data

127
File Location and Retrieval 2
Log-Structured File System
  • In the Unix FS
  • Fixed mapping between disk address and file
    i-node: the disk address is a function of the
    file id
  • In LFS (see the sketch below)
  • I-nodes are written to the log
  • The i-node map keeps the current location of each
    i-node
  • I-node maps usually fit in the main memory cache
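
A minimal sketch of the extra indirection, with an
in-memory dict standing in for the i-node map and a
plain list standing in for the on-disk log:

  inode_map = {}        # i-node number -> current position of that i-node in the log

  def append_to_log(log, record):
      address = len(log)                  # the log only grows; the tail is the new address
      log.append(record)
      return address

  def update_file(log, inode_no, new_inode):
      inode_map[inode_no] = append_to_log(log, new_inode)  # i-node goes to the log, map updated

  def lookup_inode(log, inode_no):
      return log[inode_map[inode_no]]     # one map lookup instead of a fixed disk location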
