1
Caching in
Distributed File System
  • Ke Wang
  • CS614 Advanced Systems
  • Apr 24, 2001

2
Key requirements of distributed system
  • Scalability from small to large networks
  • Fast and transparent access to geographically
    distributed files
  • Information protection
  • Ease of administration
  • Wide support from variety of vendors

3
Background
  • DFS -- a distributed implementation of a file
    system, where multiple users share files and
    storage resources.
  • Overall storage space managed by a DFS is
    composed of different, remotely located, smaller
    storage spaces
  • There is usually a correspondence between
    constituent storage spaces and sets of files

4
DFS Structure
  • Service - a software entity providing a
    particular type of function to clients
  • Server - service software running on a single
    machine
  • Client - process that can invoke a service using
    a set of operations that forms its client
    interface

5
Why caching?
  • Retain the most recently accessed disk blocks
  • Repeated accesses to a block in the cache can be
    handled without involving the disk
  • Advantages
  • - Reduces delays
  • - Reduces contention for the disk arm (see the
    sketch below)
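
Below is a minimal Python sketch of such a block cache, assuming an LRU
eviction policy; the class and the read_from_disk callback are invented
for illustration and are not part of the original presentation.

    # Minimal LRU block cache (illustrative sketch; names are invented).
    from collections import OrderedDict

    class BlockCache:
        def __init__(self, capacity, read_from_disk):
            self.capacity = capacity              # max number of cached blocks
            self.read_from_disk = read_from_disk  # fallback for cache misses
            self.blocks = OrderedDict()           # block_no -> data, LRU order

        def read(self, block_no):
            if block_no in self.blocks:
                self.blocks.move_to_end(block_no) # hit: no disk involved
                return self.blocks[block_no]
            data = self.read_from_disk(block_no)  # miss: go to the disk
            self.blocks[block_no] = data
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)   # evict least recently used
            return data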

6
Caching in DFS
  • Advantages
  • Reduce network traffic
  • Reduce server contention
  • Problems
  • Cache consistency

7
Stuff to consider
  • Cache location (disk vs. memory)
  • Cache Placement (client vs. server)
  • Cache structure (block vs. file)
  • Stateful vs. Stateless server
  • Cache update policies
  • Consistency
  • Client-driven vs. Server-driven protocols

8
Practical Distributed Systems
  • NFS: Sun's Network File System
  • AFS: Andrew File System (CMU)
  • Sprite FS: File System for the Sprite OS (UC
    Berkeley)

9
Sun's Network File System (NFS)
10
Sun's Network File System (NFS)
  • Originally released in 1985
  • Built on top of an unreliable datagram protocol,
    UDP (later moved to TCP)
  • Client-server model

11
Andrew File System (AFS)
  • Developed at CMU since 1983
  • Client-server model
  • Key software: Vice and Venus
  • Goal: high scalability (5,000-10,000 nodes)

12
Andrew File System (AFS)
13
Andrew File System (AFS)
  • VICE is a multi-threaded server process, with
    each thread handling a single client request
  • VENUS is the client process that runs on each
    workstation and forms the interface to VICE
  • Both are user-level processes

14
Prototype of AFS
  • One server process per client
  • Clients cache whole files
  • Timestamps verified on every open
  • → a lot of interaction with the server
  • → heavy network traffic

15
Improving AFS
  • To improve on the prototype
  • Reduce cache validity checks
  • Reduce server processes
  • Reduce network traffic
  • → Higher scalability!

16
Sprite File System
  • Designed for networked workstations with large
    physical memories
  • (can be diskless)
  • Expected memories of 100-500 MB
  • Goal: high performance

17
Caches in Sprite FS
18
Caches in Sprite FS (cont.)
  • When a process makes a file access, the request
    is presented first to the cache (file traffic).
    If it is not satisfied there, the request is
    passed either to a local disk, if the file is
    stored locally (disk traffic), or to the server
    where the file is stored (server traffic).
    Servers also maintain caches to reduce disk
    traffic. (A sketch of this lookup path follows.)
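
A small Python sketch of that lookup order; the cache, local_disk, and
server objects and their methods are invented for illustration.

    # Route a block read: cache first, then local disk, then the server.
    def read_block(file_id, block_no, cache, local_disk, server):
        data = cache.lookup(file_id, block_no)     # "file traffic"
        if data is not None:
            return data
        if local_disk.stores(file_id):             # "disk traffic"
            data = local_disk.read(file_id, block_no)
        else:                                      # "server traffic"
            data = server.read(file_id, block_no)  # server has its own cache
        cache.insert(file_id, block_no, data)
        return data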

19
Caching in Sprite FS
  • Two unusual aspects
  • Guarantees a completely consistent view
  • Concurrent write sharing
  • Sequential write sharing
  • Cache size varies dynamically

20
Cache Location: Disk vs. Main Memory
  • Advantages of disk caches
  • More reliable
  • Cached data survive a crash and don't need to be
    fetched again during recovery

21
Cache Location: Disk vs. Main Memory (cont.)
  • Advantages of main-memory caches
  • Permit workstations to be diskless
  • Faster access
  • Server caches (used to speed up disk I/O) are
    always in main memory; using main-memory caches
    on the clients permits a single caching
    mechanism for servers and clients

22
Cache Placement: Client vs. Server
  • A client cache reduces network traffic
  • Read-only operations on unchanged files do not
    need to go over the network
  • A server cache reduces server load
  • The cache is amortized across all clients (but
    needs to be bigger to be effective)
  • In practice, need BOTH!

23
Cache structure
  • Block basis
  • Simple
  • Sprite FS, NFS
  • File basis
  • Reduces interaction with servers
  • AFS
  • But cannot access files larger than the cache

24
Comparison
  • NFS: client memory (and disk), block basis
  • AFS: client disk, file basis
  • Sprite FS: client memory, server memory, block
    basis

25
Stateful vs. Stateless Server
  • Stateful: servers hold information about the
    client
  • Stateless: servers maintain no state information
    about clients

26
Stateful Servers
  • Mechanism (see the sketch below)
  • The client opens a file
  • The server fetches information about the file
    from its disk, stores it in memory, gives the
    client a unique connection id, and opens the
    file
  • The id is used for subsequent accesses until the
    session ends
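
A toy Python sketch of this mechanism; the class and method names are
invented, and a real server would keep richer per-session state.

    # Stateful server sketch: open() loads state into server memory and
    # returns a connection id that subsequent requests refer to.
    import itertools

    class StatefulServer:
        def __init__(self):
            self.sessions = {}                   # conn_id -> session state
            self.next_id = itertools.count(1)

        def open(self, path):
            conn_id = next(self.next_id)         # unique id for this session
            self.sessions[conn_id] = {"path": path, "offset": 0}
            return conn_id                       # volatile: lost on a crash

        def read(self, conn_id, nbytes):
            s = self.sessions[conn_id]           # request carries only the id
            with open(s["path"], "rb") as f:
                f.seek(s["offset"])
                data = f.read(nbytes)
            s["offset"] += len(data)             # server remembers position
            return data

        def close(self, conn_id):
            del self.sessions[conn_id]           # session ends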

27
Stateful Servers (cont.)
  • Advantages
  • Fewer disk accesses
  • Read-ahead is possible
  • RPCs are small, containing only an id
  • File may be cached entirely on client,
    invalidated by the server if there is a
    conflicting write

28
Stateful Servers (cont.)
  • Disadvantages
  • The server loses all its volatile state in a
    crash
  • State must be restored by a dialog with clients,
    or operations that were underway when the crash
    occurred must be aborted
  • The server needs to be aware of client failures

29
Stateless Server
  • Each request must be self-contained
  • Each request identifies the file and the
    position in the file
  • No need to establish and terminate a connection
    with open and close operations (see the sketch
    below)
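
A toy Python sketch of a stateless read handler; the function name and
signature are invented for illustration.

    # Stateless sketch: every request carries the file identity and the
    # position, so the server remembers nothing between calls.
    def handle_read(path, offset, nbytes):
        with open(path, "rb") as f:   # no session: open, read, and forget
            f.seek(offset)
            return f.read(nbytes)

    # The client tracks its own offset and resends it with each request;
    # after a server crash and restart, the same request simply works again.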

30
Stateless Server (cont.)
  • Advantages
  • A file server crash does not affect clients
  • Simple
  • Disadvantages
  • Impossible to enforce consistency
  • Each RPC needs to contain all the state, so
    requests are longer

31
Stateful vs. Stateless
  • AFS and Sprite FS are stateful
  • Sprite FS servers keep track of which clients
    have which files open
  • AFS servers keep track of the contents of
    clients' caches
  • NFS is stateless

32
Cache Update Policy
  • Write-through
  • Delayed-write
  • Write-on-close (variation of delayed-write)

33
Cache Update Policy (cont.)
  • Write-through: all writes are propagated to
    stable storage immediately
  • Reliable, but poor performance

34
Cache Update Policy (cont.)
  • Delayed-write: modifications are written to the
    cache and then written through to the server
    later
  • Write-on-close: modifications are written back
    to the server when the file is closed
  • Reduces intermediate read and write traffic
    while the file is open (see the sketch below)
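
A compact Python sketch contrasting the three policies; the class and
the server interface are invented for illustration.

    # Update-policy sketch: write-through sends each write to the server
    # immediately; delayed-write batches dirty blocks and flushes later;
    # write-on-close flushes when the file is closed.
    class CachedFile:
        def __init__(self, server, policy):
            self.server, self.policy = server, policy
            self.dirty = {}                        # block_no -> unflushed data

        def write(self, block_no, data):
            if self.policy == "write-through":
                self.server.write(block_no, data)  # stable storage at once
            else:
                self.dirty[block_no] = data        # redundant writes absorbed

        def flush(self):                           # delayed-write: run this
            for block_no, data in self.dirty.items():  # e.g. every 30 seconds
                self.server.write(block_no, data)
            self.dirty.clear()

        def close(self):
            self.flush()                           # write-on-close flushes here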

35
Cache Update Policy (cont.)
  • Pros for delayed-write/write-on-close
  • Lots of files have lifetimes of less than 30s
  • Redundant writes are absorbed
  • Lots of small writes can be batched into larger
    writes
  • Disadvantage
  • Poor reliability: unwritten data may be lost if
    the client crashes

36
Caching in AFS
  • Key to Andrew's scalability
  • Clients cache entire files on local disk
  • Write-on-close
  • Server load and network traffic are reduced
  • Contacts the server only on open and close
  • Cache contents are retained across reboots
  • Requires a sufficiently large local disk

37
Cache update policy
  • NFS and Sprite: delayed-write
  • 30-second delay
  • AFS: write-on-close
  • Reduces traffic to the server dramatically
  • → Good scalability for AFS

38
Consistency
  • Is the locally cached copy of the data
    consistent with the master copy?
  • Is there a danger of stale data?
  • Is concurrent write sharing permitted?

39
Sprite: Complete Consistency
  • Concurrent write sharing
  • A file is open on multiple clients
  • At least one client is writing it
  • The server detects this
  • Requires dirty data to be written back to the
    server
  • Invalidates the clients' open caches (see the
    sketch below)
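
A Python sketch of how a server might detect concurrent write sharing on
open; the names and the exact reaction are invented for illustration.

    # On each open, check whether the file is already open elsewhere with
    # a writer involved; if so, force write-back and stop client caching.
    class SharingDetector:
        def __init__(self):
            self.opens = {}                # file_id -> list of (client, mode)

        def open(self, file_id, client, mode):
            holders = self.opens.setdefault(file_id, [])
            writer = mode == "w" or any(m == "w" for _, m in holders)
            if holders and writer:         # concurrent write sharing
                for other, _ in holders:
                    other.write_back(file_id)    # flush dirty blocks to server
                    other.disable_cache(file_id) # reads/writes go to server
            holders.append((client, mode))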

40
Sprite: Complete Consistency
  • Sequential write sharing
  • A file is modified, closed, then opened by others
  • Out-of-date blocks
  • Clients compare version numbers with the server
  • Current data may still be in another client's
    cache
  • The server keeps track of the last writer (see
    the sketch below)
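
A sketch of the version check on open, with all names invented; since the
server also knows the last writer, it can recall dirty blocks that were
never flushed.

    # Reuse the cached copy only if its version matches the server's.
    def open_with_version_check(file_id, client, server):
        current = server.version(file_id)        # bumped on each write
        if client.cached_version(file_id) != current:
            client.invalidate(file_id)           # drop out-of-date blocks
            server.recall_dirty_blocks(file_id)  # fetch from the last writer
        return client.open(file_id, version=current)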

41
AFS: Session Semantics
  • Session semantics in AFS
  • Writes to an open file are invisible to other
    clients
  • Once the file is closed, changes are visible to
    new opens anywhere
  • Other file operations are visible immediately
  • Guarantees consistency only for sequential (not
    concurrent) sharing

42
Consistency
  • Sprite guarantees complete consistency
  • AFS uses session semantics
  • NFS does not guarantee consistency
  • NFS is stateless: all operations involve
    contacting the server, so if the server is
    unreachable, reads and writes cannot proceed

43
Client-driven vs. Server-driven
  • Client-driven approach
  • The client initiates the validity check
  • The server checks whether the local data are
    consistent with the master copy
  • Server-driven approach
  • The server records which files each client
    caches
  • When the server detects an inconsistency, it
    must react

44
AFS: Server-driven
  • Callback (key to scalability)
  • A cached file is valid as long as the client
    holds a callback on it
  • The server notifies clients before a
    modification
  • After a reboot, all cached entries are suspect
  • Reduces cache-validation requests to the server
    (see the sketch below)
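
A Python sketch of the callback bookkeeping; the class and method names
are invented for illustration.

    # The server remembers which clients cache each file and breaks their
    # callbacks before letting a write go through.
    class CallbackServer:
        def __init__(self, store):
            self.store = store                 # file_id -> contents
            self.callbacks = {}                # file_id -> clients holding
                                               # a callback promise
        def fetch(self, file_id, client):
            self.callbacks.setdefault(file_id, set()).add(client)
            return self.store[file_id]         # client caches the whole file

        def put(self, file_id, writer, data):
            for client in self.callbacks.get(file_id, set()) - {writer}:
                client.break_callback(file_id) # notify before modification
            self.callbacks[file_id] = {writer}
            self.store[file_id] = data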

45
Client-driven vs. Server-driven
  • AFS is server-driven (callbacks)
  • Contributes to AFS's scalability
  • Whole-file caching and session semantics also
    help
  • NFS and Sprite are client-driven
  • Increased load on the network and server

46
AFS: Effect on Scalability
47
Sprite: Dynamic Cache Size
  • Make the client cache as large as possible
  • The virtual memory system and the file system
    negotiate over physical memory
  • They compare the ages of their oldest pages (see
    the sketch below)
  • Two problems
  • Double caching
  • Multiblock pages
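
A sketch of the negotiation, with invented names: when one module needs
a page, the module whose oldest page was used least recently gives it up.

    # Compare last-access times of the two oldest pages; the older one
    # (smaller timestamp) is the best candidate for eviction.
    def claim_physical_page(vm, fs):
        if vm.oldest_page_time() <= fs.oldest_page_time():
            return vm.evict_oldest()           # virtual memory shrinks
        return fs.evict_oldest()               # file cache shrinks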

48
Why not use callbacks in Sprite?
49
Why not use callbacks in Sprite?
  • The estimated improvement is small
  • Reason
  • Andrew is a user-level process
  • Sprite is a kernel-level implementation

50
Comparison
51
Performance: Running Time
52
Performance: Running Time
  • Uses the Andrew benchmark
  • The Sprite system is fastest
  • Kernel-to-kernel RPC
  • Delayed write
  • Kernel implementation (AFS is user-level)

53
Performance: CPU Utilization
54
Performance: CPU Utilization
  • Uses the Andrew benchmark
  • The Andrew system showed the greatest scalability
  • File-based cache
  • Server-driven
  • Use of callbacks

55
Nomadic Caching
  • New issues
  • What if a client becomes disconnected?
  • What if it is only weakly connected (e.g. by
    modem)?
  • Violates a key property: transparency!

56
Nomadic Caching
  • Cache misses may impede progress
  • Local updates are invisible remotely
  • Update conflicts
  • Updates are vulnerable to loss or damage
  • → the Coda file system