1
Caching in
Distributed File System
  • Ke Wang
  • CS614 Advanced Systems
  • Apr 24, 2001

2
Key requirements of distributed system
  • Scalability from small to large networks
  • Fast and transparent access to geographically
    distributed files
  • Information protection
  • Ease of administration
  • Wide support from variety of vendors

3
Background
  • DFS -- a distributed implementation of a file
    system, where multiple users share files and
    storage resources.
  • Overall storage space managed by a DFS is
    composed of different, remotely located, smaller
    storage spaces
  • There is usually a correspondence between
    constituent storage spaces and sets of files

4
DFS Structure
  • Service - a software entity providing a
    particular type of function to clients
  • Server - service software running on a single
    machine
  • Client - process that can invoke a service using
    a set of operations that forms its client
    interface

5
Why caching?
  • Retain the most recently accessed disk blocks
  • Repeated accesses to a block in the cache can be
    handled without involving the disk
  • Advantages
  • - Reduces delays
  • - Reduces contention for the disk arm (see the
    sketch below)
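
Below is a minimal Python sketch of such a block cache, assuming an LRU
eviction policy; the class and the read_from_disk callback are invented
for illustration and are not part of the original presentation.

    # Minimal LRU block cache (illustrative sketch; names are invented).
    from collections import OrderedDict

    class BlockCache:
        def __init__(self, capacity, read_from_disk):
            self.capacity = capacity              # max number of cached blocks
            self.read_from_disk = read_from_disk  # fallback for cache misses
            self.blocks = OrderedDict()           # block_no -> data, LRU order

        def read(self, block_no):
            if block_no in self.blocks:
                self.blocks.move_to_end(block_no) # hit: no disk involved
                return self.blocks[block_no]
            data = self.read_from_disk(block_no)  # miss: go to the disk
            self.blocks[block_no] = data
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)   # evict least recently used
            return data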

6
Caching in DFS
  • Advantages
  • Reduce network traffic
  • Reduce server contention
  • Problems
  • Cache consistency

7
Stuff to consider
  • Cache location (disk vs. memory)
  • Cache Placement (client vs. server)
  • Cache structure (block vs. file)
  • Stateful vs. Stateless server
  • Cache update policies
  • Consistency
  • Client-driven vs. Server-driven protocols

8
Practical Distributed Systems
  • NFS: Sun's Network File System
  • AFS: Andrew File System (CMU)
  • Sprite FS: File System for the Sprite OS (UC
    Berkeley)

9
Sun's Network File System (NFS)
10
Sun's Network File System (NFS)
  • Originally released in 1985
  • Built on top of an unreliable datagram protocol,
    UDP (later moved to TCP)
  • Client-server model

11
Andrew File System (AFS)
  • Developed at CMU since 1983
  • Client-server model
  • Key software: Vice and Venus
  • Goal: high scalability (5,000-10,000 nodes)

12
Andrew File System (AFS)
13
Andrew File System (AFS)
  • VICE is a multi-threaded server process, with
    each thread handling a single client request
  • VENUS is the client process that runs on each
    workstation and forms the interface to VICE
  • Both are user-level processes

14
Prototype of AFS
  • One server process per client
  • Clients cache whole files
  • Timestamps verified on every open
  • → a lot of interaction with the server
  • → heavy network traffic

15
Improving AFS
  • To improve on the prototype
  • Reduce cache validity checks
  • Reduce server processes
  • Reduce network traffic
  • → Higher scalability!

16
Sprite File System
  • Designed for networked workstations with large
    physical memories
  • (can be diskless)
  • Expected memories of 100-500 MB
  • Goal: high performance

17
Caches in Sprite FS
18
Caches in Sprite FS (cont.)
  • When a process makes a file access, the request
    is presented first to the cache (file traffic).
    If it is not satisfied there, the request is
    passed either to a local disk, if the file is
    stored locally (disk traffic), or to the server
    where the file is stored (server traffic).
    Servers also maintain caches to reduce disk
    traffic. (A sketch of this lookup path follows.)
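
A small Python sketch of that lookup order; the cache, local_disk, and
server objects and their methods are invented for illustration.

    # Route a block read: cache first, then local disk, then the server.
    def read_block(file_id, block_no, cache, local_disk, server):
        data = cache.lookup(file_id, block_no)     # "file traffic"
        if data is not None:
            return data
        if local_disk.stores(file_id):             # "disk traffic"
            data = local_disk.read(file_id, block_no)
        else:                                      # "server traffic"
            data = server.read(file_id, block_no)  # server has its own cache
        cache.insert(file_id, block_no, data)
        return data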

19
Caching in Sprite FS
  • Two unusual aspects
  • Guarantees a completely consistent view
  • Concurrent write sharing
  • Sequential write sharing
  • Cache size varies dynamically

20
Cache Location: Disk vs. Main Memory
  • Advantages of disk caches
  • More reliable
  • Cached data survive a crash and don't need to be
    fetched again during recovery

21
Cache Location: Disk vs. Main Memory (cont.)
  • Advantages of main-memory caches
  • Permit workstations to be diskless
  • Faster access
  • Server caches (used to speed up disk I/O) are
    always in main memory; using main-memory caches
    on the clients permits a single caching
    mechanism for servers and clients

22
Cache Placement: Client vs. Server
  • A client cache reduces network traffic
  • Read-only operations on unchanged files do not
    need to go over the network
  • A server cache reduces server load
  • The cache is amortized across all clients (but
    needs to be bigger to be effective)
  • In practice, need BOTH!

23
Cache structure
  • Block basis
  • Simple
  • Sprite FS, NFS
  • File basis
  • Reduces interaction with servers
  • AFS
  • But cannot access files larger than the cache

24
Comparison
  • NFS: client memory (and disk), block basis
  • AFS: client disk, file basis
  • Sprite FS: client memory, server memory, block
    basis

25
Stateful vs. Stateless Server
  • Stateful: servers hold information about the
    client
  • Stateless: servers maintain no state information
    about clients

26
Stateful Servers
  • Mechanism (see the sketch below)
  • The client opens a file
  • The server fetches information about the file
    from its disk, stores it in memory, gives the
    client a unique connection id, and opens the
    file
  • The id is used for subsequent accesses until the
    session ends
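
A toy Python sketch of this mechanism; the class and method names are
invented, and a real server would keep richer per-session state.

    # Stateful server sketch: open() loads state into server memory and
    # returns a connection id that subsequent requests refer to.
    import itertools

    class StatefulServer:
        def __init__(self):
            self.sessions = {}                   # conn_id -> session state
            self.next_id = itertools.count(1)

        def open(self, path):
            conn_id = next(self.next_id)         # unique id for this session
            self.sessions[conn_id] = {"path": path, "offset": 0}
            return conn_id                       # volatile: lost on a crash

        def read(self, conn_id, nbytes):
            s = self.sessions[conn_id]           # request carries only the id
            with open(s["path"], "rb") as f:
                f.seek(s["offset"])
                data = f.read(nbytes)
            s["offset"] += len(data)             # server remembers position
            return data

        def close(self, conn_id):
            del self.sessions[conn_id]           # session ends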

27
Stateful Servers (cont.)
  • Advantages
  • Fewer disk accesses
  • Read-ahead is possible
  • RPCs are small, containing only an id
  • File may be cached entirely on client,
    invalidated by the server if there is a
    conflicting write

28
Stateful Servers (cont.)
  • Disadvantages
  • The server loses all its volatile state in a
    crash
  • State must be restored by a dialog with clients,
    or operations that were underway when the crash
    occurred must be aborted
  • The server needs to be aware of client failures

29
Stateless Server
  • Each request must be self-contained
  • Each request identifies the file and the
    position in the file
  • No need to establish and terminate a connection
    with open and close operations (see the sketch
    below)
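
A toy Python sketch of a stateless read handler; the function name and
signature are invented for illustration.

    # Stateless sketch: every request carries the file identity and the
    # position, so the server remembers nothing between calls.
    def handle_read(path, offset, nbytes):
        with open(path, "rb") as f:   # no session: open, read, and forget
            f.seek(offset)
            return f.read(nbytes)

    # The client tracks its own offset and resends it with each request;
    # after a server crash and restart, the same request simply works again.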

30
Stateless Server (cont.)
  • Advantages
  • A file server crash does not affect clients
  • Simple
  • Disadvantages
  • Impossible to enforce consistency
  • Each RPC needs to contain all the state, so
    requests are longer

31
Stateful vs. Stateless
  • AFS and Sprite FS are stateful
  • Sprite FS servers keep track of which clients
    have which files open
  • AFS servers keep track of the contents of
    clients' caches
  • NFS is stateless

32
Cache Update Policy
  • Write-through
  • Delayed-write
  • Write-on-close (variation of delayed-write)

33
Cache Update Policy (cont.)
  • Write-through: all writes are propagated to
    stable storage immediately
  • Reliable, but poor performance

34
Cache Update Policy (cont.)
  • Delayed-write: modifications are written to the
    cache and then written through to the server
    later
  • Write-on-close: modifications are written back
    to the server when the file is closed
  • Reduces intermediate read and write traffic
    while the file is open (see the sketch below)
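
A compact Python sketch contrasting the three policies; the class and
the server interface are invented for illustration.

    # Update-policy sketch: write-through sends each write to the server
    # immediately; delayed-write batches dirty blocks and flushes later;
    # write-on-close flushes when the file is closed.
    class CachedFile:
        def __init__(self, server, policy):
            self.server, self.policy = server, policy
            self.dirty = {}                        # block_no -> unflushed data

        def write(self, block_no, data):
            if self.policy == "write-through":
                self.server.write(block_no, data)  # stable storage at once
            else:
                self.dirty[block_no] = data        # redundant writes absorbed

        def flush(self):                           # delayed-write: run this
            for block_no, data in self.dirty.items():  # e.g. every 30 seconds
                self.server.write(block_no, data)
            self.dirty.clear()

        def close(self):
            self.flush()                           # write-on-close flushes here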

35
Cache Update Policy (cont.)
  • Pros for delayed-write/write-on-close
  • Lots of files have lifetimes of less than 30s
  • Redundant writes are absorbed
  • Lots of small writes can be batched into larger
    writes
  • Disadvantage
  • Poor reliability: unwritten data may be lost if
    the client crashes

36
Caching in AFS
  • Key to Andrew's scalability
  • Clients cache entire files on local disk
  • Write-on-close
  • Server load and network traffic are reduced
  • Contacts the server only on open and close
  • Cache contents are retained across reboots
  • Requires a sufficiently large local disk

37
Cache update policy
  • NFS and Sprite: delayed-write
  • 30-second delay
  • AFS: write-on-close
  • Reduces traffic to the server dramatically
  • → Good scalability for AFS

38
Consistency
  • Is the locally cached copy of the data
    consistent with the master copy?
  • Is there a danger of stale data?
  • Is concurrent write sharing permitted?

39
Sprite: Complete Consistency
  • Concurrent write sharing
  • A file is open on multiple clients
  • At least one client is writing it
  • The server detects this
  • Requires dirty data to be written back to the
    server
  • Invalidates the clients' open caches (see the
    sketch below)
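
A Python sketch of how a server might detect concurrent write sharing on
open; the names and the exact reaction are invented for illustration.

    # On each open, check whether the file is already open elsewhere with
    # a writer involved; if so, force write-back and stop client caching.
    class SharingDetector:
        def __init__(self):
            self.opens = {}                # file_id -> list of (client, mode)

        def open(self, file_id, client, mode):
            holders = self.opens.setdefault(file_id, [])
            writer = mode == "w" or any(m == "w" for _, m in holders)
            if holders and writer:         # concurrent write sharing
                for other, _ in holders:
                    other.write_back(file_id)    # flush dirty blocks to server
                    other.disable_cache(file_id) # reads/writes go to server
            holders.append((client, mode))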

40
Sprite: Complete Consistency
  • Sequential write sharing
  • A file is modified, closed, then opened by others
  • Out-of-date blocks
  • Clients compare version numbers with the server
  • Current data may still be in another client's
    cache
  • The server keeps track of the last writer (see
    the sketch below)
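
A sketch of the version check on open, with all names invented; since the
server also knows the last writer, it can recall dirty blocks that were
never flushed.

    # Reuse the cached copy only if its version matches the server's.
    def open_with_version_check(file_id, client, server):
        current = server.version(file_id)        # bumped on each write
        if client.cached_version(file_id) != current:
            client.invalidate(file_id)           # drop out-of-date blocks
            server.recall_dirty_blocks(file_id)  # fetch from the last writer
        return client.open(file_id, version=current)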

41
AFS: Session Semantics
  • Session semantics in AFS
  • Writes to an open file are invisible to other
    clients
  • Once the file is closed, changes are visible to
    new opens anywhere
  • Other file operations are visible immediately
  • Guarantees consistency only for sequential (not
    concurrent) sharing

42
Consistency
  • Sprite guarantees complete consistency
  • AFS uses session semantics
  • NFS does not guarantee consistency
  • NFS is stateless: all operations involve
    contacting the server, so if the server is
    unreachable, reads and writes cannot proceed

43
Client-driven vs. Server-driven
  • Client-driven approach
  • The client initiates the validity check
  • The server checks whether the local data are
    consistent with the master copy
  • Server-driven approach
  • The server records which files each client
    caches
  • When the server detects an inconsistency, it
    must react

44
AFS: Server-driven
  • Callback (key to scalability)
  • A cached file is valid as long as the client
    holds a callback on it
  • The server notifies clients before a
    modification
  • After a reboot, all cached entries are suspect
  • Reduces cache-validation requests to the server
    (see the sketch below)
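
A Python sketch of the callback bookkeeping; the class and method names
are invented for illustration.

    # The server remembers which clients cache each file and breaks their
    # callbacks before letting a write go through.
    class CallbackServer:
        def __init__(self, store):
            self.store = store                 # file_id -> contents
            self.callbacks = {}                # file_id -> clients holding
                                               # a callback promise
        def fetch(self, file_id, client):
            self.callbacks.setdefault(file_id, set()).add(client)
            return self.store[file_id]         # client caches the whole file

        def put(self, file_id, writer, data):
            for client in self.callbacks.get(file_id, set()) - {writer}:
                client.break_callback(file_id) # notify before modification
            self.callbacks[file_id] = {writer}
            self.store[file_id] = data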

45
Client-driven vs. Server-driven
  • AFS is server-driven (callbacks)
  • Contributes to AFS's scalability
  • Whole-file caching and session semantics also
    help
  • NFS and Sprite are client-driven
  • Increased load on the network and server

46
AFS: Effect on Scalability
47
Sprite: Dynamic Cache Size
  • Make the client cache as large as possible
  • The virtual memory system and the file system
    negotiate over physical memory
  • They compare the ages of their oldest pages (see
    the sketch below)
  • Two problems
  • Double caching
  • Multiblock pages
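
A sketch of the negotiation, with invented names: when one module needs
a page, the module whose oldest page was used least recently gives it up.

    # Compare last-access times of the two oldest pages; the older one
    # (smaller timestamp) is the best candidate for eviction.
    def claim_physical_page(vm, fs):
        if vm.oldest_page_time() <= fs.oldest_page_time():
            return vm.evict_oldest()           # virtual memory shrinks
        return fs.evict_oldest()               # file cache shrinks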

48
Why not use callbacks in Sprite?
49
Why not use callbacks in Sprite?
  • The estimated improvement is small
  • Reason
  • Andrew is a user-level process
  • Sprite is a kernel-level implementation

50
Comparison
51
Performance: Running Time
52
Performance: Running Time
  • Uses the Andrew benchmark
  • The Sprite system is fastest
  • Kernel-to-kernel RPC
  • Delayed write
  • Kernel implementation (AFS is user-level)

53
Performance: CPU Utilization
54
Performance: CPU Utilization
  • Uses the Andrew benchmark
  • The Andrew system showed the greatest scalability
  • File-based cache
  • Server-driven
  • Use of callbacks

55
Nomadic Caching
  • New issues
  • What if a client becomes disconnected?
  • What if it is only weakly connected (e.g. by
    modem)?
  • Violates a key property: transparency!

56
Nomadic Caching
  • Cache misses may impede progress
  • Local updates are invisible remotely
  • Update conflicts
  • Updates are vulnerable to loss or damage
  • → the Coda file system