Distributed File Systems - PowerPoint PPT Presentation


1
Distributed File Systems
  • CSE 380
  • Lecture Note 14
  • Insup Lee

2
Remote Files
  • File service vs. file server
  • File service interface: the specification of what
    the file system offers to its clients.
  • File server: a process that runs on some machine
    and helps implement the file service.
  • File Service Model (Fig 13-1)
  • upload/download model
  • remote access model
  • Comparison between the two models
  • The directory service
  • creating and deleting directories
  • naming and renaming files
  • moving files
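
The two file service models can be sketched roughly as follows. This is only an illustration: the in-memory `FileServer` class, its method names, and the file name `notes.txt` are assumptions, not part of the lecture. In the upload/download model the client transfers whole files and edits a local copy; in the remote access model every operation is a request carried out at the server.

```python
class FileServer:
    """Hypothetical in-memory file server, used only for illustration."""

    def __init__(self):
        self.files = {}      # file name -> bytearray with the contents
        self.requests = 0    # number of requests the server has handled

    # Upload/download model: whole-file transfers only.
    def download(self, name):
        self.requests += 1
        return bytes(self.files[name])

    def upload(self, name, data):
        self.requests += 1
        self.files[name] = bytearray(data)

    # Remote access model: each operation is executed at the server.
    def read(self, name, offset, count):
        self.requests += 1
        return bytes(self.files[name][offset:offset + count])

    def write(self, name, offset, data):
        self.requests += 1
        self.files[name][offset:offset + len(data)] = data


server = FileServer()
server.upload("notes.txt", b"hello world")

# Upload/download: fetch the file, edit the local copy, send it back.
local = bytearray(server.download("notes.txt"))
local[0:5] = b"HELLO"
server.upload("notes.txt", local)

# Remote access: the file never leaves the server.
server.write("notes.txt", 6, b"WORLD")
result = server.read("notes.txt", 0, 11)
```

The request counter hints at the trade-off: whole-file transfer needs few but large messages, while remote access needs one message per operation.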

3
Goals
  • Network transparency: users do not have to be
    aware of the location of files to access them
  • location transparency: the name of a file does
    not reveal any hint of the file's physical
    storage location.
  • /server1/dir1/dir2/X
  • server1 can be moved anywhere (e.g., from CIS to
    SEAS).
  • location independence: the name of a file does
    not need to be changed when the file's physical
    storage location changes.
  • The above file X cannot be moved to server2
    without changing its name, even if server1 is
    full and server2 is not so full.
  • High availability: files remain accessible
    despite system failures or scheduled activities
    such as backups and the addition of nodes
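
One way to picture location independence is a level of indirection between the name a user sees and the place where the file is stored. The sketch below is an assumption made for illustration (the table, the name `/projects/X`, and the physical paths are all hypothetical): migrating the file only updates the table, so the user-visible name never changes.

```python
# Hypothetical location table: user-visible name -> (server, physical path).
location_table = {"/projects/X": ("server1", "/dir1/dir2/X")}


def locate(name):
    """Map a location-independent name to its current storage location."""
    return location_table[name]


def migrate(name, new_server, new_path):
    """Move a file: only the table entry changes, not the name."""
    location_table[name] = (new_server, new_path)


before = locate("/projects/X")
migrate("/projects/X", "server2", "/dir9/X")
after = locate("/projects/X")
```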

4
Architecture
  • Computation model
  • file servers -- machines dedicated to storing
    files and performing storage and retrieval
    operations (for high performance)
  • clients -- machines used for computational
    activities; may have a local disk for caching
    remote files
  • Two most important services
  • name server -- maps user specified names to
    stored objects, files and directories
  • cache manager -- reduces network delay and disk
    delay; problem: inconsistency
  • Typical data access actions
  • open, close, read, write, etc.

5
Design Issues
  • Naming and name resolution
  • Semantics of file sharing (Fig 13-4, Fig 13-5)
  • Stateless versus stateful servers (Fig 13-8)
  • Caching -- where to store files (Fig 13-9)
  • Cache consistency (Fig 13-11)
  • Replication (Fig 13-12)

6
Naming and Name Resolution
  • a name space -- collection of names
  • name resolution -- mapping a name to an object
  • same or different view of a directory hierarchy
    (Fig. 13-3)
  • 3 traditional ways to name files in a distributed
    environment
  • concatenate the host name to the names of files
    stored on that host: system-wide uniqueness
    guaranteed, simple to locate a file; however,
    not network transparent, not location
    independent, e.g., /machine/usr/foo
  • mount remote directories onto local
    directories: once mounted, files can be
    referenced in a location-transparent manner
  • provide a single global directory: requires a
    unique file name for every file; location
    independent; cannot encompass heterogeneous
    environments and wide geographical areas
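
The mount-based scheme can be illustrated with a small name-resolution sketch. It assumes a per-client mount table that maps a local mount point to a (server, remote directory) pair and resolves a path by longest-prefix match; the table contents, server names, and the `resolve` function are hypothetical.

```python
def resolve(path, mount_table):
    """Return (server, remote_path) for a local path.

    Picks the longest mount point that is a prefix of the path; paths
    under no mount point are treated as local.
    """
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/")
        if path == mount_point or path.startswith(prefix + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return ("local", path)
    server, remote_root = mount_table[best]
    suffix = path[len(best.rstrip("/")):]
    return (server, (remote_root.rstrip("/") + suffix) or "/")


# Hypothetical mount table for one client.
mounts = {
    "/usr/shared": ("server1", "/export/shared"),
    "/home": ("server2", "/export/home"),
}

remote = resolve("/usr/shared/doc/a.txt", mounts)
local_file = resolve("/etc/passwd", mounts)
not_a_mount = resolve("/homework/x", mounts)
```

Once the table is set up, the user refers to `/usr/shared/doc/a.txt` without naming the server, which is the location-transparent behavior the slide describes.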

7
Semantics of File Sharing
  • Consistency Semantics Problem (Fig 13-4): read
    after write
  • Assume open; reads/writes; close
  • UNIX semantics: the value read is the value stored
    by the last write. Writes to an open file are
    visible immediately to others that have this file
    open at the same time. Easy to implement if there
    is one server and no cache.
  • Session semantics: Writes to an open file by a
    user are not visible immediately to other users
    that have the file open already. Once a file is
    closed, the changes made to it are visible to
    sessions started later.
  • Immutable-Shared-Files semantics: A sharable file
    cannot be modified. File names cannot be reused
    and file contents may not be altered. Simple to
    implement.
  • Transactions: All changes have the all-or-nothing
    property. The interleaving W1, R1, R2, W2 is not
    allowed, where P1 = W1; W2 and P2 = R1; R2.
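
The difference between UNIX semantics and session semantics can be shown with a toy model. The classes below are assumptions made for illustration: a `UnixHandle` reads and writes the single shared copy, while a `SessionHandle` works on a private snapshot taken at open and publishes it only at close.

```python
class SharedFile:
    """The master copy of a file (toy model: contents are a string)."""

    def __init__(self, contents):
        self.contents = contents


class UnixHandle:
    """UNIX semantics: every handle operates on the one shared copy."""

    def __init__(self, file):
        self.file = file

    def read(self):
        return self.file.contents

    def write(self, contents):
        self.file.contents = contents

    def close(self):
        pass


class SessionHandle:
    """Session semantics: private copy at open, written back at close."""

    def __init__(self, file):
        self.file = file
        self.private = file.contents
        self.dirty = False

    def read(self):
        return self.private

    def write(self, contents):
        self.private = contents
        self.dirty = True

    def close(self):
        if self.dirty:
            self.file.contents = self.private


f = SharedFile("v0")
a, b = UnixHandle(f), UnixHandle(f)
a.write("v1")
unix_seen = b.read()                 # the write is visible immediately

g = SharedFile("v0")
c, d = SessionHandle(g), SessionHandle(g)
c.write("v1")
session_seen = d.read()              # d still sees its snapshot
c.close()
later_seen = SessionHandle(g).read() # a session started after the close
```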

8
Stateful versus Stateless Service
  • Two approaches to server-side information
  • stateful file server
  • a client performs open on a file
  • the server keeps file information (e.g., file
    descriptor entry, offset)
  • Adv: increased performance
  • On server crash, it loses all its volatile state
    information
  • On client crash, the server needs to know so that
    it can reclaim state space
  • stateless file server -- each request is
    self-contained
  • each request identifies the file, the position,
    read/write.
  • server failure is identical to slow server
    (client retries...)
  • each request must be idempotent.
  • NFS employs this.
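
A minimal sketch of the contrast, assuming two toy in-memory servers (the class and method names are hypothetical, and this is not the NFS protocol itself): the stateful server keeps a per-open table that a crash wipes out, while the stateless server receives the file name and position with every request, so a repeated request returns the same answer.

```python
class StatefulServer:
    """Keeps a table of open files; the table is volatile state."""

    def __init__(self, files):
        self.files = files
        self.open_table = {}     # fd -> [file name, current offset]
        self.next_fd = 0

    def open(self, name):
        fd = self.next_fd
        self.next_fd += 1
        self.open_table[fd] = [name, 0]
        return fd

    def read(self, fd, count):
        name, offset = self.open_table[fd]
        data = self.files[name][offset:offset + count]
        self.open_table[fd][1] = offset + len(data)
        return data

    def crash_and_restart(self):
        self.open_table = {}     # all open-file state is lost


class StatelessServer:
    """Each request names the file, the position, and the amount."""

    def __init__(self, files):
        self.files = files

    def read(self, name, offset, count):
        return self.files[name][offset:offset + count]

    def crash_and_restart(self):
        pass                     # no per-client state to lose


files = {"f": b"abcdefgh"}

stateful = StatefulServer(files)
fd = stateful.open("f")
first = stateful.read(fd, 4)
stateful.crash_and_restart()
try:
    stateful.read(fd, 4)
    stateful_survived = True
except KeyError:
    stateful_survived = False    # the descriptor is unknown after restart

stateless = StatelessServer(files)
stateless.crash_and_restart()
second = stateless.read("f", 4, 4)
retry = stateless.read("f", 4, 4)   # idempotent: same request, same reply
```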

9
Caching
  • Four places to store files (Fig. 13-9)
  • server's disk: slow performance
  • server caching in main memory
  • cache management issue, how much to cache,
    replacement strategy
  • still slow due to network delay
  • Used in high-performance web-search engine
    servers
  • client caching in main memory
  • can be used by diskless workstation
  • faster to access from main memory than disk
  • compete with the virtual memory system for
    physical memory space
  • Three options (Fig. 13-10)
  • client-cache on a local disk
  • large files can be cached
  • the virtual memory management is simpler
  • a workstation can function even when it is
    disconnected from the network
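
The slide leaves the replacement strategy open. As one possible choice, the sketch below assumes least-recently-used (LRU) replacement for a fixed-size block cache in main memory; the `BlockCache` class and the `fetch` callback (standing in for a remote read) are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """Fixed-size block cache with LRU replacement (an assumed policy)."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch            # function(block_id) -> data on a miss
        self.blocks = OrderedDict()   # ordered from least to most recent
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # mark as most recent
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.fetch(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recent
        return data


cache = BlockCache(capacity=2, fetch=lambda block_id: f"data{block_id}")
for block_id in [1, 2, 1, 3, 2]:
    cache.read(block_id)
```

With a capacity of two blocks, the access sequence 1, 2, 1, 3, 2 produces one hit and four misses, because block 2 is evicted when block 3 arrives.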

10
A Comparison of Caching and Remote Service
  • caching reduces remote accesses (esp. when
    locality is exploited) ⇒ reduces network traffic
    and server load
  • total network overhead is lower for big chunks of
    data (caching) than for a series of responses to
    specific requests.
  • disk access can be optimized better for large
    requests than for random disk blocks
  • cache-consistency problem is the major drawback.
    If there are frequent writes, overhead due to the
    consistency problem is significant.
  • OS is simpler for remote service.

11
Cache Consistency
  • Reflecting changes in a local cache to the master
    copy
  • Reflecting changes in the master copy to local
    caches

[Figure: a master copy and two cached copies (Copy 1, Copy 2), with arrows labeled "write" and "update"]

12
Update algorithms for client caching
  • write-through: all writes are carried out
    immediately
  • Reliable: little information is lost in the event
    of a client crash
  • Slow: cache not that useful
  • delayed-write: delays writing at the server
  • possible to perform many writes to a block in the
    cache before it is written
  • if data is written and then deleted immediately,
    data need not be written at all (20-30% of new
    data is deleted within 30 secs)
  • write-on-close: delay writing until the file is
    closed at the client
  • if the file is open for a short duration, works
    fine
  • if the file is open for long, susceptible to
    losing data in the event of a client crash
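
The effect of the update policy on server traffic can be sketched by counting server writes. The classes below are assumptions for illustration: `WriteThroughCache` forwards every write, while `DeferredWriteCache` keeps dirty blocks locally and sends them in `flush()`. Write-on-close calls `flush()` at close; a delayed-write policy would call the same `flush()` from a periodic timer, which is not modeled here.

```python
class Server:
    def __init__(self):
        self.blocks = {}
        self.writes = 0          # number of writes that reached the server

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.writes += 1


class WriteThroughCache:
    """Every write goes to the cache and immediately to the server."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, block_id, data):
        self.cache[block_id] = data
        self.server.write(block_id, data)

    def close(self):
        pass


class DeferredWriteCache:
    """Writes stay in the cache until flush(); close() flushes."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def write(self, block_id, data):
        self.cache[block_id] = data
        self.dirty.add(block_id)

    def flush(self):
        for block_id in sorted(self.dirty):
            self.server.write(block_id, self.cache[block_id])
        self.dirty.clear()

    def close(self):
        self.flush()


s1, s2 = Server(), Server()
through, deferred = WriteThroughCache(s1), DeferredWriteCache(s2)
for version in ["a", "b", "c"]:          # three writes to the same block
    through.write(0, version)
    deferred.write(0, version)
writes_before_close = s2.writes
through.close()
deferred.close()
```

Three writes to one block cost three server writes under write-through but only one under the deferred policy; the price is that the data lives only at the client until the flush.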

13
Cache Coherence
  • How to maintain consistency between locally
    cached data and the master data when the data
    has been modified by another client?
  • Client-initiated approach -- check validity:
    on every access (too much overhead); on first
    access to a file (e.g., file open); or every
    fixed time interval
  • Server-initiated approach -- server records, for
    each client, the (parts of) files it
    caches. After the server detects a potential
    inconsistency, it reacts.
  • Do not allow caching when concurrent-write sharing
    occurs. Allow many readers. If a client opens
    for writing, inform all the clients to purge
    their cached data.
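
A server-initiated scheme can be sketched as follows, under the assumption that the server reacts to a write by telling every other caching client to purge its copy. The `Client` and `CallbackServer` classes are hypothetical, and reacting at write time (rather than at open time, as in the last bullet) is a simplification chosen for brevity.

```python
class Client:
    def __init__(self, name):
        self.name = name
        self.cache = {}              # file name -> cached contents

    def invalidate(self, filename):
        self.cache.pop(filename, None)


class CallbackServer:
    """Records which clients cache which files and invalidates on write."""

    def __init__(self):
        self.files = {}
        self.cachers = {}            # file name -> set of caching clients

    def read(self, client, filename):
        self.cachers.setdefault(filename, set()).add(client)
        data = self.files[filename]
        client.cache[filename] = data
        return data

    def write(self, writer, filename, data):
        self.files[filename] = data
        for client in self.cachers.get(filename, set()):
            if client is not writer:
                client.invalidate(filename)   # purge the stale copy
        self.cachers[filename] = {writer}
        writer.cache[filename] = data


server = CallbackServer()
server.files["f"] = "v1"
a, b = Client("A"), Client("B")
server.read(a, "f")
server.read(b, "f")
server.write(a, "f", "v2")
b_has_stale_copy = "f" in b.cache
b_rereads = server.read(b, "f")
```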

14
Cache consistency, cont.
  • Potential inconsistency
  • In session semantics, a client closes a modified
    file.
  • In UNIX semantics, the server must be notified
    whenever a file is opened and the intended mode
    (read or write mode) must be indicated for every
    open.
  • Disable cache when a file is opened in
    conflicting modes.
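
The bookkeeping behind "disable cache when a file is opened in conflicting modes" might look like the sketch below. It assumes the server is told about every open and close together with the mode; the `OpenTracker` class is hypothetical and only reports whether caching is allowed, leaving out the messages a real server would send to clients that already cache the file.

```python
class OpenTracker:
    """Tracks opens per file and decides whether clients may cache it."""

    def __init__(self):
        self.readers = {}    # file name -> set of clients open for reading
        self.writers = {}    # file name -> set of clients open for writing

    def open(self, client, filename, mode):
        """Record the open; return True if caching is currently allowed."""
        table = self.writers if mode == "w" else self.readers
        table.setdefault(filename, set()).add(client)
        return self.cacheable(filename)

    def close(self, client, filename, mode):
        table = self.writers if mode == "w" else self.readers
        table.get(filename, set()).discard(client)

    def cacheable(self, filename):
        readers = self.readers.get(filename, set())
        writers = self.writers.get(filename, set())
        if not writers:
            return True      # any number of readers may cache
        # A single writer may cache only if no other client has the file open.
        return len(writers) == 1 and readers <= writers


tracker = OpenTracker()
first_reader = tracker.open("A", "f", "r")
second_reader = tracker.open("B", "f", "r")
writer_joins = tracker.open("C", "f", "w")   # conflicting modes
tracker.close("C", "f", "w")
after_writer_leaves = tracker.cacheable("f")
```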

15
Replication
  • Reasons
  • Increase reliability
  • improve availability
  • balance the servers' workload
  • how to make replication transparent (Fig. 13-12)
  • how to keep the replicas consistent
  • Problems -- mainly with updates
  • a replica is not updated due to its server's
    failure
  • the network is partitioned
  • Replication Management
  • weighted vote for read and write
  • current synchronization site for each file group
    to control access

16
Current research issues
  • Scalability
  • Mobile Users
  • disconnected operation
  • low bandwidth communication
  • Security