Introduction to DFS - PowerPoint PPT Presentation

About This Presentation
Title: Introduction to DFS

Slides: 14
Provided by: SteveA211
Learn more at: http://cobweb.cs.uga.edu

Transcript and Presenter's Notes


1
Introduction to DFS
2
Distributed File Systems
  • A file system whose clients, servers and storage
    devices are dispersed among the machines of a
    distributed system
  • File system operations have to be carried out
    over the network
  • A good DFS should ensure transparency
  • Clients should have the look and feel of a
    conventional file system

3
Naming and Transparency
  • Mapping between the logical and physical objects
  • Location transparency: a file's name does not
    reveal its physical storage location
  • Location independence: a file's name need not
    change when its physical storage location changes
  • Location-independent files are essentially
    logical data containers
  • Location transparency hides the association
    between names and physical storage

4
Naming Schemes
  • Combination of host name and local name
      • Local name is a path similar to Unix
      • Neither transparent nor independent
  • Attaching remote directories to the local
    directory
      • Popularized by Sun's NFS
      • Appears as a coherent directory tree
  • Globally unique names
      • Truly transparent
      • Global naming structure spans all files
      • Difficult to achieve due to special files
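
The first two naming schemes can be contrasted in a short Python sketch. This is only an illustration, not how NFS itself is implemented: the names `Location`, `parse_host_name` and `MountTable`, the `host:/path` syntax, and the longest-prefix rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    host: str
    path: str


def parse_host_name(name: str) -> Location:
    """Scheme 1 (hypothetical syntax 'host:/path'): the host is part of the name."""
    host, _, path = name.partition(":")
    if not host or not path.startswith("/"):
        raise ValueError(f"expected host:/path, got {name!r}")
    return Location(host, path)


class MountTable:
    """Scheme 2: remote directories are attached to local mount points."""

    def __init__(self):
        self._mounts = {}  # mount point -> Location of the remote directory

    def mount(self, mount_point: str, host: str, remote_dir: str) -> None:
        self._mounts[mount_point.rstrip("/")] = Location(host, remote_dir.rstrip("/"))

    def resolve(self, path: str) -> Location:
        # Assumption: the longest matching mount point wins.
        best = None
        for mp in self._mounts:
            if path == mp or path.startswith(mp + "/"):
                if best is None or len(mp) > len(best):
                    best = mp
        if best is None:
            return Location("localhost", path)  # not under any mount point
        remote = self._mounts[best]
        return Location(remote.host, remote.path + path[len(best):])


table = MountTable()
table.mount("/mnt/proj", "fs2", "/export/proj")
print(parse_host_name("fs1:/home/a.txt"))
print(table.resolve("/mnt/proj/src/main.c"))
```

With the mount table, the user-visible path no longer names the host, which is why the mounted tree looks like one coherent directory tree.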

5
Implementing Naming Schemes
  • Transparent naming requires mapping between names
    and their associated locations
  • Aggregating files into component units for
    scalability and manageability
  • Hierarchical directory trees
  • Replication and caching
  • Maintaining consistency of cached view
  • Location independent file identifiers
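
One way to picture location-independent file identifiers is a two-level mapping: names map to identifiers, and a separate table maps each component unit to its current server. The sketch below assumes an identifier of the form `(unit, index)` and a hypothetical `LocationService` class; neither comes from the slides.

```python
class LocationService:
    """Maps component units (groups of files) to the server holding them."""

    def __init__(self):
        self._unit_to_server = {}

    def place(self, unit: str, server: str) -> None:
        # Also used for migration: only this table changes.
        self._unit_to_server[unit] = server

    def locate(self, file_id: tuple) -> str:
        unit, _index = file_id
        return self._unit_to_server[unit]


# Name -> location-independent identifier (component unit, index within unit).
names = {"/projects/report.txt": ("vol7", 42)}

locations = LocationService()
locations.place("vol7", "serverA")
file_id = names["/projects/report.txt"]
print(locations.locate(file_id))

# Migrating the whole unit leaves the name and the identifier untouched.
locations.place("vol7", "serverB")
print(locations.locate(file_id))
```

Because only the unit-to-server table changes on migration, cached name-to-identifier mappings stay valid.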

6
Accessing Remote Files
  • Needs network data transfer
  • Remote service mechanism
  • Remote procedure call
  • Caching for improved performance

7
Caching
  • The idea: fetch once, use many times
  • If requested data is not available, get it from
    server
  • Store fetched data
  • Perform access on local data
  • Replace data when cache becomes full
  • One master copy at the server, several secondary
    copies at clients
  • Granularity: from file blocks to entire files
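
The steps above can be sketched as a small client-side block cache. The slides do not say which replacement policy to use; least-recently-used (LRU) is an assumption here, as are the `BlockCache` name and the `fetch` callback that stands in for the network request to the server.

```python
from collections import OrderedDict


class BlockCache:
    """Client-side cache of file blocks with LRU replacement (an assumption)."""

    def __init__(self, fetch, capacity: int):
        self._fetch = fetch          # callable(file_name, block_no) -> data
        self._capacity = capacity
        self._blocks = OrderedDict()  # (file_name, block_no) -> data
        self.misses = 0

    def read(self, file_name: str, block_no: int):
        key = (file_name, block_no)
        if key in self._blocks:
            self._blocks.move_to_end(key)  # mark as most recently used
            return self._blocks[key]
        # Miss: get the block from the server and store it locally.
        self.misses += 1
        data = self._fetch(file_name, block_no)
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict least recently used
        return data


# Stand-in for the server's master copy.
master = {("a", 0): b"block0", ("a", 1): b"block1", ("a", 2): b"block2"}
cache = BlockCache(lambda name, no: master[(name, no)], capacity=2)
cache.read("a", 0)
cache.read("a", 0)  # served from the local copy
print(cache.misses)
```

Caching whole files instead of blocks would only change the cache key from `(file_name, block_no)` to `file_name`.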

8
Cache Location
  • Main memory
      • Workstations can be diskless
      • Faster access
      • Technology trend: memory accesses are becoming
        faster
      • Server caches will be in main memory anyway, so
        the caching code can be reused
  • Local disks
      • Reliability via persistence
  • Hybrid schemes
      • Best of both worlds

9
Cache Update Policy
  • Policy regarding when the modified data is
    reflected on the master copy
  • Can have significant impact on the performance
  • Write-through policy
      • All writes are reflected immediately on the
        master copy
      • Blocking: the writer waits for the server on
        every write
  • Delayed writes
      • Write on flush
      • Periodic writes
      • Write on close
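
The difference between write-through and one delayed-write variant (write on close) shows up in how many writes reach the server. The classes below are hypothetical stand-ins written for this comparison; a direct method call takes the place of the network round trip.

```python
class Server:
    """Holds the master copy and counts how many writes arrive."""

    def __init__(self):
        self.master = {}
        self.writes_received = 0

    def write(self, name, data):
        self.master[name] = data
        self.writes_received += 1


class WriteThroughClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, name, data):
        self.cache[name] = data
        self.server.write(name, data)  # caller waits for every write


class WriteOnCloseClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def write(self, name, data):
        self.cache[name] = data  # master copy is stale until close
        self.dirty.add(name)

    def close(self, name):
        if name in self.dirty:
            self.server.write(name, self.cache[name])
            self.dirty.discard(name)


through_server, delayed_server = Server(), Server()
through = WriteThroughClient(through_server)
delayed = WriteOnCloseClient(delayed_server)
for version in ("v1", "v2", "v3"):
    through.write("f", version)
    delayed.write("f", version)
delayed.close("f")
print(through_server.writes_received, delayed_server.writes_received)
```

The delayed policy sends fewer writes, but anything still in the client cache is lost if the client crashes before `close`.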

10
Ensuring consistency
  • Ensuring that data being read is consistent with
    master copy
  • Client-initiated approach
      • The client validates with the server whether
        its data is up to date
      • Frequency of validation is the main issue
          • Check on first access
          • Check on every access
          • Periodic checking
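
A minimal sketch of client-initiated validation, assuming the server tags each file with a version number (the slides do not say how staleness is detected). `VersionedServer` and `ValidatingClient` are hypothetical names; the flag switches between checking on every access and fetching once without rechecking.

```python
class VersionedServer:
    def __init__(self):
        self._files = {}      # name -> (version, data)
        self.validations = 0  # number of validation requests received

    def write(self, name, data):
        version = self._files.get(name, (0, None))[0] + 1
        self._files[name] = (version, data)

    def fetch(self, name):
        return self._files[name]

    def version(self, name):
        self.validations += 1
        return self._files[name][0]


class ValidatingClient:
    def __init__(self, server, check_every_access: bool):
        self._server = server
        self._check_every_access = check_every_access
        self._cache = {}  # name -> (version, data)

    def read(self, name):
        cached = self._cache.get(name)
        if cached is None:
            cached = self._server.fetch(name)  # first access: fetch
            self._cache[name] = cached
        elif self._check_every_access and self._server.version(name) != cached[0]:
            cached = self._server.fetch(name)  # stale copy: refetch
            self._cache[name] = cached
        return cached[1]


server = VersionedServer()
server.write("f", "v1")
eager = ValidatingClient(server, check_every_access=True)
lazy = ValidatingClient(server, check_every_access=False)
eager.read("f"), lazy.read("f")
server.write("f", "v2")
print(eager.read("f"), lazy.read("f"))
```

Checking on every access keeps reads fresh at the price of one server round trip per read, which is the frequency trade-off named above.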

11
Server Initiated Approaches
  • Server records the files each client is accessing
  • Detects potential inconsistency and notifies
    clients
  • Conflicts occur when at least two clients cache
    the same file and one of them is writing
  • Invalidation- or update-based mechanisms
  • Session semantics
      • Consistency enforced upon file closing
  • Unix semantics
      • Consistency enforced upon write
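
An invalidation-based, server-initiated scheme might look like the following sketch. It enforces consistency on every write (closer to Unix semantics than to session semantics). The class names and the direct `invalidate` callback are assumptions; a real system would send these notifications over the network.

```python
class CallbackServer:
    """Records which clients cache each file and invalidates on write."""

    def __init__(self):
        self._data = {}
        self._cachers = {}  # name -> set of clients caching that file

    def read(self, client, name):
        self._cachers.setdefault(name, set()).add(client)
        return self._data.get(name)

    def write(self, writer, name, data):
        self._data[name] = data
        # Notify every other client that holds a cached copy.
        for client in self._cachers.get(name, set()) - {writer}:
            client.invalidate(name)
        self._cachers[name] = {writer}


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.read(self, name)
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data
        self.server.write(self, name, data)

    def invalidate(self, name):
        self.cache.pop(name, None)  # next read goes back to the server


server = CallbackServer()
alice, bob = CachingClient(server), CachingClient(server)
alice.write("f", "v1")
bob.read("f")
alice.write("f", "v2")  # bob's copy is invalidated here
print(bob.read("f"))
```

An update-based variant would push the new data to the other clients instead of dropping their copies.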

12
Why or Why not Caching
  • Locality of accesses
  • Gains in performance and scalability
  • Big chunks of data lead to lower overhead
  • Disk accesses can be optimized for larger chunks
    of data
  • The main cost is maintaining consistency
  • Clients need memory or disk space for the cache

13
Stateful vs. Stateless Servers
  • Stateful servers maintain information about files
    being accessed by clients
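      • Clients are given connection IDs, which act as
        indexes into the server's inode table
      • Performance gains, for example from prefetching
        file blocks
  • Stateless servers maintain no state
      • Each request is self-contained
  • Reliability is the main issue: a crashed stateful
    server loses its state, while a stateless server
    can simply restart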
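
The reliability difference can be illustrated with two toy servers. Everything here is hypothetical: the in-memory `FILES` dictionary stands in for the disk, and `crash_and_restart` simulates a crash by discarding whatever the server held in memory.

```python
FILES = {"notes.txt": b"distributed file systems"}


class StatefulServer:
    def __init__(self):
        self._open = {}  # connection id -> [file name, current offset]
        self._next_id = 0

    def open(self, name):
        self._next_id += 1
        self._open[self._next_id] = [name, 0]
        return self._next_id

    def read(self, conn_id, count):
        entry = self._open[conn_id]  # raises KeyError if state was lost
        name, offset = entry
        data = FILES[name][offset:offset + count]
        entry[1] = offset + len(data)  # server remembers the position
        return data

    def crash_and_restart(self):
        self._open.clear()  # all connection state is gone


class StatelessServer:
    def read(self, name, offset, count):
        # Each request carries everything needed to serve it.
        return FILES[name][offset:offset + count]

    def crash_and_restart(self):
        pass  # nothing to lose


stateful = StatefulServer()
conn = stateful.open("notes.txt")
print(stateful.read(conn, 11))
stateful.crash_and_restart()

stateless = StatelessServer()
stateless.crash_and_restart()
print(stateless.read("notes.txt", 12, 4))
```

After the simulated crash, the old connection ID is useless to the stateful server, while the stateless server answers the same request as before.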