1
Scale and Performance in a Distributed File System
  • J. H. HOWARD, M. L. KAZAR, S. G. MENEES, D. A.
    NICHOLS
  • M. SATYANARAYANAN, R. N. SIDEBOTHAM, and M. J.
    WEST
  • Carnegie Mellon University
  • ACM Transactions on Computer Systems, 1988
  • Presented by Jongheum Yeon, 2009. 10. 14

2
Outline
  • Motivation
  • Andrew File System
  • Prototype
  • Improvements to the Prototype
  • Comparison with Remote Open
  • Operability of Improved System
  • Conclusion

3
Andrew File System (AFS) History
  • 1980: task force at CMU
  • August 12, 1981: IBM launched the PC → "sneaker
    net"
  • 1982: Carnegie Mellon and IBM formed the
    Information Technology Center (ITC) for personal
    computing
  • 1983: multi-platform network operating systems
    (NOS), e.g. Novell NetWare
  • August 1983: 4.2BSD released with TCP/IP
  • 1989: Transarc founded → commercial product
  • 1994: IBM acquired Transarc
  • August 16, 2000: IBM announced OpenAFS
  • July 15, 2005: IBM withdrew its marketing effort
  • Research is still ongoing (280 references):
    Coda, GFS, IVY, ...

4
Motivation
  • DFS Architectural Challenges
  • Fault tolerant
  • Highly available
  • Recoverable
  • Consistent
  • Scalable
  • Predictable performance
  • Secure
  • To build a distributed, scalable file system
  • The file system should scale to serve a large
    number of users without too much degradation of
    performance
  • Should support a simplified security model
  • Should simplify system administration

5
Andrew File System
  • Developed at Carnegie Mellon University
  • A distributed file system designed around
    considerations of scale
  • Exploits locality of file references
  • Presents a homogeneous, location-transparent file
    name space to all the client workstations
  • Built on 4.2BSD
  • Server side
  • A set of trusted servers - Vice
  • Client side
  • A user-level process on each workstation - Venus
  • Hooks the file system calls
  • Contacts servers only on opens and closes, for
    whole-file transfers (sketched below)
  • Caches files from Vice
  • Stores modified copies of files back on the
    servers
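
A minimal sketch of this whole-file open/close model, with illustrative
names (the server object and its fetch_file/store_file calls are
assumptions, not the paper's interface):

    import os

    CACHE_DIR = "/tmp/venus-cache"

    class VenusSketch:
        def __init__(self, server):
            self.server = server                   # stand-in for a Vice server
            os.makedirs(CACHE_DIR, exist_ok=True)

        def open(self, vice_path):
            local = os.path.join(CACHE_DIR, vice_path.strip("/").replace("/", "_"))
            if not os.path.exists(local):
                # whole-file fetch from Vice on open
                with open(local, "wb") as f:
                    f.write(self.server.fetch_file(vice_path))
            return local                           # reads and writes use the local copy

        def close(self, vice_path, local):
            # the modified copy is shipped back to Vice on close
            with open(local, "rb") as f:
                self.server.store_file(vice_path, f.read())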

6
System Overview
7
Operation Flow
  • Open/Read/Write/Close
  • Workstation1 does not have readme.txt in its
    cache; Workstation2 has it.

8
Prototype
  • Venus on each client talked to a dedicated,
    persistent process on the server (one per client)
  • A separate dedicated lock-server process
  • Each server stored the directory hierarchy
  • Mirroring the structure of the Vice files
  • .admin directory: Vice file status info (e.g.
    access lists)
  • Stub directories: location database
  • The Vice-Venus interface named files by their
    full pathname
  • There was no notion of a low-level name such as
    an inode
  • Before using a cached file, Venus verified its
    timestamp with the server
  • Each open of a file thus resulted in at least one
    interaction with a server, even if the file was
    already in the cache and up to date (sketched
    below)
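
A minimal sketch of this per-open validation, with illustrative names
(get_timestamp and fetch_file are stand-ins, not the prototype's actual
interface):

    # Prototype-style open: Venus checks the file's timestamp with the
    # server on every open, even for an up-to-date cached copy.
    def prototype_open(server, cache, vice_path):
        cached = cache.get(vice_path)                # (data, timestamp) or None
        remote_ts = server.get_timestamp(vice_path)  # one RPC on every open
        if cached is None or cached[1] != remote_ts:
            data = server.fetch_file(vice_path)      # whole-file fetch
            cache[vice_path] = (data, remote_ts)
        return cache[vice_path][0]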

9
Qualitative Observations of the Prototype
  • stat primitive
  • Used to test for the presence of files
  • Used to obtain status information before opening
    files
  • Each stat call involved a cache validity check
  • Increased total running time and the load on the
    servers
  • Dedicated server processes
  • Virtue of simplicity; robust system
  • Excessive context-switching overhead
  • Critical resource limits exceeded
  • High virtual memory paging demands

10
Qualitative Observations of the Prototype(contd)
  • Remote Procedure Call (RPC)
  • Simplified the implementation
  • But caused network-related resources in the
    kernel to be exceeded
  • Location database
  • Difficult to move users' directories between
    servers
  • Other observations
  • Vice files could be used without recompilation or
    relinking

11
Limitations of the Prototype
  • Too many stat calls degraded performance
  • Later addressed by reducing cache-validity checks
  • Server-side overload due to too many processes
  • Network resources in the kernel were frequently
    exhausted
  • Since location information was stored in each
    server, moving files across servers was difficult
  • It was not possible to implement disk quotas

12
Benchmark of the Prototype
  • Benchmark
  • Command scripts that operate on a collection of
    files
  • 70 files (source code of an application program)
  • About 200 KB in total
  • Stand-alone benchmark, run in 5 phases

13
Benchmark of the Prototype (contd)
  • Skewed distribution of Vice calls
  • TestAuth
  • Validate cache entries
  • GetFileStat
  • Obtain status information about files absent from
    the cache

14
Benchmark of the Prototype (contd)
  • Maximum server load: about 5-10 load units
  • Load unit
  • The load placed on a server by a single client
    workstation running this benchmark
  • One load unit corresponds to about 5 Andrew users

15
Benchmark of the Prototype (contd)
  • CPU/disk utilization profiling
  • The performance bottleneck is the server CPU
  • Frequent context switches
  • Time spent by the servers traversing full
    pathnames

16
Improvements to the Prototype
  • Cache management
  • Previous cache management
  • Status cache (in virtual memory) and data cache
    (on local disk)
  • Intercepts only open/close operations
  • Modifications to a cached file are reflected back
    to Vice when the file is closed
  • Callback (sketched below)
  • The server promises to notify Venus before
    allowing another client to modify the file
  • This reduces cache validation traffic
  • Both server and client must maintain callback
    state information (a restriction)
  • There is a potential for inconsistency
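
A minimal sketch of callback-based validation, with illustrative names
(fetch_with_callback and break_callback are stand-ins, not the AFS
interface):

    # A cached entry with a callback is reused without contacting the
    # server; the server breaks the callback before letting another
    # client modify the file.
    class CallbackCacheSketch:
        def __init__(self, server):
            self.server = server
            self.entries = {}        # path -> cached data
            self.callbacks = set()   # paths with a valid callback promise

        def open(self, path):
            if path in self.entries and path in self.callbacks:
                return self.entries[path]        # no network traffic
            data = self.server.fetch_with_callback(path, client=self)
            self.entries[path] = data
            self.callbacks.add(path)
            return data

        def break_callback(self, path):
            # called by the server before it allows a modification
            self.callbacks.discard(path)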

17
Improvements to the Prototype (contd)
  • Name resolution
  • Previous name resolution
  • inode: unique, fixed-length
  • pathname: one or more per file, variable-length
  • The namei routine maps a pathname to an inode
  • CPU overhead on the servers
  • Each Vice pathname involved an implicit namei
    operation
  • fid: unique, fixed-length (sketched below)
  • Venus maps each component of a pathname to a fid
  • Three 32-bit fields: volume number, vnode number,
    uniquifier
  • Volume number: identifies a volume residing on
    one server
  • Vnode number: index into a file storage
    information array
  • Uniquifier: allows reuse of vnode numbers
  • Moving files does not invalidate the contents of
    directories cached on workstations
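
A minimal sketch of the fid described above as three 32-bit fields
(field names are illustrative, not taken from the paper):

    from dataclasses import dataclass
    import struct

    @dataclass(frozen=True)
    class Fid:
        volume: int      # identifies a volume, which resides on one server
        vnode: int       # index into the volume's file storage info array
        uniquifier: int  # lets a vnode slot be reused safely

        def pack(self) -> bytes:
            # fixed-length representation: three unsigned 32-bit integers
            return struct.pack("!III", self.volume, self.vnode, self.uniquifier)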

18
Improvements to the Prototype (contd)
  • Communication and server process structure
  • A single server process with multiple lightweight
    processes (LWPs), instead of one dedicated process
    per client (sketched below)
  • An LWP is bound to a particular client only for
    the duration of a single server operation
  • Uses an RPC mechanism
  • Low-level storage representation
  • Files are accessed by their inodes rather than by
    pathnames
  • vnodes on the servers, inodes on the clients
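
A minimal sketch of the single-process-with-LWPs idea, under
illustrative assumptions (Python threads stand in for the
non-preemptive LWPs; the request format is invented):

    import queue
    import threading

    def serve(num_workers, handle_request):
        requests = queue.Queue()                 # (client_id, operation) pairs

        def worker():
            while True:
                client_id, op = requests.get()   # bind to this client...
                handle_request(client_id, op)    # ...for one operation only
                requests.task_done()             # ...then return to the pool

        for _ in range(num_workers):
            threading.Thread(target=worker, daemon=True).start()
        return requests                          # callers enqueue requests here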

19
Improvements to the Prototype (contd)
  • Consistency semantics
  • No dirty reads: writes to an open file by a
    process are private to its workstation
  • Commit on close
  • After close, the changes are visible to new
    opens; already-open instances do not see them
  • All other file operations
  • Visible everywhere immediately
  • No implicit locking
  • Applications have to cooperate and manage locking
    themselves

20
Improved System Overview
  • Server
  • VICE (Vast Integrated Computing Environment file
    system)
  • Client
  • Venus → VIRTUE (Virtue Is Reached Through Unix
    and Emacs)

21
Improved System Overview (contd)
  • Case of remote file access (sketched below)
  • A user accesses pathname P on a workstation
  • The workstation kernel detects that P is a Vice
    file
  • The kernel passes the request to Venus
  • A Venus LWP uses the cache to examine each
    directory component D of P
  • If D is in the cache and has a callback on it, it
    is used without any network communication
  • If D is in the cache but has no callback on it,
    the appropriate server is contacted, a new copy
    of D is fetched if it has been updated, and a
    callback is established on it
  • If D is not in the cache, it is fetched from the
    appropriate server, and a callback is established
    on it
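
A minimal sketch of this per-component lookup, under illustrative
assumptions (root_fid, server_for, fetch, updated_since, and lookup are
stand-ins, not the actual Venus interface):

    # Resolve a Vice pathname component by component, using cached
    # directories; a valid callback avoids any network communication.
    def resolve(venus, path):
        fid = venus.root_fid
        for component in path.strip("/").split("/"):
            d = venus.cache.get(fid)
            if d is None:
                d = venus.server_for(fid).fetch(fid)       # fetch + callback
                venus.cache[fid] = d
                venus.callbacks.add(fid)
            elif fid not in venus.callbacks:
                if venus.server_for(fid).updated_since(fid, d.version):
                    d = venus.server_for(fid).fetch(fid)   # refresh stale copy
                    venus.cache[fid] = d
                venus.callbacks.add(fid)                   # re-establish callback
            fid = d.lookup(component)                      # map name -> fid
        return fid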

22
Performance of Improved System
  • Scalability
  • Only 19% slower than a stand-alone workstation
  • The prototype was 70% slower

23
Performance of Improved System (contd)
  • Scalability

24
Performance of Improved System (contd)
  • General Observations

25
Comparison with Remote Open
  • Remote open
  • The data in a file are not fetched en masse
  • Instead, the remote site potentially participates
    in each individual read and write operation
    (sketched below)
  • The file is actually opened on the remote site
    rather than the local site
  • Example: NFS
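
A minimal sketch of the contrast with whole-file transfer, with an
illustrative server.read call (not a real NFS API):

    # In a remote-open file system every read (and write) is an
    # individual operation at the remote site, rather than one
    # whole-file fetch at open time.
    def remote_read(server, handle, offset, length):
        return server.read(handle, offset, length)   # one round trip per read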

26
Comparison with Remote Open (contd)
27
Comparison with Remote Open (contd)
  • Advantage of a remote-open file system
  • Low latency

28
Operability of Improved System
  • Volumes
  • Volume Movement
  • Quotas
  • Read-Only Replication
  • Backup

29
Conclusion
  • AFS takes the local-file approach, NFS the
    remote-file approach
  • A combined approach could achieve the best of
    both worlds
  • The first access to a file would be a remote
    access
  • The file would then be downloaded at low priority
  • With partial download, the server need not know
    how much of the file the client has downloaded
  • Subsequent operations can work on the local file
  • Only the changes are transferred back to the
    server