DISTRIBUTED FILE SYSTEM - PowerPoint PPT Presentation

Slides: 69
Provided by: cseHcmut4

1
  • DISTRIBUTED FILE SYSTEM

  • Presenting group

  • Lê Tuấn Anh
  • Nguyễn H?i Duy
  • Ð?ng Thanh Linh
  • Trần Trung Hi?u 50500892
  • Nguyễn Hoàng Nam

2
  • Content
  • I. Distributed file system design
  • II. Distributed file system implementation
  • III. Network File System (NFS)
  • IV. Trends in distributed file systems

3
What Is a Distributed File System?
  • Distributed File System (DFS) is a mechanism for
    sharing files
  • DFS is used to make files distributed across
    multiple servers appear to users as if they
    reside in one place on the network
  • DFS provides a mechanism to create logical views
    of folders and files regardless of where those
    files are physically located on the network

4
What Is a Distributed File System? (cont.)
5
File Service
  • Specifies what the file system offers its
    clients for manipulating shared files
  • Ex: read and write operations on files
  • Implemented by a user or kernel process called a
    file server
  • A system may have one or several file servers
    running at the same time

6
File Service (cont.)
  • Two models for file services
  • Upload/download: files move between server and
    clients; few operations (read file, write file);
    simple; requires storage at the client; good if
    the whole file is accessed
  • Remote access: files stay at the server; rich
    interface with many operations; less space needed
    at the client; efficient for small accesses
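
The two models can be contrasted with a minimal sketch. It assumes a
hypothetical in-memory server; the class and method names are
illustrative and do not come from the slides.

```python
class UploadDownloadServer:
    """Upload/download model: whole files move between server and client."""

    def __init__(self):
        self.files = {}

    def read_file(self, name):
        # The client receives the entire file and works on its own copy.
        return self.files[name]

    def write_file(self, name, data):
        # The client sends the entire file back.
        self.files[name] = data


class RemoteAccessServer:
    """Remote access model: files stay on the server, clients send small operations."""

    def __init__(self):
        self.files = {}

    def read(self, name, offset, count):
        return self.files.get(name, b"")[offset:offset + count]

    def write(self, name, offset, data):
        buf = bytearray(self.files.get(name, b""))
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # pad a gap with zero bytes
        buf[offset:offset + len(data)] = data
        self.files[name] = bytes(buf)
```

Changing one byte costs a full download and upload in the first model,
but only one small write message in the second.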

7
File Service (cont.)
8
Directory Service
  • Provides operations for:
  • creating and deleting directories
  • naming and renaming files
  • moving files from one directory to another
  • entering, removing, and looking up files in a
    directory

9
Naming Transparency
  • Naming is the mapping between logical and
    physical objects.
  • Ex: a user filename maps to <cylinder, sector>
  • In a conventional file system, it is understood
    where the file actually resides: the system and
    disk are known.
  • In a transparent DFS, the location of a file,
    somewhere in the network, is hidden
  • File replication means there are multiple copies
    of a file; the mapping returns a SET of locations
    for the replicas.

10
Naming Transparency(cont.)
  • Location transparency: the path name gives no
    hint as to where the file (or other object) is
    located.
  • Ex: /server1/dir1/x specifies that x is located
    on server1, but it does not tell where server1
    is located -> the server can move within the
    network without changing the path
  • Location independence: a file can be moved from
    one server to another without changing its path
    name.

11
Naming Schemes
  • Machine path naming, such as /machine/path
  • Mounting remote file system onto the local file
    hierarchy
  • A single name space that looks the same on all
    machines

12
Two level naming
  • Symbolic name (external), e.g. prog.c; binary
    name (internal), e.g. local i-node number as in
    Unix
  • Directories provide the translation from symbolic
    to binary names
  • Binary name formats:
  • i-node: no cross-references among servers
  • (server, i-node): a directory on one server can
    refer to a file on a different server
  • Set of binary names: a lookup returns binary
    names for the original file and all of its
    backups
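
A minimal sketch of two-level naming, assuming the (server, i-node)
binary name format and a directory that stores a list of binary names
per symbolic name. The names BinaryName, Directory, enter and lookup
are hypothetical.

```python
from typing import NamedTuple


class BinaryName(NamedTuple):
    server: str
    inode: int


class Directory:
    """Translates symbolic names into one or more binary names."""

    def __init__(self):
        self.entries = {}

    def enter(self, symbolic, binary_names):
        # Store the original first, followed by any backups.
        self.entries[symbolic] = list(binary_names)

    def lookup(self, symbolic):
        # Returns the whole set so the caller can pick any replica.
        return self.entries[symbolic]
```

For example, entering "prog.c" with two binary names lets a client fall
back to the second server if the first one is unreachable.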

13
File Sharing Semantics
  • UNIX semantics: total ordering of read/write
    (R/W) events
  • easy to achieve in a non-distributed system
  • in a distributed system with one server and
    multiple clients with no caching at the client,
    total ordering is also easily achieved, since
    reads and writes are performed immediately at the
    server
  • Session semantics: writes are guaranteed to
    become visible to others only when the file is
    closed
  • if two or more clients write the same file
    simultaneously, one copy (the last one closed, or
    one chosen nondeterministically) replaces the
    others

14
File Sharing Semantics (cont.)
  • Immutable files: only create and read file
    operations (no write)
  • writing a file means creating a new one and
    entering it into the directory, replacing the
    previous one with the same name; these are atomic
    operations
  • if two processes try to replace the same file at
    the same time, either the last copy wins or the
    winner is chosen nondeterministically
  • open question: what happens if a file is replaced
    while another process is busy reading it?
  • Transaction semantics: mutual exclusion on file
    accesses; either all file operations are
    completed or none is. Good for banking systems

15
II. DFS Implementation
  • File usage
  • Measurements.
  • File usage pattern (observed in a study by
    Satyanarayanan).
  • System structure
  • File-server and directory-server organization.
  • Special attention to alternative approaches.

16
File Usage: Measurements
  • - Static measurements
  • Represent a snapshot of the system at a certain
    instant.
  • Made by examining the disk to see what is on
    it.
  • - Dynamic measurements
  • Made by modifying the file system to record all
    operations to a log for subsequent analysis

17
File Usage: Measurements (cont.)
  • - Static measurements
  • The distribution of file sizes.
  • The distribution of file types.
  • The amount of storage occupied by files of
    various types and sizes.
  • - Dynamic measurements
  • The relative frequency of various operations
  • The number of files open at any moment
  • The amount of sharing that takes place

18
File Usage: Measurement Problems
  • - How typical is the observed user population?
  • Satyanarayanan's measurements were made at a
    university -> do they also apply to an industrial
    research lab, an office automation project, or a
    banking system?
  • - Watch out for artifacts of the system being
    measured
  • Ex: the distribution of file names in an MS-DOS
    system. File names are never more than 8
    characters (plus an optional three-character
    extension)
  • - The measurements were made on more-or-less
    traditional UNIX systems. It is unclear whether
    they can be transferred or extrapolated to
    distributed systems

19
File Usage: File Usage Pattern
  • Observed in a study by Satyanarayanan (1981)
  • - Most files are small (< 10K)
  • - Reading is much more frequent than writing
  • - Most R/W accesses are sequential (random access
    is rare)
  • - Most files have a short lifetime -> create the
    file on the client
  • - File sharing is unusual -> caching at the client
  • - The average process uses only a few files

20
Server System Structure
  • Are client and server different?
  • - In some systems, all machines run the same basic
    software -> any machine can offer file service
    to the public by exporting the names of selected
    directories so that other machines can access
    them.
  • - In other systems, the file server and directory
    server are just user programs -> a machine can
    run client and server software together, or not

21
Server System Structure
  • Are client and server different? (cont.)
  • - At the other extreme are systems in which
    clients and servers run on different machines.

22
Server System Structure
  • File and directory service: combined or not?
  • - Combined: a single server handles all the
    directory and file calls.
  • - Separate: the directory server maps a symbolic
    name onto its binary name. The file server uses
    the binary name to read or write the file.

23
Server System Structure
  • Separating file and directory service
  • Advantage
  • Produces simpler software
  • Disadvantage
  • Requires more communication

24
Server System Structure
Separating file and directory service. Example:
look up a/b/c
  • The client sends a symbolic name to the directory
    server
  • -> the binary name it returns is given to the
    file server
  • The directory hierarchy may be partitioned among
    multiple servers
  • - 1st directory, on server 1: contains an entry a
    for another directory on server 2
  • - 2nd directory, on server 2: contains an entry b
    for another directory on server 3
  • - 3rd directory, on server 3: contains an entry c
    for a file, with its binary name

25
Server System Structure
Separating file and directory service. Example:
look up a/b/c
  • The client sends a message -> server 1
  • Server 1 finds a and sees that the binary name
    refers to another server -> option (1): tell the
    client which server holds b
  • Requires the client to know which server holds
    which directory -> requires more messages.

26
Server System Structure
Separating file and directory service. Example:
look up a/b/c
  • The client sends a message -> server 1
  • Server 1 finds a and sees that the binary name
    refers to another server -> option (2): forward
    the remainder of the request to server 2.
  • More efficient
  • Cannot use ordinary RPC (Remote Procedure Call),
    because the process to which the client sends the
    message is not the one that sends the reply
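
The two lookup options for a/b/c can be compared with a small sketch.
The SERVERS table, the binary name ("server3", 7) and the message
counting are assumptions made for illustration; real servers would
exchange network messages rather than share a dictionary.

```python
# Hypothetical directory tables. Each entry is ("dir", next_server)
# or ("file", binary_name).
SERVERS = {
    "server1": {"a": ("dir", "server2")},
    "server2": {"b": ("dir", "server3")},
    "server3": {"c": ("file", ("server3", 7))},
}


def lookup_iterative(path, start="server1"):
    """Option (1): each server replies to the client, which asks the next one."""
    server, messages = start, 0
    for component in path.split("/"):
        kind, value = SERVERS[server][component]
        messages += 2  # one request plus one reply per server
        if kind == "file":
            return value, messages
        server = value
    raise FileNotFoundError(path)


def lookup_forwarding(path, start="server1"):
    """Option (2): each server forwards the remainder of the request."""
    server, messages = start, 0
    for component in path.split("/"):
        messages += 1  # the client's request or a server-to-server forward
        kind, value = SERVERS[server][component]
        if kind == "file":
            return value, messages + 1  # the last server replies to the client
        server = value
    raise FileNotFoundError(path)
```

Under these assumptions, looking up a/b/c costs 6 messages with option
(1) and 4 with option (2), but in option (2) the reply comes from a
server the client never contacted.
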
27
Server System Structure
  • Separating file and directory service
  • Problem
  • Path name lookup, especially with multiple
    directory servers, can be expensive.
  • Caching directory hints at the client accelerates
    path name lookup; directories and hints must be
    kept coherent

28
Server System Structure
  • Another question
  • Should file, directory, and other servers keep
    state information about clients?
  • - Yes: stateful server.
  • - No: stateless server.

29
Server System Structure
Stateless vs. Stateful
30
Caching
  • Definition: a cache is a block of memory for
    temporary storage of data likely to be used
    again.

Cache memory
Index  Tag  Data
0      2    abc
1      0    xyz

Main memory
Index  Data
0      xyz
1      pdq
2      abc
3      ght
31
Caching
  • There are four potential places to store files,
    or parts of files:
  • - The server's disk.
  • - The server's main memory.
  • - The client's disk.
  • - The client's main memory.
  • These storage locations all have different
    properties.

32
Caching
33
Caching: Storing all files on the server's disk
  • Advantages
  • - Plenty of space.
  • - The files are accessible to all clients.
  • - There is only one copy of each file -> no
    consistency problems arise.
  • Problem
  • - Performance: the file must be transferred from
    the server's disk to the server's main memory,
    and then again over the network to the client's
    main memory.

34
Caching files in the server's main memory.
  • Advantages
  • - Eliminates the disk transfer.
  • - The server can keep its memory and disk copies
    synchronized
  • Problems
  • - The network transfer still has to be done.
  • - What unit does the cache manage (whole files
    or disk blocks)?
  • - What should be done when the cache fills up and
    something must be evicted? (One possible
    algorithm: LRU, least recently used.)
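
A minimal sketch of LRU eviction for a server-side block cache. It
assumes the cache unit is a disk block identified by a key such as
(file, block number), and that load_from_disk is a caller-supplied
function; both are illustrative choices, not taken from the slides.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Keeps at most `capacity` blocks and evicts the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # oldest entry first, newest last

    def get(self, key, load_from_disk):
        if key in self.blocks:
            self.blocks.move_to_end(key)  # hit: mark as most recently used
            return self.blocks[key]
        data = load_from_disk(key)        # miss: one disk transfer
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        return data
```

Caching whole files instead of blocks would only change what the key
identifies; the eviction logic stays the same.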

35
Caching at the client's disk (if available)
  • - The disk holds more but is slower.
  • - If large amounts of data are being used, a
    client disk cache may be better.
  • - This method isn't used in practice.
  • - In any event, most systems that do client
    caching do it in the client's main memory.

36
Cache in the client's main memory
  • There are three options for where to put the
    cache:
  • - Inside each process's address space: no sharing
    at the client; effective only if individual
    processes open and close files repeatedly
  • - In the kernel: the kernel is involved even on
    hits, so a kernel call is needed in all cases
  • - In a separate user-level cache manager: flexible
    and efficient if paging can be controlled from
    user level

37
Cache in the client's main memory
38
Cache Consistency
  • - Two clients simultaneously read the same file
    and then both modify it.
  • - When the two files are written back to the
    server, the one written last overwrites the
    other.
  • - Client caching has to be thought out fairly
    carefully
  • - There are several ways to solve the consistency
    problem:
  • - Write-through, delayed write, write-on-close
  • - Centralized control

39
Cache Consistency: Write-through algorithm
  • - When a cache entry (file or block) is modified,
    the new value is kept in the cache, but is also
    sent immediately to the server
  • -> high traffic; requires cache managers to
    check (modification time) with the server before
    providing cached content to any client
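
A minimal sketch of write-through with validation before use. As an
assumption, a per-file version counter stands in for the modification
time mentioned above, and the class names are hypothetical.

```python
class Server:
    def __init__(self):
        self.data = {}
        self.version = {}

    def write(self, name, value):
        self.data[name] = value
        self.version[name] = self.version.get(name, 0) + 1
        return self.version[name]

    def get_version(self, name):
        return self.version.get(name, 0)

    def read(self, name):
        return self.data[name], self.version[name]


class WriteThroughClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # name -> (value, version)

    def write(self, name, value):
        # Keep the new value in the cache and send it to the server at once.
        version = self.server.write(name, value)
        self.cache[name] = (value, version)

    def read(self, name):
        cached = self.cache.get(name)
        # Validate the cached copy with the server before using it.
        if cached and cached[1] == self.server.get_version(name):
            return cached[0]
        self.cache[name] = self.server.read(name)
        return self.cache[name][0]
```

Every read still costs one validation message, which is the traffic
overhead noted above.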

40
Cache Consistency: Delayed write
  • - Delayed write coalesces multiple writes: better
    performance but ambiguous semantics.
  • The client just makes a note that a file has
    been updated. Once every 30 seconds or so, all
    the file updates are gathered together and sent
    to the server all at once.
  • If a file is created, used, and deleted, and the
    entire sequence happens before it is time to send
    all modified files back to the server, the file
    never has to be sent at all.
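
A minimal sketch of delayed write, assuming whole-file updates and a
caller that invokes flush() periodically (for example every 30
seconds). The send_to_server callback and the method names are
hypothetical, and delete() here only covers files that never reached
the server.

```python
class DelayedWriteClient:
    def __init__(self, send_to_server):
        self.send_to_server = send_to_server
        self.dirty = {}  # name -> latest unsent contents

    def write(self, name, value):
        # Only note the update; a later write replaces an earlier one.
        self.dirty[name] = value

    def delete(self, name):
        # A file deleted before the next flush is never sent.
        self.dirty.pop(name, None)

    def flush(self):
        # Send all pending updates in one batch.
        for name, value in self.dirty.items():
            self.send_to_server(name, value)
        self.dirty.clear()
```

Between flushes, other clients reading from the server see stale data,
which is the ambiguity in semantics noted above.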

41
Cache Consistency: Write-on-close
  • - Write-on-close implements session semantics:
    a file is written back to the server only after
    it has been closed.
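
A minimal sketch of write-on-close, assuming the server is modeled as a
plain dictionary and each open file works on a private copy. The class
SessionFile is hypothetical.

```python
class SessionFile:
    """A client's private copy of a file, written back only on close."""

    def __init__(self, server_files, name):
        self.server_files = server_files
        self.name = name
        self.local = server_files.get(name, "")  # copy taken at open time

    def write(self, text):
        self.local += text  # invisible to other clients until close

    def close(self):
        self.server_files[self.name] = self.local  # last close wins
```

If two clients open the same file, write, and close, the server ends up
with the copy of whichever client closed last.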

42
Cache Consistency: Central control
  • - Central control: the file server keeps a
    directory of open/cached files at clients -> Unix
    semantics, but with robustness and scalability
    problems; invalidation messages are also a
    problem because clients did not solicit them

43
Replication
  • -Multiple copies of selected files.
  • 1. To increase reliability by having independent
    backups of each file.
  • 2. To allow file access to occur even if one
    file server is down. A server crash should not
    bring the entire system down until the server can
    be rebooted.
  • 3. To split the workload over multiple servers.
    With files replicated on two or more servers,
    the least heavily loaded one can be used.

44
Replication transparency
  • Replication transparency
  • - Explicit file replication: the programmer
    controls replication
  • - Lazy file replication: copies are made by the
    server in the background
  • - Group communication: all copies are made at the
    same time, in the foreground

45
46
Replication: Update protocols
  • Updating all replicas using a coordinator works
    but is not robust (if the coordinator is down, no
    updates can be performed) -> Voting: updates (and
    reads) can be performed if some specified number
    of servers agree.
  • Voting protocol
  • A version number (incremented at each write) is
    associated with each file
  • To perform a read, a client has to assemble a
    read quorum of Nr servers; similarly, a write
    quorum of Nw servers for a write
  • If Nr + Nw > N, then any read quorum will contain
    at least one copy of the most recently updated
    file version
  • For reading, the client contacts Nr active
    servers and chooses the file with the largest
    version number
  • For writing, the client contacts Nw active
    servers, asking them to write. The write succeeds
    if they all say yes.
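
A minimal sketch of the voting protocol, assuming in-memory replicas
and a client that simply takes the first active servers it finds. The
names Replica, read_quorum and write_quorum are hypothetical. The
sketch also assumes Nw > N/2, so that any write quorum contains the
latest version number; the slides state only Nr + Nw > N.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.data = None
        self.up = True


def read_quorum(replicas, nr):
    """Contact nr active servers and return the copy with the largest version."""
    active = [r for r in replicas if r.up][:nr]
    if len(active) < nr:
        raise RuntimeError("no read quorum")
    newest = max(active, key=lambda r: r.version)
    return newest.data, newest.version


def write_quorum(replicas, nw, data):
    """Contact nw active servers and install a new version on all of them."""
    active = [r for r in replicas if r.up][:nw]
    if len(active) < nw:
        raise RuntimeError("no write quorum")
    new_version = max(r.version for r in active) + 1
    for r in active:
        r.data, r.version = data, new_version
    return new_version
```

With N = 5, Nw = 4 and Nr = 2, a read that happens to include the one
stale replica still sees the new version, because the two quorums must
overlap in at least one server.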

47
Replication: Update protocols (cont.)
  • Nr is usually small (reads are frequent), but Nw
    is usually close to N (to make sure nearly all
    replicas are updated). This makes it hard to
    achieve a write quorum in the presence of server
    failures
  • Voting with ghosts: allows a write quorum to be
    established when several servers are down, by
    temporarily creating dummy (ghost) servers (at
    least one server in the quorum must be real)
  • Ghost servers are not permitted in a read quorum
    (they don't have any files)
  • When a server comes back up, it must first
    restore its copy by obtaining a read quorum

48
III. Network File System (NFS)
  • Three aspects of NFS
  • The architecture
  • The protocol
  • The implementation

49
NFS Architecture
  • Basic idea of NFS: an arbitrary collection of
    clients and servers shares a common file system.
  • A server exports one or more directories for
    access by remote clients.
  • The list of exported directories is maintained in
    /etc/exports

50
NFS Architecture
  • Clients access exported directories by mounting
    them.
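  • A diskless client can mount a remote file system
    on its root directory; other clients can mount
    remote directories elsewhere in their hierarchy.
  • To programs running on a client, there is no
    difference between a file located on a remote
    server and a file located on the local disk.
  • So the basic architectural characteristic of NFS
    is that servers export directories and clients
    mount them remotely.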

51
NFS Protocol
  • The goal of NFS is to support heterogeneous
    systems.
  • To accomplish that, NFS defines two
    client-server protocols.
  • The first NFS protocol handles mounting.
  • The second NFS protocol is for directory and file
    access.

52
NFS Protocol: Mounting
  • A client sends a path name to a server and
    requests permission to mount that directory.
  • If the path name is legal, the server returns a
    file handle to the client.
  • The file handle contains the information the
    server needs to identify the file or directory.
  • Many clients mount remote directories from
    /etc/rc at boot time, so no manual intervention
    is needed.

53
NFS Protocol: Automounting
  • Allows a set of remote directories to be
    associated with a local directory.
  • The first time a remote file is needed, the
    client sends a message to each of the servers,
    and the first one to reply wins.
  • Advantages
  • - If one of the servers is down, it is still
    possible to bring the client up.
  • - The client can try a set of servers in
    parallel.
  • On the other hand, automounting is most often
    used for read-only file systems that rarely
    change.

54
NFS Protocol: Accessing
  • Clients send messages to servers to manipulate
    directories and to read and write files.
  • Most UNIX system calls are supported by NFS, with
    the exception of OPEN and CLOSE.
  • To read a file, a client first sends the server a
    lookup message and receives a file handle.
  • A READ or WRITE call then needs only the file
    handle, the offset, and the number of bytes
    desired.
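
A minimal sketch of stateless access, assuming a server whose only
persistent state is its disk contents. The handle is simply the file
name here, whereas real NFS handles are opaque; StatelessServer and
ClientFile are hypothetical names.

```python
class StatelessServer:
    """Keeps no table of open files; every request is self-contained."""

    def __init__(self, disk):
        self.disk = disk  # persistent storage: handle -> bytes

    def lookup(self, name):
        if name not in self.disk:
            raise FileNotFoundError(name)
        return name  # hypothetical handle

    def read(self, handle, offset, count):
        return self.disk[handle][offset:offset + count]


class ClientFile:
    """The client, not the server, remembers the current file position."""

    def __init__(self, server, name):
        self.server = server
        self.handle = server.lookup(name)
        self.offset = 0

    def read(self, count):
        data = self.server.read(self.handle, self.offset, count)
        self.offset += len(data)
        return data
```

Because each READ carries the handle and offset, replacing the server
object (simulating a crash and reboot) does not disturb a client that
is partway through a file.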

55
NFS Protocol: Accessing (cont.)
  • Advantages
  • Servers don't have to remember anything about
    open connections between calls
  • Stateless: when a server crashes and recovers, no
    information about open files is lost
  • In contrast: a stateful server

56
NFS Protocol: Security
  • Problem with statelessness: locks can't be
    associated with open files
  • NFS uses the UNIX protection mechanism, with rwx
    bits
  • Alternatively, public key cryptography can be
    used
  • Information about all the keys is maintained by
    NIS (Network Information Service)
  • NIS's function is to store (key, value) pairs,
    such as mappings from user names to passwords and
    from machine names to network addresses

57
NFS Implementation
58
NFS Implementation
  • System call layer
  • Handles calls like OPEN, READ, and CLOSE.
  • Virtual file system (VFS) layer
  • Maintains a table with one entry for each open
    file
  • Each entry is a v-node (virtual i-node)

59
NFS Implementation: v-node usage
  • Mount
  • The system administrator calls the mount program
  • The mount program makes a MOUNT system call
  • The kernel asks the NFS client to create an
    r-node (remote i-node) in its internal tables to
    hold the file handle
  • The v-node points to the r-node

60
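NFS Implementation: v-node usage (cont.)
  • OPEN
  • At some point while parsing the path name, the
    kernel reaches the directory on which the remote
    file system is mounted.
  • The kernel asks the NFS client code to open the
    file
  • The NFS client looks up the remaining part of the
    name and reports back to the VFS layer
  • The VFS layer puts in its table a v-node that
    points to the r-node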

61
NFS Implementation: v-node usage (cont.)
  • READ
  • The caller is given a file descriptor for the
    remote file
  • The VFS layer locates the corresponding v-node
  • Transfers between client and server are made in
    large chunks, normally 8192 bytes
  • Caching
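
A minimal sketch of how a VFS layer might dispatch a READ through a
v-node to either local data or an r-node. All class names are
hypothetical, local_data stands in for a local i-node, and the r-node
is modeled as a (server, file handle) pair. Remote transfers use the
8192-byte chunk size mentioned above; chunk caching is left out.

```python
CHUNK = 8192  # transfer size named in the slides


class RemoteServer:
    """Stand-in for an NFS server holding one file per handle."""

    def __init__(self, files):
        self.files = files
        self.calls = 0

    def read(self, handle, offset, count):
        self.calls += 1
        return self.files[handle][offset:offset + count]


class VNode:
    def __init__(self, local_data=None, rnode=None):
        self.local_data = local_data  # set for local files
        self.rnode = rnode            # (server, handle) for remote files


class VFS:
    def __init__(self):
        self.table = {}   # file descriptor -> v-node
        self.next_fd = 3

    def open(self, vnode):
        fd = self.next_fd
        self.next_fd += 1
        self.table[fd] = vnode
        return fd

    def read(self, fd, offset, count):
        vnode = self.table[fd]
        if vnode.rnode is None:
            return vnode.local_data[offset:offset + count]
        server, handle = vnode.rnode
        out = b""
        while len(out) < count:
            pos = offset + len(out)
            start = pos - pos % CHUNK  # align the request to a chunk boundary
            chunk = server.read(handle, start, CHUNK)
            piece = chunk[pos - start:pos - start + count - len(out)]
            if not piece:
                break  # end of file
            out += piece
        return out
```

The caller uses the same read call for both kinds of file, which is the
point of the v-node indirection.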

62
IV. Trends in Distributed File Systems
  • Some developments likely to change file systems:
  • New hardware
  • Scalability
  • WAN
  • Mobile users
  • Fault tolerance
  • Multimedia

63
New Hardware
  • Well-designed hardware can help solve problems

64
Scalability
  • Distributed file systems are growing larger. Old
    algorithms may not work and may cause bottlenecks
  • A general way to solve this problem is to
    partition the system into smaller units that are
    relatively independent

65
WAN
  • Most current work on distributed systems focuses
    on LAN-based systems, but these will be
    interconnected to form transparent distributed
    systems covering countries and continents. So
    what kind of file system would be needed to serve
    the whole world?
  • A larger system brings more variety. For example,
    what format should be used for files containing
    floating-point numbers?

66
Mobile Users
  • Laptops, pocket PCs, and smartphones can be found
    everywhere these days, and they are multiplying
    like rabbits. However, their connection may not
    be good at all.
  • One solution is based on caching.
  • Remote control

67
Fault Tolerance
  • If a system goes down for an hour, there can be
    many serious problems, so the demand for systems
    that essentially never fail will grow.
  • File replication becomes an essential requirement.

68
Multimedia
  • Real-time conferencing, video on demand, and
    other multimedia applications will need
    completely different file systems.