1
Distributed File Systems
  • Chapter 16

2
Distributed Systems
  • Introduction: advantages of distributed systems
  • 15. Structures: network types, design
  • 16. File Systems: naming, cache updating
  • 17. Coordination: event ordering, mutual
    exclusion

3
Multi-CPU Systems
[Figure: three multi-CPU organizations — a shared-memory multiprocessor,
a message-passing multicomputer, and a wide-area distributed system.
Tanenbaum, Modern Operating Systems, 2nd Ed., p. 505]
4
Examples of Multi-CPU Systems
  • Multiprocessor: quad-CPU PC
  • Multicomputer: 512 nodes in a room working on
    pharmaceutical modelling
  • Distributed system: thousands of machines
    loosely cooperating over the Internet

Tanenbaum, p. 549
5
Types of Multi-CPU Systems
6
Interconnect Topologies
Grid
Single switch
Ring
Hypercube
Cube
Smallest diameter Most links
Double Torus
Tanenbaum, Modern Operating Systems, 2nd Ed., p.
528
7
Chapter 16 Distributed-File Systems
  • Background
  • Naming and Transparency
  • Remote File Access
  • Stateful versus Stateless Service
  • File Replication
  • Example Systems

8
Background
  • Distributed file system (DFS): a distributed
    implementation of the classical time-sharing
    model of a file system, where multiple users
    share files and storage resources.
  • A DFS manages a set of dispersed storage devices.
  • Overall storage space is composed of different,
    remotely located, smaller storage spaces.
  • A component unit is the smallest set of files
    that can be stored on a single machine,
    independently of other units.
  • There is usually a correspondence between
    constituent storage spaces and sets of files.

9
DFS Parts
  • Service: a software entity running on one or more
    machines and providing a particular type of
    function to a priori unknown clients.
  • Server: service software running on a single
    machine.
  • Client: a process that can invoke a service using
    a set of operations that forms its client
    interface.
  • So a file system provides file services to
    clients.

10
DFS Features
  • Client Interface
  • A set of primitive file operations (create,
    delete, read, write).
  • Transparency
  • Local and remote files are indistinguishable
  • The multiplicity of its servers and storage
    devices should appear invisible.
  • Response time is ideally comparable to that of a
    local file system

11
DFS Implementation
  • Various Implementations
  • Part of a distributed operating system, or
  • A software layer managing communication between
    conventional Operating Systems

Tanenbaum, p. 551
12
Naming and Transparency
  • Naming: a mapping between logical and physical
    objects.
  • Multilevel mapping: an abstraction of a file that
    hides the details of how and where on the disk
    the file is actually stored.
  • A transparent DFS hides the location in the
    network where the file is stored.
  • A file can be replicated in several sites:
  • the mapping returns a set of the locations of
    this file's replicas
  • both the existence of multiple copies and their
    location are hidden.

13
Naming Structures
  • Location transparency: the file name does not
    reveal the file's physical storage location.
  • e.g. /server1/dir/dir2/x says that the file is
    located on server1, but it does not tell where
    that server is located
  • The file name still denotes a specific, although
    hidden, set of physical disk blocks.
  • Convenient way to share data.
  • Can expose correspondence between component units
    and machines.
  • However, if file x is large, the system might
    like to move x from server1 to server2, but the
    path name would change from /server1/dir/dir2/x
    to /server2/dir/dir2/x

14
Naming Structures
  • Location independence: the file name does not need
    to be changed when the file's physical storage
    location changes.
  • Better file abstraction.
  • Promotes sharing of the storage space itself.
  • Separates the naming hierarchy from the
    storage-devices hierarchy, allowing file
    migration
  • Difficult to achieve; few experimental examples
    only
  • (e.g. Andrew File System)
  • Even remote mounting will not achieve location
    independence, since it is not normally possible
    to move a file from one file group (the unit of
    mounting) to another and still be able to use
    the old path name.

15
Naming Schemes Three Main Approaches
  • Combination names
  • Files named by a combination of their host name
    and local name
  • Guarantees a unique systemwide name.
  • e.g. host:local-name
  • Mounting file systems
  • Attach remote directories to local directories,
    giving the appearance of a coherent directory
    tree
  • Automount allows mounts to be done on demand
  • Global name structure
  • Total integration of the component file systems.
  • Spans all the files in the system.
  • Location-independent file identifiers link files
    to component units

16
Types of Middleware
  • Document-based
  • Each page has a unique address
  • Hyperlinks within each page point to other pages
  • File System based
  • Distributed system looks like a local file
    system
  • Shared Object based
  • All items are objects, bundled with access
    procedures called methods
  • Coordination-based
  • The network appears as a large, shared memory

17
Document-based Middleware
  • Make a distributed system look like a giant
    collection of hyperlinked documents
  • E.g. hyperlinks on web pages.
  • Steps in accessing a web page (sketched below)
  • http://www.acm.org/dl/faq.html
  • Browser asks DNS for IP address of www.acm.org
  • DNS replies with 199.222.69.151
  • Browser connects by TCP to Port 80 of
    199.222.69.151
  • Browser requests file dl/faq.html
  • TCP connection is released
  • Browser displays all text in dl/faq.html
  • Browser fetches and displays all images in
    dl/faq.html

18
File-system based Middleware
  • Make a distributed system look like a great big
    file system
  • Single global file system, with users all over
    the world able to read and write files for which
    they have authorization

[Figure: two client/server architectures — the upload/download model
(e.g. AFS), where the client fetches the old file and stores the new
file back, and the remote access model (e.g. NFS), where individual
operations go to the server.]
19
Remote File Access
  • Reduce network traffic by retaining recently
    accessed disk blocks in a cache, so that repeated
    accesses to the same information can be handled
    locally.
  • If needed data not already cached, a copy of data
    is brought from the server to the user.
  • Accesses are performed on the cached copy.
  • Files identified with one master copy residing at
    the server machine, but copies of (parts of) the
    file are scattered in different caches.
  • Cache-consistency problem: keeping the cached
    copies consistent with the master file.
  • In effect, network virtual memory, with backing
    store at a remote server (see the sketch below).
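A minimal sketch of the caching idea, with a plain map as the cache and
a hypothetical fetchFromServer standing in for the RPC that brings a
copy of the data from the server:

    import java.util.HashMap;
    import java.util.Map;

    public class BlockCache {
        private final Map<Long, byte[]> cache = new HashMap<>();

        public byte[] read(long blockNo) {
            // Repeated accesses are handled locally; only misses go remote.
            return cache.computeIfAbsent(blockNo, this::fetchFromServer);
        }

        private byte[] fetchFromServer(long blockNo) {
            // Stand-in for the RPC that copies data from the server.
            System.out.println("miss: fetching block " + blockNo);
            return new byte[4096];
        }

        public static void main(String[] args) {
            BlockCache c = new BlockCache();
            c.read(7);  // miss: server contacted
            c.read(7);  // hit: no network traffic
        }
    }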

20
Network Cache Location
  • Disk Cache
  • More reliable: survives crashes.
  • Main-Memory Cache
  • Permit workstations to be diskless.
  • Data can be accessed more quickly.
  • Technology trend to bigger, less expensive
    memories.
  • Server caches (used to speed up disk I/O) are in
    main memory regardless of where user caches are
    located; using main-memory caches on the user
    machine permits a single caching mechanism for
    servers and users.
  • e.g. NFS has memory caching, with an optional
    disk cache

21
Cache Update Policy
  • Write-through policy: write data through to disk
    as soon as they are placed on any cache.
    Reliable, but poor write performance.
  • Delayed-write policy: modifications written to
    the cache and then written through to the server
    later.
  • Fast: write accesses complete quickly.
  • Less reliable: unwritten data are lost whenever a
    user machine crashes.
  • Update on flush from cache
  • But flushes happen at irregular intervals
  • Update on regular scan
  • Scan the cache and flush blocks that have been
    modified since the last scan (NFS).
  • Write-on-close: write data back to the server
    when the file is closed (AFS).
  • Best for files that are open for long periods
    and frequently modified.
  • (The two basic policies are contrasted in the
    sketch below.)
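A minimal sketch contrasting write-through with delayed write; the
"server" here is just a local map standing in for the real file server:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class UpdatePolicy {
        private final Map<Long, byte[]> cache  = new HashMap<>();
        private final Map<Long, byte[]> server = new HashMap<>(); // stand-in
        private final Set<Long> dirty = new HashSet<>();

        void writeThrough(long block, byte[] data) {
            cache.put(block, data);
            server.put(block, data);  // reliable: at the server before returning
        }

        void delayedWrite(long block, byte[] data) {
            cache.put(block, data);   // fast: completes locally
            dirty.add(block);         // lost if this machine crashes before a flush
        }

        void flush() {                // run on a regular scan (NFS) or at close (AFS)
            for (long b : dirty) server.put(b, cache.get(b));
            dirty.clear();
        }

        public static void main(String[] args) {
            UpdatePolicy p = new UpdatePolicy();
            p.writeThrough(1, new byte[] {1});
            p.delayedWrite(2, new byte[] {2});
            p.flush();                // block 2 reaches the server only now
        }
    }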

22
Consistency
  • Is locally cached copy of the data consistent
    with the master copy? How to verify validity of
    cached data?
  • Client-initiated approach
  • Client initiates a validity check.
  • Server checks whether the local data are
    consistent with the master copy.
  • Check before every access, or timed checks
  • Server-initiated approach
  • Server records, for each client, the (parts of)
    files it caches.
  • When the server detects a potential
    inconsistency, it reacts
  • e.g. when the same file is open for read and
    write on different clients (see the sketch below)
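A minimal sketch of the client-initiated approach, assuming a
hypothetical lastModified RPC; a real DFS would compare whatever
version information its protocol carries (NFS, for instance, checks
file attributes):

    public class ValidityCheck {
        interface Server { long lastModified(String path); } // hypothetical RPC

        private long cachedVersion = -1;
        private byte[] cachedData;

        byte[] read(String path, Server server) {
            long current = server.lastModified(path); // client asks before access
            if (cachedData == null || current > cachedVersion) {
                cachedVersion = current;
                cachedData = new byte[0];             // refetch the data here
            }
            return cachedData;                        // verified as consistent
        }

        public static void main(String[] args) {
            ValidityCheck c = new ValidityCheck();
            c.read("/dir/x", path -> 42L);  // first access: fetches
            c.read("/dir/x", path -> 42L);  // unchanged: served from cache
        }
    }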

23
Caching and Remote Service
  • Caching
  • Faster, especially with locality in file
    access
  • Servers contacted only occasionally (rather than
    for each access).
  • Reduced server load and network traffic
  • Enhanced potential for scalability.
  • Lower network overhead, as data is transmitted in
    bigger chunks
  • Remote server method
  • Useful for diskless machines
  • Avoids cache-consistency problem
  • Inter-machine interface mirrors the local
    user-file-system interface

24
Stateful File Service
  • Mechanism.
  • Client opens a file.
  • Server fetches information about the file from
    its disk, stores it in its memory, and gives the
    client a connection identifier unique to the
    client and the open file.
  • Identifier is used for subsequent accesses until
    the session ends.
  • Server must reclaim the main-memory space used by
    clients who are no longer active.
  • Increased performance.
  • Fewer disk accesses.
  • A stateful server knows if a file was opened for
    sequential access and can thus read ahead the
    next blocks (see the sketch below).
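A minimal sketch of this mechanism; the connection-identifier table and
the read-ahead hook are illustrative, not any particular server's
implementation:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.atomic.AtomicInteger;

    public class StatefulServer {
        static class OpenFile {                  // per-client, per-open state
            String path;
            long position;
            boolean sequential = true;
        }

        private final Map<Integer, OpenFile> openFiles = new HashMap<>();
        private final AtomicInteger nextId = new AtomicInteger();

        int open(String path) {                  // fetch file info, keep in memory
            OpenFile f = new OpenFile();
            f.path = path;
            int id = nextId.incrementAndGet();   // connection id for this session
            openFiles.put(id, f);
            return id;
        }

        byte[] read(int id, int len) {
            OpenFile f = openFiles.get(id);      // state recalls the position...
            f.position += len;
            if (f.sequential) {
                // ...and sequential access lets the server read ahead here
            }
            return new byte[len];
        }

        void close(int id) { openFiles.remove(id); } // reclaim the state

        public static void main(String[] args) {
            StatefulServer s = new StatefulServer();
            int fd = s.open("/dir/x");
            s.read(fd, 4096);
            s.close(fd);
        }
    }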

25
Stateless File Server
  • Mechanism
  • Each request is self-contained.
  • No state information retained between requests.
  • Each request identifies the file and position in
    the file.
  • File open and close are local to the client
  • Design implications
  • Reliable, survives server crashes
  • Slower, with longer request messages
  • System-wide file names needed, to avoid name
    translation
  • Idempotent: repeated file requests should leave
    the server unchanged (see the request sketch
    below)
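A minimal sketch of a self-contained request; the field names are
illustrative, not a real wire format:

    public class ReadRequest {
        final String fileName;  // system-wide name: no server-side state needed
        final long offset;      // position supplied by the client every time
        final int length;

        ReadRequest(String fileName, long offset, int length) {
            this.fileName = fileName;
            this.offset = offset;
            this.length = length;
        }

        public static void main(String[] args) {
            // Serving this reads but never modifies server state, so a client
            // may safely resend it after a timeout (idempotent).
            ReadRequest r = new ReadRequest("/server1/dir/dir2/x", 8192, 4096);
            System.out.println(r.fileName + " @" + r.offset + " +" + r.length);
        }
    }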

26
Recovery from Failures
  • Stateful server
  • Server failure loses its volatile state
  • Restore state by recovery protocol in dialog with
    clients, or
  • Abort operations that were underway when the
    crash occurred.
  • Client failure
  • Server needs to be aware of client failures in
    order to reclaim space allocated to record the
    state of crashed client processes (orphan
    detection and elimination).
  • Stateless server
  • Server failure and recovery almost unnoticeable.
  • Newly refreshed server can respond to a
    self-contained request without any difficulty.

27
File Replication
  • Replicas of the same file reside on
    failure-independent machines.
  • Improves availability, shortens service time.
  • A replicated file name is mapped to a particular
    replica.
  • Existence of replicas should be invisible to
    higher levels.
  • Replicas are distinguished from one another by
    different lower-level names.
  • Updates: replicas of a file denote the same
    logical entity,
  • thus an update to any replica must be reflected
    in all other replicas (e.g. Locus OS).
  • Demand replication: reading a nonlocal replica
    causes it to be cached locally, thereby
    generating a new nonprimary replica.
  • Updates are made to the primary copy; the others
    are invalidated (e.g. Ibis)

28
Andrew Distributed Computing Environment
  • History
  • under development since 1983 at Carnegie-Mellon
    University.
  • Name honours Andrew Carnegie and Andrew Mellon
  • Highly scalable
  • the system is targeted to span over 5000
    workstations.
  • Distinguishes between client machines
    (workstations) and dedicated server machines.
  • Servers and clients run slightly modified UNIX
  • Workstation LAN clusters interconnected by a WAN.

29
Andrew File System (AFS)
  • Clients are presented with a partitioned space of
    file names: a local name space and a shared name
    space.
  • Dedicated servers, called Vice, present the
    shared name space to the clients as a
    homogeneous, identical, and location transparent
    file hierarchy.
  • The local name space is the root file system of a
    workstation, from which the shared name space
    descends.
  • Workstations run the Virtue protocol to
    communicate with Vice, and are required to have
    local disks where they store their local name
    space.
  • Servers collectively are responsible for the
    storage and management of the shared name space.

30
AFS File Operations
  • Andrew caches entire files from servers.
  • A workstation interacts with Vice servers only
    during opening and closing of files.
  • Venus runs locally in the kernel on each
    workstation
  • Caches files from Vice when they are opened,
  • Stores modified copies of files back when they
    are closed.
  • Caches contents of directories and symbolic
    links, for path-name translation
  • Reading and writing bytes of a file
  • Done by the kernel without Venus intervention on
    the cached copy.

31
Types of Middleware
  • Document-based (e.g. Web)
  • Each page has a unique address
  • Hyperlinks within each page point to other pages
  • File System based (e.g. NFS, AFS)
  • Distributed system looks like a local file
    system
  • Shared Object based (e.g. CORBA, Globe)
  • All items are objects, bundled with access
    procedures called methods
  • Coordination-based (e.g. Linda, Jini)
  • The network appears as a large, shared memory

32
Shared Object based Middleware
  • Objects
  • Everything is an object, a collection of
    variables bundled with access procedures called
    methods
  • Processes invoke methods to access the variables
  • Common Object Request Broker Architecture (CORBA)
  • Client processes on client machines can invoke
    operations on objects on (possibly) remote server
    machines
  • To match objects from different machines, Object
    Request Brokers (ORBs) are interposed between
    client and server to allow them to match up
  • Interface Definition Language (IDL)
  • Tells what methods the object exports,
  • Tells what parameter types each object expects
    (see the sketch below)
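A conceptual sketch only: the rough shape of what an IDL compiler
generates on the client side. The StockQuote interface and its method
are invented for illustration; a real CORBA language mapping wraps an
interface like this in ORB plumbing (stubs, helpers, object references):

    // IDL (illustrative):
    //   interface StockQuote { double price(in string symbol); };

    public interface StockQuote {
        double price(String symbol);  // the method the object exports,
                                      // with the parameter type it expects
    }

A client holding a reference calls price() on the stub as if the object
were local; the stub and ORB marshal the call to the server, where the
skeleton unmarshals it and invokes the real implementation.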

33
CORBA Model
[Figure: CORBA model — client code and client stub on the client
machine; skeleton, object adapter, and server code on the server
machine; the two ORBs communicate over IIOP. Tanenbaum, p. 567]
34
CORBA
  • Allows different client and server applications
    to communicate
  • e.g. a C++ program can use CORBA to access a
    COBOL database
  • ORB (Object Request Broker)
  • implements the interface specified by the IDL
  • ORB is on both client and server side
  • IIOP (Internet InterORB Protocol)
  • specifies how ORBs can communicate
  • Stub: client-side library of IDL object specs
  • Skeleton: server-side procedure for an
    IDL-specified object
  • Object adapter
  • wrapper that registers object,
  • generates object references,
  • activates the object

35
Remote Method Invocation
  • Procedure
  • Process creates CORBA object, receives its
    reference
  • Reference is available to be passed to other
    processes, or stored in an object database for
    lookup
  • Client process acquires a reference to the object
  • Client process marshals required parameters into
    a parcel
  • Client process contacts client ORB
  • Client ORB sends the parcel to the server ORB
  • Server ORB arranges for invocation of method on
    the object

36
Globe System
  • Scope
  • Scales to 1 billion users and 1 trillion objects
  • e.g. stock prices, sports scores
  • Method
  • Replicate object, spread load over replicas
  • Every globe object has a class object with its
    methods
  • The object interface is a table of pointers, each
    a <method pointer, state pointer> pair
  • State pointers can point to interfaces such as
    mailboxes, each with its own language or function
  • e.g. business mail, personal mail
  • e.g. languages such as C, C++, Java, assembly

37
Globe Object
[Figure: a Globe object — a class object containing the methods, plus
the separate states of Mailbox 1 and Mailbox 2, each accessed through
its own interface.]
38
Accessing a Globe Object
  • Reading
  • Process looks it up, finds a contact address (e.g.
    IP address, port)
  • Security check, then object binding
  • Class object (code) loaded into the caller's
    address space
  • Instantiate a copy of its state
  • Process receives a pointer to its standard
    interface
  • Process invokes methods using the interface
    pointer
  • Writing
  • According to object replication policy
  • Obtain a sequence number from the sequencer
  • Multicast a message containing the sequence
    number, operation name and parameters to all
    other processes bound to the object
  • Apply writes in sequence order to the master, and
    update the replicas (see the sketch below)
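A minimal sketch of this write path; the sequencer is reduced to a
shared counter and the multicast is elided, so only the ordering logic
remains:

    import java.util.concurrent.atomic.AtomicLong;

    public class SequencedWrites {
        static class Sequencer {                 // hands out the global write order
            private final AtomicLong next = new AtomicLong();
            long nextSeq() { return next.incrementAndGet(); }
        }

        static class Replica {
            private long lastApplied = 0;
            synchronized void apply(long seq, String op)
                    throws InterruptedException {
                while (seq != lastApplied + 1)   // hold early arrivals back
                    wait();
                System.out.println("apply #" + seq + ": " + op); // update state
                lastApplied = seq;
                notifyAll();                     // release the next write in order
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Sequencer s = new Sequencer();
            Replica r = new Replica();
            long a = s.nextSeq(), b = s.nextSeq(); // obtain sequence numbers first
            r.apply(a, "write x");                 // every replica applies writes
            r.apply(b, "write y");                 // in the same global order
        }
    }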

39
Globe Object
[Figure: structure of a Globe object — an interface over control,
semantic, replication, communication, and security subobjects, layered
above the operating system and message transport.]
40
Subobjects in a Globe Object
  • Control subobject
  • Accepts incoming invocations, distributes tasks
  • Semantics subobject
  • Actually does the work required by the object
    interface; the only part actually programmed by
    the coder
  • Replication subobject
  • Manages object replication
  • (e.g. all active, or master-slave)
  • Security subobject: implements the security
    policy
  • Communication subobject: network protocols (e.g.
    IPv4)

41
Coordination-based Middleware
  • Linda
  • Developed at Yale, 1986
  • Users appear to share a big memory, known as
    tuple space
  • Processes on any machine can insert tuples into
    tuple space or remove tuples from tuple space
  • Publish/Subscribe, 1993
  • Processes connected by a broadcast network
  • Each process can be a producer of information, a
    consumer, or both
  • Jini
  • From Sun Microsystems, 1999
  • Self-contained Jini devices are plugged into a
    network, not a computer
  • Each device offers or uses services

42
Linda
  • Tuples
  • Like a structure in C: pure data, with no
    associated methods
  • e.g. ("abc", 2, 5)
  • ("matrix-1", 1, 6, 3.14)
  • ("family", "is-sister", "Stephany", "Roberta")
  • Operations (sketched below)
  • out: put a tuple into tuple space, e.g.
    out("abc", 2, 5)
  • in: retrieve a tuple from tuple space, e.g.
    in("abc", 2, ?i)
  • addressed by content rather than <name, address>
  • tuple space is searched for a match to the
    specified contents
  • the process is blocked until a match is found
  • read: copy a tuple, but leave it in tuple space
  • eval: evaluate the tuple parameters and put the
    resulting tuple out
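A minimal tuple-space sketch in Java rather than real Linda: out() adds
a tuple, in() blocks until a match appears and removes it, and null
template fields play the role of the ?i formals; eval is omitted:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class TupleSpace {
        private final List<Object[]> space = new ArrayList<>();

        public synchronized void out(Object... tuple) {
            space.add(tuple);
            notifyAll();                       // wake blocked in() callers
        }

        public synchronized Object[] in(Object... template)
                throws InterruptedException {
            while (true) {
                for (Object[] t : space)
                    if (matches(t, template)) { space.remove(t); return t; }
                wait();                        // block until a match is found
            }
        }

        private boolean matches(Object[] t, Object[] tmpl) {
            if (t.length != tmpl.length) return false;
            for (int i = 0; i < t.length; i++)   // null field = wildcard
                if (tmpl[i] != null && !tmpl[i].equals(t[i])) return false;
            return true;
        }

        public static void main(String[] args) throws InterruptedException {
            TupleSpace ts = new TupleSpace();
            ts.out("abc", 2, 5);
            Object[] got = ts.in("abc", 2, null);  // null plays the role of ?i
            System.out.println(Arrays.toString(got));
        }
    }

Matching is by content, not by address: in() scans the whole space for
any tuple whose fields agree with the template.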

43
Publish/subscribe
  • Publishing
  • New information broadcast as a tuple on the
    network
  • Tuple has a subject line with multiple fields
    separated by periods
  • Processes can subscribe to certain subjects
  • Subscribing
  • The tuple daemon on each machine copies all
    broadcast tuples into its RAM
  • It inspects each subject line and forwards a copy
    to each interested process (see the matching
    sketch below).

[Figure: producers and consumers on a LAN, with a daemon on each
machine copying tuples and an information router forwarding them across
the WAN.]
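A minimal sketch of the daemon's subject-line matching; the wildcard
rule here (* matches exactly one field) is illustrative, not a
particular product's syntax:

    public class SubjectMatch {
        static boolean matches(String subscription, String subject) {
            String[] want = subscription.split("\\.");
            String[] have = subject.split("\\.");
            if (want.length != have.length) return false;
            for (int i = 0; i < want.length; i++)
                if (!want[i].equals("*") && !want[i].equals(have[i]))
                    return false;
            return true;
        }

        public static void main(String[] args) {
            System.out.println(matches("sports.*", "sports.tennis")); // forward
            System.out.println(matches("sports.*", "news.weather"));  // drop
        }
    }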
44
Jini
  • Network-centric computing
  • An attempt to change from CPU-centric computing
  • Many self-contained Jini devices offer services
    to the others
  • e.g. Computer, cell phone, printer, palmtop, TV
    set, stereo
  • A loose confederation of devices, with no central
    administration
  • Coded in JVM (Java Virtual Machine language)
  • Joining a Jini federation
  • The device broadcasts a message asking for a
    lookup service
  • Uses the discovery protocol to find the service
  • The lookup service sends code to register the new
    device
  • The device acquires a lease to register for a
    fixed time
  • The registration proxy can be sent to other
    devices looking for the service

45
Jini
  • JavaSpaces
  • Entries: like Linda tuples, but strongly typed
  • e.g. an Employee entry could have <string, integer,
    integer, boolean> to accommodate <name,
    department, telephone, works fulltime>
  • Operations (sketched below)
  • write: put an entry into a JavaSpace, specifying
    the lease time
  • read: copy an entry that matches a template out
    of a JavaSpace
  • take: copy and remove an entry that matches a
    template
  • notify: notify the caller when a matching entry
    is written
  • Transactions can be atomic, so multiple methods
    can be safely grouped: all or none will execute
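A minimal sketch against the real JavaSpaces API
(net.jini.space.JavaSpace), assuming a space proxy has already been
obtained through Jini lookup (not shown) and the Jini libraries are on
the classpath. The Employee entry mirrors the slide's example; matching
fields must be public wrapper types, and null fields in a template act
as wildcards:

    import net.jini.core.entry.Entry;
    import net.jini.core.lease.Lease;
    import net.jini.space.JavaSpace;

    public class Employee implements Entry {
        public String  name;
        public Integer department;
        public Integer telephone;
        public Boolean worksFullTime;
        public Employee() {}                    // required no-arg constructor

        static void demo(JavaSpace space) throws Exception {
            Employee e = new Employee();
            e.name = "Roberta"; e.department = 7;
            e.telephone = 5551234; e.worksFullTime = true;
            space.write(e, null, Lease.FOREVER);        // write with a lease

            Employee tmpl = new Employee();             // nulls match anything
            tmpl.department = 7;
            Entry copy  = space.read(tmpl, null, 1000); // copy a matching entry
            Entry taken = space.take(tmpl, null, 1000); // copy and remove it
        }
    }

Passing a Transaction object instead of null in these calls groups the
operations atomically: all of them take effect or none do.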