The Hashing Approach to the Internet File System Problem - PowerPoint PPT Presentation

1
The Hashing Approach to the Internet File System
Problem
  • By - Gabriel Mizrahi
  • Supervised by - Dr. Yosi Ben-Asher

2
Purpose
In this work we consider the problem of
developing an efficient distributed file system
over the Internet. It should serve many clients
performing concurrent I/O to a virtual shared
H.D. created from many local disks geographically
dispersed over the Internet. The key feature of
the proposed Internet File System (IFS) is that
the mapping between files and physical blocks is
based on hashing rather than on global data
structures.
3
Topics of Discussion
  • IFS requirements.
  • Overview of existing File Systems.
  • The Meta Data problem on DFS.
  • Usage of DMM simulations.
  • IFS components and APIs.
  • Semantics and Cache Consistency.
  • VOD models.
  • Experimental results and system simulation.
  • Conclusions.

4
IFS Requirements
  • It should allow a dynamic set of unknown clients
    to access files over the Internet concurrently.
  • The storage space should simulate a shared H.D.
    created from many local disks on remote servers.
  • It should support extremely fast and fully
    distributed mapping between files and physical
    blocks.
  • It should support consistent cooperative caching.
  • There should be good load balancing in the
    system.
  • Due to the relatively long communication times,
    each access to a file (read/write) should involve
    very few servers.

5
  • No central data structures should be used

6
IFS vs. DFS configurations
7
Overview of existing DFS
  • NFS
  • Files are distributed between the servers.
  • Timing-dependent semantics (3 sec. for files, 30
    sec. for directories).
  • Completely stateless service.
  • A network service that lets independent
    workstations share remote files transparently.
  • AFS
  • Uses Session semantics.
  • Whole-file caching in local disks.
  • I/O is served directly from the cache without
    involving servers.
  • Server-initiated approach for cache validation.
  • Designed for large-scale systems.

8
  • xFS
  • A serverless design.
  • Distribution of data and metadata across multiple
    machines, including clients.
  • Token-based cache consistency.
  • Implements cooperative caching.
  • Sprite
  • Applies UNIX semantics by disabling caches.
  • Designed for an environment consisting of
    diskless workstations with large main memories.
  • GFS
  • Designed for shared systems over a SAN.
  • Uses Extendible Hashing for the metadata
    implementation.
  • Implements locks on the storage devices to
    maintain coherence of files and metadata.

9
Issues to consider for DFS
  • Generally organized according to the
    client-server model.
  • Client-side caching support.
  • Support for server replication to meet
    scalability, reliability and load-balancing
    requirements.

DFS main differences
  • Data and metadata distribution.
  • The semantics of file sharing.
  • Client cache granularity and management.

10
The Meta Data problem on DFS
  • Searching over directories and i-node trees.
  • Allowing concurrent read/write and delete
    operations while keeping the metadata consistent.
  • Accessing the metadata should not pass through
    too many servers.

Ways of achieving these goals
  • Centralizing
  • Replication
  • Partitioning

11
The Meta Data Solution of IFS
  • Do not use search trees at all.
  • Base the mapping on hash values, viewing the
    shared disk as a large hash table partitioned
    among the servers' disks.
  • No need to maintain a global list of free blocks.
  • Servers and clients work independently and never
    exchange messages.

12
The IFS Metadata solution scheme
13
The DMM Model
  • Let n be the number of processors, each having a
    local memory module.
  • Let m be the number of data items in a global
    shared address space.
  • The goal is to find a scheme that distributes the
    shared memory cells over the processors' memory
    modules such that any set of addresses accessed by
    the processors is partitioned equally between the
    memory modules.
  • The result is that the load on, and access time
    to, the simulated shared memory are minimized.
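The DMM goal above can be illustrated with a toy simulation (my own sketch, not from the thesis): a randomly chosen affine hash function spreads an arbitrary batch of address requests nearly evenly across the memory modules. The constants and names here are illustrative assumptions.

```python
import random
from collections import Counter

def simulate_round(n_modules=8, batch=10_000, seed=1):
    """Distribute one batch of shared-memory requests over the modules."""
    rng = random.Random(seed)
    k = 1_000_003  # a prime larger than the simulated address space
    # Random placement via an affine hash stands in for the DMM scheme.
    a, b = rng.randrange(1, k), rng.randrange(k)
    place = lambda x: ((a * x + b) % k) % n_modules
    requests = rng.sample(range(1_000_000), batch)
    return Counter(place(x) for x in requests)

load = simulate_round()
# Each module receives close to batch / n_modules = 1250 requests.
print(min(load.values()), max(load.values()))
```

Running this shows per-module loads tightly clustered around the ideal average, which is exactly the equal-partitioning property the DMM scheme aims for.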

14
Usage of DMM Simulations
  • The constraints of IFS (no communication between
    the servers, and load balancing) resemble those
    involved in the problem of simulating shared
    memory on a DMM (studied since around 1984).
  • We observe that simulating a shared virtual disk
    over the Internet is similar to the way shared
    memory is implemented on a DMM.
  • To address the above constraints, the DMM
    simulation uses a complex hashing scheme which,
    translated to our problem of simulating a shared
    virtual H.D., makes use of:
  • Pseudo-random mapping of logical blocks to
    servers.
  • Replication of physical blocks of the virtual H.D.

15
Random distribution of items to servers improves
the loads of two sets of requests
16
Previously known results about DMM simulation
schemes
  • The results on DMM simulations show that, with
    high probability, the number of accesses to a
    memory module made in one cycle does not exceed:
  • Mehlhorn and Vishkin (1984): O(log n / loglog n).
  • Upfal and Wigderson (1987): O(log n (loglog n)^2).
  • Meyer auf der Heide et al. (1993): O(loglog n
    log* n).

17
The hashing scheme used in IFS
  • There is a family of universal hash functions
    H = { h_a,b(x) = ((a*x + b) mod k) mod p :
    a, b ∈ {0, 1, ..., k−1} }
  • k is a prime, k ≥ vhd.
  • vhd is the size of the virtual hard disk.
  • p is the number of servers in IFS.
  • At the beginning we choose three functions
    h1, h2, h3 from H at random, by choosing their
    coefficients a and b at random.
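A minimal sketch of this hash family in Python, assuming illustrative values for k and p (the thesis does not specify them):

```python
import random

K = 1_000_003   # prime k >= vhd (virtual disk size in blocks); assumed value
P = 5           # p, the number of IFS servers; assumed value

def make_hash(rng):
    """Draw h_{a,b}(x) = ((a*x + b) mod k) mod p with random a, b."""
    a = rng.randrange(1, K)
    b = rng.randrange(K)
    return lambda x: ((a * x + b) % K) % P

rng = random.Random(7)
h1, h2, h3 = (make_hash(rng) for _ in range(3))

block = 424_242
# The (up to) three servers responsible for replicas of this block.
replica_servers = {h(block) for h in (h1, h2, h3)}
print(replica_servers)
```

Because h1, h2, h3 are drawn independently, a block's three replica locations are essentially random, which is what gives the load-balancing guarantees of the DMM results quoted earlier.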

18
  • Each server (IFSi) is responsible for storing
    every block Bx for which h1(Bx) = i, h2(Bx) = i,
    or h3(Bx) = i. Consequently, there can be up to
    three copies of each block.
  • Fetching a block Bx (for read/write) into the
    cache of client CLi requires fetching two copies,
    Bx1 and Bx2 (out of the three possible), from the
    respective IFS servers. This is done using the
    three functions h1, h2, h3 in a random order. Of
    the two copies we select the one with the highest
    global time tag, according to some approximate
    tagging, and store it in the cache.
  • When the cache of a client becomes full, the
    least recently used block is flushed to the
    servers to be stored using the above scheme.

19
IFS Components and operations
20
APIs supported in IFS
  • Create: creating a file that was already created.
  • Delete: deleting/creating a file that was already
    deleted; deleting a file while some process
    performs I/O on it.
  • Read/Write: R/W while some process is adding
    blocks to the file.
  • Seek.
  • Tokens/Locks (Acq, Rel): deleting a file while
    Tokens are in use.

21
Semantics and Cache Consistency
UNIX semantics is desirable for DFSs. However,
implementing UNIX semantics requires invalidating
caches before every write, which is not practical
in the IFS setting. Thus, we choose release
consistency instead.
  • UNIX Semantics - every operation on a file is
    instantly visible to all processes.
  • Release Consistency - shared data are made
    consistent when a critical region is exited.

22
IFS Consistency Models
  • IFS supports a Release cache Consistency model
    using Tokens and an Acq/Rel protocol for
    synchronization.
  • If caching on clients is supported and Tokens are
    not used for synchronization, cache consistency
    is not guaranteed.
  • If caching on clients is disabled, the use of
    global tagging and a majority rule keeps the
    replicated data items relatively consistent.
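The Acq/Rel idea can be sketched in a few lines. This is my own minimal model of release consistency, not the IFS implementation: writes made inside the critical region stay in the client cache and become visible to others only when Release flushes them.

```python
import threading

class TokenManager:
    """One token per file; holding it grants entry to the critical region."""
    def __init__(self):
        self.lock = threading.Lock()

class IFSClientCache:
    def __init__(self, shared_store, token):
        self.store = shared_store   # stands in for the IFS servers
        self.token = token
        self.dirty = {}

    def acquire(self):
        # Acq: take the token before touching shared data.
        self.token.lock.acquire()
        self.dirty.clear()

    def write(self, block_id, data):
        # Writes stay in the local cache until Release.
        self.dirty[block_id] = data

    def release(self):
        # Rel: flush dirty blocks so shared data become consistent,
        # then hand the token back.
        self.store.update(self.dirty)
        self.dirty.clear()
        self.token.lock.release()

store = {}
client = IFSClientCache(store, TokenManager())
client.acquire()
client.write(42, "v1")      # not yet visible in the shared store
client.release()            # now flushed: store[42] == "v1"
```

This matches the slide's claim: with Tokens, consistency holds at region boundaries; without them, concurrent cached writes could be lost.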

23
VOD Models
  • Tiger Video Fileserver (Microsoft): movies are
    sent in a pushed stream mode. Uses a central
    scheduler to control block delivery. Stripes
    movies between servers. Uses block-level
    mirroring with block declustering.
  • Fault-Tolerant VoD (Hebrew Univ.): movies are
    sent in a pushed stream mode. Allows the clients
    to send rate-control messages to the servers.
    Reallocates active clients of crashed servers to
    servers holding replicas of the movie.

24
  • The IBM VideoCharger (IBM): movies are sent in a
    pushed stream mode. Uses a central scheduler to
    control block delivery. Uses RAID devices to
    achieve data availability.
  • Parallel Video Server (Chinese Univ. of HK):
    pull-based mode for receiving blocks. No control
    scheduler needed. Extends RAID technologies to
    the server level.
  • IFS (Haifa Univ.): pull-based mode for receiving
    blocks. No control scheduler needed. Replicates
    blocks three times.

25
Experimental Results
  • Using more servers increases the number of blocks
    that each client receives.
  • The DMM simulation scheme used in IFS succeeds in
    distributing the load between the servers.

26
Average number of received blocks in a time unit
for a single client.
27
Number of received blocks by a single client as a
function of time.
28
Histogram of idle servers per step.
29
Histogram of max difference (5 servers).
30
Effect of low transfer rate on viewing quality.
Effect of sufficient transfer rate on viewing
quality.
31
Conclusions
  • IFS has been specially designed to work over the
    Internet.
  • It serves clients performing concurrent I/O to a
    collection of files held on a set of servers that
    can be geographically dispersed over the Internet.
  • Cooperative caches are used to overcome the
    relatively long delays caused by the large
    communication distances of the Internet.
  • The overall bandwidth improves due to the
    geographical distribution.
  • It guarantees an equal distribution of the load
    between servers.
  • It supports direct access to physical blocks
    without searching global metadata.
  • Special care was taken to optimize IFS for VOD.