File Systems - PowerPoint PPT Presentation

Slides: 76
Provided by: RPy1
1
File Systems
  • We need a mechanism that provides long-term
    information storage with the following
    characteristics:
  • It is possible to store a large amount of INFO
  • INFO survives the termination of any process
  • Multiple processes can access INFO concurrently
  • The file system is the component of the O.S. that
    manages INFO as files and directories
  • From the user's standpoint, the file system is the
    visible organization of INFO, built on two main
    structures: files and directories

2
Files
  • INFO stored in files must be persistent, that
    is, not affected by process creation and
    termination
  • A file is a logical storage unit defined by the
    O.S. that gives the user a mechanism to store INFO
    on physical storage devices such as disk, tape,
    CD, etc.
  • (Figure: user, logical view, O.S., physical view)

3
File Naming
  • Some O.S.s distinguish between upper and
    lower case letters (e.g., UNIX) and some
    do not (e.g., MS-DOS)
  • The file extension usually indicates what type of
    file it is (see the next slide). In some systems
    (e.g., UNIX), file extensions are just conventions
    and are not enforced by the O.S. Other systems
    (e.g., Windows) are aware of extensions and launch
    the program assigned to each extension
    (e.g., file.doc starts Word)

4

5
File Structure
  • The structure of a file is determined by the O.S.
  • Some O.S.s (e.g., CP/M and old mainframes)
    impose the view that a file is a sequence of
    fixed-length records (e.g., b in the next slide)
  • Other O.S.s may impose a B-tree (or index) like
    structure on a file in order to support rapid
    search (e.g., c in the next slide)
  • The problem with the O.S. imposing more structure
    is that it becomes difficult to do anything the
    O.S. designer did not foresee

6

7
File Structure
  • O.S.s such as UNIX and Windows impose no
    structure, to ensure maximum flexibility. They
    consider a file as a stream of bytes, and user
    processes define any structure that they want
  • I/O is usually performed in units of one physical
    block, and all blocks have the same size, which is
    related to the page size in a paging scheme.

8
File Types
  • Some of the file types are:
  • Regular files: user files (ASCII files or binary
    files)
  • Directory files: system files used to maintain
    the directory structure
  • I/O files: special system files dedicated to I/O
  • Executable files: the O.S. usually expects a
    special structure for these files. For example, in
    UNIX they must start with a magic number. The next
    slide shows the difference between an executable
    file (a) and an archive file (i.e., compiled but
    not linked) in UNIX

9

10
File Access
  • Generally two types of access are provided for
    files:
  • Sequential access: starts from the beginning and
    reads in order (typical of tapes)
  • Random access: can access any byte in the file
    directly.
  • The O.S. provides these operations to the user
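
The two access styles can be contrasted with a minimal Python sketch. The helper names (read_sequential, read_at) and the demo file content are made up for illustration; only the standard open, read and seek calls are assumed.

```python
import os
import tempfile

def read_sequential(path, chunk_size=4):
    """Sequential access: read the file front to back, chunk by chunk."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:              # empty result means end of file
                break
            chunks.append(chunk)
    return b"".join(chunks)

def read_at(path, offset, length):
    """Random access: jump straight to `offset`, then read `length` bytes."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Build a small demo file (hypothetical content).
_fd, demo_path = tempfile.mkstemp()
with os.fdopen(_fd, "wb") as f:
    f.write(b"0123456789abcdef")

whole = read_sequential(demo_path)
middle = read_at(demo_path, 10, 3)
os.remove(demo_path)
```

With random access the cost of reaching byte 10 does not depend on reading bytes 0 to 9 first, which is the point of the second bullet.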

11
File Attributes
  • Attributes describe:
  • Location: where the file is physically located
  • Size: how big the file is
  • Type: what kind of file it is
  • Protection: who can access the file
  • Time and date: when the last access or
    modification took place
  • User: who created the file
  • and other information. Some of the attributes
    are shown in the next slide

12

13
File Operations
  • Most common system calls relating to files:
  • Create: announce that the file is coming, set
    its attributes and allocate space
  • Delete: free disk space and adjust the directory
    structure
  • Open: fetch the attributes and location of the
    file
  • Close: release internal table space and write
    the file's last block

14
File Operations
  • Read: data are read from the file and put into
    memory for user access
  • Write: data are written to the file, usually at
    the current position
  • Append: adds data to the end of the file
  • Seek: for random access files, repositions the
    file pointer for reading or writing
  • Rename: change the name of the file
  • Get/Set attributes: get the attributes of a file
    or set them (e.g., get and set the read-only
    attribute)
  • See the program for copying a file in UNIX shown
    in the next slides. It can be called by the
    following command line:
  • copyfile abc xyz
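
The program itself did not survive in this transcript. As a stand-in, here is a minimal Python sketch of the same idea using the open, read, write and close calls listed above. It is not the original program; the buffer size, the 0o644 mode and the scratch-directory demo are assumptions chosen for illustration.

```python
import os
import tempfile

BUF_SIZE = 4096  # bytes per read/write; an arbitrary choice for this sketch

def copyfile(src, dst):
    """Copy src to dst with the open/read/write/close system calls."""
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                data = os.read(in_fd, BUF_SIZE)
                if not data:           # empty result means end of file
                    break
                while data:            # os.write may write fewer bytes than asked
                    written = os.write(out_fd, data)
                    data = data[written:]
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)

# Demo in a scratch directory, mirroring "copyfile abc xyz".
with tempfile.TemporaryDirectory() as tmp:
    abc = os.path.join(tmp, "abc")
    xyz = os.path.join(tmp, "xyz")
    with open(abc, "wb") as f:
        f.write(b"hello, file system\n" * 1000)
    copyfile(abc, xyz)
    with open(xyz, "rb") as f:
        copied = f.read()
```

Reading until an empty result and closing both descriptors in finally blocks mirrors the Open, Read, Write and Close steps described on the slides.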

15

16

17
Directories
  • Directories are the mechanism provided by the
    O.S. to keep track of files. A directory records
    information about the files in a particular
    partition.
  • A directory typically contains one entry per file.
    An entry may contain the name, attributes and
    location, or
  • it may contain the name and a pointer to the
    attribute information

18
Directory Structure
  • Single-level directory system
  • There is no notion of owner; the problem is files
    with the same name created by two different owners
  • Note that in the following figures the files are
    labeled with their owners' names. For example, the
    files labeled A were created by the same owner.

19
Directory Structure
  • Two-level directory system
  • Directory search is based on the user name.
    The problem is a user with a large number of
    files

20
Directory Structure
  • Hierarchical directory system

21
Path Names
  • Absolute path name: /usr/ast/mailbox. Always
    starts with / (i.e., the separator)
  • Relative path name: mailbox
  • The current directory (or working directory)
    determines how a relative path name is resolved
  • In UNIX, . is the current directory and
  • .. refers to the parent directory
  • For example: cp ../lib/abdy.doc .
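
How a relative name is resolved against the working directory can be sketched in Python with the standard posixpath module. The resolve helper and the choice of /usr/ast as working directory are assumptions for illustration; note that posixpath.normpath works on the text of the path only and ignores symbolic links.

```python
import posixpath

def resolve(path, cwd):
    """Turn a path into a normalized absolute path, UNIX style."""
    if not posixpath.isabs(path):
        path = posixpath.join(cwd, path)   # relative: start from cwd
    return posixpath.normpath(path)        # collapse "." and ".."

cwd = "/usr/ast"                            # assumed working directory
print(resolve("mailbox", cwd))              # relative name
print(resolve("../lib/abdy.doc", cwd))      # ".." steps up to the parent
print(resolve(".", cwd))                    # "." is the directory itself
print(resolve("/usr/ast/mailbox", "/tmp"))  # absolute: cwd is ignored
```

Under that assumption, the cp example above copies /usr/lib/abdy.doc into /usr/ast.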

22
Directory Operations
  • Create: creates the entries . and ..
  • Delete: only an empty directory can be deleted
  • Rename
  • Link and unlink: linking is a common technique
    for sharing files or directories between users
    (see next slide). Shared files can be duplicated
    instead of linked, but then consistency between
    the copies is difficult to maintain. A link within
    a directory can be a hard link (implemented with
    i-nodes, which are explained later) or a symbolic
    link (a file that contains the path of the linked
    file).
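
The difference between the two kinds of link can be observed with a short Python sketch. It assumes a POSIX system where os.link and os.symlink are permitted; the file names are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "report.txt")
    hard = os.path.join(tmp, "hard_link")
    soft = os.path.join(tmp, "soft_link")
    with open(original, "w") as f:
        f.write("shared data")

    os.link(original, hard)       # hard link: a second name for the same i-node
    os.symlink(original, soft)    # symbolic link: a small file holding a path

    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(original).st_nlink
    symlink_target = os.readlink(soft)

    os.unlink(original)           # remove the original name
    with open(hard) as f:
        hard_still_readable = f.read()        # data survives via the hard link
    soft_dangling = not os.path.exists(soft)  # the symlink now points nowhere
```

After the unlink, the hard link still reaches the data because the i-node's link count only dropped from 2 to 1, while the symbolic link dangles because it stores nothing but the removed path.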

23
Directories
  • Creating a shared file by link changes the
    directory structure from a tree to a graph

24
File System Layout
  • Most disks are divided up into one or more
    partitions, with independent file systems on each
    partition.
  • Sector 0 of the disk is called the MBR (Master
    Boot Record) and contains the partition table,
    which holds the starting and ending addresses of
    each partition
  • The layout of a disk partition depends on its
    file system. For example, after its first block
    (i.e., the boot block) it may contain a superblock
    that holds administrative information such as a
    magic number identifying the file system type
    (see next slide)

25

26
Implementing the Files
  • Various methods are used in different O.S.s for
    implementing files
  • Contiguous allocation: each file is stored on
    consecutive disk blocks. For example, on a disk
    with a 4K block size, a 20K file is stored on 5
    consecutive blocks (see next slide)
  • Advantages:
  • Simple to implement, because we need to know only
    the disk address of the first block and the number
    of blocks
  • The read performance is excellent, because the
    entire file can be read in one disk operation.
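
The bookkeeping for contiguous allocation is small enough to sketch in a few lines of Python. The function names and the starting block number 100 are hypothetical; the 20K file and 4K block size come from the example above.

```python
def blocks_needed(file_size, block_size):
    """Number of whole blocks needed to hold file_size bytes."""
    return -(-file_size // block_size)   # ceiling division

def block_of_offset(start_block, byte_offset, block_size):
    """Disk block holding a given byte of a contiguously stored file."""
    return start_block + byte_offset // block_size

KB = 1024
n = blocks_needed(20 * KB, 4 * KB)       # the example: 20K file, 4K blocks
# Hypothetical file starting at disk block 100: where is byte 9000?
target = block_of_offset(100, 9000, 4 * KB)
print(n, target)
```

Only the first block number and the block count are needed, and any byte can be located with one division, which is why the scheme is simple to implement.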

27
Contiguous Allocation

28
Contiguous Allocation
  • The disadvantages of contiguous allocation are:
  • Disk fragmentation happens when files are
    removed. Compaction is expensive because all the
    blocks following a hole must be copied. The
    problem is worst when the disk fills up.
  • The final size of a new file must be known in
    advance in order to choose a hole large enough
    for it, and that is often hard to know
  • Contiguous allocation is good for write-once
    media such as CD-ROMs and DVDs

29
Linked List Allocation
  • In this method each file is kept as a linked list
    of disk blocks (the first word of each block is a
    pointer to the next block)
  • Every disk block can be used (the only loss is
    internal fragmentation)
  • Sequential reading of a file's blocks is easy,
    but random access to a block is slow because we
    have to read all the blocks of the file before
    that block
  • Because of the pointer, the amount of data stored
    in each block is no longer a power of two

30
Linked List Allocation

31
Linked List Allocation using a Table in Memory
  • Both disadvantages of linked list allocation can
    be eliminated by keeping the table of block
    pointers (FAT, File Allocation Table) in memory.
    MS-DOS uses this method.
  • Random access to blocks is easy because the chain
    is followed in memory, with no disk reference
    involved. We need only the starting block number.
  • The problem is the table size: for a 20 GB disk
    with a 1 KB block size, the table needs 20 million
    entries; if each entry is 4 bytes, the table takes
    approximately 80 MB.
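
A minimal Python sketch of the in-memory table follows. The 16-block disk, the block chain and the END_OF_FILE marker are hypothetical values chosen for illustration; the size calculation repeats the 20 GB example using decimal units.

```python
END_OF_FILE = -1   # marker chosen for this sketch

# fat[b] is the block that follows block b in its file.
# Hypothetical 16-block disk holding one file: 4 -> 7 -> 2 -> 10 -> 12
fat = [None] * 16
fat[4], fat[7], fat[2], fat[10], fat[12] = 7, 2, 10, 12, END_OF_FILE

def file_blocks(fat, start):
    """Follow the chain in the in-memory table; no disk reads are needed."""
    chain = []
    block = start
    while block != END_OF_FILE:
        chain.append(block)
        block = fat[block]
    return chain

def nth_block(fat, start, n):
    """Disk block holding the n-th (0-based) block of the file."""
    block = start
    for _ in range(n):
        block = fat[block]
    return block

def fat_size_bytes(disk_bytes, block_bytes, entry_bytes):
    """Memory needed for a table with one entry per disk block."""
    return (disk_bytes // block_bytes) * entry_bytes

# 20 GB disk, 1 KB blocks, 4-byte entries (decimal units).
table_bytes = fat_size_bytes(20 * 10**9, 10**3, 4)
print(file_blocks(fat, 4), nth_block(fat, 4, 3), table_bytes)
```

Reaching the n-th block still means following n pointers, but all of them are in memory; the price is a table whose size grows with the disk, not with the number of open files.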

32
File Allocation Table

33
I-nodes
  • To solve the problem of the large table we can
    use i-nodes
  • In this method each file has a table (its i-node)
    that contains the attributes and the disk
    addresses of the blocks of that file. If an i-node
    occupies n bytes and k files are open, we need kn
    bytes of memory. Thus the memory needed depends on
    the number of open files, not on the disk size
  • The problem is that each i-node has room for a
    fixed number of disk addresses. What happens when
    a file grows beyond this limit?
  • One solution is to keep multiple levels of index
    in the i-node.

34

35
I-node in Unix
  • An i-node in UNIX has:
  • The first 10 disk addresses.
  • A single indirect block, which holds the
    addresses of further data blocks for larger files.
  • A double indirect block, which holds the
    addresses of single indirect blocks
  • A triple indirect block, which holds the
    addresses of double indirect blocks
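
The reach of each level can be worked out with a short Python sketch. The 10 direct addresses come from the slide; the figure of 256 addresses per indirect block is an assumption (it corresponds to 1 KB blocks with 4-byte addresses), and the function names are made up.

```python
def classify_block(n, direct=10, per_block=256):
    """Say which part of the i-node maps logical block n of a file."""
    if n < direct:
        return "direct"
    n -= direct
    if n < per_block:
        return "single indirect"
    n -= per_block
    if n < per_block ** 2:
        return "double indirect"
    n -= per_block ** 2
    if n < per_block ** 3:
        return "triple indirect"
    raise ValueError("file too large for this i-node layout")

def max_file_blocks(direct=10, per_block=256):
    """Largest file, in blocks, that the layout can address."""
    return direct + per_block + per_block ** 2 + per_block ** 3

print(classify_block(9), classify_block(10), classify_block(266))
print(max_file_blocks())
```

Small files are reached without any extra disk reference, and each additional level multiplies the addressable size by the number of addresses that fit in one block.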

36
I-node in Unix

37
Implementing Directories
  • Basically, a directory is a file that contains an
    entry for each file or subdirectory in that
    directory
  • When a file is opened, the O.S. uses the path
    name to locate the directory entry
  • Each directory entry provides the file
    information
  • The file information can be stored directly in
    the directory entry (a in the next slide)
  • Or the file information can be stored in an
    i-node, and each directory entry refers to the
    i-node (b in the next slide)

38
Implementing Directories

39
Directories in MS-DOS
  • Like CP/M directory entries, MS-DOS directory
    entries are 32 bytes each
  • In CP/M, the extent field serves large files that
    require more than one directory entry: it gives
    the order in which the directory entries should be
    followed
  • The first block number is the number of the
    file's first disk block

40
Directories in MS-DOS

41
Directories in UNIX
  • Each directory entry contains file name and
    i-node number

42
Directories in UNIX
  • Directory lookup works the same way in UNIX and
    in all hierarchical systems
  • First the file system locates the root directory.
  • Then it looks up the first component of the path
    there and finds its i-node
  • From the i-node the system finds the blocks of
    that directory, looks up the next component, and
    continues in the same way until the file is found.
    For example, the next slide shows the steps in
    looking up /usr/ast/mbox
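
The lookup steps can be sketched in Python over a toy i-node table. All i-node numbers and directory contents below are hypothetical, and the sketch keeps directory contents inside the i-node records instead of in separate disk blocks.

```python
ROOT_INODE = 1   # hypothetical number for the root directory's i-node

# Hypothetical i-node table: i-node number -> directory entries or file data.
inodes = {
    1:  {"type": "dir",  "entries": {"bin": 4, "usr": 6}},
    4:  {"type": "dir",  "entries": {}},
    6:  {"type": "dir",  "entries": {"ast": 26, "lib": 30}},
    26: {"type": "dir",  "entries": {"mbox": 60}},
    30: {"type": "dir",  "entries": {}},
    60: {"type": "file", "data": b"mail"},
}

def lookup(path):
    """Return the i-node number for an absolute path, or raise an error."""
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    current = ROOT_INODE
    for name in path.split("/"):
        if not name:                      # skip empty parts from "/" or "//"
            continue
        node = inodes[current]
        if node["type"] != "dir":
            raise NotADirectoryError(name)
        if name not in node["entries"]:
            raise FileNotFoundError(path)
        current = node["entries"][name]   # follow the entry to the next i-node
    return current

print(lookup("/usr/ast/mbox"))
```

Each path component costs one directory search plus one i-node fetch, so long paths take proportionally more steps.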

43

44
Disk Space Management: Physical Disk Structure
  • The main secondary storage is disk. Tape is
    mainly used for backup
  • The physical disk consists of cylinders. Each
    cylinder is divided into tracks. A track is
    divided further into sectors. One or more sectors
    form a logical block. Data transfers between main
    memory and disk are in units of logical blocks.
    The size of a logical block is usually 512 bytes
    or larger, although the disk can be formatted to
    have different logical block sizes

45
46

47
Disk Read Speed
  • The total time for accessing a file consists of
    the time to move the head to the right track
    (seek time), the time for the correct sector to
    rotate under the head (rotational delay), and the
    time to transfer the data (transfer time). Seek
    time usually contributes the most to the total
    delay, especially when files are not stored in
    contiguous blocks.

48
Disk Read
  • Example: for a disk system the average seek time
    is 10 msec per block, the average rotational
    latency is 8 msec per block, and the transfer time
    is 0.25 msec for a 1 KB block. The average time to
    read one block on this disk system is
    10 + 8 + 0.25 = 18.25 ms
  • As this example shows, seek time and rotational
    latency usually dominate disk read latency.
  • It means that if we reduce seek time or
    rotational latency we can reduce disk read time
    significantly. Therefore most optimizations for
    disk performance are based on reducing disk seek
    time.
  • For example, FFS in UNIX uses the cylinder
    grouping technique to reduce disk seek time
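
The arithmetic of the example can be checked with a few lines of Python. The comparison of 100 consecutive blocks against 100 scattered blocks is a hypothetical extension, assuming the consecutive run needs only one seek and one rotational delay.

```python
def block_read_ms(seek_ms, rotation_ms, transfer_ms):
    """Average time to read one block: seek + rotational delay + transfer."""
    return seek_ms + rotation_ms + transfer_ms

total = block_read_ms(10, 8, 0.25)
positioning_share = (10 + 8) / total   # fraction spent before any data moves

# Hypothetical comparison for 100 blocks of 1 KB each:
contiguous_100 = 10 + 8 + 100 * 0.25   # one seek and rotation, then transfer
scattered_100 = 100 * total            # a seek and rotation for every block
print(total, round(positioning_share, 3), contiguous_100, scattered_100)
```

Under these assumptions almost all of the time for a single block goes to positioning the head, which is why placing related blocks close together pays off so well.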

49
Cylinder Grouping Technique
  • Fast File System (FFS, a Unix file system) uses
    the cylinder grouping technique to provide both
    block-level and file-level clustering. In the
    cylinder grouping technique, users or
    applications have to place the related files into
    a directory. The files of the directory are
    allocated in one or more consecutive cylinders to
    reduce disk seek time (see next slide). In the
    cylinder grouping technique, files belonging to a
    directory are stored on consecutive blocks on
    disk(s). With the same approach, FFS also tries
    to store a single file in consecutive disk
    blocks.

50

51
Keeping Track of Free Blocks
  • There are two methods for keeping track of the
    free disk blocks: linked list and bitmap
  • Free blocks on disk can themselves be used to
    hold the numbers of the free blocks. For example,
    (a) in the next slide shows three free blocks
    (16, 17 and 18) that hold the block numbers of the
    free blocks with the linked list method.

52
(Figure: free disk blocks 16, 17 and 18; panels (a) and (b))
53
Keeping Track of Free Blocks
  • In the bitmap method one bit is required for each
    block, where 1 shows the block is used and 0 shows
    the block is free. The bitmap method requires less
    space than the linked list, except when the disk
    is nearly full and there are only a few free
    blocks on disk.
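
A minimal Python sketch of the bitmap method follows, using the same convention (1 = used, 0 = free). The class name, the first-fit allocation policy and the 4-byte free-list entry size are assumptions made for illustration.

```python
class BlockBitmap:
    """One bit per disk block: 1 = used, 0 = free."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)   # all blocks start free

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Mark the lowest-numbered free block as used and return it."""
        for block in range(self.n_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("disk full")

    def free(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

def bitmap_bytes(n_blocks):
    """Space for the bitmap: fixed, one bit per block."""
    return (n_blocks + 7) // 8

def free_list_bytes(n_free_blocks, entry_bytes=4):
    """Space for a free list: one entry per free block."""
    return n_free_blocks * entry_bytes

bitmap = BlockBitmap(20)
first = bitmap.allocate()
second = bitmap.allocate()
bitmap.free(first)
reused = bitmap.allocate()
```

For 20 million blocks the bitmap always takes 2.5 MB, while a list of 4-byte entries is smaller only when fewer than 625,000 blocks are free, which matches the exception noted above.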

54
File System Reliability
  • Bad block management: most hard disks have bad
    blocks, which can be handled by a hardware
    solution or a software solution

55
File System Reliability
  • Backups
  • Full backups
  • Problem: they take a long time and a lot of
    space.
  • Solution: instead of the entire file system,
    only part of it is backed up. There is no reason
    to back up /bin or /dev files in UNIX

56
File System Reliability
  • Incremental dumps: make a complete dump (backup)
    periodically and make a daily backup of only those
    files that have been modified since the last dump
  • Advantage: minimizes the backup time
  • Disadvantage: makes recovery more complicated

57
File System Consistency
  • If the system crashes before writing all the
    modified blocks, the file system becomes
    inconsistent.
  • Solution: check the file system consistency, for
    example with fsck in UNIX or scandisk in Windows

58
File System Consistency
  • Two types of consistency check can be made:
    block and file consistency checks
  • Block consistency check:
  • Two tables are built, each containing a counter
    for every block
  • The program reads all i-nodes to find the used
    blocks and updates the first table
  • The program examines the free list or bitmap to
    find the unused blocks and updates the second
    table

59
Block Consistency Check
(Figure: counters per block number for four cases: consistent, missing block, duplicate block in free list, duplicate data block)
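
The two-table check can be sketched in Python as follows. The input format (block lists per file and a free list) and the extra "both used and free" case are assumptions of this sketch; the other labels follow the cases named in the figure.

```python
def check_blocks(n_blocks, files, free_list):
    """Classify every block from two counters: in-use count and free count.

    files: mapping of file name -> list of block numbers (from the i-nodes)
    free_list: block numbers recorded as free
    """
    in_use = [0] * n_blocks
    free = [0] * n_blocks
    for blocks in files.values():
        for b in blocks:
            in_use[b] += 1
    for b in free_list:
        free[b] += 1

    report = {}
    for b in range(n_blocks):
        if in_use[b] + free[b] == 0:
            report[b] = "missing block"
        elif in_use[b] > 1:
            report[b] = "duplicate data block"
        elif free[b] > 1:
            report[b] = "duplicate block in free list"
        elif in_use[b] == 1 and free[b] == 1:
            report[b] = "both used and free"
        else:
            report[b] = "consistent"
    return report

# Hypothetical 6-block disk with three planted problems.
report = check_blocks(
    6,
    files={"a": [0, 1], "b": [1, 2]},   # block 1 is claimed by two files
    free_list=[3, 4, 4],                # block 4 is listed twice, 5 nowhere
)
```

A block is consistent when exactly one of its two counters is 1 and the other is 0; every other combination points to one of the error cases.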
60
File Consistency Check
  • Can be done by:
  • Using a table of counters, one per file.
  • Verifying the directory system by traversing the
    directory tree and incrementing the counter for a
    file every time that file appears in a directory
  • Comparing each file's counter with its link count
    (i.e., the number stored in the i-node of that
    file), which shows whether the file system is
    consistent
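
The comparison can be sketched in Python. The input format and the sample data are hypothetical, and the sketch ignores the . and .. entries that a real directory tree would contain.

```python
from collections import Counter

def check_link_counts(directories, inode_link_counts):
    """Compare directory references per i-node with the stored link count.

    directories: mapping of directory path -> {entry name: i-node number}
    inode_link_counts: mapping of i-node number -> link count in the i-node
    """
    seen = Counter()
    for entries in directories.values():
        for inode in entries.values():
            seen[inode] += 1      # one more directory entry names this file

    problems = {}
    for inode, stored in inode_link_counts.items():
        if seen[inode] != stored:
            problems[inode] = (seen[inode], stored)   # (counted, stored)
    return problems

# Hypothetical tree: i-node 10 has two names, i-node 12 claims three links.
problems = check_link_counts(
    directories={"/": {"a": 10, "b": 11}, "/d": {"a2": 10, "c": 12}},
    inode_link_counts={10: 2, 11: 1, 12: 3},
)
```

A stored count that is too high wastes space because the file is never freed, while one that is too low is more dangerous because the i-node could be released while a directory still points to it.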

61
File System Performance
  • Access to disk is much slower than access to
    memory. In memory, reading a word takes 10 nsec
  • Solution: use a block cache (buffer cache) in
    memory
  • For each read request, the cache is checked
    first to see whether the requested block is
    present

62
Caching
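  • Cache references are much less frequent than
    page references, so using exact LRU for the cache
    is feasible
  • The disadvantage of pure LRU is that a crash can
    leave the file system inconsistent, because
    modified blocks may stay in the cache for a long
    time before they are written to disk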
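
A minimal Python sketch of an LRU block cache follows. It covers reads only, so it does not model modified blocks or the crash problem; the class name and the read_block callback that stands in for the disk are assumptions.

```python
from collections import OrderedDict

class BlockCache:
    """Tiny LRU cache of disk blocks; `read_block` stands in for the disk."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block
        self.blocks = OrderedDict()     # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def get(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)      # now most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_block(block_no)           # slow path: go to disk
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)        # evict least recently used
        return data

# Hypothetical two-block cache and access pattern.
cache = BlockCache(2, lambda n: f"block {n}".encode())
for n in [1, 2, 1, 3, 2]:
    cache.get(n)
```

Keeping the blocks in recency order makes both the hit path and the eviction constant-time, which is affordable because cache lookups happen per system call and not per memory reference.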

63
Buffer Cache Data Structure

64
Caching
  • Solution
  • Critical blocks such as i-node and directory
    blocks can be put at the front of the LRU list
    (to be evicted sooner) instead of the rear. This
    means they are written to disk more frequently,
    which reduces the chance of inconsistency in the
    file system.
  • Writing modified data blocks promptly. sync in
    UNIX (which forces modified blocks out to disk)
    and the write-through cache in MS-DOS do that.

65
Block Read Ahead
  • It is the second technique for improving file
    system performance
  • Blocks are read ahead on each file read. This is
    only good for sequential file reads
  • Solution: keep track of the access pattern of
    each file with one bit for that file. By setting
    the bit on each sequential access and resetting it
    on each random access (i.e., when a seek is done),
    the system can guess whether the file is in
    sequential or random access mode.
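
The one-bit heuristic can be sketched in Python. The class, its fields and the policy of prefetching exactly one block are assumptions of this sketch; no real kernel interface is modeled.

```python
class OpenFile:
    """Tracks one bit per open file: does access currently look sequential?"""

    def __init__(self):
        self.position = 0          # block number the next read will return
        self.sequential = True     # the access-pattern bit
        self.prefetched = []       # blocks the system chose to read ahead

    def seek(self, block_no):
        self.position = block_no
        self.sequential = False    # a seek signals random access

    def read(self):
        block_no = self.position
        if self.sequential:
            self.prefetched.append(block_no + 1)   # fetch the next block too
        self.sequential = True     # set the bit again on every plain read
        self.position = block_no + 1
        return block_no

f = OpenFile()
f.read()        # block 0, read-ahead of block 1
f.read()        # block 1, read-ahead of block 2
f.seek(40)      # random access: bit is reset
f.read()        # block 40, no read-ahead
f.read()        # block 41, sequential again: read-ahead of block 42
```

The first read after a seek skips the read-ahead, so a file accessed at random wastes no disk bandwidth on blocks it is unlikely to need.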

66
Reducing Arm Motion
  • Placing i-nodes in the middle of the disk instead
    of start of the disk (see the next slide)
  • Cylinder grouping technique (i-nodes and related
    files are in the same cylinder group)

67

68
Log-Structured File System
  • The log-structured file system (LFS) was
    designed at Berkeley for UNIX to reduce disk seek
    times for write operations
  • In UNIX most write operations are small writes

69
Log-Structured File System
  • LFS treats the entire disk as a log: it buffers
    writes in memory and periodically writes them as
    a single segment at the end of the log.
  • Each segment may contain i-nodes, directory
    entry blocks and data blocks
  • The problem is that i-nodes are scattered all
    over the log instead of being at a fixed disk
    position

70
Log-Structured File System
  • Opening a file consists of using the i-node map
    to locate the i-node for that file
  • LFS has a bookkeeping program named the cleaner
    that moves through the log and removes old
    segments

71
The Sun Network File System (NFS)
  • The implementation is part of the Solaris and
    SunOS operating systems running on Sun
    workstations, using an unreliable datagram
    protocol (UDP/IP) and Ethernet.
  • NFS is designed to operate in a heterogeneous
    environment
  • In NFS, clients access the server's directories
    by mounting them

72
Remote Mounting in NFS
73
Remote Mounting in NFS
74
Remote Mounting in NFS
  • Mount operation includes name of remote directory
    to be mounted and name of server machine that is
    storing it.
  • Mount request is mapped to corresponding RPC and
    forwarded to mount server running on server
    machine.
  • Export list specifies local file systems that
    server exports for mounting, along with names of
    machines that are permitted to mount them.

75
Remote Mounting in NFS
  • Following a mount request that conforms to its
    export list, the server returns a file handle, a
    key for further accesses.
  • The file handle contains a file-system
    identifier and an i-node number that identify the
    mounted directory within the exported file system.