File System Implementation
1
Chapter 9. File System Implementation
  • Introduction
  • System V File System
  • Berkeley Fast File System
  • Temporary File System
  • Special-purpose File Systems
  • Old Buffer Cache

2
Introduction
  • Two local general-purpose file systems
  • System V file system (s5fs)
  • Berkeley fast file system (FFS)
  • S5fs
  • original UNIX file system
  • FFS
  • introduced in 4.2BSD
  • Vnode/vfs
  • the version of FFS integrated into the vnode/vfs
    framework is known as the UNIX file system (ufs)

3
System V File System
  • On-disk layout

[ boot area (B) | superblock (S) | inode list | data blocks ]
  • Boot area
  • contains code required to bootstrap
  • Superblock
  • contains attributes and metadata of the file
    system

4
System V File System (cont)
  • Inode list
  • linear array of inodes
  • one inode for each file
  • size of inode is 64 bytes
  • inode list has a fixed size
  • limits the maximum number of files the partition
    can contain

5
S5fs Directories
  • Contains fixed-size records of 16 bytes
  • First two bytes: inode number
  • Next fourteen bytes: filename
  • Limits
  • 65535 files per disk partition
  • 14 characters per filename
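A minimal sketch of this 16-byte record as a C structure; the
s5_direct name and the fixed-width types are stand-ins for the
historical declaration.

#include <stdint.h>

#define S5_DIRSIZ 14                /* maximum filename length */

struct s5_direct {
    uint16_t d_ino;                 /* inode number; 0 marks a free slot */
    char     d_name[S5_DIRSIZ];     /* filename, NUL-padded; not
                                       NUL-terminated at 14 characters */
};                                  /* 2 + 14 = 16 bytes */

The 2-byte d_ino is what limits a partition to 65535 files, and the
14-byte d_name is what limits filename length.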

6
S5fs Inodes
  • On-disk inode and In-core inode
  • struct dinode, struct inode

struct dinode

  Field       Size (bytes)   Description
  di_mode     2              file type, permissions, etc.
  di_nlinks   2              number of hard links to file
  di_uid      2              owner UID
  di_gid      2              owner GID
  di_size     4              size in bytes
  di_addr     39             array of block addresses
  di_gen      1              generation number
  di_atime    4              time of last access
  di_mtime    4              time file was last modified
  di_ctime    4              time inode was last changed
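The same table as a C declaration; a minimal sketch using fixed-width
stand-ins for the historical typedefs. Treating di_addr as 13 packed
3-byte block addresses is an assumption consistent with the 13-entry
block array on the next slide.

#include <stdint.h>

struct dinode {
    uint16_t di_mode;        /* file type, permissions, etc. */
    int16_t  di_nlinks;      /* number of hard links to file */
    uint16_t di_uid;         /* owner UID */
    uint16_t di_gid;         /* owner GID */
    int32_t  di_size;        /* size in bytes */
    uint8_t  di_addr[39];    /* 13 block addresses, 3 bytes each */
    uint8_t  di_gen;         /* generation number */
    int32_t  di_atime;       /* time of last access */
    int32_t  di_mtime;       /* time file was last modified */
    int32_t  di_ctime;       /* time inode was last changed */
};                           /* 64 bytes total */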
7
S5fs Inodes (cont)
  • di_mode layout: type (4 bits), then the suid, sgid, and sticky
    bits, then rwx permission bits for owner, group, and others
  • di_addr block array: entries 0 through 9 point directly to data
    blocks; entry 10 points to an indirect block, entry 11 to a
    double indirect block, and entry 12 to a triple indirect block
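The block array determines how a logical block number is resolved. A
hedged sketch of that arithmetic follows; the 1024-byte block size
and the 4-byte in-memory block pointers are assumptions for
illustration.

#define NDIR   10                 /* direct entries 0..9 */
#define BSIZE  1024               /* assumed logical block size */
#define NINDIR (BSIZE / 4)        /* block pointers per indirect block */

/* Returns 0 for a direct block, 1/2/3 for single/double/triple
   indirection, or -1 if bn is beyond what the array can address. */
int s5_indirection_level(long bn)
{
    if (bn < NDIR)
        return 0;
    bn -= NDIR;
    if (bn < NINDIR)
        return 1;
    bn -= NINDIR;
    if (bn < (long)NINDIR * NINDIR)
        return 2;
    bn -= (long)NINDIR * NINDIR;
    if (bn < (long)NINDIR * NINDIR * NINDIR)
        return 3;
    return -1;
}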
8
S5fs Superblock
  • Metadata about the file system
  • The kernel reads the superblock when mounting the
    file system and stores it in memory until the
    file system is unmounted
  • Contains the following information
  • size in blocks of the file system
  • size in blocks of the inode list
  • number of free blocks and inodes
  • free block list, free inode list
  • the free lists are not kept completely in the
    superblock (see the sketch below)
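A simplified sketch of these contents as a C structure, loosely
modeled on the historical System V struct filsys; the field names and
the NICFREE/NICINOD cache sizes are illustrative.

#include <stdint.h>

#define NICFREE 50                  /* free-block cache in the superblock */
#define NICINOD 100                 /* free-inode cache in the superblock */

struct s5_superblock {
    uint16_t s_isize;               /* size in blocks of the inode list */
    int32_t  s_fsize;               /* size in blocks of the file system */
    int16_t  s_nfree;               /* entries in use in s_free */
    int32_t  s_free[NICFREE];       /* head of the free block list */
    int16_t  s_ninode;              /* entries in use in s_inode */
    uint16_t s_inode[NICINOD];      /* partial list of free inodes */
    int32_t  s_tfree;               /* total free blocks */
    uint16_t s_tinode;              /* total free inodes */
};

In the standard s5fs scheme, when s_inode runs empty the kernel
rescans the inode list for free inodes, and the free block list is
chained through blocks on disk rather than held entirely here;
details vary by release.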

9
S5fs Kernel Organization
  • In-core inodes
  • struct inode
  • contains all the fields of the on-disk inode, and
    some additional fields, such as
  • vnode
  • the i_vnode field of the inode contains the vnode
    of the file
  • Device ID of the partition containing the file
  • Inode number of the file

10
S5fs Kernel Organization (cont)
  • Flags for synchronization and cache management
  • Pointers to keep the inode on a free list
  • Pointers to keep the inode on a hash queue
  • The kernel hashes inodes by their inode numbers,
    so as to locate them quickly when needed (see the
    sketch after this list)
  • Block number of last block read
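A minimal sketch of that hash lookup, assuming the four queues shown
in the figure on the next slide; the linkage and field names are
illustrative.

#include <stddef.h>

#define NHASH 4                         /* number of hash queues */

struct inode {
    struct inode *i_forw;               /* next inode on the hash queue */
    unsigned int  i_number;             /* inode number */
    /* ... flags, free-list pointers, vnode, etc. ... */
};

static struct inode *hash_queue[NHASH]; /* heads of the hash queues */

/* Find an in-core inode by number; a NULL return tells the caller
   (iget(), say) that the inode must be read in from disk. */
struct inode *ihash_lookup(unsigned int inumber)
{
    struct inode *ip;
    for (ip = hash_queue[inumber % NHASH]; ip != NULL; ip = ip->i_forw)
        if (ip->i_number == inumber)
            return ip;
    return NULL;
}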

11
S5fs Kernel Organization (cont)
(Figure: the in-core inode table, organized as a free list plus hash
queues indexed by inode number mod 4. Queue 0 holds inodes 40, 268,
1056, and 8; queue 1 holds 73, 17, and 593; queue 2 holds 86; queue 3
holds 11, 199, 27, and 103.)
12
S5fs Inode Lookup
  • lookuppn()
  • in the file-system-independent layer
  • performs pathname parsing
  • parses one component at a time, invoking the
    VOP_LOOKUP operation
  • when searching an s5fs directory, this translates
    to a call to the s5lookup() function
  • s5lookup()
  • checks the directory name lookup cache
  • in case of a cache miss, it reads the directory
    one block at a time, searching the entries for
    the specified file name

13
S5fs Inode Lookup (cont)
  • If the directory contains a valid entry for the
    file, s5lookup() obtains the inode number from
    the entry
  • It calls iget() to locate that inode and
    initialize the vnode
  • Finally, iget() returns a pointer to the inode
    to s5lookup(), which in turn returns a pointer
    to the vnode to lookuppn()
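On a cache miss, the per-block search inside s5lookup() reduces to a
linear scan of 16-byte records. A hedged sketch, reusing the
s5_direct layout from slide 5; the function name is illustrative.

#include <stdint.h>
#include <string.h>

#define S5_DIRSIZ 14

struct s5_direct {                  /* as sketched on slide 5 */
    uint16_t d_ino;                 /* 0 marks a free slot */
    char     d_name[S5_DIRSIZ];
};

/* Search one directory block for name; returns the inode number on
   a match, or 0 so the caller moves on to the next block. */
uint16_t s5_scan_block(const struct s5_direct *dp, int nentries,
                       const char *name)
{
    for (int i = 0; i < nentries; i++)
        if (dp[i].d_ino != 0 &&
            strncmp(dp[i].d_name, name, S5_DIRSIZ) == 0)
            return dp[i].d_ino;
    return 0;
}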

14
S5fs File I/O
  • read and write system calls
  • accept a file descriptor (the index returned by
    open)
  • File descriptor
  • used as an index into the descriptor table to
    obtain the pointer to the open file object
    (struct file)
  • the kernel obtains the vnode pointer from the
    file structure
  • Before starting I/O
  • the kernel invokes the VOP_WRLOCK operation to
    serialize access to the file

15
S5fs File I/O (cont)
  • The kernel then invokes the VOP_READ or VOP_WRITE
    operation
  • This results in a call to s5read() or s5write()
  • In the case of s5read()
  • s5read() translates the starting offset to a
    logical block number (see the sketch after this
    list)
  • it then reads the data one page at a time
  • by mapping the block into the kernel virtual
    address space and calling uiomove() to copy the
    data into user space
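The offset-to-block translation mentioned in this list is simple
arithmetic. A sketch with illustrative names, assuming the 1024-byte
block size used earlier:

#define BSIZE 1024                   /* assumed logical block size */

/* Split a byte offset into a logical block number (*lbnp) and the
   number of bytes that can be copied from that block, given resid
   bytes still requested. */
long s5_map_offset(long offset, long resid, long *lbnp)
{
    long on = offset % BSIZE;        /* starting byte within the block */
    long n  = BSIZE - on;            /* bytes left in this block */
    *lbnp = offset / BSIZE;          /* which logical block to map */
    return n < resid ? n : resid;
}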

16
S5fs File I/O (cont)
  • uiomove() calls the copyout() routine to
    perform the actual data transfer
  • if the page is not in memory, copyout() will
    generate a page fault
  • the page fault handler will invoke the VOP_GETPAGE
    operation on the file's vnode
  • in s5fs, VOP_GETPAGE is implemented by s5getpage()
  • the calling process sleeps until the I/O
    completes
  • s5read() returns when all data has been read
  • the file-system-independent code
  • unlocks the vnode, advances the offset pointer in
    the file structure, and returns to the user

17
Allocating and Reclaiming Inodes
  • An inode remains active as long as its vnode has
    a non-zero reference count
  • When the count drops to zero, the
    file-system-independent code invokes the
    VOP_INACTIVE operation, which frees the inode
  • When an inode becomes inactive, the kernel puts
    it on the free list, but does not invalidate it

18
Analysis of s5fs
  • The simple design causes problems in
  • reliability, performance, and functionality
  • Reliability
  • a single copy of the superblock contains vital
    information about the entire file system; if it
    is corrupted, the file system becomes unusable
  • Performance
  • s5fs groups all inodes together at the beginning
    of the file system
  • accessing a file requires reading the inode and
    then the file data, causing a long seek on the
    disk
  • e.g. ls -l causes a random disk access pattern

19
Analysis of s5fs (cont)
  • Disk block allocation is also suboptimal
  • After the file system has been used for a while,
    the order of blocks in the free block list
    becomes completely random
  • This slows down sequential access to files,
    since logically consecutive blocks may be very
    far apart on the disk
  • File names are restricted to 14 characters

20
Berkeley Fast File System
  • Addresses many limitations of s5fs
  • Hard disk structure
  • platter, disk head, track, sector, cylinder
  • head seek, rotational latency
  • FFS on-disk organization
  • FFS divides the partition into one or more
    cylinder groups, each containing a small set of
    consecutive cylinders
  • This allows UNIX to store related data in the
    same cylinder group to minimize disk head movement

21
Berkeley FFS (cont)
  • The superblock is divided into two structures
  • the FFS superblock contains information about the
    entire file system; it does not change unless the
    file system is rebuilt
  • each cylinder group has a data structure
    describing summary information about that group,
    including the free inode and free block lists
  • Each cylinder group contains a duplicate copy of
    the superblock
  • FFS maintains these duplicates at different
    offsets in each cylinder group, in such a way
    that no single track, cylinder, or platter
    contains all copies of the superblock

22
FFS Blocks
  • Blocks and Fragments
  • FFS allows each block to be divided into one or
    more fragments
  • The number of fragments per block may be set to
    1, 2, 4, or 8, allowing fragments as small as 512
    bytes, the same as the disk sector size (see the
    sketch after this list)
  • An FFS file is composed entirely of complete
    blocks, except for the last block, which may
    contain one or more consecutive fragments
  • This scheme reduces space wastage, but requires
    occasional recopying of file data
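A small sketch of the fragment arithmetic implied by this list; the
function is illustrative, and the 512-byte floor is the sector size
mentioned above.

#include <assert.h>

/* Fragment size for a given block size and fragments-per-block
   setting; e.g. frag_size(4096, 8) == 512, one disk sector. */
unsigned frag_size(unsigned bsize, unsigned frag)
{
    assert(frag == 1 || frag == 2 || frag == 4 || frag == 8);
    assert(bsize / frag >= 512);     /* never smaller than a sector */
    return bsize / frag;
}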

23
FFS Disk Allocation
  • Allocation policies
  • FFS aims to colocate related information on the
    disk and optimize sequential access
  • 1. Attempt to place the inodes of all files of a
    single directory in the same cylinder group
  • 2. Create each new directory in a different
    cylinder group from its parent, so as to
    distribute data uniformly over the disk
  • 3. Try to place the data blocks of the file in
    the same cylinder group as the inode

24
FFS Disk Allocation (cont)
  • 4. To avoid filling an entire cylinder group with
    one large file, change the cylinder group when
    the file size reaches 48Kbytes and again at every
    megabyte
  • 5. Allocate sequential blocks of a file at
    rotationally optimal positions
  • Rotational optimization tries to determine the
    number of sectors to skip so that the desired
    sector is under the disk head when the read is
    initiated.

25
FFS Functionality Enhancements
  • Long file names
  • the maximum length of a filename is 255
    characters
  • Symbolic links and atomic rename()

(Figure: FFS directory entries. Each variable-length record holds an
inode number, an allocation size, a name length, and the name plus
padding. (a) shows the initial state; (b) shows the state after
deleting file2, whose space is absorbed into the allocation size of
the preceding entry.)
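A sketch of the variable-length entry the figure depicts, patterned
after the 4.2BSD struct direct; exact declarations vary across BSD
releases.

#include <stdint.h>

#define FFS_MAXNAMLEN 255

struct ffs_direct {
    uint32_t d_ino;                      /* inode number of the entry */
    uint16_t d_reclen;                   /* allocation size of this record;
                                            grows to absorb a deleted
                                            successor, as in state (b) */
    uint16_t d_namlen;                   /* actual length of the name */
    char     d_name[FFS_MAXNAMLEN + 1];  /* on disk, only d_namlen bytes
                                            plus padding are stored */
};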
26
Analysis of FFS
  • Substantial performance gains
  • read throughput
  • 29 Kbytes/sec in s5fs → 221 Kbytes/sec in FFS
  • CPU utilization 11% → 43%
  • write throughput
  • 48 Kbytes/sec → 142 Kbytes/sec
  • CPU utilization 29% → 43%
  • Disk space wastage
  • half a block per file in s5fs
  • half a fragment per file in FFS
  • more space is required to track the free blocks
    and fragments

27
Analysis of FFS (cont)
  • Modern SCSI disks do not have fixed-size
    cylinders
  • FFS is oblivious to this
  • Overall, FFS provides great benefits
  • wide acceptance
  • 4.3BSD added two types of caching to speed up
    name lookups

28
Temporary File Systems
  • Basic concepts
  • Many utilities and applications extensively use
    temporary files to store results of intermediate
    phases of execution
  • The synchronous metadata updates performed by the
    file system are unnecessary for temporary files,
    because they are not meant to be persistent
  • This is addressed by using RAM disks, which
    provide file systems that reside entirely in
    physical memory, at the cost of dedicating a
    large amount of memory
  • RAM disks are implemented by a device driver that
    emulates a disk

29
Temporary File Systems (cont)
  • Two implementations
  • Memory File System (mfs)
  • tmpfs File System
  • mfs
  • Developed by UC Berkeley
  • The entire file system is built in the virtual
    address space of the process that performs the
    mount operation
  • This process does not return from the mount call,
    but remains in the kernel, waiting for I/O
    requests to the file system

30
Temporary File Systems (cont)
  • Each mfsnode, which is the file-system-dependent
    part of the vnode, contains the PID of the mount
    process, which now functions as an I/O server
  • The pages of the mfs files compete with all other
    processes for physical memory
  • Using a separate process to handle all I/O
    requires two context switches for each operation
  • The file system still resides in a separate
    address space, which means we still need extra
    in-memory copy operations

31
Temporary File Systems (cont)
  • tmpfs file system
  • Developed by Sun Microsystems
  • Combines the powerful facilities of the vnode/vfs
    interface and the new VM architecture
  • tmpfs is implemented entirely in the kernel
  • All file metadata is stored in non-paged memory,
    dynamically allocated from the kernel heap
  • The data blocks are in paged memory and are
    represented using the anonymous pages facility in
    the VM subsystem

32
Temporary File Systems (cont)
  • Each page is mapped by an anonymous object
    (struct anon), which contains the location of the
    page in physical memory or on the swap space
  • The tmpnode, which is the file-system-dependent
    object for each file, has a pointer to the
    anonymous map (struct anon_map) for the file
  • Pages can be swapped out by the paging system and
    compete for physical memory

33
Temporary File Systems (cont)
  • Advantages of tmpfs
  • does not use a separate I/O server and thus
    avoids wasteful context switches
  • holding the metadata in unpaged kernel memory
    eliminates the memory-to-memory copies and some
    disk I/O
  • the support for memory mapping allows fast,
    direct access to file data

34
Locating tmpfs pages
(Figure: locating tmpfs pages. The tmpnode, paired with its vnode,
points to the file's anon_map; each anon structure in the map records
whether its page is in physical memory or in the swap area on disk.)
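A hedged sketch of the relationships in this figure; the declarations
are illustrative stand-ins, not the actual Sun kernel structures.

#include <stddef.h>

struct anon {                    /* one per data page */
    void *an_page;               /* the page, if resident in memory */
    long  an_swapslot;           /* its slot in the swap area, if not */
};

struct anon_map {                /* one per tmpfs file */
    struct anon **am_anon;       /* array of per-page anon pointers */
    size_t        am_size;       /* size of the mapping in bytes */
};

struct tmpnode {                 /* file-system-dependent object */
    struct vnode    *tn_vnode;   /* the file-system-independent part */
    struct anon_map *tn_anon;    /* where the file's pages live */
    /* ... metadata, kept in unpaged kernel memory ... */
};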
35
Special-Purpose File Systems
  • The specfs file system
  • Provides a uniform interface to device files
  • The primary purpose of specfs is to intercept I/O
    calls to device files and translate them to calls
    to the appropriate device driver routines
  • The /proc file system
  • Provides an elegant and powerful interface to the
    address space of any process
  • The processor file system
  • Provides an interface to the individual
    processors on a multiprocessor machine

36
Old Buffer Cache
  • Background
  • Traditional UNIX systems use a dedicated area of
    memory, called the block buffer cache, to cache
    blocks accessed through the file system
  • The backing store of a cache is the persistent
    location of the data
  • A cache can be write-through or write-behind
    (contrasted in the sketch after this list)
  • a write-through cache writes modified data to
    the backing store immediately
  • in a write-behind cache, modified blocks are
    simply marked as dirty and written to disk at a
    later time
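A minimal sketch contrasting the two policies; the bwrite()/bdwrite()
names echo the traditional buffer-cache interface, but the bodies and
the disk_write() driver call are illustrative.

#define B_DIRTY 0x1

struct buf {
    int   b_flags;               /* B_DIRTY plus locked/wanted flags */
    long  b_blkno;               /* block number on the backing store */
    char *b_data;                /* cached contents of the block */
};

extern void disk_write(long blkno, const char *data);  /* hypothetical */

/* Write-through: the block reaches the backing store before return. */
void bwrite(struct buf *bp)
{
    disk_write(bp->b_blkno, bp->b_data);
    bp->b_flags &= ~B_DIRTY;
}

/* Write-behind: only mark the block dirty; a later flush writes it
   out, so a crash before that flush loses the update. */
void bdwrite(struct buf *bp)
{
    bp->b_flags |= B_DIRTY;
}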

37
Old Buffer Cache (cont)
  • Advantages
  • reduces disk traffic and eliminates unnecessary
    disk I/O
  • synchronizes access to disk blocks through the
    locked and wanted flags
  • Disadvantages
  • the write-behind nature of the cache means that
    data may be lost if the system crashes
  • reducing disk access greatly improves
    performance, but the data must be copied twice
  • disk → buffer, then buffer → user address space
  • e.g. the cache wiping problem, where a large
    sequential read can flush more useful blocks from
    the cache