File System Implementation 1 - PowerPoint PPT Presentation

Slides: 38

Provided by: csieNc5

1
Chapter 9. File System Implementation

Introduction
System V File System
Berkeley Fast File System
Temporary File System
Special-purpose File Systems
Old Buffer Cache

2
Introduction

Two local general-purpose file systems
System V file system (s5fs)
Berkeley fast file system (FFS)
S5fs
original UNIX file system
FFS
introduced in 4.2BSD
Vnode/vfs
integrated version of FFS is known as UNIX file
system (ufs)

3
System V File System

On-disk layout (in order from the start of the partition)
boot area (B) | superblock (S) | inode list | data blocks

Boot area
contains code required to bootstrap
Superblock
contains attributes and metadata of the file
system

4
System V File System (cont)

Inode list
linear array of inodes
one inode for each file
size of inode is 64 bytes
inode list has a fixed size
limits the maximum number of files the partition
can contain

5
S5fs Directories

Each directory contains fixed-size records of 16 bytes
First two bytes: inode number
Next fourteen bytes: file name
Limits
65535 files per disk partition
14 characters per file name
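
A minimal sketch of how a block of such 16-byte records could be decoded. The little-endian byte order and the NUL padding of short names are assumptions made for illustration (the real byte order depends on the machine), and parse_s5_dir is a hypothetical helper name.

```python
import struct

# 2-byte inode number followed by a 14-byte name; "<" (little-endian) is assumed
ENTRY = struct.Struct("<H14s")

def parse_s5_dir(block: bytes):
    """Return (inode_number, name) for every in-use 16-byte record."""
    entries = []
    for off in range(0, len(block) - ENTRY.size + 1, ENTRY.size):
        ino, raw = ENTRY.unpack_from(block, off)
        if ino == 0:              # inode number 0 marks an unused slot
            continue
        entries.append((ino, raw.rstrip(b"\0").decode("ascii")))
    return entries

# Made-up directory block: ".", "..", one unused slot, one 14-character name
sample = (ENTRY.pack(2, b".") + ENTRY.pack(2, b"..")
          + ENTRY.pack(0, b"gone") + ENTRY.pack(40, b"fourteen_chars"))
```

Because inode number 0 is reserved for unused slots, a 2-byte field leaves 65535 usable inode numbers, which matches the limit above.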

6
S5fs Inodes

On-disk inode and In-core inode
struct dinode, struct inode

struct dinode
Field      Size (bytes)  Description
di_mode    2             File type, permissions, etc.
di_nlinks  2             Number of hard links to file
di_uid     2             Owner UID
di_gid     2             Owner GID
di_size    4             Size in bytes
di_addr    39            Array of block addresses
di_gen     1             Generation number
di_atime   4             Time of last access
di_mtime   4             Time file was last modified
di_ctime   4             Time inode was last changed
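
As a quick check, the field sizes listed for struct dinode add up to the 64-byte inode size stated earlier. The sketch below also splits the 39-byte di_addr into 13 three-byte addresses, matching entries 0 to 12 of the block array. The byte order inside each address is an assumption, and unpack_addrs is a hypothetical helper.

```python
# (field, size in bytes), copied from the struct dinode listing
DINODE_FIELDS = [
    ("di_mode", 2), ("di_nlinks", 2), ("di_uid", 2), ("di_gid", 2),
    ("di_size", 4), ("di_addr", 39), ("di_gen", 1),
    ("di_atime", 4), ("di_mtime", 4), ("di_ctime", 4),
]

def dinode_size() -> int:
    """Total size of the on-disk inode in bytes."""
    return sum(size for _, size in DINODE_FIELDS)

def unpack_addrs(di_addr: bytes, byteorder: str = "big"):
    """Split di_addr into 13 block addresses of 3 bytes each (byte order assumed)."""
    if len(di_addr) != 39:
        raise ValueError("di_addr must be 39 bytes")
    return [int.from_bytes(di_addr[i:i + 3], byteorder) for i in range(0, 39, 3)]
```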
7
S5fs Inodes (cont)
Figure: layout of di_mode
type (4 bits) | suid (u) | sgid (g) | sticky (s) | owner rwx | group rwx | others rwx
Figure: inode block array
entries 0 to 9: direct, each points to a disk block
entry 10: indirect
entry 11: double indirect
entry 12: triple indirect
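
The block array can be read as a mapping from a logical block number to a level of indirection. The sketch below illustrates that mapping. The number of addresses per indirect block (nindir = 256, for example 1024-byte blocks holding 4-byte addresses) is an assumed value, and classify_block is a hypothetical name.

```python
NDIRECT = 10   # entries 0-9 of the block array point straight at data blocks

def classify_block(lbn: int, nindir: int = 256):
    """Map a logical block number to (level, indices inside each indirect block)."""
    if lbn < 0:
        raise ValueError("negative block number")
    if lbn < NDIRECT:
        return ("direct", [lbn])
    lbn -= NDIRECT
    if lbn < nindir:
        return ("indirect", [lbn])
    lbn -= nindir
    if lbn < nindir ** 2:
        return ("double indirect", [lbn // nindir, lbn % nindir])
    lbn -= nindir ** 2
    if lbn < nindir ** 3:
        return ("triple indirect",
                [lbn // nindir ** 2, (lbn // nindir) % nindir, lbn % nindir])
    raise ValueError("file too large for this inode layout")
```

Under these assumptions the first 10 blocks need no extra disk reads, the next 256 need one, and so on, which is why small files are cheap to access.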
8
S5fs Superblock

Metadata about the file system
The kernel reads the superblock when mounting the
file system and stores it in memory until the
file system is unmounted
Contains the following information
size in blocks of the file system
size in blocks of the inode list
number of free blocks and inodes
free block list, free inode list
does not keep free list completely in the
superblock

9
S5fs Kernel Organization

In-core inodes
struct inode
contains all the fields of the on-disk inode, and
some additional fields, such as
vnode
the i_vnode field of the inode contains the vnode
of the file
Device ID of the partition containing the file
Inode number of the file

10
S5fs Kernel Organization (cont)

Flags for synchronization and cache management
Pointers to keep the inode on a free list
Pointers to keep the inode on a hash queue
The kernel hashes inodes by their inode numbers,
so as to locate them quickly when needed
Block number of last block read

11
S5fs Kernel Organization (cont)
Figure: inode free list and hash queues
hash queue 0: i_number 40, 268, 1056, 8
hash queue 1: i_number 73, 17, 593
hash queue 2: i_number 86
hash queue 3: i_number 11, 199, 27, 103
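
A small sketch of hashing inodes by inode number. The hash function (inode number modulo the number of queues) is an assumption. It happens to reproduce the grouping in the figure, but a real kernel may use a different function and would normally include the device as well.

```python
NQUEUES = 4   # the figure shows four hash queues

def hash_queue(i_number: int) -> int:
    """Assumed hash: inode number modulo the number of queues."""
    return i_number % NQUEUES

def build_queues(i_numbers):
    """Place each inode number on its hash queue, preserving insertion order."""
    queues = {q: [] for q in range(NQUEUES)}
    for n in i_numbers:
        queues[hash_queue(n)].append(n)
    return queues

def find(queues, i_number: int) -> bool:
    """Search only the one queue the inode could be on."""
    return i_number in queues[hash_queue(i_number)]

queues = build_queues([40, 268, 1056, 8, 73, 17, 593, 86, 11, 199, 27, 103])
```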
12
S5fs Inode Lookup

lookuppn( )
in the file-system-independent layer
performs pathname parsing
parses one component at a time, invoking
VOP_LOOKUP operation
when searching an s5fs directory, translates to a
call to s5lookup( ) function
s5lookup( )
Checks the directory name lookup cache
In case of a cache miss, it reads the directory
one block at a time, searching the entries for
the specified file name

13
S5fs Inode Lookup (cont)

If the directory contains a valid entry for the
file, s5lookup( ) obtains the inode number from
the entry
Calls iget( ) to locate that inode and
initializes the vnode
Finally, iget( ) returns a pointer to the inode
to s5lookup( ). s5lookup( ), in turn, returns a
pointer to the vnode to lookuppn( )
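
The lookup path can be sketched as a loop over pathname components, with a name cache consulted before any directory is scanned. This is a toy model only: the directory contents are made up, and resolve and dir_lookup are hypothetical stand-ins for lookuppn( ) and s5lookup( ), not their real interfaces.

```python
# Toy directories: directory inode number -> {name: inode number}. All data is made up.
DIRS = {
    2:  {"usr": 11, "tmp": 27},
    11: {"bin": 40},
    40: {"ls": 73},
}
ROOT_INO = 2
name_cache = {}   # (directory inode, name) -> inode; stands in for the name lookup cache
scans = []        # directories that actually had to be scanned (cache misses)

def dir_lookup(dir_ino: int, name: str) -> int:
    """Check the name cache first; on a miss, search the directory."""
    key = (dir_ino, name)
    if key in name_cache:
        return name_cache[key]
    scans.append(dir_ino)
    entries = DIRS.get(dir_ino)
    if entries is None or name not in entries:
        raise FileNotFoundError(name)
    name_cache[key] = entries[name]
    return entries[name]

def resolve(path: str) -> int:
    """Parse the pathname one component at a time, starting at the root."""
    ino = ROOT_INO
    for component in path.strip("/").split("/"):
        if component:
            ino = dir_lookup(ino, component)
    return ino
```

Resolving the same path twice scans the three directories only the first time. The second pass is served entirely from the name cache.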

14
S5fs File I/O

read and write system calls
accept a file descriptor (the index returned by
open)
File descriptor
used as an index into the descriptor table to
obtain the pointer to the open file object
(struct file)
the kernel obtains the vnode pointer from the
file structure
Before starting I/O
the kernel invokes the VOP_RWLOCK operation to
serialize access to the file

15
S5fs File I/O (cont)

The kernel then invokes the VOP_READ or VOP_WRITE
operation
This results in a call to s5read( ) or s5write( )
In case of s5read( )
s5read( ) translates the starting offset to the
logical block number
it then reads the data one page at a time
by mapping the block into the kernel virtual
address space and calling uiomove( ) to copy the
data into user space

16
S5fs File I/O (cont)

uiomove( ) calls the copyout( ) routine to
perform the actual data transfer
if the page is not in memory, copyout( ) will
generate a page fault
the page fault handler will invoke VOP_GETPAGE
operation on its vnode
in s5fs, VOP_GETPAGE is implemented by s5getpage( )
the calling process sleeps until the I/O
completes
s5read( ) returns when all data has been read
the system-independent code
unlocks the vnode, advances the offset pointer in
the file structure, and returns to the user
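
The translation from a starting offset to logical block numbers can be sketched as below. The slides describe s5read( ) working one page at a time. This sketch uses a single assumed unit size (BLOCK_SIZE = 1024) for the block and the transfer unit, and split_read is a hypothetical helper, not a kernel routine.

```python
BLOCK_SIZE = 1024   # assumed unit size, for illustration only

def split_read(offset: int, count: int, block_size: int = BLOCK_SIZE):
    """Yield (logical_block, offset_in_block, nbytes) for each block a read touches."""
    while count > 0:
        lbn = offset // block_size          # logical block number
        in_block = offset % block_size      # where the read starts inside that block
        nbytes = min(block_size - in_block, count)
        yield (lbn, in_block, nbytes)
        offset += nbytes
        count -= nbytes
```

For example, list(split_read(1000, 3000)) describes a read that starts near the end of block 0 and finishes partway through block 3. Each tuple corresponds to one mapping and one copy to user space.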

17
Allocating and Reclaiming Inodes

An inode remains active as long as its vnode has
a non-zero reference count
When the count drops to zero, the
file-system-independent code invokes the
VOP_INACTIVE operation which frees the inode
When an inode becomes inactive, the kernel puts
it on the free list, but does not invalidate it
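
A toy model of this policy: an inode whose reference count drops to zero goes on the free list but stays findable, so a later lookup can reactivate it without a disk read. The class and method names are hypothetical. Only iget comes from the slides, and its real signature is not shown there.

```python
from collections import OrderedDict

class InodeCache:
    def __init__(self):
        self.table = {}                  # (dev, ino) -> reference count; stands in for the hash queues
        self.free_list = OrderedDict()   # inactive inodes, oldest first

    def iget(self, dev: int, ino: int) -> str:
        key = (dev, ino)
        if key in self.table:
            self.free_list.pop(key, None)    # reactivate a cached inode
            self.table[key] += 1
            return "cached"
        self.table[key] = 1                  # not cached: would be read from disk
        return "read from disk"

    def iput(self, dev: int, ino: int) -> None:
        key = (dev, ino)
        self.table[key] -= 1
        if self.table[key] == 0:
            self.free_list[key] = True       # on the free list, contents still valid

    def reclaim(self):
        """Reuse the oldest free inode; only now is its old identity discarded."""
        key, _ = self.free_list.popitem(last=False)
        del self.table[key]
        return key
```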

18
Analysis of s5fs

A simple design introduces problems in
reliability, performance, and functionality
Reliability
the superblock contains vital information about the
entire file system; s5fs keeps a single copy, so if
it is corrupted the file system becomes unusable
Performance
s5fs groups all inodes together at the beginning
of the file system
accessing a file requires reading the inode and then
the file data, which causes a long seek on the disk
e.g. ls -l causes a random disk access pattern

19
Analysis of s5fs (cont)

Disk block allocation is also suboptimal
After the file system has been used for a while,
the order of blocks in the free block list
becomes completely random
This slows down sequential access operations on
files, since logically consecutive blocks may be
very far apart on the disk
Functionality
file names are restricted to 14 characters

20
Berkeley Fast File System

Addresses many limitations of s5fs
Hard disk structure
platter, disk head, track, sector, cylinder
head seek, rotational latency
FFS on-disk organization
FFS divides the partition into one or more
cylinder groups, each containing a small set of
consecutive cylinders
This allows UNIX to store related data in the
same cylinder group to minimize disk head movement

21
Berkeley FFS (cont)

Superblock is divided into two structures
FFS superblock contains information about the
entire file system, it does not change unless the
file system is rebuilt
Each cylinder group has a data structure
describing summary information about that group,
including the free inode and free block lists.
Each cylinder group contains a duplicate copy of
the superblock
FFS maintains these duplicates at different
offsets in each cylinder group in such a way
that no single track, cylinder, or platter
contains all copies of the superblock

22
FFS Blocks

Blocks and Fragments
FFS allows each block to be divided into one or
more fragments
The number of fragments per block may be set to
1, 2, 4, or 8, allowing a lower bound of 512
bytes, the same as the disk sector size
An FFS file is composed entirely of complete blocks,
except for the last block, which may contain one
or more consecutive fragments
This scheme reduces space wastage, but requires
occasional recopying of file data
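
A worked sketch of the space saving. It assumes a 4096-byte block split into 8 fragments of 512 bytes and ignores indirect blocks and other real allocation rules. ffs_layout is a hypothetical helper.

```python
def ffs_layout(size: int, block_size: int = 4096, frags_per_block: int = 8):
    """Return (full_blocks, tail_fragments, wasted_bytes) for a file of `size` bytes."""
    frag_size = block_size // frags_per_block
    full_blocks, tail = divmod(size, block_size)
    tail_frags = -(-tail // frag_size)          # ceiling division
    allocated = full_blocks * block_size + tail_frags * frag_size
    return full_blocks, tail_frags, allocated - size
```

Under these assumptions a 10000-byte file takes 2 full blocks plus 4 fragments and wastes 240 bytes. With one fragment per block (no fragmentation) the same file wastes 2288 bytes.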

23
FFS Disk Allocation

Allocation policies
FFS aims to colocate related information on the
disk and optimize sequential access
1. Attempt to place the inodes of all files of a
single directory in the same cylinder group
2. Create each new directory in a different
cylinder group from its parent, so as to
distribute data uniformly over the disk
3. Try to place the data blocks of the file in
the same cylinder group as the inode

24
FFS Disk Allocation (cont)

4. To avoid filling an entire cylinder group with
one large file, change the cylinder group when
the file size reaches 48Kbytes and again at every
megabyte
5. Allocate sequential blocks of a file at
rotationally optimal positions
Rotational optimization tries to determine the
number of sectors to skip so that the desired
sector is under the disk head when the read is
initiated.
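
The rotational calculation can be illustrated with simple arithmetic: the number of sectors to skip is the number that pass under the head during the delay before the next read is issued. The delay, rotation speed, and sectors per track used below are assumed example values, not figures from the slides, and sectors_to_skip is a hypothetical helper.

```python
def sectors_to_skip(delay_us: int, rpm: int, sectors_per_track: int) -> int:
    """Sectors passing under the head during `delay_us` microseconds, rounded up."""
    # One revolution takes 60_000_000 / rpm microseconds, so one sector takes
    # 60_000_000 / (rpm * sectors_per_track) microseconds.
    passed_times_const = delay_us * rpm * sectors_per_track
    return -(-passed_times_const // 60_000_000)   # integer ceiling division
```

With an assumed 2 ms delay on a 3600 rpm disk with 32 sectors per track, about 3.84 sectors pass under the head, so the next block would be placed 4 sectors further along.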

25
FFS Functionality Enhancements

Long file names
maximum size of the filename is 255 characters
Symbolic links, and atomic rename( )

Figure: FFS directory
entry fields: inode number, allocation size, name length, name plus extra space (padding)
(a) initial state
(b) after deleting file2
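
A sketch of how deleting an entry can work with variable-length records: the freed space is folded into the allocation size of the preceding entry. The entry sizes in the usage example are made up, and DirEntry and delete_entry are hypothetical names rather than the FFS source.

```python
from dataclasses import dataclass

@dataclass
class DirEntry:
    ino: int
    alloc_size: int   # bytes this entry occupies, including free space after the name
    name: str

def delete_entry(entries, name: str) -> None:
    """Remove `name` by folding its space into the previous entry's allocation size."""
    for i, entry in enumerate(entries):
        if entry.name == name and entry.ino != 0:
            if i == 0:
                entry.ino = 0                 # first entry: just mark it unused
            else:
                entries[i - 1].alloc_size += entry.alloc_size
                del entries[i]
            return
    raise FileNotFoundError(name)
```

For example, starting from [DirEntry(7, 12, "f1"), DirEntry(8, 16, "file2")] and deleting file2 leaves a single entry for f1 whose allocation size has grown to 28, so the freed space can be reused by a later entry.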
26
Analysis of FFS

Substantial performance gains
read throughput
29 Kbytes/sec in s5fs -> 221 Kbytes/sec in FFS
CPU utilization 11% -> 43%
write throughput
48 Kbytes/sec -> 142 Kbytes/sec
CPU utilization 29% -> 43%
Disk space wastage
half a block per file in s5fs
half a fragment per file in FFS
more space is required to monitor the free blocks
and fragments

27
Analysis of FFS (cont)

Modern SCSI disks do not have fixed size
cylinders
FFS is oblivious to this
Overall, FFS provides great benefits
wide acceptance
4.3BSD added two types of caching to speed up
name lookups

28
Temporary File Systems

Basic concepts
Many utilities and applications extensively use
temporary files to store results of intermediate
phases of execution
The synchronous updates are really unnecessary
for temporary files, because they are not meant
to be persistent
Addressed by using RAM disks, which provide file
systems that reside entirely in physical memory
(dedicating a large amount of memory)
RAM disks are implemented by a device driver that
emulates a disk

29
Temporary File Systems (cont)

Two implementations
Memory File System (mfs)
tmpfs File System
mfs
Developed by UC Berkeley
Entire file system is built in the virtual
address space of the process that handled the
mount operation
This process does not return from the mount call,
but remains in the kernel, waiting for I/O
requests to the file system

30
Temporary File Systems (cont)

Each mfsnode, which is the file-system-dependent
part of the vnode, contains the PID of the mount
process, which now functions as an I/O server
The pages of the mfs files compete with all other
processes for physical memory
Using a separate process to handle all I/O
requires two context switches for each operation
The file system still resides in a separate
address space, which means we still need extra
in-memory copy operations

31
Temporary File Systems (cont)

tmpfs file system
Developed by Sun Microsystems
Combined the powerful facilities of the vnode/vfs
interface and the new VM architecture
tmpfs is implemented entirely in the kernel
All file metadata is stored in non-paged memory,
dynamically allocated from the kernel heap
The data blocks are in paged memory and are
represented using the anonymous pages facility in
the VM subsystem

32
Temporary File Systems (cont)

Each page is mapped by an anonymous object
(struct anon), which contains the location of the
page in physical memory or on the swap space
The tmpnode, which is the file-system-dependent
object for each file, has a pointer to the
anonymous map (struct anon_map) for the file
Pages can be swapped out by the paging system and
compete for physical memory

33
Temporary File Systems (cont)

Advantages of tmpfs
does not use a separate I/O server and thus
avoids wasteful context switches
holding the metadata in unpaged kernel memory
eliminates the memory-to-memory copies and some
disk I/O
the support for memory mapping allows fast,
direct access to file data

34
Locating tmpfs pages
Figure: struct vnode -> struct tmpnode -> struct anon_map -> struct anon (one per page) -> page in memory or swap area on disk
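
The chain of structures can be sketched as plain data classes. The classes below only mirror the relationships described on the slides. Their fields (in_memory, location, slots) and the 4096-byte page size are assumptions, not the real layouts of struct anon, struct anon_map, or struct tmpnode.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PAGE_SIZE = 4096                # assumed page size

@dataclass
class Anon:                     # stands in for struct anon
    in_memory: bool
    location: int               # page frame number or swap slot (both hypothetical)

@dataclass
class AnonMap:                  # stands in for struct anon_map
    slots: List[Optional[Anon]] = field(default_factory=list)

@dataclass
class TmpNode:                  # stands in for struct tmpnode
    anon_map: AnonMap

def locate(node: TmpNode, offset: int):
    """Return ('memory' | 'swap', location) for the page holding `offset`, or None."""
    index = offset // PAGE_SIZE
    slots = node.anon_map.slots
    if index >= len(slots) or slots[index] is None:
        return None             # no page has been allocated at this offset
    anon = slots[index]
    return ("memory" if anon.in_memory else "swap", anon.location)
```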
35
Special-Purpose File Systems

The specfs file system
Provides a uniform interface to device files
The primary purpose of specfs is to intercept I/O
calls to device files and translate them to calls
to the appropriate device driver routines
The /proc file system
Provides an elegant and powerful interface to the
address space of any process
The processor file system
Provides an interface to the individual
processors on a multiprocessor machine

36
Old Buffer Cache

Background
Traditional UNIX systems use a dedicated area in
memory called the block buffer cache to cache blocks
accessed through the file system
Backing store of a cache is the persistent
location of the data
A cache can be write-through or write-behind
a write-through cache writes out modified data to
the backing store immediately
in a write-behind cache, modified blocks are simply
marked as dirty and written to the disk at a later time
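
The difference between the two policies can be sketched with a toy cache that counts writes to its backing store. BufferCache and its methods are hypothetical names, and the dictionary called disk merely stands in for the backing store.

```python
class BufferCache:
    def __init__(self, write_through: bool):
        self.write_through = write_through
        self.buffers = {}       # block number -> cached data
        self.dirty = set()      # blocks modified but not yet written back
        self.disk = {}          # stands in for the backing store
        self.disk_writes = 0

    def _flush(self, blkno: int) -> None:
        self.disk[blkno] = self.buffers[blkno]
        self.disk_writes += 1

    def write(self, blkno: int, data: bytes) -> None:
        self.buffers[blkno] = data
        if self.write_through:
            self._flush(blkno)          # write out immediately
        else:
            self.dirty.add(blkno)       # just mark dirty; write later

    def sync(self) -> None:
        """Write all dirty blocks to the backing store."""
        for blkno in sorted(self.dirty):
            self._flush(blkno)
        self.dirty.clear()
```

Three writes to the same block cost three disk writes with write-through but only one with write-behind, at the price of losing the data if a crash happens before sync( ).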

37
Old Buffer Cache (cont)

Advantages
Reduce disk traffic and eliminate unnecessary
disk I/O
Synchronizes access to disk blocks through the
locked and wanted flags
Disadvantages
The write-behind nature of the cache means the
data may be lost if the system crashes
Reducing disk access greatly improves
performance, but the data must be copied twice:
disk -> buffer, then buffer -> user address space
Cache wiping problem, e.g. reading one very large
file flushes other useful blocks out of the cache
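
Cache wiping can be demonstrated with a small cache that evicts the least recently used block. LRU replacement and the capacity of 4 blocks are assumptions chosen for illustration, and LRUCache is a hypothetical class.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()     # block number -> True, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, blkno: int) -> None:
        if blkno in self.blocks:
            self.blocks.move_to_end(blkno)       # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        self.blocks[blkno] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used block

def run_demo() -> LRUCache:
    cache = LRUCache(capacity=4)
    for _ in range(2):                  # a small working set: misses, then hits
        for blkno in (1, 2, 3):
            cache.access(blkno)
    for blkno in range(100, 110):       # one large sequential read wipes the cache
        cache.access(blkno)
    for blkno in (1, 2, 3):             # the working set must be fetched again
        cache.access(blkno)
    return cache
```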