Title: A Fast File System for Unix
- Marshall K. McKusick, William N. Joy,
- Samuel J. Leffler and Robert S. Fabry
- Computer Systems Research Group, UCB
Presented by Parang Saraf
CS 5204 Operating Systems, Virginia Tech
About the Paper
- Considered one of the most fundamental papers in operating systems
- Has been cited around 930 times
- Describes a new file system for UNIX
Traditional File System
- The file system developed at Bell Laboratories
- A file system is described by its super-block, which contains:
  - The number of data blocks
  - A count of the maximum number of files
  - A pointer to the free list (a linked list of all free blocks)
- A disk drive is divided into partitions
  - Each disk partition may contain one file system
  - A file system never spans multiple partitions
Traditional File System: Inode
- Each file has a descriptor associated with it, called an inode
- Information includes:
  - Ownership of the file
  - Time stamps marking the last modification and access times
  - An array of indices pointing to the data blocks:
    - Direct blocks: 8
    - Indirect blocks: singly, doubly and triply indirect
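To make the direct/indirect scheme concrete, the sketch below maps a logical block number of a file onto the inode's address array. It assumes the 8 direct pointers from the slide and 4-byte block addresses, so a 512-byte indirect block holds 128 pointers; the function name `locate_block` is hypothetical.

```python
def locate_block(lbn, ndirect=8, nindir=128):
    """Map logical block `lbn` to (level, indices).

    Level 0 is a direct pointer and indices is (slot in the inode,).
    Levels 1-3 are single, double and triple indirect; indices lists the
    slot to follow in each indirect block, outermost first.
    """
    if lbn < 0:
        raise ValueError("negative block number")
    if lbn < ndirect:
        return 0, (lbn,)
    lbn -= ndirect
    span = 1
    for level in (1, 2, 3):
        span *= nindir                  # data blocks reachable through this level
        if lbn < span:
            digits = []
            for _ in range(level):      # base-nindir digits, least significant first
                digits.append(lbn % nindir)
                lbn //= nindir
            return level, tuple(reversed(digits))
        lbn -= span
    raise ValueError("beyond the triple indirect range")

print(locate_block(5))      # a direct block
print(locate_block(265))    # needs the double indirect block
```

Each extra level of indirection costs one more block read before the data can be reached, which is why small blocks hurt large files.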
Traditional File System: Problems
- Inode information is segregated from the data
  - Long seek time from an inode to its data
- Files in a single directory are not typically allocated consecutive slots for inode information
  - Many non-consecutive blocks of inodes are accessed when executing operations on the inodes of several files in a directory
- Sub-optimal allocation of data blocks
  - Small block size: 512 bytes
  - Many seeks: the next sequential block is not on the same cylinder
  - Limited read-ahead
Old File System
- Developed at Berkeley
- Increased throughput
  - Changed the basic block size from 512 bytes to 1024 bytes
  - Each disk transfer accessed twice as much data
  - Fewer indirect blocks were needed
- Increased reliability
  - Staged modifications to critical file system information so that they could either be completed or repaired cleanly after a crash
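The effect of doubling the block size can be checked with a little arithmetic. The sketch below counts the data blocks (one disk transfer each) and the indirect blocks needed to map a file, assuming 8 direct pointers and 4-byte block addresses; `index_overhead` is a hypothetical name.

```python
def ceil_div(a, b):
    return -(-a // b)

def index_overhead(file_size, bsize, ndirect=8, ptr_size=4):
    """Return (data blocks, indirect blocks) needed to map a file of `file_size` bytes."""
    nindir = bsize // ptr_size                # block addresses per indirect block
    nblocks = ceil_div(file_size, bsize)
    rest = max(0, nblocks - ndirect)          # blocks beyond the direct pointers
    indirect = 0
    for level in (1, 2, 3):
        if rest == 0:
            break
        used = min(rest, nindir ** level)     # data blocks mapped through this level
        # One term per tier of the tree: 1 block at the top, more toward the leaves.
        indirect += sum(ceil_div(used, nindir ** k) for k in range(1, level + 1))
        rest -= used
    if rest:
        raise ValueError("file too large for triple indirection")
    return nblocks, indirect

for bsize in (512, 1024):
    print(bsize, index_overhead(100 * 1024, bsize))
```

For a 100 KB file this gives (200, 3) with 512-byte blocks and (100, 1) with 1024-byte blocks: half the transfers and one indirect block instead of three.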
Old File System: Problems
- The old file system was still using only about 4% of the disk bandwidth
- Main problem: the scrambled free list
  - Initially ordered for optimal access
  - Scrambled as files were created and removed
  - Eventually becomes entirely random, so blocks are allocated randomly
  - A newly created file system provides transfer rates of up to 175 KB/s
  - The rate deteriorates to 30 KB/s after a few weeks of moderate use
- Possible solution: dump, rebuild and restore the file system to undo the fragmentation
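The scrambling is easy to reproduce with a toy model. The sketch assumes the free list behaves as a LIFO stack (freed blocks are pushed on its head, allocations pop from it) and treats block numbers as physical positions; `FreeList` and `discontinuities` are hypothetical names.

```python
class FreeList:
    """Traditional free list modeled as a LIFO stack of block numbers."""

    def __init__(self, nblocks):
        # Initially ordered: allocations return 0, 1, 2, ...
        self._stack = list(range(nblocks - 1, -1, -1))

    def alloc(self):
        return self._stack.pop()

    def free(self, block):
        # A freed block goes back on the head of the list.
        self._stack.append(block)


def create(fl, nblocks):
    return [fl.alloc() for _ in range(nblocks)]


def discontinuities(blocks):
    """Count successive file blocks that are not physically adjacent (one seek each)."""
    return sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


fl = FreeList(100)
files = [create(fl, 3) for _ in range(10)]   # files[i] = [3i, 3i+1, 3i+2]
for i in (1, 3, 5):                          # remove a few files
    for b in files[i]:
        fl.free(b)
new_file = create(fl, 6)
print(new_file, discontinuities(new_file))
```

The first files are perfectly sequential, but after three deletions a new 6-block file gets blocks 17, 16, 15, 11, 10, 9: every step is a backward or skipping move.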
New File System
- Each disk drive contains one or more file systems
- A file system is described by its super-block, located at the beginning of the disk partition
  - The super-block is replicated to protect against catastrophic loss
- The block size is any power of two greater than or equal to 4096 bytes
  - Decided at the time of file system creation and cannot be changed
  - Different file systems can have different block sizes
New File System: Cylinder Groups
- A cylinder group comprises one or more consecutive cylinders
- A disk partition is divided into one or more cylinder groups
- Each cylinder group has associated book-keeping information:
  - A redundant copy of the super-block
  - Space for inodes
  - A bit map describing available blocks, which replaces the free list
  - Summary information describing the usage of data blocks
New File System: Cylinder Groups
- Contains a static number of inodes
  - Allocated at file system creation time
  - Default policy: one inode for each 2048 bytes of space in the cylinder group
- Book-keeping information begins at a varying offset from the beginning of the cylinder group
  - The redundant information spirals down into the cylinders
  - Any single track, cylinder or platter can be lost without losing all copies of the super-block
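The two cylinder-group rules above can be sketched numerically. The inode count follows the default density from the slide; the offset rule (advance one track per group, wrapping after a full cylinder) is a hypothetical stand-in for the real staggering formula, chosen only to show why the copies end up on different surfaces. Both function names are hypothetical.

```python
def inodes_per_group(cg_bytes, bytes_per_inode=2048):
    """Static inode count for one cylinder group under the default density."""
    return cg_bytes // bytes_per_inode

def bookkeeping_offset(cg, track_bytes, tracks_per_cylinder):
    """HYPOTHETICAL staggering rule: advance one track per cylinder group,
    wrapping after a full cylinder, so the super-block copies of successive
    groups sit on different tracks (and therefore different surfaces)."""
    return (cg % tracks_per_cylinder) * track_bytes

print(inodes_per_group(16 * 1024 * 1024))
for cg in range(5):
    print(cg, bookkeeping_offset(cg, track_bytes=16384, tracks_per_cylinder=4))
```

A 16 MB cylinder group gets 8192 inodes, and with four surfaces the book-keeping of groups 0 to 3 lands on four different tracks before the pattern repeats.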
New File System: Structure
New File System: Key Contributions
- Optimizing storage utilization
- File system parameterization
- Layout policies
Optimizing Storage Utilization
- The new 4096-byte blocks transfer 4 times as much data per disk transfer as the 1024-byte blocks
- Problem with large blocks:
  - Wasted space due to small files
Optimizing Storage Utilization
- Solution:
  - Divide a single 4096-byte block into 2, 4 or 8 fragments to accommodate small files
  - The fragment size is specified at the time the file system is created
  - The block map records the space available at the fragment level
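A fragment-level block map can be sketched as one bit per fragment. The sketch assumes 4 fragments per block and a simple first-fit search for a run of free fragments inside a single block (fragments of one file never span blocks); the real allocator's search order may differ, and `BlockMap` is a hypothetical name.

```python
class BlockMap:
    """Cylinder-group free map kept at fragment granularity (True = free)."""

    def __init__(self, nblocks, frags_per_block=4):
        if frags_per_block not in (1, 2, 4, 8):
            raise ValueError("a block holds 1, 2, 4 or 8 fragments")
        self.fpb = frags_per_block
        self.free = [True] * (nblocks * frags_per_block)

    def alloc_frags(self, n):
        """First fit: n contiguous free fragments within one block; return the first index."""
        if not 1 <= n <= self.fpb:
            raise ValueError("request must fit in one block")
        for block_start in range(0, len(self.free), self.fpb):
            for off in range(self.fpb - n + 1):
                first = block_start + off
                if all(self.free[first:first + n]):
                    for i in range(first, first + n):
                        self.free[i] = False
                    return first
        raise RuntimeError("no block has enough contiguous free fragments")

    def release(self, first, n):
        for i in range(first, first + n):
            self.free[i] = True


bm = BlockMap(nblocks=2)
a = bm.alloc_frags(3)   # fragments 0-2 of block 0
b = bm.alloc_frags(2)   # block 0 has only one free fragment left, so block 1
c = bm.alloc_frags(1)   # fills the last fragment of block 0
print(a, b, c)
```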
Optimizing Storage Utilization
- Space allocation
  - Space is allocated when a program does a write system call
  - Three possible conditions:
    - Enough space is left in an already allocated block or fragment: the new data is written into it
    - The file contains no fragmented blocks: new blocks and fragments are allocated
    - The file contains one or more fragmented blocks, but they have insufficient space to hold the new data: a new block is allocated, the old fragments are copied into it and the new data is appended
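The choice among the three conditions can be sketched as a small classifier. It assumes a 4096/1024 file system in which only the last block of a file may be fragmented, and returns 1, 2 or 3 for the conditions in the order listed; `layout` and `write_case` are hypothetical names, and the code only classifies, it does not allocate.

```python
def ceil_div(a, b):
    return -(-a // b)

def layout(size, bsize=4096, fsize=1024):
    """(full blocks, tail fragments) for a file of `size` bytes."""
    full, rest = divmod(size, bsize)
    frags = ceil_div(rest, fsize)
    if frags == bsize // fsize:     # a tail needing every fragment is just a full block
        return full + 1, 0
    return full, frags

def write_case(old_size, new_size, bsize=4096, fsize=1024):
    """Which of the three conditions an extending write falls under (1, 2 or 3)."""
    full, frags = layout(old_size, bsize, fsize)
    allocated = full * bsize + frags * fsize
    if new_size <= allocated:
        return 1    # fits in the block or fragments already allocated
    if frags == 0:
        return 2    # no fragmented block: allocate new blocks and fragments
    return 3        # fragments too small: allocate new space, copy them, append

print(write_case(1000, 1020), write_case(4096, 5000), write_case(1000, 1500))
```

Growing a 1000-byte file to 1500 bytes is the costly third case: its single fragment must be copied into a 2-fragment allocation.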
Optimizing Storage Utilization
- Free space reserve
  - The minimum acceptable percentage of file system blocks that should be free (typically 10%, so the file system is considered full at 90% utilization)
  - Only the system administrator can allocate blocks once the reserve is reached
  - Important for the layout policies to be effective
  - As free space approaches zero, file system throughput is cut in half because of the inability to localize the blocks of a file
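The reserve check itself is simple. A minimal sketch, assuming a 10% reserve expressed in whole percent; `may_allocate` and `minfree_pct` are hypothetical names.

```python
def may_allocate(free_blocks, total_blocks, request, is_admin, minfree_pct=10):
    """Whether `request` blocks may be allocated without dipping into the reserve."""
    if request > free_blocks:
        return False
    if is_admin:
        return True                                  # the administrator may use the reserve
    reserve = total_blocks * minfree_pct // 100      # blocks that must stay free
    return free_blocks - request >= reserve

print(may_allocate(150, 1000, 40, is_admin=False))   # leaves 110 free: allowed
print(may_allocate(150, 1000, 60, is_admin=False))   # would leave 90: refused
```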
Optimizing Storage Utilization
- Wasted space comparison
  - Space wasted by the 4096/1024-byte new file system is the same as by the 1024-byte old file system
  - The new file system uses less space for indexing large files
  - It uses the same amount of space for small files
  - The free space reserve should also be counted as wasted space
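The "same waste as the 1024-byte file system" claim follows from rounding the file tail up to a fragment instead of a block. The sketch counts only data-block rounding (not inodes, indirect blocks or the reserve); `wasted` is a hypothetical name.

```python
def allocated_bytes(size, bsize, fsize):
    """Full blocks plus the tail rounded up to whole fragments."""
    full, rest = divmod(size, bsize)
    return full * bsize + -(-rest // fsize) * fsize

def wasted(size, bsize, fsize):
    return allocated_bytes(size, bsize, fsize) - size

for label, b, f in [("1024 old", 1024, 1024),
                    ("4096/1024 new", 4096, 1024),
                    ("4096 no fragments", 4096, 4096)]:
    print(label, wasted(1500, b, f))
```

A 1500-byte file wastes 548 bytes on both the old file system and the 4096/1024 file system, but 2596 bytes if 4096-byte blocks could not be split.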
File System Parameterization
- Optimum block allocation is based on hardware parameters:
  - Speed of the processor
  - Hardware support for mass storage transfers
  - Characteristics of the mass storage devices
- The file system tries to allocate a file's new blocks on the same cylinder as its previous block
- Block allocation depends on whether or not the processor has an input/output channel
File System Parameterization
Which data is faster to access?
It depends on whether or not the processor has an I/O channel
File System Parameterization
- Rotationally optimal blocks
  - Processors without I/O channels must field an interrupt and then prepare for a new disk transfer
    - The disk rotates during this time
  - Place blocks so that this rotation is taken into account before the start of the next disk transfer operation
  - Cylinder group summary information includes a count of available blocks at each of 8 rotational positions
  - The super-block contains a vector of lists called the rotational layout tables
    - Used by the system when allocating new blocks
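The rotational gap can be computed from the rotation speed and the time the processor needs between transfers. A minimal sketch, assuming a 3600 RPM disk with 8 blocks per track and a 4 ms interrupt-plus-setup delay; it ignores collisions with slots that are already taken, and `next_slot` and `place_blocks` are hypothetical names.

```python
import math

def next_slot(current, delay_ms, rpm, blocks_per_track):
    """Rotational slot for the next block of a file, given the slot of the current one."""
    ms_per_slot = 60_000 / rpm / blocks_per_track   # time for one block to pass the head
    gap = math.ceil(delay_ms / ms_per_slot)         # slots that go by while the next transfer is set up
    return (current + 1 + gap) % blocks_per_track

def place_blocks(nblocks, delay_ms, rpm, blocks_per_track):
    """Slots for successive logical blocks on one track, starting at slot 0."""
    slots = [0]
    while len(slots) < nblocks:
        slots.append(next_slot(slots[-1], delay_ms, rpm, blocks_per_track))
    return slots

print(place_blocks(8, delay_ms=4.0, rpm=3600, blocks_per_track=8))
```

Each block takes about 2.08 ms to pass the head, so a 4 ms delay skips two slots and consecutive blocks land three slots apart: [0, 3, 6, 1, 4, 7, 2, 5]. With an I/O channel (no delay) the blocks would simply be contiguous.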
Layout Policies
- Layout policies are divided into two distinct parts:
  - Global policies
  - Local allocation routines
- Two allocatable resources:
  - Inodes
  - Data blocks
Layout Policies
- Global policies
  - Use file-system-wide summary information to make decisions regarding the placement of new inodes and data blocks
  - Try to localize data that is concurrently accessed while spreading out unrelated data
  - Inodes:
    - Place the inodes of all files in a directory in the same cylinder group
    - Place a new directory in a cylinder group that has a greater than average number of free inodes and the smallest number of directories already in it; this ensures that files are distributed throughout the disk
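The directory placement rule can be sketched directly from per-group summary counts. The fallback when no group is strictly above average (all groups equal) is an assumption, and `pick_dir_group` is a hypothetical name.

```python
def pick_dir_group(free_inodes, ndirs):
    """Cylinder group for a new directory: above-average free inodes, fewest directories."""
    avg = sum(free_inodes) / len(free_inodes)
    candidates = [i for i, f in enumerate(free_inodes) if f > avg]
    if not candidates:                     # assumption: all groups equal, so consider every group
        candidates = list(range(len(free_inodes)))
    return min(candidates, key=lambda i: ndirs[i])

print(pick_dir_group([100, 300, 400, 50], [5, 2, 1, 0]))
```

Group 3 has no directories at all, but it is nearly out of inodes, so the rule picks group 2 (400 free inodes, 1 directory).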
Layout Policies
- Global policies
  - Data blocks:
    - Try to place all data blocks for a file in the same cylinder group
    - None of the cylinder groups should ever become completely full
    - Heuristic solution: redirect block allocation to a different cylinder group when a file exceeds 48 KB, and at every MB thereafter
      - The cost of one long seek per MB is small
    - New cylinder groups are chosen from those that have a greater than average number of free blocks left
    - Finally, the global policy calls the local allocation routines for block allocation
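A sketch of the redirection heuristic, assuming 4096-byte blocks and that "every MB thereafter" is counted from the 48 KB point; picking the first above-average group after the current one is also an assumption. All names are hypothetical.

```python
BSIZE = 4096
FIRST_SWITCH = 48 * 1024       # first redirection point (from the slide)
SWITCH_EVERY = 1024 * 1024     # then once per megabyte (assumed counted from 48 KB)

def switch_blocks(file_size, bsize=BSIZE):
    """Logical block numbers at which the global policy picks a new cylinder group."""
    nblocks = -(-file_size // bsize)
    lbn, step = FIRST_SWITCH // bsize, SWITCH_EVERY // bsize
    points = []
    while lbn < nblocks:
        points.append(lbn)
        lbn += step
    return points

def next_group(free_blocks, current):
    """First group after `current` with an above-average number of free blocks."""
    n = len(free_blocks)
    avg = sum(free_blocks) / n
    for k in range(1, n + 1):
        cg = (current + k) % n
        if free_blocks[cg] > avg:
            return cg
    return (current + 1) % n   # all groups equal: just move on

print(switch_blocks(3 * 1024 * 1024))
print(next_group([10, 500, 20, 400], current=1))
```

A 3 MB file changes cylinder groups at blocks 12, 268 and 524, so roughly a megabyte is read between long seeks.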
Layout Policies
- Local allocation routines
  - Allocate a free block as requested by the global layout policies
  - Use a four-level allocation strategy:
    - First level: use the next free block that is rotationally closest to the requested block on the same cylinder
Layout Policies
- Local allocation routines
    - Second level: if there are no free blocks on the same cylinder, select a free block in the same cylinder group
Layout Policies
- Local allocation routines
    - Third level: if the cylinder group is full, apply a quadratic hash function to the cylinder group number to find another cylinder group in which to look for a free block
    - Fourth level: if the hash fails, use an exhaustive search of all cylinder groups
- Quadratic hash
  - Used because of its speed in finding unused slots in nearly full hash tables
  - File systems parameterized to maintain at least 10% free space rarely need it
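Levels three and four can be sketched at the granularity of cylinder groups. The sketch models each group only by its count of free blocks (the cylinder and rotational search of levels one and two is not modeled) and uses plain quadratic probing, offsets 1, 4, 9, ... modulo the number of groups, as an assumed hash; `find_free_group` is a hypothetical name.

```python
def find_free_group(free, pref):
    """Return (cylinder group, level) where a free block is found, or None."""
    ncg = len(free)
    # Levels 1-2: the preferred group itself (search inside it not modeled).
    if free[pref] > 0:
        return pref, 2
    # Level 3: quadratic probing on the cylinder group number.
    for i in range(1, ncg):
        cg = (pref + i * i) % ncg
        if free[cg] > 0:
            return cg, 3
    # Level 4: exhaustive search of all cylinder groups.
    for k in range(1, ncg):
        cg = (pref + k) % ncg
        if free[cg] > 0:
            return cg, 4
    return None

print(find_free_group([0, 0, 0, 0, 5, 0, 0, 0], pref=0))   # found by the hash
print(find_free_group([0, 0, 5, 0, 0, 0, 0, 0], pref=0))   # hash misses group 2
```

Quadratic probing does not visit every group (with 8 groups, starting from 0, it only reaches 0, 1 and 4), which is why the exhaustive search is still needed as a last resort.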
Performance
- List directory (ls) command performance:
  - For large directories containing many subdirectories, disk accesses for inodes are cut by a factor of two
  - For large directories containing only files, disk accesses for inodes are cut by a factor of eight
- Both reads and writes are faster in the new file system
  - Because larger block sizes are used
  - The overhead of allocation is higher, but the cost per byte allocated is the same
- The reading rate is always at least as fast as the writing rate
  - Writes are slower with a 4096-byte block than with an 8192-byte block
  - In the old file system, writing was 50% faster than reading
New File System: Limitations
- Performance is limited by the memory-to-memory copy operations required to move data from disk buffers in the system's address space to data buffers in the user's address space
  - Remedy: align the buffers in both address spaces
- Only one block is allocated to a file at a time
  - Remedy: pre-allocate several blocks at once and release the unused ones when the file is closed
Functional Enhancements
- Long File Names
- File Locking
- Symbolic Links
- Rename
- Quotas
Long File Names
- The maximum length of a file name is 255 characters
- Directories are allocated in 512-byte units called chunks
- Chunks are broken into directory entries
  - Each entry contains the information necessary to map a file name to its inode
  - The first three fields are fixed length: inode number, size of the entry, and length of the file name
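A chunk of directory entries can be sketched with `struct`. The sketch assumes field widths of 4, 2 and 2 bytes for the three fixed fields, little-endian byte order, and a NUL-terminated name padded to a 4-byte boundary; it gives the last entry in the chunk any leftover space so entries never cross a chunk boundary. `entry_size` and `pack_chunk` are hypothetical names.

```python
import struct

HEADER = struct.Struct("<IHH")   # inode number, entry size, name length (assumed widths)
CHUNK = 512

def entry_size(namelen):
    """Fixed header plus NUL-terminated name padded to a 4-byte boundary."""
    return HEADER.size + ((namelen + 1 + 3) & ~3)

def pack_chunk(entries):
    """Pack (inode, name) pairs into one 512-byte chunk; the last entry absorbs the slack."""
    if not entries:
        raise ValueError("a chunk needs at least one entry")
    raws = [(ino, name.encode()) for ino, name in entries]
    if any(len(raw) > 255 for _, raw in raws):
        raise ValueError("name longer than 255 bytes")
    sizes = [entry_size(len(raw)) for _, raw in raws]
    if sum(sizes) > CHUNK:
        raise ValueError("entries do not fit in one chunk")
    sizes[-1] += CHUNK - sum(sizes)
    out = bytearray()
    for (ino, raw), reclen in zip(raws, sizes):
        body = HEADER.pack(ino, reclen, len(raw)) + raw
        out += body + b"\0" * (reclen - len(body))   # NUL terminator plus padding
    return bytes(out)

chunk = pack_chunk([(2, "."), (2, "..")])
print(len(chunk), HEADER.unpack_from(chunk, 0), HEADER.unpack_from(chunk, 12))
```

Because each entry records its own size, a 255-character name costs 264 bytes while a short name costs only 12 or 16.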
File Locking
- Hard locks: always enforced when a program tries to access a file
- Advisory locks: shared or exclusive locks requested by programs
- System administrator privilege can override locks
- No deadlock detection is attempted
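Advisory shared and exclusive locks are available through the BSD `flock` interface, which Python exposes as `fcntl.flock` on Unix systems. The sketch below shows an exclusive lock refusing a second open file until it is released; `try_exclusive` is a hypothetical helper, and because the locks are advisory, a program that never calls `flock` can still read and write the file.

```python
import fcntl
import os
import tempfile

def try_exclusive(path):
    """Open `path` and try a non-blocking exclusive advisory lock; return the file or None."""
    f = open(path, "rb")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except BlockingIOError:          # someone else holds a conflicting lock
        f.close()
        return None

fd, path = tempfile.mkstemp()
os.close(fd)

holder = try_exclusive(path)         # first opener gets the lock
contender = try_exclusive(path)      # a second open file is refused while it is held
print(holder is not None, contender is None)

fcntl.flock(holder.fileno(), fcntl.LOCK_UN)
retry = try_exclusive(path)          # succeeds once the lock is released
print(retry is not None)

holder.close()
retry.close()
os.remove(path)
```

Using `LOCK_NB` makes the conflict visible immediately; without it the second caller would simply wait, and since no deadlock detection is attempted, two programs waiting on each other's locks would wait forever.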
Symbolic Links
- A symbolic link is implemented as a file that contains a pathname
  - The pathname can be relative or absolute
- On encountering a symbolic link while interpreting a component of a pathname, the contents of the symbolic link are prepended to the rest of the pathname
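The prepend rule can be sketched as a small path walker over a table of links. The sketch assumes absolute input paths, interprets a relative link relative to the directory containing it, and caps the number of links followed at 8 (an assumed limit) so that link loops fail instead of spinning; `resolve` is a hypothetical name.

```python
def resolve(path, symlinks, max_links=8):
    """Resolve an absolute `path`, expanding links found in the `symlinks` table."""
    pending = [c for c in path.split("/") if c]   # components still to interpret
    done = []                                      # components already interpreted
    followed = 0
    while pending:
        comp = pending.pop(0)
        if comp == ".":
            continue
        if comp == "..":
            if done:
                done.pop()
            continue
        here = "/" + "/".join(done + [comp])
        if here in symlinks:
            followed += 1
            if followed > max_links:
                raise OSError("too many levels of symbolic links")
            target = symlinks[here]
            if target.startswith("/"):
                done = []                          # absolute link: restart from the root
            # Prepend the link's contents to the rest of the pathname.
            pending = [c for c in target.split("/") if c] + pending
        else:
            done.append(comp)
    return "/" + "/".join(done)

print(resolve("/usr/tmp/a.txt", {"/usr/tmp": "/var/tmp"}))
print(resolve("/home/bob/docs/x", {"/home/bob/docs": "../shared/docs"}))
```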
Rename
- The old file system required three system calls for renaming
  - The target file could be left with only its temporary name after a crash
- A new rename system call was added that guarantees the existence of the target name
- Renaming works on both directories and files
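The typical use of an atomic rename is "write a new version under a temporary name, then rename it over the target". A minimal sketch using Python's `os.replace`, which performs a single `rename` on POSIX systems; `write_atomically` is a hypothetical helper, and the temporary file is created in the target's directory so both names are on the same file system.

```python
import os
import tempfile

def write_atomically(target, data):
    """Write a new version under a temporary name, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp = tempfile.mkstemp(dir=directory)   # same file system as the target
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())                # data on disk before the rename
        os.replace(tmp, target)                 # one rename: the target name always exists
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "config.txt")
    write_atomically(path, b"old\n")
    write_atomically(path, b"new\n")
    with open(path, "rb") as f:
        result = f.read()
print(result)
```

A crash at any point leaves either the old or the new contents under the target name, never a file that exists only under its temporary name.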
Quotas
- In the old file system, any single user could allocate all the available space in the file system
- Quotas restrict the amount of file system resources that a user can obtain
  - Limits are set on both the number of inodes and the number of disk blocks
  - Hard and soft limits
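A quota check can be sketched as a comparison against both limits. The semantics here are a simplifying assumption (exceeding the soft limit only produces a warning, the hard limit can never be exceeded), and `Quota` and `check_quota` are hypothetical names; the same check applies separately to inodes and to disk blocks.

```python
from dataclasses import dataclass

@dataclass
class Quota:
    soft: int   # may be exceeded, with a warning (assumed semantics)
    hard: int   # may never be exceeded

def check_quota(used, request, q):
    """Return 'ok', 'warn' or 'deny' for allocating `request` more units."""
    after = used + request
    if after > q.hard:
        return "deny"
    if after > q.soft:
        return "warn"
    return "ok"

blocks = Quota(soft=1000, hard=1200)
inodes = Quota(soft=100, hard=120)
print(check_quota(900, 50, blocks), check_quota(900, 200, blocks), check_quota(900, 400, blocks))
```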
Key Take-Away Points
- Substantially higher throughput rates due to the large block size
- Flexible allocation policies
- Better locality of reference
- Less wastage
- Adapted to a wide range of peripheral and processor characteristics