Title: File System Implementation
Objectives
- To describe the details of implementing local file systems and directory structures
- To describe the implementation of remote file systems
- To discuss block allocation and free-block algorithms and trade-offs
File-System Structure
- A file system poses two distinct design problems
  - Defining how the file system should look to the user
  - Creating algorithms and data structures to map the logical file system onto the physical device
- The file system resides on secondary storage (disks)
- The file system is organized into layers
  - Each level uses the features of lower levels to create new features for use by higher levels
- File control block (FCB): a storage structure consisting of information about a file
A Typical File Control Block
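The figure for this slide is not reproduced here. As a rough sketch, an FCB can be modeled as a record holding the kinds of information a typical FCB contains: permissions, dates, owner and group, size, and pointers to the data blocks. The class and field names are hypothetical; real FCBs (such as UNIX inodes) are packed binary structures on disk.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FileControlBlock:
    # Hypothetical field names chosen for illustration only.
    permissions: int                 # access bits, e.g. 0o644
    owner: str
    group: str
    size: int = 0                    # file size in bytes
    created: float = 0.0             # timestamps in seconds since the epoch
    accessed: float = 0.0
    modified: float = 0.0
    data_blocks: List[int] = field(default_factory=list)  # pointers to data blocks

fcb = FileControlBlock(permissions=0o644, owner="alice", group="staff",
                       size=8192, data_blocks=[120, 121])
```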
File-System Implementation
- Several on-disk and in-memory structures are used to implement a file system
- On-disk structures
  - A boot control block (information needed to boot an operating system from the partition)
  - A partition control block, or superblock (partition details such as the number and size of blocks and the free-block count)
  - A directory structure (used to organize the files)
  - File control blocks (one per file)
- In-memory structures
  - An in-memory partition table
  - An in-memory directory structure
  - The system-wide open-file table
  - The per-process open-file table
Creating a File
- To create a new file, the application program calls the logical file system (which knows the format of the directory structures), which:
  - Allocates a new FCB
  - Reads the appropriate directory into memory
  - Updates the directory with the new file name and FCB
  - Writes the directory back to disk
- Some operating systems (UNIX) treat a directory exactly like a file; other operating systems (Windows) implement separate system calls for files and directories and treat directories as entities separate from files
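The four create steps above can be sketched with dictionaries standing in for the disk. Everything here (`disk_directories`, `fcb_store`, `create_file`) is hypothetical and only mirrors the order of the steps; a real file system reads and writes directory blocks.

```python
import itertools

# Hypothetical stand-ins for on-disk state; real file systems use disk blocks.
disk_directories = {"/home": {}}   # directory path -> {file name: FCB number}
fcb_store = {}                     # FCB number -> file metadata
_fcb_numbers = itertools.count(1)  # source of unused FCB numbers

def create_file(dir_path: str, name: str) -> int:
    fcb_no = next(_fcb_numbers)                   # 1. allocate a new FCB
    fcb_store[fcb_no] = {"size": 0, "blocks": []}
    directory = dict(disk_directories[dir_path])  # 2. read the directory into memory
    if name in directory:
        del fcb_store[fcb_no]                     # undo the allocation on failure
        raise FileExistsError(name)
    directory[name] = fcb_no                      # 3. add the new name -> FCB entry
    disk_directories[dir_path] = directory        # 4. write the directory back
    return fcb_no
```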
Opening a File
- Before a file can be used for I/O operations, it must first be opened
- The open call passes the file name to the file system
- The directory structure (usually cached) is searched for the given file name
- Once the file is found, its FCB is copied into the system-wide open-file table in memory (if another process already has the file open, the existing entry is used and its open count is incremented)
- An entry is made in the per-process open-file table, with a pointer to the entry in the system-wide open-file table
- The open call returns a pointer to the appropriate entry in the per-process open-file table; all file operations are then performed via this pointer (a file descriptor in UNIX, a file handle in Windows)
In-Memory File System Structures
(a) Opening a file. (b) Reading a file.
Closing a File
- After all I/O operations are complete, a file should be closed
- The per-process table entry is removed and the system-wide entry's open count is decremented
- When all users that have opened the file close it, the updated file information is copied back to the disk-based directory structure and the system-wide open-file table entry is removed
- Some systems use a caching scheme: all information about an open file, except for its actual data blocks, is kept in memory
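The interplay of the two open-file tables during open and close can be sketched as follows. The names (`DISK_DIRECTORY`, `SYSTEM_TABLE`, `Process`) are hypothetical, and the "file descriptor" is simply the lowest unused index in the per-process table, as in UNIX.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical on-disk directory: file name -> FCB (a plain dict here).
DISK_DIRECTORY = {"report.txt": {"size": 4096, "blocks": [7, 8]}}

@dataclass
class SystemEntry:                 # one entry of the system-wide open-file table
    name: str
    fcb: dict                      # in-memory copy of the FCB
    open_count: int = 0

SYSTEM_TABLE: Dict[str, SystemEntry] = {}

@dataclass
class PerProcessEntry:             # one entry of a per-process open-file table
    system_entry: SystemEntry      # pointer into the system-wide table
    offset: int = 0                # current file position for this process

@dataclass
class Process:
    fd_table: Dict[int, PerProcessEntry] = field(default_factory=dict)

    def open(self, name: str) -> int:
        entry = SYSTEM_TABLE.get(name)
        if entry is None:          # first opener: search directory, copy FCB
            entry = SystemEntry(name, dict(DISK_DIRECTORY[name]))
            SYSTEM_TABLE[name] = entry
        entry.open_count += 1
        fd = next(i for i in itertools.count() if i not in self.fd_table)
        self.fd_table[fd] = PerProcessEntry(entry)
        return fd                  # the descriptor indexes this process's table

    def close(self, fd: int) -> None:
        entry = self.fd_table.pop(fd).system_entry   # drop the per-process entry
        entry.open_count -= 1
        if entry.open_count == 0:  # last close: write FCB back, free the entry
            DISK_DIRECTORY[entry.name] = entry.fcb
            del SYSTEM_TABLE[entry.name]
```

Two processes opening the same file share one system-wide entry but keep separate offsets, which is why the position lives in the per-process entry.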
Disk Partition and Mounting
- The layout of a disk can have many variations, depending on the operating system
- A disk can be divided into multiple partitions, or a partition can span multiple disks
  - Raw: containing no file system
  - Cooked: containing a file system
- Boot information can be stored in a separate partition
- The root partition, which contains the operating-system kernel, is mounted at boot time (other partitions can be mounted later)
- The operating system notes in its mount table that a file system is mounted, along with the type of the file system
- Windows mounts each partition as a separate drive letter
- In UNIX, file systems can be mounted at any directory
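Because UNIX can mount a file system at any directory, a path must be matched against the mount table to find the file system that holds it. A simplified sketch picks the longest mount point that is a prefix of the path; the table contents and device names are hypothetical, and real kernels detect mount points component by component during path-name lookup.

```python
from typing import Tuple

# Hypothetical mount table: mount point -> (file-system type, device).
MOUNT_TABLE = {
    "/": ("ext4", "disk0p1"),
    "/home": ("ext4", "disk0p2"),
    "/mnt/usb": ("vfat", "disk1p1"),
}

def find_mount(path: str) -> Tuple[str, Tuple[str, str]]:
    """Return the mount point covering `path` (longest matching prefix) and its entry."""
    best = "/"
    for mount_point in MOUNT_TABLE:
        # Match whole components only, so "/homework" is not under "/home".
        covers = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        if covers and len(mount_point) > len(best):
            best = mount_point
    return best, MOUNT_TABLE[best]
```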
Directory Implementation
- Linear list
  - Uses a linear list of file names with pointers to the data blocks; finding a particular entry requires a linear search
  - Simple to program but time-consuming to execute
- Hash table
  - Uses a linear list to store the directory entries, but uses a hash table to find an entry
  - Hashing can greatly decrease the directory search time
  - Collisions, situations where two file names hash to the same location, must be handled
  - The major difficulties with a hash table are its generally fixed size and the dependence of the hash function on that size
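The two directory designs can be contrasted in a few lines. This sketch keeps the entries in a linear list in both cases; the hashed version adds a fixed number of buckets holding indices into that list and handles collisions by chaining. Class names and the bucket count are illustrative assumptions.

```python
class LinearDirectory:
    """Directory as a linear list of (file name, first data block) entries."""

    def __init__(self):
        self.entries = []

    def add(self, name, block):
        self.entries.append((name, block))

    def lookup(self, name):
        for entry_name, block in self.entries:    # O(n) linear search
            if entry_name == name:
                return block
        return None

class HashedDirectory(LinearDirectory):
    """Same linear list, plus a fixed-size hash table of indices into it."""

    def __init__(self, buckets=64):
        super().__init__()
        self.table = [[] for _ in range(buckets)]  # chaining resolves collisions

    def _bucket(self, name):
        return self.table[hash(name) % len(self.table)]

    def add(self, name, block):
        super().add(name, block)
        self._bucket(name).append(len(self.entries) - 1)

    def lookup(self, name):
        for i in self._bucket(name):               # scan only one chain
            if self.entries[i][0] == name:
                return self.entries[i][1]
        return None
```

With a fixed bucket count, chains grow as the directory grows; resizing would mean choosing a new hash function and rehashing every entry.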
Allocation Methods
- An allocation method refers to how disk blocks are allocated for files
- Three major methods of allocating disk space are in wide use
  - Contiguous allocation
  - Linked allocation
  - Indexed allocation
- Each method has its advantages and disadvantages
- Some systems support all three, but more commonly a system uses one particular method
Contiguous Allocation
- Requires each file to occupy a set of contiguous blocks on the disk
- Disk addresses define a linear ordering on the disk
- Simple: only the starting location (block number) and the length (number of blocks) are required
- If a file is $n$ blocks long and starts at location $b$, then it occupies blocks $b, b+1, b+2, \ldots, b+n-1$
- The directory entry for each file indicates the address of the starting block and the length of the area allocated for this file
- Both sequential and direct access are supported
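Direct access is cheap under contiguous allocation because logical block $i$ is simply block $b+i$. A minimal sketch, assuming a 512-byte block size for the byte-offset helper:

```python
def contiguous_block(start: int, length: int, logical: int) -> int:
    """Disk block holding logical block `logical` of a file at start..start+length-1."""
    if not 0 <= logical < length:
        raise IndexError("logical block outside the file")
    return start + logical

def byte_offset_to_disk(start: int, length: int, offset: int, block_size: int = 512):
    """Direct access: split a byte offset into (disk block, offset within that block)."""
    logical, within = divmod(offset, block_size)
    return contiguous_block(start, length, logical), within
```

For a file with start 14 and length 3, byte 1000 falls in logical block 1, so `byte_offset_to_disk(14, 3, 1000)` gives disk block 15 at offset 488.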
Contiguous Allocation of Disk Space
Contiguous Allocation (Cont.)
- Contiguous allocation has some problems
  - Dynamic storage allocation: how to satisfy a request of size n from a list of free holes (runs of free blocks)
  - External fragmentation: free space is broken into chunks, and the largest chunk may be too small for a request
  - Determining how much space is needed for a file
    - Allocate too little and the file may not be extendable
    - Allocate too much and space is wasted
  - A file cannot easily grow beyond the space allocated to it
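The dynamic storage-allocation problem is usually solved with strategies such as first fit and best fit, named here as common choices rather than ones the slide specifies. In this sketch, holes are hypothetical `(start, size)` pairs.

```python
from typing import List, Optional, Tuple

def first_fit(holes: List[Tuple[int, int]], n: int) -> Optional[int]:
    """Start of the first hole with at least n free blocks."""
    for start, size in holes:
        if size >= n:
            return start
    return None

def best_fit(holes: List[Tuple[int, int]], n: int) -> Optional[int]:
    """Start of the smallest hole with at least n free blocks."""
    fits = [(size, start) for start, size in holes if size >= n]
    return min(fits)[1] if fits else None
```

With holes `[(0, 5), (10, 3), (20, 8)]`, a request for 3 blocks gets block 0 under first fit and block 10 under best fit. A request for 9 blocks fails under both even though 16 blocks are free in total: that is external fragmentation.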
Extent-Based Systems
- To minimize the drawbacks of contiguous allocation, some file systems (e.g., the Veritas File System) use a modified scheme
  - A contiguous chunk of space is allocated initially; when that amount is not large enough, another chunk of contiguous space (an extent) is added to the initial allocation
- Extent-based file systems allocate disk blocks in extents
- Internal fragmentation can still be a problem if the extents are too large
- External fragmentation can be a problem as extents of various sizes are allocated and deallocated
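With extents, a file's location becomes a short list of contiguous runs, and mapping a logical block means walking that list. A minimal sketch, assuming each extent is a hypothetical `(first disk block, length)` pair kept in file order:

```python
from typing import List, Tuple

def extent_lookup(extents: List[Tuple[int, int]], logical: int) -> int:
    """Map a logical block number through a file's extent list."""
    if logical < 0:
        raise IndexError("negative logical block")
    for start, length in extents:
        if logical < length:
            return start + logical     # falls inside this extent
        logical -= length              # skip past this extent
    raise IndexError("logical block beyond end of file")
```

A file made of extents `[(100, 4), (250, 2)]` has logical block 3 at disk block 103 and logical block 4 at disk block 250.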
Linked Allocation
- Solves all the problems of contiguous allocation
- Each file is a linked list of disk blocks; the blocks may be scattered anywhere on the disk
- The directory contains a pointer to the first and last blocks of a file
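A linked file can be modeled as a map from block number to `(data, next block)`. The disk contents below are hypothetical. Reading sequentially just follows the pointers, while reaching logical block `i` needs `i` pointer hops, each of which would be a disk read on real hardware.

```python
from typing import Dict, Optional, Tuple

# Hypothetical disk: block number -> (data stored in the block, next block or None).
DISK: Dict[int, Tuple[str, Optional[int]]] = {
    9: ("J", 16), 16: ("E", 1), 1: ("E", 10), 10: ("P", 25), 25: ("!", None),
}

def read_file(disk, first: int) -> str:
    """Sequential read: start at the directory's first block and follow the pointers."""
    parts, block = [], first
    while block is not None:
        data, block = disk[block]
        parts.append(data)
    return "".join(parts)

def nth_block(disk, first: int, i: int) -> int:
    """Locate logical block i; this needs i pointer hops (i disk reads on a real disk)."""
    block = first
    for _ in range(i):
        block = disk[block][1]
        if block is None:
            raise IndexError("logical block beyond end of file")
    return block
```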
Linked Allocation
Linked Allocation (Cont.)
- No external fragmentation
  - Any free block on the free-space list can be used to satisfy a request
- A file can grow as long as free blocks are available; disk space never needs to be compacted
- Linked allocation does have disadvantages
  - It is effective only for sequential-access files: reaching the i-th block requires following i pointers from the first block
  - Space is required for the pointers; grouping blocks into clusters improves disk usage and access time
  - Reliability: a lost or damaged pointer can make the rest of the file unreachable
- The File Allocation Table (FAT) is a variation of the linked allocation method used to support direct access
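The FAT moves the "next block" pointers out of the data blocks into one table indexed by block number. Because that table can be cached in memory, finding logical block `i` walks the table rather than reading `i` data blocks. In this sketch the `FREE` and `EOF` markers, the table size, and treating blocks 0 and 1 as reserved are assumptions, not the exact on-disk FAT format.

```python
FREE, EOF = 0, -1        # assumed markers; blocks 0 and 1 are treated as reserved here

fat = [FREE] * 16        # one table entry per disk block, normally cached in memory
fat[2], fat[7], fat[5] = 7, 5, EOF   # a file whose directory entry says "starts at 2"

def fat_block(fat, start: int, i: int) -> int:
    """Find logical block i by walking the in-memory FAT instead of the data blocks."""
    block = start
    for _ in range(i):
        block = fat[block]
        if block == EOF:
            raise IndexError("logical block beyond end of file")
    return block

def allocate(fat) -> int:
    """Return the first free block (skipping reserved 0 and 1) and mark it as EOF."""
    for b in range(2, len(fat)):
        if fat[b] == FREE:
            fat[b] = EOF
            return b
    raise OSError("disk full")
```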
File-Allocation Table
Indexed Allocation
- Solves the external-fragmentation and size-declaration problems of contiguous allocation
- Supports direct access by bringing all the pointers together into the index block
- Each file has its own index block, which is an array of disk-block addresses
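With an index block, direct access is a single array lookup: entry `i` holds the disk address of logical block `i`. The `UNUSED` marker and the index-block contents are hypothetical; the unused slots show the space an index block wastes on small files.

```python
from typing import List

UNUSED = -1   # assumed marker for an empty index-block slot

def indexed_block(index_block: List[int], i: int) -> int:
    """Direct access: logical block i is simply entry i of the file's index block."""
    if not 0 <= i < len(index_block) or index_block[i] == UNUSED:
        raise IndexError("logical block not allocated")
    return index_block[i]

# Hypothetical index block for a 5-block file, with room for 8 pointers.
index_block = [9, 16, 1, 10, 25, UNUSED, UNUSED, UNUSED]
```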
Example of Indexed Allocation
Indexed Allocation (Cont.)
- Indexed allocation does suffer from wasted space
  - Every file must have an index block, so the index block should be as small as possible
- A file may require more than one index block. Why?
  - Linked scheme
  - Multilevel scheme
  - Combined scheme
Linked Index Scheme
- An index block is normally one disk block
  - It can be read and written directly by itself
- To allow for large files, several index blocks are linked together (no limit on size)
Multilevel Index
- Use an index of index blocks
  - A first-level index block points to a set of second-level index blocks, which in turn point to the file blocks
- With 4 KB blocks and 4-byte block pointers in each index block, what is the maximum file size using a 2-level index?
- Could be extended to a third or fourth level, depending on the desired maximum file size
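Working through the question above: a 4 KB index block holds 4096 / 4 = 1024 pointers, so a 2-level index reaches 1024 × 1024 = 2^20 data blocks of 4 KB each, or 2^32 bytes (4 GB). The helper names below are illustrative; the second function shows which slot at each level holds a given logical block.

```python
def max_file_size(block_size: int, pointer_size: int, levels: int) -> int:
    """Largest file reachable through `levels` levels of index blocks."""
    pointers_per_block = block_size // pointer_size
    return pointers_per_block ** levels * block_size

def two_level_slots(logical: int, pointers_per_block: int = 1024):
    """(slot in first-level block, slot in second-level block) for a logical block."""
    return divmod(logical, pointers_per_block)

size = max_file_size(4096, 4, 2)
print(size, size // 2**30)   # 4294967296 4  (2**32 bytes = 4 GB)
```

For example, logical block 5000 is found through slot 4 of the first-level block and slot 904 of the second-level block, since 5000 = 4 × 1024 + 904.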
Multilevel Index Mapping
Combined Scheme: UNIX (4 KB per Block)
- Keep the first n pointers of the index block in the file's inode: some point directly to data blocks, and the rest point to single, double, and triple indirect blocks
- Indexed allocation suffers from some of the same performance problems as linked allocation
  - The index blocks can be cached in memory, but the data blocks may be spread all over a volume
The UNIX inode
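A sketch of the combined scheme, assuming the classic UNIX layout of 12 direct pointers followed by one single, one double, and one triple indirect pointer, with 4 KB blocks and 4-byte pointers (so 1024 pointers per indirect block). These counts are assumptions for illustration; actual inode layouts vary between systems. `locate` reports which inode pointer covers a logical block and the slot used at each indirect level.

```python
BLOCK_SIZE = 4096
PTRS = BLOCK_SIZE // 4    # 1024 pointers per indirect block (4-byte pointers assumed)
DIRECT = 12               # assumed number of direct pointers in the inode

def locate(logical: int):
    """Which inode pointer covers `logical`, plus the slot index at each level."""
    if logical < 0:
        raise ValueError("negative logical block")
    if logical < DIRECT:
        return ("direct", logical)
    logical -= DIRECT
    if logical < PTRS:
        return ("single", logical)
    logical -= PTRS
    if logical < PTRS ** 2:
        return ("double", *divmod(logical, PTRS))
    logical -= PTRS ** 2
    if logical < PTRS ** 3:
        outer, rest = divmod(logical, PTRS ** 2)
        return ("triple", outer, *divmod(rest, PTRS))
    raise ValueError("beyond the maximum file size")

MAX_BLOCKS = DIRECT + PTRS + PTRS ** 2 + PTRS ** 3
MAX_BYTES = MAX_BLOCKS * BLOCK_SIZE   # just over 2**42 bytes (about 4 TB)
```

Small files are served entirely by the direct pointers, so they need no extra index-block reads; only large files pay for the indirect levels.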
Free-Space Management
- Space from deleted files must be reused for new files
- To keep track of free disk space, the system maintains a free-space list
  - It records all free blocks: those not allocated to any file or directory
- To create a file, the free-space list is searched for the required amount of space; that space is allocated to the new file and then removed from the list
- When a file is deleted, its disk space is added to the free-space list
Bit Vector
- Frequently, the free-space list is implemented as a bit map or bit vector
- Each block is represented by 1 bit
  - If the block is free, the bit is 1; if the block is allocated, the bit is 0
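A bit vector makes finding a free block fast: skip every all-zero (fully allocated) byte, then locate the first 1 bit, giving block number = (bits per byte) × (zero bytes skipped) + bit offset. This sketch uses a `bytearray` with the most significant bit first in each byte; the helper names are hypothetical.

```python
from typing import Iterable, Optional

def make_bitmap(total_blocks: int, free: Iterable[int]) -> bytearray:
    """Bit i (most significant bit first within each byte) is 1 if block i is free."""
    bitmap = bytearray((total_blocks + 7) // 8)
    for b in free:
        bitmap[b // 8] |= 0x80 >> (b % 8)
    return bitmap

def first_free(bitmap: bytearray) -> Optional[int]:
    """Skip all-zero (fully allocated) bytes, then find the first 1 bit."""
    for i, byte in enumerate(bitmap):
        if byte:
            for j in range(8):
                if byte & (0x80 >> j):
                    return 8 * i + j
    return None

def allocate_block(bitmap: bytearray, b: int) -> None:
    """Clear block b's bit: the block is now in use."""
    bitmap[b // 8] &= ~(0x80 >> (b % 8)) & 0xFF
```

If blocks 2 to 5, 8 to 13, 17, 18, and 25 to 27 are free, the first byte is `00111100` and `first_free` returns 2; after blocks 2 to 5 are allocated, it skips the now-zero first byte and returns 8.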
Linked List
- Link together all the free disk blocks
  - A pointer to the first free block is kept; each free block contains a pointer to the next free disk block
- Grouping
  - Stores the addresses of n free blocks in the first free block (the last of these blocks holds the addresses of another n free blocks)
  - Large numbers of free blocks can be found quickly
- Counting
  - Stores the address of the first free block and the number n of free contiguous blocks that follow it
  - The overall list will be shorter
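The counting representation can be derived from a list of free block numbers by collapsing each contiguous run into a `(first block, count)` pair. This is a minimal sketch with a hypothetical helper name.

```python
from typing import Iterable, List, Tuple

def to_counting(free_blocks: Iterable[int]) -> List[Tuple[int, int]]:
    """Compress free block numbers into (first free block, run length) pairs."""
    runs: List[List[int]] = []
    for b in sorted(free_blocks):
        if runs and runs[-1][0] + runs[-1][1] == b:
            runs[-1][1] += 1          # b extends the current contiguous run
        else:
            runs.append([b, 1])       # b starts a new run
    return [(start, count) for start, count in runs]
```

Fifteen free blocks (2 to 5, 8 to 13, 17, 18, 25 to 27) collapse into four entries, `[(2, 4), (8, 6), (17, 2), (25, 3)]`, which is why the overall list is shorter.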
Linked Free-Space List on Disk
Efficiency and Performance
- Efficiency depends on
  - Disk allocation and directory algorithms, and parameters such as pointer size
  - The types of data kept in a file's directory entry
- Performance
  - Disk buffer cache: a separate section of main memory for frequently used blocks
  - Free-behind and read-ahead: techniques to optimize sequential access
  - Improve PC performance by dedicating a section of memory as a virtual disk, or RAM disk (memory-mapped I/O)
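A disk buffer cache keeps recently used blocks in memory so that repeated reads avoid the disk. This sketch assumes least-recently-used (LRU) replacement, which the slide does not specify, and takes a hypothetical `read_block` function standing in for the disk. It omits the free-behind and read-ahead refinements for sequential access.

```python
from collections import OrderedDict
from typing import Callable

class BufferCache:
    """Keeps up to `capacity` recently used disk blocks in memory (LRU eviction assumed)."""

    def __init__(self, read_block: Callable[[int], bytes], capacity: int = 4):
        self.read_block = read_block          # stand-in for a real disk read
        self.capacity = capacity
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, n: int) -> bytes:
        if n in self.blocks:
            self.blocks.move_to_end(n)        # most recently used goes to the end
            self.hits += 1
        else:
            self.misses += 1
            self.blocks[n] = self.read_block(n)
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)   # evict the least recently used block
        return self.blocks[n]
```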
Page Cache
- A page cache caches pages rather than disk blocks, using virtual-memory techniques
- Memory-mapped I/O uses a page cache
- Routine I/O through the file system uses the buffer (disk) cache
Unified Buffer Cache
- A unified buffer cache uses the same page cache
to cache both memory-mapped pages and ordinary
file system I/O
Recovery
- Care must be taken to ensure that system failure does not result in loss of data or in data inconsistency
- Consistency checking
  - Compares the data in the directory structure with the data blocks on disk, and tries to fix inconsistencies
  - The allocation and free-space-management algorithms dictate what types of problems the checker can find
- Backup and restore
  - Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape)
  - Recover a lost file or disk by restoring data from backup
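One check a consistency checker can make is to cross-reference the blocks each file uses against the free-space record. This sketch only reports problems (it does not repair them), and its input format, a map from file name to block list plus a set of free blocks, is a hypothetical simplification.

```python
from typing import Dict, List, Set, Tuple

def check_blocks(files: Dict[str, List[int]], free: Set[int],
                 total_blocks: int) -> List[Tuple[int, str]]:
    """Cross-check file allocations against the free-space record (report only)."""
    owners: Dict[int, List[str]] = {}
    for name, blocks in files.items():
        for b in blocks:
            owners.setdefault(b, []).append(name)
    problems = []
    for b in range(total_blocks):
        used, is_free = b in owners, b in free
        if used and is_free:
            problems.append((b, "allocated and free"))
        elif not used and not is_free:
            problems.append((b, "lost"))          # neither allocated nor free
        if used and len(owners[b]) > 1:
            problems.append((b, "shared by " + ", ".join(owners[b])))
    return problems
```

With files `a` on blocks 0 and 1 and `b` on blocks 1 and 2, free set `{2, 4}`, and 6 blocks, the checker flags block 1 as shared, block 2 as both allocated and free, and blocks 3 and 5 as lost.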
Log-Structured File Systems
- Log-structured (or journaling) file systems record each update to the file system as a transaction
- All transactions are written to a log. A transaction is considered committed once it is written to the log, even though the file system itself may not yet be updated
- The transactions in the log are asynchronously written to the file system. Once the file system has been modified, the transaction is removed from the log
- If the file system crashes, all transactions remaining in the log must still be performed
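The commit, apply, and recover cycle described above can be sketched with a toy journal. `JournaledStore` and its methods are hypothetical; `blocks` and `log` stand in for on-disk state that survives a crash, and each transaction is a set of whole-block overwrites.

```python
from typing import Dict, List, Optional

class JournaledStore:
    """Toy journal: `blocks` and `log` stand in for on-disk state that survives a crash."""

    def __init__(self):
        self.blocks: Dict[int, str] = {}       # the file system proper
        self.log: List[Dict[int, str]] = []    # committed, not yet applied

    def commit(self, updates: Dict[int, str]) -> None:
        """Record one transaction; it counts as committed once it is in the log."""
        self.log.append(dict(updates))

    def apply(self, limit: Optional[int] = None) -> None:
        """Write logged transactions to the file system, oldest first, then drop them."""
        n = len(self.log) if limit is None else min(limit, len(self.log))
        for txn in self.log[:n]:
            self.blocks.update(txn)
        del self.log[:n]

    def recover(self) -> None:
        """After a crash, perform every transaction still in the log."""
        self.apply()
```

Because each transaction overwrites whole blocks, replaying one that was already partly applied before a crash yields the same result, which is what makes replay on recovery safe in this sketch.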