Title: File System Part 2
1. File System (Part 2)
UNIVERSITY of WISCONSIN-MADISON, Computer Sciences Department
CS 537: Introduction to Operating Systems
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Haryadi S. Gunawi
- Directories and naming
  - From pathname to inode
  - Tree-structured directory hierarchy
- Block and inode allocation
  - Mechanism (bitmaps) and policy
- More optimizations
2. Layers
- Human
  - Jump to slide 20 of /tmp/slides.ppt → random access
  - Say /tmp/ is mounted on /dev/hda4
- PowerPoint application
  - Convert slide 20 to a byte offset (e.g., the 20000th byte)
- System call
  - read(/tmp/slides.ppt, byte offset 20000)
- File system (today's topic)
  - Get the file information of /tmp/slides.ppt → inode 76
  - Convert the byte offset into a block offset within the file (e.g., block offset 20)
  - Get the block number at block offset 20 (e.g., block number 6543)
  - To the block layer: read logical block number 6543 (logical with respect to this partition, /dev/hda4)
- Block layer
  - Converts LBN 6543 of /dev/hda4 to a disk sector
  - To the device driver: read sector
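The offset conversion done in the file system layer can be sketched as below. This is a minimal illustration, not the deck's own code: the 1024-byte block size, the `locate` helper, and the `block_map` dictionary are all assumptions made for the example. The slide's numbers are illustrative, so the exact block offset obtained depends on the assumed block size.

```python
BLOCK_SIZE = 1024  # assumed block size in bytes; the slide does not state one


def locate(byte_offset, block_map, block_size=BLOCK_SIZE):
    """Return (block offset in file, offset within that block, logical block number)."""
    block_offset, within_block = divmod(byte_offset, block_size)
    lbn = block_map[block_offset]  # per-file map: block offset -> LBN on the partition
    return block_offset, within_block, lbn


# Hypothetical block map for /tmp/slides.ppt: only the entry we need.
block_map = {19: 6543}
print(locate(20000, block_map))  # (19, 544, 6543)
```

The block layer would then translate the returned logical block number into a sector on the underlying device.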
3. Abstraction: File
- User view
  - Named collection of bytes
    - Untyped or typed
    - Examples: text, source, object, executables, application-specific
  - Permanently and conveniently available
- Operating system view
  - Map bytes as a collection of blocks on a physical non-volatile storage device
    - Magnetic disks, tapes, NVRAM, battery-backed RAM
    - Persistent across reboots and power failures
- File meta-data (or inode): additional system information associated with each file
  - Name of file
  - Type of file (e.g., directory, regular file, device file, symbolic link, etc.)
  - Pointer to data blocks on disk
  - File size
  - Times: creation, access, modification
  - Owner and group id
  - Protection bits (read or write)
- Inode is stored on disk
  - Conceptually, meta-data can be stored as an array on disk
4. Abstraction: Directories
- Organization technique: map file name to blocks of file data on disk
  - Actually, map file name to file inode (which enables one to find data on disk)
  - /tmp/slides.ppt → inode 76
- Old (bad) approaches
  - Single-level directory
    - Each file has a unique name
    - Special part of disk holds the directory listing
  - Two-level directory
    - Directory for each user
    - Specify file with user name and file name
5. Directories: Tree-Structured
- Directory in Linux file system
  - Directory is stored and treated like a file
  - Special bit set in meta-data for directories
  - User programs can read directories
  - Only system programs can write directories
  - Directory listing is stored in the file's data blocks in the form of <inode, name>
    - name can be of arbitrary length (up to a maximum limit)
  - Array of inodes is stored in a fixed location on the disk
- Special directories
  - Root: fixed index for meta-data (e.g., 2)
  - This directory: .
  - Parent directory: ..
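One way to picture a directory's data block as a sequence of <inode, name> records with variable-length names is sketched below. The record layout (4-byte inode number, 1-byte name length, then the name bytes) and the helper names are hypothetical and chosen only for illustration; real file systems use their own on-disk formats.

```python
import struct

# Hypothetical record header: 4-byte inode number, 1-byte name length (little-endian).
HEADER = struct.Struct("<IB")


def pack_entries(entries):
    """Serialize (inode, name) pairs into one directory data block."""
    out = bytearray()
    for ino, name in entries:
        raw = name.encode("utf-8")
        if len(raw) > 255:  # the maximum name length imposed by the 1-byte field
            raise ValueError("name too long")
        out += HEADER.pack(ino, len(raw)) + raw
    return bytes(out)


def unpack_entries(block):
    """Parse a directory data block back into (inode, name) pairs."""
    entries, pos = [], 0
    while pos < len(block):
        ino, name_len = HEADER.unpack_from(block, pos)
        pos += HEADER.size
        entries.append((ino, block[pos:pos + name_len].decode("utf-8")))
        pos += name_len
    return entries


root = [(2, "."), (2, ".."), (50, "a"), (51, "tmp"), (52, "boot")]
block = pack_entries(root)
print(len(block), unpack_entries(block) == root)  # 36 True
```

Storing the name length in each record is what allows names of different lengths while still capping them at a maximum.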
6. Absolute path to inode
[Figure: inodes and their data blocks. Inode 2 (type DIR; direct0, direct1) points to a data block with entries <2, .>, <2, ..>, <50, a>, <51, tmp>, <52, boot>. Inode 50 (type DIR; direct0, direct1).]
- Example: read(/a/b/c) // read some bytes from c
  - Read inode 2, look for a, find <a, 50>
  - Read inode 50, look for b, find <b, 900>
  - Read inode 900, look for c, find <c, 2000>
  - Read inode 2000, and get the data block
- Total: 8 I/Os (roughly): 4 inode reads, 3 directory data block reads, and 1 file data block read
  - (if inodes are not in memory initially, and all the inodes needed in this example are stored in different blocks)
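The walk above can be simulated with a toy in-memory "disk". This is a sketch under stated assumptions: the `INODES` table, the `read_path` helper, and the file contents are invented for the example, and each inode or data block access is simply counted as one I/O, with no caching.

```python
# Toy "disk": inode number -> inode. A directory's data maps name -> inode number.
INODES = {
    2: {"type": "DIR", "data": {".": 2, "..": 2, "a": 50, "tmp": 51, "boot": 52}},
    50: {"type": "DIR", "data": {".": 50, "..": 2, "b": 900}},
    900: {"type": "DIR", "data": {".": 900, "..": 50, "c": 2000}},
    2000: {"type": "FILE", "data": b"contents of c"},
}
ROOT_INO = 2


def read_path(path, start_ino=ROOT_INO):
    """Resolve path from start_ino; return (file data, simulated I/O count)."""
    ios = 0
    ino = start_ino
    for name in [part for part in path.split("/") if part]:
        inode = INODES[ino]
        ios += 1  # read the directory's inode
        if inode["type"] != "DIR":
            raise NotADirectoryError(name)
        entries = inode["data"]
        ios += 1  # read the directory's data block
        ino = entries[name]  # raises KeyError if the name is missing
    inode = INODES[ino]
    ios += 1  # read the file's inode
    data = inode["data"]
    ios += 1  # read the file's data block
    return data, ios


print(read_path("/a/b/c"))  # (b'contents of c', 8)
```

Starting the same walk from a directory deeper in the tree (a different `start_ino`) skips the lookups for every ancestor, which reduces the count.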
7. Working Directory and Tilde
- Ideas
  - Cumbersome to write the full path
  - Bad for the OS if you always give the full path
    - Example: the current working directory is lecture
    - open(/mylife/college/courses/cs/537/lecture/14/notes)
    - OS must traverse the directory hierarchy every time → lots of I/Os
  - If you use the cwd, the parent paths (e.g., /mylife/college/courses/cs/537) do not have to be in memory
- Store cwd in PCB
  - PCB stores the inode number of the cwd
  - read(14/notes) → generally speaking, only 6 I/Os
    - Read lecture's inode, lecture's dir data block (to get the inode for folder 14)
    - Read 14's inode, 14's dir data block (to get the inode for notes)
    - Read notes' inode, and notes' data block
- Tilde
  - Not a feature of the file system
    - (chdir(~mjordan) → fail!)
  - Look at the entry for mjordan in the /etc/passwd file to get the absolute path name
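Since tilde expansion happens outside the file system, it can be sketched as plain string processing over passwd-format lines (name:password:uid:gid:comment:home:shell). The `expand_tilde` helper and the sample entries below are hypothetical; a bare `~` with no user name is not handled here. Python's standard library offers `os.path.expanduser` for the real thing.

```python
def expand_tilde(path, passwd_text):
    """Expand a leading ~user using passwd-format text; other paths pass through."""
    if not path.startswith("~"):
        return path
    user, sep, rest = path[1:].partition("/")
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 7 and fields[0] == user:
            return fields[5] + sep + rest  # field 5 is the home directory
    raise KeyError(user)


# Made-up sample entries in /etc/passwd format.
SAMPLE = (
    "root:x:0:0:root:/root:/bin/sh\n"
    "mjordan:x:1001:1001:M Jordan:/home/mjordan:/bin/sh\n"
)
print(expand_tilde("~mjordan/notes", SAMPLE))  # /home/mjordan/notes
```

Only the expanded absolute path is ever passed to a call such as chdir, which is why the file system itself never sees the tilde.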
8. Acyclic-Graph Directories
- More general than tree structure
  - Add connections across the tree (no cycles)
  - Create links from one file (or directory) to another
- Hard link: ln a b (a must exist already)
  - Idea: can use name a or b to get to the same file data
  - Implementation: multiple directory entries point to the same inode
  - What happens when you remove a? Does b still exist?
    - Yes
    - How is this feature implemented? A counter (link count) in the inode
- Unix does not create hard links to directories. Why?
  - There's only one entry for ..
  - "Please search for file A" will never end
    - Due to the cyclic graph
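The hard-link behavior can be observed from user space on a POSIX system. The sketch below uses Python's `os.link` and `os.stat` in a temporary directory; the function name `hard_link_demo` and the file names are just for illustration, and it assumes the temporary directory lives on a file system that supports hard links.

```python
import os
import tempfile


def hard_link_demo():
    """Create a, hard-link it as b, remove a, and report what happened."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        with open(a, "w") as f:
            f.write("hello")
        os.link(a, b)  # like: ln a b
        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        links_before = os.stat(b).st_nlink  # counter in the inode
        os.remove(a)  # removes one directory entry, decrements the counter
        links_after = os.stat(b).st_nlink
        with open(b) as f:
            content = f.read()  # data is still reachable through b
    return same_inode, links_before, links_after, content


print(hard_link_demo())  # (True, 2, 1, 'hello')
```

The file's data is released only when the counter drops to zero (and no process still has the file open).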
9. Acyclic-Graph Directories
- Symbolic (soft) link
  - Some kind of shortcut
  - ln -s /a/very/long/directory/path/ /shortpath
- shortpath
  - is a special file (designated by a bit in meta-data)
  - Content of shortpath's data block is /a/very/long/directory/path
  - Optimization: put the pathname in the array that is normally used for pointing to data blocks and indirect blocks
    - Only if the pathname is short
    - To access the pathname, no need to read a data block
- ls -l /shortpath
  - shortpath → /a/very/long/directory/path/
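A symbolic link can be inspected the same way from user space. This sketch uses `os.symlink` and `os.readlink` inside a temporary directory rather than at the root of the file system; the function name and directory layout are assumptions for the example, and it assumes a platform where unprivileged symlink creation is allowed (e.g., Linux).

```python
import os
import tempfile


def symlink_demo():
    """Create a long directory path, link to it, and inspect the link."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "a", "very", "long", "directory", "path")
        os.makedirs(target)
        short = os.path.join(d, "shortpath")
        os.symlink(target, short)  # like: ln -s target shortpath
        is_link = os.path.islink(short)  # the "special file" designation
        stored = os.readlink(short)  # the pathname stored in the link itself
        same_dir = os.path.samefile(short, target)  # following the link reaches target
        return is_link, stored == target, same_dir


print(symlink_demo())  # (True, True, True)
```

Unlike a hard link, the link stores only a pathname, so it says nothing about whether the target still exists.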
10. Bitmaps (space management)
- How do we know which blocks are free/allocated?
- Old days
  - Free list (like in Project 3)
  - Poor free-list organization
    - Consecutive file blocks not close together
    - Pay seek cost even between sequential disk transfers
- Today: bitmaps
  - 1 bit for each block
  - Very small space overhead: 1 bit / 4 KB (0.003%)
  - 400 GB file system → 100 Mbits → 12.5 MB
  - Automatically merge adjacent free blocks
- Bitmap vs. free list
  - Bitmap is useful for space management of fixed-size chunks
    - The smallest allocation unit is large (e.g., a 4 KB block) → small space overhead
    - Number of requested units is small → scanning the bitmap is fast
  - Free list is useful for variable-size chunks
    - The smallest allocation unit is small (e.g., in Project 3, the unit is 4 bytes) → a bitmap would have high space overhead
    - Number of requested units can be large (e.g., Alloc(1000 bytes)) → scanning a bitmap would be slow
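A minimal sketch of the bitmap mechanism and its space overhead follows. The `BlockBitmap` class and `bitmap_bytes` helper are illustrative names, not taken from any real file system, and the overhead figure assumes binary units (4 KB = 4096 bytes, 1 GB = 2^30 bytes).

```python
class BlockBitmap:
    """One bit per block: 0 = free, 1 = allocated."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)

    def is_free(self, n):
        return ((self.bits[n // 8] >> (n % 8)) & 1) == 0

    def allocate(self, n):
        self.bits[n // 8] |= 1 << (n % 8)

    def free(self, n):
        self.bits[n // 8] &= ~(1 << (n % 8)) & 0xFF

    def find_free(self, start=0):
        """Linear scan for the first free block at or after start."""
        for n in range(start, self.nblocks):
            if self.is_free(n):
                return n
        return None


def bitmap_bytes(fs_bytes, block_bytes=4096):
    """Size of the bitmap in bytes: one bit per block."""
    return fs_bytes // block_bytes // 8


bm = BlockBitmap(16)
bm.allocate(0)
bm.allocate(1)
print(bm.find_free())  # 2
bm.free(0)
print(bm.find_free())  # 0
print(bitmap_bytes(400 * 2**30) / 2**20)  # 12.5
```

Freeing two neighboring blocks simply leaves two adjacent zero bits, which is why no explicit merge step is needed.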
11. Allocation Policy (1)
- Problems
  - "Create file.txt in /tmp": which inode should represent file.txt?
  - "Give me 10 new data blocks": which data blocks to give?
  - A good policy needs to match the common workload
- Workloads influence the design of the file system
- File characteristics (measurements of UNIX and NT)
  - Most files are small (about 8 KB)
  - Most of the disk is allocated to large files
    - (90% of data is in 10% of files)
- Access patterns
  - Sequential: data in file is read/written in order
    - Most common access pattern
  - Random (direct): access block without referencing predecessors
    - Difficult to optimize
  - Access files in the same directory together
    - Spatial locality
  - Access meta-data when accessing file data
    - Need meta-data to find data
12. Allocation Policy (2)
- Old days: unorganized free list
  - Initially files have contiguous data blocks
  - But file systems are long-lived entities
- No locality in allocation to disk
  - In the end, data are scattered
  - Inodes far from data blocks
    - Pay a long seek for every data transfer
  - Inodes of files in a directory not close together
    - Pay a seek for every inode (e.g., ls -l)
13. Allocation Policy (3)
- Implications: locality-driven policy
  - Large files should be allocated sequentially
  - Files in the same directory should be allocated near each other
  - Data should be allocated near its meta-data
- Spread unrelated data far apart
  - Leaves room for related things to be placed together
- Keep free space on disk
  - Always find a free block nearby
  - 90% rule of thumb
14. BSD/Linux Solution: Cylinder Groups
- Divide disk into cylinder groups
  - Set of adjacent cylinders
  - Little seek time between cylinders in the same group
- Each cylinder group contains
  - Superblock (contains information about where
    - Vary offset within each cylinder group for reliability
    - Otherwise all superblock copies will be on the same platter
  - Inodes
    - Fixed number per cylinder group
  - Bitmap of free blocks
  - Usage summary for high-level allocation policy
  - Data blocks
- Cylinder groups vs. 1 cylinder group
  - Compare this approach with the default approach, where the array of inodes is stored at the beginning of the disk
  - Lots of seeks! Because you need to read an inode and its data (in both cases: reading/writing directories and regular files)
  - Inode is at the beginning of the disk and data could be in any part of the disk
15. Solution to Achieving Locality
- Maintain locality of each file
  - Allocate runs of blocks within a cylinder group
- Maintain locality of files and inodes in a directory
  - Problem: create a file file.txt inside directory dir; where should file.txt be located?
  - Keep files in a directory in the same cylinder group
- Make room for locality within a directory
  - Spread out directories among the cylinder groups
  - A new directory is placed in a cylinder group that has a greater-than-average number of free inodes and the smallest number of directories
  - The files in the directory will be placed in the same cylinder group
  - Goal: inode clustering policies succeed most of the time
- Switch to a different cylinder group for large files
  - After 48 KB and every 1 MB thereafter
  - Prevent one file from filling a cylinder group
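The directory-placement rule can be sketched as a small selection function. The per-group summary fields (`free_inodes`, `ndirs`), the function name, and the fallback when no group is above average are assumptions made for this illustration, not details given in the slides.

```python
def pick_group_for_new_dir(groups):
    """Return the index of the cylinder group in which to place a new directory.

    groups: list of dicts with 'free_inodes' and 'ndirs' (hypothetical usage summary).
    """
    average_free = sum(g["free_inodes"] for g in groups) / len(groups)
    candidates = [i for i, g in enumerate(groups) if g["free_inodes"] > average_free]
    if not candidates:
        # Assumed fallback: every group has exactly the average, so consider all.
        candidates = list(range(len(groups)))
    # Among the candidates, prefer the group with the fewest directories.
    return min(candidates, key=lambda i: groups[i]["ndirs"])


groups = [
    {"free_inodes": 100, "ndirs": 5},
    {"free_inodes": 300, "ndirs": 9},
    {"free_inodes": 250, "ndirs": 2},
    {"free_inodes": 50, "ndirs": 0},
]
print(pick_group_for_new_dir(groups))  # 2
```

Group 3 has the fewest directories but too few free inodes, so the above-average filter excludes it; files later created in the new directory would then go to the chosen group.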
16. Layout: Global vs. Local
- Decompose allocation into two steps
- Global: heuristics for allocating files and directories to cylinder groups
  - Pick the optimal next block for allocation
  - Example
    - A file is in cg10, and its first data block is block 1001
    - A user appends to the file such that a second data block must be added
    - Heuristics: put the 2nd data block in cg10, block 1002
- Local: handles the request for a specific block
  - If free, use it; else
    - Find a block (or some blocks) within the cylinder group (e.g., cg10)
    - Rehash on cylinder group to choose another group
    - Exhaustive search
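The local step's fallback chain can be sketched as follows. The data structure (one set of free block numbers per cylinder group), the "closest free block" choice within a group, and the doubling probe sequence used for the rehash are all assumptions for this illustration; the slides do not specify them.

```python
def allocate_block(free, preferred_cg, preferred_block):
    """Try to honor a request for a specific block, falling back step by step.

    free: list of sets; free[cg] holds the free block numbers of group cg.
    Returns (cg, block), or None if the disk is full.
    """
    ncg = len(free)
    # 1. If the requested block is free, use it.
    if preferred_block in free[preferred_cg]:
        free[preferred_cg].remove(preferred_block)
        return preferred_cg, preferred_block
    # 2. Otherwise take the closest free block within the same cylinder group.
    if free[preferred_cg]:
        blk = min(free[preferred_cg], key=lambda b: abs(b - preferred_block))
        free[preferred_cg].remove(blk)
        return preferred_cg, blk
    # 3. Rehash: probe other groups at doubling distances (assumed probe sequence).
    step = 1
    while step < ncg:
        cg = (preferred_cg + step) % ncg
        if free[cg]:
            blk = min(free[cg])
            free[cg].remove(blk)
            return cg, blk
        step *= 2
    # 4. Exhaustive search over every group.
    for cg in range(ncg):
        if free[cg]:
            blk = min(free[cg])
            free[cg].remove(blk)
            return cg, blk
    return None


free = [{5}, {1001, 1002}, set(), set()]
print(allocate_block(free, 1, 1002))  # (1, 1002)
print(allocate_block(free, 1, 1002))  # (1, 1001)
print(allocate_block(free, 1, 1002))  # (0, 5)
print(allocate_block(free, 1, 1002))  # None
```

The third call shows the rehash probes missing (groups 2 and 3 are empty), so the exhaustive search finds the last free block in group 0.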
17. More optimizations
- Preallocation
  - Problem: append(f1), append(f2), append(f1), append(f2) (interleaved appends would interleave the two files' blocks on disk)
  - Preallocates disk data blocks before they are actually used
    - E.g., block preallocation = 8
    - If the file is closed, the preallocation is released
  - Why does it work?
    - Common workload: when creating a file, it tends to grow in the near future as long as the file is not closed
- Readahead
  - disk_read(block 1001)
  - 8-block readahead → reads 1001 - 1008
  - Why useful? Common workload: sequential access
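Readahead can be sketched as a small cache in front of a block store. The `ReadaheadCache` class, its dictionary-backed "disk", and the request counter are hypothetical and exist only to show the effect on sequential access; a read of a block that does not exist on the toy disk raises `KeyError`.

```python
class ReadaheadCache:
    """On a miss, fetch the requested block plus the following window - 1 blocks."""

    def __init__(self, disk, window=8):
        self.disk = disk  # hypothetical backing store: block number -> bytes
        self.window = window
        self.cache = {}
        self.disk_requests = 0

    def read(self, block):
        if block not in self.cache:
            self.disk_requests += 1  # one request brings in the whole window
            for b in range(block, block + self.window):
                if b in self.disk:
                    self.cache[b] = self.disk[b]
        return self.cache[block]


disk = {b: bytes([b % 256]) for b in range(1000, 1100)}
cache = ReadaheadCache(disk, window=8)
for b in range(1001, 1009):  # sequential reads of blocks 1001 through 1008
    cache.read(b)
print(cache.disk_requests)  # 1
cache.read(1009)
print(cache.disk_requests)  # 2
```

With a sequential workload, eight reads cost one disk request; a random workload would get little benefit from the extra blocks.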
18. Misc (Roles of FS)
- Standard library
  - Provide file abstraction
  - Simplify access to disk (a bunch of bits)
- Resource coordination
  - Provide protection across users
  - Provide fair and efficient performance
- User processes do not have direct access to devices
  - Could crash entire system
  - Could read/write data without appropriate permissions
  - Could hog device unfairly
- FS exports higher-level functions
  - File system: provides file and directory abstractions
  - File system operations: mkdir, create, read, write